Technology · 12 min · February 21, 2026

By Eduardo Cisternas

Case Study: How we orchestrate 7 AI agents with multiple LLMs for legal investigations


**When a Single AI Model Falls Short**

When we started building Audty's AI assistant, we did what most companies do: we connected a single large language model to our application and asked it to handle everything. It worked… sort of. Responses were generic, too slow, and costs were high.

Today, our platform uses a specialized agent architecture in which each agent works as part of a team, using the optimal model and context level for its specific task. The result: responses that are more accurate, 60% faster, and 45% more cost-effective.

This article explains how we got there, sharing the principles behind our success while protecting key technical details.

---

**The Problem: One LLM Can't Do It All**

Consider what goes into a workplace investigation:

- Summarizing a 50-page case.
- Assessing legal risk according to Chilean regulations.
- Planning an investigation with legal deadlines.
- Drafting official reports.
- Cross-referencing testimonies to evaluate credibility.
- Identifying needs for psychological support.
- Answering general labor law questions.

When a single model tries to handle everything, each query carries massive context (laws, policies, evidence), leading to:

- Wasted tokens.
- Vague responses (the model doesn't know what to prioritize).
- High latency (up to 30 seconds).
- Elevated costs.
- Hallucinations (inventing legal articles).

---

**The Solution: An Architecture Inspired by Human Teams**

In a law firm, you don't ask the same lawyer to do everything: there are specialists for summaries, risk assessment, drafting reports… We applied the same logic to AI.

**General Architecture**

The system is organized around an intelligent router that analyzes user intent and directs the query to the most appropriate specialized agent. Each agent has access to a different context level (basic, standard, or full) and uses the AI model that best balances speed, accuracy, and cost for its function.
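
As a rough illustration of this routing step, here is a minimal sketch. The agent names follow the generic roles described below, but the keyword heuristic stands in for what would in practice be an LLM-based intent classifier; everything here is hypothetical, not our production code.

```python
# Minimal intent-routing sketch: map a user query to a specialized agent.
# In production this would be a lightweight LLM call; a keyword heuristic
# stands in here. Agent names and keywords are illustrative only.

AGENT_KEYWORDS = {
    "synthesizer": ["summarize", "summary", "timeline"],
    "risk_assessor": ["risk", "severity"],
    "planner": ["plan", "deadline", "schedule"],
    "report_drafter": ["draft", "report", "document"],
}

def route(query: str, default: str = "legal_analyst") -> str:
    """Return the name of the agent best matching the query."""
    q = query.lower()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return agent
    return default  # unmatched queries fall through to a general agent

print(route("Please summarize the case file"))   # synthesizer
print(route("What does Article 184 require?"))   # legal_analyst
```

The point of the sketch is the shape of the decision, not the heuristic: a cheap first pass decides which specialist (and therefore which model and context tier) handles the expensive work.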

**Specialized Agents**

While the exact number and names of agents are part of our intellectual property, we can describe their generic roles:

- **Synthesizer:** Summarizes facts and builds a clear case timeline.
- **Risk Assessor:** Analyzes the severity of the situation and provides a substantiated evaluation.
- **Investigation Planner:** Designs an investigation plan that complies with legal deadlines.
- **Credibility Analyst:** Cross-references testimonies and evidence to help assess coherence.
- **Psychological Advisor:** Identifies the need for early support for involved individuals.
- **Legal Analyst:** Performs rigorous legal analysis, citing specific articles.
- **Report Drafter:** Generates draft official documents with the required structure.

Each agent receives only the information it needs: basic case data, or this plus relevant legislation, or the full context (including internal policies and case law). This avoids query overload and reduces costs.

**Why We Use More Than One AI Model**

No single model is perfect for everything. Some are extremely fast and cheap; others are slower but have advanced reasoning capabilities. Our architecture combines the best of several leading providers:

- An ultra-fast, lightweight model for initial routing.
- General-purpose models for tasks requiring speed (summaries, general chat).
- Models with advanced reasoning capabilities for deep legal analysis and report drafting.

This combination lets us pay only for the power we truly need at each step.
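
To make the cost argument concrete, here is a toy model-tier table. The tier names, per-token prices, and agent assignments are invented for illustration; the real figures and model choices are part of our configuration, not shown here.

```python
# Hypothetical model tiers: each agent is assigned the cheapest model class
# that meets its quality bar. Prices and names are placeholders, not real SKUs.

MODEL_TIERS = {
    "fast": {"cost_per_1k_tokens": 0.0002},      # routing, short chat
    "balanced": {"cost_per_1k_tokens": 0.003},   # summaries
    "reasoning": {"cost_per_1k_tokens": 0.015},  # legal analysis, drafting
}

AGENT_MODEL = {
    "router": "fast",
    "synthesizer": "balanced",
    "legal_analyst": "reasoning",
    "report_drafter": "reasoning",
}

def estimated_cost(agent: str, tokens: int) -> float:
    """Estimated cost of one call for the given agent and token count."""
    tier = AGENT_MODEL[agent]
    return tokens / 1000 * MODEL_TIERS[tier]["cost_per_1k_tokens"]

# Routing 500 tokens costs a fraction of deep analysis of the same size:
print(estimated_cost("router", 500))         # 0.0001
print(estimated_cost("legal_analyst", 500))  # 0.0075
```

With this kind of mapping, only the few calls that genuinely need advanced reasoning pay reasoning-model prices.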

**Intelligent Orchestration: The Conductor**

The system's core is a manager that decides in real-time which model should respond for each agent, based on the organization's configuration. If a provider fails (due to rate limits or errors), the manager automatically switches to a backup model without the user noticing. It also caches each client's preferences for greater efficiency.
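
The failover behavior can be sketched in a few lines. The provider callables below are stand-ins for real API clients; the error types and ordering policy are simplified assumptions, not our actual manager.

```python
# Provider-failover sketch: try the preferred model first, fall back on error.
# The provider functions are hypothetical stand-ins for real API clients.

def call_with_fallback(providers, prompt):
    """Try each (name, callable) in order; return the first success."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limit, timeout, provider outage
            last_error = exc      # remember the failure, try the next provider
    raise RuntimeError("all providers failed") from last_error

def primary(prompt):
    raise TimeoutError("rate limited")  # simulate the primary being down

def backup(prompt):
    return f"answer to: {prompt}"

used, answer = call_with_fallback([("primary", primary), ("backup", backup)], "hi")
print(used)  # backup
```

The user-visible contract is the key design choice: the caller gets an answer or a single terminal error, never a per-provider failure.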

**Tailored Context Levels**

Not all agents need access to the full legal database. That's why we created three context tiers:

- **Basic:** Only case data. Ideal for summaries and drafting.
- **Standard:** Adds legal deadlines and a selection of articles related to the subject matter.
- **Full:** Includes full legal database search, internal policies, and case law.

This approach drastically reduces cost and latency because each query sends just the right amount of information.
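
The three tiers above can be pictured as a layered context builder, where each tier includes everything from the tier below. The field names here are hypothetical; the idea is simply that an agent's prompt carries no more than its tier allows.

```python
# Illustrative context builder: each tier layers on top of the previous one.
# Field names are hypothetical, not our actual schema.

def build_context(case: dict, tier: str) -> dict:
    ctx = {"case_facts": case["facts"]}                 # Basic: case data only
    if tier in ("standard", "full"):
        ctx["deadlines"] = case["deadlines"]            # Standard: + deadlines
        ctx["related_articles"] = case["articles"][:3]  # + selected articles
    if tier == "full":
        ctx["policies"] = case["policies"]              # Full: + internal policies
        ctx["case_law"] = case["case_law"]              # + case law results
    return ctx

case = {"facts": "...", "deadlines": ["30 days"], "articles": ["Art. 184"],
        "policies": ["code of conduct"], "case_law": ["ruling 123"]}
print(sorted(build_context(case, "basic")))  # ['case_facts']
```

A summarizer running at the basic tier never pays the token cost of policies or case law it would not use anyway.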

**Smart Retrieval with Fallback**

To give agents access to current regulations, we use a retrieval-augmented generation (RAG) system that locates the most relevant articles for each query. If the primary engine fails, a secondary system based on traditional methods takes over, ensuring we never lack legal context.
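
The fallback pattern is the same as for model providers. In the sketch below, a simulated vector-search outage hands retrieval over to a simple keyword overlap score; the corpus and scoring are illustrative, not our production index or ranking.

```python
# Retrieval-with-fallback sketch: if vector search fails, a traditional
# keyword score over the same corpus takes over. Corpus is illustrative.

CORPUS = {
    "Art. 2 CT": "workplace harassment definition and scope",
    "Art. 184 CT": "employer duty to protect life and health",
}

def vector_search(query):
    raise ConnectionError("embedding service down")  # simulate an outage

def keyword_search(query):
    """Rank articles by how many query words appear in their text."""
    words = set(query.lower().split())
    scored = {art: len(words & set(text.split())) for art, text in CORPUS.items()}
    return max(scored, key=scored.get)

def retrieve(query):
    try:
        return vector_search(query)
    except Exception:
        return keyword_search(query)  # fallback keeps legal context available

print(retrieve("employer duty to protect"))  # Art. 184 CT
```

The guarantee this buys is that a degraded retriever still returns *some* relevant article, so agents never answer without legal grounding.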

**Legal Safeguards First**

Every response generated by our agents is subject to inviolable rules (guardrails) that ensure due process:

- Jurisdiction and competence are always verified.
- Conditional language preserves the presumption of innocence.
- Both inculpatory and exculpatory elements are analyzed.
- Events are classified according to Chilean legal taxonomy.

These rules prevent the AI from asserting guilt, ignoring the respondent's perspective, or inappropriately using terms like "victim" or "aggressor" categorically.
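
One of these guardrails, the neutral-language rule, can be illustrated as a post-processing check on drafts. The banned phrases and replacements below are a tiny illustrative subset, not our full rule set, which also operates at the prompt level.

```python
# Minimal guardrail sketch: rewrite categorical language in a draft and
# report what was flagged. Phrase list is an illustrative subset only.

BANNED = {
    "the aggressor": "the respondent",
    "the victim": "the complainant",
    "is guilty": "is alleged to have",
}

def apply_guardrails(text: str) -> tuple[str, list[str]]:
    """Return (neutralized text, list of flagged phrases)."""
    violations = []
    for phrase, neutral in BANNED.items():
        if phrase in text:
            violations.append(phrase)           # record the violation
            text = text.replace(phrase, neutral)  # substitute neutral wording
    return text, violations

draft = "the aggressor acted after hours"
safe, found = apply_guardrails(draft)
print(safe)   # the respondent acted after hours
print(found)  # ['the aggressor']
```

Because the check returns the violations alongside the rewrite, flagged drafts can also be escalated for human review rather than silently corrected.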

---

**Measurable Results**

Migrating from a single model to a multi-agent architecture has delivered:

- Cost reduction: 45% to 68% less per query.
- Lower latency: from an average of 12 seconds down to 3.5 seconds.
- Routing accuracy: 94% of queries reach the correct agent.
- Legal coverage: 100% of responses include verifiable article citations.
- Availability: 99.5%, thanks to automatic backup systems.
- Token optimization: 87% less irrelevant context.

---

**Lessons Learned**

1. Not everything needs the most expensive model.
2. The right context matters more than the model.
3. Fallback systems are essential.
4. Legal safeguards are critical infrastructure.
5. Flexibility is key.

---

**Conclusion**

Using a single AI model for everything is like asking a generalist lawyer to handle a complex criminal case: they can do it, but it won't be optimal. Multi-agent orchestration with multiple AI providers enables:

- Specialization: each agent masters its domain.
- Efficiency: only the necessary context is consumed.
- Resilience: if one provider fails, another takes over.
- Flexibility: each client adjusts their ideal combination.
- Cost-effectiveness: you pay only for what you use.

At Audty, this architecture allows us to offer professional-grade AI assistance for workplace investigations in Chile, complying with due process and at an accessible cost for companies of all sizes.

---

*Want to see this architecture in action? Request a demo at [audty.cl](https://audty.cl/) and discover how specialized AI can transform your workplace investigation management.*
