**When a Single AI Model Falls Short**
When we started building Audty's AI assistant, we did what most companies do: we connected a single large language model to our application and asked it to handle everything. It worked… sort of. Responses were generic, too slow, and costs were high.
Today, our platform uses a specialized agent architecture in which each agent works as part of a team, using the optimal model and level of information for its specific task. The result: responses that are more accurate, 60% faster, and 45% cheaper to generate.
This article explains how we achieved this, protecting key technical details while sharing the principles that led to our success.
---
**The Problem: One LLM Can't Do It All**
Consider what goes into a workplace investigation: laws, internal policies, evidence, and strict due-process requirements. When a single model tries to handle all of it, every query must carry that massive context, which produces generic answers, high latency, and high cost.
---
**The Solution: An Architecture Inspired by Human Teams**
In a law firm, you don't ask the same lawyer to do everything: there are specialists for summaries, risk assessment, drafting reports… We applied the same logic to AI.
**General Architecture**
The system is organized around an intelligent router that analyzes user intent and directs the query to the most appropriate specialized agent. Each agent has access to a different context level (basic, standard, or full) and uses the AI model that best balances speed, accuracy, and cost for its function.
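The routing idea can be sketched in a few lines. This is a minimal, illustrative version: the intent keywords, agent names, and model labels below are hypothetical stand-ins, not Audty's actual configuration.

```python
# Minimal sketch of an intent router. Agent names, keywords, and model
# labels are illustrative placeholders, not the production system's.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    context_level: str  # "basic" | "standard" | "full"
    model: str          # model chosen to balance speed, accuracy, and cost

# Hypothetical routing table: trigger keyword -> specialized agent.
AGENTS = {
    "summarize": Agent("summarizer", "basic", "fast-cheap-model"),
    "risk":      Agent("risk_assessor", "standard", "balanced-model"),
    "report":    Agent("report_drafter", "full", "advanced-reasoning-model"),
}

def route(query: str) -> Agent:
    """Pick the agent whose trigger keyword appears in the query;
    default to the full-context drafting agent."""
    q = query.lower()
    for keyword, agent in AGENTS.items():
        if keyword in q:
            return agent
    return AGENTS["report"]
```

A production router would classify intent with a model rather than keywords, but the shape is the same: one decision point that assigns each query an agent, a context level, and a model.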
**Specialized Agents**
While the exact number and names of agents are part of our intellectual property, we can describe their roles generically. Each agent receives only the information it needs: basic case data; that data plus relevant legislation; or the full context, including internal policies and case law. This keeps individual queries lean and reduces costs.
**Why We Use More Than One AI Model**
No single model is perfect for everything. Some are extremely fast and cheap; others are slower but have advanced reasoning capabilities. Our architecture combines models from several leading providers, which lets us pay only for the power we truly need at each step.
**Intelligent Orchestration: The Conductor**
The system's core is a manager that decides in real-time which model should respond for each agent, based on the organization's configuration. If a provider fails (due to rate limits or errors), the manager automatically switches to a backup model without the user noticing. It also caches each client's preferences for greater efficiency.
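The fallback logic described above can be sketched as follows. This is an assumption-laden simplification: the error type, model names, and preference format are hypothetical, and real providers each have their own SDKs and error classes.

```python
# Sketch of a model manager with automatic fallback.
# Model names and the error class are illustrative placeholders.
class ProviderError(Exception):
    """Stands in for a provider's rate-limit or outage error."""

class ModelManager:
    def __init__(self, preferences: dict):
        # preferences: agent name -> ordered list of models, primary first.
        # In practice this is loaded once per client and cached.
        self._prefs = preferences

    def complete(self, agent: str, prompt: str, call_model) -> str:
        """Try each configured model in order; on a provider error,
        silently fall back to the next one."""
        last_error = None
        for model in self._prefs.get(agent, []):
            try:
                return call_model(model, prompt)
            except ProviderError as err:  # rate limit, outage, etc.
                last_error = err
        raise RuntimeError(f"All models failed for agent {agent!r}") from last_error
```

The key design choice is that fallback lives in one place: agents ask the manager for a completion and never know which provider actually answered.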
**Tailored Context Levels**
Not all agents need access to the full legal database. That's why we created three context tiers:

- **Basic** — core case data only.
- **Standard** — case data plus the relevant legislation.
- **Full** — everything above, plus internal policies and case law.

This approach drastically reduces cost and latency because each query sends just the right amount of information.
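In practice, a tiering scheme like this can be a simple configuration that decides which sources are assembled into each prompt. The source names below are illustrative, not the actual database fields.

```python
# Illustrative mapping of context tiers to the sources each prompt includes.
CONTEXT_TIERS = {
    "basic":    ["case_data"],
    "standard": ["case_data", "relevant_legislation"],
    "full":     ["case_data", "relevant_legislation",
                 "internal_policies", "case_law"],
}

def build_context(tier: str, sources: dict) -> str:
    """Assemble only the sources the tier allows, keeping prompts small."""
    return "\n\n".join(
        sources[name] for name in CONTEXT_TIERS[tier] if name in sources
    )
```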
**Smart Retrieval with Fallback**
To give agents access to current regulations, we use a retrieval-augmented generation (RAG) system that locates the most relevant articles for each query. If the primary engine fails, a secondary system based on traditional methods takes over, ensuring we never lack legal context.
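The retrieval fallback can be sketched like this. The primary vector engine is stubbed as a callable, and the "traditional method" is shown as a naive keyword-overlap ranker standing in for BM25/TF-IDF; none of this is the actual production code.

```python
# Sketch of retrieval with a traditional fallback. The keyword ranker
# is a naive stand-in for BM25/TF-IDF scoring.
def keyword_score(query: str, doc: str) -> int:
    """Count how many document terms appear in the query."""
    q_terms = set(query.lower().split())
    return sum(1 for term in doc.lower().split() if term in q_terms)

def retrieve(query: str, docs: list, vector_search=None, k: int = 3) -> list:
    """Try the primary (vector) engine; on any failure, fall back
    to ranking by keyword overlap so agents never lack legal context."""
    if vector_search is not None:
        try:
            return vector_search(query, docs, k)
        except Exception:
            pass  # primary engine down: fall through to keyword ranking
    ranked = sorted(docs, key=lambda d: keyword_score(query, d), reverse=True)
    return ranked[:k]
```

The point of the fallback is availability, not quality parity: a keyword ranker retrieves coarser matches than embeddings, but it keeps the pipeline answering while the primary engine recovers.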
**Legal Safeguards First**
Every response generated by our agents is subject to inviolable rules (guardrails) that ensure due process. These rules prevent the AI from asserting guilt, ignoring the respondent's perspective, or categorically applying labels such as "victim" or "aggressor."
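A simple form of such a guardrail is a post-generation check. The patterns below are illustrative examples of the rules described above, not Audty's actual rule set, and a real implementation would use far more sophisticated linguistic checks.

```python
# Sketch of post-generation guardrails enforcing due-process language.
# The pattern list is illustrative, not the actual production rule set.
FORBIDDEN_PATTERNS = [
    "is guilty",      # the AI must never assert guilt
    "the victim",     # avoid categorical labels before findings are made
    "the aggressor",
]

def check_guardrails(text: str) -> list:
    """Return the list of violated patterns; empty means the text passes
    and can be shown to the user."""
    lowered = text.lower()
    return [p for p in FORBIDDEN_PATTERNS if p in lowered]
```

A response that fails the check would be regenerated or rewritten before reaching the user, rather than being delivered with a warning.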
---
**Measurable Results**
Migrating from a single model to a multi-agent architecture has delivered measurable gains: more accurate responses, roughly 60% lower latency, and about 45% lower cost.
---
**Lessons Learned**
---
**Conclusion**
Using a single AI model for everything is like asking a generalist lawyer to handle a complex criminal case: they can do it, but it won't be optimal. Multi-agent orchestration across multiple AI providers enables responses that are more accurate, faster, and more cost-effective.
At Audty, this architecture allows us to offer professional-grade AI assistance for workplace investigations in Chile, complying with due process and at an accessible cost for companies of all sizes.
---
*Want to see this architecture in action? Request a demo at [audty.cl](https://audty.cl/) and discover how specialized AI can transform your workplace investigation management.*