Architecture

How to deploy AI agents in business: the operational guide

The technical and organizational steps to go from an agent idea to a system in production in a company. Architecture choices, tools, recurring pitfalls.

Deploying an AI agent in business is not deploying a chatbot. It's building an autonomous system that executes tasks end-to-end, with memory, tools, and guardrails. This guide describes the method in 5 steps, tested across 95+ agents in production.

Step 1 - Define the functional scope

Before writing a single line of code, answer 5 questions:

  • What task exactly should the agent execute? Not 'help the sales team', but 'send a 6-email sequence over 80 days to a qualified prospect'

  • What's the trigger? An event, a schedule, a human request?

  • What data sources are required? CRM, emails, document base, external APIs?

  • What action tools? Send an email, create a task, update a record?

  • What are the cases where the human takes back control? Define guardrails explicitly

A poorly scoped agent fails. The time spent on this step (typically 2 to 4 hours per agent) saves weeks of later debugging.

Step 2 - Build contextual memory

An agent needs three types of memory:

Organizational memory (static)

The stable context of your company: offerings, personas, tone, business rules. Typically stored in structured .md files, indexed for semantic search (RAG).

Session memory (dynamic)

The current state of the task: which prospects already contacted, what was the last message sent, what's the next scheduled action. Typically stored in the CRM (Hubspot, Salesforce) or a relational database.

Lessons memory (self-improving)

What the agent has learned from previous runs: which message patterns work, which errors not to repeat, which edge cases need a human. Stored in a .md file the agent re-reads at the start of each run. This is the layer that turns a static agent into a system that improves.

Step 3 - Choose the technical architecture

Three structural choices to make:

The base LLM

Claude Sonnet 4.5 (Anthropic) is recommended for most cases in 2026: best performance/cost ratio, 200K-token context window, very good on structured tasks. GPT-4.1 (OpenAI) is a solid alternative. For simple high-volume tasks, Claude Haiku or GPT-4.1 mini are sufficient at 10x lower cost.

The orchestrator

The tool that runs the agent, manages tools, and loops the steps. Three options: (1) Claude Agent SDK or OpenAI Agents SDK for custom code, (2) n8n for visual workflows, (3) specialized platforms like Relevance AI or Lindy.ai for low-code. The choice depends on complexity and your internal skills.

The MCP protocol

MCP (Model Context Protocol), launched late 2024 by Anthropic and become standard in 2025-2026, lets you connect an agent to external tools (Gmail, Hubspot, Slack, Drive) natively, without going through Zapier or Make. Significant gain in robustness and speed. All Albus Factory agents use MCP natively.

Step 4 - Develop and iterate

An agent isn't 'ready' at the end of the first development. It's ready after 2 to 4 weeks of iteration in real conditions. Best practices:

A human in the loop at the start

For the first 2 weeks, the agent must not act alone. It produces drafts a human validates before sending. This catches early errors without consequences. Then progressively, supervision is reduced as trust is established.

Collect lessons systematically

After each run, the agent writes its lessons into a .md file: what worked, what case wasn't anticipated, what error was made. This file is re-read on the next run. In 4 months at Albus, this mechanism prevented hundreds of error repetitions.

Monitor in production

KPI dashboards (volume processed, success rate, latency, API cost), anomaly alerts (error rate up, abnormal latency, volume drop), weekly run review. An agent silently drifting costs more than an agent that visibly crashes.

Step 5 - Industrialize and scale

Once the agent is stabilized (typically at M+2), three industrialization axes:

Progressive autonomy

Reduce human supervision on standard cases, maintain it on high-stakes cases. Document escalation rules precisely.

Multiplying agents

Once the first agent of a function is in production, deploy complementary agents. Example: after the SDR Orchestrator, deploy the Conversation Handler, then the Account Manager, then Task Audit. Each new agent consumes the existing memory and integrates with the system.

Transfer to internal teams

From 5-10 agents in production, train an internal team to operate: track KPI, debug simple incidents, propose minor evolutions. The external expert remains for architecture and new strategic agents.

The 5 most common pitfalls

  • Wanting a 100% autonomous agent from day 1 - recipe for disaster

  • Skipping memory construction - agents become generic

  • Using public AI models without data processing agreement - GDPR risk

  • Stacking tools (Make + Zapier + n8n + Python) - unmanageable technical debt

  • Not documenting - when the person who built it leaves, the whole system becomes fragile

The recommended 2026 stack

  • LLM: Claude Sonnet 4.5 (main), Claude Haiku (high-frequency simple tasks)

  • Orchestrator: Claude Agent SDK for robustness, n8n for simple workflows

  • Integration protocol: MCP native everywhere

  • CRM and state: Hubspot (B2B standard), Salesforce (if already in place)

  • Data enrichment: Apollo (B2B), Clearbit (enterprise)

  • Communication: Slack for alerting, Gmail or Outlook via MCP for emailing

  • Monitoring: custom dashboards + Sentry for errors

A well-deployed AI agent lasts 3 to 5 years in production with minor maintenance. A poorly deployed agent lasts 3 to 5 weeks before being abandoned. The difference is in the method, not in the technology.