---
title: "Deployment to Production"
description: "Step-by-step guide for deploying your agency in a production environment."
icon: "rocket-launch"
sidebarTitle: "Deploy"
---

**Recommended:** Use the [Starter Template](/welcome/getting-started/starter-template) for production. It ships with FastAPI endpoints, auth, and a clean project layout.

## Required Environment Variables

Before deploying, ensure these are set in your production environment:

| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | Your OpenAI API key |
| `APP_TOKEN` | Recommended | Authentication token for FastAPI endpoints |

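A startup check can catch a missing variable before the service accepts traffic. The following is a minimal sketch, not part of the framework: `check_deploy_env` is a hypothetical helper, and the warning text reflects the assumption that endpoints are left unprotected when `APP_TOKEN` is unset.

```python
import os
from collections.abc import Mapping


def check_deploy_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Raise if a required variable is missing; return warnings for recommended ones.

    Hypothetical helper for illustration only.
    """
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")

    warnings: list[str] = []
    if not env.get("APP_TOKEN"):
        # Assumption: without a token the FastAPI endpoints are not protected.
        warnings.append("APP_TOKEN is not set; FastAPI endpoints may be unauthenticated")
    return warnings


# Example with an explicit mapping instead of the real environment:
for message in check_deploy_env({"OPENAI_API_KEY": "sk-example"}):
    print(f"warning: {message}")
```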
<Note>
Thread persistence uses callbacks you define to store threads in any database you choose.
</Note>

<Note>
  This guide assumes you have already created an agency. If you haven't, check out the [Getting Started](/welcome/installation) guide.
</Note>

<Warning>
  Before deploying, test every tool and agent thoroughly. Run the test cases in each tool file, then verify that the agency works end-to-end with the demo methods.
</Warning>

## Deployment Process

<Steps>

<Step title="Step 1: Persist Conversation Threads" icon="message-dots">

By default, every time you create a new `Agency()`, it starts a fresh conversation thread. In production, you usually need to resume prior conversations or handle multiple users.

<Info>
Persist the full conversation history for each chat, including user-facing turns and agent-to-agent handoffs.
</Info>

Chat persistence is handled through callback functions passed to the `Agency` constructor:

```python
from agents import TResponseInputItem
from agency_swarm import Agency


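# save_threads_to_db and load_threads_from_db are placeholders for your own storage layer.
def save_threads(messages: list[TResponseInputItem], chat_id: str) -> None:
    save_threads_to_db(chat_id, messages)


def load_threads(chat_id: str) -> list[TResponseInputItem]:
    return load_threads_from_db(chat_id)


# agent1, agent2, and chat_id are assumed to be defined elsewhere;
# chat_id identifies the conversation whose history is loaded and saved.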
agency = Agency(
    agent1,
    agent2,
    communication_flows=[(agent1, agent2)],
    load_threads_callback=lambda: load_threads(chat_id),
    save_threads_callback=lambda messages: save_threads(messages, chat_id),
)
```
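
The snippet above calls `save_threads_to_db` and `load_threads_from_db` without defining them. One possible storage layer, sketched here with SQLite from the standard library, keeps one row per chat and stores the history as JSON. The table layout, the `DB_PATH` value, and the `make_thread_callbacks` helper are assumptions for illustration, and the sketch assumes every message item is JSON-serializable.

```python
import json
import sqlite3
from typing import Any, Callable

DB_PATH = "threads.db"  # assumed location; point this at your real database

Messages = list[dict[str, Any]]


def _connect(db_path: str) -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) threads table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS threads ("
        "chat_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
    )
    return conn


def save_threads_to_db(chat_id: str, messages: Messages, db_path: str = DB_PATH) -> None:
    """Overwrite the stored history for chat_id with the full message list."""
    conn = _connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO threads (chat_id, messages) VALUES (?, ?)",
                (chat_id, json.dumps(messages)),
            )
    finally:
        conn.close()


def load_threads_from_db(chat_id: str, db_path: str = DB_PATH) -> Messages:
    """Return the stored history for chat_id, or an empty list for a new chat."""
    conn = _connect(db_path)
    try:
        row = conn.execute(
            "SELECT messages FROM threads WHERE chat_id = ?", (chat_id,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else []


def make_thread_callbacks(
    chat_id: str, db_path: str = DB_PATH
) -> tuple[Callable[[], Messages], Callable[[Messages], None]]:
    """Return (load, save) callables bound to one chat, mirroring the lambdas above."""

    def load() -> Messages:
        return load_threads_from_db(chat_id, db_path)

    def save(messages: Messages) -> None:
        save_threads_to_db(chat_id, messages, db_path)

    return load, save
```

Binding `chat_id` in a factory keeps each user's history separate: create one pair of callbacks per conversation and pass them as `load_threads_callback` and `save_threads_callback`.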

<Warning>
If you switch model providers for an existing saved chat, old tool/event items may no longer replay correctly. Start a new chat, or keep only `{role, content}` messages.
</Warning>
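
Reducing a saved history to `{role, content}` messages could look like the sketch below. It assumes each saved item is a dict, that tool and event items either lack a `role` or carry no text, and that message content is either a string or a list of parts with a `text` field. Check these assumptions against your stored data before relying on it; `keep_plain_messages` is a hypothetical helper.

```python
from typing import Any, Optional

PLAIN_ROLES = {"user", "assistant", "system"}


def _text_of(content: Any) -> Optional[str]:
    """Extract plain text from string content or from a list of text parts."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [
            part["text"]
            for part in content
            if isinstance(part, dict) and isinstance(part.get("text"), str)
        ]
        return "".join(parts) if parts else None
    return None


def keep_plain_messages(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only role/content messages; drop tool calls, tool outputs, and events."""
    plain = []
    for item in items:
        role = item.get("role")
        text = _text_of(item.get("content"))
        if role in PLAIN_ROLES and text is not None:
            plain.append({"role": role, "content": text})
    return plain
```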

</Step>

<Step title="Step 2: Configure FastAPI Endpoints" icon="diagram-project">

Use FastAPI in one of two ways:

- Single agency: call `agency.run_fastapi(...)` from an `Agency` instance.
- Multiple agencies, standalone tools, or both: use the top-level `run_fastapi(agencies=..., tools=[...])`.

<Info>
One server can host multiple agencies. Each key in the `agencies` dict becomes that agency's endpoint prefix.
</Info>

```python
from agency_swarm import Agency, Agent, function_tool, run_fastapi

@function_tool
def health_check() -> str:
    return "ok"

def create_support_agency(load_threads_callback=None):
    support = Agent(name="Support", instructions="You are a support agent.")
    return Agency(
        support,
        name="support",
        load_threads_callback=load_threads_callback,
    )

def create_sales_agency(load_threads_callback=None):
    sales = Agent(name="Sales", instructions="You are a sales agent.")
    return Agency(
        sales,
        name="sales",
        load_threads_callback=load_threads_callback,
    )

run_fastapi(
    agencies={
        "support": create_support_agency,
        "sales": create_sales_agency,
    },
    tools=[health_check],
    app_token_env="APP_TOKEN",
    cors_origins=["https://your-app.example"],
)
```

<Note>
`run_fastapi(agencies=...)` injects `load_threads_callback` on each request (for `chat_history`) but does not inject `save_threads_callback`.
If you need the server to write threads to storage, wire that up explicitly in your application flow.
</Note>

This creates separate agency endpoints plus tool endpoints, for example:

- `/support/get_response` and `/support/get_response_stream`
- `/sales/get_response` and `/sales/get_response_stream`
- `/tool/health_check`

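A client request to one of the agency endpoints above might be built as in this sketch. The `message` field name and the `Authorization: Bearer` scheme are assumptions here, so confirm both against the Authentication and API Usage Example pages linked below. `chat_history` is the field mentioned in the note above, and `build_agency_request` is a hypothetical helper. The request is only constructed, not sent, so the sketch runs without a live server.

```python
import json
import urllib.request
from typing import Any, Optional


def build_agency_request(
    base_url: str,
    agency: str,
    message: str,
    token: str,
    chat_history: Optional[list[dict[str, Any]]] = None,
) -> urllib.request.Request:
    """Build a POST request for /<agency>/get_response (assumed payload shape)."""
    payload: dict[str, Any] = {"message": message}
    if chat_history is not None:
        payload["chat_history"] = chat_history
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{agency}/get_response",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_agency_request(
    "https://your-api.example", "support", "Where is my order?", token="replace-me"
)
# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```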
FastAPI details:

- [Setting Up FastAPI Endpoints](/additional-features/fastapi-integration#setting-up-fastapi-endpoints)
- [Authentication](/additional-features/fastapi-integration#authentication)
- [Implementation reference (multiple agencies and tools)](/additional-features/fastapi-integration#implementation-reference)
- [API Usage Example](/additional-features/fastapi-integration#api-usage-example)

If you need tools hosted separately from your agency service, expose tools as APIs and connect them with [OpenAPI schemas](/core-framework/tools/openapi-schemas), or use [MCP Integration](/core-framework/tools/mcp-integration).

</Step>

<Step title="Step 3: Deploy the Service" icon="rocket-launch">

Use the [Starter Template](/welcome/getting-started/starter-template) as your production base. It already includes FastAPI wiring and deployment defaults.

- Create a repo from the template
- Set `OPENAI_API_KEY` and `APP_TOKEN`
- Follow the template README to deploy

If you are wiring your own server, see [FastAPI Integration](/additional-features/fastapi-integration) for endpoint and parameter details (`host`, `port`, `app_token_env`, `cors_origins`, `enable_agui`).
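
When wiring your own server, the parameters listed above can be read from the environment so the same code runs locally and in production. This is a sketch under stated assumptions: the `HOST`, `PORT`, and `CORS_ORIGINS` variable names and their defaults are hypothetical, and `server_kwargs_from_env` is not part of the framework.

```python
import os
from collections.abc import Mapping
from typing import Any


def server_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect keyword arguments for run_fastapi from (assumed) environment variables."""
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    kwargs: dict[str, Any] = {
        "host": env.get("HOST", "0.0.0.0"),    # assumed default
        "port": int(env.get("PORT", "8000")),  # assumed default
        "app_token_env": "APP_TOKEN",
    }
    if origins:
        kwargs["cors_origins"] = origins
    return kwargs


# Usage, with agencies defined as in Step 2:
# run_fastapi(agencies={"support": create_support_agency}, **server_kwargs_from_env())
```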

</Step>
</Steps>