---
title: "Input Guardrails"
description: "Validate incoming messages before they reach the agent."
icon: "arrow-right-to-bracket"
---

Input guardrails validate incoming messages **before** they reach the agent. They screen both user input and inter-agent communication.

## Simplified Input Processing

Agency Swarm automatically extracts text content from messages, so your guardrails receive clean text instead of full message objects.

## Function Signature

Each input guardrail receives three parameters:

```python
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail


@input_guardrail
async def my_input_guardrail(
    context: RunContextWrapper,
    agent: Agent,
    user_input: str | list[str],
) -> GuardrailFunctionOutput:
    """Validate user input."""
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)
```

**Parameters:**

- `context`: Run context wrapper with access to shared state.
- `agent`: The Agent instance receiving the input.
- `user_input`: Extracted text content.
  - Single message: a string containing the message content.
  - Multiple consecutive messages: a list of strings, one per message.

**Return:**

- `GuardrailFunctionOutput` with:
  - `tripwire_triggered` (bool): `True` if validation failed.
  - `output_info` (str): Guidance message returned to the caller.

<Note>
File and image inputs inside messages are not passed to input guardrails.
</Note>

## Input Types

When a user sends multiple consecutive messages:

```json
[
  {"role": "user", "content": "Hi"},
  {"role": "user", "content": "How are you?"}
]
```

Your guardrail receives:

```python
["Hi", "How are you?"]
```

This lets you validate each new input message individually or validate all of them as a group.

## Basic Input Guardrail

This guardrail blocks any input that does not start with `Request:`.

```python
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail


@input_guardrail
async def require_task_prefix(
    context: RunContextWrapper, agent: Agent, user_input: str | list[str]
) -> GuardrailFunctionOutput:
    # Join multiple messages so the prefix check runs on a single string.
    text = user_input if isinstance(user_input, str) else " ".join(user_input)
    blocked = not text.startswith("Request:")
    return GuardrailFunctionOutput(
        output_info="Prefix your request with 'Request:' describing what you need." if blocked else "",
        tripwire_triggered=blocked,
    )


agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
)
```

## Practical Example: Filtering Off-Topic Questions

Use input guardrails to keep agents focused on their domain. This example delegates relevance decisions to an evaluator agent:

```python
from agency_swarm import (
    Agent,
    GuardrailFunctionOutput,
    ModelSettings,
    Reasoning,
    RunContextWrapper,
    input_guardrail,
)
from pydantic import BaseModel


class RelevanceDecision(BaseModel):
    is_relevant: bool
    reason: str


guardrail_agent = Agent(
    name="GuardrailAgent",
    instructions=(
        "You screen incoming messages for a customer-support assistant. "
        "Treat questions about account access, billing, and troubleshooting as relevant. "
        "Flag any unrelated requests as irrelevant."
    ),
    model="gpt-5.4-mini",
    model_settings=ModelSettings(reasoning=Reasoning(effort="low")),
    output_type=RelevanceDecision,
)


@input_guardrail
async def require_support_topic(
    context: RunContextWrapper, agent: Agent, user_input: str | list[str]
) -> GuardrailFunctionOutput:
    candidate = user_input if isinstance(user_input, str) else "\n".join(user_input)
    guardrail_result = await guardrail_agent.get_response(candidate, context=context.context)
    decision = RelevanceDecision.model_validate(guardrail_result.final_output)
    if not decision.is_relevant:
        return GuardrailFunctionOutput(
            output_info="Only support questions are allowed. Ask about billing, account access, or troubleshooting.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)


support_agent = Agent(
    name="CustomerSupportAgent",
    instructions="You help customers resolve account, billing, and troubleshooting issues.",
    model="gpt-5.4-mini",
    input_guardrails=[require_support_topic],
    raise_input_guardrail_error=False,  # Non-strict mode: guidance returned as assistant message
)
```

See the full example at [`examples/guardrails_input.py`](https://github.com/VRSEN/agency-swarm/blob/main/examples/guardrails_input.py).

## Non-strict vs Strict Mode

Input guardrails support two modes that control how guidance is delivered. Set `raise_input_guardrail_error` on the agent to choose between them.

### Non-strict Mode (Default)

**Setting:** `raise_input_guardrail_error=False`

In non-strict mode, guardrail guidance flows naturally as assistant output:

- Guidance is returned as `final_output` (non-streaming) or as a `message_output_created` event (streaming).
- No exception is raised.
- Guidance persists as an assistant message with `message_origin="input_guardrail_message"`.

### Strict Mode

**Setting:** `raise_input_guardrail_error=True`

In strict mode, guardrail failures abort the turn immediately:

- `InputGuardrailTripwireTriggered` is raised.
- Guidance persists as a system message with `message_origin="input_guardrail_error"`.
- The turn is aborted before the agent processes input.
- The caller must handle the exception.

<Accordion title="Strict mode usage example">

```python
from agency_swarm import Agent, InputGuardrailTripwireTriggered

# `require_task_prefix` is the guardrail defined in the basic example above.
agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[require_task_prefix],
    raise_input_guardrail_error=True,
)

# Run this inside an async function.
try:
    response = await agent.get_response("Hello!")
except InputGuardrailTripwireTriggered as exc:
    print(f"Validation failed: {exc.guardrail_result.output_info}")
```

</Accordion>

### Comparison Table

| Mode | `raise_input_guardrail_error` | Caller sees | Persisted entry | Role | Use case |
|------|-------------------------------|-------------|-----------------|------|----------|
| **Non-strict** | `False` (default) | Guardrail text as `final_output` or streaming event | Assistant message (`input_guardrail_message`) | `assistant` | Conversational flows, helpful guidance |
| **Strict** | `True` | `InputGuardrailTripwireTriggered` exception | System message (`input_guardrail_error`) | `system` | Hard requirements, compliance, security |

<Accordion title="Should I use non-strict or strict mode?">

**Use non-strict mode when:**

- You want a conversational user experience.
- Agents are communicating with each other internally.
- Guardrail feedback is helpful guidance, not a hard block.
- You do not want to write exception handling code.

**Use strict mode when:**

- You are enforcing non-negotiable requirements.
- Security or compliance rules must block processing.
- You want explicit control over error handling.
- The caller should know immediately that validation failed.

</Accordion>

<Accordion title="Streaming behavior example">

```text
RunItemStreamEvent(
    name='message_output_created',
    item=MessageOutputItem(
        raw_item=ResponseOutputMessage(
            id='msg_input_guardrail_guidance',
            content=[ResponseOutputText(text="Prefix your request...")],
            role='assistant',
            status='completed'
        )
    )
)
```

</Accordion>

## Guardrails in Message History

Each guardrail trigger is recorded in chat history with a guidance entry. Every entry carries a `message_origin` value that identifies the type of guardrail that fired and, for input guardrails, the mode it ran in.

For most use cases, `role`, `content`, and `message_origin` are enough. Additional metadata is mainly for tracing multi-agent runs.

### Message Origin Values

- `input_guardrail_message`: Input guardrail in non-strict mode.
- `input_guardrail_error`: Input guardrail in strict mode.
- `output_guardrail_error`: Output guardrail failure (always a system message).

### Persistence Behavior

| Mode | `raise_input_guardrail_error` | Streaming Event | Persisted Entry |
|------|-------------------------------|-----------------|-----------------|
| **Non-strict** | `False` (default) | `message_output_created` with guidance text | Assistant message, `message_origin="input_guardrail_message"` |
| **Strict** | `True` | `{"type": "error", "content": guidance}` | System message, `message_origin="input_guardrail_error"` |

<Note>
`validation_attempts` does not apply to input guardrails. Input guardrails trigger immediately on validation failure.
</Note>

### Message History After Guardrails Trip

When an input guardrail trips, agent-to-agent request messages remain in history alongside guardrail guidance. This preserves context so calling agents can adjust their approach.

Output guardrail messages also persist in history to guide retry attempts.

<Accordion title="Example message history entries (illustrative)">

```json
[
  {
    "role": "assistant",
    "content": "Please prefix your request with 'Support:' describing what you need.",
    "message_origin": "input_guardrail_message",
    "agent": "CustomerSupportAgent"
  },
  {
    "role": "assistant",
    "content": "When chatting with this agent, provide your name first.",
    "message_origin": "input_guardrail_message",
    "agent": "DatabaseAgent",
    "callerAgent": "CustomerSupportAgent"
  },
  {
    "role": "system",
    "content": "Do not include email addresses in your response.",
    "message_origin": "output_guardrail_error",
    "agent": "DatabaseAgent",
    "callerAgent": "CustomerSupportAgent"
  }
]
```

</Accordion>
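
When tracing a multi-agent run, it can help to group guardrail guidance by which agent called which. The sketch below works on entries shaped like the illustrative ones above; `guidance_by_route` is a hypothetical helper, and treating a missing `callerAgent` as `None` is an assumption of this example rather than documented behavior.

```python
from collections import defaultdict

# Entries shaped like the illustrative history above (content shortened).
history = [
    {"role": "assistant", "content": "Prefix your request.",
     "message_origin": "input_guardrail_message", "agent": "CustomerSupportAgent"},
    {"role": "assistant", "content": "Provide your name first.",
     "message_origin": "input_guardrail_message", "agent": "DatabaseAgent",
     "callerAgent": "CustomerSupportAgent"},
    {"role": "system", "content": "Do not include email addresses.",
     "message_origin": "output_guardrail_error", "agent": "DatabaseAgent",
     "callerAgent": "CustomerSupportAgent"},
]


def guidance_by_route(entries: list[dict]) -> dict:
    """Group guardrail guidance text by (callerAgent, agent) pair."""
    routes = defaultdict(list)
    for entry in entries:
        # Skip ordinary messages that carry no guardrail origin.
        if "guardrail" not in entry.get("message_origin", ""):
            continue
        # Assumption: entries without "callerAgent" get None as the caller.
        route = (entry.get("callerAgent"), entry["agent"])
        routes[route].append(entry["content"])
    return dict(routes)


for route, messages in guidance_by_route(history).items():
    print(route, len(messages))
```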

## Internal Agent Communication

For many agent-to-agent flows, non-strict mode (`raise_input_guardrail_error=False`) is easier to work with because guidance is returned inline instead of raising exceptions mid-chain.

<Warning>
Using `Handoff` for agent-to-agent communication can bypass input guardrails between agents.
</Warning>