---
title: "Output Guardrails"
description: "Validate agent responses before they reach users."
icon: "arrow-right-from-bracket"
---

Output guardrails validate agent responses **before** they reach users or other agents. When a guardrail trips, the agent receives feedback and retries.

## Function Signature

Each output guardrail receives three parameters:

```python
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail
from pydantic import BaseModel


@output_guardrail
async def my_output_guardrail(
    context: RunContextWrapper,
    agent: Agent,
    response_text: str | BaseModel,
) -> GuardrailFunctionOutput:
    """Validate agent output."""
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)
```

**Parameters:**

- `context`: Run context wrapper with access to shared state.
- `agent`: The Agent instance generating the response.
- `response_text`: The agent response as a string, or a structured model when `output_type` is set.

**Return:**

- `GuardrailFunctionOutput` with:
  - `tripwire_triggered` (bool): `True` if validation failed.
  - `output_info` (str): Feedback message sent to the agent when `tripwire_triggered=True`.
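
Because `response_text` may be either a string or a structured model, a guardrail that inspects text usually has to normalize its input first. The helper below is a minimal sketch of one way to do that. `to_text` is a hypothetical name, not part of agency_swarm, and it assumes the structured output is a Pydantic v2 model that exposes `model_dump_json()`.

```python
from typing import Any


def to_text(response: Any) -> str:
    """Return a string view of a guardrail's response argument.

    Hypothetical helper for illustration; not part of agency_swarm.
    """
    if isinstance(response, str):
        return response
    # Assumption: structured outputs are Pydantic v2 models with model_dump_json().
    dump = getattr(response, "model_dump_json", None)
    if callable(dump):
        return dump()
    # Fall back to the generic string representation for anything else.
    return str(response)
```

A guardrail could then call `to_text(response_text)` before running string checks such as `.lower()` or `in`.
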

## Basic Output Guardrail

```python
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail


@output_guardrail
async def response_content_guardrail(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    # Trip the guardrail when the response contains the blocked phrase.
    tripwire_triggered = "bad word" in response_text.lower()
    output_info = "Please avoid using inappropriate language." if tripwire_triggered else ""
    return GuardrailFunctionOutput(output_info=output_info, tripwire_triggered=tripwire_triggered)


agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    output_guardrails=[response_content_guardrail],
)
```

## Practical Example: Preventing Sensitive Information Leaks

```python
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail


@output_guardrail(name="ForbidSensitiveEmail")
async def forbid_sensitive_email(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    # Treat any "@" in the response as a possible email address.
    if "@" in response_text:
        return GuardrailFunctionOutput(
            output_info="Do not share email addresses. Offer to connect via the support portal instead.",
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)


support_agent = Agent(
    name="SupportPilot",
    instructions="You handle customer support. Official email: support@example.com.",
    model="gpt-5.4-mini",
    output_guardrails=[forbid_sensitive_email],
    validation_attempts=1,
)
```

See the full example at [`examples/guardrails_output.py`](https://github.com/VRSEN/agency-swarm/blob/main/examples/guardrails_output.py).
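
Checking for a bare `@` also trips on harmless text such as social media handles. If that is too strict for your use case, a pattern match is one alternative. The sketch below uses only the standard library. `EMAIL_PATTERN` and `contains_email` are hypothetical names, and the regular expression is a deliberately simple approximation, not a full address validator.

```python
import re

# Simple approximation of an email address: local part, "@", then a dotted domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def contains_email(text: str) -> bool:
    """Return True if the text appears to contain an email address."""
    return EMAIL_PATTERN.search(text) is not None
```

Inside the guardrail, `contains_email(response_text)` could replace the `"@" in response_text` check.
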

## Example: Simple Format Enforcement

```python
import json

# Agent is imported because it appears in the guardrail's type annotations.
from agency_swarm import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail


@output_guardrail(name="RequireJSONFormat")
async def require_json_format(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    try:
        json.loads(response_text)
        return GuardrailFunctionOutput(output_info="", tripwire_triggered=False)
    except json.JSONDecodeError:
        return GuardrailFunctionOutput(
            output_info="Response must be valid JSON. Wrap your response in curly braces.",
            tripwire_triggered=True,
        )
```
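
`json.loads` accepts any JSON value, including bare numbers, strings, and arrays. If the consumer expects an object with specific fields, the check can be extended. The following is a sketch only: `check_json_object` and its `required_keys` argument are hypothetical, and the returned tuple is meant to map onto `tripwire_triggered` (negated) and `output_info`.

```python
import json


def check_json_object(text: str, required_keys: set[str]) -> tuple[bool, str]:
    """Return (ok, feedback) for a response expected to be a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"Response must be valid JSON ({exc.msg})."
    if not isinstance(data, dict):
        return False, "Response must be a JSON object, not a bare value or array."
    # Report missing keys in a stable order so the feedback is deterministic.
    missing = sorted(required_keys - set(data))
    if missing:
        return False, f"Missing required keys: {', '.join(missing)}."
    return True, ""
```
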

## Output Guardrail Retry Flow

When an output guardrail trips, the agent can retry using the guardrail's feedback. The `validation_attempts` parameter sets how many retries are allowed.

### How Retry Works

<Steps>
  <Step title="Agent generates response">
    The agent produces its initial response.
  </Step>
  <Step title="Output guardrail checks response">
    Each output guardrail validates the response.
  </Step>
  <Step title="If validation fails">
    The agent receives a **system message** containing the guardrail `output_info`.
  </Step>
  <Step title="Agent retries">
    The agent generates a new response, informed by that message.
  </Step>
  <Step title="Repeat until success or limit reached">
    The cycle repeats for up to `validation_attempts` retries.
  </Step>
  <Step title="If all attempts fail">
    `OutputGuardrailTripwireTriggered` is raised.
  </Step>
</Steps>
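
The steps above can be modeled with a small loop. This is an illustrative simulation, not agency_swarm's implementation: `run_with_retries`, `GuardrailTripped`, and the toy callables are hypothetical stand-ins, and the loop assumes `validation_attempts` counts retries after the initial response.

```python
from typing import Callable, Optional


class GuardrailTripped(Exception):
    """Stand-in for the exception raised once retries are exhausted."""


def run_with_retries(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], str],
    validation_attempts: int = 1,
) -> str:
    """Generate a response, retrying with guardrail feedback on failure.

    `generate` receives the previous feedback (None on the first call).
    `check` returns an empty string on success or a feedback message.
    """
    feedback: Optional[str] = None
    # One initial attempt plus `validation_attempts` retries.
    for _ in range(validation_attempts + 1):
        response = generate(feedback)
        feedback = check(response)
        if not feedback:
            return response
    raise GuardrailTripped(feedback)


# Toy model: the first draft leaks an email address, the retry does not.
def toy_generate(feedback: Optional[str]) -> str:
    return "Use the support portal." if feedback else "Mail bob@example.com."


def toy_check(response: str) -> str:
    return "Do not share email addresses." if "@" in response else ""


result = run_with_retries(toy_generate, toy_check, validation_attempts=1)
```

With `validation_attempts=0` the same toy model raises `GuardrailTripped` immediately, because the first draft fails and no retry is allowed.
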

## Configure `validation_attempts`

```python
agent = Agent(
    name="CustomerSupportAgent",
    instructions="You are a helpful customer support agent.",
    output_guardrails=[response_content_guardrail],
    validation_attempts=2,
)
```

| Setting | Behavior |
|---------|----------|
| `validation_attempts=0` | Fail-fast (no retry, immediate exception) |
| `validation_attempts=1` | Default (one retry after initial failure) |
| `validation_attempts=2+` | Multiple retries for more complex validations |

<Note>
  Each retry sends the guardrail `output_info` message to the agent as a system message, giving the agent context to adjust its response.
</Note>
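
## Handling Validation Failures

If every attempt fails, `OutputGuardrailTripwireTriggered` is raised and can be caught around the call:

```python
from agency_swarm import OutputGuardrailTripwireTriggered

# Assumes `agency` is an existing Agency instance and that this runs inside an async function.
try:
    response = await agency.get_response("Hello!")
except OutputGuardrailTripwireTriggered as exc:
    print(f"Validation failed: {exc.guardrail_result.output_info}")
```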
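
In user-facing applications it is often preferable to return a safe fallback message than to surface the exception. The sketch below shows that pattern with plain `asyncio`. `respond_or_fallback` is a hypothetical helper, and `ValidationFailed` stands in for `OutputGuardrailTripwireTriggered` so the snippet runs without the library installed. The `str` return type is a simplification.

```python
import asyncio
from typing import Awaitable, Callable


class ValidationFailed(Exception):
    """Stand-in for OutputGuardrailTripwireTriggered in this sketch."""


async def respond_or_fallback(
    get_response: Callable[[], Awaitable[str]],
    fallback: str,
    error_type: type[Exception] = ValidationFailed,
) -> str:
    """Await a response; return the fallback text if validation failed."""
    try:
        return await get_response()
    except error_type:
        return fallback


async def failing_call() -> str:
    # Simulates a run in which every validation attempt failed.
    raise ValidationFailed("Do not share email addresses.")


reply = asyncio.run(
    respond_or_fallback(failing_call, "Please contact support via the portal.")
)
```

In a real application, `get_response` would wrap the agency call and `error_type` would be `OutputGuardrailTripwireTriggered`.
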

## Message History

Output guardrail failures are stored as system messages with `message_origin="output_guardrail_error"`.

For most use cases, `role`, `content`, and `message_origin` are enough. Extra metadata is mainly for debugging and run tracing.

| Origin | Meaning |
|--------|---------|
| `output_guardrail_error` | Output guardrail failure (system message) |

<Accordion title="Example history entry (with debug metadata)">

```json
{
  "role": "system",
  "content": "You are not allowed to include your email address in your response. Ask agent to redirect user to the contact page: https://www.example.com/contact",
  "message_origin": "output_guardrail_error",
  "agent": "DatabaseAgent",
  "callerAgent": "CustomerSupportAgent",
  "agent_run_id": "agent_run_id",
  "parent_run_id": "call_id",
  "timestamp": 1758103770629217,
  "type": "message"
}
```

</Accordion>
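
To review guardrail failures after a run, the stored messages can be filtered by `message_origin`. The sketch below operates on a plain list of dictionaries shaped like the history entry above. How that list is obtained depends on your persistence setup and is not shown, and `guardrail_failures` is a hypothetical helper name.

```python
from typing import Any


def guardrail_failures(history: list[dict[str, Any]]) -> list[str]:
    """Return the feedback text of every output guardrail failure in the history."""
    return [
        message.get("content", "")
        for message in history
        if message.get("message_origin") == "output_guardrail_error"
    ]


# Illustrative data only; real entries carry the extra metadata shown above.
sample_history = [
    {"role": "user", "content": "What is your email?"},
    {
        "role": "system",
        "content": "Do not share email addresses.",
        "message_origin": "output_guardrail_error",
    },
    {"role": "assistant", "content": "Please use the support portal."},
]

failures = guardrail_failures(sample_history)
```
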

## Agent-to-Agent Validation

Use guardrails to control how agents communicate with each other. When you add communication flows between agents, the recipient agent's guardrails define the message format it accepts and returns.

<Accordion title="Example: Task/Response contract between agents">

```python
from agency_swarm import Agency, Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail, output_guardrail


@input_guardrail(name="RequireTaskPrefix")
async def require_task_prefix(
    context: RunContextWrapper, agent: Agent, agent_input: str | list[str]
) -> GuardrailFunctionOutput:
    # Normalize list input to a single string before checking the prefix.
    text = agent_input if isinstance(agent_input, str) else " ".join(agent_input)
    blocked = not text.startswith("Task:")
    return GuardrailFunctionOutput(
        output_info="ERROR: Requests to this agent must begin with 'Task:'" if blocked else "",
        tripwire_triggered=blocked,
    )


@output_guardrail(name="RequireResponsePrefix")
async def require_response_prefix(
    context: RunContextWrapper, agent: Agent, response_text: str
) -> GuardrailFunctionOutput:
    blocked = not response_text.startswith("Response:")
    return GuardrailFunctionOutput(
        output_info="ERROR: Responses must start with 'Response:'" if blocked else "",
        tripwire_triggered=blocked,
    )


ceo = Agent(name="CEO", instructions="You are the CEO agent.")

worker = Agent(
    name="Worker",
    instructions="You are the worker agent.",
    input_guardrails=[require_task_prefix],
    output_guardrails=[require_response_prefix],
    raise_input_guardrail_error=True,
)

agency = Agency(ceo, communication_flows=[(ceo, worker)])
```

</Accordion>

In this example:

- If the CEO sends a message that does not start with `Task:`, the worker's input guardrail triggers.
- The CEO receives an error and adjusts its message.
- The worker's output guardrail enforces the `Response:` prefix on returned messages.

<Note>
  Agent-to-agent messages are always single strings, so input guardrails for inter-agent communication receive a string (not a list).
</Note>