This post, “Building Guardrails in the OpenAI Agent SDK,” explains how to implement a guardrail system that protects an agent from misuse.
In my previous article on the OpenAI Agent SDK, I showed how to build an AI agent using the SDK. We looked at an example where an agent could generate secure passwords, fetch the current time, and hand off coding questions to a specialist Python tutor sub-agent.
That introduction laid the foundation for building powerful, multi-purpose agents. But in production environments, there’s an equally important question: how do you keep agents focused and safe?
This is where guardrails come in.
What Are Guardrails?
Guardrails are constraints that define what an agent should or should not do. They act as protective boundaries around the agent’s behavior, ensuring that it only processes valid or relevant input. Without them, an agent might attempt to answer queries outside its intended scope, leading to incorrect, misleading, or even risky outputs.
For example, imagine you’ve deployed a Python tutoring agent for a learning platform. You wouldn’t want students asking it about fitness routines, political debates, or general trivia—the agent should only handle Python coding questions.
The OpenAI Agent SDK provides a mechanism for exactly this: input guardrails. These allow you to classify incoming prompts and decide whether they should be processed—or blocked.

Example: A Python-Only Guardrail
Let’s look at the implementation of a simple guardrail that ensures the user’s query relates to Python code. If the question is unrelated, the guardrail trips a “tripwire” and the request is blocked.
Here’s the heart of the code:
from pydantic import BaseModel

from agents import Agent, GuardrailFunctionOutput, Runner


class PythonOutput(BaseModel):
    is_python: bool
    reasoning: str


guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about python code.",
    output_type=PythonOutput,
)


async def python_guardrail(ctx, agent, input_data):
    # Run the classifier agent, then trip the wire if the input is not Python-related.
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    final_output = result.final_output_as(PythonOutput)
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_python,
    )
A dedicated guardrail_agent is used to classify whether a question is Python-related. Its output is structured as a PythonOutput model, returning both a Boolean (is_python) and a reasoning string for transparency.
If is_python is False, the guardrail function returns a GuardrailFunctionOutput with tripwire_triggered=True, effectively blocking the request.
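The tripwire decision itself is plain Boolean logic, and it can be sketched without the SDK at all. The types below (ClassifierOutput, GuardrailDecision) are hypothetical stand-ins for PythonOutput and GuardrailFunctionOutput, used only to illustrate how the classifier's Boolean is inverted into a tripwire:

```python
from dataclasses import dataclass


@dataclass
class ClassifierOutput:
    # Stand-in for the PythonOutput model: a classification plus its rationale.
    is_python: bool
    reasoning: str


@dataclass
class GuardrailDecision:
    # Stand-in for GuardrailFunctionOutput: the request is blocked when
    # tripwire_triggered is True.
    output_info: ClassifierOutput
    tripwire_triggered: bool


def decide(classification: ClassifierOutput) -> GuardrailDecision:
    # The tripwire fires exactly when the input is NOT about Python.
    return GuardrailDecision(
        output_info=classification,
        tripwire_triggered=not classification.is_python,
    )
```

The reasoning field rides along in output_info unchanged, which is what makes the decision auditable later.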
Integrating Guardrails Into the Main Agent
Once defined, the guardrail is attached to the main agent via InputGuardrail. Here’s how it looks:
triage_agent = Agent(
    name="My Agent",
    instructions=(
        "You are a helpful agent. You can generate passwords, "
        "provide the current time, and hand off to a Python tutor agent "
        "for code-related queries."
    ),
    tools=[generate_password, get_time],
    handoffs=[python_tutor_agent],
    input_guardrails=[
        InputGuardrail(guardrail_function=python_guardrail),
    ],
)
This ensures that all input passed to the agent is checked before execution.
If the input doesn’t pass the Python-only check, the SDK raises an InputGuardrailTripwireTriggered exception. Developers can catch this exception and decide how to handle it—for example, logging it, sending a polite error message to the user, or redirecting them to another resource.
Testing the Guardrail
Here’s a quick test:
# Off-topic input: the guardrail should block this.
try:
    result = await Runner.run(triage_agent, "What is an air dyne bike?")
    print(result.final_output)
except InputGuardrailTripwireTriggered as e:
    print("Guardrail blocked this input:", e)

# On-topic input: this passes through to the Python tutor.
try:
    result = await Runner.run(triage_agent, "Explain what a class is in Python.")
    print(result.final_output)
except InputGuardrailTripwireTriggered as e:
    print("Guardrail blocked this input:", e)
- The first input, about a fitness bike, is blocked by the guardrail.
- The second input, asking about Python classes, passes through and is handled by the Python tutor sub-agent.
This demonstrates how guardrails enforce scope while still allowing flexibility within their boundaries.
Why Guardrails Matter
Adding guardrails provides several benefits:
- Focus and Relevance: Agents won’t drift into answering unrelated questions, improving user trust and experience.
- Safety and Compliance: In enterprise settings, guardrails help prevent outputs that violate business rules or compliance requirements.
- Transparency and Debugging: Because the guardrail’s decision is based on structured output, developers can log both the classification (is_python) and the reasoning for auditing.
- Extensibility: The same pattern can be extended. For example, you could create guardrails for:
  - Restricting to specific domains (e.g., finance only).
  - Blocking sensitive topics.
  - Filtering based on user roles or permissions.
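Guardrails also don’t have to call a second model at all. As a sketch of the role-based filtering idea, the hypothetical check below blocks admin-only actions for non-admin users using a plain keyword match; names like role_guardrail and ADMIN_COMMANDS are illustrative, not part of the SDK, and a real implementation would wrap this logic in a guardrail function returning a GuardrailFunctionOutput:

```python
# Phrases that only administrators should be able to trigger (illustrative).
ADMIN_COMMANDS = {"delete user", "rotate keys", "export logs"}


def role_guardrail(user_role: str, prompt: str) -> bool:
    """Return True when the tripwire should fire (i.e., block the request)."""
    wants_admin_action = any(cmd in prompt.lower() for cmd in ADMIN_COMMANDS)
    return wants_admin_action and user_role != "admin"
```

Deterministic checks like this are cheap and predictable, so they pair well with an LLM-based classifier: run the fast rule first and fall back to the model only for ambiguous input.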
Guardrails as Part of a Larger Agent Design
In my earlier post about the OpenAI Agent SDK, we explored how agents can combine tools and handoffs to create flexible multi-agent workflows. Guardrails add another layer: control and governance.
Think of it this way:
- Tools give your agent capabilities.
- Handoffs let it collaborate with specialists.
- Guardrails keep it safe and on track.
Together, they form a powerful framework for building production-ready AI agents.
Conclusion
Guardrails in the OpenAI Agent SDK are not just technical niceties—they’re essential for building safe, reliable, and business-ready AI systems.
By implementing even a simple Python-only guardrail, as we did in this example, you can enforce strict domain boundaries and prevent misuse. As your agent ecosystem grows, combining tools, handoffs, and guardrails will give you the right balance between power and safety.
If you’re experimenting with the Agent SDK, I encourage you to start small—add a guardrail to your existing agent from the previous tutorial. From there, think about what “rules of engagement” your agents need in your specific environment. Guardrails give you the mechanism to implement them cleanly and transparently.