How OpenAI's New Safety Program Changes Enterprise AI Risk Profiles

On 25 March 2026, OpenAI launched a public Safety Bug Bounty program — a dedicated program for identifying AI safety and abuse risks that sit outside the scope of traditional security vulnerabilities. It covers prompt injection, agentic risks, data exfiltration, and platform integrity issues.

For enterprise security leaders, this is a significant development. It means the vendor is formalising external adversarial testing of the exact attack categories that most enterprise AI deployments are exposed to but few have structured processes to test for.

What the Safety Bug Bounty Covers

The program is focused on three categories of AI-specific risk.

Agentic risks. This includes third-party prompt injection attacks where attacker-controlled text hijacks a user’s AI agent to perform harmful actions or leak sensitive data. It also covers scenarios where agentic products like ChatGPT Agent perform disallowed actions at scale. Importantly, the program now explicitly covers MCP (Model Context Protocol) risk — a growing concern as agent ecosystems expand.

Proprietary information leakage. This covers model generations that inadvertently return proprietary information, including reasoning traces and internal system data that should not be exposed through normal usage.

Account and platform integrity. This includes bypassing anti-automation controls, manipulating account trust signals, and evading account restrictions — attack vectors that are particularly relevant for organisations running AI at enterprise scale.

The program complements OpenAI’s existing Security Bug Bounty and runs alongside periodic private campaigns focused on specific harm types, including biorisk content issues.

Why This Changes Enterprise Risk Conversations

Enterprise security teams have spent the past two years trying to understand how AI-specific risks fit into existing frameworks. Prompt injection, data exfiltration through AI agents, and model manipulation are not traditional security vulnerabilities. They do not map neatly to CVEs. They are not covered by standard penetration testing methodologies.

The Safety Bug Bounty program matters because it validates three things that enterprise security leaders have been arguing internally.

AI safety risks are real and distinct from security risks. The existence of a dedicated program, separate from the standard security bounty, confirms that these risk categories require specialised attention and are not adequately covered by conventional approaches.

External adversarial testing is necessary. If the company that builds the models acknowledges it needs external researchers to find safety flaws, enterprise organisations should not assume their internal testing is sufficient either. The program is OpenAI admitting that adversarial AI testing at scale requires community participation.

Prompt injection is an acknowledged threat vector. The explicit inclusion of prompt injection as an in-scope category — with a 50 percent reproducibility threshold for valid reports — gives enterprise security teams a vendor-validated reference for prioritising prompt injection defence in their own environments.

Implications for Enterprise AI Security Programs

Organisations that deploy OpenAI models, whether through ChatGPT Enterprise, the API, or agentic products, should use this development to strengthen their own AI security posture in three specific ways.

Update threat models to include agentic risk. If your organisation uses or plans to use AI agents that interact with external data sources, tools, or services, your threat model should explicitly account for prompt injection, data exfiltration through agent actions, and scope creep in agent permissions. The Safety Bug Bounty categories provide a useful starting taxonomy.

Establish internal AI red-teaming capability. The Safety Bug Bounty program is for researchers testing OpenAI’s products. Your organisation still needs to test how AI models behave within your specific environment, with your data, your prompts, and your integrations. Internal AI red-teaming — even at a basic level — is becoming a necessary security function.

Map findings to Essential 8 and ACSC guidance. For Australian organisations, the risk categories covered by the Safety Bug Bounty map to several Essential 8 controls. Application control is relevant for restricting which AI agents can interact with production data. Restricting administrative privileges applies directly to agent permission scoping. Patching applications extends to keeping AI models and integration layers current.

The Broader Signal for Enterprise AI Governance

The Safety Bug Bounty program is part of a broader pattern. OpenAI published the Model Spec the same day — a behavioural specification for model governance. Taken together, these releases signal that AI safety is transitioning from a research concern to an operational governance category.

For mid-market organisations, this creates both a challenge and an opportunity. The challenge is that AI security and safety are now expected capabilities, not optional enhancements. The opportunity is that vendor-provided frameworks and external testing programs create a foundation that enterprise teams can build on rather than starting from scratch.

What to Do This Quarter

Conduct an AI-specific risk assessment. Use the Safety Bug Bounty categories — agentic risk, information leakage, platform integrity — as a starting framework. Identify which risks apply to your AI deployments and where your current security controls have gaps.

Add AI safety to your vendor evaluation criteria. When assessing AI vendors, ask whether they maintain external safety testing programs, publish results or metrics, and have dedicated processes for AI-specific risk categories. The existence of such programs is a meaningful governance signal.

Brief your security team on prompt injection. If your security team is not yet familiar with prompt injection as an attack vector, this is the quarter to fix that. The techniques are well-documented, the risk is real, and the vendor ecosystem has now formally acknowledged it.

Our team works with Australian mid-market organisations to build AI-specific security and governance capabilities that complement existing security programs. If your organisation is deploying AI at scale and your security posture has not yet incorporated AI-specific risk categories, we would welcome the conversation.

How OpenAI’s New Safety Program Changes Enterprise AI Risk Profiles

What the Safety Bug Bounty Covers

Why This Changes Enterprise Risk Conversations

Implications for Enterprise AI Security Programs

The Broader Signal for Enterprise AI Governance

What to Do This Quarter

Submit a Comment Cancel reply

Recent Posts

Categories

Top Posts

How OpenAI’s New Safety Program Changes Enterprise AI Risk Profiles

What the Safety Bug Bounty Covers

Why This Changes Enterprise Risk Conversations

Implications for Enterprise AI Security Programs

The Broader Signal for Enterprise AI Governance

What to Do This Quarter

Submit a Comment Cancel reply

Recent Posts

Categories

Subscribe

Top Posts