
OpenAI has released Aardvark, a software security tool powered by GPT‑5.

Maintaining software security is a major challenge: each year, tens of thousands of new vulnerabilities are discovered in enterprise and open-source software alike. OpenAI aims to help security teams tackle this problem with Aardvark.

Aardvark is an autonomous agent that helps developers and security teams discover and address vulnerabilities at scale. Unlike traditional approaches such as fuzzing or software composition analysis, Aardvark uses reasoning powered by large language models (LLMs) to understand code behaviour, examining code much as a human security researcher would.

It begins by analyzing the entire repository to build a threat model that reflects the project’s security objectives and design. It then monitors new commits, scans the changes for vulnerabilities, and annotates the code to explain its findings. When it detects a potential vulnerability, Aardvark tries to reproduce it in a sandboxed environment and explains the steps it took. Once a vulnerability is confirmed, Aardvark uses its integration with OpenAI Codex to generate a proposed patch for human review.
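To make the stages of this workflow concrete, here is a minimal Python sketch of a commit-review loop with the same shape: scan the changed files against a threat model, keep only findings that pass validation, and attach a proposed patch for a human to review. It is purely illustrative. Aardvark’s interfaces are not public, so every name here (`ThreatModel`, `Finding`, `review_commit` and the `toy_*` helpers) is hypothetical. The LLM-driven analysis, sandboxed validation and Codex patching are replaced by trivial placeholder functions that are passed in as callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ThreatModel:
    """Hypothetical stand-in for the repository-level threat model."""
    objectives: List[str]


@dataclass
class Finding:
    """A candidate vulnerability raised while scanning a commit."""
    file: str
    description: str
    confirmed: bool = False
    patch: Optional[str] = None


def review_commit(
    changed_files: Dict[str, str],
    threat_model: ThreatModel,
    analyze: Callable[[str, str, ThreatModel], List[Finding]],
    validate: Callable[[Finding], bool],
    propose_patch: Callable[[Finding], str],
) -> List[Finding]:
    """Scan one commit, keep only validated findings, attach a proposed patch.

    Patches are only returned for human review; nothing is applied.
    """
    confirmed: List[Finding] = []
    for path, content in changed_files.items():
        for finding in analyze(path, content, threat_model):
            # Only findings the validator can reproduce move forward.
            if validate(finding):
                finding.confirmed = True
                finding.patch = propose_patch(finding)
                confirmed.append(finding)
    return confirmed


# --- Toy stand-ins (hypothetical; not how Aardvark works internally) ---

def toy_analyze(path: str, content: str, tm: ThreatModel) -> List[Finding]:
    # Placeholder for LLM reasoning: a naive substring check for eval().
    if "eval(" in content:
        return [Finding(path, "eval() may execute attacker-controlled input")]
    return []


def toy_validate(finding: Finding) -> bool:
    # Placeholder that accepts every finding. A real validator would try
    # to trigger the issue inside an isolated sandbox.
    return True


def toy_patch(finding: Finding) -> str:
    # Placeholder for a model-generated fix.
    return f"In {finding.file}: replace eval() with ast.literal_eval()"


if __name__ == "__main__":
    tm = ThreatModel(objectives=["never execute user-supplied code"])
    commit = {
        "app/config.py": "value = eval(user_input)",
        "app/util.py": "x = 1",
    }
    for f in review_commit(commit, tm, toy_analyze, toy_validate, toy_patch):
        print(f.file, "->", f.patch)
```

Passing the analysis, validation and patching steps in as functions makes clear what the sketch does not model. Each callable marks a point where, according to OpenAI’s description, Aardvark relies on an LLM, a sandbox or Codex rather than on simple rules.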

Figure: Aardvark’s workflow (source: OpenAI)

Aardvark also integrates with existing workflows, including GitHub and Codex. Although it is designed for security, it has also uncovered other kinds of bugs, such as logic errors, incomplete fixes, and privacy issues.

The tool has been tested internally at OpenAI and with external alpha partners. In benchmark tests on repositories with known and synthetically introduced vulnerabilities, Aardvark identified 92% of them, which OpenAI presents as evidence of high recall and real-world effectiveness. It has also been run on open-source projects, where it found numerous vulnerabilities that were responsibly disclosed. Ten of these have received official CVE identifiers.
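The 92% figure is a recall measurement. Recall is the share of real vulnerabilities that a tool actually flags. The standard definition is given below. The split of 92 detected versus 8 missed out of 100 is only a hypothetical illustration, because the size of OpenAI’s benchmark is not stated here.

```latex
\text{recall} = \frac{TP}{TP + FN},
\qquad \text{e.g. } \frac{92}{92 + 8} = 0.92
```

Here TP counts vulnerabilities the tool detected and FN those it missed. Recall says nothing about false alarms. False alarms are captured by precision, TP/(TP + FP), which this benchmark figure does not report.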

Aardvark is currently available in beta. OpenAI is inviting select partners to join for early access and to work with the team on improving detection accuracy, validation workflows, and reporting.
