Google’s OSS-Fuzz initiative recently found and reported 26 new vulnerabilities to open-source project maintainers. Among these was a critical vulnerability in OpenSSL (CVE-2024-9143), a foundation of internet infrastructure.
Since its launch 8 years ago, OSS-Fuzz has reported and helped fix over 11,000 vulnerabilities across various projects. The integration of Large Language Models (LLMs) into its fuzz target generation has since broadened the scope and depth of its analysis. A notable example is the OpenSSL flaw, which had remained undetected for nearly two decades despite extensive fuzzing by traditional methods.
Introduced in August 2023, the AI-Powered OSS-Fuzz project aimed to integrate LLMs into the fuzzing workflow. These models were tasked with generating fuzz targets: functions, similar to unit tests, that exercise code with generated inputs to probe it for vulnerabilities.
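To make the idea concrete, here is a minimal sketch of what a fuzz target looks like. It is an illustration only: the parser, the function names, and the purely random driver are all assumptions made for this example. Real fuzzing engines such as libFuzzer are coverage-guided rather than purely random, and C/C++ projects on OSS-Fuzz expose their targets through the `LLVMFuzzerTestOneInput` entry point instead.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Toy code under test (hypothetical): the first byte is the payload length."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


def fuzz_target(data: bytes) -> None:
    """Feed arbitrary bytes to the code under test.

    A ValueError is the parser's documented way of rejecting bad input,
    so it is swallowed. Any other exception escapes and counts as a finding.
    """
    try:
        parse_length_prefixed(data)
    except ValueError:
        pass


def run_random_inputs(target, iterations: int = 1000, seed: int = 0) -> int:
    """Naive driver: call the target with random byte strings of length 0 to 15."""
    rng = random.Random(seed)
    for _ in range(iterations):
        size = rng.randrange(0, 16)
        target(bytes(rng.randrange(256) for _ in range(size)))
    return iterations


if __name__ == "__main__":
    run_random_inputs(fuzz_target)
    print("no unexpected exceptions")
```

Like a unit test, the target is a small function that calls into the code being tested. Unlike a unit test, it takes its input from the fuzzer rather than from a hand-written case.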
The development team’s goal was to completely automate the manual process of developing a fuzz target end to end, including the following steps:
- Drafting an initial fuzz target.
- Fixing any compilation issues that arise.
- Running the fuzz target to see how it performs, and fixing any obvious mistakes causing runtime issues.
- Running the corrected fuzz target for a longer period of time, and triaging any crashes to determine the root cause.
- Fixing vulnerabilities.
That same month, the team got the LLMs to handle the first two steps, using an iterative process to generate a fuzz target from a simple prompt.
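The iterative process for those first two steps can be pictured as a draft-and-repair loop: ask the model for a target, try to build it, and feed any build error back into the next request. The sketch below is not Google's implementation. The `ask_model` callable, the prompt wording, and the stub model are hypothetical, and Python's built-in `compile()` stands in for a real compiler.

```python
from typing import Callable, Optional


def python_syntax_check(source: str) -> Optional[str]:
    """Stand-in for a compiler: return an error message, or None on success."""
    try:
        compile(source, "<fuzz_target>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def generate_fuzz_target(
    ask_model: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Draft a target, then feed build errors back until it builds.

    Returns the first candidate that passes the check, or None if
    max_attempts is exhausted.
    """
    request = prompt
    for _ in range(max_attempts):
        candidate = ask_model(request)
        error = check(candidate)
        if error is None:
            return candidate
        # Include the failed attempt and the error so the model can repair it.
        request = (
            f"{prompt}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Build error:\n{error}\nPlease fix it."
        )
    return None


def make_stub_model() -> Callable[[str], str]:
    """Hypothetical model: broken code first, valid code once shown an error."""
    def ask(request: str) -> str:
        if "Build error" in request:
            return "def fuzz_target(data):\n    return len(data)\n"
        return "def fuzz_target(data)\n    return len(data)\n"  # missing colon
    return ask


if __name__ == "__main__":
    result = generate_fuzz_target(
        make_stub_model(), python_syntax_check, "Write a fuzz target."
    )
    print(result)
```

Capping the number of attempts keeps the loop from running indefinitely when the model cannot produce a target that builds.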
By January 2024, Google had open-sourced the framework for this AI-driven fuzz target generation. Over time, improvements in context generation and iterative feedback allowed the team to extend fuzzing coverage from 160 to 272 C/C++ projects on OSS-Fuzz, adding over 370,000 lines of new code coverage.
This advancement enabled the tool to find critical vulnerabilities in projects like OpenSSL and cJSON, which traditional human-written fuzz targets had failed to identify.
The limitations of human-written fuzz targets stem from their focus on maximizing line coverage.
Line coverage refers to the measure of how many lines of code in a program are executed during testing. In fuzz testing, maximizing line coverage means ensuring that as many lines of the codebase as possible are tested to identify potential vulnerabilities.
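As a rough illustration of the metric, line coverage can be computed as the number of executable lines that ran divided by the total number of executable lines. The sketch below measures this for a single Python function using the standard `sys.settrace` and `dis.findlinestarts` facilities. The helper names and the `classify` function are assumptions made for this example; production fuzzers rely on compiler instrumentation rather than tracing.

```python
import dis
import sys


def executable_lines(func):
    """Line offsets (relative to the def line) that carry bytecode in func's body."""
    code = func.__code__
    return {
        line - code.co_firstlineno
        for _, line in dis.findlinestarts(code)
        if line is not None and line > code.co_firstlineno
    }


def executed_lines(func, *args):
    """Line offsets actually executed inside func for one call."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only trace frames that belong to the function being measured.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


def line_coverage(func, inputs):
    """Fraction of func's executable lines hit by at least one input."""
    total = executable_lines(func)
    hit = set()
    for args in inputs:
        hit |= executed_lines(func, *args)
    return len(hit & total) / len(total)


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


if __name__ == "__main__":
    print(line_coverage(classify, [(5,)]))
    print(line_coverage(classify, [(5,), (-1,)]))
```

A single non-negative input leaves the `"negative"` branch unexecuted, so coverage only reaches 100% once a negative input is added.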
While traditional human-written fuzz targets often focus on covering all lines of code, they may miss vulnerabilities in less-travelled code paths. In contrast, AI-driven fuzzing explores new code paths and configurations, uncovering flaws that might otherwise go unnoticed.
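A small, made-up example shows why full line coverage does not rule out bugs. In the hypothetical `scale` function below, two calls that each enable one flag execute every line without error, yet the path that enables both flags at once crashes. The function and the exhaustive flag search are assumptions for illustration and are not taken from any OSS-Fuzz project.

```python
import itertools


def scale(values, drop_first: bool, normalize: bool):
    """Toy function under test with two independent options."""
    items = list(values)
    if drop_first:
        items = items[1:]
    if normalize:
        top = max(items)  # raises ValueError when items is empty
        items = [v / top for v in items]
    return items


def find_crashing_configs(func, values):
    """Try every combination of the two flags and record which ones raise."""
    crashes = []
    for drop_first, normalize in itertools.product([False, True], repeat=2):
        try:
            func(values, drop_first, normalize)
        except Exception as exc:
            crashes.append(((drop_first, normalize), type(exc).__name__))
    return crashes


if __name__ == "__main__":
    # These two calls together execute every line of scale() without failing.
    scale([5], True, False)
    scale([5], False, True)
    # Exploring flag combinations reveals the path that line coverage missed.
    print(find_crashing_configs(scale, [5]))
```

Line coverage counts lines, not the combinations of branches and states that reach them, which is why a target exercising a different configuration can expose a flaw in code that already appears fully covered.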
OSS-Fuzz automates the steps typically handled by developers:
- Drafting Initial Fuzz Targets: The LLM analyses source code, documentation, and unit tests to draft fuzz targets.
- Fixing Compilation Issues: When a fuzz target fails to compile, the LLM addresses errors iteratively.
- Testing and Debugging: The LLM identifies runtime issues in the fuzz target, ensuring stability for prolonged testing.
- Triaging Crashes: By analysing stack traces and related context, the LLM determines if crashes signify genuine vulnerabilities.
[Figure: Sample crash triage. Credit: Google]
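One question triage has to answer is whether a crash points at the project or at a mistake in the fuzz target itself. The sketch below is a deliberately simple heuristic for that question and is not how the LLM-based triage works: it just looks at which file the innermost non-runtime stack frame belongs to. The `Frame` type, the path prefixes, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    function: str
    path: str


# Assumption: frames under these prefixes belong to runtime or sanitizer code.
RUNTIME_PREFIXES = ("/usr/", "compiler-rt/")


def triage(frames: List[Frame], fuzz_target_path: str) -> str:
    """Classify a crash by its innermost non-runtime frame.

    frames[0] is the innermost frame (where the crash occurred).
    """
    for frame in frames:
        if frame.path.startswith(RUNTIME_PREFIXES):
            continue
        if frame.path == fuzz_target_path:
            return "likely fuzz target bug"
        return "likely project bug"
    return "unknown"


if __name__ == "__main__":
    trace = [
        Frame("__asan_memcpy", "compiler-rt/asan_interceptors.cpp"),
        Frame("parse_value", "src/parser.c"),
        Frame("LLVMFuzzerTestOneInput", "fuzz/target.c"),
    ]
    print(triage(trace, "fuzz/target.c"))
```

A heuristic like this can only sort crashes coarsely. Deciding whether a project crash is a genuine vulnerability still requires reading the surrounding code and context, which is the part the article describes the LLM taking on.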
Google’s ultimate goal is to fully automate this process, including generating patches for identified vulnerabilities. The team also hopes to continue collaborating in this area with other researchers.
Google researchers have reached a significant milestone in automated vulnerability discovery. This research has the potential to revolutionise how software vulnerabilities are found and patched. By automating and enhancing vulnerability discovery processes, tools like OSS-Fuzz are empowering defenders to stay ahead of attackers.