Large language models (LLMs) like OpenAI’s GPT-4, Google’s Gemini, Meta’s LLaMA, and Anthropic’s Claude are celebrated for their ability to process and generate human-like text.
However, recent research has exposed an alarming capability that emerges when these AI systems are given agency and automation – they can autonomously exploit known cybersecurity vulnerabilities with surprising efficiency.
What are LLM Agents?
Traditional LLMs generate responses only in reaction to specific human prompts. LLM agents take this a step further: they can take autonomous actions in real-world scenarios based on their language comprehension.
An example of an LLM agent is the industry’s first AI SOC (Security Operations Center) Analyst, which uses AI agents to identify and respond to cybersecurity threats.
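To make the distinction concrete, here is a minimal sketch of the agent pattern in Python. Everything in it – the call_llm helper, the tool names, the JSON protocol – is a hypothetical placeholder rather than any vendor’s actual API; the point is simply that the model’s output drives real actions, and the results of those actions feed back into its next decision.

```python
# A minimal agent loop, sketched under assumptions: call_llm, the tool names,
# and the JSON protocol below are illustrative placeholders, not any vendor's API.
import json

def call_llm(messages):
    """Stand-in for a chat-completion call to an LLM provider.
    Swap in a real SDK call here; this stub simply ends the run."""
    return json.dumps({"final": "stub model: no real provider configured"})

# Tools the agent is allowed to invoke (stubs for illustration).
TOOLS = {
    "search_logs": lambda query: f"(stub) log lines matching {query!r}",
    "block_ip": lambda ip: f"(stub) firewall rule added for {ip}",
}

def run_agent(task, max_steps=5):
    """Loop: the model picks a tool, the runtime executes it, and the
    observation is fed back until the model returns a final answer."""
    messages = [
        {"role": "system", "content":
            'Reply with JSON: {"tool": <name>, "input": <arg>} or {"final": <answer>}.'},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        decision = json.loads(call_llm(messages))
        if "final" in decision:                     # model decides it is done
            return decision["final"]
        observation = TOOLS[decision["tool"]](decision["input"])
        messages.append({"role": "assistant", "content": json.dumps(decision)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Step limit reached."

print(run_agent("Investigate repeated failed logins from 203.0.113.7"))
```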
Unprecedented Autonomous Hacking Capabilities
In a groundbreaking study, researchers found that an LLM agent powered by GPT-4 was able to autonomously exploit 87% of a set of 15 real-world software vulnerabilities, including critical-severity flaws, when given their descriptions from the Common Vulnerabilities and Exposures (CVE) database. The agent accomplished this without any additional prompting or specialized hacking tools beyond the vulnerability description.
In this case, the researchers essentially gave GPT-4 a mission – “See this vulnerability description? Find a way to exploit it.” GPT-4 broke down the scenario, reasoned through potential attack vectors, coded an exploit, tested it, and autonomously carried out the hack – all without human intervention.
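For a sense of how such an experiment might be wired together, here is a hedged sketch of the setup’s shape only. It fetches a published CVE description (assuming the public NVD 2.0 REST API and its current JSON layout) and hands it to an agent as its sole context; the CVE ID is just a well-known example, and run_agent refers to the illustrative loop sketched earlier, not the researchers’ actual harness. It contains no exploit logic, and anything resembling it should only ever run against isolated lab systems you are authorized to test.

```python
# Hedged sketch of the experiment's shape only: give an agent a published CVE
# description as its sole context. Assumes the public NVD 2.0 REST API layout;
# the CVE ID is just a well-known example and run_agent is the loop sketched above.
# No exploit logic is included; any real run belongs in an authorized, isolated lab.
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve_description(cve_id: str) -> str:
    """Return the English description text that NVD publishes for a CVE."""
    resp = requests.get(NVD_URL, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    cve = resp.json()["vulnerabilities"][0]["cve"]
    return next(d["value"] for d in cve["descriptions"] if d["lang"] == "en")

description = fetch_cve_description("CVE-2021-44228")  # example ID (Log4Shell)
task = (
    "Target: an isolated lab instance you are authorized to test.\n"
    f"Vulnerability description:\n{description}\n"
    "Plan and demonstrate a proof of concept inside the sandbox."
)
# With the description in hand, the agent reportedly succeeded 87% of the time;
# withholding it dropped success to roughly 7% (figures from the study above).
# run_agent(task)
```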
In contrast to the high-performing GPT-4 agent, other prominent LLMs such as GPT-3.5 and open-source models failed to autonomously exploit any of the vulnerabilities, and traditional vulnerability scanners fared no better.
The Power of LLMs
Large language models have advanced rapidly over the past few years, driven by extensive training on large datasets of digital text. Companies like OpenAI, Google, and Anthropic have created models that can understand and generate human-like language with remarkable fluency.
These LLMs can be applied in everything from automating writing tasks and answering questions to aiding in software development and scientific research. GPT-3, released in 2020, was a turning point that sparked widespread interest, investment and research into making these models even more capable.
Yet as this latest study shows, the technology’s potential goes beyond benign use cases. Bad actors could leverage LLM agents as a powerful hacking tool, capable of rapidly exploiting vulnerabilities in systems across the internet once the flaws become publicly known.
The Road Ahead...
While these autonomous hacking capabilities of AI agents are clearly concerning, the researchers did find one limiting factor – GPT-4 required the explicit CVE vulnerability description to achieve its high exploitation rate. Without that information, its success rate at finding and exploiting vulnerabilities fell to just 7%.
This indicates that GPT-4, at its current stage, cannot autonomously discover vulnerabilities. However, future models may gain this capability through specialized training on security data.
Given the rapid progression of LLMs, governance over their development and application will be paramount to preventing misuse and harnessing the positive potential of AI.
Fortunately, policymakers are already addressing the ethical and security challenges posed by advanced AI. For example, the White House recently mandated that every U.S. agency appoint a Chief AI Officer, a step that further underscores the importance of AI governance. Additionally, the National Security Agency (NSA) has released guidelines for AI security, providing a comprehensive Cybersecurity Information Sheet (CSI) for managing the risks associated with AI systems.
These actions are a step in the right direction and show that governments are taking the potential risks of AI seriously.