Artificial intelligence company Anthropic has been working with the Department of Energy’s nuclear security experts to test whether its AI models could be manipulated to reveal sensitive nuclear information, the company told Axios.
The partnership with the National Nuclear Security Administration (NNSA), which began in April, represents what Anthropic believes is the first-ever testing of an advanced AI model in a classified environment. The program involves “red team” exercises where NNSA specialists attempt to probe Anthropic’s Claude AI models for potential vulnerabilities related to nuclear weapons information.
Initially focused on Claude 3 Sonnet, the program has now been extended through February to evaluate Anthropic’s newer Claude 3.5 Sonnet model. The company worked with Amazon Web Services to prepare its AI systems for secure government testing.
“AI is one of those game-changing technologies, and is at the top of the agenda in so many of our conversations,” said Wendin Smith, NNSA’s associate administrator and deputy undersecretary for counterterrorism and counterproliferation. “There’s a national security imperative in evaluating and testing AI’s ability to generate outputs that could potentially represent nuclear or radiological risks.”
While the specific findings remain classified, Anthropic plans to share insights with scientific laboratories to enable broader testing. The initiative follows agreements that both Anthropic and OpenAI signed with the U.S. AI Safety Institute in August to have their models tested for national security risks before public release.
“The federal government has unique expertise needed to evaluate AI systems for certain national security risks,” said Marina Favaro, Anthropic’s national security policy lead. “This work will help developers build stronger safeguards for frontier AI systems that advance responsible innovation and American leadership.”