OpenAI has revealed that it detected and terminated accounts linked to five covert influence operations (IOs) that were attempting to spread misinformation and propaganda by leveraging its AI systems.
The malicious campaigns, originating from Russia, China, Iran, and Israel, used OpenAI’s models to generate content in multiple languages, conduct open-source research, debug code, and create fake social media accounts and profiles. According to OpenAI, however, none of these IOs managed to gain significant traction or reach authentic audiences.
According to OpenAI’s report, the IOs tried to manipulate public opinion on topics such as Russia’s invasion of Ukraine, the Gaza conflict, Indian elections, and criticisms of the Chinese government. They used tactics like generating comments, articles, and website content, as well as translating and proofreading texts using OpenAI’s language models.
These campaigns further demonstrated how AI can be misused, but OpenAI’s defensive measures, such as model safety systems and AI-powered investigation tools, proved effective in disrupting the IOs’ operations.
One key indicator of the AI-generated content was the presence of telltale refusal and boilerplate messages from OpenAI’s models, such as “As an AI language model,” “Not a recognized word,” “I’m sorry, I cannot generate inappropriate or offensive content,” “Cannot provide a phrase,” and “Violates OpenAI’s content policy.”
These refusal messages were sometimes unknowingly published by the threat actors themselves on social media platforms like Twitter.
“Our investigations showed that these actors were as prone to human error as previous generations have been,” OpenAI stated, citing instances in which the IOs accidentally published these refusal messages and thereby exposed their content as AI-generated.
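For a sense of how such telltale artifacts could be screened for, the sketch below matches post text against a list of known refusal phrases. This is a hypothetical illustration in Python; the phrase list, function name, and matching logic are assumptions, not OpenAI’s actual detection tooling.

```python
# Hypothetical sketch: flag posts that contain known model refusal phrases.
# The phrase list and matching logic are illustrative assumptions, not
# OpenAI's actual detection pipeline.

REFUSAL_PHRASES = [
    "as an ai language model",
    "not a recognized word",
    "i'm sorry, i cannot generate inappropriate or offensive content",
    "cannot provide a phrase",
    "violates openai's content policy",
]

def flag_refusal_artifacts(post_text: str) -> list[str]:
    """Return any known refusal phrases found in a social media post."""
    lowered = post_text.lower()
    return [phrase for phrase in REFUSAL_PHRASES if phrase in lowered]

if __name__ == "__main__":
    sample = "Great point! I'm sorry, I cannot generate inappropriate or offensive content."
    hits = flag_refusal_artifacts(sample)
    if hits:
        print("Possible AI-generated artifact(s):", hits)
```

Simple string matching like this only catches the kind of careless copy-and-paste mistakes described above; it is a narrow heuristic, not a general detector of AI-generated text.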
OpenAI emphasized the importance of industry collaboration and open-source research in combating these threats. The company has also shared detailed threat indicators with peers in the AI industry to increase the impact of the disruptions on these malicious actors.
Despite the IOs’ efforts, none of the campaigns scored higher than a 2 on the Brookings Breakout Scale, which measures the impact of covert influence operations. This means that while the fake content appeared on multiple platforms, it failed to break out into authentic communities or achieve meaningful engagement.