Researchers from the University of Illinois Urbana-Champaign (UIUC) have uncovered a vulnerability in ChatGPT-4o, the advanced language model from OpenAI. Their study reveals that ChatGPT-4o’s real-time voice capabilities could be exploited to carry out financial scams.
Voice-based attacks are becoming increasingly sophisticated with advancements in AI, enabling attackers to create realistic voice simulations that can deceive individuals and bypass traditional security measures.
ChatGPT-4o, OpenAI’s latest multimodal model, integrates text, voice, and vision within a single model. It can respond to voice commands, interpret images, and manage text-based queries with higher accuracy and faster response times. These same capabilities, however, also make it a target for cybercriminal exploitation.
Although OpenAI has implemented measures to restrict unauthorized voice replication, the researchers (Richard Fang, Dylan Bowman, and Daniel Kang) demonstrated methods to bypass these defences.
To simulate typical scam scenarios, the researchers designed a series of agents using GPT-4o, a set of browser tools, and scam-specific instructions. The agents had access to five browser tools built on Playwright, a framework for automating web interactions (a rough sketch of how such tools could be wrapped appears after the list):
- get_html – retrieves the HTML of a page.
- navigate – navigates to a specific URL.
- click_element – clicks on an element with a CSS selector.
- fill_element – fills an element with a specified value.
- evaluate_javascript – executes JavaScript on a page.
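To illustrate, here is a minimal sketch of how tools like these could be wrapped around Playwright's Python API. The function names mirror the five tools listed above, but the `BrowserTools` class and its implementations are illustrative assumptions, not the researchers' actual code.

```python
# Illustrative sketch only: thin wrappers around Playwright's sync API
# that mirror the five tools described above.
from playwright.sync_api import sync_playwright


class BrowserTools:
    def __init__(self, page):
        self.page = page  # a Playwright Page object

    def get_html(self) -> str:
        """Retrieve the full HTML of the current page."""
        return self.page.content()

    def navigate(self, url: str) -> None:
        """Navigate the browser to a specific URL."""
        self.page.goto(url)

    def click_element(self, selector: str) -> None:
        """Click the element matching a CSS selector."""
        self.page.click(selector)

    def fill_element(self, selector: str, value: str) -> None:
        """Fill the element matching a CSS selector with a value."""
        self.page.fill(selector, value)

    def evaluate_javascript(self, script: str):
        """Execute JavaScript in the context of the current page."""
        return self.page.evaluate(script)


if __name__ == "__main__":
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        tools = BrowserTools(browser.new_page())
        tools.navigate("https://example.com")
        print(len(tools.get_html()), "bytes of HTML retrieved")
        browser.close()
```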
GPT-4o would initially refuse to handle user credentials in certain situations. However, the researchers used a jailbreaking prompt to bypass these protections. They configured the model to automate tasks such as navigating websites, entering data, and completing two-factor authentication.
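Agents of this kind generally work as a tool-calling loop: the model is given schemas for the browser tools and, turn by turn, emits tool calls that a harness executes and feeds back as results. The sketch below shows the general shape of such a loop using the OpenAI chat completions tools API and the hypothetical `BrowserTools` wrapper from the previous snippet; it is an assumption about the setup, not the researchers' implementation, and the jailbreaking prompt is deliberately omitted.

```python
# Illustrative tool-calling loop; assumes the BrowserTools wrapper above.
import json
from openai import OpenAI

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "navigate",
            "description": "Navigate the browser to a URL.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
    # ... analogous schemas for get_html, click_element,
    #     fill_element, and evaluate_javascript
]


def run_agent(client: OpenAI, tools, instructions: str, max_turns: int = 30):
    messages = [{"role": "system", "content": instructions}]
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOL_SCHEMAS
        )
        message = response.choices[0].message
        messages.append(message)
        if not message.tool_calls:
            return message.content  # the agent finished (or refused)
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            # Dispatch to the matching BrowserTools method.
            result = getattr(tools, call.function.name)(**args)
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": str(result),
                }
            )
```

In a setup like this, retries after failed actions fall out naturally: the error text returned from a tool call goes back into the conversation, and the model issues a corrected call on the next turn.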
The researchers provided a redacted transcript for a bank transfer scam.
[Figure: redacted simulation transcript. Credit: Fang, Bowman, and Kang, UIUC]
Between items 5 and 6 of the transcript, the agent navigated to the Bank of America login page and entered the username and password, which took six actions (navigate, get_html, fill_element, fill_element, click_element, get_html). After item 7 of the transcript, the agent performed 20 additional actions to fill out the 2FA code, navigate to the transfer page, and complete the money transfer.
During these 20 actions, the agent experienced several failures and needed to retry. The entire scam took a total of 183 seconds to complete. Throughout the call, the agent maintained coherence, managed retries for failed actions, and successfully transferred the money.
The findings indicated success rates between 20% and 60%, depending on the complexity of the scam. Gmail credential theft had a success rate of 60%, while scams involving cryptocurrency and social media credentials succeeded 40% of the time. Bank transfers were more challenging due to transcription errors and intricate navigation, yet the potential for exploitation remained concerning.
The execution of these scams was inexpensive, with costs per scam ranging from $0.75 to $2.51.
In response to this research, OpenAI has pointed to its newer o1-preview model, which features enhanced defences against harmful prompts and misuse. According to OpenAI, on one of its most difficult jailbreaking tests, ChatGPT-4o scored 22 while o1-preview scored 84, on a scale of 1 to 100.
The company also acknowledged the importance of studies like UIUC’s in safeguarding its models against misuse.
While this vulnerability has been addressed in ChatGPT-4o, other AI models may still be vulnerable to similar exploitation. It is therefore important that researchers continue to explore and mitigate these risks across different AI platforms.