AI hallucination, also known as confabulation, is one of the major challenges of Large Language Models (LLMs). It refers to the tendency of LLMs to partially or entirely 'make things up', producing output that is wrong, misleading, and often biased. This undermines the integrity of AI-powered systems and raises ethical and security concerns for the individuals and organizations that integrate these systems into their operations.
OpenAI, one of the largest AI research organizations and best known for developing ChatGPT, is not immune to this flaw. Whisper, OpenAI's AI-powered tool for transcribing and translating speech across multiple languages, has also been found to hallucinate. While it is a powerful tool for transcribing conversations, the system sometimes invents words or entire sentences that were never actually spoken.
According to recent studies, approximately one percent of Whisper's transcriptions contain completely made-up phrases. Even more troubling, about 38 percent of these hallucinations include potentially harmful content, ranging from false medical information to misleading statements about violence or authority.
A University of Michigan researcher found AI hallucinations in eight out of ten public meeting transcriptions. Another machine learning engineer discovered false information in more than half of the transcriptions they analyzed across 100 hours of audio. One developer reported finding fabricated content in nearly all of their 26,000 test transcriptions.
This is a significant problem because, despite OpenAI's explicit warnings against using Whisper in "high-risk domains," over 30,000 medical professionals and 40 healthcare systems have already started using Whisper-based tools for patient consultations. Alondra Nelson, former head of the White House Office of Science and Technology Policy, points out that such mistakes could have serious consequences in hospitals in particular, as nobody wants a misdiagnosis.
While researchers aren't entirely sure why this happens, they have noticed that these hallucinations often occur during pauses in speech, when there is background noise, or when music is playing. It is as if the AI tries to fill in these gaps with what it thinks should be there, rather than what is actually there. OpenAI has acknowledged these risks and is still researching how to combat this challenge.
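Because hallucinations cluster around silence and noise, one practical mitigation is to inspect the per-segment confidence signals the model exposes rather than trusting the raw transcript. The following is a minimal sketch, assuming the open-source openai-whisper Python package; the thresholds are illustrative choices for this example, not official recommendations.

```python
# Minimal sketch: flag Whisper segments that may be hallucinated.
# Assumes the open-source `openai-whisper` package (pip install openai-whisper).
# Thresholds below are illustrative, not official guidance.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "meeting.wav",                     # hypothetical input file
    condition_on_previous_text=False,  # reduces runaway repetition across long pauses
)

for seg in result["segments"]:
    suspicious = (
        seg["no_speech_prob"] > 0.5    # decoder thinks this span is mostly silence/noise
        and seg["text"].strip()        # ...yet it still produced text
    ) or seg["avg_logprob"] < -1.0     # very low decoding confidence
    if suspicious:
        print(f"[{seg['start']:.1f}-{seg['end']:.1f}s] REVIEW: {seg['text'].strip()}")
```

Segments flagged this way can then be routed to a human reviewer instead of being accepted automatically.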
The rapid adoption of these tools across various sectors despite their security flaws suggests that many organizations may be prioritizing convenience over reliability. Although AI is constantly evolving and improving, proper safeguards must be implemented, especially in critical sectors where mistakes could be dangerous. Cybersecurity professionals should ensure that AI systems are properly tested before their organizations rely on them. Regardless of the speed and convenience AI brings to an operation, we must verify the validity of the results it produces, understanding that AI systems are not perfect.
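One concrete way to do that testing is to benchmark a transcription model against a small set of audio clips with known, human-verified transcripts before deployment. The sketch below is only illustrative: it assumes the open-source openai-whisper and jiwer packages, and the file names and acceptance threshold are hypothetical placeholders.

```python
# Illustrative pre-deployment check: compare Whisper output against
# human-verified reference transcripts and compute word error rate (WER).
# Assumes `openai-whisper` and `jiwer`; file names and the 0.10 threshold
# are hypothetical placeholders.
import whisper
from jiwer import wer

# Hypothetical validation set: (audio file, trusted human transcript)
validation_set = [
    ("samples/consult_01.wav", "patient reports mild headache since monday"),
    ("samples/consult_02.wav", "no known drug allergies blood pressure normal"),
]

model = whisper.load_model("base")

scores = []
for audio_path, reference in validation_set:
    hypothesis = model.transcribe(audio_path)["text"].lower().strip()
    scores.append(wer(reference, hypothesis))

avg_wer = sum(scores) / len(scores)
print(f"Average WER: {avg_wer:.2%}")
if avg_wer > 0.10:  # hypothetical acceptance threshold
    print("WER above threshold: keep a human reviewer in the loop.")
```

A check like this will not catch every hallucination, but it gives an organization a measured baseline before trusting automated transcripts in day-to-day operations.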