If your company is like most, generative AI (GenAI) is already at work. Probably, you’ve stood up a few use cases. Maybe (as we have done at PwC) you’ve already scaled up some. And, most likely, your enterprise software now has embedded GenAI capabilities, which your people are using every day. But as GenAI becomes more critical to daily operations, a common question persists: Can I trust it?
Trust in GenAI requires all the traditional drivers of trust in tech: governance, security, compliance and privacy. As with traditional AI, you also need to mitigate bias. But GenAI also adds a new risk, “hallucinations” — outputs that seem plausible but have no basis in reality.
Traditional AI sometimes hallucinates too, but more rarely, and it’s usually operated by specialists who can catch these errors. With GenAI, hallucinations pose a broader risk, because many users don’t have deep AI experience and may take plausible-sounding outputs at face value.
The good news is you can manage this risk — and do so in ways that speed up your AI initiatives, not slow them down. Here’s what you need to know about hallucinations and how a robust Responsible AI approach can mitigate their risk.
GenAI is designed to provide the outputs most likely to be correct, based on the data and prompts provided. It will often “try its best” even when it has overly complex or incomplete data. This creativity gives GenAI the flexibility to accomplish many different tasks, including some that previously only a human could do. But GenAI’s creativity can sometimes go too far — most commonly when the underlying data is inadequate, the prompts are unsuitable or the tool is pushed beyond its intended tasks.
Mitigating the risks of hallucinations has three main components, which are part of Responsible AI:
To keep down the number of hallucinations, choose the right GenAI solution for the right use case — and be sure that it has adequate, reliable data. Identify and call out those use cases where the data may not be good enough. You may still choose to proceed with these use cases if their value is high enough, but they should be flagged for extra oversight.
Prepare your people with upskilling and templates to give GenAI suitable prompts. Set up guardrails to help people use GenAI only for its intended tasks.
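To make “templates” and “guardrails” concrete, here is a minimal sketch in Python. All names and the template wording are illustrative assumptions, not a reference to any particular product: a prompt template that instructs the model to stay within supplied context, behind a check that blocks unapproved task types.

```python
# Minimal sketch of a prompt template with a simple guardrail.
# APPROVED_TASKS and the template wording are illustrative assumptions.

APPROVED_TASKS = {"summarize", "draft_email", "classify"}

TEMPLATE = (
    "Task: {task}\n"
    "Use ONLY the context below. If the context does not contain the "
    "answer, reply: 'I don't have enough information.'\n"
    "Context:\n{context}\n\n"
    "Request: {request}\n"
)

def build_prompt(task: str, context: str, request: str) -> str:
    """Block tasks outside the approved list, then fill in the template."""
    if task not in APPROVED_TASKS:
        raise ValueError(f"'{task}' is not an approved GenAI use case")
    return TEMPLATE.format(task=task, context=context, request=request)
```

A template like this does two jobs at once: it keeps people from improvising prompts from scratch, and it gives the model an explicit instruction to admit ignorance rather than invent an answer.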
Technology can help catch hallucinations. Retrieval-augmented generation (RAG) can enable GenAI to “double-check” an authoritative knowledge base, and application programming interfaces (APIs) can connect to approved content. Teach your people how to verify GenAI’s outputs, create channels to report suspect results, and establish a risk-based system to review these outputs. Finally, have people train GenAI on its mistakes, so it will make fewer going forward.
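The RAG pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions: a keyword-overlap retriever stands in for a production embedding index, the knowledge base is two made-up policy sentences, and the sketch stops at building the grounded prompt rather than calling a real model.

```python
# Toy sketch of retrieval-augmented generation (RAG): retrieve the
# most relevant approved documents, then ground the prompt in them.
# A production system would use embeddings and a vector store; this
# ranks by simple word overlap instead.

KNOWLEDGE_BASE = [
    "Expense reports must be filed within 30 days.",
    "Remote work requires written manager approval.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def grounded_prompt(query: str) -> str:
    """Build a prompt that lets the model 'double-check' approved content."""
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return (
        "Answer using ONLY the context below; otherwise say 'not found'.\n"
        f"Context: {context}\nQuestion: {query}"
    )
```

The grounded prompt, rather than the raw user question, is what gets sent to the model, so its answers can be traced back to an approved source instead of invented from thin air.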
Even though hallucinations are a moving target, you can manage the risk. Well-deployed, these Responsible AI measures won’t be onerous and they won’t slow you down. On the contrary, AI initiatives can proceed more rapidly when they cause fewer mistakes and don’t require costly fixes and do-overs.
We don't expect hallucinations to go away. We do expect them to change in nature, while companies’ approach to them becomes more sophisticated.
As GenAI models improve, they will be better able to recognize nuances in language, track long-term dependencies in text, understand the intent and context of a user’s request and address mathematical challenges. They will become ever more multimodal, incorporating multiple forms of data such as text, images and audio. All these improvements (and others too) could help reduce or eliminate some hallucinations that are common today.
But as people use GenAI for more tasks — in medical diagnoses, market and economic forecasts, fundamental scientific research and more — new kinds of hallucinations will likely emerge, and the cycle will continue.
Hallucinations are, in short, an inevitable part of GenAI’s “creative” process: Like a person, as GenAI tries to innovate and find new insights, it will err sometimes. Companies should be ready to catch these mistakes, so GenAI too can “fail fast” and do better next time.
Lead with trust to drive outcomes and transform the future of your business.