LLM Hallucinations

Examples, and methods to minimize them

Hey everyone,

This week I’d like to talk about LLM Hallucinations. We’ve all been there: We come up with a great idea, connect a model API, write a prompt that passes the vibe-check and think we’re good to go.

However, LLMs often have other plans for us. Although they’re incredibly fast and do pretty well on most things, they have one practical limitation: hallucinations.

The two most common hallucinations are:

  • Hallucinations in dialog form: when an LLM misses information from a past message and makes an answer up;

  • Hallucinations that produce an untrue statement that contradicts a well-known fact.

Well, that’s okay - but how do we actually reduce them, and can we really?

There are some techniques, and while they’re useful, they’re not a set-and-forget thing. You need to constantly “stress-test” your LLMs with edge cases and against your own knowledge.

Advanced Prompting

One very obvious approach is: let’s actually create better prompts to mitigate these hallucinations. Heck, even just writing: “If you don’t know the answer to a question, please don’t share false information.” might do the trick!
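To make that concrete, here’s a minimal sketch of what that instruction can look like in practice. I’m assuming the OpenAI Python SDK here, and the model name and example question are just placeholders:

```python
# Minimal sketch: put the anti-hallucination instruction in the system prompt.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "If you don't know the answer to a question, say so plainly "
    "and do not share false information."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: swap in whichever model you actually use
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Who won the 2031 World Cup?"},
    ],
)
print(response.choices[0].message.content)
```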

But, we’re smarter than that.

Two very powerful prompting techniques are: few-shot and chain-of-thought prompting. I wrote about them in more detail here and here.

With these techniques, we provide examples and the reasoning path in the prompt, so that the LLM can give better outputs (especially for edge cases!).
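As a rough sketch (the plan details and examples below are invented), a few-shot, chain-of-thought prompt could look something like this:

```python
# A rough sketch of a few-shot, chain-of-thought prompt built as a plain string,
# so it works with any chat-completion API. The example Q&A pairs are made up.
FEW_SHOT_COT_PROMPT = """Answer the question. Think step by step, and if the
needed information is missing, answer "I don't know".

Q: Our Pro plan costs $40/month. How much is it per year?
Reasoning: $40 per month times 12 months is $480.
A: $480 per year.

Q: How many users does the Starter plan support?
Reasoning: The plan details above don't mention a user limit for Starter.
A: I don't know.

Q: {user_question}
Reasoning:"""

prompt = FEW_SHOT_COT_PROMPT.format(
    user_question="What does the Pro plan cost for 6 months?"
)
```

Note how the second example demonstrates the “I don’t know” behavior we want the model to copy for edge cases.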

Data Augmentation

Now, we’re getting serious. Data augmentation is simply the process of providing extra information to the model, using any kind of data that we’d like the model to reference.

The goal here is to provide “dynamic context” that fits in the LLM’s context window, rather than passing the same static data with every LLM call.

Is the user asking about specific product information? We’d like to query a vector database, pull that product’s information, and then add it as context to our prompt.

But, how can we actually do that?

We do that by storing ALL of our proprietary data in a very fast database that we call a “vector database”. Then, using “embedding” models and some data-manipulation strategies, we search through that pile of information and bring back only the context we care about.

This is called RAG, or Retrieval-Augmented Generation. (link)
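Here’s a simplified sketch of that flow. A plain Python list plus cosine similarity stands in for a real vector database, the product snippets are made up, and OpenAI’s embedding and chat models are just one possible choice:

```python
# Simplified RAG sketch: embed documents, retrieve the most similar one,
# and add it as context to the prompt. An in-memory list stands in for a
# real vector database; the documents below are invented.
import numpy as np
from openai import OpenAI

client = OpenAI()

PRODUCT_DOCS = [
    "The Pro plan costs $40/month and includes unlimited projects.",
    "The Starter plan costs $10/month and includes 3 projects.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(PRODUCT_DOCS)  # in production, these live in a vector database

def retrieve(question, k=1):
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [PRODUCT_DOCS[i] for i in np.argsort(scores)[::-1][:k]]

question = "How much does the Pro plan cost?"
context = "\n".join(retrieve(question))
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```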

And this “data augmentation” doesn’t stop there. Nowadays, LLMs are really good at being instructed on how to use external tools.

So let’s say you store all of your product information behind an API in one of your products. Instead of duplicating that data in a vector database, you can instruct the LLM to call your API and return JSON that matches a schema you provide as an example in your prompt.

In the LLM world, this is referred to as function calling (link), a term coined by OpenAI, but now Anthropic has it as well (at least in beta!).
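Here’s a hedged sketch of what that could look like with OpenAI-style function calling. The function name, its parameters, and the product API are invented for illustration:

```python
# Sketch of function calling: describe a tool with a JSON schema and let the
# model decide to call it. The function name and product API are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_product_info",
        "description": "Fetch product details from our internal product API.",
        "parameters": {
            "type": "object",
            "properties": {"product_id": {"type": "string"}},
            "required": ["product_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any model that supports tool use
    messages=[{"role": "user", "content": "What's included in product SKU-123?"}],
    tools=tools,
)

# In a real app, check that the model actually chose to call the tool.
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
# Here you'd hit your real API with args["product_id"] and send the JSON result
# back to the model in a "tool" message so it can write the final answer.
```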

Fine-tuning

All of this becomes even more interesting when you have an LLM in production. This is where you hit the next level.

At this point you can collect all of your users’ feedback and put it into a big table. Then, you can use that table to “stress-test” your LLM once again, but this time with actual, real-world examples.

And guess what? Not only that, but you can fine-tune your own, smaller model (yes, it will cost less to run!), one that will do much better on your specialized task.
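As a rough sketch, turning that feedback table into fine-tuning data could look like this. The feedback rows are invented, and the chat-style JSONL format follows OpenAI’s fine-tuning conventions; adjust it to whatever provider you use:

```python
# Sketch: turn a table of user feedback (question + approved answer) into
# chat-format JSONL for fine-tuning. The rows below are made up.
import json

feedback_rows = [
    {"question": "Does the Pro plan include SSO?", "good_answer": "Yes, SSO is included in Pro."},
    {"question": "Can I export my data?", "good_answer": "Yes, exports are available as CSV."},
]

with open("training_data.jsonl", "w") as f:
    for row in feedback_rows:
        example = {
            "messages": [
                {"role": "system", "content": "You are a support assistant for our product."},
                {"role": "user", "content": row["question"]},
                {"role": "assistant", "content": row["good_answer"]},
            ]
        }
        f.write(json.dumps(example) + "\n")

# From here, you'd upload training_data.jsonl to your provider's fine-tuning
# endpoint and train a smaller base model on it.
```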

And there you have it.

Advanced prompting, data augmentation and fine-tuning are some very good ways to minimize hallucinations.

These can be really useful, but you already know the obvious: you always need to evaluate your systems and test them with lots of examples.

Are you building with LLMs? Facing some difficulties? Want to learn about a new tool/concept?

Reply to this email and let me know! I read every reply!
