Michiel De Koninck
Machine Learning Engineer and LLM specialist
In today's fast-paced world, staying up to date with the latest advancements and trends in your field is crucial. As a result, knowledge management systems have become increasingly popular, providing organizations with a centralized location to store and access their knowledge. However, not all knowledge management systems are created equal. In this blog post, we'll explore how leveraging language models, such as the recently released large language models (LLMs), can enhance the effectiveness of your domain-specific knowledge base. We'll cover the basics of language models and how they can be trained on your organization's data to improve search accuracy, automate tagging, and even generate new content. Let's dive in!
An LLM is a Large Language Model. OpenAI’s GPT-4 is one example, Meta’s LLaMA is another. We make the conscious choice here to stick to the general term LLM to refer to these models. Bear in mind: each of these models was trained on a gigantic set of (publicly available) data.
It has been clearly demonstrated by now that these LLMs have a meaningful understanding of general language and that they are able to (re)produce information related to what was present in their training data. This is why generative tools like ChatGPT perform astonishingly well at answering questions about topics that the LLM encountered during its training.
But what remains out of the direct grasp of those massive LLMs is the data that is so valuable within each organisation: the internal knowledge base. The question that naturally pops up is:
How can we leverage the power of these LLMs in unlocking information stored in a specific knowledge base upon which it wasn’t originally trained?
Oh okay, so to do this, can’t we just introduce our internal knowledge base as extra data upon which the LLM should be trained? Or, if you will, can’t we fine-tune the LLM on our specific knowledge base?
Yes, you most likely can. But for reliable question answering, it might not be the way to go.
Meet Billy the Bookworm. Billy is a large language model and he has devoured a gigantic amount of online information, endowing him with enormous knowledge. Billy, however, smart as he is, has not read through the books in your very specific home library.
Fine-tuning is this: presenting Billy the Bookworm with all the books in your very specific knowledge base and letting him gobble up all that tasty extra information. This way, the LLM bookworm Billy doesn’t just know all of that general information, he also “knows” a lot about the contents of your specific knowledge base.
Congratulations, through this fine-tuning process you’ve turned Billy into a very specific Billy that knows a great deal about your specific domain! Below we show how you could start putting Billy to work. By posing questions to your improved bookworm, you can expect answers that use both the information from its gigantic general training set and the information stored in your specific knowledge base.
While certainly powerful, the crucial problem with this approach is that you still have little insight into how your bookworm came up with its answers. Moreover, fine-tuning an LLM has its (costly) consequences.
The main reasons why fine-tuning Billy comes up short: you have little insight into the sources behind his answers, which increases the risk of hallucination, and the fine-tuning process itself is costly and has to be repeated whenever your knowledge base changes.
Luckily, all these problems are solvable. If answering questions in a verifiable way and preventing hallucination is what you are looking for, you may not need the hyper-modern bookworm; let’s just ask the good old librarian where to find the answers to your questions.
The idea behind Retrieval-Augmented Generation (RAG) is quite straightforward. Remember, the goal is to unlock the information in our knowledge base. Instead of unleashing (i.e. fine-tuning) our bookworm on it, we comprehensively index the information in our knowledge base.
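To make "comprehensively indexing" a bit more tangible, here is a minimal Python sketch of what building such an index could look like: each document is split into sections and each section gets an embedding. The sentence-transformers library, the model name and the naive fixed-size chunking are illustrative assumptions, not prescriptions from this post.

```python
# Minimal indexing sketch: split documents into sections and embed each one.
# The model name and chunk size are illustrative choices.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500) -> list[str]:
    """Naively split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

documents = {
    "pasta_recipes.txt": "Bring a large pot of salted water to a boil ...",
    "risotto_basics.txt": "Toast the rice briefly before adding stock ...",
}

model = SentenceTransformer("all-MiniLM-L6-v2")

# The "index": every section of every document, plus its embedding vector.
sections: list[str] = []
for name, text in documents.items():
    sections.extend(chunk(text))

embeddings = model.encode(sections)  # shape: (num_sections, embedding_dim)
np.save("knowledge_base_embeddings.npy", embeddings)
```

The index only has to be (re)built when the knowledge base itself changes; nothing about the LLM is retrained.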
In the schema above, we illustrate how the Smart Retriever functions like a librarian. Ideally, the librarian has perfect knowledge of what is in his library. For a visitor asking a certain question, he would know just which chapter of which book to recommend.
On a more technical level, this describes a semantic search engine. In this case, the embeddings are vector representations of document sections, and they allow a mathematical description of the meaning stored in each section. By comparing embeddings, we can determine which text sections are similar in meaning to which other text sections. This is crucial for the retrieval process displayed below.
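As a rough sketch of how comparing embeddings drives retrieval, the snippet below embeds a question and ranks document sections by cosine similarity. Again, the sentence-transformers model, the example sections and the top-k value are assumptions made purely for illustration.

```python
# Retrieval sketch: embed the question and rank sections by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sections = [
    "Bring a large pot of salted water to a boil before adding the spaghetti.",
    "Toast the rice briefly before adding stock, one ladle at a time.",
    "Fresh basil should be added at the very end to preserve its aroma.",
]
section_embeddings = model.encode(sections, normalize_embeddings=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the sections whose meaning is closest to the question."""
    q = model.encode([question], normalize_embeddings=True)[0]
    # With normalized vectors, the dot product equals cosine similarity.
    scores = section_embeddings @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [sections[i] for i in best]

print(retrieve("How long should I cook pasta?"))
```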
In play are two crucial components: the Smart Retriever (our librarian), which searches the indexed knowledge base for the sections most relevant to the question, and the Generator (our LLM), which formulates a readable answer based on those retrieved sections.
It should be clear by now why this approach is called Retrieval-Augmented Generation. Based on the question asked, you first retrieve the most relevant information from your internal knowledge base; you then augment the typical generation phase by passing that relevant information explicitly to the generator component.
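A bare-bones retrieve-then-generate loop could look like the sketch below. The prompt wording, the gpt-4 model and the use of the OpenAI chat completions API are illustrative assumptions; any capable generator LLM could fill that role, and retrieve() stands for the similarity search sketched earlier.

```python
# Retrieval-Augmented Generation sketch: retrieve relevant sections first,
# then pass them explicitly to the generator alongside the question.
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, retrieve) -> str:
    # 1. Retrieval: look up the most relevant sections in the knowledge base.
    context_sections = retrieve(question)

    # 2. Augmentation: hand those sections to the generator as explicit context.
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        "Context:\n" + "\n---\n".join(context_sections) +
        f"\n\nQuestion: {question}"
    )

    # 3. Generation: let the LLM formulate a readable answer.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Because the generator only sees the sections that were retrieved, you always know which sources an answer was based on, which is exactly the verifiability that pure fine-tuning lacked.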
Aside from those highlights, the multilingual aspect of LLMs is a thing of beauty. You can have a knowledge base consisting purely of Italian recipes that your pasta-loving French friend can talk to in an all-French dialogue.
Note that in the section above, we dismissed fine-tuning as a valuable option because it gave us little control over source clarity, thus increasing the risk of hallucination.
It must be noted that the RAG approach, powered by a general LLM, only works well as long as the specific knowledge base does not contain highly specific jargon that the LLM cannot understand from its general training.
Imagine you need the responses of your solution to follow the tone and lingo present in your knowledge base. In that case, fine-tuning your LLM seems less avoidable.
A valid approach could be to fine-tune an LLM so that it can handle the specific jargon and then incorporate that fine-tuned LLM into the RAG architecture to reap the combined benefits, as sketched below. Instead of working with a general bookworm, you would then use your specifically trained Billy to power the Generator and/or the Smart Retriever components.
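In practice, plugging a fine-tuned model into the RAG architecture can be as simple as pointing the Generator at your own model instead of a general-purpose one. The sketch below assumes the OpenAI chat completions API; the fine-tuned model identifier is a hypothetical placeholder.

```python
# Sketch: swapping the general-purpose generator for a fine-tuned one.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    # Hypothetical placeholder id for your own fine-tuned model.
    model="ft:gpt-3.5-turbo:your-org::abc123",
    messages=[{"role": "user", "content": "Context: ...\n\nQuestion: ..."}],
)
print(response.choices[0].message.content)
```

The rest of the pipeline (indexing, retrieval, prompt assembly) stays exactly the same.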
Excellent question.
Semantic Search (smart retrieval) has been around for quite some time and so has generative AI (some primitive forms have been around for decades).
However, we have seen pivotal advancements over the last few months.
On a technological level, we’ve recently witnessed big leaps forward in LLM performance. These positively impact the RAG solution on two levels: the Smart Retriever benefits from embeddings that capture meaning more accurately, and the Generator benefits from the improved quality with which answers are formulated.
Accompanying that improved generative quality is the increase in traction. Previously, companies could not easily imagine the opportunities of a system relying on generative AI. Now however, thanks to the wide media coverage and adoption of tools like ChatGPT, overall interest has grown exponentially.
So, though arguably mediocre versions of RAG might have been possible for quite some time, technological improvements and increased traction result in a fruitful market opportunity.
In this section, we aim to introduce you to some of the main challenges with setting up a successful RAG solution.
Charmed by the potential of RAG and intrigued by the related challenges, we now move on to looking at an actual RAG-based solution.
If your interest is sparked by the concept of Retrieval-Augmented Generation, you may be asking yourself:
Do I have what it takes to take a RAG-based solution for a spin?
Well, if you have a domain-specific knowledge base and a steady stream of questions whose answers are buried inside it, then yes, RAG might be the way to go for you.
As an experiment, we recently built a small demo to showcase how this technology can be leveraged to support government staff in answering parliamentary questions more easily.
In this case, the specific knowledge consists of:
The business value, meanwhile, lies in:
If you are looking for some guidelines on how to technically implement a similar solution, stay tuned for a follow-up blogpost where we aim to zoom in on the technical details of setting up RAG.