Let's beat this dead horse one last time: Large Language Models (like GPT, Claude, Gemini, …) have knowledge on a wide range of topics because they've been trained on vast amounts of internet data. But once their training is complete, their knowledge is fixed. They can't go for a sneaky little toilet Google-search when they run out of arguments in the middle of a hypothetical discussion with their know-it-all brother-in-law. Or can they?
Any decent LLM will be able to tell you when the French Revolution went down: 1789. It doesn't need to know anything about Ridley Scott's Napoleon film for that. Nor does it need to know (and neither should I) the link between this year and my dad's super duper protected bank account.
However, imagine we're interested in knowing whether Mr. Scott's Napoleon won an Oscar yesterday. The model would suddenly need real-time web information. That's something very different. How does that work?
I promise I'll stick to just one simple analogy. Moreover, it'll be an analogy that makes sense. Yes, a very close-fetched analogy indeed.
So you are a philosopher, lost at sea. Suddenly you see: an iceberg (ChatGPT). Naturally, you ask this iceberg to explain to you what it is. It tells you it's the tip of an iceberg, and it can answer any question you throw at it.
You mean to ask where this iceberg gets its information from, when suddenly … RING RING RING … why, it's ya boy Archimedes, here to teach you a lesson on buoyancy:
most of an iceberg's mass is actually BENEATH THE SURFACE, in the salty water of the ocean; only about 10% of its mass sticks out above.
But of course, you realise: the tip of the iceberg is just what we see! It can only exist because of the mass beneath the surface.
So enough with the crazy talk. The point is: when you interact with a tool like ChatGPT (or Google's Gemini Chat, Anthropic's Claude, Perplexity, …) you're talking to a Web UI that takes your input and sends it beneath the surface, to the software system they have designed.
This system beneath the surface consists of the actual Large Language Model (LLM) and a control layer defining what input is sent to the model and what is finally sent back through to the Web UI.
The LLM is represented by Billy the bookworm (introduced first in this blogpost) to underline the vast amount of internet training data that these models go through to achieve their impressive capabilities.
It's crucial to note that those LLM capabilities are nothing more than "next token (~word) prediction": for a given input, the model just comes up with a sequence of highly likely next words. Note that for many LLM Web UIs you can choose which LLM you want to use; you're simply swapping the red block for another LLM (e.g. swap GPT-4 for GPT-3.5, Claude 3 for Claude 2, or Llama-2-13B for Mistral-7B).
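If you want to see that "next token prediction" in action, here is a minimal sketch using a small open model from Hugging Face (gpt2, chosen purely because it's tiny; it is not one of the production models named above):

```python
# Minimal sketch of next-token prediction with a small open model (gpt2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The French Revolution started in the year"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate a handful of tokens one at a time: at each step the model only
# produces a probability distribution over the next token, and we greedily
# pick the most likely one and append it to the input.
for _ in range(5):
    with torch.no_grad():
        logits = model(input_ids).logits
    next_id = logits[0, -1].argmax()
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

That loop is, conceptually, all the red block ever does; everything else in this story lives around it.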
So there we go! When you ask a question through a Web UI there is a controlling layer that decides whether Web Knowledge access is needed.
But as discussed, these LLMs cannot browse the internet themselves, so when you ask how Ridley Scott's Napoleon performed at the most recent Oscars, Billy the LLM won't know that from the knowledge it was trained on. Luckily for us, however, the controlling layer of the Web UI is able to do a quick web search (just like a human would), fetch the right information and deliver it to the LLM as context to form its response with.
Additionally, it should be clear that if you're building a separate software solution that talks directly to the red block, the LLM API (e.g. the GPT-4 API), you will not get the sweet benefits of the Web Knowledge Access functionality. We will look into why that is below, but first we have to build a mutual understanding of the Web Search Process.
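To make that concrete, here's a minimal sketch of talking straight to the red block, assuming the OpenAI Python SDK and an API key in your environment (the model name is purely illustrative):

```python
# Talking directly to the LLM API: no control layer, no web search.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Did Ridley Scott's Napoleon win an Oscar yesterday?",
    }],
)

# The raw model can only answer from its frozen training data: expect either
# a "my knowledge has a cut-off" style refusal or, worse, a confident guess.
# No search engine is queried behind this call.
print(response.choices[0].message.content)
```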
Let's shine a light on how a few different LLM Web UI providers approach the search capability. Below we compare ChatGPT, Gemini and Perplexity.
All these providers use a specific Search Engine to perform search across the web. Search Engines are quite complex, but in short: these engines are built by crawling the entire internet and, for each webpage, storing pieces of information that can be used to find that page again. The result of such a crawling exercise is an index. For more specific information on how search engines look through these indexes to find relevant web page results, I'll refer to our piece on semantic search.
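As a toy illustration of what such an index boils down to (the URLs and page contents below are made up for the example), here is an "inverted index" mapping each term to the pages that contain it. Real engines add ranking, freshness signals and much more, but the core lookup looks roughly like this:

```python
# Toy inverted index: term -> set of pages containing that term.
from collections import defaultdict

pages = {
    "https://example.com/oscars-2024": "napoleon missed out at the oscars",
    "https://example.com/french-revolution": "the french revolution started in 1789",
}

index = defaultdict(set)
for url, text in pages.items():      # the "crawl": visit pages, store their terms
    for term in text.split():
        index[term].add(url)

print(index["napoleon"])  # -> {'https://example.com/oscars-2024'}
```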
Perplexity built a Search Engine specifically to be used by their Generative AI application; therefore they cut some corners and don't index "the entire internet" every day. If you ask about the front page of your favourite news site today, it might answer based on the news from two days ago, simply because the Perplexity engineers decided to only re-index your favourite site every three days.
Now, you may have a brilliant search engine, but you still have to configure exactly how and when to use it. And that's exactly where the control layer comes into play. Consider the flow below. It should be clear from the diagram that it's the specific Control system that determines what happens, e.g. "how many results from the Bing Search engine do we consider", "how many pages do we want to fetch content from", "how do we filter that content to lower the chances of misguiding our GPT-4 generator (cfr. prompt injection dangers)".
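To get a feel for the kind of knobs that live inside that control layer, here is a hypothetical configuration sketch; the field names and defaults are invented for illustration and don't correspond to any provider's actual settings:

```python
# Hypothetical control-layer settings: every provider hard-codes its own
# version of these choices, and you as a user never get to touch them.
from dataclasses import dataclass


@dataclass
class WebSearchConfig:
    search_engine: str = "bing"       # which engine the control layer queries
    max_search_results: int = 5       # how many results to consider
    max_pages_to_fetch: int = 3       # how many pages to actually download
    max_tokens_per_page: int = 1500   # how aggressively fetched content is trimmed
    strip_instructions: bool = True   # crude guard against prompt injection
```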
There's one crucial note to make here when thinking about leveraging this search functionality: as a user, you have NO control over how the search flow works, and you can be impacted by that.
And so we reach the conclusion that when using the Web UI, your capabilities with regard to web access will be limited. And that's only natural: imagine ChatGPT had no limit on how many web pages were visited after the control layer decides to do a Bing Search. Then a single user question could lead to drawing in information from 15 different sources, increasing the number of tokens that are passed to the LLM and thus directly impacting cost (and possibly also performance, as content may be conflicting). Given that users today pay a fixed price for the LLM Web UI that is ChatGPT, the mechanism has to be restricted for it to be safe and maintainable.
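A quick back-of-envelope sketch of that cost argument, with purely illustrative numbers (the token counts and the price per 1K input tokens are assumptions, not real figures):

```python
# Why providers cap how many pages get pulled in: illustrative arithmetic only.
pages_fetched = 15
tokens_per_page = 2_000
price_per_1k_input_tokens = 0.03   # assumed price in dollars, for illustration

extra_cost = pages_fetched * tokens_per_page / 1_000 * price_per_1k_input_tokens
print(f"${extra_cost:.2f} of extra input tokens for a single question")  # $0.90
```

Multiply that by millions of questions per day on a flat subscription fee and the need for hard limits becomes obvious.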
So then, if we need to take control, how can we do that?
Conceptually, this should be quite obvious. You design your own control system and use it to guide the input/tasks going towards the generator LLM API (e.g. GPT-4) and the output flowing back out of it. This way you harness the reasoning capabilities but have full control over e.g. which search engine is used, how the resulting search content is filtered, how many sub-pages are then visited, …
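Here is a minimal sketch of what such a custom control layer could look like, assuming the OpenAI Python SDK and `requests`; `search_web` is a stub you would replace with the search API of your choice (Bing, Brave, SerpAPI, …), and the limits are placeholders for you to tune:

```python
# Custom control layer: you own the search engine choice, the limits and the filtering.
import requests
from openai import OpenAI

client = OpenAI()


def search_web(query: str, max_results: int) -> list[str]:
    """Placeholder: plug in whichever search API you choose.
    A hard-coded URL keeps this sketch self-contained."""
    return ["https://en.wikipedia.org/wiki/Napoleon_(2023_film)"][:max_results]


def fetch_page(url: str) -> str:
    return requests.get(url, timeout=10).text


def answer_with_web_context(question: str) -> str:
    urls = search_web(question, max_results=3)           # you choose the engine and limits
    pages = [fetch_page(u) for u in urls]                # you choose the HTTP client
    context = "\n\n".join(p[:4_000] for p in pages)      # you choose filtering / truncation
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

Every hard-coded number above (3 results, 4,000 characters per page) is now yours to change, which is exactly the control the Web UI denies you.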
Note that we've visualised the LLM as before, in the iceberg, but of course the same requirements for custom control hold if you are self-hosting an LLM instead of talking to it through some API.
Scratch all that weird philosopher stuff. You are now a down-to-earth solar panel manufacturer trying to sell your sweet panels. You have a list of 10,000 roof repair companies. You want to check which of these offer solar panel installation, and potentially also which panels they supply, so that you can contact specifically those companies that are most likely to be interested in working with your panels.
Below we draft a solution for the given challenge. We can build a system that, for each roof operator, finds the site_url to visit, then visits that site to fetch content from it, filters it (beware of prompt injection and don't send unnecessary tokens to your model), and then analyses whether it is necessary to visit a separate tab (perhaps the site has a "solar panel installation" tab) to make a final decision, or whether it's already crystal clear from the home page.
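A rough sketch of what that per-company check could look like. Everything here (helper name, prompt, the 6,000-character cut-off, the model) is an assumption for illustration, and it assumes you have already looked up each company's site_url with your search engine of choice, as in the earlier sketch:

```python
# Per-company check: fetch the homepage, trim it, ask the LLM for a verdict.
import requests
from openai import OpenAI

client = OpenAI()


def offers_solar_installation(company_name: str, site_url: str) -> str:
    # Crude filter: cap how much page content is forwarded to the model.
    homepage = requests.get(site_url, timeout=10).text[:6_000]
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You classify roofing companies. Answer YES, NO or UNCLEAR."},
            {"role": "user",
             "content": f"Does {company_name} offer solar panel installation?\n\n"
                        f"Homepage content:\n{homepage}"},
        ],
    )
    return response.choices[0].message.content

# Loop this over your list of 10,000 companies and keep the YES (and maybe
# UNCLEAR) ones; UNCLEAR cases are where you'd go visit that extra tab.
```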
By building this quite straightforward solution yourself, you have full control over what data flows through, how many tokens go to the LLM, which search engine is used, how web content is handled, … and thus can ensure an efficient, tailored solution that will work robustly over time.
☀️ Sunny days ahead!
In this icy story we explored:
And from that we concluded:
Oh, and we also learned: if you ever find yourself lost at sea, don't trust what the tip of the iceberg is telling you. Dare to dive underneath the surface. Solid advice. Right? 🧊