Matthias Feys, Q/CTO
Yesterday, Google launched the first version of its new family of models: Gemini. The launch immediately sparked discussion: with these new models, is Google reopening the competition with OpenAI’s models? In this blog, we’ll share our view on the Gemini launch.
Until now, GPT-4 has set the standard for text generation capabilities. With Gemini, Google wants to challenge that position. Gemini Ultra, the biggest and most capable version of the model, promises to compete with GPT-4, according to several benchmarks reported in the technical paper. Alongside Gemini Ultra, there is Gemini Pro, with performance more comparable to GPT-3.5, and Gemini Nano, which enables LLMs to run on-device.
We should always be careful about jumping to conclusions based on such benchmarks: for LLMs specifically, it is difficult to isolate the impact of the prompt used. Still, these results show Gemini’s potential in text generation: for the first time, a model outperforms human experts on MMLU, a benchmark commonly used to evaluate LLMs against human experts.
As of December 13th, Gemini Pro will be available via Google AI Studio and Vertex AI. In many places (but not yet in Europe), a fine-tuned version is already available in Bard, allowing for easy experimentation with the new model. The broad release of Gemini Ultra is planned for early next year, as mentioned in this release statement, after further refinement and safety testing.
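If you want to get a feel for what calling Gemini Pro on Vertex AI could look like, here is a minimal sketch. It assumes the preview generative models module of the Vertex AI Python SDK and a Google Cloud project with Vertex AI enabled; the project ID and region below are placeholders.

```python
# Minimal sketch: calling Gemini Pro through the Vertex AI Python SDK.
# Assumes application default credentials are configured and the
# preview generative_models module is available in your SDK version.
import vertexai
from vertexai.preview.generative_models import GenerativeModel

# Placeholders: swap in your own project ID and a supported region.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the Gemini launch in one sentence."
)
print(response.text)
```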
To experience the full text capabilities of Gemini Ultra, you’ll need some patience. Or, if you’re lucky, you can get early access as a selected partner or customer. As a Google Cloud Premium Partner, we are in close contact about early access and keen to experiment to learn what this model can mean for our customers.
So, why are we so excited about Gemini today? While text generation has been the major focus of the GenAI landscape, what really sets Gemini apart from other generative models are its multimodal capabilities: Gemini was trained natively multimodal, meaning it can take audio and visuals as input alongside text, and return text and images as output.
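As a hedged illustration of what such a multimodal call could look like, the sketch below mixes an image and a text instruction in a single request. It again assumes the preview Vertex AI SDK; the Cloud Storage URI is a placeholder.

```python
# Sketch: one request combining an image and a text instruction.
# The gemini-pro-vision model name and Part.from_uri follow the
# preview Vertex AI SDK; the gs:// URI below is a placeholder.
from vertexai.preview.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-pro-vision")
response = model.generate_content([
    Part.from_uri("gs://your-bucket/chart.png", mime_type="image/png"),
    "What trend does this chart show? Answer in two sentences.",
])
print(response.text)
```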
This more generalist, multimodal approach seems to us the most important differentiator for Gemini in the current landscape. As we are learning while building GenAI solutions with customers, businesses hold information in many different formats. These multimodal capabilities open up new, more intuitive ways for businesses to interact with that information, for example by using Multimodal RAG, sketched below.
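To make the Multimodal RAG idea concrete, here is a minimal sketch under stated assumptions: retrieve the most relevant assets (text snippets, images) for a question, then let a multimodal model answer grounded in them. The `retrieve_relevant_assets` helper is a hypothetical stand-in for whatever embedding model and vector store you use; the URIs and snippets are placeholders.

```python
# Sketch of a Multimodal RAG flow: retrieval first, then a grounded
# answer from a multimodal model. Only the Gemini call follows a real
# SDK; the retrieval step below is a hypothetical stand-in.
from vertexai.preview.generative_models import GenerativeModel, Part

def retrieve_relevant_assets(question: str) -> list:
    # Hypothetical: query a vector index of embedded documents and
    # images, returning (mime type or "text", content) pairs.
    return [
        ("image/png", "gs://your-bucket/q3-results-slide.png"),
        ("text", "Placeholder snippet retrieved from a report."),
    ]

def answer(question: str) -> str:
    parts = []
    for kind, content in retrieve_relevant_assets(question):
        if kind == "text":
            parts.append(content)
        else:
            parts.append(Part.from_uri(content, mime_type=kind))
    parts.append(f"Using only the context above, answer: {question}")
    model = GenerativeModel("gemini-pro-vision")
    return model.generate_content(parts).text

print(answer("How did revenue evolve last quarter?"))
```

The design choice worth noting: because Gemini accepts images and text in one request, the retrieved image can be passed to the model as-is, rather than first being converted to a text description as text-only RAG pipelines require.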
PaLM 2 deals primarily with text, while Gemini is more versatile, handling various types of data such as text, images, and code. What makes Gemini different from PaLM 2 is its ability to learn from these diverse sources. Since Bard (Google’s chat-based AI tool) is moving from PaLM 2 to Gemini models, we expect it to benefit from Gemini’s capabilities over time.
Over the next weeks, we’ll start using these models at ML6 and find out what value they can bring to our customers. Open questions remain (pricing, regional availability, timing), but this much is clear: by launching Gemini, Google has made a significant move in the GenAI race, and we can’t wait to see what’s next.
If you’d like to find out more about Gemini, or what GenAI can mean for your business, don’t hesitate to reach out. We’d be happy to think it through with you.