Can Gemini Be Google’s Beacon of Salvation, like Batman in Gotham?

December, 2023


The debut of ChatGPT was a defining moment for OpenAI, as it achieved a stroke of beginner’s luck with an impressive one million subscribers within five days of its launch. Concurrently, Google encountered a significant setback when its Bard chatbot shared inaccurate information in a promotional video that triggered Wall Street’s disapproval, resulting in a staggering loss of USD 100 billion in market capitalization. OpenAI has ever since maintained a leading position in the generative AI race, with Google playing catch-up.

    • ChatGPT excels with support for over 95 human languages, surpassing Bard’s capability, which extends to over 40 human languages.
    • OpenAI took a significant step in March 2023 with the launch of GPT-4, a large language model (LLM) featuring image and text understanding capabilities. Subsequently, in May 2023, Google introduced image input capabilities to Bard through its Google Search application.
    • In September 2023, OpenAI upgraded ChatGPT’s capabilities to analyze audio-based information in addition to existing text and image input capabilities. Following suit, Google introduced the Gemini Pro model in December 2023 and updated its Bard platform to include text, code, audio, image, and video

But this is not the first time Google has had to play catch up. Google has a history of trailing behind, particularly in deep technology. In October 2019, scientists at Google made a significant announcement about achieving quantum supremacy, implying that their quantum computer, Sycamore, could execute a computational task practically impossible for classical computers. This assertion faced skepticism from several academic and industry experts. Notably, scientists in China later replicated the computation in a few hours using ordinary processors.

Much like Batman restoring hope to Gotham, Gemini, Google’s latest endeavor in the large language model modality, could be a beacon of redemption for the tech giant. Despite OpenAI’s triumph with ChatGPT and Google’s setbacks, the launch of Gemini could be a turning point—an opportunity for Google to not just catch up but potentially redefine its narrative and emerge as a transformative force in the dynamic landscape of generative AI.

Gemini Rises from the Digital Shadows

As the threat of OpenAI loomed large and posed a potential challenge to Google’s supremacy in the search engine domain, triggering concerns about the impact on its ad-based revenue business model, Google initiated a code red in December 2022. Business groups within the company were given a directive to realign their efforts to counter this emerging threat.

In a strategic move in April 2023, Google merged Google Brain, its in-house deep learning artificial intelligence research unit, with its subsidiary, DeepMind. This union formed an entity named Google DeepMind, leveraging Google’s vast computational resources and DeepMind’s advanced research capabilities. Before this merger, both entities responded independently to ChatGPT, with DeepMind working on Project Goodall and Google developing Bard. These efforts culminated in Gemini, a groundbreaking development showcased at Google’s annual developer conference in May 2023. This strategic move aims to fortify Google’s position and respond effectively to the evolving landscape shaped by OpenAI’s growing prominence.

In Google’s Gotham, Can Gemini Be the Hero?

As of now, making a definitive statement about whether Gemini surpasses GPT-4 is challenging. However, it undeniably positions itself as a direct competitor to ChatGPT, boasting several notable advantages, as listed below:

    • Multimodality: Gemini possesses the unique capability to comprehend and operate across multiple modalities, including text, code, audio, image, and video. In contrast, ChatGPT currently lacks native video support. Built from the ground up as a multimodal tool, Gemini’s strength lies in its nuanced understanding of diverse information, making it more adept at tasks such as summarization, reasoning, coding, and planning compared to other AI models.
    • Better performance and computation: Gemini has surpassed current benchmarks in 30 of the 32 widely used academic tasks for LLM. Notably, Gemini Ultra outperforms human experts in Massive Multitask Language Understanding (MMLU), covering 57 subjects, including math, physics, history, law, medicine, and ethics. It also excels in the new Multimodal Massive Multitask Understanding (MMMU) benchmark, showcasing methodical reasoning across diverse domains. Gemini Ultra’s image benchmarks demonstrate superior performance without relying on object character recognition systems, underscoring its native multimodality and advanced reasoning capabilities. Google is also launching a new version of its Tensor Processing Unit (TPU) system, the TPU v5p, a computing system designed for use in data centers for training and running large-scale models.
    • Coding expertise: Google introduced AlphaCode nearly two years ago, marking the advent of an AI system capable of generating code to compete in programming competitions. With Gemini, Google has elevated its coding capabilities further, unveiling AlphaCode 2. This upgraded system is claimed to outperform 85% of coding competition participants, up from the original AlphaCode’s 50%. Gemini showcases prowess in understanding, explaining, and generating high-quality code across popular programming languages such as Python, Java, C++, and Go.
    • Flexibility: Gemini offers flexibility with three sizes to meet diverse requirements, from data centers to devices. Gemini Ultra, the largest LLM, handles advanced tasks such as multimodal information analysis. Gemini Pro, integrated into the global Bard platform, excels in contextual understanding, reasoning, and planning. Gemini Nano, designed for smaller form factors such as smartphones, is already integrated into Pixel 8 Pro, enhancing summarization and automated reply drafting functions. Google plans to extend Gemini to products such as Google Search, Ads, Google Chrome, and Duet AI. In initial experiments, Gemini significantly improved the search generative experience, reducing latency by 40% in English in the US while enhancing quality.
    • Responsible AI by design: Gemini prioritizes responsibility and safety throughout its design. Aligned with Google’s AI Principles, it incorporates new protection layers for its multimodal capabilities. Gemini has the most comprehensive safety evaluations of any Google AI model to date, addressing bias, cyber offense, and toxicity. Safety measures, including dedicated classifiers for violence and stereotypes, are integrated, making Gemini safer and more inclusive. Google’s commitment to responsible AI is evident in ongoing research, adversarial testing, and collaboration with external experts to ensure thorough stress testing across diverse issues.

One area where Google possesses a strategic advantage is its extensive data reservoir, surpassing limitations seen in OpenAI’s GPT-3.5 and GPT-4, confined to data up until January 2022 and April 2023, respectively. Unlike its counterparts, Google taps into real-time information from diverse sources, encompassing YouTube transcripts and live Google search results. Incorporating this proprietary, up-to-the-minute data into training the Gemini models positions Google uniquely, providing its models with better contextual understanding and heightened personalization, enabling more nuanced insights and inferences from an enriched dataset.

In his inaugural letter to company shareholders in April 2016 upon assuming the role of Google’s CEO, Sundar Pichai conveyed a visionary message, stating, “Over time, the computer itself—whatever its form factor—will be an intelligent assistant helping you through your day. We will move from mobile first to an AI-first world.” Google now endeavors to transform this long-cherished vision into reality through the ambitious initiative of Google Gemini.

By Chandrika Dutt, Research Leader, Avasant, and Abhisekh Satapathy, Senior Analyst, Avasant