AWS Bets on Model Flexibility and Hardware Efficiency for a Distinctive Generative AI Advantage

February 2024

Major hyperscalers have enthusiastically embraced generative AI, offering foundation models as a service and compute power for training and fine-tuning through proprietary AI/ML accelerators, vector database services, and low-code generative AI platforms. While Microsoft and Google took an early lead in generative AI, with Google’s introduction of the Transformer architecture in 2017 and Microsoft’s substantial investment in OpenAI in 2019, AWS swiftly entered the scene with the launch of CodeWhisperer in June 2022. Following the mainstream success of ChatGPT (GPT-3.5) in November 2022, AWS introduced two new services in limited preview in April 2023: Amazon Bedrock, a fully managed service that makes leading foundation models available through an API, and Amazon Titan, a proprietary language model. Recent showcases at AWS re:Invent and Amazon AI Conclave unveiled a competitive suite of services, including a proprietary assistant, an image generator, and vector databases. This evolution reflects a growing convergence in the generative AI stacks of the major hyperscalers, spanning enterprise applications, middleware support, and hardware innovation.

Enterprise applications

Enterprise applications span generative AI-backed data visualization services, customer service agents, code generation, knowledge management, and workplace assistance. In this respect, the hyperscalers are almost evenly matched. Microsoft gained an additional advantage from its investment in OpenAI and its ownership of GitHub, which provided it with integrated AI assistants for code (GitHub Copilot) and knowledge management (OpenAI’s ChatGPT). It further strengthened its generative AI offerings with Microsoft Copilot for data visualization, workplace support, and enterprise knowledge management, along with Microsoft Viva and Contact Center AI to support enterprises in the workplace and in customer service. Similarly, AWS offers three services: Amazon Bedrock, with a pre-configured repository of over nine foundation models (FMs), including its proprietary Titan FMs; Amazon Q, a generative AI assistant designed for enterprise workplace support, data visualization, and contact support; and Amazon CodeWhisperer for automated code completion. Although GCP might seem to have limited customization options with Duet AI for code generation and workplace assistance, its Vertex AI has proved to be the most versatile of the offerings. Backed by a huge partner ecosystem, it provides strong customization support for building generative applications around specific industries and domains, through Model Garden (which supports 130 AI models, including proprietary, open-source, and third-party FMs) and a repository of APIs.

Development tools

As the hyperscalers align their strategies, middleware support for generative AI is becoming standardized, making it easier for businesses to integrate generative AI into their existing systems. This layer includes proprietary and third-party FMs and LLMs, vector database management services (VDBMS), and low-code MLOps platforms.

All major hyperscalers, including GCP with Gemini, PaLM 2, and BERT, AWS with Titan, and Microsoft with Phi-2 and OpenAI’s ChatGPT, have invested significantly in proprietary FMs and LLMs. These models support their respective generative AI applications, such as Google Bard, Amazon Q, and Microsoft Copilot. However, AWS stands out in integration support, offering compatibility with over eight open and proprietary LLMs from leading providers such as Meta, Anthropic, Stability AI, Cohere, and AI21 Labs. This is a notable advantage over GCP and Microsoft, which typically support only 2-3 third-party models.

AWS also demonstrates a competitive edge in VDBMS. It provides a comprehensive range of services, including graph-based data inferencing, real-time log optimization, and enhanced data search. In contrast, GCP primarily focuses on vector search optimization and data warehousing, while Microsoft concentrates on supporting database management systems such as NoSQL, PostgreSQL, MongoDB, and others.

For low-code MLOps platforms that assist users in model deployment, evaluation, and optimization, AWS offers SageMaker. This platform lets users run built-in or custom-created ML models across 70 instances, with a focus on serverless inferencing. GCP’s Vertex AI Studio and Microsoft’s Azure ML Studio provide similar capabilities across major ML frameworks such as TensorFlow and PyTorch.

Hardware innovation

As the generative AI landscape evolves, the focus is turning to AI chips and microprocessors, with AWS and GCP standing out as pioneers owing to early initiatives that trace back more than seven years. In 2015, AWS kickstarted the AI chips race by acquiring Annapurna Labs. Its chips, initially designed for AI model training and inference, have since expanded in focus to cloud-based AI hypercomputing. With AWS Inferentia2, AWS Trainium2, Graviton4, and ML blocks announced at re:Invent 2023, AWS continues to innovate at the generative AI infrastructure layer. GCP, in contrast, is focused on AI chips for applications ranging from data centers to smartphones. The introduction of its TPU v5e in December 2023 positions GCP’s chips as strong contenders against the Nvidia H100, particularly in ML workload processing. Furthermore, GCP’s introduction of the Google Tensor G3 in August 2023, integrated into the Pixel 8 line of smartphones, marks a significant stride toward bringing generative AI computing capabilities to smartphones. Microsoft also recently entered the AI chips segment, launching the Maia 100 for LLM training and inferencing and the Cobalt 100 for cloud computing workloads in November 2023. It will also benefit from its exclusive partnership with OpenAI, which has invested substantially in RAIN, an AI chip startup developing a dedicated Neural Processing Unit for AI workload processing.

Even as common solutions emerge in enterprise applications, middleware support, and hardware innovation, AWS strategically distinguishes itself with a focus on two pivotal areas: flexibility in model selection and purpose-specific AI hardware.

    • Flexibility in model selection: While many hyperscalers are developing suites of applications centered around their own foundation models, AWS has charted a distinct course. Unlike Microsoft and Google, which are committed to their proprietary FMs and a few select open-source LLMs such as LLaMA and Falcon, AWS offers a serverless managed generative AI service called Amazon Bedrock. This service enables enterprises to build generative AI applications with the flexibility to choose the proprietary foundation model Titan or opt for models from leading AI firms such as AI21 Labs, Anthropic, Cohere, Meta, and Stability AI. Bedrock also lets enterprises compare and rank the outputs of the foundation models it offers against their own use case. This strategic diversification addresses a growing trend among CIOs, who are increasingly scrutinizing how these models compare in performance. For example, Gemini’s performance aligns closely with that of GPT-3.5. As enterprises grapple with whether to stay with their default hyperscaler or explore alternatives, this flexibility to choose among foundation models proves crucial. As more foundation models reach parity in performance and regulatory compliance within the next 6-12 months, AWS’s ability to call specific LLM APIs suited to each enterprise’s unique requirements positions it favorably in addressing enterprises’ evolving needs.
    • Purpose-specific AI hardware: To address key challenges in the AI landscape, AWS has intensified its focus on AI accelerators, with a dual emphasis on reducing training costs and improving inference and energy efficiency. AWS has introduced purpose-specific AI chips, including Trainium2, designed for energy-efficient training of AI models with elevated performance benchmarks. Additionally, AWS Graviton has been developed to streamline the processing of cloud-based AI workloads, while Inferentia is tailored to optimize the inference phase. Over the next six months, as companies assess the operational costs of running foundation models, AWS’s investment in specialized hardware is poised to position it favorably against other hyperscalers.

In 2024, open generative AI models are expected to advance significantly, closely rivalling proprietary models. Mistral’s release of Mixtral 8x7B, a mixture-of-experts (MoE) LLM, in December 2023 is a testament to this progress, offering capabilities comparable to well-established models such as GPT-3.5. Moreover, Mistral is gearing up to launch another model with capabilities matching GPT-4. As the performance divide between open and proprietary models diminishes, AWS is strategically positioned to excel by giving enterprises the flexibility to host a range of generative AI models in a hybrid or on-premises environment through its Bedrock service. Furthermore, LLMs’ increasing parameter sizes and context windows necessitate efficient computing methods for seamless execution on smart devices such as smartphones and laptops. AWS, with its substantial investment in AI hardware, is poised to lead the way in addressing this demand, ensuring optimal performance even on resource-constrained devices. This strategic focus on both software and hardware positions AWS ahead of its competitors in the dynamic field of generative AI.

By Chandrika Dutt, Associate Research Director, Avasant and Abhisekh Satapathy, Lead Analyst, Avasant