Google Introduces New Open Source Initiatives for Generative AI
At Google Cloud Next ’24, Google unveiled three open source projects intended to make generative AI models easier to develop and deploy. The company also announced an expansion of its MaxText project, adding new large language models (LLMs) built on the JAX framework.
The LLMs newly added to MaxText include Gemma, GPT-3, Llama 2, and Mistral, all of which run on both Google Cloud TPUs and Nvidia GPUs.
The three new open source projects are MaxDiffusion, JetStream, and Optimum-TPU.
MaxDiffusion is a set of high-performance, scalable reference implementations for diffusion models such as Stable Diffusion. Built on the JAX framework, the MaxDiffusion models are designed for high performance in large-scale machine learning tasks.
JAX is integrated with the OpenXLA compiler, which optimizes numerical functions so that model developers can focus on the math while the software handles the implementation details.
Google has worked extensively to optimize JAX and OpenXLA performance on Cloud TPU, and has collaborated closely with Nvidia to improve OpenXLA performance on large Cloud GPU clusters.
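To illustrate the division of labor between JAX and the compiler, here is a minimal sketch, not taken from MaxDiffusion or MaxText. The function `mse_loss` and the toy data are hypothetical; `jax.grad` and `jax.jit` are standard JAX transformations. The developer writes only the math, and JAX derives the gradient and passes it to XLA for compilation on whatever backend is available.

```python
import jax
import jax.numpy as jnp


def mse_loss(w, x, y):
    """Mean squared error of a linear model; plain math, no device code."""
    pred = x @ w
    return jnp.mean((pred - y) ** 2)


# jax.grad derives the gradient with respect to the first argument (w);
# jax.jit asks XLA to compile the result for the available backend
# (CPU, GPU, or TPU).
grad_fn = jax.jit(jax.grad(mse_loss))

# Hypothetical toy data, chosen only to keep the example small.
x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
y = jnp.array([1.0, 2.0])
w = jnp.zeros(2)

g = grad_fn(w, x, y)
print(g)
```

The same code runs unchanged on CPU, GPU, or TPU; only the installed JAX backend differs.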
JetStream, another of the open source projects, is an optimized LLM inference engine that supports XLA compilers. It targets the growing demand for a cost-effective, high-performance inference stack. The engine supports models trained with JAX and PyTorch/XLA, and it includes optimizations for popular models such as Llama 2 and Gemma.
Mark Lohmeyer, Google Cloud’s general manager of compute and ML infrastructure, emphasized the need for a cost-efficient inference stack for AI production workloads.
Lastly, Google introduced Optimum-TPU, which is aimed at PyTorch users in the Hugging Face community. Optimum-TPU brings Google Cloud TPU performance optimizations to both training and inference. It currently supports the Gemma 2B model, with support for Llama and Mistral planned for the near future.