Google Introduces Gemma 3: A Multi-Modal AI Evolution

Google DeepMind has unveiled Gemma 3, the latest advancement in its generative AI lineup, bringing enhanced multi-modal capabilities that let the model process and interpret visual data alongside text. The new version allows users to analyze images, extract insights, recognize objects, and generate responses grounded in visual content.
Announced on March 12, 2025, Gemma 3 is available for testing in Google AI Studio and offers improved performance in coding, mathematical reasoning, and instruction following. Google DeepMind designed the model to accept vision-language inputs and produce text outputs, with an extended context window of up to 128,000 tokens. Gemma 3 also supports more than 140 languages.
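To make the vision-language input concrete, the sketch below assembles a Gemini-style REST request body that pairs a text prompt with a base64-encoded image. The field names (`contents`, `parts`, `inline_data`) follow the Gemini API convention that Gemma-family endpoints are expected to accept; treat the exact shape as an assumption rather than official Gemma 3 documentation.

```python
import base64
import json

# Hedged sketch: a Gemini-style multimodal request body. The field names
# (contents/parts/inline_data) follow the Gemini REST convention and are
# an assumption here, not taken from Gemma 3's own docs.
fake_png = base64.b64encode(b"\x89PNG...placeholder, not a real image").decode("ascii")

request_body = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {"text": "What objects appear in this image?"},
                {
                    "inline_data": {
                        "mime_type": "image/png",
                        "data": fake_png,  # base64-encoded image bytes
                    }
                },
            ],
        }
    ],
}

# The body is plain JSON, ready to POST to a generateContent-style endpoint.
payload = json.dumps(request_body)
print(payload[:30])
```

In practice the image bytes would come from a real file, and the payload would be sent with an API key to the model endpoint; the structure above is the part that stays the same.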
The update also introduces structured outputs and function calling, making it better suited for automating tasks and building agent-style workflows. Developers can choose from four model sizes (1B, 4B, 12B, and 27B), each available in pre-trained and instruction-tuned versions. The large context window lets the model ingest long documents and conversation histories, making it well suited for complex, multi-step tasks.
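To illustrate what structured outputs and function calling involve on the developer's side, here is a minimal sketch of the two JSON artifacts a client supplies: a tool declaration the model may choose to invoke, and a response schema that constrains its answer to a fixed shape. The function name, fields, and exact schema keys are illustrative assumptions in the OpenAPI-style format common to Gemini-family APIs; consult your client library for the precise format.

```python
import json

# Hypothetical tool declaration the model can call. The name and
# parameters are illustrative, not from the Gemma 3 announcement.
get_weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}

# A structured-output constraint: the model is asked to reply as JSON
# matching this shape instead of free-form prose.
response_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["summary"],
}

# Both artifacts are plain JSON, so they round-trip cleanly when sent
# alongside a prompt.
serialized = json.dumps({"tools": [get_weather_tool], "schema": response_schema})
print(len(serialized) > 0)
```

When the model decides to call `get_weather`, it emits the function name plus arguments matching the declared parameters; the application executes the call and feeds the result back for a final answer.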
For deployment, Gemma 3 offers multiple options, including the Google GenAI API and Google Cloud Run. Model weights can be downloaded from Hugging Face and Kaggle, keeping the model accessible to developers. With an optimized code base and enhanced fine-tuning capabilities, Gemma 3 supports straightforward inference and customization.
Nvidia GPUs, ranging from the entry-level Jetson Nano to the high-performance Blackwell series, offer direct support for Gemma 3. The model is also optimized for Google Cloud TPUs and integrates with AMD GPUs. For CPU-only execution, developers can use gemma.cpp, a lightweight C++ inference engine.
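As a rough guide to matching the four model sizes to this range of hardware, weight memory can be estimated as parameter count times bytes per parameter. The sketch below applies that back-of-the-envelope rule; it ignores activations and the KV cache (which grow with the 128K context), so real requirements are higher.

```python
# Back-of-the-envelope weight-memory estimate: params * bytes/param.
# Ignores activations and the KV cache, so treat results as lower bounds.
SIZES = {"1B": 1e9, "4B": 4e9, "12B": 12e9, "27B": 27e9}
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gib(params: float, dtype: str) -> float:
    """Approximate weight memory in GiB for a given precision."""
    return params * BYTES_PER_PARAM[dtype] / 2**30

for name, params in SIZES.items():
    row = ", ".join(
        f"{dtype}: {weight_gib(params, dtype):.1f} GiB" for dtype in BYTES_PER_PARAM
    )
    print(f"Gemma 3 {name} -> {row}")
```

By this estimate, the 27B model needs roughly 50 GiB for its weights in bf16, pointing at data-center GPUs or TPUs, while 4-bit quantization brings it within reach of a single high-end consumer card.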
Alongside Gemma 3, Google DeepMind introduced ShieldGemma 2, a 4B-parameter model designed to enhance content safety. Built on Gemma 3’s foundation, ShieldGemma 2 assesses both synthetic and natural images to ensure adherence to safety standards. It is intended as a filtering tool for vision-language models and image generation systems, mitigating risks associated with harmful or explicit content.
With these innovations, Google DeepMind continues to push the boundaries of AI capabilities, providing developers with powerful tools for advanced image analysis, text processing, and ethical AI development.