Meta has launched the fourth generation of its large language models, including Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth.
According to Meta, Scout is a compact model designed for efficiency. It reportedly performs well across a range of standard benchmarks while remaining lightweight enough to run on a single Nvidia H100 GPU. Despite its smaller size, Meta claims that Scout outperforms Google's Gemma 3 and Gemini 2.0 Flash-Lite. That profile makes it a good fit for developers who want strong results with lower resource demands.
Maverick is a larger and more capable model, and it is being positioned to compete with advanced systems like GPT-4o and Gemini 2.0 Flash. Meta states that Maverick delivers comparable results in reasoning and coding tasks while using fewer active parameters.
The most powerful model in the lineup, Behemoth, is still in development. Meta says it will serve not just as a high-performing model but also as a foundational system to help train future models. Meta CEO Mark Zuckerberg describes it as “the highest performing base model in the world.”
All three models are built on a mixture-of-experts (MoE) architecture, in which a routing network activates only a subset of the model's "expert" subnetworks for each input. Because only a fraction of the total parameters are used per token, MoE models can reduce compute cost and improve inference speed relative to dense models of similar total size.
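To make the idea concrete, here is a minimal sketch of top-k MoE routing in NumPy. All names, sizes, and weights are hypothetical and chosen for illustration; real MoE layers (including, presumably, Llama 4's) add load-balancing losses, learned expert networks, and batched dispatch.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2  # illustrative sizes, not Llama 4's

# Hypothetical parameters: a router matrix and one weight matrix per expert.
router_w = rng.normal(size=(d_model, n_experts))
expert_w = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x):
    """Route each token to its top-k experts; only those experts run."""
    logits = x @ router_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    # Softmax over only the selected experts' logits.
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per token
        for j, e in enumerate(top[t]):
            # Only top_k of n_experts matrices are ever multiplied here,
            # which is the source of the compute savings.
            out[t] += weights[t, j] * (x[t] @ expert_w[e])
    return out

tokens = rng.normal(size=(3, d_model))
print(moe_forward(tokens).shape)  # (3, 8)
```

Each token incurs the cost of only `top_k` experts rather than all `n_experts`, so total parameter count can grow without a proportional increase in per-token compute.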
[Image: Llama 4 preview. Credit: Meta]
Meta describes the Llama 4 models as open source and has made them available for download on Meta's Llama website and on Hugging Face. However, the licensing terms restrict use by commercial organizations with more than 700 million monthly active users.
The models are already being integrated into Meta's AI assistant across various platforms, including WhatsApp, Messenger, Instagram, and the web. With these updates, users interacting with Meta AI are now being served by Llama 4 in real time.