Llama 4 introduces a Mixture-of-Experts (MoE) architecture, in which each token activates only a small fraction of the model's total parameters, across three model variants (a minimal routing sketch follows the list):
Llama 4 Scout offers 17 billion active parameters with 16 experts and 109 billion total parameters. It fits on a single H100 GPU (with Int4 quantization) while supporting an industry-leading 10 million token context window. Scout outperforms previous Llama models and is competitive with leading models in its class.
Llama 4 Maverick uses 17 billion active parameters with 128 experts and 400 billion total parameters. It offers a 1 million token context window and competitive performance against GPT-4o and Gemini 2.0 Flash on multimodal benchmarks.
Llama 4 Behemoth remains in training, with 288 billion active parameters, 16 experts, and approximately 2 trillion total parameters. Meta uses it as a teacher model, distilling its capabilities into Scout and Maverick (a standard distillation objective is sketched below).
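To make the active-versus-total parameter distinction concrete, here is a minimal top-k MoE feed-forward layer in PyTorch: a router scores every expert for each token, but only the selected experts are evaluated, so only their parameters count as "active". The layer sizes, expert MLP shape, and top-1 routing are illustrative assumptions, not Meta's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.SiLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert per token,
        # but only the top-k experts actually run ("active" parameters).
        scores = self.router(x)                          # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = chosen[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 512)
layer = MoELayer(d_model=512, d_hidden=2048, n_experts=16, top_k=1)
print(layer(tokens).shape)  # torch.Size([8, 512])
```

With 16 experts and top-1 routing, roughly one expert's worth of feed-forward parameters (plus the shared attention and embedding weights) is evaluated per token, which is how Scout's 109 billion total parameters reduce to about 17 billion active ones.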
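Meta has not published Behemoth's exact distillation objective, so the teacher-student idea is illustrated here with the standard soft-target formulation: the student is trained against a temperature-softened copy of the teacher's output distribution, blended with the usual hard-label loss. The temperature and mixing weight are placeholder values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend soft-target KL against the teacher with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits over a 32k-token vocabulary.
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
labels = torch.randint(0, 32000, (4,))
print(distillation_loss(student, teacher, labels).item())
```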