Meta Llama 4

Use open-weight AI models with frontier capability. We help organisations deploy Llama 4 for applications requiring data privacy or custom deployment.

Llama 4 marks a new era for open-weight AI. Meta's latest models are natively multimodal, understanding both text and images, with unprecedented context lengths and performance rivalling proprietary alternatives. For organisations needing control over their AI deployment while maintaining frontier capability, Llama 4 delivers.

Keep control of data and deployment

Customise behaviour for your domain

Reduce dependency on a single vendor

Current model landscape

Llama 4 introduces a Mixture-of-Experts architecture with three model variants:

Llama 4 Scout offers 17 billion active parameters with 16 experts and 109 billion total parameters. With Int4 quantisation it fits on a single H100 GPU, while supporting an industry-leading 10 million token context window. Scout outperforms previous Llama models and competes with the leading alternatives in its class.

Llama 4 Maverick uses 17 billion active parameters with 128 experts and 400 billion total parameters. It offers a 1 million token context window and competitive performance against GPT-4o and Gemini 2.0 Flash on multimodal benchmarks.

Llama 4 Behemoth remains in training, with 288 billion active parameters and approximately 2 trillion total parameters. Meta uses it as a teacher model to distil capability into Scout and Maverick.
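
To make the landscape concrete, the sketch below loads Scout for offline inference with vLLM. It is a minimal example rather than a deployment recipe: the Hugging Face model ID, context cap, and sampling settings are illustrative, and the weights are gated, so they require approved access.

```python
# Minimal offline inference with vLLM. Model ID and settings are
# illustrative; the Llama 4 weights are gated on Hugging Face.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    tensor_parallel_size=1,   # Scout targets a single H100 (with quantisation)
    max_model_len=131072,     # cap the context to fit available memory
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(
    ["List three considerations when self-hosting a large language model."],
    params,
)
print(outputs[0].outputs[0].text)
```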

What open weights mean

Unlike API-only models, Llama 4's weights are available for download:

Self-hosting: Run models on your own servers or cloud infrastructure, so data never leaves your environment (see the sketch after this list).

Customisation: Fine-tune models for your specific domain, within the terms of the licence rather than a provider's API constraints.

No vendor dependency: Your AI capability does not depend on a third party's commercial model, availability, or policy decisions.

Operational predictability: You can design for predictable behaviour and performance characteristics because you control the serving setup.

Long context: Scout's 10 million token window enables processing of entire codebases, document collections, or extended conversations in one pass.

Note: Llama 4 uses Meta's community licence with some usage restrictions, not a traditional open-source licence.
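
One way to see what self-hosting buys you: vLLM can expose an OpenAI-compatible endpoint on your own network, so a standard client never sends data outside your environment. A minimal sketch, assuming a server started with `vllm serve` on localhost; the host, port, and model name are illustrative.

```python
# Calling a self-hosted, OpenAI-compatible endpoint (e.g. vLLM's built-in
# server). Host, port, and model name are illustrative; no request leaves
# your network.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Classify this ticket: 'VPN drops every hour.'"}],
)
print(response.choices[0].message.content)
```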

For many teams, the real question is not “can we run the model?”, but “can we operate it reliably?” We help you decide what to run yourself versus what to consume as a managed service.

When Llama 4 makes sense

Self-hosted Llama suits organisations that:

Have strict data requirements: Sensitive data that cannot leave your infrastructure or pass to third parties.

Need long context: Applications requiring the 10 million token window that Scout provides.

Need operational control: Availability, performance, and behaviour under your direct management.

Want to avoid lock-in: Ability to modify and deploy without external dependencies.

Have relevant infrastructure: Existing GPU capacity or cloud resources suitable for model hosting.

Process high volumes: Sustained, high-volume workloads can cost less self-hosted once you amortise the infrastructure and operational setup.

Deployment options

We deploy Llama 4 models in several configurations:

Cloud hosting: Running on AWS, Google Cloud, or Azure using GPU instances. Managed infrastructure with your cloud controls.

Amazon Bedrock: Access Llama 4 as a serverless option through AWS with standard Bedrock features (see the sketch after this list).

Private cloud: Deployment within your existing private cloud or data centre environment.

Dedicated hardware: Installation on owned or leased servers for maximum control.

Each option has different characteristics for cost, performance, and operational complexity.
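
For the Bedrock route, access is a standard AWS API call via boto3's Converse API. A minimal sketch; the Llama 4 model ID shown is an assumption, so confirm the current identifier in your region's Bedrock model catalogue.

```python
# Invoking Llama 4 through Amazon Bedrock's Converse API with boto3.
# The model ID below is an assumption; check the Bedrock model catalogue.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="meta.llama4-scout-17b-instruct-v1:0",  # illustrative ID
    messages=[{"role": "user", "content": [{"text": "Summarise the shared responsibility model in two sentences."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```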

What we provide

Llama 4 deployment involves more than downloading weights:

Infrastructure design: Sizing and configuring appropriate hardware or cloud resources for Scout or Maverick.

Model serving: Setting up efficient inference infrastructure with appropriate APIs.

Optimisation: Techniques such as quantisation to improve performance and reduce resource requirements (see the sketch after this list).

Fine-tuning: Adapting models to your specific domain and requirements.

Multimodal deployment: Configuring image understanding and processing capabilities.

Operations: Monitoring, scaling, and maintaining production deployments.
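
As an example of the optimisation work, vLLM can quantise weights to FP8 at load time, roughly halving memory against bf16 at some quality cost that should be measured. A sketch, assuming your vLLM version supports FP8 for this checkpoint; Meta also publishes pre-quantised builds worth evaluating.

```python
# On-the-fly FP8 weight quantisation with vLLM (sketch; verify support
# for this checkpoint in your vLLM version, and measure the quality impact).
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    quantization="fp8",   # quantise weights when the model is loaded
)
# Meta also ships pre-quantised checkpoints (e.g. an FP8 build of
# Maverick); loading one directly avoids the load-time conversion.
```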

Trade-offs to consider

Llama 4 offers flexibility but requires investment:

Infrastructure responsibility. You need an appropriate serving environment and clear ownership of its operation.

Operational expertise. Running models well requires performance tuning, monitoring, and incident response.

Ongoing maintenance. You need a plan for updates, evaluation, and keeping behaviour stable over time.

Licence constraints. Llama uses Meta’s community licence, which can include restrictions you must evaluate for your context.

We help you evaluate whether self-hosted Llama fits your situation.

Ask the LLMs

Use these prompts to decide whether an open-weight approach is the right fit.

“What are our data handling and governance constraints, and do they require self-hosted models?”

“What infrastructure and operational capability would we need to run this reliably in production?”

“Where should we use Llama versus a managed API model, and what are the trade-offs?”

Frequently Asked Questions

Is Llama 4 open source?

The model weights are available for you to download and run, which enables self-hosting and customisation. It does not automatically mean “open source”.

Does self-hosting Llama 4 solve our data governance requirements?

Not automatically. It can help when you need full control, but governance depends on your broader system design, controls, and policies.

Can we fine-tune Llama 4 for our domain?

Yes. Fine-tuning can help for specialised language, classification, or style requirements, but it adds evaluation and maintenance work.

How do you keep a self-hosted model's quality stable over time?

We set a measurable quality bar, test against a fixed evaluation set, monitor drift, and manage upgrades as controlled releases.
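
As a sketch of how such a release gate can look in practice (the endpoint, model name, and file path are illustrative, not a fixed implementation):

```python
# Minimal release gate: score a candidate deployment against a fixed
# evaluation set and block the release if it misses the quality bar.
# Endpoint, model name, and file path are illustrative.
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
QUALITY_BAR = 0.90  # the agreed, measurable threshold

def ask_model(prompt: str) -> str:
    r = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

def run_eval(path: str = "evals/fixed_set.jsonl") -> float:
    with open(path) as f:
        cases = [json.loads(line) for line in f]
    hits = sum(c["expected"].lower() in ask_model(c["prompt"]).lower() for c in cases)
    return hits / len(cases)

if __name__ == "__main__":
    score = run_eval()
    assert score >= QUALITY_BAR, f"Release blocked: {score:.2%} < {QUALITY_BAR:.0%}"
```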