Meta · Open-weight · 131K context · 405B params · Llama 3.1 Community License
Llama 3.1 405B Instruct (meta/llama-3-1-405b-instruct)
Meta's largest open-weight model — 405B dense Transformer. Widely hosted (Fireworks, Together, Replicate, etc.); the reference point for open-weight flagship quality.
Cheapest blended: $3.00 / 1M tokens on Fireworks.ai · 2 providers listed
Pricing across providers
| Provider | Input /1M | Output /1M | Blended /1M | Latency p50 | Format | Freshness | Action |
|---|---|---|---|---|---|---|---|
| Fireworks.ai accounts/fireworks/models/llama-v3p1-405b-instruct | $3.00 | $3.00 | $3.00 | — | OpenAI-compatible | Verified 2d ago | Try → |
| Together.ai meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | $3.50 | $3.50 | $3.50 | — | OpenAI-compatible | Verified 2d ago | Try → |
Affiliate disclosure: We may earn a commission from qualified signups. Pricing independence is enforced at the data layer — see our Editorial Independence Policy.
Works with
Point any of these clients at a provider's base URL — they all speak at least one of this model's endpoint protocols (OpenAI-compatible).
Capabilities
- chat
- reasoning
- code
- tool_use
Languages: en, de, fr, es, it, pt, hi, th
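The tool_use capability is exposed through the standard OpenAI-compatible function-calling schema. A minimal sketch — the `get_weather` tool and its fields are hypothetical, purely for illustration:

```python
# Hypothetical tool definition in the OpenAI function-calling schema
# (get_weather and its fields are illustrative, not part of this model's API).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

def first_tool_call(client, model):
    """Send a prompt with the tool attached via any OpenAI-compatible client;
    return the name of the first tool the model decides to call, if any."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    calls = response.choices[0].message.tool_calls
    return calls[0].function.name if calls else None
```

The `client` here is any OpenAI-SDK-style client pointed at one of the providers above; whether tool calls are returned depends on the provider's endpoint supporting the `tools` parameter.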
Code samples
Example using Fireworks.ai — the cheapest provider for this model as of last verification. Swap base_url and model to use a different provider from the matrix above.
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    # Fireworks.ai's OpenAI-compatible endpoint; swap for another provider's base URL.
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-405b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```
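Switching providers only changes `base_url` and `model`. A small sketch mapping both listed providers to the model IDs from the pricing matrix — the base URLs are the providers' commonly documented OpenAI-compatible endpoints and should be confirmed against their own docs:

```python
# Model IDs come from the pricing matrix above; base URLs are the providers'
# usual OpenAI-compatible endpoints (verify against provider docs before use).
PROVIDERS = {
    "fireworks": {
        "base_url": "https://api.fireworks.ai/inference/v1",
        "model": "accounts/fireworks/models/llama-v3p1-405b-instruct",
    },
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    },
}

def client_kwargs(provider: str, api_key: str) -> dict:
    """Return the keyword arguments to construct OpenAI(...) for a provider."""
    return {"api_key": api_key, "base_url": PROVIDERS[provider]["base_url"]}
```

Usage: `OpenAI(**client_kwargs("together", key))` with `model=PROVIDERS["together"]["model"]`.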
Technical specs
- Context
- 131K
- Max output
- 4K
- Parameters
- 405B
- Release
- —
- Training cutoff
- —
- License
- Llama 3.1 Community License
Similar models
Compare with
- Llama 3.1 405B Instruct vs Hunyuan Large (comparison planned, not yet published)
- Llama 3.1 405B Instruct vs MiniMax-Text-01 (comparison planned, not yet published)
Frequently asked
How much does Llama 3.1 405B Instruct cost?
The cheapest public hosting is $3.00 per 1M blended tokens on Fireworks.ai. Two providers in total are listed above with per-input / per-output / cached pricing.
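A blended price is generally a weighted average of input and output prices by token share; whether that exactly matches this site's definition is an assumption. A quick sketch:

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.5) -> float:
    """Weighted average of input/output prices per 1M tokens.

    input_share is the fraction of traffic that is input tokens.
    (Assumption: this matches the listing's 'blended' definition.)
    """
    return input_per_m * input_share + output_per_m * (1.0 - input_share)

# Fireworks.ai charges $3.00 for both input and output, so the blend
# is $3.00 at any mix of input vs output tokens.
print(blended_price(3.00, 3.00, 0.8))  # 3.0
```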
Is Llama 3.1 405B Instruct open-source? Can I fine-tune it?
Yes. Llama 3.1 405B Instruct is open-weight under the Llama 3.1 Community License. Weights are available on Hugging Face for local inference, fine-tuning, and commercial use (see the license for specific terms).
Is Llama 3.1 405B Instruct OpenAI-compatible?
Both listed providers expose an OpenAI-compatible API, so you can point an existing openai SDK client at the provider's base_url and use the provider's model name. See the code samples above for a copy-pasteable example.
What's the maximum context window for Llama 3.1 405B Instruct?
The model supports up to 131,072 tokens of context (input + output). Some hosted versions may impose a smaller limit — check the "Context" column in the pricing matrix for each provider.
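Since the 131,072-token window covers input plus output, a rough budget check can be sketched as follows. The 4-characters-per-token heuristic is a crude assumption for English text; use the model's real tokenizer for anything precise:

```python
CONTEXT_WINDOW = 131_072  # total tokens, input + output
MAX_OUTPUT = 4_096        # max output tokens, per the specs above

def rough_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def fits(prompt: str, max_output: int = MAX_OUTPUT) -> bool:
    """True if the estimated prompt plus the output budget fits the window."""
    return rough_tokens(prompt) + max_output <= CONTEXT_WINDOW
```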