DeepSeek ships two flagship models — V3 for fast everyday chat and R1 for extended reasoning. They cost different amounts, behave differently under the hood, and suit different workloads. This post breaks down when each one actually saves you money and latency.
See the full side-by-side comparison page for live pricing and benchmarks.
| | DeepSeek V3 | DeepSeek R1 |
|---|---|---|
| Architecture | 671B MoE (37B active) | 671B MoE (37B active) |
| Context window | 131,072 tokens | 131,072 tokens |
| Best at | Fast chat, code, tool-use | Reasoning, math, hard code |
| Input / output price (per 1M tokens) | $0.27 / $1.10 | $0.55 / $2.19 |
| Open-weight | Yes (DeepSeek License) | Yes (MIT License) |
| Multi-step CoT cost | N/A (short responses) | Higher output token count |
Both models are open-weight — you can also self-host the weights, or use third-party hosting (e.g. Together.ai runs both). The numbers above are from DeepSeek's own platform, verified via their official pricing docs.
V3 is a general-purpose chat and code model. Its pricing is aggressive: at $0.27 per 1M input tokens, it's one of the cheapest production-grade chat APIs that also ranks in the top tier on most benchmarks.
Use V3 for:
Don't reach for R1 just because the task is "hard." Try V3 first. If V3 fails systematically on your eval set, then R1's reasoning layer starts to pay for itself.
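One way to make "fails systematically" concrete is to put a number on it. The sketch below assumes you already have per-example pass/fail results for V3 on your own eval set. The function name `should_escalate_to_r1` and the 20% default threshold are hypothetical, chosen only for illustration, so tune the threshold to your workload.

```python
from typing import Sequence


def should_escalate_to_r1(
    v3_passed: Sequence[bool],
    max_failure_rate: float = 0.2,  # hypothetical threshold, not a recommendation
) -> bool:
    """Return True when V3's failure rate on the eval set exceeds the threshold."""
    if not v3_passed:
        raise ValueError("eval set is empty")
    failures = sum(1 for ok in v3_passed if not ok)
    return failures / len(v3_passed) > max_failure_rate


# Illustrative results: V3 passes 9 of 10 examples, so it stays the default.
print(should_escalate_to_r1([True] * 9 + [False]))
# V3 passes only 2 of 4 examples, so R1 is worth evaluating.
print(should_escalate_to_r1([True, False, False, True]))
```

A single aggregate rate is the simplest possible signal. If failures cluster in one task category, it may be cheaper to route only that category to R1.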
R1 is an extended reasoning model — it generates an internal chain-of-thought before the final answer, similar to OpenAI's o1/o3 line. The internal reasoning is billed as output tokens, so R1's per-response cost is often 3-5× V3's for the same user prompt.
The premium is worth paying when:
For everything else, V3 + a well-designed prompt gives you 90% of R1's quality at roughly 1/3 the total cost.
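The arithmetic behind that gap can be sketched in a few lines. The prices are the standard rates from the comparison table. The token counts are illustrative assumptions, not measurements: a 1,000-token prompt, a 500-token answer, and 500 extra reasoning tokens for R1. The `Pricing` class and `request_cost` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Standard list prices from the comparison table.
V3 = Pricing(input_per_million=0.27, output_per_million=1.10)
R1 = Pricing(input_per_million=0.55, output_per_million=2.19)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request; reasoning tokens count as output."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / TOKENS_PER_MILLION


# Hypothetical workload: every token count below is an assumption.
PROMPT_TOKENS = 1_000
ANSWER_TOKENS = 500
R1_REASONING_TOKENS = 500  # assumed chain-of-thought overhead, billed as output

v3_cost = request_cost(V3, PROMPT_TOKENS, ANSWER_TOKENS)
r1_cost = request_cost(R1, PROMPT_TOKENS, ANSWER_TOKENS + R1_REASONING_TOKENS)

print(f"V3: ${v3_cost:.5f}  R1: ${r1_cost:.5f}  ratio: {r1_cost / v3_cost:.1f}x")
```

Under these assumptions R1 costs about 3.3× as much per request. The ratio climbs quickly as the reasoning trace grows, so measure real output token counts on your own prompts before budgeting.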
R1's extended reasoning adds time-to-first-token (TTFT) that V3 doesn't incur. On interactive surfaces (chat UIs, tab-autocomplete, inline editors), users notice. Rule of thumb:
Our pricing matrix lets you sort by blended price across both models. Latency is not yet surfaced in the matrix, but we track it for each model-hosting combination.
See also: our LLM benchmark rankings track both models on MMLU, GPQA, HumanEval, and MATH.
DeepSeek isn't the only Chinese model worth evaluating. On price, it is often 4-5× cheaper than Qwen 2.5 Max. On reasoning benchmarks, GLM-4-Plus is a close competitor. The right shortlist depends on whether you need overseas routing (DashScope has an international endpoint; DeepSeek doesn't) and whether your infra team is prepared to operate open-weight models directly.
Our provider directory lists all 20 tracked Chinese and global AI-infra providers, and the Providers page filters to LLM API vendors specifically.
According to its publicly posted pricing, DeepSeek occasionally runs promotional off-peak windows during which both V3 and R1 are about 50% cheaper. The dollar figures above are the standard rates; check DeepSeek's pricing page at time of purchase.
Last updated: 2026-04-22. Prices and benchmark tiers verified via the DeepSeek platform docs. See our editorial independence policy — we earn affiliate commission on some provider signups, but it never affects which model we recommend.