Deploy Llama 3 70B for Production Inference

A validated GPU cloud stack for self-hosting Llama 3 70B at production latency. Runs on A100 80GB on-demand instances, with headroom for 2,000–4,000 requests/min.

Estimated monthly cost: $2400.00
Complexity: 4 / 5 (Involved)
Components: 1
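As a sanity check on the pricing, the implied per-GPU hourly rate can be worked out from the line item under Recommended stack (2 × A100 80GB for 720 hours at $2200.00/month). This is a quick illustrative computation, not RunPod's published rate card:

```python
# Implied per-GPU hourly rate for the inference-gpu line item
# (2 x A100 80GB, 720 h on-demand, $2200.00/month, per the stack listing).
gpus = 2
hours_per_month = 720
monthly_cost = 2200.00

gpu_hours = gpus * hours_per_month   # 1440 GPU-hours per month
rate = monthly_cost / gpu_hours      # dollars per GPU-hour
print(f"${rate:.3f}/GPU-hr")         # ≈ $1.528/GPU-hr
```

Note the $200 gap between this line item and the $2400.00 total above; the source does not itemize the difference.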

Recommended stack

  1. inference-gpu
     A100 80GB On-demand (Secure Cloud), via RunPod

     2 × A100 80GB, 720 h on-demand. Cheaper with spot instances or a yearly reservation.

     $2200.00 / month
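One common way to serve Llama 3 70B on this two-GPU node is vLLM with tensor parallelism; a sketch follows. The model ID, port, and flag values are assumptions for illustration, not part of the stack spec:

```shell
# Hypothetical vLLM launch for 2 x A100 80GB — flag values are assumptions.
# --tensor-parallel-size 2: shard the 70B weights across both GPUs
#   (fp16 weights alone are ~140 GB, more than one 80 GB card can hold).
# --gpu-memory-utilization 0.95: leave a small margin for CUDA overhead.
# --max-model-len 8192: cap context length to keep the KV cache in memory.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 8192
```

This exposes an OpenAI-compatible HTTP API (default port 8000), so existing OpenAI client code can point at the self-hosted endpoint with only a base-URL change.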