Predictions API

AI / ML

Replicate

Run open-source ML models (Llama, Stable Diffusion, Whisper) via a unified API with auto-scaling
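A Replicate prediction is asynchronous. It moves through statuses such as `starting` and `processing` before ending as `succeeded`, `failed`, or `canceled`, so clients typically poll until a terminal status is reached. The sketch below shows that polling loop under stated assumptions: `fetch_status` is a hypothetical callable standing in for a GET on the prediction's URL, and the 60-second default timeout is an illustrative choice, not a documented limit.

```python
import time
from typing import Callable

# Statuses after which a prediction no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch_status: Callable[[], dict],  # hypothetical: returns the prediction JSON as a dict
    poll_interval: float = 1.0,
    timeout: float = 60.0,  # illustrative default, not a Replicate limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the prediction reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        prediction = fetch_status()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if clock() >= deadline:
            raise TimeoutError(
                f"prediction still {prediction.get('status')!r} after {timeout}s"
            )
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in production the defaults use wall-clock time.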

Reported Latency

Typical performance from public reports, not live measurements

Average: 2,500 ms

p50: 1,800 ms

p95: 8,000 ms

p99: 15,000 ms
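The card does not say how its percentiles were computed. As one common definition, the nearest-rank method sorts the samples and takes the value at rank ceil(p/100 × n). The sketch below assumes that method, and the sample latencies in the test are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer p values give exact ranks.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Because the reported average (2,500 ms) sits above the median (1,800 ms), the latency distribution is right-skewed: a minority of slow requests pulls the mean up, which is also why p95 and p99 are several times the median.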

99.7% observed uptime

Free tier

None

Paid starts at

$0.000225 / second (CPU)
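With per-second billing, the cost of a prediction is its run time multiplied by the hardware rate. The sketch below uses the listed CPU starting rate and assumes billing is purely proportional to run time, with no rounding or per-request minimum. GPU hardware is priced differently, so pass the relevant rate explicitly for other tiers.

```python
from decimal import Decimal

# Listed starting rate for CPU hardware, in USD per second.
CPU_RATE_PER_SECOND = Decimal("0.000225")


def estimate_cost(run_seconds: Decimal, rate_per_second: Decimal = CPU_RATE_PER_SECOND) -> Decimal:
    """Estimate the charge for one prediction, assuming linear per-second billing."""
    return run_seconds * rate_per_second
```

`Decimal` avoids binary floating-point drift when summing many small charges. For example, an 8-second CPU run comes to $0.0018 under these assumptions.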

Authentication

API Key
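Replicate's HTTP API expects the key as a bearer token in the `Authorization` header. A prediction is created by POSTing a JSON body with a model `version` and an `input` object to `https://api.replicate.com/v1/predictions`. The sketch below only builds the request with the standard library and does not send it. Reading the key from `REPLICATE_API_TOKEN` follows the official clients' convention, and the version ID in the usage is a hypothetical placeholder.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(
    version: str, model_input: dict, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    token = token or os.environ["REPLICATE_API_TOKEN"]
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # API key sent as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually submit it, pass the request to `urllib.request.urlopen(req)`. Keep the key out of source code and logs.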

Rate limit

600 requests per minute (RPM)
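One way to stay under a per-minute limit is a client-side sliding-window limiter that refuses a call when the last 60 seconds already contain the maximum number of requests. This is a minimal sketch. The card does not say how the server measures its window, so the client should still handle HTTP 429 responses. The class and its names are illustrative, not part of any Replicate SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (client-side guard)."""

    def __init__(
        self,
        limit: int = 600,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record and allow a call if the window has room; otherwise return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

Callers that get `False` can wait and retry. A sliding window avoids the burst that a fixed per-minute counter allows at the boundary between two minutes.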

Protocols

REST, WebSocket

SDKs

Python, Node.js, Swift, Elixir

Regions

Last updated: 2026-04-12