Predictions API

AI / ML

Replicate

Run open-source ML models (Llama, Stable Diffusion, Whisper) via a unified API with auto-scaling

Reported Latency

Typical performance from public reports -- not live measurements

Average: 2,500 ms
p50: 1,800 ms
p95: 8,000 ms
p99: 15,000 ms

Reliability

99.7% observed uptime

Pricing

Free tier

None

Paid starts at

$0.000225 / second (CPU)
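
As a rough illustration of what per-second pricing means in practice, the sketch below multiplies a runtime by the entry-level CPU rate quoted above. It assumes billing is linear in runtime and uses only that one rate; other hardware tiers are not listed here, and the function name is made up for this example.

```python
# Entry-level CPU rate from the listing, in USD per second of runtime.
CPU_RATE_USD_PER_SECOND = 0.000225


def estimate_cost_usd(runtime_seconds, rate=CPU_RATE_USD_PER_SECOND):
    """Estimate cost as runtime * rate (assumes simple linear billing)."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return runtime_seconds * rate


if __name__ == "__main__":
    # One minute and one hour of CPU time at the listed starting rate.
    for seconds in (60, 3600):
        print(f"{seconds:>5} s -> ${estimate_cost_usd(seconds):.4f}")
```

At this rate, an hour of CPU time comes to roughly $0.81.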

Technical Details

Authentication

API Key
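
A minimal sketch of attaching the API key to a request, using only the Python standard library. The listing does not give the base URL, the header format, or the request body, so the endpoint, the `Bearer` scheme, and the `version`/`input` fields below are all assumptions to check against the official documentation. The request is built but not sent.

```python
import json
import urllib.request

# ASSUMPTION: the listing above omits the base URL; verify before use.
ASSUMED_PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(api_key, version, model_input,
                             url=ASSUMED_PREDICTIONS_URL):
    """Build (but do not send) a POST request carrying the API key.

    ASSUMPTIONS: bearer-style Authorization header and a JSON body with
    `version` and `input` keys.
    """
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.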

Rate limit

600 RPM
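
A limit of 600 requests per minute works out to one request every 0.1 seconds on average. The sketch below is one simple way to stay under it on the client side by spacing calls evenly. The `Throttle` class is hypothetical, not part of any SDK, and it assumes the limit is enforced as a plain per-minute count, which the listing does not specify.

```python
import time


class Throttle:
    """Space calls at least 60 / rpm seconds apart (client-side only)."""

    def __init__(self, rpm=600, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm   # 0.1 s at 600 RPM
        self._clock = clock          # injectable for testing
        self._sleep = sleep
        self._next_allowed = None    # no call made yet

    def wait(self):
        """Block until the next call is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and self._next_allowed > now:
            delay = self._next_allowed - now
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(rpm=600)
    for i in range(3):
        slept = throttle.wait()
        print(f"call {i}: waited {slept:.3f} s")
```

Calling `throttle.wait()` before each request is enough for a single-threaded client. Even spacing is deliberately conservative: it never bursts, so it wastes some headroom if the server tolerates short spikes.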

Protocols

REST, WebSocket

SDKs

Python, Node.js, Swift, Elixir

Regions

US

Last updated: 2026-04-12