Predictions API
AI / ML
Replicate
Run open-source ML models (Llama, Stable Diffusion, Whisper) via a unified API with auto-scaling
Reported Latency
Typical performance from public reports, not live measurements
Average: 2,500 ms
p50: 1,800 ms
p95: 8,000 ms
p99: 15,000 ms
Reliability
99.7% observed uptime
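To put the uptime figure in concrete terms, the arithmetic below converts 99.7% into an implied downtime budget. It is a rough sketch that assumes a 30-day month; the listing does not state the window over which uptime was observed, so treat the result as an order-of-magnitude estimate.

```python
# Convert an uptime percentage into an implied downtime budget.
# Assumption: a 30-day month. The listing does not say what window
# the 99.7% figure was observed over.
UPTIME_PERCENT = 99.7
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_minutes(uptime_percent: float, period_minutes: float) -> float:
    """Return the minutes of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_minutes


monthly_downtime = downtime_minutes(UPTIME_PERCENT, MINUTES_PER_30_DAY_MONTH)
print(f"{monthly_downtime:.1f} minutes per 30-day month")
```

Under these assumptions the budget works out to roughly two hours of downtime per month.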
Pricing
Free tier: None
Paid starts at: $0.000225 per second (CPU)
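Per-second pricing is easier to reason about with a worked estimate. The sketch below multiplies billed seconds by the listed CPU starting rate. The workload (10,000 predictions at 2.5 seconds each) is hypothetical, and using the reported average latency as a stand-in for billed time is an assumption; billed time and rates on other hardware may differ.

```python
# Estimate cost from billed compute time at the listed starting rate.
CPU_RATE_USD_PER_SECOND = 0.000225  # "Paid starts at" rate from the listing


def estimate_cost(billed_seconds: float,
                  rate: float = CPU_RATE_USD_PER_SECOND) -> float:
    """Return the estimated cost in USD for the given billed seconds."""
    return billed_seconds * rate


# Hypothetical workload: each prediction is billed for 2.5 s (assumption).
SECONDS_PER_PREDICTION = 2.5
PREDICTIONS = 10_000

per_prediction = estimate_cost(SECONDS_PER_PREDICTION)
batch = estimate_cost(SECONDS_PER_PREDICTION * PREDICTIONS)

print(f"per prediction: ${per_prediction:.7f}")
print(f"per {PREDICTIONS:,} predictions: ${batch:.3f}")
```

At the CPU rate this comes to a fraction of a cent per prediction, so the total is dominated by volume and by which hardware tier a model actually needs.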
Technical Details
Authentication: API key
Rate limit: 600 requests per minute (RPM)
Protocols: REST, WebSocket
SDKs: Python, Node.js, Swift, Elixir
Regions: US
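A 600 RPM limit works out to 10 requests per second, or one request every 0.1 seconds if calls are spaced evenly. The sketch below shows one simple way a client might pace itself to stay under that figure. `Pacer` is a hypothetical helper, not part of any Replicate SDK, and the listing does not say whether the limit is applied per key or per account, or whether bursts are allowed, so even spacing is a conservative assumption.

```python
import time


class Pacer:
    """Space calls at least 60 / rpm seconds apart (client-side pacing).

    Hypothetical helper for illustration; not part of any SDK.
    """

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


# 600 RPM from the listing gives a 0.1 s minimum spacing.
pacer = Pacer(rpm=600)
```

Calling `pacer.wait()` immediately before each request keeps a single-threaded client at or below the listed rate; a client with several workers would need a shared, lock-protected pacer instead.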
Last updated: 2026-04-12