Inference API
AI / ML
Hugging Face
Serverless inference for 200,000+ open-source models including NLP, vision, and audio
Reported Latency
Typical performance from public reports -- not live measurements
Average: 1,200 ms
p50: 800 ms
p95: 3,500 ms
p99: 8,000 ms
Reliability
99.6% observed uptime
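To put the uptime figure in perspective, the short sketch below converts a percentage into expected downtime. It assumes a 30-day month and treats the reported 99.6% as if it held uniformly over the period. The function name downtime_hours is illustrative and not part of any Hugging Face SDK.

```python
def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Expected hours of downtime over a period at the given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


# Assumption: a 30-day month (720 hours); the listing does not state a period.
HOURS_PER_30_DAY_MONTH = 30 * 24

monthly = downtime_hours(99.6, HOURS_PER_30_DAY_MONTH)
print(f"{monthly:.2f} hours of downtime per 30-day month")
```

Under these assumptions, 99.6% uptime corresponds to roughly 2.88 hours of unavailability per month, which may matter when deciding whether to add retries or a fallback.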
Pricing
Free tier
Rate-limited free tier
Paid starts at
$0.06 / hour (Inference Endpoints)
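As a rough guide to what the starting hourly rate means over a month, the sketch below multiplies it out. It assumes simple hourly billing, a 30-day month, and an endpoint that is billed only while running; actual billing granularity and instance pricing may differ. The helper monthly_cost_usd is hypothetical.

```python
def monthly_cost_usd(hourly_rate_usd: float,
                     hours_per_day: float = 24.0,
                     days: int = 30) -> float:
    """Estimated cost for an endpoint running `hours_per_day` for `days` days."""
    return hourly_rate_usd * hours_per_day * days


# Assumption: the $0.06/hour starting rate from the listing, running nonstop.
always_on = monthly_cost_usd(0.06)
# Assumption: the same endpoint running only 8 hours per day.
business_hours = monthly_cost_usd(0.06, hours_per_day=8.0)

print(f"Always on: ${always_on:.2f}")
print(f"8 h/day:   ${business_hours:.2f}")
```

At the listed starting rate, continuous use comes to about $43.20 per 30-day month, and 8 hours per day to about $14.40.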
Technical Details
Authentication
API Key
Rate limit
30 RPM (free)
Protocols
REST
SDKs
Python, Node.js, Rust
Regions
US, EU
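The authentication and rate-limit details above suggest two client-side concerns: attaching the API key to each request and staying under 30 requests per minute on the free tier. The following is a minimal sketch, not an official client. It assumes the key is sent as a Bearer token in the Authorization header, and it spaces calls evenly (one every 2 seconds at 30 RPM) instead of allowing bursts. MinIntervalThrottle and auth_headers are hypothetical names, and the key shown is a placeholder.

```python
import time
from typing import Callable, Dict, Optional


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build request headers. Assumption: the API key is sent as a Bearer token."""
    return {"Authorization": f"Bearer {api_key}"}


class MinIntervalThrottle:
    """Spaces calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self.min_interval
        return delay


# Assumption: 30 RPM is the free-tier limit reported in the listing.
throttle = MinIntervalThrottle(max_per_minute=30)
headers = auth_headers("hf_xxx")  # placeholder key, not a real credential
print(f"Minimum spacing between requests: {throttle.min_interval} s")
```

Calling throttle.wait() before each request keeps a single-threaded client under the limit. A server-side limit can still reject requests (for example when several clients share one key), so handling rate-limit error responses remains necessary.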
Last updated: 2026-04-12