Inference API

AI / ML

Hugging Face

Serverless inference for 200,000+ open-source models including NLP, vision, and audio

Reported Latency

Typical performance from public reports -- not live measurements

Average: 1,200 ms
p50: 800 ms
p95: 3,500 ms
p99: 8,000 ms
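
Because these figures come from public reports rather than live measurements, it can help to compute the same statistics from your own request timings. The sketch below uses the nearest-rank percentile method; the function names and the sample values are hypothetical and only illustrate the calculation.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the same four statistics the listing reports."""
    return {
        "average": sum(samples_ms) / len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


# Hypothetical timings in milliseconds, not real measurements.
samples = [640, 700, 800, 810, 950, 1200, 1500, 2100, 3500, 8000]
print(summarize(samples))
```

With only a handful of samples the p95 and p99 values collapse onto the slowest request, so a larger sample is needed before the tail figures mean much.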

Reliability

99.6% observed uptime
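
To put the uptime figure in concrete terms, the arithmetic below converts it into expected downtime. It assumes a 30-day month (720 hours) and that the observed percentage holds evenly over that period; both are simplifications, and the function name is hypothetical.

```python
def downtime_hours(uptime_percent, period_hours):
    """Hours of expected unavailability for a given uptime percentage."""
    return (100 - uptime_percent) / 100 * period_hours


HOURS_PER_30_DAY_MONTH = 24 * 30  # assumption: 30-day month

hours = downtime_hours(99.6, HOURS_PER_30_DAY_MONTH)
print(f"{hours:.2f} hours of downtime per 30-day month")
```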

Pricing

Free tier

Rate-limited free tier

Paid starts at

$0.06 / hour (Inference Endpoints)
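
As a rough guide to what the hourly rate means per month, the sketch below multiplies it out. It assumes the endpoint runs continuously at the entry price for a 30-day month; larger instance types, scaling to zero, and other billing details are not modeled, and the names are hypothetical.

```python
HOURLY_RATE_USD = 0.06  # entry price from the listing


def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Cost of keeping one endpoint running for the given period."""
    return hourly_rate * hours_per_day * days


print(f"${monthly_cost(HOURLY_RATE_USD):.2f} per 30-day month")
```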

Technical Details

Authentication

API Key

Rate limit

30 requests per minute (free tier)
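
One way to stay under a 30 requests-per-minute limit is to space calls at least 60 / 30 = 2 seconds apart on the client side. The class below is a minimal sketch of that idea; `MinIntervalThrottle` is a hypothetical name, and the service may enforce its limit differently (for example, over a sliding window), so treat this as a conservative approximation.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum gap between calls, derived from a per-minute limit."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Schedule the earliest time the following call may start.
        self._next_allowed = now + self.min_interval
        return delay


# Usage: create one throttle and call wait() before each request.
throttle = MinIntervalThrottle(requests_per_minute=30)
```

Calling `throttle.wait()` immediately before each request keeps a single-threaded client at or below the limit; a multi-threaded client would also need a lock around `wait()`.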

Protocols

REST
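
A REST call authenticated with an API key might be assembled as in the sketch below, which builds the request without sending it. Several details are assumptions rather than facts from this listing: the endpoint URL is a placeholder (the listing gives no base URL), the key is sent as a Bearer token in the `Authorization` header, and the JSON body uses an `inputs` field. Check the provider's documentation before relying on any of them.

```python
import json
import urllib.request


def build_request(endpoint_url, api_key, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and key; replace both with real values.
request = build_request(
    "https://example.invalid/models/placeholder-model",
    "YOUR_API_KEY",
    {"inputs": "Hello"},  # assumption: payload shape
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request, timeout=...)` would send it; a timeout comfortably above the reported p99 latency is a reasonable starting point.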

SDKs

Python, Node.js, Rust

Regions

US, EU

Last updated: 2026-04-12