Inference API

AI / ML

Hugging Face

Serverless inference for 200,000+ open-source models including NLP, vision, and audio

Reported Latency

Typical performance from public reports, not live measurements

Average: 1,200ms

p50: 800ms

p95: 3,500ms

p99: 8,000ms
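The average (1,200ms) is above the median p50 (800ms). This suggests a right-skewed latency distribution, where a minority of slow requests pulls the mean up, and p95 and p99 describe that tail. The sketch below computes the same four statistics from your own client-side timings using the nearest-rank percentile definition. The sample values and the helper names `percentile` and `summarize` are hypothetical and are not taken from Hugging Face data.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Average plus the three percentiles this card reports."""
    return {
        "average": sum(samples_ms) / len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


if __name__ == "__main__":
    # Hypothetical client-side measurements in ms; not Hugging Face data.
    latencies = [620, 700, 750, 790, 800, 810, 900, 1100, 2400, 5200]
    print(summarize(latencies))
```

With these ten hypothetical samples, the mean (1,407ms) sits well above the median (800ms), which is the same shape the reported figures show. Nearest-rank is only one percentile definition. `statistics.quantiles` and NumPy interpolate between samples and can return slightly different values. With only ten samples, p95 and p99 both land on the single slowest request.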

99.6% observed uptime
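An uptime of 99.6% leaves 0.4% of wall-clock time unavailable. Over a 30-day month (720 hours), that works out to 0.004 × 720 = 2.88 hours, or about 173 minutes. This card reports an observed figure, not a guaranteed one, so treat the result as a rough expectation rather than a commitment. The hypothetical helper below performs this conversion for any period.

```python
def downtime_minutes(uptime_pct: float, period_hours: float) -> float:
    """Minutes of unavailability implied by `uptime_pct` over `period_hours`."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    return (100.0 - uptime_pct) / 100.0 * period_hours * 60.0


if __name__ == "__main__":
    print(f"30-day month: {downtime_minutes(99.6, 24 * 30):.1f} min")   # 172.8 min
    print(f"365-day year: {downtime_minutes(99.6, 24 * 365):.1f} min")  # 2102.4 min
```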

Free tier

Rate-limited free tier

Paid starts at

$0.06 / hour (Inference Endpoints)
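At the starting rate, an endpoint that runs continuously for a 30-day month costs 720 × $0.06 = $43.20. The sketch below assumes plain per-hour billing with no minimums or scale-to-zero, which this card does not state. It also assumes the lowest listed rate; other configurations cost more. It uses `Decimal` because binary floats cannot represent 0.06 exactly, so a float product can land a hair off the expected cent amount. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Lowest listed rate for Inference Endpoints; other configurations cost more.
STARTING_RATE_USD = Decimal("0.06")


def estimate_cost(hourly_rate: Decimal, hours: int) -> Decimal:
    """USD cost of one endpoint running for `hours`, rounded to cents.

    Assumes simple per-hour billing with no minimums or scale-to-zero.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return (hourly_rate * hours).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_cost(STARTING_RATE_USD, 24 * 30))  # 43.20
```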

Authentication

API Key

Rate limit

30 RPM (free)
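A client that shares the free tier's 30 requests per minute can throttle itself before the server has to reject it. The sketch below is a sliding-window limiter that allows at most 30 calls in any rolling 60-second window. It is a hypothetical helper, not part of any Hugging Face SDK. The server may count requests differently, for example in fixed minute buckets, so treat this as a conservative client-side guard. The injectable `clock` parameter exists only to make the limiter testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allows at most `max_calls` within any rolling `window_s`-second window."""

    def __init__(self, max_calls: int = 30, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._stamps: Deque[float] = deque()  # timestamps of accepted calls

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next call would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._stamps[0])
```

In a request loop, call `limiter.try_acquire()` before each request. When it returns False, sleep for `limiter.wait_time()` and then retry.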

Protocols

REST
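The listing gives API-key authentication over REST. The sketch below uses only the Python standard library to build and send an authenticated JSON POST. It rests on three assumptions that do not come from this card. First, the base URL `https://api-inference.huggingface.co/models/` follows the pattern long used by the serverless Inference API; confirm it against current Hugging Face documentation. Second, the key is sent as an `Authorization: Bearer` header. Third, the request body has the form `{"inputs": ...}`. `build_request` and `infer` are hypothetical helper names, and the response shape varies by model task.

```python
import json
import urllib.request
from typing import Any

# Assumed base URL (serverless Inference API pattern); verify before use.
BASE_URL = "https://api-inference.huggingface.co/models/"


def build_request(model_id: str, inputs: Any, token: str) -> urllib.request.Request:
    """Build an authenticated JSON POST for one inference call (does not send it)."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + model_id,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # API key sent as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def infer(model_id: str, inputs: Any, token: str, timeout_s: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    req = build_request(model_id, inputs, token)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call might look like `infer("your-org/your-model", "Hello", os.environ["HF_TOKEN"])`, where the model id is hypothetical and `HF_TOKEN` is an assumed environment variable holding the key. Given the reported p99 of 8,000ms, the 30-second default timeout leaves headroom for slow requests.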

SDKs

Python, Node.js, Rust

Regions

US, EU

Last updated: 2026-04-12