Inference Index
Directory · 02 / 07

Providers.

Labs, aggregators, alternative networks, and self-host platforms — the full map of where inference actually lives.

/
Labs

Anthropic

Maker of Claude. First-party API for the Claude 4.x family and the reference implementation for every downstream tool.

prompt-cachingbatch-apitool-usevision
Models
3
Privacy
Standard
Founded
2021
HQ
San Francisco, CA
Labs

OpenAI

First-party API for the GPT-5 and o-series families. The broadest feature surface in the industry.

realtime-voiceassistants-apifine-tuningbatch-api
Models
3
Privacy
Standard
Founded
2015
HQ
San Francisco, CA
Labs

Google AI Studio

Google’s direct Gemini API. Free tier is the most generous in the industry; 2M context on Ultra.

2m-contextnative-videonative-audiofree-tier
Models
2
Privacy
Standard
Founded
2023
HQ
Mountain View, CA
Labs

xAI

Direct access to Grok. The only way to get real-time X firehose grounding.

live-x-searchtool-usevision
Models
1
Privacy
Standard
Founded
2023
HQ
Palo Alto, CA
Labs

DeepSeek

First-party API for DeepSeek models at rock-bottom prices. Off-peak discounts up to 75%.

off-peak-discountprompt-cachingopenai-compatible
Models
2
Privacy
Standard
Founded
2023
HQ
Hangzhou, China
Labs

Mistral

European frontier. La Plateforme offers first-party access to Mistral X and fine-tuning.

eu-data-residencyfine-tuningjson-modeguardrailing
Models
1
Privacy
Standard
Founded
2023
HQ
Paris, France
Aggregators

OpenRouter

Unified API for 200+ models. One API key, one format, automatic fallbacks across every major provider.

unified-apiauto-fallbackusage-dashboardfree-tier
Models
7
Privacy
Standard
Founded
2023
HQ
San Francisco, CA
Aggregators

Together AI

Fast inference for open-source models. Specializes in fine-tuning and custom deployments.

fine-tuningcustom-modelsfast-inferencefree-tier
Models
5
Privacy
Standard
Founded
2022
HQ
San Francisco, CA
Aggregators

Groq

Ultra-fast inference powered by custom LPU hardware. The fastest tokens-per-second in the industry.

fastest-inferencecustom-hardwarefree-tieropenai-compatible
Models
4
Privacy
Standard
Founded
2016
HQ
Mountain View, CA
Aggregators

Fireworks AI

Production-grade inference for open models with aggressive quantization and dedicated deployments.

fp8-quantizationdedicated-deploymentsfunction-callingfine-tuning
Models
4
Privacy
Standard
Founded
2022
HQ
Redwood City, CA
Aggregators

Amazon Bedrock

Enterprise model hosting on AWS. Anthropic, Meta, Mistral, and Amazon Nova, all inside your VPC.

vpc-privateiam-authguardrailsknowledge-bases
Models
5
Privacy
Standard
Founded
2023
HQ
Seattle, WA
Aggregators

Azure AI

Microsoft’s enterprise AI cloud. Native GPT-5 access, plus Llama and Mistral in regional deployments.

private-networkingcontent-filtersfine-tuningregional-deployments
Models
5
Privacy
Standard
Founded
2023
HQ
Redmond, WA
Aggregators

Replicate

Run open-source models with one line of code. Billed by the second of GPU time, great for custom models.

per-second-billingcog-custom-modelsimage-genvideo-gen
Models
3
Privacy
Standard
Founded
2019
HQ
San Francisco, CA
Alternative

Venice.ai

Privacy-first AI platform. No logs, no training on your data. Powered by open-source models and decentralized infrastructure.

no-logsprivacy-firstimage-generationuncensored-models
Models
4
Privacy
No Logs
Founded
2024
HQ
Decentralized
Alternative

CheapTokens.ai

Discounted Venice.ai API credits using time-decay pricing. Up to 75% off by midnight UTC.

time-decay-pricingcrypto-paymentsopenai-compatiblex402-protocol
Models
3
Privacy
No Logs
Founded
2026
HQ
Decentralized
Alternative

Hyperbolic

Decentralized GPU marketplace with surprisingly cheap frontier-model inference.

decentralized-gpusbase-tier-pricingopenai-compatible
Models
3
Privacy
Standard
Founded
2023
HQ
San Francisco, CA
Alternative

Akash Network

Permissionless supercloud. Deploy model inference on a decentralized GPU marketplace paid in AKT.

fully-decentralizedakt-paymentsbyo-container
Models
2
Privacy
On Chain
Founded
2018
HQ
Decentralized
Self-host

Ollama

The easiest way to run open-weight models on your own machine. One command; zero config.

single-command-installgguf-supportrest-apiopenai-compatible
Models
4
Privacy
No Logs
Founded
2023
HQ
Palo Alto, CA
Self-host

vLLM

The production-grade open-source inference engine. Powers most serverless providers under the hood.

paged-attentioncontinuous-batchingopenai-compatiblequantization
Models
5
Privacy
No Logs
Founded
2023
HQ
Berkeley, CA
Self-host

LM Studio

Desktop app for running local models with a polished UI. The ChatGPT of self-hosting.

guiopenai-compatible-servergguf-catalogmlx-apple-silicon
Models
4
Privacy
No Logs
Founded
2023
HQ
Brooklyn, NY
Aggregators

Cerebras

Wafer-scale chips doing 2,000+ tokens/second. Redefining what "fast" means for open-weight models.

wafer-scaleworld-record-throughputopenai-compatible
Models
2
Privacy
Standard
Founded
2016
HQ
Sunnyvale, CA
Showing 21 of 21