Providers.
Labs, aggregators, alternative networks, and self-hosting platforms: the full map of where inference actually lives.
Anthropic
Maker of Claude. First-party API for the Claude 4.x family and the reference implementation for every downstream tool.
OpenAI
First-party API for the GPT-5 and o-series families. The broadest feature surface in the industry.
Google AI Studio
Google’s direct Gemini API. The free tier is the most generous in the industry; 2M-token context on Ultra.
xAI
Direct access to Grok. The only way to get real-time X firehose grounding.
DeepSeek
First-party API for DeepSeek models at rock-bottom prices. Off-peak discounts up to 75%.
Mistral
European frontier. La Plateforme offers first-party access to Mistral X and fine-tuning.
OpenRouter
Unified API for 200+ models. One API key, one format, automatic fallbacks across every major provider.
Together AI
Fast inference for open-source models. Specializes in fine-tuning and custom deployments.
Groq
Ultra-fast inference powered by custom LPU hardware. The fastest tokens-per-second in the industry.
Fireworks AI
Production-grade inference for open models with aggressive quantization and dedicated deployments.
Amazon Bedrock
Enterprise model hosting on AWS. Anthropic, Meta, Mistral, and Amazon Nova, all inside your VPC.
Azure AI
Microsoft’s enterprise AI cloud. Native GPT-5 access, plus Llama and Mistral in regional deployments.
Replicate
Run open-source models with one line of code. Billed per second of GPU time; well suited to custom models.
Venice.ai
Privacy-first AI platform. No logs, no training on your data. Powered by open-source models and decentralized infrastructure.
CheapTokens.ai
Discounted Venice.ai API credits using time-decay pricing. Up to 75% off by midnight UTC.
Hyperbolic
Decentralized GPU marketplace with surprisingly cheap frontier-model inference.
Akash Network
Permissionless supercloud. Deploy model inference on a decentralized GPU marketplace paid in AKT.
Ollama
The easiest way to run open-weight models on your own machine. One command; zero config.
vLLM
The production-grade open-source inference engine. Powers most serverless providers under the hood.
LM Studio
Desktop app for running local models with a polished UI. The ChatGPT of self-hosting.
Cerebras
Wafer-scale chips doing 2,000+ tokens/second. Redefining what "fast" means for open-weight models.
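
The "one format, automatic fallbacks" idea from the OpenRouter entry can be sketched in a few lines. This is a minimal illustration, not any provider's actual API: `complete_with_fallback`, `flaky`, and `echo` are hypothetical names, and each provider is modeled as a plain function that takes a prompt and either returns text or raises.

```python
from typing import Callable, Sequence

# Hypothetical provider interface: a callable that takes a prompt and returns
# text, or raises an exception when the provider is unavailable.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) pair in order.

    Returns (name, reply) from the first provider that succeeds, and raises
    RuntimeError listing every failure if none do.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only; no network calls are made.
def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(complete_with_fallback("hello", [("primary", flaky), ("backup", echo)]))
```

The caller only sees one function and one return shape, whichever provider ends up answering; the order of the list is the fallback order.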