Directory · 02 / 07

Providers.

Labs, aggregators, alternative networks, and self-host platforms — the full map of where inference actually lives.

21 tracked

Labs

Anthropic

Maker of Claude. First-party API for the Claude 4.x family and the reference implementation for every downstream tool.

prompt-cachingbatch-apitool-usevision

OpenAI

First-party API for the GPT-5 and o-series families. The broadest feature surface in the industry.

realtime-voiceassistants-apifine-tuningbatch-api

Google AI Studio

Google’s direct Gemini API. Free tier is the most generous in the industry; 2M context on Ultra.

2m-contextnative-videonative-audiofree-tier

xAI

Direct access to Grok. The only way to get real-time X firehose grounding.

live-x-searchtool-usevision

DeepSeek

First-party API for DeepSeek models at rock-bottom prices. Off-peak discounts up to 75%.

off-peak-discountprompt-cachingopenai-compatible

Mistral

European frontier. La Plateforme offers first-party access to Mistral X and fine-tuning.

eu-data-residencyfine-tuningjson-modeguardrailing

OpenRouter

Unified API for 200+ models. One API key, one format, automatic fallbacks across every major provider.

unified-apiauto-fallbackusage-dashboardfree-tier

Together AI

Fast inference for open-source models. Specializes in fine-tuning and custom deployments.

fine-tuningcustom-modelsfast-inferencefree-tier

Groq

Ultra-fast inference powered by custom LPU hardware. The fastest tokens-per-second in the industry.

fastest-inferencecustom-hardwarefree-tieropenai-compatible

Fireworks AI

Production-grade inference for open models with aggressive quantization and dedicated deployments.

fp8-quantizationdedicated-deploymentsfunction-callingfine-tuning

Amazon Bedrock

Enterprise model hosting on AWS. Anthropic, Meta, Mistral, and Amazon Nova, all inside your VPC.

vpc-privateiam-authguardrailsknowledge-bases

Azure AI

Microsoft’s enterprise AI cloud. Native GPT-5 access, plus Llama and Mistral in regional deployments.

private-networkingcontent-filtersfine-tuningregional-deployments

Replicate

Run open-source models with one line of code. Billed by the second of GPU time, great for custom models.

per-second-billingcog-custom-modelsimage-genvideo-gen

Venice.ai

Privacy-first AI platform. No logs, no training on your data. Powered by open-source models and decentralized infrastructure.

no-logsprivacy-firstimage-generationuncensored-models

CheapTokens.ai

Discounted Venice.ai API credits using time-decay pricing. Up to 75% off by midnight UTC.

time-decay-pricingcrypto-paymentsopenai-compatiblex402-protocol

Hyperbolic

Decentralized GPU marketplace with surprisingly cheap frontier-model inference.

decentralized-gpusbase-tier-pricingopenai-compatible

Akash Network

Permissionless supercloud. Deploy model inference on a decentralized GPU marketplace paid in AKT.

fully-decentralizedakt-paymentsbyo-container

Ollama

The easiest way to run open-weight models on your own machine. One command; zero config.

single-command-installgguf-supportrest-apiopenai-compatible

vLLM

The production-grade open-source inference engine. Powers most serverless providers under the hood.

paged-attentioncontinuous-batchingopenai-compatiblequantization

LM Studio

Desktop app for running local models with a polished UI. The ChatGPT of self-hosting.

guiopenai-compatible-servergguf-catalogmlx-apple-silicon

Cerebras

Wafer-scale chips doing 2,000+ tokens/second. Redefining what "fast" means for open-weight models.

wafer-scaleworld-record-throughputopenai-compatible

Showing 21 of 21