Model Management

Model Performance Matrix

Axes: Quality Score, Latency (ms). Quadrant label: High Quality, Low Latency.

Model               Quality Score   Latency (ms)
GPT-4o              98              120
Claude 3.5 Sonnet   94              180
Llama 3 70B         88              240
Mistral Large       91              310
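
The four data points in the matrix can be compared programmatically. The following is a minimal sketch, not part of the dashboard itself: it assumes each pair is (quality score, latency in ms), and the names `MATRIX`, `pareto_front`, and `fastest_with_quality` are hypothetical helpers introduced only for illustration.

```python
# Data points transcribed from the Model Performance Matrix above.
# Assumption: each pair is (quality score, latency in milliseconds).
MATRIX = {
    "GPT-4o": (98, 120),
    "Claude 3.5 Sonnet": (94, 180),
    "Llama 3 70B": (88, 240),
    "Mistral Large": (91, 310),
}


def pareto_front(matrix):
    """Return models that no other model beats on both quality and latency."""
    front = []
    for name, (quality, latency) in matrix.items():
        dominated = any(
            oq >= quality and ol <= latency and (oq > quality or ol < latency)
            for other, (oq, ol) in matrix.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


def fastest_with_quality(matrix, min_quality):
    """Return the lowest-latency model scoring at least min_quality, or None."""
    eligible = [(lat, name) for name, (q, lat) in matrix.items() if q >= min_quality]
    return min(eligible)[1] if eligible else None
```

With the values listed, GPT-4o has both the highest score and the lowest latency, so it is the only entry on the front. A threshold query such as `fastest_with_quality(MATRIX, 90)` is more useful when the data contains a real trade-off.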

Available Models

Model               Provider             Context       Cost / 1M   Capabilities
GPT-4o              OpenAI               128k Tokens   $5.00       TEXT, VISION, AUDIO
Claude 3.5 Sonnet   Anthropic            200k Tokens   $3.00       TEXT, VISION
Llama 3.1 405B      Meta / Open Source   128k Tokens   $0.80       TEXT, CODE
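
The context and cost figures on the model cards lend themselves to a quick estimate. The sketch below rests on two assumptions that the page does not confirm: that "k" means 1,000 tokens, and that "Cost / 1M" is a single flat USD rate per one million tokens (the cards do not distinguish input from output pricing). `MODELS`, `estimate_cost`, and `cheapest_model` are hypothetical names.

```python
# Values transcribed from the model cards above.
# Assumptions: "128k" = 128,000 tokens; "Cost / 1M" = USD per 1M tokens, flat.
MODELS = {
    "GPT-4o": {
        "context": 128_000, "usd_per_1m": 5.00,
        "tags": {"TEXT", "VISION", "AUDIO"},
    },
    "Claude 3.5 Sonnet": {
        "context": 200_000, "usd_per_1m": 3.00,
        "tags": {"TEXT", "VISION"},
    },
    "Llama 3.1 405B": {
        "context": 128_000, "usd_per_1m": 0.80,
        "tags": {"TEXT", "CODE"},
    },
}


def estimate_cost(model, tokens):
    """Estimated USD cost of processing `tokens` tokens with `model`."""
    return MODELS[model]["usd_per_1m"] * tokens / 1_000_000


def cheapest_model(tokens, required_tags=frozenset()):
    """Cheapest model whose context fits `tokens` and has every required tag."""
    candidates = [
        (spec["usd_per_1m"], name)
        for name, spec in MODELS.items()
        if spec["context"] >= tokens and set(required_tags) <= spec["tags"]
    ]
    return min(candidates)[1] if candidates else None
```

For example, `cheapest_model(150_000)` rules out both 128k models on context size alone, leaving Claude 3.5 Sonnet as the only candidate.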

Custom Model Endpoints

Endpoint Name             Base URL                       Status    Latency
Local-Llama-FineTune      http://192.168.1.45:8000/v1    ONLINE    42ms
Corporate-Mistral-Proxy   https://ai-proxy.internal.co   STANDBY   --
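
The endpoint rows above can be held in a small typed structure for routing decisions. This is a sketch under stated assumptions: the `Endpoint` class, `parse_latency`, and `pick_endpoint` are hypothetical and not part of any API shown on this page, and "--" is treated as "no latency measured".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    status: str
    latency_ms: Optional[int]  # None when the table shows "--"


def parse_latency(text):
    """Turn '42ms' into 42; a placeholder such as '--' becomes None."""
    text = text.strip()
    if text.endswith("ms") and text[:-2].isdigit():
        return int(text[:-2])
    return None


# Rows transcribed from the Custom Model Endpoints table above.
ROWS = [
    ("Local-Llama-FineTune", "http://192.168.1.45:8000/v1", "ONLINE", "42ms"),
    ("Corporate-Mistral-Proxy", "https://ai-proxy.internal.co", "STANDBY", "--"),
]
ENDPOINTS = [Endpoint(n, u, s, parse_latency(lat)) for n, u, s, lat in ROWS]


def pick_endpoint(endpoints):
    """Return the ONLINE endpoint with the lowest measured latency, or None."""
    online = [
        e for e in endpoints
        if e.status == "ONLINE" and e.latency_ms is not None
    ]
    return min(online, key=lambda e: e.latency_ms, default=None)
```

Keeping `latency_ms` as `None` rather than `0` for STANDBY rows avoids an unmeasured endpoint being mistaken for the fastest one.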