Unified API for 500+ AI Models

One API key. 6 providers. Every model you need. OpenAI-compatible for seamless integration.

500+ AI Models
<100ms Avg Response


Find Your Model

Model Recommendation Quiz

Answer 5 quick questions to find the perfect model for your needs

Question 1 of 5 (20%)

What's your primary use case?

Side-by-Side

Compare Top Models

See how flagship models stack up across performance, capabilities, and pricing

| Model | Provider | Context | Input Price | Output Price | Vision | Functions | JSON Mode | Best For |
|---|---|---|---|---|---|---|---|---|
| GPT-4 Turbo (gpt-4-turbo) | OpenAI | 128K | $10/1M | $30/1M | ✓ | ✓ | ✓ | General purpose, vision tasks |
| Claude 3.5 Sonnet (claude-3-5-sonnet) | Anthropic | 200K | $3/1M | $15/1M | ✓ | ✓ | ✓ | Code, analysis, long context |
| Gemini 1.5 Pro (gemini-1.5-pro) | Google | 2M | $1.25/1M | $5/1M | ✓ | ✓ | ✓ | Massive context, multimodal |
| Llama 3.2 70B (llama-3.2-70b) | Meta | 128K | $0.88/1M | $0.88/1M | ✓ | ✓ | ✓ | Cost-effective, open source |
| Mistral Large 2 (mistral-large-2) | Mistral | 128K | $2/1M | $6/1M | ✗ | ✓ | ✓ | European, multilingual |
| DeepSeek Coder V2 (deepseek-coder) | DeepSeek | 128K | $0.27/1M | $1.10/1M | ✗ | ✓ | ✓ | Code generation, debugging |

✓ = Available, ✗ = Not Available. Prices shown are per 1M tokens.
At a Glance

Model Capabilities Matrix

Quick reference guide to help you choose the right model for your task

| Model | Code Gen | Context | Vision | Functions | JSON | Multilingual | Cost | Speed |
|---|---|---|---|---|---|---|---|---|
| GPT-4 Turbo (OpenAI) | High | 128K | Excellent | Native | Native | Excellent | Medium | Fast |
| Claude 3.5 Sonnet (Anthropic) | Excellent | 200K | Excellent | Native | Native | Excellent | Good | Fast |
| Gemini 1.5 Pro (Google) | High | 2M | Excellent | Native | Native | Excellent | Excellent | Fast |
| Llama 3.2 70B (Meta) | Good | 128K | Good | Native | Native | Good | Excellent | Very Fast |
| Mistral Large 2 (Mistral) | High | 128K | No | Native | Native | Excellent | Good | Very Fast |
| DeepSeek Coder V2 (DeepSeek) | Excellent | 128K | No | Native | Native | Good | Excellent | Very Fast |

Legend

Performance Levels:
- Excellent: Top-tier performance
- High/Good: Strong performance
- Medium: Adequate performance

Categories:
- Code Gen: Code generation quality
- Context: Token context window size
- Vision: Image understanding capability
- Functions: Function calling support
- JSON: JSON mode support
- Cost: Price-to-performance ratio

Start Building in Seconds

OpenAI-compatible API for seamless integration. Switch models with one parameter.

Python (OpenAI SDK)

from openai import OpenAI

# Point the official OpenAI SDK at the RunAiCloud endpoint
client = OpenAI(
  api_key="your_runaicloud_key",
  base_url="https://api.runaicloud.com/v1"
)

# Any model in the catalog can go in the `model` field
response = client.chat.completions.create(
  model="llama-3-1-70b",
  messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Switch between models by changing the model parameter

Find Your Perfect Model

Choose based on your specific use case

Code Generation

Write, review, and debug code

- Llama 3.1 70B: 92/100
- DeepSeek Coder: 90/100
- Mixtral 8x7B: 87/100

Key: Syntax accuracy, multi-language support

Long Documents

Analyze entire codebases or books

- Llama 3.1 405B: 128K tokens
- Llama 3.1 70B: 128K tokens
- Gemini 1.5 Flash: 1M tokens

Key: Full context retention, summarization

High Volume

Maximum throughput, lowest cost

- Llama 3.2 70B: $0.16/1M
- Mixtral 8x7B: $0.07/1M
- Qwen 2.5: $0.04/1M

Key: Low latency, scalable, open source

Vision & Multimodal

Understand images, videos, documents

- Gemini 1.5 Flash: Image + Video
- Gemini 1.5 Pro: Image + Audio
- Llama 3.2 90B Vision: Image + Text

Key: OCR, image analysis, chart understanding

Speed & Latency

Fastest response times

- Llama 3.2 8B: <40ms
- Gemini 1.5 Flash: <80ms
- Mixtral 8x7B: <60ms

Key: Real-time apps, chat interfaces

Function Calling

Integrate with APIs and tools

- Llama 3.1 70B: 90%
- Mistral Large: 89%
- Llama 3.1 405B: 92%

Key: Tool use, API calls, structured output
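Function calling follows the OpenAI-style tools schema. A minimal sketch of a request body, assuming a hypothetical get_weather tool (the tool name and schema are illustrative; the model ID is just an example):

```python
import json

# Hypothetical tool definition in the OpenAI-style `tools` schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real endpoint
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "llama-3-1-70b",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide when to call the tool
}

print(json.dumps(request_body, indent=2))
```

The model responds with a structured tool call (name plus JSON arguments) that your code executes before returning the result in a follow-up message.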

Frequently Asked Questions

Everything you need to know about our AI models

Model Selection

How do I choose the right model?

Consider three factors: (1) use case: code, chat, vision, etc.; (2) budget: Llama for volume, GPT-4 for quality; (3) context length: how much text you need to process. Use our Use Case Guide above for recommendations.

What's the difference between GPT-4 and Claude?

GPT-4 excels at complex reasoning and has the most robust function calling. Claude 3.5 Sonnet is better for code generation and has a 200K context window vs GPT-4's 128K. Claude's input tokens are also roughly 3x cheaper ($3 vs $10 per 1M).

Can I switch models mid-conversation?

Yes! Simply change the 'model' parameter in your API request. Your conversation history works with any model. However, function calling schemas and system prompts may need adjustments between providers.
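A minimal sketch of reusing one conversation history across providers (the model IDs are examples and the build_request helper is ours, not part of the API):

```python
# The same conversation history, sent to two different models.
history = [
    {"role": "user", "content": "Summarize our options."},
    {"role": "assistant", "content": "Here are three options..."},
    {"role": "user", "content": "Expand on option 2."},
]

def build_request(model, messages):
    """Only the `model` field changes between providers."""
    return {"model": model, "messages": messages}

req_a = build_request("claude-3-5-sonnet", history)
req_b = build_request("llama-3.2-70b", history)

# Identical history, different model
assert req_a["messages"] == req_b["messages"]
```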

Pricing

How does context length affect pricing?

Longer context windows cost more because they require more compute. For example, Gemini 1.5 Pro (2M context) is ~$1.25/1M input tokens, while GPT-3.5 Turbo (16K context) is only $0.50/1M. You only pay for the tokens you actually use.
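The per-token billing above can be sketched as a small estimator. The Gemini 1.5 Pro prices are taken from the comparison table ($1.25/1M input, $5/1M output); the token counts in the example call are made up:

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Prices are USD per 1M tokens; you pay only for tokens actually used."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 200K input tokens and 2K output tokens on Gemini 1.5 Pro
cost = estimate_cost(200_000, 2_000, 1.25, 5.00)
print(f"${cost:.4f}")  # $0.2600
```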

What's the cheapest high-quality model?

Llama 3.2 70B offers excellent quality at $0.16/1M input tokens - 60x cheaper than GPT-4 Turbo. For even lower costs, try Mixtral 8x7B ($0.07/1M) or smaller Llama models.

Do you charge for system messages?

Yes, all tokens count - system messages, user messages, assistant messages, and function definitions. Use our token counter in the API docs to estimate costs accurately.

Technical

What's the maximum context length I can use?

Gemini 1.5 Pro supports up to 2 million tokens (~1.5 million words). Claude 3.5 Sonnet supports 200K tokens. Most other models support 32K-128K tokens.

Do all models support function calling?

No. GPT-4, Claude 3.5, Gemini 1.5, Mistral Large, and recent Llama models support native function calling. Check the model card for 'Function Calling' capability.

Can I use vision models with PDFs?

Yes! Models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro can process PDFs, images, and screenshots. Encode images as base64 or provide URLs in the messages array.
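A sketch of the base64 approach using an OpenAI-style multimodal message (the image bytes and question are placeholders):

```python
import base64

# Normally: image_bytes = open("chart.png", "rb").read()
image_bytes = b"\x89PNG placeholder bytes"
b64 = base64.b64encode(image_bytes).decode("ascii")

# OpenAI-style multimodal content: text part plus an image part
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ],
}
```

A plain `https://` URL works in place of the data URL when the image is publicly hosted.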

Integration

Is your API compatible with OpenAI SDKs?

Yes! Our API is 100% OpenAI-compatible. Use the official OpenAI Python/JavaScript libraries - just change the base_url and api_key, and the rest of your existing code works without modification.

Do you support streaming responses?

Yes, all models support streaming. Set stream=true in your request to receive tokens in real-time as they're generated, perfect for chat interfaces.
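A sketch of a streaming call, assuming an OpenAI SDK client configured as in the quickstart above (the model ID is an example; `stream=True` is the only change versus a normal call):

```python
request = {
    "model": "llama-3-1-70b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": True,  # tokens arrive incrementally instead of one final response
}

def print_stream(client):
    """Iterate over chunks as they arrive (needs a configured OpenAI client)."""
    for chunk in client.chat.completions.create(**request):
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```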

Can I fine-tune models?

We support fine-tuning for GPT-3.5 Turbo and open-source models like Llama and Mistral. Contact us for enterprise fine-tuning on dedicated infrastructure.

Need to Fine-Tune or Train Your Own Models?

Access dedicated GPUs (H100, A100) and multi-node clusters for custom model training and fine-tuning. Deploy in minutes with pre-configured templates.

- GPU Instances: from $0.59/hr, single GPUs for fine-tuning
- GPU Clusters: from $27/hr (16 GPUs), multi-node distributed training
- Pre-configured: templates included, PyTorch, Axolotl, Slurm ready