Unified API for 500+ AI Models

One API key. 6 providers. Every model you need. OpenAI-compatible for seamless integration.

500+ AI Models
<100ms Avg Response


Find Your Model

Model Recommendation Quiz

Answer 5 quick questions to find the perfect model for your needs

Question 1 of 5 (20%)

What's your primary use case?

Side-by-Side

Compare Top Models

See how flagship models stack up across performance, capabilities, and pricing

| Model | Provider | Context | Input Price | Output Price | Vision | Functions | JSON Mode | Best For |
|---|---|---|---|---|---|---|---|---|
| GPT-4 Turbo (gpt-4-turbo) | OpenAI | 128K | $10/1M | $30/1M | ✓ | ✓ | ✓ | General purpose, vision tasks |
| Claude 3.5 Sonnet (claude-3-5-sonnet) | Anthropic | 200K | $3/1M | $15/1M | ✓ | ✓ | ✓ | Code, analysis, long context |
| Gemini 1.5 Pro (gemini-1.5-pro) | Google | 2M | $1.25/1M | $5/1M | ✓ | ✓ | ✓ | Massive context, multimodal |
| Llama 3.2 70B (llama-3.2-70b) | Meta | 128K | $0.88/1M | $0.88/1M | ✓ | ✓ | ✓ | Cost-effective, open source |
| Mistral Large 2 (mistral-large-2) | Mistral | 128K | $2/1M | $6/1M | ✗ | ✓ | ✓ | European, multilingual |
| DeepSeek Coder V2 (deepseek-coder) | DeepSeek | 128K | $0.27/1M | $1.10/1M | ✗ | ✓ | ✓ | Code generation, debugging |

✓ = Available, ✗ = Not Available. Prices shown are per 1M tokens.
At a Glance

Model Capabilities Matrix

Quick reference guide to help you choose the right model for your task

| Model | Code Gen | Context | Vision | Functions | JSON | Multilingual | Cost | Speed |
|---|---|---|---|---|---|---|---|---|
| GPT-4 Turbo (OpenAI) | High | 128K | Excellent | Native | Native | Excellent | Medium | Fast |
| Claude 3.5 Sonnet (Anthropic) | Excellent | 200K | Excellent | Native | Native | Excellent | Good | Fast |
| Gemini 1.5 Pro (Google) | High | 2M | Excellent | Native | Native | Excellent | Excellent | Fast |
| Llama 3.2 70B (Meta) | Good | 128K | Good | Native | Native | Good | Excellent | Very Fast |
| Mistral Large 2 (Mistral) | High | 128K | No | Native | Native | Excellent | Good | Very Fast |
| DeepSeek Coder V2 (DeepSeek) | Excellent | 128K | No | Native | Native | Good | Excellent | Very Fast |

Legend

Performance Levels:
- Excellent: Top-tier performance
- High/Good: Strong performance
- Medium: Adequate performance

Categories:
- Code Gen: Code generation quality
- Context: Token context window size
- Vision: Image understanding capability
- Functions: Function calling support
- JSON: JSON mode support
- Cost: Price-to-performance ratio

Start Building in Seconds

OpenAI-compatible API for seamless integration. Switch models with one parameter.

Python (OpenAI SDK)

from openai import OpenAI

# Point the official OpenAI SDK at the RunAiCloud endpoint
client = OpenAI(
  api_key="your_runaicloud_key",
  base_url="https://api.runaicloud.com/v1"
)

# Any model in the catalog can go in the `model` field
response = client.chat.completions.create(
  model="llama-3-1-70b",
  messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Switch between models by changing the model parameter

Find Your Perfect Model

Choose based on your specific use case

Code Generation

Write, review, and debug code

- Llama 3.1 70B: 92/100
- DeepSeek Coder: 90/100
- Mixtral 8x7B: 87/100

Key: Syntax accuracy, multi-language support

Long Documents

Analyze entire codebases or books

- Llama 3.1 405B: 128K tokens
- Llama 3.1 70B: 128K tokens
- Gemini 1.5 Flash: 1M tokens

Key: Full context retention, summarization

High Volume

Maximum throughput, lowest cost

- Llama 3.2 70B: $0.16/1M
- Mixtral 8x7B: $0.07/1M
- Qwen 2.5: $0.04/1M

Key: Low latency, scalable, open source

Vision & Multimodal

Understand images, videos, documents

- Gemini 1.5 Flash: Image + Video
- Gemini 1.5 Pro: Image + Audio
- Llama 3.2 90B Vision: Image + Text

Key: OCR, image analysis, chart understanding

Speed & Latency

Fastest response times

- Llama 3.2 8B: <40ms
- Gemini 1.5 Flash: <80ms
- Mixtral 8x7B: <60ms

Key: Real-time apps, chat interfaces

Function Calling

Integrate with APIs and tools

- Llama 3.1 70B: 90%
- Mistral Large: 89%
- Llama 3.1 405B: 92%

Key: Tool use, API calls, structured output
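Function calling follows the OpenAI-style tools schema. A minimal sketch of a request body, assuming a hypothetical get_weather tool (the tool name and schema are illustrative; the model ID is just an example):

```python
import json

# Hypothetical tool definition in the OpenAI-style `tools` schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real endpoint
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "llama-3-1-70b",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide when to call the tool
}

print(json.dumps(request_body, indent=2))
```

The model responds with a structured tool call (name plus JSON arguments) that your code executes before returning the result in a follow-up message.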

Frequently Asked Questions

Everything you need to know about our AI models

Model Selection

How do I choose the right model?

Consider three factors: (1) use case: code, chat, vision, etc.; (2) budget: Llama for volume, GPT-4 for quality; (3) context length: how much text you need to process. Use our Use Case Guide above for recommendations.

What's the difference between GPT-4 and Claude?

GPT-4 excels at complex reasoning and has the most robust function calling. Claude 3.5 Sonnet is better for code generation and has a 200K context window vs GPT-4's 128K. Claude's input tokens are also roughly 3x cheaper ($3 vs $10 per 1M).

Can I switch models mid-conversation?

Yes! Simply change the 'model' parameter in your API request. Your conversation history works with any model. However, function calling schemas and system prompts may need adjustments between providers.
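A minimal sketch of reusing one conversation history across providers (the model IDs are examples and the build_request helper is ours, not part of the API):

```python
# The same conversation history, sent to two different models.
history = [
    {"role": "user", "content": "Summarize our options."},
    {"role": "assistant", "content": "Here are three options..."},
    {"role": "user", "content": "Expand on option 2."},
]

def build_request(model, messages):
    """Only the `model` field changes between providers."""
    return {"model": model, "messages": messages}

req_a = build_request("claude-3-5-sonnet", history)
req_b = build_request("llama-3.2-70b", history)

# Identical history, different model
assert req_a["messages"] == req_b["messages"]
```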

Pricing

How does context length affect pricing?

Longer context windows cost more because they require more compute. For example, Gemini 1.5 Pro (2M context) is ~$1.25/1M input tokens, while GPT-3.5 Turbo (16K context) is only $0.50/1M. You only pay for the tokens you actually use.
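The per-token billing above can be sketched as a small estimator. The Gemini 1.5 Pro prices are taken from the comparison table ($1.25/1M input, $5/1M output); the token counts in the example call are made up:

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Prices are USD per 1M tokens; you pay only for tokens actually used."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 200K input tokens and 2K output tokens on Gemini 1.5 Pro
cost = estimate_cost(200_000, 2_000, 1.25, 5.00)
print(f"${cost:.4f}")  # $0.2600
```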

What's the cheapest high-quality model?

Llama 3.2 70B offers excellent quality at $0.16/1M input tokens - 60x cheaper than GPT-4 Turbo. For even lower costs, try Mixtral 8x7B ($0.07/1M) or smaller Llama models.

Do you charge for system messages?

Yes, all tokens count - system messages, user messages, assistant messages, and function definitions. Use our token counter in the API docs to estimate costs accurately.

Technical

What's the maximum context length I can use?

Gemini 1.5 Pro supports up to 2 million tokens (~1.5 million words). Claude 3.5 Sonnet supports 200K tokens. Most other models support 32K-128K tokens.

Do all models support function calling?

No. GPT-4, Claude 3.5, Gemini 1.5, Mistral Large, and recent Llama models support native function calling. Check the model card for 'Function Calling' capability.

Can I use vision models with PDFs?

Yes! Models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro can process PDFs, images, and screenshots. Encode images as base64 or provide URLs in the messages array.
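A sketch of the base64 approach using an OpenAI-style multimodal message (the image bytes and question are placeholders):

```python
import base64

# Normally: image_bytes = open("chart.png", "rb").read()
image_bytes = b"\x89PNG placeholder bytes"
b64 = base64.b64encode(image_bytes).decode("ascii")

# OpenAI-style multimodal content: text part plus an image part
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ],
}
```

A plain `https://` URL works in place of the data URL when the image is publicly hosted.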

Integration

Is your API compatible with OpenAI SDKs?

Yes! Our API is 100% OpenAI-compatible. Use the official OpenAI Python/JavaScript libraries - just change the base_url and api_key, and the rest of your existing code works without modification.

Do you support streaming responses?

Yes, all models support streaming. Set stream=true in your request to receive tokens in real-time as they're generated, perfect for chat interfaces.
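A sketch of a streaming call, assuming an OpenAI SDK client configured as in the quickstart above (the model ID is an example; `stream=True` is the only change versus a normal call):

```python
request = {
    "model": "llama-3-1-70b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": True,  # tokens arrive incrementally instead of one final response
}

def print_stream(client):
    """Iterate over chunks as they arrive (needs a configured OpenAI client)."""
    for chunk in client.chat.completions.create(**request):
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```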

Can I fine-tune models?

We support fine-tuning for GPT-3.5 Turbo and open-source models like Llama and Mistral. Contact us for enterprise fine-tuning on dedicated infrastructure.

Need to Fine-Tune or Train Your Own Models?

Access dedicated GPUs (H100, A100) and multi-node clusters for custom model training and fine-tuning. Deploy in minutes with pre-configured templates.

- GPU Instances: from $0.59/hr, single GPUs for fine-tuning
- GPU Clusters: from $27/hr (16 GPUs), multi-node distributed training
- Pre-configured: templates included, PyTorch, Axolotl, Slurm ready