Unified API for 500+ AI Models
One API key. 6 providers. Every model you need. OpenAI-compatible for seamless integration.
Model Recommendation Quiz
Answer 5 quick questions to find the perfect model for your needs
What's your primary use case?
Compare Top Models
See how flagship models stack up across performance, capabilities, and pricing
Model | Provider | Context | Input Price | Output Price | Best For
---|---|---|---|---|---
GPT-4 Turbo (gpt-4-turbo) | OpenAI | 128K | $10/1M | $30/1M | General purpose, vision tasks
Claude 3.5 Sonnet (claude-3-5-sonnet) | Anthropic | 200K | $3/1M | $15/1M | Code, analysis, long context
Gemini 1.5 Pro (gemini-1.5-pro) | Google | 2M | $1.25/1M | $5/1M | Massive context, multimodal
Llama 3.2 70B (llama-3.2-70b) | Meta | 128K | $0.88/1M | $0.88/1M | Cost-effective, open source
Mistral Large 2 (mistral-large-2) | Mistral | 128K | $2/1M | $6/1M | European, multilingual
DeepSeek Coder V2 (deepseek-coder) | DeepSeek | 128K | $0.27/1M | $1.10/1M | Code generation, debugging
Model Capabilities Matrix
Quick reference guide to help you choose the right model for your task
Legend
Start Building in Seconds
OpenAI-compatible API for seamless integration. Switch models with one parameter.
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="your_runaicloud_key",
    base_url="https://api.runaicloud.com/v1"
)

response = client.chat.completions.create(
    model="llama-3.2-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
Switch between any model by changing the model parameter.
Find Your Perfect Model
Choose based on your specific use case
Code Generation
Write, review, and debug code
Long Documents
Analyze entire codebases or books
High Volume
Maximum throughput, lowest cost
Vision & Multimodal
Understand images, videos, documents
Speed & Latency
Fastest response times
Function Calling
Integrate with APIs and tools
Frequently Asked Questions
Everything you need to know about our AI models
Model Selection
How do I choose the right model?
Consider three factors: (1) use case: code, chat, vision, etc.; (2) budget: Llama for volume, GPT-4 for quality; (3) context length: how much text you need to process. Use our Use Case Guide above for recommendations.
What's the difference between GPT-4 and Claude?
GPT-4 excels at complex reasoning and has the most robust function calling. Claude 3.5 Sonnet is stronger at code generation and offers a 200K context window versus GPT-4's 128K. Claude is also roughly 3x cheaper per token ($3 vs $10 per 1M input).
Can I switch models mid-conversation?
Yes! Simply change the 'model' parameter in your API request. Your conversation history works with any model. However, function calling schemas and system prompts may need adjustments between providers.
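As a minimal sketch of a mid-conversation switch (the API key and model names below are placeholders taken from the comparison table; the build_request helper is illustrative, not part of any SDK):

```python
# Sketch: reuse one conversation history across two models.
# The helper just packages shared history into request kwargs.

def build_request(model, history):
    """Return chat-completion kwargs for any model with a shared history."""
    return {"model": model, "messages": list(history)}

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize quicksort in one sentence."},
]

# Only the model name changes; the history is identical.
req_a = build_request("gpt-4-turbo", history)
req_b = build_request("claude-3-5-sonnet", history)

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key="your_runaicloud_key",
                    base_url="https://api.runaicloud.com/v1")
    for req in (req_a, req_b):
        reply = client.chat.completions.create(**req)
        # Append the assistant turn so the next model sees it too.
        history.append({"role": "assistant",
                        "content": reply.choices[0].message.content})
```

Appending each assistant reply back into the shared history is what lets the second model pick up where the first left off.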
Pricing
How does context length affect pricing?
Longer context windows cost more because they require more compute. For example, Gemini 1.5 Pro (2M context) costs ~$1.25/1M input tokens, while GPT-3.5 Turbo (16K context) costs only $0.50/1M. You only pay for the tokens you actually use.
What's the cheapest high-quality model?
Llama 3.2 70B offers excellent quality at $0.88/1M input tokens, over 11x cheaper than GPT-4 Turbo's $10/1M. For even lower costs, try Mixtral 8x7B ($0.07/1M) or smaller Llama models.
Do you charge for system messages?
Yes, all tokens count - system messages, user messages, assistant messages, and function definitions. Use our token counter in the API docs to estimate costs accurately.
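Since every token is billed, estimating a request's cost is simple arithmetic. A quick sketch using the per-1M-token rates from the comparison table above (the token counts are example values):

```python
def estimate_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Estimated request cost in dollars, given per-1M-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Claude 3.5 Sonnet ($3/1M input, $15/1M output):
# a 10K-token prompt with a 1K-token reply.
cost = estimate_cost(10_000, 1_000, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0450
```

Note that input and output rates usually differ, so long replies can dominate the bill even for short prompts.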
Technical
What's the maximum context length I can use?
Gemini 1.5 Pro supports up to 2 million tokens (~1.5 million words or ~7,000 pages). Claude 3.5 Sonnet supports 200K tokens. Most other models support 32K-128K tokens.
Do all models support function calling?
No. GPT-4, Claude 3.5, Gemini 1.5, Mistral Large, and recent Llama models support native function calling. Check the model card for 'Function Calling' capability.
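For models that support it, tools are declared in the OpenAI function-calling format. A minimal sketch (the get_weather tool is a hypothetical example, not a built-in):

```python
# OpenAI-style tool definition: a JSON Schema describing the
# hypothetical get_weather function's parameters.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

if __name__ == "__main__":
    from openai import OpenAI
    client = OpenAI(api_key="your_runaicloud_key",
                    base_url="https://api.runaicloud.com/v1")
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": "Weather in Paris?"}],
        tools=tools,
    )
    # The model returns structured tool_calls instead of plain text.
    print(resp.choices[0].message.tool_calls)
```

The model never executes the function itself; it returns the name and arguments, and your code runs the call and feeds the result back.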
Can I use vision models with PDFs?
Yes! Models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro can process PDFs, images, and screenshots. Encode images as base64 or provide URLs in the messages array.
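A minimal sketch of the base64 path, using only the standard library to build the message (the fake image bytes and prompt are placeholders):

```python
import base64

def image_message(prompt, image_bytes, mime="image/png"):
    """Build a vision message carrying a base64 data-URI image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# In practice image_bytes would come from open("photo.png", "rb").read().
msg = image_message("What is in this image?", b"\x89PNG fake bytes")
```

Pass the resulting dict in the messages array; for hosted files you can put a plain https URL in the image_url field instead of a data URI.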
Integration
Is your API compatible with OpenAI SDKs?
Yes! Our API is 100% OpenAI-compatible. Use the official OpenAI Python/JavaScript libraries and just change the base_url and api_key; the rest of your existing code works without modification.
Do you support streaming responses?
Yes, all models support streaming. Set stream=true in your request to receive tokens in real-time as they're generated, perfect for chat interfaces.
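A sketch of the streaming loop with the OpenAI SDK: each chunk carries a delta with a fragment of text, and concatenating the deltas reconstructs the full reply (key and model name are placeholders):

```python
def collect_stream(chunks):
    """Join the delta content from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # role-only or empty deltas carry no text
            parts.append(delta.content)
    return "".join(parts)

if __name__ == "__main__":
    from openai import OpenAI
    client = OpenAI(api_key="your_runaicloud_key",
                    base_url="https://api.runaicloud.com/v1")
    stream = client.chat.completions.create(
        model="llama-3.2-70b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    print(collect_stream(stream))
```

For a chat UI you would print each delta as it arrives rather than joining at the end; the accumulation pattern is the same.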
Can I fine-tune models?
We support fine-tuning for GPT-3.5 Turbo and open-source models like Llama and Mistral. Contact us for enterprise fine-tuning on dedicated infrastructure.
Need to Fine-Tune or Train Your Own Models?
Access dedicated GPUs (H100, A100) and multi-node clusters for custom model training and fine-tuning. Deploy in minutes with pre-configured templates.