Simple, Transparent Pricing

Pay only for what you use. No monthly subscriptions. No hidden fees. Choose from API access, dedicated GPUs, or multi-node clusters.

How It Works

1. Add Credits

Add any amount starting from $10. Your credits never expire.

2. Use AI Models

Access 500+ models for chat, audio, code, images, embeddings & moderation. Credits are deducted based on actual usage.

3. Top Up Anytime

Add more credits whenever you need. No commitments.

Get Started - Add Credits

No credit card required to sign up

Multi-Node GPU Clusters

GPU Cluster Pricing

High-performance multi-node clusters for distributed training, HPC, and large-scale inference

60-75% cheaper than AWS, Azure, or GCP

NVIDIA H100 80GB

Flagship GPU for AI training

Per GPU: $3.23/hour
2-Node Cluster (16 GPUs): $51.65/hour
4-Node Cluster (32 GPUs): $103.30/hour
8-Node Cluster (64 GPUs): $206.59/hour
3.2 Tbps InfiniBand interconnect
80GB HBM3 memory per GPU
Configure H100 Cluster

NVIDIA A100 80GB

Proven performance for AI workloads

Per GPU: $1.69/hour
2-Node Cluster (16 GPUs): $27.04/hour
4-Node Cluster (32 GPUs): $54.08/hour
8-Node Cluster (64 GPUs): $108.16/hour
1.6 Tbps InfiniBand interconnect
80GB HBM2e memory per GPU
Configure A100 Cluster

GPU Type | Per GPU | 2 Nodes (16 GPUs) | 4 Nodes (32 GPUs) | Interconnect
NVIDIA H200 141GB (next-gen flagship) | $4.29/hr | $68.64/hr | $137.28/hr | 3.2 Tbps
NVIDIA B200 (latest Blackwell architecture) | $5.49/hr | $87.84/hr | $175.68/hr | 3.2 Tbps
NVIDIA H100 NVL (optimized for inference) | $3.89/hr | $62.24/hr | $124.48/hr | 3.2 Tbps
NVIDIA L40s (cost-effective option) | $0.99/hr | $15.84/hr | $31.68/hr | 1.6 Tbps

2-8 nodes per cluster • 1-8 GPUs per node • Per-second billing (no minimums)
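
If you want to estimate a run's cost before launching, a cluster's hourly rate is roughly the per-GPU rate times the total GPU count. The Python sketch below is an illustration only: it assumes the 8-GPU-per-node configurations quoted above, and the published cluster prices are the authoritative figures.

```python
# Rough cluster cost estimate from the per-GPU rates listed above.
# Illustrative only: published cluster prices (e.g. the H100 figures) may differ
# slightly from this simple product, and always take precedence.
PER_GPU_HOURLY = {"H100": 3.23, "A100": 1.69, "H200": 4.29, "B200": 5.49, "L40s": 0.99}
GPUS_PER_NODE = 8  # cluster prices above are quoted at 8 GPUs per node (2 nodes = 16 GPUs)

def estimated_cluster_cost(gpu: str, nodes: int, hours: float) -> float:
    """Estimated cost in USD for a multi-node cluster run."""
    return PER_GPU_HOURLY[gpu] * GPUS_PER_NODE * nodes * hours

# A 4-node (32-GPU) A100 cluster for 10 hours:
print(f"${estimated_cluster_cost('A100', nodes=4, hours=10):,.2f}")  # -> $540.80
```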

What's Included:

Pre-configured Templates: PyTorch, Slurm, Axolotl, TensorFlow, Ray
Ultra-Fast Networking: Up to 3.2 Tbps InfiniBand/RoCE v2
SSH Access to All Nodes: Direct access to primary and worker nodes
Flexible Storage: 100GB - 5TB per node
Deploy in 1-2 Minutes: Instant provisioning, no waiting
Environment Pre-configured: NCCL, CUDA, distributed training ready

Dedicated GPU Instances

GPU Instance Pricing

Single GPU instances for development, fine-tuning, and inference

Per-second billing • No setup fees • Instant deployment

GPU Model | VRAM | Per Hour | Per Day | Per Month | Best For
NVIDIA H100 80GB | 80GB | $3.23 | $77.52 | $2,357.90 | Large model training
NVIDIA A100 80GB | 80GB | $1.69 | $40.56 | $1,233.70 | Fine-tuning, training
NVIDIA A100 40GB | 40GB | $1.29 | $30.96 | $941.70 | Medium models
NVIDIA L40s | 48GB | $0.99 | $23.76 | $722.70 | Inference, dev
NVIDIA RTX A6000 | 48GB | $0.79 | $18.96 | $576.70 | Development, testing
NVIDIA RTX 4090 | 24GB | $0.59 | $14.16 | $430.70 | Small models, prototyping
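
The Per Day and Per Month columns follow from the hourly rate, assuming 24 hours per day and a 730-hour month; the quick check below reproduces them. Actual billing remains per-second, so you only pay for exact runtime.

```python
# The Per Day and Per Month columns follow from the hourly rate:
# 24 hours per day and a 730-hour month (365 days x 24 hours / 12 months).
hourly = {"H100 80GB": 3.23, "A100 80GB": 1.69, "A100 40GB": 1.29,
          "L40s": 0.99, "RTX A6000": 0.79, "RTX 4090": 0.59}

for gpu, rate in hourly.items():
    print(f"{gpu}: ${rate * 24:,.2f}/day  ${rate * 730:,.2f}/month")
# H100 80GB: $77.52/day  $2,357.90/month  (matches the table above)
```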

Compare Our Pricing

See how much you can save compared to major cloud providers

Configuration | RunAICloud | AWS | Azure | GCP | Savings
1x H100 80GB (per hour) | $3.23 | $8.14 | $9.45 | $8.92 | 60-65% off
1x A100 80GB (per hour) | $1.69 | $4.95 | $5.61 | $5.23 | 65-70% off
8x H100 Cluster (per hour) | $25.84 | $65.12 | $75.60 | $71.36 | 60-65% off
16x H100 Cluster (2 nodes, per hour) | $51.65 | $130.24 | $151.20 | $142.72 | 60-65% off

Massive Savings on GPU Compute

By optimizing our infrastructure and passing the savings to you, we offer 60-75% lower prices than AWS, Azure, and GCP for equivalent GPU compute.

Example: A 16-GPU H100 cluster that costs $130+/hr on AWS costs only $51.65/hr on RunAICloud. That's over $78/hr in savings, or $56,000+ per month!
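
The arithmetic behind that claim is simple; the sketch below reproduces it using the comparison-table rates and the same 730-hour month assumed in the instance pricing table.

```python
# Savings math for the 16-GPU H100 example, using the comparison table above.
aws_hourly = 130.24
runaicloud_hourly = 51.65
hourly_savings = aws_hourly - runaicloud_hourly    # $78.59/hr
monthly_savings = hourly_savings * 730             # assuming a 730-hour month
print(f"${hourly_savings:.2f}/hr  ->  ${monthly_savings:,.0f}/month")
# $78.59/hr  ->  $57,371/month (i.e. "$56,000+ per month")
```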

AI Model API Pricing

Access 500+ AI models through our unified API. Transparent pricing per million tokens.
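
For illustration, a request against the unified API could look like the sketch below. It assumes an OpenAI-compatible endpoint and the `openai` Python package; the base URL and model name are placeholders, not documented values.

```python
# Illustrative only: base_url and model are placeholders, not documented values.
# Assumes an OpenAI-compatible endpoint and the `openai` Python package.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",   # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",             # placeholder model identifier
    messages=[{"role": "user", "content": "Explain per-second GPU billing in one sentence."}],
)

print(response.choices[0].message.content)
print("Tokens billed:", response.usage.total_tokens)  # credits are deducted per token actually used
```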

Model Category | Examples | Price per 1M Tokens
Small Models (3B-7B) | Llama 8B, Gemma, etc. | $0.035 - $0.05
Medium Models (8B-34B) | DeepSeek, Qwen, etc. | $0.07 - $0.20
Large Models (70B+) | Llama 70B, Mixtral, etc. | $0.14 - $0.20
Premium Models | GPT-4, Claude, Gemini | $0.16 - $3.90
Code Models | Specialized coding models | $0.04 - $0.20
Image Models | Text-to-image generation | View models page
Audio Models | Speech-to-text, TTS, audio processing | $0.01 - $0.50
Embedding Models | Vector embeddings for RAG & search | $0.02 - $0.10
Moderation Models | Content safety & filtering | $0.02 - $0.15

Frequently Asked Questions

GPU Clusters & Instances

How is GPU usage billed?

Both GPU instances and clusters are billed per-second with no minimum charges. You only pay for the exact time your GPUs are running. For example, if you use a $3/hour GPU for 30 minutes, you'll be charged $1.50.
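
In code, the same arithmetic looks like this (the $3/hour rate is the example's round number; actual rates are in the tables above):

```python
# Per-second billing: cost = hourly rate / 3600 * seconds used, with no minimum.
def usage_cost(hourly_rate: float, seconds: float) -> float:
    return hourly_rate / 3600 * seconds

print(f"${usage_cost(3.00, 30 * 60):.2f}")   # 30 minutes at $3/hour -> $1.50
print(f"${usage_cost(3.23, 45):.4f}")        # 45 seconds on an H100  -> ~$0.0404
```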

What's the difference between GPU instances and clusters?

GPU Instances: Single GPU machines perfect for development, fine-tuning, and inference. Deploy in seconds, SSH access included.

GPU Clusters: Multi-node systems (2-8 nodes) with high-speed InfiniBand networking, ideal for distributed training, HPC workloads, and large-scale inference.

Can I scale my cluster up or down?

Currently, cluster configurations are fixed at creation time. However, you can terminate a cluster and create a new one with different specifications at any time. We're working on dynamic scaling capabilities.

What templates are available for clusters?

We offer 5 pre-configured templates: PyTorch Distributed Training, Slurm HPC Cluster, Axolotl LLM Fine-Tuning, TensorFlow Distributed, and Ray Distributed Computing. All templates come with NCCL, CUDA, and necessary environment variables pre-configured.
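
As a minimal illustration of what the pre-configured environment enables (not the template's shipped example), the sketch below assumes the PyTorch template and a torchrun-style launcher that sets the standard RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, and MASTER_PORT variables on every node.

```python
# Minimal connectivity check for a pre-configured cluster (illustrative only).
# Assumes the PyTorch template and a torchrun-style launcher that sets
# RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT on every node.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")    # NCCL over the cluster interconnect
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # All-reduce one tensor across every GPU in the cluster.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)
    if dist.get_rank() == 0:
        print(f"all_reduce sum = {t.item():.0f} across {dist.get_world_size()} GPUs")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```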

What network speeds do clusters offer?

Clusters feature ultra-fast InfiniBand or RoCE v2 interconnects: 1.6 Tbps for A100/L40s clusters and up to 3.2 Tbps for H100/H200/B200 clusters. This ensures minimal communication overhead for distributed training.

How quickly can I deploy a GPU or cluster?

GPU instances deploy instantly (typically under 30 seconds). Clusters deploy in 1-2 minutes. All come with pre-configured environments and SSH access.

Is there a minimum usage time for GPUs?

No minimum! Billing is per-second. Use a GPU for 5 seconds or 5 months - you only pay for actual usage. However, we recommend keeping GPUs running for at least a few minutes to make setup worthwhile.

API & Credits

Do credits expire?

No, your credits never expire. Use them at your own pace for API calls, GPU instances, or clusters.

What's the minimum credit purchase?

The minimum is $10. You can add any amount above that. Credits can be used for AI model APIs, GPU instances, and GPU clusters.

Are there any hidden fees?

Absolutely no hidden fees. You only pay for what you use - API tokens, GPU seconds, or cluster compute time. No setup fees, no bandwidth charges, no surprise costs.

Can I get a refund?

Unused credits can be refunded within 30 days of purchase. Contact support for refund requests. Note that used credits (API calls, GPU time) are non-refundable.

Do you offer volume discounts?

Yes! Contact us for enterprise pricing if you plan to spend $1,000+ per month. We offer custom pricing for high-volume API usage and dedicated GPU commitments.

Can I use the same credits for APIs and GPUs?

Yes! Credits are universal across our platform. Use them for AI model API calls, single GPU instances, or multi-node clusters - whatever your project needs.

Savings & Comparison

How can you offer 60-75% savings vs AWS/Azure/GCP?

We optimize our infrastructure, leverage spot capacity efficiently, and maintain lower overhead. We pass these savings directly to customers rather than pocketing the difference. Our pricing is transparent and competitive.

Are there any compromises with lower pricing?

No compromises! You get the same enterprise-grade NVIDIA GPUs (H100, A100, etc.), ultra-fast networking, and reliable infrastructure. The only difference is the price.

How much can I save on a typical workload?

Example 1: Training a large language model on a 16-GPU H100 cluster for 24 hours:

  • AWS: ~$3,125 (24 hrs × $130.24/hr)
  • RunAICloud: ~$1,240 (24 hrs × $51.65/hr)
  • Savings: ~$1,885 per day