Simple, Transparent Pricing

Pay only for what you use. No monthly subscriptions. No hidden fees. Choose from API access, dedicated GPUs, or multi-node clusters.

How It Works

1. Add Credits

Add any amount starting from $10. Your credits never expire.

2. Use AI Models

Access 500+ models for chat, audio, code, images, embeddings & moderation. Credits are deducted based on actual usage.

3. Top Up Anytime

Add more credits whenever you need. No commitments.

Get Started - Add Credits

No credit card required to sign up

Multi-Node GPU Clusters

GPU Cluster Pricing

High-performance multi-node clusters for distributed training, HPC, and large-scale inference

60-75% cheaper than AWS, Azure, or GCP

NVIDIA H100 80GB

Flagship GPU for AI training

Per GPU: $3.23/hour
2-Node Cluster (16 GPUs): $51.65/hour
4-Node Cluster (32 GPUs): $103.30/hour
8-Node Cluster (64 GPUs): $206.59/hour
3.2 Tbps InfiniBand interconnect
80GB HBM3 memory per GPU
Configure H100 Cluster

NVIDIA A100 80GB

Proven performance for AI workloads

Per GPU: $1.69/hour
2-Node Cluster (16 GPUs): $27.04/hour
4-Node Cluster (32 GPUs): $54.08/hour
8-Node Cluster (64 GPUs): $108.16/hour
1.6 Tbps InfiniBand interconnect
80GB HBM2e memory per GPU
Configure A100 Cluster

GPU Type | Per GPU | 2 Nodes (16 GPUs) | 4 Nodes (32 GPUs) | Interconnect
NVIDIA H200 141GB (next-gen flagship) | $4.29/hr | $68.64/hr | $137.28/hr | 3.2 Tbps
NVIDIA B200 (latest Blackwell architecture) | $5.49/hr | $87.84/hr | $175.68/hr | 3.2 Tbps
NVIDIA H100 NVL (optimized for inference) | $3.89/hr | $62.24/hr | $124.48/hr | 3.2 Tbps
NVIDIA L40s (cost-effective option) | $0.99/hr | $15.84/hr | $31.68/hr | 1.6 Tbps

2-8 nodes per cluster • 1-8 GPUs per node • Per-second billing (no minimums)
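
If you want to estimate a run's cost before launching, a cluster's hourly rate is roughly the per-GPU rate times the total GPU count. The Python sketch below is an illustration only: it assumes the 8-GPU-per-node configurations quoted above, and the published cluster prices are the authoritative figures.

```python
# Rough cluster cost estimate from the per-GPU rates listed above.
# Illustrative only: published cluster prices (e.g. the H100 figures) may differ
# slightly from this simple product, and always take precedence.
PER_GPU_HOURLY = {"H100": 3.23, "A100": 1.69, "H200": 4.29, "B200": 5.49, "L40s": 0.99}
GPUS_PER_NODE = 8  # cluster prices above are quoted at 8 GPUs per node (2 nodes = 16 GPUs)

def estimated_cluster_cost(gpu: str, nodes: int, hours: float) -> float:
    """Estimated cost in USD for a multi-node cluster run."""
    return PER_GPU_HOURLY[gpu] * GPUS_PER_NODE * nodes * hours

# A 4-node (32-GPU) A100 cluster for 10 hours:
print(f"${estimated_cluster_cost('A100', nodes=4, hours=10):,.2f}")  # -> $540.80
```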

What's Included:

Pre-configured Templates: PyTorch, Slurm, Axolotl, TensorFlow, Ray
Ultra-Fast Networking: Up to 3.2 Tbps InfiniBand/RoCE v2
SSH Access to All Nodes: Direct access to primary and worker nodes
Flexible Storage: 100GB - 5TB per node
Deploy in 1-2 Minutes: Instant provisioning, no waiting
Environment Pre-configured: NCCL, CUDA, distributed training ready

Dedicated GPU Instances

GPU Instance Pricing

Single GPU instances for development, fine-tuning, and inference

Per-second billing • No setup fees • Instant deployment

GPU Model | VRAM | Per Hour | Per Day | Per Month | Best For
NVIDIA H100 80GB | 80GB | $3.23 | $77.52 | $2,357.90 | Large model training
NVIDIA A100 80GB | 80GB | $1.69 | $40.56 | $1,233.70 | Fine-tuning, training
NVIDIA A100 40GB | 40GB | $1.29 | $30.96 | $941.70 | Medium models
NVIDIA L40s | 48GB | $0.99 | $23.76 | $722.70 | Inference, dev
NVIDIA RTX A6000 | 48GB | $0.79 | $18.96 | $576.70 | Development, testing
NVIDIA RTX 4090 | 24GB | $0.59 | $14.16 | $430.70 | Small models, prototyping
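
The Per Day and Per Month columns follow from the hourly rate, assuming 24 hours per day and a 730-hour month; the quick check below reproduces them. Actual billing remains per-second, so you only pay for exact runtime.

```python
# The Per Day and Per Month columns follow from the hourly rate:
# 24 hours per day and a 730-hour month (365 days x 24 hours / 12 months).
hourly = {"H100 80GB": 3.23, "A100 80GB": 1.69, "A100 40GB": 1.29,
          "L40s": 0.99, "RTX A6000": 0.79, "RTX 4090": 0.59}

for gpu, rate in hourly.items():
    print(f"{gpu}: ${rate * 24:,.2f}/day  ${rate * 730:,.2f}/month")
# H100 80GB: $77.52/day  $2,357.90/month  (matches the table above)
```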

Compare Our Pricing

See how much you can save compared to major cloud providers

Configuration | RunAICloud | AWS | Azure | GCP | Savings
1x H100 80GB (per hour) | $3.23 | $8.14 | $9.45 | $8.92 | 60-65% off
1x A100 80GB (per hour) | $1.69 | $4.95 | $5.61 | $5.23 | 65-70% off
8x H100 Cluster (per hour) | $25.84 | $65.12 | $75.60 | $71.36 | 60-65% off
16x H100 Cluster (2 nodes, per hour) | $51.65 | $130.24 | $151.20 | $142.72 | 60-65% off

Massive Savings on GPU Compute

By optimizing our infrastructure and passing the savings to you, we offer 60-75% lower prices than AWS, Azure, and GCP for equivalent GPU compute.

Example: A 16-GPU H100 cluster that costs $130+/hr on AWS costs only $51.65/hr on RunAICloud. That's over $78/hr in savings, or $56,000+ per month!
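
The arithmetic behind that claim is simple; the sketch below reproduces it using the comparison-table rates and the same 730-hour month assumed in the instance pricing table.

```python
# Savings math for the 16-GPU H100 example, using the comparison table above.
aws_hourly = 130.24
runaicloud_hourly = 51.65
hourly_savings = aws_hourly - runaicloud_hourly    # $78.59/hr
monthly_savings = hourly_savings * 730             # assuming a 730-hour month
print(f"${hourly_savings:.2f}/hr  ->  ${monthly_savings:,.0f}/month")
# $78.59/hr  ->  $57,371/month (i.e. "$56,000+ per month")
```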

AI Model API Pricing

Access 500+ AI models through our unified API. Transparent pricing per million tokens.
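
For illustration, a request against the unified API could look like the sketch below. It assumes an OpenAI-compatible endpoint and the `openai` Python package; the base URL and model name are placeholders, not documented values.

```python
# Illustrative only: base_url and model are placeholders, not documented values.
# Assumes an OpenAI-compatible endpoint and the `openai` Python package.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",   # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",             # placeholder model identifier
    messages=[{"role": "user", "content": "Explain per-second GPU billing in one sentence."}],
)

print(response.choices[0].message.content)
print("Tokens billed:", response.usage.total_tokens)  # credits are deducted per token actually used
```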

Model Category | Examples | Price per 1M Tokens
Small Models (3B-7B) | Llama 8B, Gemma, etc. | $0.035 - $0.05
Medium Models (8B-34B) | DeepSeek, Qwen, etc. | $0.07 - $0.20
Large Models (70B+) | Llama 70B, Mixtral, etc. | $0.14 - $0.20
Premium Models | GPT-4, Claude, Gemini | $0.16 - $3.90
Code Models | Specialized coding models | $0.04 - $0.20
Image Models | Text-to-image generation | View models page
Audio Models | Speech-to-text, TTS, audio processing | $0.01 - $0.50
Embedding Models | Vector embeddings for RAG & search | $0.02 - $0.10
Moderation Models | Content safety & filtering | $0.02 - $0.15

Frequently Asked Questions

GPU Clusters & Instances

How is GPU usage billed?

Both GPU instances and clusters are billed per-second with no minimum charges. You only pay for the exact time your GPUs are running. For example, if you use a $3/hour GPU for 30 minutes, you'll be charged $1.50.
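
In code, the same arithmetic looks like this (the $3/hour rate is the example's round number; actual rates are in the tables above):

```python
# Per-second billing: cost = hourly rate / 3600 * seconds used, with no minimum.
def usage_cost(hourly_rate: float, seconds: float) -> float:
    return hourly_rate / 3600 * seconds

print(f"${usage_cost(3.00, 30 * 60):.2f}")   # 30 minutes at $3/hour -> $1.50
print(f"${usage_cost(3.23, 45):.4f}")        # 45 seconds on an H100  -> ~$0.0404
```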

What's the difference between GPU instances and clusters?

GPU Instances: Single GPU machines perfect for development, fine-tuning, and inference. Deploy in seconds, SSH access included.

GPU Clusters: Multi-node systems (2-8 nodes) with high-speed InfiniBand networking, ideal for distributed training, HPC workloads, and large-scale inference.

Can I scale my cluster up or down?

Currently, cluster configurations are fixed at creation time. However, you can terminate a cluster and create a new one with different specifications at any time. We're working on dynamic scaling capabilities.

What templates are available for clusters?

We offer 5 pre-configured templates: PyTorch Distributed Training, Slurm HPC Cluster, Axolotl LLM Fine-Tuning, TensorFlow Distributed, and Ray Distributed Computing. All templates come with NCCL, CUDA, and necessary environment variables pre-configured.
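
As a minimal illustration of what the pre-configured environment enables (not the template's shipped example), the sketch below assumes the PyTorch template and a torchrun-style launcher that sets the standard RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, and MASTER_PORT variables on every node.

```python
# Minimal connectivity check for a pre-configured cluster (illustrative only).
# Assumes the PyTorch template and a torchrun-style launcher that sets
# RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT on every node.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")    # NCCL over the cluster interconnect
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # All-reduce one tensor across every GPU in the cluster.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)
    if dist.get_rank() == 0:
        print(f"all_reduce sum = {t.item():.0f} across {dist.get_world_size()} GPUs")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```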

What network speeds do clusters offer?

Clusters feature ultra-fast InfiniBand or RoCE v2 interconnects: 1.6 Tbps for A100/L40s clusters and up to 3.2 Tbps for H100/H200/B200 clusters. This ensures minimal communication overhead for distributed training.

How quickly can I deploy a GPU or cluster?

GPU instances deploy instantly (typically under 30 seconds). Clusters deploy in 1-2 minutes. All come with pre-configured environments and SSH access.

Is there a minimum usage time for GPUs?

No minimum! Billing is per-second. Use a GPU for 5 seconds or 5 months - you only pay for actual usage. However, we recommend keeping GPUs running for at least a few minutes to make setup worthwhile.

API & Credits

Do credits expire?

No, your credits never expire. Use them at your own pace for API calls, GPU instances, or clusters.

What's the minimum credit purchase?

The minimum is $10. You can add any amount above that. Credits can be used for AI model APIs, GPU instances, and GPU clusters.

Are there any hidden fees?

Absolutely no hidden fees. You only pay for what you use - API tokens, GPU seconds, or cluster compute time. No setup fees, no bandwidth charges, no surprise costs.

Can I get a refund?

Unused credits can be refunded within 30 days of purchase. Contact support for refund requests. Note that used credits (API calls, GPU time) are non-refundable.

Do you offer volume discounts?

Yes! Contact us for enterprise pricing if you plan to spend $1,000+ per month. We offer custom pricing for high-volume API usage and dedicated GPU commitments.

Can I use the same credits for APIs and GPUs?

Yes! Credits are universal across our platform. Use them for AI model API calls, single GPU instances, or multi-node clusters - whatever your project needs.

Savings & Comparison

How can you offer 60-75% savings vs AWS/Azure/GCP?

We optimize our infrastructure, leverage spot capacity efficiently, and maintain lower overhead. We pass these savings directly to customers rather than pocketing the difference. Our pricing is transparent and competitive.

Are there any compromises with lower pricing?

No compromises! You get the same enterprise-grade NVIDIA GPUs (H100, A100, etc.), ultra-fast networking, and reliable infrastructure. The only difference is the price.

How much can I save on a typical workload?

Example 1: Training a large language model on a 16-GPU H100 cluster for 24 hours:

  • AWS: ~$3,125 (24 hrs × $130.24/hr)
  • RunAICloud: ~$1,240 (24 hrs × $51.65/hr)
  • Savings: ~$1,885 per day