GPU Servers. Accelerated compute.

Dedicated NVIDIA A100 and H100 GPU servers for AI training, inference, rendering, and high-performance computing. Single-tenant hardware with NVLink interconnect, high-bandwidth NVMe storage, and 10 Gbps networking.

A100
80 GB HBM2e
H100
80 GB HBM3
NVLink
GPU interconnect
PCIe 5
I/O bus
Hardware

NVIDIA A100 — Data center GPU

The A100 is NVIDIA's data center GPU based on the Ampere architecture. Each GPU has 80 GB of HBM2e memory with 2 TB/s bandwidth, third-generation Tensor Cores, and support for TF32, FP16, BF16, INT8, and sparsity acceleration.

We offer A100 servers in both PCIe and SXM4 form factors. SXM4 configurations connect up to 8 GPUs via third-generation NVLink, with 600 GB/s of bidirectional bandwidth per GPU. That interconnect is critical for distributed training, where GPUs exchange gradients on every step.
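As a concrete illustration, here is a minimal single-node data-parallel training sketch in PyTorch. The model, data, and hyperparameters are placeholders, not a real workload; NCCL routes the gradient all-reduce over NVLink automatically.

```python
# Minimal single-node DDP sketch. Launch with:
#   torchrun --standalone --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")        # NCCL carries gradient all-reduce over NVLink
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                    # placeholder training loop
        x = torch.randn(32, 4096, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                    # gradients synchronized across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```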

A100 servers are a good fit for training medium-scale models, running inference at scale, and general HPC workloads like molecular dynamics, CFD, and financial Monte Carlo simulations.

NVIDIA H100 — Hopper architecture

The H100 is NVIDIA's latest data center GPU, built on the Hopper architecture. It pairs 80 GB of HBM3 memory with 3.35 TB/s of bandwidth, fourth-generation Tensor Cores with FP8 support, and the Transformer Engine, which combines the FP8 hardware with software that manages precision dynamically for transformer layers.

H100 SXM5 configurations connect up to 8 GPUs via fourth-generation NVLink with 900 GB/s of bidirectional bandwidth per GPU. The NVSwitch fabric provides all-to-all GPU communication without routing traffic through the CPU.

H100 servers deliver roughly 3× the training performance of A100 on transformer workloads and are the best choice for large language model training, fine-tuning, and high-throughput inference.
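To exercise the FP8 path, one option is NVIDIA's Transformer Engine library for PyTorch. A minimal sketch, assuming the transformer_engine package is installed; the layer shapes are illustrative.

```python
# Sketch: FP8 matmuls on H100 via NVIDIA Transformer Engine (PyTorch bindings).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling()       # default FP8 scaling recipe

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(32, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                           # forward pass runs on FP8 Tensor Cores
y.sum().backward()
```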

Dedicated GPUs. No sharing. No oversubscription.

Configurations

GPU server configurations

All GPU servers are single-tenant. CUDA toolkit, cuDNN, NCCL, and container runtimes pre-installed. Hourly or monthly billing.

GPU.A100-1x · $2.85/hr
GPU
1× NVIDIA A100
80 GB HBM2e · PCIe Gen4
CPU
AMD EPYC 7543
16 cores / 32 threads allocated
Memory
128 GB DDR4
ECC · 3200 MHz
Storage
1 TB NVMe
7 GB/s sequential read
GPU.A100-4x · $10.80/hr
GPU
4× NVIDIA A100 SXM4
320 GB HBM2e total · NVLink
CPU
AMD EPYC 7543
32 cores / 64 threads allocated
Memory
256 GB DDR4
ECC · 3200 MHz
Storage
2× 1.92 TB NVMe
RAID 0 · 14 GB/s read
GPU.A100-8x · $21.00/hr
GPU
8× NVIDIA A100 SXM4
640 GB HBM2e total · NVLink + NVSwitch
CPU
2× AMD EPYC 7543
64 cores / 128 threads total
Memory
512 GB DDR4
ECC · 3200 MHz
Storage
4× 3.84 TB NVMe
RAID 0 · 28 GB/s read
GPU.H100-1x · $4.25/hr
GPU
1× NVIDIA H100
80 GB HBM3 · PCIe Gen5
CPU
Intel Xeon w9-3495X
16 cores / 32 threads allocated
Memory
128 GB DDR5
ECC · 4800 MHz
Storage
1.92 TB NVMe
Gen4 · 7 GB/s sequential read
GPU.H100-8x · $32.50/hr
GPU
8× NVIDIA H100 SXM5
640 GB HBM3 total · NVLink + NVSwitch
CPU
2× Intel Xeon w9-3495X
112 cores / 224 threads total
Memory
1 TB DDR5
ECC · 4800 MHz
Storage
8× 3.84 TB NVMe
RAID 0 · 56 GB/s read
Software

Pre-installed ML stack

Every GPU server ships with the NVIDIA driver, CUDA toolkit, cuDNN, and NCCL pre-installed and tested against the specific GPU configuration. You don't need to spend hours debugging driver compatibility — SSH in and start training.
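A quick way to confirm the stack after logging in, assuming the pre-installed PyTorch base image:

```python
# Sanity check: confirm the driver, CUDA toolkit, and GPUs are visible to PyTorch.
import torch

print(torch.__version__, torch.version.cuda)   # PyTorch and CUDA toolkit versions
print(torch.cuda.is_available())               # True if the driver stack is working
for i in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(i)
    print(i, p.name, f"{p.total_memory / 2**30:.0f} GiB")
```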

Docker and NVIDIA Container Toolkit are pre-configured so you can pull and run GPU-enabled containers immediately. We maintain tested base images with PyTorch, TensorFlow, and JAX on our container registry.

For Kubernetes users, our managed clusters support GPU scheduling via the NVIDIA device plugin. Request GPUs in your pod spec and the scheduler handles placement. Multi-GPU jobs with NCCL are supported out of the box.
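As a sketch of what that looks like programmatically, here is a pod request via the official Kubernetes Python client. The image name and namespace are placeholders; "nvidia.com/gpu" is the resource key the NVIDIA device plugin registers.

```python
# Sketch: requesting GPUs from a managed cluster via the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()                  # uses your local kubeconfig
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="trainer",
            image="registry.example.com/pytorch:latest",  # placeholder image
            command=["python", "train.py"],
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "2"},  # scheduler places the pod on a GPU node
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```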

Storage

High-bandwidth storage

NVMe storage arrays keep training data close to compute and minimize I/O bottlenecks during data loading.

GPU workloads are often bottlenecked by data loading rather than compute. Our multi-GPU configurations use striped NVMe arrays (RAID 0) to maximize sequential read throughput: the 8x configurations deliver up to 56 GB/s, enough to keep the input pipeline ahead of the GPUs during training.
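On the software side, the loader still has to be configured to exploit that bandwidth. A minimal PyTorch DataLoader sketch with a placeholder dataset; the worker and prefetch settings are illustrative starting points, not tuned values.

```python
# Sketch: a data-loading setup that can actually use NVMe bandwidth.
import torch
from torch.utils.data import DataLoader, Dataset

class ShardDataset(Dataset):               # placeholder for an on-disk dataset
    def __len__(self):
        return 100_000
    def __getitem__(self, i):
        return torch.randn(3, 224, 224)    # stands in for a decoded sample

loader = DataLoader(
    ShardDataset(),
    batch_size=256,
    num_workers=16,          # parallel readers to keep the striped array busy
    pin_memory=True,         # page-locked buffers for faster host-to-GPU copies
    prefetch_factor=4,       # batches each worker keeps in flight
    persistent_workers=True,
)
for batch in loader:
    batch = batch.cuda(non_blocking=True)  # overlaps the copy with compute
    break
```

Increasing num_workers until the GPUs stop waiting on input is usually the first tuning step on the multi-GPU configurations.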

For datasets that don't fit on local storage, GPU servers can attach to our private networking fabric and stream data from storage-optimized bare metal servers in the same region, with no bandwidth charges.

Use cases

Model training

Train transformer models, diffusion models, and other deep learning architectures on dedicated GPU hardware. Multi-GPU configurations with NVLink provide the interconnect bandwidth needed for efficient distributed training.

For multi-node training, connect GPU servers via our private networking fabric. NCCL can use the 10 Gbps network for cross-node gradient synchronization. Larger clusters can be built by reserving multiple 8-GPU servers in the same region.
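Before committing to a long multi-node run, it can help to measure the effective all-reduce throughput over the fabric. A minimal sketch, assuming PyTorch with NCCL and a torchrun launch on each node; the tensor size and iteration counts are arbitrary choices.

```python
# Sketch: time a large all-reduce to sanity-check interconnect bandwidth.
import os
import time
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))

t = torch.randn(64 * 2**20, device="cuda")   # 64M float32 values = 256 MiB
for _ in range(3):
    dist.all_reduce(t)                       # warm-up iterations
torch.cuda.synchronize()

iters = 10
start = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(t)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

if dist.get_rank() == 0:
    gib = iters * t.numel() * 4 / 2**30
    print(f"~{gib / elapsed:.1f} GiB/s effective all-reduce throughput")
dist.destroy_process_group()
```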

Inference at scale

Run inference on fine-tuned models with predictable latency. Single-GPU configurations are cost-effective for serving models that fit in 80 GB of GPU memory. For larger models, multi-GPU configurations with tensor parallelism split the model across GPUs.
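As one illustration of tensor parallelism, vLLM shards a model's weight matrices across GPUs when tensor_parallel_size is set. The model name below is an example and must fit within the combined GPU memory.

```python
# Sketch: tensor-parallel inference with vLLM on a multi-GPU server.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct",  # example model
          tensor_parallel_size=4)                     # shard across 4 GPUs over NVLink
out = llm.generate(["Explain NVLink in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```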

Combine GPU servers with our load balancers to distribute inference requests across multiple servers. Health checks ensure that servers with GPU memory errors or driver issues are automatically removed from rotation.
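A health check along those lines can be built on NVML. A sketch using the pynvml bindings, with Flask as an illustrative choice for the HTTP framing; this is not our managed implementation, just one way to surface GPU faults to a load balancer.

```python
# Sketch: a /healthz endpoint a load balancer could poll. Reports 503 if any
# GPU is missing or has logged uncorrected ECC errors since the last reset.
import pynvml
from flask import Flask

app = Flask(__name__)
pynvml.nvmlInit()          # once at startup; fails fast if the driver is broken

@app.route("/healthz")
def healthz():
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            errs = pynvml.nvmlDeviceGetTotalEccErrors(
                handle,
                pynvml.NVML_MEMORY_ERROR_TYPE_UNCORRECTED,
                pynvml.NVML_VOLATILE_ECC,
            )
            if errs:
                return f"gpu {i}: {errs} uncorrected ECC errors", 503
        return "ok", 200
    except pynvml.NVMLError as exc:
        return str(exc), 503       # driver/GPU fault: remove from rotation

if __name__ == "__main__":
    app.run(port=8080)
```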

Features
Dedicated GPUs
Single-tenant GPU servers. No GPU sharing, no time-slicing, no oversubscription. Your workloads get the full GPU, all the memory, and all the compute units.
Exclusive
NVLink Interconnect
Multi-GPU configurations use NVLink for high-bandwidth GPU-to-GPU communication. 600 GB/s on A100 SXM4, 900 GB/s on H100 SXM5. NVSwitch on 8-GPU configs.
900 GB/s
Pre-installed Drivers
CUDA toolkit, cuDNN, NCCL, Docker, and NVIDIA Container Toolkit pre-installed. Tested base images for PyTorch, TensorFlow, and JAX available on our registry.
Ready
Hourly Billing
Pay by the hour with no minimum commitment. Monthly reservations available at a discount for predictable workloads. Deprovision when training is done.
Flexible
Private Networking
Connect GPU servers to VPS and bare metal instances over isolated VLANs. Free internal bandwidth. Build training pipelines with separate data, compute, and serving tiers.
VLAN
API Managed
Provision, snapshot, and destroy GPU servers via the same REST API, CLI, and Terraform provider. Automate training pipelines with programmatic server lifecycle management; see the sketch after this list.
REST
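A sketch of hourly provisioning in Python. The base URL, endpoint path, request fields, region slug, and response fields below are hypothetical placeholders, not the documented API schema; consult the API reference for the real shapes.

```python
# Sketch: provisioning a GPU server over the REST API (hypothetical schema).
import os
import requests

API = "https://api.example.com/v1"         # hypothetical base URL
headers = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

resp = requests.post(
    f"{API}/gpu-servers",                  # hypothetical endpoint path
    headers=headers,
    json={
        "plan": "GPU.A100-8x",             # plan names from the table above
        "region": "us-east",               # hypothetical region slug
        "billing": "hourly",
    },
    timeout=30,
)
resp.raise_for_status()
server = resp.json()
print(server["id"], server["status"])      # hypothetical response fields
```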

Start training today.

A100 servers available immediately. H100 on reservation.