
Dedicated NVIDIA A100 and H100 GPU servers for AI training, inference, rendering, and high-performance computing. Single-tenant hardware with NVLink interconnect, high-bandwidth NVMe storage, and 10 Gbps networking.
The A100 is NVIDIA's data center GPU based on the Ampere architecture. Each GPU has 80 GB of HBM2e memory with 2 TB/s bandwidth, third-generation Tensor Cores, and support for TF32, FP16, BF16, INT8, and sparsity acceleration.
We offer A100 servers in both PCIe and SXM4 form factors. SXM4 configurations connect up to 8 GPUs via NVLink with 600 GB/s of total bidirectional bandwidth per GPU, which matters for distributed training where GPUs exchange gradients frequently.
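A quick way to confirm that the GPUs in an SXM4 system can reach each other directly is to check peer access from PyTorch. A minimal sketch, assuming PyTorch is installed on the server:

```python
import torch

# Check that every GPU pair can access each other's memory directly
# (over NVLink on SXM4 systems) before launching a multi-GPU job.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            p2p = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if p2p else 'no'}")
```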
A100 servers are a good fit for training medium-scale models, running inference at scale, and general HPC workloads like molecular dynamics, CFD, and financial Monte Carlo simulations.


The H100 is NVIDIA's data center GPU based on the Hopper architecture. Each GPU has 80 GB of HBM3 memory with 3.35 TB/s bandwidth, fourth-generation Tensor Cores with FP8 support, and the Transformer Engine, which pairs dedicated Tensor Core hardware with software that manages FP8 precision for transformer layers.
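In practice, FP8 is usually exercised through NVIDIA's open-source Transformer Engine library rather than raw CUDA. A minimal sketch, assuming the transformer_engine package is installed alongside PyTorch:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 execution on H100: wrap Transformer Engine layers in an fp8_autocast
# region; the library handles per-tensor scaling factors under the hood.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
print(y.shape)
```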
H100 SXM5 configurations connect up to 8 GPUs via fourth-generation NVLink with 900 GB/s of total bidirectional bandwidth per GPU. The NVSwitch fabric provides all-to-all GPU communication without going through the CPU.
H100 servers deliver roughly 3× the training performance of A100 on transformer workloads and are the best choice for large language model training, fine-tuning, and high-throughput inference.

All GPU servers are single-tenant. CUDA toolkit, cuDNN, NCCL, and container runtimes pre-installed. Hourly or monthly billing.
Every GPU server ships with the NVIDIA driver, CUDA toolkit, cuDNN, and NCCL pre-installed and tested against the specific GPU configuration. You don't need to spend hours debugging driver compatibility — SSH in and start training.
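For example, a quick check from Python confirms the whole stack is visible. This is a sketch assuming you work in PyTorch; the driver, CUDA, cuDNN, and NCCL themselves are already on the box:

```python
import torch

# Sanity-check the pre-installed stack after SSHing in: driver visibility,
# CUDA/cuDNN versions reported by PyTorch, and NCCL availability.
print("CUDA available:", torch.cuda.is_available())
print("GPUs:", torch.cuda.device_count())
print("Device 0:", torch.cuda.get_device_name(0))
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("NCCL:", torch.cuda.nccl.version())
```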
Docker and NVIDIA Container Toolkit are pre-configured so you can pull and run GPU-enabled containers immediately. We maintain tested base images with PyTorch, TensorFlow, and JAX on our container registry.
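If you drive Docker from Python, the same workflow works through the Docker SDK. A sketch assuming the docker package is installed; the public pytorch/pytorch image below is a stand-in for an image from our registry:

```python
import docker

# Run a GPU-enabled container via the Docker SDK; equivalent to
# `docker run --gpus all ...` with the NVIDIA Container Toolkit.
client = docker.from_env()
output = client.containers.run(
    "pytorch/pytorch:latest",  # placeholder; substitute a registry image
    command=["python", "-c", "import torch; print(torch.cuda.is_available())"],
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])  # all GPUs
    ],
    remove=True,
)
print(output.decode())
```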
For Kubernetes users, our managed clusters support GPU scheduling via the NVIDIA device plugin. Request GPUs in your pod spec and the scheduler handles placement. Multi-GPU jobs with NCCL are supported out of the box.
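A sketch using the official Kubernetes Python client shows the GPU request going through resource limits; the pod name and image below are placeholders:

```python
from kubernetes import client, config

# Request 2 GPUs for a pod; the NVIDIA device plugin exposes them as the
# extended resource "nvidia.com/gpu" and the scheduler handles placement.
config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),  # placeholder name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="pytorch/pytorch:latest",  # placeholder image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "2"}  # GPUs are requested via limits
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```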

NVMe storage arrays keep training data close to compute and minimize I/O bottlenecks during data loading.
GPU workloads are often bottlenecked by data loading speed rather than compute. Our multi-GPU configurations use striped NVMe arrays (RAID 0) to maximize sequential read throughput — the 8x configurations deliver up to 56 GB/s, enough to saturate the PCIe lanes feeding data to the GPUs.
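The storage throughput only helps if the input pipeline actually uses it. A typical PyTorch DataLoader setup uses parallel workers and pinned memory to keep batches queued ahead of the GPUs; the dataset path below is a placeholder for data staged on the NVMe array:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Keep the GPUs fed from local NVMe: many parallel loader workers plus
# pinned host memory for fast host-to-device copies.
dataset = datasets.ImageFolder(
    "/data/train",  # placeholder path to a dataset on the NVMe array
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ]),
)
loader = DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,
    num_workers=16,          # parallel reads against the striped array
    pin_memory=True,         # page-locked buffers for faster GPU transfers
    prefetch_factor=4,       # keep batches queued ahead of the training step
    persistent_workers=True,
)
for images, labels in loader:
    images = images.cuda(non_blocking=True)
    break  # one batch as a smoke test
```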
For datasets that don't fit on local storage, GPU servers can attach to our private networking fabric and read data from storage-optimized bare metal servers in the same region, with no bandwidth charges.

Train transformer models, diffusion models, and other deep learning architectures on dedicated GPU hardware. Multi-GPU configurations with NVLink provide the interconnect bandwidth needed for efficient distributed training.
For multi-node training, connect GPU servers via our private networking fabric. NCCL can use the 10 Gbps network for cross-node gradient synchronization. Larger clusters can be built by reserving multiple 8-GPU servers in the same region.
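A minimal DistributedDataParallel setup covers both the single-node and multi-node cases, since NCCL uses NVLink within a node and the network between nodes. This is a sketch with a stand-in model, launched with torchrun:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal DDP loop. Launch with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 --rdzv_endpoint=<head-ip>:29500 train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in for a real model
model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for step in range(10):
    x = torch.randn(32, 4096, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()          # gradients are all-reduced across GPUs by NCCL
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```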
Run inference on fine-tuned models with predictable latency. Single-GPU configurations are cost-effective for serving models that fit in 80 GB of GPU memory. For larger models, multi-GPU configurations with tensor parallelism split the model across GPUs.
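One way to serve a model that exceeds a single card is the open-source vLLM library, which implements tensor parallelism out of the box. A sketch; the model name below is a placeholder:

```python
from vllm import LLM, SamplingParams

# Tensor parallelism splits each layer's weights across 8 GPUs so a model
# too large for a single 80 GB card can still be served from one server.
llm = LLM(
    model="your-org/your-70b-model",  # placeholder model name
    tensor_parallel_size=8,           # shard across all 8 GPUs in the server
)
params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["Explain NVLink in one paragraph."], params)
print(outputs[0].outputs[0].text)
```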
Combine GPU servers with our load balancers to distribute inference requests across multiple servers. Health checks ensure that servers with GPU memory errors or driver issues are automatically removed from rotation.
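What the health check probes is up to you. One option is a small GPU-aware endpoint on each server that fails fast when CUDA is unhealthy; this is a sketch using FastAPI, with a hypothetical /healthz path and no claim about how the load balancer itself is configured:

```python
import torch
from fastapi import FastAPI, Response

app = FastAPI()

# A health endpoint the load balancer can poll. A tiny allocation plus a
# kernel launch surfaces driver problems and unrecoverable memory errors.
@app.get("/healthz")
def healthz():
    try:
        if not torch.cuda.is_available():
            return Response(status_code=503)
        x = torch.ones(1024, device="cuda")
        torch.cuda.synchronize()
        assert float(x.sum()) == 1024.0
        return {"status": "ok", "gpus": torch.cuda.device_count()}
    except Exception:
        return Response(status_code=503)
```

Run it with uvicorn (for example `uvicorn health:app --port 8080`) and point the load balancer's health check at the /healthz path.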

A100 servers are available immediately. H100 servers are available by reservation.