Using accelerators

Most ML workloads need GPUs. Skyward abstracts GPU selection across providers: you specify the hardware you need, and Skyward finds a matching instance type on your provider, resolves availability, and provisions it. This guide covers how to request specific accelerators, detect the hardware available at runtime, and understand how typed accelerator specs resolve across providers.

Requesting a GPU

Specify the accelerator when creating a pool:

with sky.Compute(
    provider=sky.VastAI(),
    accelerator=sky.accelerators.L40S(),
    image=sky.Image(pip=["torch"]),
) as compute:
    ...  # run functions against the pool here

Use the factory functions under sky.accelerators. They carry catalog metadata — VRAM size, CUDA compatibility, form factor — and provide IDE autocomplete:

sky.Compute(provider=sky.AWS(), accelerator=sky.accelerators.A100())
sky.Compute(provider=sky.AWS(), accelerator=sky.accelerators.H100(count=4))

Each factory returns an Accelerator dataclass — a frozen specification with name, memory, count, and optional metadata. The factory functions look up defaults from an internal catalog, so sky.accelerators.H100() already knows it has 80GB of memory without you specifying it.
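
Conceptually, the factory pattern can be sketched in plain Python. Everything below, including the catalog contents and field names, is illustrative rather than Skyward's actual implementation:

```python
from dataclasses import dataclass

# hypothetical catalog of per-model defaults; the real catalog carries more
# metadata (CUDA compatibility, form factor, per-provider mappings)
_CATALOG = {"A100": 80, "H100": 80, "L40S": 48}

@dataclass(frozen=True)
class Accelerator:
    """Frozen spec: name, memory, and how many devices to request."""
    name: str
    memory_gb: int
    count: int = 1

def H100(count: int = 1) -> Accelerator:
    # the factory fills in memory from the catalog, so callers never specify it
    return Accelerator(name="H100", memory_gb=_CATALOG["H100"], count=count)

spec = H100(count=4)
print(spec.memory_gb)  # 80, looked up from the catalog
```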

The translation from a logical accelerator name to a provider-specific resource isn't a simple string match. An "A100" on AWS is a p4d.24xlarge, on RunPod it's a pod with a specific gpuTypeId, on VastAI it's a marketplace offer filtered by GPU model. The catalog centralizes this complexity so that the same Accelerator spec resolves correctly on any provider that supports it.
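
The shape of that resolution step can be sketched as a lookup table keyed by accelerator and provider. The table entries below are illustrative (only the A100-to-p4d.24xlarge mapping comes from this guide; the RunPod id is a made-up placeholder):

```python
# hypothetical resolution table: (accelerator, provider) -> provider-specific request
_RESOLUTION = {
    ("A100", "aws"): {"instance_type": "p4d.24xlarge"},
    ("A100", "runpod"): {"gpuTypeId": "NVIDIA A100 80GB"},   # placeholder id
    ("A100", "vastai"): {"gpu_name_filter": "A100"},         # marketplace filter
}

def resolve(accelerator: str, provider: str) -> dict:
    """Translate a logical accelerator name into a provider-specific request."""
    try:
        return _RESOLUTION[(accelerator, provider)]
    except KeyError:
        raise ValueError(f"{accelerator!r} is not available on {provider!r}")

print(resolve("A100", "aws"))  # {'instance_type': 'p4d.24xlarge'}
```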

Detecting hardware at runtime

Inside a function decorated with @sky.function, instance_info() tells you what hardware is available:

@sky.function
def gpu_info() -> sky.InstanceInfo:
    """Return information about this instance."""
    return sky.instance_info()

InstanceInfo includes the node index, cluster size, head status, and the number and type of accelerators. This is useful for conditional logic — running a GPU path when CUDA is available, falling back to CPU otherwise.
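
The conditional pattern might look like the sketch below. InstanceInfo here is a stand-in carrying only the fields this guide mentions, not the library's exact type:

```python
from dataclasses import dataclass

@dataclass
class InstanceInfo:  # stand-in; the real type carries more fields
    node: int
    nodes: int
    is_head: bool
    accelerator_count: int

def pick_device(info: InstanceInfo) -> str:
    # run the GPU path when accelerators are present, otherwise fall back to CPU
    return "cuda" if info.accelerator_count > 0 else "cpu"

print(pick_device(InstanceInfo(node=0, nodes=1, is_head=True, accelerator_count=4)))  # cuda
```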

GPU vs CPU benchmark

A matrix multiplication benchmark illustrates the GPU advantage. The function runs on the remote instance, where the accelerator is available:

@sky.function
def matrix_benchmark(size: int) -> dict:
    """Benchmark matrix multiplication on GPU vs CPU."""
    import time

    import torch

    a = torch.randn(size, size)
    b = torch.randn(size, size)

    start = time.perf_counter()
    torch.matmul(a, b)
    cpu_time = time.perf_counter() - start

    if torch.cuda.is_available():
        a_gpu = a.cuda()
        b_gpu = b.cuda()
        torch.matmul(a_gpu, b_gpu)
        torch.cuda.synchronize()

        start = time.perf_counter()
        torch.matmul(a_gpu, b_gpu)
        torch.cuda.synchronize()
        gpu_time = time.perf_counter() - start

        return {"cpu": cpu_time, "gpu": gpu_time, "speedup": cpu_time / gpu_time}

    return {"cpu": cpu_time}

The first torch.matmul on GPU is a warmup call — it triggers CUDA kernel compilation, which is a one-time cost. After warmup, GPU matmul on a 4096x4096 matrix is typically 20-50x faster than CPU. The exact speedup depends on the GPU model, matrix size, and data type (fp32 vs fp16).
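
The warmup-then-measure pattern generalizes beyond CUDA. A minimal helper, illustrative and not part of Skyward:

```python
import time

def bench(fn, *, warmup: int = 1, reps: int = 5) -> float:
    """Return the best wall-clock time of fn() after discarding warmup runs."""
    for _ in range(warmup):
        fn()  # one-time costs (JIT, kernel compilation, caches) land here
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

elapsed = bench(lambda: sum(range(100_000)))
print(f"{elapsed:.6f}s")
```

For GPU code the helper would also need torch.cuda.synchronize() before reading the clock, as in the benchmark above, because CUDA kernel launches return before the work finishes.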

Note that imports happen inside the function. This is intentional — the function runs on the remote worker, where torch is installed via the Image's pip field. Your local machine doesn't need torch installed.
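
Lazy imports are ordinary Python. The sketch below uses a stdlib module in place of torch to show that the dependency is only resolved when the function body runs:

```python
def remote_task() -> str:
    # the heavy dependency is imported inside the function body, so it is
    # only resolved where the function actually executes (the remote worker);
    # wave stands in for torch in this sketch
    import wave
    return wave.__name__

# defining remote_task above did not import wave; calling it does
print(remote_task())  # wave
```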

Run the full example

git clone https://github.com/gabfssilva/skyward.git
cd skyward
uv run python guides/04_gpu_accelerators.py

What you learned:

  • accelerator parameter requests specific GPU hardware via factory functions like sky.accelerators.A100().
  • sky.accelerators.* provides catalog-backed specs with VRAM, CUDA version, and provider-specific resolution.
  • instance_info() detects hardware at runtime — node identity, accelerators, cluster metadata.
  • Imports inside @sky.function — remote dependencies don't need to be installed locally.
  • GPU warmup — first CUDA kernel compilation is a one-time cost; benchmark after warmup for accurate numbers.