# HuggingFace fine-tuning
Fine-tuning a pre-trained transformer is one of the most common ML workflows: take a model from the HuggingFace Hub, adapt it to your task with a small labeled dataset, and evaluate the results. The bottleneck is usually hardware — fine-tuning even a small model like DistilBERT benefits significantly from a GPU, and larger models require one. Skyward lets you wrap the entire pipeline in a single function decorated with @sky.function, provision a GPU instance, and run it remotely. Everything — model download, tokenization, training, evaluation — happens on the cloud instance.
## Loading model and tokenizer
Load a pre-trained model inside the compute function:
```python
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1},
)
```
AutoModelForSequenceClassification.from_pretrained() downloads the base model and adds a classification head. The download happens on the remote instance, which typically has faster internet than a laptop and avoids transferring multi-GB model weights over the SSH tunnel. The id2label and label2id mappings configure the model for binary sentiment classification.
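With more than two classes, the two mappings are easy to get out of sync. A small helper (illustrative only, not part of the example) builds both dicts from a single ordered list of label names, so they are inverses by construction:

```python
def label_mappings(labels):
    """Build consistent id2label/label2id dicts from an ordered label list."""
    id2label = {i: name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id

id2label, label2id = label_mappings(["negative", "positive"])
# id2label == {0: "negative", 1: "positive"}
# label2id == {"negative": 0, "positive": 1}
```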
## Preparing the dataset
Load IMDB, tokenize, and prepare for training — all on the remote instance:
```python
dataset = load_dataset("imdb")
train_ds = dataset["train"].select(range(max_samples))
test_ds = dataset["test"].select(range(max_samples // 4))

def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True, remove_columns=["text"])
test_ds = test_ds.map(tokenize, batched=True, remove_columns=["text"])
```
load_dataset("imdb") downloads the dataset on the worker. The select(range(max_samples)) call limits the dataset size for faster iteration during development — remove it for a full fine-tuning run. Tokenization runs remotely too, so you don't need transformers or datasets installed locally.
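With batched=True, datasets calls the mapping function with a dict of columns (one list per field) rather than one example at a time. A dependency-free sketch of that calling convention, using whitespace splitting as a stand-in for the real tokenizer:

```python
def batched_map(fn, rows, batch_size=2):
    """Mimic datasets' map(batched=True): call fn on dicts of column lists."""
    out = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i : i + batch_size]
        columns = {k: [r[k] for r in batch] for k in batch[0]}  # rows -> columns
        result = fn(columns)                                    # fn sees lists
        out.extend(dict(zip(result, vals)) for vals in zip(*result.values()))
    return out

def tokenize(examples, max_length=4):
    # stand-in tokenizer: split on whitespace and truncate to max_length
    return {"input_ids": [t.split()[:max_length] for t in examples["text"]]}

rows = [{"text": "a truly great film"}, {"text": "not worth the two hours spent"}]
batched_map(tokenize, rows)
# → [{"input_ids": ["a", "truly", "great", "film"]},
#    {"input_ids": ["not", "worth", "the", "two"]}]
```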
This is one of the key advantages of remote execution: heavy data processing and model operations happen on a machine with the right hardware and fast network, while your local machine just dispatches the work and collects results.
## Training with the Trainer API
Configure training arguments and launch the Trainer:
```python
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="/tmp/finetuned",
        num_train_epochs=epochs,
        per_device_train_batch_size=batch_size,
        eval_strategy="epoch",
        save_strategy="no",
        fp16=torch.cuda.is_available(),
        report_to="none",
    ),
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

train_result = trainer.train()
eval_result = trainer.evaluate()
```
The Trainer manages the training loop, evaluation, gradient accumulation, and mixed-precision (fp16) when a GPU is available. eval_strategy="epoch" runs evaluation after each epoch. save_strategy="no" disables checkpointing — since the instance is ephemeral, saved checkpoints would be lost on teardown. For production fine-tuning, you'd save checkpoints to a persistent location (S3, HuggingFace Hub, or a mounted volume).
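The compute_metrics callback passed to the Trainer isn't shown in the snippet above. A minimal accuracy implementation, assuming the standard Trainer interface of a (logits, labels) pair, could look like:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from a Trainer-style (logits, labels) pair."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {"accuracy": float((predictions == labels).mean())}
```

The returned dict's keys are reported with an `eval_` prefix (e.g. eval_accuracy) in the evaluation results.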
The function returns a summary dict with training loss, evaluation accuracy, and runtime. This is the result that comes back through the SSH tunnel to your local process.
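trainer.train() returns a TrainOutput whose .metrics dict includes entries like train_loss and train_runtime, and trainer.evaluate() returns a metrics dict directly. A sketch of assembling the summary from those two dicts (field names here are illustrative, not the exact ones in the example script):

```python
def summarize(train_metrics, eval_metrics):
    """Collect headline numbers into one small dict to send back over the tunnel."""
    return {
        "train_loss": train_metrics.get("train_loss"),
        "train_runtime_s": train_metrics.get("train_runtime"),
        "eval_accuracy": eval_metrics.get("eval_accuracy"),
    }

summary = summarize(
    {"train_loss": 0.31, "train_runtime": 124.5},
    {"eval_accuracy": 0.88, "eval_loss": 0.34},
)
# summary == {"train_loss": 0.31, "train_runtime_s": 124.5, "eval_accuracy": 0.88}
```

Returning a small dict of plain numbers keeps the serialized result tiny, which matters when it travels back through the SSH tunnel.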
## Dispatching to the cloud
The full example dispatches the fine-tuning job to an A100 instance.
The HuggingFace Trainer handles device placement and mixed-precision internally. Skyward provisions the GPU instance, runs the function, and returns the result. The Image(pip=[...]) in the pool configuration installs the required dependencies on the worker.
## Run the full example
```shell
git clone https://github.com/gabfssilva/skyward.git
cd skyward
uv run python guides/08_huggingface_finetuning.py
```
What you learned:
- Everything runs remotely — model download, tokenization, training, evaluation all happen on the cloud GPU.
- No Skyward-specific APIs inside the function — standard HuggingFace `Trainer`, `AutoModel`, `load_dataset`.
- Remote imports — `transformers` and `datasets` only need to be installed on the worker (via the Image's `pip` field), not locally.
- Ephemeral instances — checkpoints are lost on teardown; save to persistent storage for production runs.
- Single-node fine-tuning — the HuggingFace Trainer manages device placement internally; add `sky.plugins.torch()` for multi-node.