Parallel execution

A single >> sends one computation to one node. But many workloads consist of multiple independent tasks — processing chunks of data, running different model configurations, evaluating several inputs. Skyward provides two ways to parallelize: gather() for dynamic collections and & for type-safe composition of a fixed set. Both dispatch all tasks concurrently and block until the results are ready.

Compute functions

Define the functions you want to run remotely. Each one is independent — they can execute in any order, on the same or different nodes:

@sky.function
def process_chunk(data: list[int]) -> int:
    """Sum all numbers in a chunk."""
    return sum(data)


@sky.function
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y


@sky.function
def factorial(n: int) -> int:
    """Calculate factorial."""
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

Parallel with gather()

When you have a dynamic number of tasks — iterating over a list of chunks, a set of configurations, or any collection whose size isn't known at write time — use gather():

chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# gather() runs all calls in parallel
results = sky.gather(*[process_chunk(c) for c in chunks]) >> compute
print(f"Chunk sums: {results}")  # (6, 15, 24)

gather() collects multiple PendingFunction values into a PendingFunctionGroup. When dispatched with >>, all tasks execute concurrently on the pool's nodes (distributed via round-robin) and the results come back as a tuple. The pool handles serialization, dispatch, and collection — you just express which tasks should run in parallel.
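Skyward's scheduler is internal, but the round-robin idea is easy to picture. As a rough mental model only (plain Python with thread pools standing in for nodes, not Skyward's actual implementation), dispatching a batch and collecting results as a tuple might look like:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def process_chunk(data: list[int]) -> int:
    """Sum all numbers in a chunk."""
    return sum(data)

# Two single-worker executors stand in for two pool nodes.
nodes = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]
assign = itertools.cycle(nodes)  # round-robin over the "nodes"

chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
futures = [next(assign).submit(process_chunk, c) for c in chunks]

# Block until every task is done, then collect results as a tuple
# in submission order -- the same shape `gather(...) >> compute` returns.
results = tuple(f.result() for f in futures)
print(results)  # (6, 15, 24)

for node in nodes:
    node.shutdown()
```

The real pool adds serialization and remote dispatch on top, but the shape is the same: submit everything first, then wait for all results.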

Type-safe parallel with &

When the number of parallel tasks is fixed and you want full type inference, use the & operator:

# & operator for type-safe parallel execution
a, b = (multiply(2, 3) & multiply(4, 5)) >> compute
print(f"Products: {a}, {b}")  # 6, 20

The & operator creates the same PendingFunctionGroup that gather() produces, but with a key difference: the types are preserved individually. Here, a and b are both int because multiply returns int. If you chain three different functions — preprocess() & train() & evaluate() — the result type is tuple[DataFrame, Model, float], not tuple[Any, ...].
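The per-position typing can be illustrated without Skyward. This is a minimal sketch with hypothetical Pending and PendingPair classes (not Skyward's real internals) showing how an __and__ overload lets a type checker keep each position's type:

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")

@dataclass
class Pending(Generic[T]):
    """A deferred call: the function plus its arguments, not yet executed."""
    fn: Callable[..., T]
    args: tuple

    def __and__(self, other: Pending[U]) -> PendingPair[T, U]:
        # Combining two pendings yields a pair that remembers both types.
        return PendingPair(self, other)

@dataclass
class PendingPair(Generic[T, U]):
    left: Pending[T]
    right: Pending[U]

    def run(self) -> tuple[T, U]:
        # A real pool would dispatch these concurrently; here we run inline.
        return (self.left.fn(*self.left.args), self.right.fn(*self.right.args))

def multiply(x: int, y: int) -> int:
    return x * y

# A type checker infers tuple[int, int] here, per position.
a, b = (Pending(multiply, (2, 3)) & Pending(multiply, (4, 5))).run()
print(a, b)  # 6 20
```

Skyward generalizes this to longer chains, which is why preprocess() & train() & evaluate() keeps each element's concrete type instead of collapsing to tuple[Any, ...].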

Mixing different computations

Since & preserves types per-position, you can compose completely different functions in a single parallel batch:

# Mix different computations
s, p, f = (process_chunk([1, 2, 3]) & multiply(10, 20) & factorial(5)) >> compute
print(f"Mixed: sum={s}, product={p}, factorial={f}")  # 6, 200, 120

Each computation may go to a different node (round-robin scheduling), and the group blocks until all of them complete. The destructured variables s, p, f each carry their correct type.

The distinction from broadcast (@) is important: @ runs the same function on all nodes, while & runs different functions concurrently. Use @ when every node should do the same work; use & when you have distinct, independent tasks.

Streaming results

By default, gather() waits for all tasks to finish before returning. With stream=True, results are yielded as they complete — useful when tasks have variable duration and you want to start processing early:

import time

# gather(stream=True) yields results as they complete
tasks = [process_chunk([i] * 1000) for i in range(5)]
start = time.monotonic()
for result in sky.gather(*tasks, stream=True) >> compute:
    elapsed = time.monotonic() - start
    print(f"  [{elapsed:.1f}s] Got: {result}")

Streaming changes the return type from a tuple to a generator. Results arrive in completion order, not submission order — the fastest tasks come first. This is ideal for displaying progress, feeding partial results into a downstream pipeline, or reducing time-to-first-result when tasks have uneven durations.
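Completion-order delivery is the same behavior the standard library's concurrent.futures.as_completed gives you. This plain-Python sketch (no Skyward) shows why the fastest task surfaces first:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_task(task_id: int, delay: float) -> int:
    """Simulate variable-duration work, then return the task's id."""
    time.sleep(delay)
    return task_id

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(slow_task, 0, 0.3),  # slowest
        pool.submit(slow_task, 1, 0.1),  # fastest
        pool.submit(slow_task, 2, 0.2),
    ]
    # as_completed yields futures as they finish, not as they were submitted.
    completion_order = [f.result() for f in as_completed(futures)]

print(completion_order)  # [1, 2, 0] -- fastest task first
```

With streaming, the consumer starts working on task 1's result while tasks 0 and 2 are still running, which is exactly the time-to-first-result win described above.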

If you need results in the original submission order even while streaming, pass ordered=True. Skyward buffers internally and yields in submission order, though this means a result is not yielded until every task submitted before it has also completed.
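The buffering trade-off shows up in plain Python too: iterating futures in submission order blocks on each result() even when later tasks have already finished. A sketch (not Skyward's implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(task_id: int, delay: float) -> int:
    """Simulate variable-duration work, then return the task's id."""
    time.sleep(delay)
    return task_id

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(slow_task, 0, 0.3),  # slowest, submitted first
        pool.submit(slow_task, 1, 0.1),
        pool.submit(slow_task, 2, 0.2),
    ]
    # Iterating in submission order means task 0 gates everything:
    # tasks 1 and 2 finish first, but their results sit buffered
    # until task 0's result() call returns.
    ordered_results = [f.result() for f in futures]

print(ordered_results)  # [0, 1, 2]
```

You get stable ordering at the cost of latency: the first yielded result takes as long as the slowest of the tasks that precede it.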

Run the full example

git clone https://github.com/gabfssilva/skyward.git
cd skyward
uv run python guides/02_parallel_execution.py

What you learned:

  • gather() collects a dynamic number of computations into a parallel batch — dispatch with >>, results as a tuple.
  • & operator composes a fixed set of computations with full type inference per position.
  • stream=True yields results as they complete instead of waiting for all — useful for variable-duration tasks.
  • @ vs & — broadcast runs the same function on all nodes; & runs different functions concurrently.