Parallel execution¶
A single >> sends one computation to one node. But many workloads consist of multiple independent tasks — processing chunks of data, running different model configurations, evaluating several inputs. Skyward provides two ways to parallelize: gather() for dynamic collections and & for type-safe composition of a fixed set. Both dispatch all tasks concurrently and block until the results are ready.
Compute functions¶
Define the functions you want to run remotely. Each one is independent — they can execute in any order, on the same or different nodes:
```python
@sky.function
def process_chunk(data: list[int]) -> int:
    """Sum all numbers in a chunk."""
    return sum(data)


@sky.function
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y


@sky.function
def factorial(n: int) -> int:
    """Calculate factorial."""
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result
```
Parallel with gather()¶
When you have a dynamic number of tasks — iterating over a list of chunks, a set of configurations, or any collection whose size isn't known at write time — use gather():
```python
chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# gather() runs all calls in parallel
results = sky.gather(*[process_chunk(c) for c in chunks]) >> compute
print(f"Chunk sums: {results}")  # (6, 15, 24)
```
gather() collects multiple PendingFunction values into a PendingFunctionGroup. When dispatched with >>, all tasks execute concurrently on the pool's nodes (distributed via round-robin) and the results come back as a tuple. The pool handles serialization, dispatch, and collection — you just express which tasks should run in parallel.
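The shape of this pattern is classic fan-out/fan-in. As a local analogy only — not Skyward's implementation, which dispatches to remote nodes — the same structure can be sketched with the standard library:

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(data: list[int]) -> int:
    """Sum all numbers in a chunk (local stand-in for the remote function)."""
    return sum(data)


chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Fan out: submit every chunk concurrently, then block until all results
# are back, collecting them into a tuple in submission order —
# analogous to sky.gather(...) >> compute.
with ThreadPoolExecutor() as pool:
    results = tuple(pool.map(process_chunk, chunks))

print(results)  # (6, 15, 24)
```

The analogy only covers ordering and blocking semantics; Skyward additionally handles serialization and node placement for you.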
Type-safe parallel with &¶
When the number of parallel tasks is fixed and you want full type inference, use the & operator:
```python
# & operator for type-safe parallel execution
a, b = (multiply(2, 3) & multiply(4, 5)) >> compute
print(f"Products: {a}, {b}")  # 6, 20
```
The & operator creates the same PendingFunctionGroup that gather() produces, but with a key difference: the types are preserved individually. Here, a and b are both int because multiply returns int. If you chain three different functions — preprocess() & train() & evaluate() — the result type is tuple[DataFrame, Model, float], not tuple[Any, ...].
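To see how an `&`-style operator can preserve a distinct type per position, here is a minimal sketch using generics — the `Pending` and `Group2` classes are hypothetical illustrations, not Skyward's actual classes:

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class Pending(Generic[A]):
    """A deferred call whose result type A is tracked statically."""

    def __init__(self, fn: Callable[..., A], *args) -> None:
        self.fn, self.args = fn, args

    def __and__(self, other: Pending[B]) -> Group2[A, B]:
        # Combining Pending[A] & Pending[B] yields Group2[A, B],
        # so each position keeps its own type.
        return Group2(self, other)

    def run(self) -> A:
        return self.fn(*self.args)


class Group2(Generic[A, B]):
    """A pair of deferred calls; running yields tuple[A, B]."""

    def __init__(self, left: Pending[A], right: Pending[B]) -> None:
        self.left, self.right = left, right

    def run(self) -> tuple[A, B]:
        return (self.left.run(), self.right.run())


# A type checker infers a: int and b: str here — no tuple[Any, ...].
a, b = (Pending(lambda x, y: x * y, 2, 3) & Pending(str.upper, "hi")).run()
print(a, b)  # 6 HI
```

This sketch runs sequentially; the point is only the typing mechanism, where each `__and__` builds a group parameterized by both operand types.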
Mixing different computations¶
Since & preserves types per-position, you can compose completely different functions in a single parallel batch:
```python
# Mix different computations
s, p, f = (process_chunk([1, 2, 3]) & multiply(10, 20) & factorial(5)) >> compute
print(f"Mixed: sum={s}, product={p}, factorial={f}")  # 6, 200, 120
```
Each computation may go to a different node (round-robin scheduling), and the group blocks until all of them complete. The destructured variables s, p, f each carry their correct type.
The distinction from broadcast (@) is important: @ runs the same function on all nodes, while & runs different functions concurrently. Use @ when every node should do the same work; use & when you have distinct, independent tasks.
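The contrast can be sketched locally with a thread pool standing in for the node pool — an analogy only, since Skyward's `@` targets remote nodes, not threads:

```python
from concurrent.futures import ThreadPoolExecutor

node_ids = [0, 1, 2]  # stand-ins for the pool's nodes

with ThreadPoolExecutor(max_workers=3) as pool:
    # Broadcast-style (@): the SAME function runs once per node.
    warmed = list(pool.map(lambda n: f"node {n} ready", node_ids))

    # &-style: DIFFERENT functions run concurrently, each exactly once.
    futures = [pool.submit(sum, [1, 2, 3]), pool.submit(max, [4, 5])]
    s, m = (fut.result() for fut in futures)

print(warmed)  # ['node 0 ready', 'node 1 ready', 'node 2 ready']
print(s, m)    # 6 5
```

Note how the broadcast case produces one result per node, while the `&` case produces one result per function.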
Streaming results¶
By default, gather() waits for all tasks to finish before returning. With stream=True, results are yielded as they complete — useful when tasks have variable duration and you want to start processing early:
```python
import time

# gather(stream=True) yields results as they complete
tasks = [process_chunk([i] * 1000) for i in range(5)]

start = time.monotonic()
for result in sky.gather(*tasks, stream=True) >> compute:
    elapsed = time.monotonic() - start
    print(f"  [{elapsed:.1f}s] Got: {result}")
```
Streaming changes the return type from a tuple to a generator. Results arrive in completion order, not submission order — the fastest tasks come first. This is ideal for displaying progress, feeding partial results into a downstream pipeline, or reducing time-to-first-result when tasks have uneven durations.
If you need results in the original submission order even while streaming, pass ordered=True. Skyward will buffer internally and yield in order, though this means a result is held back until all preceding tasks have also completed.
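Completion-order versus submission-order delivery is the same trade-off `concurrent.futures` exposes locally — an analogy for what streaming does, not Skyward's internals:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_task(i: int, delay: float) -> int:
    """Return the task index after sleeping for `delay` seconds."""
    time.sleep(delay)
    return i


delays = [0.3, 0.1, 0.2]  # task 1 finishes first, then 2, then 0

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(slow_task, i, d) for i, d in enumerate(delays)]

    # Completion order (like stream=True): the fastest task arrives first.
    by_completion = [fut.result() for fut in as_completed(futures)]

    # Submission order (like ordered=True): consume in the order submitted,
    # so each result waits until all earlier ones are done.
    by_submission = [fut.result() for fut in futures]

print(by_completion)  # [1, 2, 0]
print(by_submission)  # [0, 1, 2]
```

The second list pays for its predictable order with a later time-to-first-result, which is exactly the buffering behavior described above.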
Run the full example¶
```bash
git clone https://github.com/gabfssilva/skyward.git
cd skyward
uv run python guides/02_parallel_execution.py
```
What you learned:
- `gather()` collects a dynamic number of computations into a parallel batch — dispatch with `>>`, results come back as a tuple.
- The `&` operator composes a fixed set of computations with full type inference per position.
- `stream=True` yields results as they complete instead of waiting for all — useful for variable-duration tasks.
- `@` vs `&` — broadcast runs the same function on all nodes; `&` runs different functions concurrently.