Failure Detector

casty.PhiAccrualFailureDetector

Phi accrual failure detector (Hayashibara et al.).

Outputs a continuous suspicion level (phi) instead of a binary alive/dead signal: phi = -log10(1 - CDF(elapsed)), where elapsed is the time since the node's last heartbeat and CDF is the cumulative distribution function of a normal distribution fitted to that node's observed heartbeat intervals.
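The formula above can be sketched in a few lines of standalone Python. This is an illustration of the math only, not casty's implementation: the function name `phi_from_elapsed` and its arguments are hypothetical, and the mean and standard deviation are assumed to have been estimated already.

```python
import math


def phi_from_elapsed(elapsed_ms: float, mean_ms: float, std_ms: float) -> float:
    """Hypothetical helper: phi = -log10(1 - CDF(elapsed)) for a normal model."""
    z = (elapsed_ms - mean_ms) / std_ms
    # 1 - CDF(elapsed), written with erfc so the far tail keeps its precision.
    p_later = 0.5 * math.erfc(z / math.sqrt(2.0))
    if p_later <= 0.0:
        # The tail probability underflowed: the heartbeat is hopelessly overdue.
        return math.inf
    # max() also normalises -0.0 to 0.0 for heartbeats that are not yet due.
    return max(0.0, -math.log10(p_later))


print(phi_from_elapsed(1000.0, 1000.0, 100.0))  # exactly at the mean
print(phi_from_elapsed(1500.0, 1000.0, 100.0))  # five standard deviations late
```

Phi grows slowly while the elapsed time is near the mean and steeply once it moves several standard deviations past it, which is what makes a single numeric threshold usable.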

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | float | Phi value above which a node is considered unreachable. Higher values tolerate more jitter but detect failures more slowly. | 8.0 |
| max_sample_size | int | Maximum number of heartbeat intervals to keep per node. | 200 |
| min_std_deviation_ms | float | Floor for the standard deviation estimate, preventing overly aggressive detection when intervals are very stable. | 100.0 |
| acceptable_heartbeat_pause_ms | float | Additional grace period added to the mean estimate, accounting for expected pauses (e.g. GC). | 0.0 |
| first_heartbeat_estimate_ms | float | Assumed mean interval before enough samples have been collected. | 1000.0 |
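To make the role of the three `*_ms` parameters concrete, here is a minimal sketch of how a mean and standard deviation could be derived from an interval history. The function `interval_stats` is hypothetical, and falling back to the floor as the standard deviation when no samples exist is an assumption made for illustration; the library may seed its estimate differently.

```python
from statistics import fmean, pstdev


def interval_stats(
    intervals_ms: list[float],
    *,
    min_std_deviation_ms: float = 100.0,
    acceptable_heartbeat_pause_ms: float = 0.0,
    first_heartbeat_estimate_ms: float = 1000.0,
) -> tuple[float, float]:
    """Hypothetical helper returning (mean_ms, std_ms) for the normal model."""
    if intervals_ms:
        mean = fmean(intervals_ms)
        std = pstdev(intervals_ms)
    else:
        # Assumption: with no history, use the configured first estimate.
        mean = first_heartbeat_estimate_ms
        std = 0.0
    # The grace period shifts the mean; the floor keeps std from collapsing.
    return mean + acceptable_heartbeat_pause_ms, max(std, min_std_deviation_ms)


# Very regular heartbeats: the measured deviation is tiny, so the floor applies.
print(interval_stats([1010.0, 990.0, 1000.0], acceptable_heartbeat_pause_ms=500.0))
```

Without the floor, a history of near-identical intervals would yield a standard deviation close to zero, and a heartbeat only a few milliseconds late would push phi past the threshold.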

Examples:

>>> fd = PhiAccrualFailureDetector(threshold=8.0)
>>> fd.heartbeat("node-1")
>>> fd.is_available("node-1")
True
>>> fd.phi("unknown-node")
0.0

tracked_nodes property

Return the set of node keys that have received at least one heartbeat.

Returns:

| Type | Description |
| --- | --- |
| frozenset[str] | Node identifiers currently being tracked. |

__init__(*, threshold=8.0, max_sample_size=200, min_std_deviation_ms=100.0, acceptable_heartbeat_pause_ms=0.0, first_heartbeat_estimate_ms=1000.0)

heartbeat(node)

Record arrival of a heartbeat from a node.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node | str | Identifier of the node that sent the heartbeat. | required |
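As an illustration of what recording a heartbeat involves, the sketch below keeps a bounded per-node history of arrival intervals. `IntervalRecorder`, its `clock` argument, and its `intervals` method are hypothetical names introduced here; only the idea of capping the history at `max_sample_size` comes from the parameter description above.

```python
import time
from collections import deque
from typing import Callable


class IntervalRecorder:
    """Hypothetical sketch: per-node heartbeat intervals with a size cap."""

    def __init__(self, max_sample_size: int = 200,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_sample_size
        self._clock = clock  # injectable so the recorder can be tested
        self._last_seen: dict[str, float] = {}
        self._intervals: dict[str, deque[float]] = {}

    def heartbeat(self, node: str) -> None:
        now = self._clock()
        last = self._last_seen.get(node)
        if last is not None:
            # deque(maxlen=...) silently drops the oldest interval when full.
            history = self._intervals.setdefault(node, deque(maxlen=self._max))
            history.append((now - last) * 1000.0)
        self._last_seen[node] = now

    def intervals(self, node: str) -> tuple[float, ...]:
        return tuple(self._intervals.get(node, ()))
```

A monotonic clock is used so that wall-clock adjustments cannot produce negative or inflated intervals. The first heartbeat from a node yields no interval, because there is nothing to measure it against yet.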

phi(node)

Calculate the suspicion level for node.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node | str | Identifier of the node to evaluate. | required |

Returns:

| Type | Description |
| --- | --- |
| float | The phi value: 0.0 if node has never sent a heartbeat, inf if the node is almost certainly down. |

remove(node)

Stop tracking a node entirely.

Called when a node is marked down so it no longer accumulates stale history in the failure detector.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node | str | Identifier of the node to remove. | required |
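A caller would typically pair `remove` with a periodic check of every tracked node. The sketch below shows one way to do that using only the members documented on this page (`tracked_nodes`, `is_available`, `remove`); the `sweep` function, the `Detector` protocol, and the `on_down` callback are hypothetical names, not part of casty.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Detector(Protocol):
    """The subset of the documented detector interface that sweep() relies on."""

    @property
    def tracked_nodes(self) -> frozenset[str]: ...
    def is_available(self, node: str) -> bool: ...
    def remove(self, node: str) -> None: ...


def sweep(detector: Detector, on_down: Callable[[str], None]) -> list[str]:
    """Report every unavailable node, then stop tracking it."""
    down = sorted(n for n in detector.tracked_nodes if not detector.is_available(n))
    for node in down:
        on_down(node)          # e.g. update cluster membership
        detector.remove(node)  # drop the stale interval history
    return down
```

Removing the node after reporting it means that if the node later rejoins, its new heartbeats start from a fresh history rather than one containing a long outage gap.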

is_available(node)

Check if a node is considered available (phi below threshold).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node | str | Identifier of the node to check. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if phi(node) < threshold. |
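Because phi is a negative base-10 logarithm of a tail probability, a threshold can be translated back into a probability and into an approximate silence duration. The sketch below does both under the normal model from the class description; `miss_probability` and `elapsed_at_threshold` are hypothetical helper names, and the mean and standard deviation are assumed inputs (the mean already including any acceptable pause).

```python
from statistics import NormalDist


def miss_probability(threshold: float) -> float:
    """Model probability that a heartbeat is merely late when phi == threshold."""
    return 10.0 ** -threshold


def elapsed_at_threshold(threshold: float, mean_ms: float, std_ms: float) -> float:
    """Silence duration (ms) at which phi reaches the threshold."""
    # Solve 1 - CDF(t) = 10**-threshold. By symmetry of the normal
    # distribution the upper-tail quantile is minus the lower-tail one,
    # which avoids computing 1 - p for very small p.
    z = -NormalDist().inv_cdf(miss_probability(threshold))
    return mean_ms + z * std_ms


print(miss_probability(8.0))
print(round(elapsed_at_threshold(8.0, 1000.0, 100.0)))
```

With the default threshold of 8.0, a mean of 1000 ms and the 100 ms standard deviation floor, the node becomes unavailable after roughly 1.56 seconds of silence. Raising the threshold or the standard deviation floor lengthens that delay.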