Failure Detector
casty.PhiAccrualFailureDetector
Phi accrual failure detector (Hayashibara et al.).
Outputs a continuous suspicion level (phi) instead of a binary alive/dead
signal. phi = -log10(1 - CDF(elapsed)), where elapsed is the time since the
node's last heartbeat and CDF is the cumulative distribution function of the
normal distribution fitted to the observed heartbeat interval history.
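As a rough illustration of the formula above, the following sketch evaluates phi for a given elapsed time using the standard-library `statistics.NormalDist`. The helper name `phi_from_elapsed` and the sample mean and standard deviation are assumptions chosen for illustration; this is not casty's implementation.

```python
import math
from statistics import NormalDist


def phi_from_elapsed(elapsed_ms: float, mean_ms: float, std_ms: float) -> float:
    """Evaluate phi = -log10(1 - CDF(elapsed)) for a normal distribution.

    Hypothetical helper for illustration; not part of casty.
    """
    cdf = NormalDist(mu=mean_ms, sigma=std_ms).cdf(elapsed_ms)
    p_later = 1.0 - cdf  # probability that a heartbeat arrives even later
    if p_later <= 0.0:
        # The tail probability underflowed; treat suspicion as unbounded.
        return math.inf
    return -math.log10(p_later)


# Assumed history: heartbeats about every 1000 ms with 100 ms deviation.
on_time = phi_from_elapsed(1000.0, mean_ms=1000.0, std_ms=100.0)
late = phi_from_elapsed(1500.0, mean_ms=1000.0, std_ms=100.0)
```

Under these assumed values, a heartbeat that is exactly on time yields a phi of about 0.3, and phi grows quickly once the delay exceeds several standard deviations.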
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Phi value above which a node is considered unreachable. Higher values tolerate more jitter but detect failures more slowly. | `8.0` |
| `max_sample_size` | `int` | Maximum number of heartbeat intervals to keep per node. | `200` |
| `min_std_deviation_ms` | `float` | Floor for the standard deviation estimate, preventing overly aggressive detection when intervals are very stable. | `100.0` |
| `acceptable_heartbeat_pause_ms` | `float` | Additional grace period added to the mean estimate, accounting for expected pauses (e.g. GC). | `0.0` |
| `first_heartbeat_estimate_ms` | `float` | Assumed mean interval before enough samples have been collected. | `1000.0` |
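To make the interplay of these parameters more concrete, here is a minimal sketch of how a detector of this kind could derive its mean and standard deviation estimates. The function `estimate_distribution` is hypothetical, the bounded `deque` merely stands in for `max_sample_size`, and the way casty combines these values internally (including when it stops using the first-heartbeat estimate) is an assumption made for illustration only.

```python
from collections import deque
from statistics import fmean, pstdev


def estimate_distribution(
    intervals_ms: deque[float],
    *,
    min_std_deviation_ms: float = 100.0,
    acceptable_heartbeat_pause_ms: float = 0.0,
    first_heartbeat_estimate_ms: float = 1000.0,
) -> tuple[float, float]:
    """Return an assumed (mean, std) pair for the phi calculation."""
    if not intervals_ms:
        # Assumption: with no history, fall back to the configured estimate.
        mean = first_heartbeat_estimate_ms
        std = min_std_deviation_ms
    else:
        mean = fmean(intervals_ms)
        # The floor keeps very regular heartbeats from producing a tiny std.
        std = max(pstdev(intervals_ms), min_std_deviation_ms)
    # The grace period shifts the expected arrival time later.
    return mean + acceptable_heartbeat_pause_ms, std


# A bounded history: the oldest interval is dropped once maxlen is reached.
history: deque[float] = deque(maxlen=3)  # stands in for max_sample_size
for interval in (1000.0, 1010.0, 990.0, 1000.0):
    history.append(interval)

mean_ms, std_ms = estimate_distribution(history, acceptable_heartbeat_pause_ms=500.0)
```

With these assumed inputs the result is a mean of 1500.0 ms and a standard deviation of 100.0 ms: the measured deviation of roughly 8 ms is raised to the floor, and the 500 ms pause is added to the 1000 ms mean.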
Examples:
>>> fd = PhiAccrualFailureDetector(threshold=8.0)
>>> fd.heartbeat("node-1")
>>> fd.is_available("node-1")
True
>>> fd.phi("unknown-node")
0.0
tracked_nodes
property
Return the set of node keys that have received at least one heartbeat.
Returns:
| Type | Description |
|---|---|
| `frozenset[str]` | Node identifiers currently being tracked. |
__init__(*, threshold=8.0, max_sample_size=200, min_std_deviation_ms=100.0, acceptable_heartbeat_pause_ms=0.0, first_heartbeat_estimate_ms=1000.0)
heartbeat(node)
Record arrival of a heartbeat from a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Identifier of the node that sent the heartbeat. | required |
phi(node)
Calculate the suspicion level for node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Identifier of the node to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The phi value. |
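Because phi is a negative base-10 logarithm, a returned value can be read back as a tail probability under the fitted model. The sketch below simply inverts the formula from the class description; `tail_probability` is a hypothetical helper, not a casty function.

```python
def tail_probability(phi: float) -> float:
    """Invert phi = -log10(p): the modelled chance a heartbeat still arrives later."""
    return 10.0 ** (-phi)


# 8.0 is the default threshold documented on this page.
p_at_default = tail_probability(8.0)
```

Read this way, the default threshold of 8.0 corresponds to the fitted distribution assigning roughly a one-in-100-million chance that the heartbeat is merely late.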
remove(node)
Stop tracking a node entirely.
Called when a node is marked down so it no longer accumulates
stale history in the failure detector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Identifier of the node to remove. | required |
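One place `remove` might be called is a periodic sweep over the tracked nodes. The sketch below relies only on the members documented on this page (`tracked_nodes`, `is_available`, `remove`); the helper `prune_unavailable` and the `StubDetector` stand-in are hypothetical and exist only so the example runs without a live cluster.

```python
def prune_unavailable(detector) -> list[str]:
    """Remove every tracked node the detector reports as unavailable.

    Hypothetical helper; `detector` is any object exposing the
    `tracked_nodes`, `is_available`, and `remove` members documented here.
    """
    removed = []
    # sorted() copies the keys, so removing during the loop is safe.
    for node in sorted(detector.tracked_nodes):
        if not detector.is_available(node):
            detector.remove(node)
            removed.append(node)
    return removed


class StubDetector:
    """Stand-in with fixed phi values, used only to exercise the helper."""

    def __init__(self, phis: dict[str, float], threshold: float = 8.0) -> None:
        self._phis = dict(phis)
        self._threshold = threshold

    @property
    def tracked_nodes(self) -> frozenset[str]:
        return frozenset(self._phis)

    def is_available(self, node: str) -> bool:
        return self._phis.get(node, 0.0) < self._threshold

    def remove(self, node: str) -> None:
        self._phis.pop(node, None)


stub = StubDetector({"node-1": 0.4, "node-2": 12.5})
removed = prune_unavailable(stub)
```

In this run only `node-2` exceeds the stub's threshold, so it is the only node removed and `node-1` stays tracked.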
is_available(node)
Check if a node is considered available (phi below threshold).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node` | `str` | Identifier of the node to check. | required |
Returns:
| Type | Description |
|---|---|
| `bool` | `True` if the node's phi is below the threshold, `False` otherwise. |