...
- Track the last time stamp any message was received on a local NI
- If the NI hasn't received any traffic for a period of router_ping_timeout + MAX(live_router_check_interval, dead_router_check_interval) then it's marked down.
  - This is done so that other nodes using the gateway can mark the route down, given that avoid_asym_router_failure is set to 1.
- Do not send messages to a peer NI which is marked down.
- Set the peer status to up when messages are received.
- Query a peer NI which is marked down a minimum of once per second when there is traffic destined to it. If the query result determines that the peer NI is reachable, the peer NI state is set to UP. Messages can then be sent to that peer NI.
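The local NI timeout and the peer NI query rule above can be sketched in user-space C as follows. This is only an illustration under stated assumptions, not the LNet implementation: the struct and function names (all prefixed sketch_) are hypothetical, the tunables are passed in rather than read from module parameters, and "once per second" is read as one query per elapsed second while traffic is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical container for the three tunables named in the text. */
struct sketch_tunables {
    time_t router_ping_timeout;
    time_t live_router_check_interval;
    time_t dead_router_check_interval;
};

/* Hypothetical local NI bookkeeping: time stamp of the last received message. */
struct sketch_local_ni {
    time_t last_rx;
};

/* Hypothetical peer NI bookkeeping for the query rule. */
struct sketch_peer_ni {
    bool   up;          /* false while the peer NI is marked down */
    time_t last_query;  /* last time a query was issued */
};

/* router_ping_timeout + MAX(live_router_check_interval, dead_router_check_interval) */
static time_t sketch_silence_limit(const struct sketch_tunables *t)
{
    time_t max_interval =
        t->live_router_check_interval > t->dead_router_check_interval
            ? t->live_router_check_interval
            : t->dead_router_check_interval;

    return t->router_ping_timeout + max_interval;
}

/* A local NI is considered down once it has been silent for the full period. */
static bool sketch_ni_is_down(const struct sketch_local_ni *ni,
                              const struct sketch_tunables *t, time_t now)
{
    assert(now >= ni->last_rx);
    return now - ni->last_rx >= sketch_silence_limit(t);
}

/*
 * Called when a message is destined to the peer NI. Returns true if a
 * query should be issued now: the peer is down and at least one second
 * has passed since the previous query.
 */
static bool sketch_should_query(struct sketch_peer_ni *p, time_t now)
{
    if (p->up || now - p->last_query < 1)
        return false;
    p->last_query = now;
    return true;
}

/* A reachable result sets the peer NI state to UP so messages can be sent again. */
static void sketch_query_done(struct sketch_peer_ni *p, bool reachable)
{
    if (reachable)
        p->up = true;
}
```

With a router_ping_timeout of 10 and check intervals of 5 and 20 (values chosen only for the example), the NI is treated as down after 30 seconds without traffic.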
Gateway Requirements
A gateway in this context is the peer NI created when adding a route on a node. For example: lnetctl route add --net tcp --gateway <gateway-NID>.
Dealing with that peer-NI is somewhat of a special case.
...
- Keep track of the last time the peer was alive, lpni_last_alive
- Keep track of the last time the peer was notified that its state has changed, lpni_timestamp
- The peer can change state under the following conditions:
  - The LND notifies that the peer is down when it fails to send a message to the peer.
    - As an example in o2iblnd:
      - kiblnd_peer_connect_failed() and kiblnd_disconnect_conn() call kiblnd_peer_notify() which calls lnet_notify() to set the peer to dead if there was an error
  - A message is received in lnet_parse()
    - In this case the peer state is set to alive
    - only for gateway peer NIs
  - When the router checker ping is responded to or it fails.
  - If the router checker ping times out.
- This step only concerns routers: only send the message if the peer is alive, determined as outlined above.
- On the router, if the NI hasn't received any traffic for a period of router_ping_timeout + MAX(live_router_check_interval, dead_router_check_interval) then it's marked down.
  - This is done so that the peers using the router mark the peer down when avoid_asym_router_failure is set to 1, which it is by default.
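The state changes listed above can be pictured as a small event handler. The following is a hedged user-space sketch, not the LNet code: only lpni_last_alive and lpni_timestamp come from the text, while the struct, the enum and the function names are hypothetical. It assumes that a received message marks only gateway peer NIs alive, that a failed or timed-out router checker ping marks the peer dead, and that lpni_timestamp is updated whenever the state actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical peer NI record; the two time stamps are named in the text. */
struct sketch_lpni {
    bool   alive;
    bool   is_gateway;
    time_t lpni_last_alive;  /* last time the peer was alive */
    time_t lpni_timestamp;   /* last time a state change was recorded */
};

/* Hypothetical event names for the conditions listed above. */
enum sketch_event {
    SKETCH_LND_SEND_FAILED,  /* LND failed to send to the peer */
    SKETCH_MSG_RECEIVED,     /* message received from the peer */
    SKETCH_PING_REPLIED,     /* router checker ping was answered */
    SKETCH_PING_FAILED,      /* router checker ping failed */
    SKETCH_PING_TIMED_OUT    /* router checker ping timed out */
};

static void sketch_peer_event(struct sketch_lpni *p, enum sketch_event ev,
                              time_t now)
{
    bool new_state;

    assert(p != NULL);

    switch (ev) {
    case SKETCH_MSG_RECEIVED:
        /* Assumption: receive traffic only revives gateway peer NIs. */
        if (!p->is_gateway)
            return;
        new_state = true;
        break;
    case SKETCH_PING_REPLIED:
        new_state = true;
        break;
    default:
        /* Send failure, ping failure and ping timeout all mark the peer dead. */
        new_state = false;
        break;
    }

    if (new_state)
        p->lpni_last_alive = now;
    if (new_state != p->alive) {
        p->alive = new_state;
        p->lpni_timestamp = now;
    }
}

/* On a router, a message is only sent if the peer is currently alive. */
static bool sketch_router_may_send(const struct sketch_lpni *p)
{
    return p->alive;
}
```

Keeping the two time stamps separate lets a caller distinguish "when was this peer last seen alive" from "when did its state last flip", which is how the two fields are described above.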