...

On a transmit timeout, kiblnd notifies LNet that the peer has closed due to an error. This goes through the lnet_notify path.
The peer aliveness at the LNet layer is set to 0 (dead), and the last alive
In the IB LND (kiblnd), the last alive time of the peer is set whenever a message is received or transmitted successfully, or a connection attempt completes (whether it succeeds or is rejected).
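The bookkeeping described above can be sketched in C as follows. This is a minimal illustration only, assuming a hypothetical peer_state structure and hypothetical helpers peer_record_alive() and peer_notify_dead(); the real kiblnd and LNet peer structures, and the functions on the lnet_notify path, differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified per-peer state (not the real LNet structure). */
struct peer_state {
	bool   alive;        /* aliveness as seen by the LNet layer */
	time_t last_alive;   /* last time the LND saw evidence the peer is up */
	time_t notify_time;  /* time of the last LND notification */
};

/* Hypothetical hook: called by the LND after a successful receive, a
 * successful transmit, or a completed connection attempt (accepted or
 * rejected). Any of these shows the peer is reachable. */
static void peer_record_alive(struct peer_state *p, time_t now)
{
	assert(p != NULL);
	p->last_alive = now;
}

/* Hypothetical model of the lnet_notify path on a transmit timeout:
 * the LND reports the peer as dead at time 'when'. Note that the last
 * alive time is left untouched. */
static void peer_notify_dead(struct peer_state *p, time_t when)
{
	assert(p != NULL);
	p->alive = false;
	p->notify_time = when;
}
```

Keeping the notification time separate from the last alive time is what lets the aliveness check compare the two and decide which event is more recent.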
At the LNet layer, whenever a message is sent to a peer, LNet checks whether that peer is alive. For a non-router node, however, lnet_peer_aliveness_enabled() always returns 0, so the check is skipped. When aliveness is enabled, a peer is judged as follows:
- If the peer is marked dead and the LND notified LNet of its death at a time X later than the peer's last known alive time, the peer is considered dead.
- Otherwise, the peer is considered alive if fewer than peer_timeout seconds have passed since it was last known to be alive.
- If peer_timeout has elapsed, the peer is considered dead.
  - The problem is that once peer_timeout has elapsed, this peer will never be retried again.
- The check is gated by lnet_peer_aliveness_enabled():

      #define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing != 0 && \
                                               ((lp)->lpni_net) && \
                                               (lp)->lpni_net->net_tunables.lct_peer_timeout > 0)

- In effect, the aliveness of the peer is not considered at all if the node is not a router.
  - This behavior can remain unchanged, since the peer's health is already considered in lnet_select_pathway() before this check is reached.
  - In fact, if peer health is evaluated in lnet_select_pathway(), the aliveness logic in lnet_post_send_locked() can be removed: by the time the flow reaches lnet_post_send_locked(), the selected peer will always be the healthiest one available.
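The aliveness rules listed above can be sketched in C. This is a minimal, non-authoritative illustration that assumes a hypothetical peer_state structure and hypothetical helpers (aliveness_enabled(), peer_is_alive(), can_send_to_peer()); it models the behavior described on this page rather than reproducing the actual LNet code. The gate mirrors lnet_peer_aliveness_enabled(): only a router with a positive peer_timeout tracks aliveness.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified per-peer state (not the real LNet structure). */
struct peer_state {
	bool   alive;        /* cleared when the LND reports the peer dead */
	time_t last_alive;   /* last evidence of life seen by the LND */
	time_t notify_time;  /* time of the last LND notification */
};

/* Mirrors the gate: aliveness only matters on a router whose
 * peer_timeout is positive. */
static bool aliveness_enabled(bool is_router, int peer_timeout)
{
	return is_router && peer_timeout > 0;
}

static bool peer_is_alive(const struct peer_state *p, time_t now,
			  int peer_timeout)
{
	assert(p != NULL);
	/* Rule 1: marked dead by a notification newer than the last
	 * known alive time. */
	if (!p->alive && p->notify_time > p->last_alive)
		return false;
	/* Rules 2 and 3: alive only until peer_timeout seconds have
	 * passed since the last known alive time. */
	return now - p->last_alive < peer_timeout;
}

/* Hypothetical send-side check. */
static bool can_send_to_peer(const struct peer_state *p, time_t now,
			     bool is_router, int peer_timeout)
{
	if (!aliveness_enabled(is_router, peer_timeout))
		return true;	/* non-router: the peer is always tried */
	return peer_is_alive(p, now, peer_timeout);
}
```

Nothing in this check refreshes last_alive. A router that stops sending to a peer once peer_timeout has elapsed therefore never gathers new evidence that the peer is up, which is the retry problem noted above.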
If the node is not a router, a peer will always be tried regardless of its health. If it is a router then

Health Revisited

There are different scenarios to consider with Health:

...


Versions Compared

Old Version 11

New Version 12
