...

  1. Track the last time stamp any message was received on a local NI
  2. If the NI has not received any traffic for a period of router_ping_timeout + MAX(live_router_check_interval, dead_router_check_interval), it is marked down.
    1. This is done so that other nodes using the gateway can mark the route down, given that avoid_asym_router_failure is set to 1.
  3. Do not send messages to a peer NI which is marked down.
  4. Set the peer status to up when messages are received.
  5. Query a peer NI that is marked down a minimum of once per second when there is traffic destined to it. If the query determines that the peer NI is reachable, its state is set to UP, and messages can then be sent to that peer NI.
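
The steps above can be sketched in a few lines of Python. This is only an illustrative model, not LNet code: the class and method names (RouterTunables, LocalNI, PeerNI, should_query) are hypothetical, the tunable values are placeholders, and spacing the queries one second apart is an assumption about how step 5 would be enforced.

```python
from dataclasses import dataclass


@dataclass
class RouterTunables:
    # Field names mirror the tunables in the text; the values are illustrative.
    router_ping_timeout: float = 50.0
    live_router_check_interval: float = 60.0
    dead_router_check_interval: float = 60.0

    def ni_down_after(self) -> float:
        """Idle period after which a local NI is marked down (step 2)."""
        return self.router_ping_timeout + max(
            self.live_router_check_interval, self.dead_router_check_interval
        )


@dataclass
class LocalNI:
    last_rx: float  # time stamp of the last message received (step 1)

    def on_receive(self, now: float) -> None:
        self.last_rx = now

    def is_up(self, now: float, tun: RouterTunables) -> bool:
        # Down once the NI has been silent for the full period.
        return (now - self.last_rx) < tun.ni_down_after()


@dataclass
class PeerNI:
    up: bool = True
    last_query: float = float("-inf")

    def can_send(self) -> bool:
        # Step 3: never send to a peer NI that is marked down.
        return self.up

    def on_receive(self) -> None:
        # Step 4: a received message marks the peer NI up.
        self.up = True

    def should_query(self, now: float, interval: float = 1.0) -> bool:
        """Step 5: decide whether to query a down peer NI that has traffic
        destined to it. The one-second spacing is an assumption."""
        if self.up or (now - self.last_query) < interval:
            return False
        self.last_query = now
        return True
```

With router_ping_timeout = 50, live_router_check_interval = 60 and dead_router_check_interval = 30, the model marks a local NI down after 110 seconds without traffic.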

Gateway Requirements

A gateway in this context is the peer NI created when adding a route on a node. For example: lnetctl route add --net tcp --gateway <gateway-NID>. Dealing with that peer-NI is somewhat of a special case.

...

  1. Keep track of the last time the peer was alive, lpni_last_alive
  2. Keep track of the last time the peer was notified that its state has changed, lpni_timestamp
    1. The peer can change state under the following conditions:
      1. The LND notifies that the peer is down when it fails to send a message to the peer.
        1. As an example in o2iblnd:
          1. kiblnd_peer_connect_failed() and kiblnd_disconnect_conn() call kiblnd_peer_notify() which calls lnet_notify() to set the peer to dead if there was an error
      2. A message is received in lnet_parse()
        1. In this case the peer state is set to alive
        The peer has been dead for longer than the configured peer timeout and its status has not been updated while receiving or sending messages. In other words, the system came up and stayed idle for longer than the configured peer timeout. In this case, set the peer state to alive.
        1. only for gateway peer NIs
      3. When the router checker ping is responded to or it fails.
      4. If the router checker ping times out.
  3. This step only concerns routers: only send the message if the peer is alive, as determined above.
  4. On the router, if the NI has not received any traffic for a period of router_ping_timeout + MAX(live_router_check_interval, dead_router_check_interval), it is marked down.
    1. This allows the peers using the router to mark the peer down when avoid_asym_router_failure is set to 1, which it is by default.
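
As a rough illustration of the bookkeeping described in this list, the following Python sketch models the two time stamps and the events that change peer state. It is a simplified model under stated assumptions, not the LNet implementation: PeerState, its methods and router_may_send are hypothetical names, and ignoring notifications older than lpni_timestamp is an assumption about how stale events would be handled.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    alive: bool = True
    lpni_last_alive: float = 0.0  # last time the peer was known to be alive
    lpni_timestamp: float = 0.0   # last time a state notification was applied

    def notify(self, alive: bool, when: float) -> bool:
        """Apply a state notification and report whether it was applied.
        Dropping notifications older than lpni_timestamp is an assumption."""
        if when < self.lpni_timestamp:
            return False
        self.lpni_timestamp = when
        self.alive = alive
        if alive:
            self.lpni_last_alive = when
        return True

    def on_lnd_send_failure(self, now: float) -> bool:
        # The LND reports that it failed to send a message to the peer.
        return self.notify(False, now)

    def on_message_received(self, now: float) -> bool:
        # A message from the peer was received.
        return self.notify(True, now)

    def on_router_ping(self, now: float, replied: bool) -> bool:
        # Router checker ping: a reply means alive, a failure or timeout dead.
        return self.notify(replied, now)


def router_may_send(peer: PeerState) -> bool:
    """On a router, only send the message if the peer is alive."""
    return peer.alive
```

Keeping lpni_last_alive separate from lpni_timestamp lets the model answer both "when was this peer last seen alive" and "when did we last change its state", which is the distinction the first two items draw.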