Original Pre-Health Requirements

...

  1. Keep track of the last time the peer was alive: lpni_last_alive
  2. Keep track of the last time the peer was notified that its state has changed: lpni_timestamp
    1. The peer can change state under the following conditions:
      1. The LND notifies that the peer is down when it fails to send a message to the peer.
        1. As an example in o2iblnd:
          1. kiblnd_peer_connect_failed() and kiblnd_disconnect_conn() call kiblnd_peer_notify(), which calls lnet_notify() to set the peer to dead if there was an error.
      2. A message is received in lnet_parse().
        1. In this case, the peer state is set to alive, but only for gateway peer NIs.
      3. When the router checker ping receives a response or fails.
      4. If the router checker ping times out.
  3. This step only concerns routers: only send the message if the peer is alive, as determined above.
  4. On the router, if the NI hasn't received any traffic for a period of router_ping_timeout + MAX(live_router_check_interval, dead_router_check_interval), it is marked down.
    1. This allows the peers using the router to mark it down when avoid_asym_router_failure is set to 1, which is the default.
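The pre-health bookkeeping above can be sketched in C as follows. Only lpni_last_alive, lpni_timestamp and the tunable names come from the text; the struct, the function names (peer_ni_notify, peer_ni_rx, router_may_send, router_ni_is_down) and the use of a strict ">" for the timeout are hypothetical simplifications, not the actual LNet code.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified peer NI; only the lpni_* names come from the text. */
struct peer_ni_state {
	time_t lpni_last_alive; /* last time the peer was seen alive */
	time_t lpni_timestamp;  /* last time a state-change notification arrived */
	bool   alive;
	bool   is_gateway;      /* true if this peer NI is a router */
};

/* Models the effect of lnet_notify(): an LND send failure reports
 * alive == false; a router checker ping reply, failure or timeout
 * reports the corresponding state. */
static void peer_ni_notify(struct peer_ni_state *lp, bool alive, time_t now)
{
	if (alive)
		lp->lpni_last_alive = now;
	lp->alive = alive;
	lp->lpni_timestamp = now;
}

/* Models lnet_parse(): receiving a message marks the peer alive,
 * but only for gateway peer NIs. */
static void peer_ni_rx(struct peer_ni_state *lp, time_t now)
{
	if (lp->is_gateway)
		peer_ni_notify(lp, true, now);
}

/* Step 3: only routers are gated on the alive state. */
static bool router_may_send(const struct peer_ni_state *lp)
{
	return !lp->is_gateway || lp->alive;
}

/* Step 4: a router NI with no traffic for
 * router_ping_timeout + MAX(live, dead check interval) is marked down. */
static bool router_ni_is_down(time_t last_rx, time_t now,
			      int router_ping_timeout,
			      int live_router_check_interval,
			      int dead_router_check_interval)
{
	int interval = live_router_check_interval > dead_router_check_interval ?
		       live_router_check_interval : dead_router_check_interval;

	return now - last_rx > router_ping_timeout + interval;
}
```

For example, with router_ping_timeout = 50, live_router_check_interval = 60 and dead_router_check_interval = 30, an NI that last saw traffic at t = 0 is still up at t = 100 and is marked down at t = 111.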

LNet Multi-Rail Routing

Multi-Rail introduced the concept of a peer and a peer NI. A peer can have multiple peer NIs. This changes the semantics of route configuration. Currently a route can be configured as:

...

Nodes on different networks will use different primary NIDs to refer to the same router. In other words, a primary NID is only the representation of the router on the peer that has the route configured.
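A minimal sketch of the peer / peer NI relationship, assuming a NID can be reduced to an integer: every NID the router owns resolves to the same peer object, so a route's gateway is effectively a peer, and the primary NID recorded on a given node is only that node's label for it. The nid_t type, struct peer and peer_lookup() are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_t; /* hypothetical: a NID reduced to an integer */

#define MAX_PEER_NIS 4

struct peer {
	nid_t  primary_nid;            /* this node's representation of the peer */
	nid_t  peer_nis[MAX_PEER_NIS]; /* all NIDs known for the peer */
	size_t n_peer_nis;
};

/* Return the peer that owns nid, or NULL. Any of a router's NIDs
 * resolves to the same peer, regardless of which one is primary here. */
static struct peer *peer_lookup(struct peer *peers, size_t n, nid_t nid)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < peers[i].n_peer_nis; j++)
			if (peers[i].peer_nis[j] == nid)
				return &peers[i];
	return NULL;
}
```

A node on another network could hold the same three NIDs with a different primary_nid; lookups by any of them would still reach the one peer.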

Multi-Rail Router Requirements

  1. Do not put a message on the wire if the health of a peer_ni is below MAX_HEALTH * rtr_sensitivity_percentage
  2. Attempt to recover an unhealthy peer_ni once per second by pinging it
  3. LND shall notify LNet whenever it determines a peer_ni is alive or dead. That will result in the adjustment of the peer_ni's health value.
  4. LNet shall call an LND API to notify that a peer_ni is dead whenever the peer_ni's health goes below MAX_HEALTH * rtr_sensitivity_percentage
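The four router requirements can be sketched as follows, reading rtr_sensitivity_percentage as a percentage (threshold = MAX_HEALTH * pct / 100). The value 1000 for MAX_HEALTH, the struct and all function names are assumptions, and the policy that an LND "alive" notification restores full health is a hypothetical choice; the text only says the health value is adjusted.

```c
#include <stdbool.h>
#include <time.h>

#define MAX_HEALTH 1000 /* assumed maximum health value */

struct rtr_peer_ni {
	int    health;        /* 0..MAX_HEALTH */
	time_t last_recovery; /* last time a recovery ping was sent */
};

static int health_threshold(int rtr_sensitivity_percentage)
{
	return MAX_HEALTH * rtr_sensitivity_percentage / 100;
}

/* Requirement 1: only put a message on the wire for a healthy peer_ni. */
static bool peer_ni_may_send(const struct rtr_peer_ni *lpni, int pct)
{
	return lpni->health >= health_threshold(pct);
}

/* Requirement 2: ping an unhealthy peer_ni at most once per second. */
static bool peer_ni_needs_recovery_ping(const struct rtr_peer_ni *lpni,
					int pct, time_t now)
{
	return !peer_ni_may_send(lpni, pct) && now - lpni->last_recovery >= 1;
}

/* Requirement 3: an LND alive/dead notification adjusts health
 * (hypothetical policy: alive resets to MAX_HEALTH, dead subtracts
 * the sensitivity, floored at 0).
 * Requirement 4: returns true when health has just dropped below the
 * threshold, i.e. when LNet should tell the LND the peer_ni is dead. */
static bool peer_ni_lnd_notify(struct rtr_peer_ni *lpni, bool alive,
			       int sensitivity, int pct)
{
	bool was_ok = peer_ni_may_send(lpni, pct);

	if (alive)
		lpni->health = MAX_HEALTH;
	else
		lpni->health = lpni->health > sensitivity ?
			       lpni->health - sensitivity : 0;

	return was_ok && !peer_ni_may_send(lpni, pct);
}
```

Reporting only the healthy-to-unhealthy transition keeps the LND from being told about the same dead peer_ni on every subsequent failure.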

Multi-Rail Route Requirements

  1. A route is considered down if there are no viable peer_nis on the remote net of the gateway
    1. Example: if a route is defined as lnetctl route add --net tcp2 --gateway 10.10.10.3@tcp, and 10.10.10.3@tcp has no healthy peer_nis on tcp2, then that route is dead.
  2. A gateway is considered down under either of two circumstances:
    1. All remote nets reported in the REPLY to the PING are down
    2. All local representations of the peer_nis on the remote net have a health value below MAX_HEALTH * rtr_sensitivity_percentage
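A sketch of the route and gateway rules above, assuming nets are small integers (for example tcp = 1, tcp2 = 2), that the PING REPLY has already been decoded into one up/down flag per remote net, and that MAX_HEALTH is 1000. The structs and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HEALTH 1000 /* assumed maximum health value */

struct gw_peer_ni {
	unsigned int net; /* hypothetical net id this peer_ni lives on */
	int health;
};

struct gateway {
	unsigned int local_net;            /* net the gateway is reached on */
	const struct gw_peer_ni *nis;      /* local representations of its peer_nis */
	size_t n_nis;
	const bool *remote_net_up;         /* per remote net, from the PING REPLY */
	size_t n_remote_nets;
};

static bool ni_viable(const struct gw_peer_ni *ni, int pct)
{
	return ni->health >= MAX_HEALTH * pct / 100;
}

/* Rule 1: a route to remote_net is up only if the gateway has at
 * least one viable peer_ni on that net. */
static bool route_is_up(const struct gateway *gw, unsigned int remote_net, int pct)
{
	for (size_t i = 0; i < gw->n_nis; i++)
		if (gw->nis[i].net == remote_net && ni_viable(&gw->nis[i], pct))
			return true;
	return false;
}

/* Rule 2: the gateway is down if every remote net in the PING REPLY
 * is down, or every peer_ni on a remote net is below the threshold. */
static bool gateway_is_down(const struct gateway *gw, int pct)
{
	bool any_net_up = false, any_ni_viable = false;

	for (size_t i = 0; i < gw->n_remote_nets; i++)
		if (gw->remote_net_up[i])
			any_net_up = true;
	for (size_t i = 0; i < gw->n_nis; i++)
		if (gw->nis[i].net != gw->local_net && ni_viable(&gw->nis[i], pct))
			any_ni_viable = true;

	return !any_net_up || !any_ni_viable;
}
```

In the lnetctl example, 10.10.10.3@tcp would have local_net = tcp, and the route for tcp2 is dead as long as route_is_up(gw, tcp2, pct) returns false.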

Configuration

A router can be configured as follows to use the new health infrastructure:

  1. lnet_health_sensitivity = 1 ## this decrements the health of the NI by 1 every time a send to that interface fails
  2. router_sensitivity_percentage = 100 ## this considers the route down if the NI's health is lower than LNET_MAX_HEALTH_VALUE
  3. Optionally, set retry_count > 0 ## this attempts to resend a message on a different NI if one is available
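The effect of the first two settings can be worked through in a few lines of C. This assumes LNET_MAX_HEALTH_VALUE is 1000, that each failure subtracts exactly lnet_health_sensitivity, and that health is floored at 0; the helper names are hypothetical. With sensitivity 1 and percentage 100, a single failure (1000 to 999) is enough to mark the route down, whereas a percentage of 50 would tolerate 500 failures.

```c
#include <stdbool.h>

#define LNET_MAX_HEALTH_VALUE 1000 /* assumed value */

/* Health after a number of send failures, each subtracting the
 * sensitivity, floored at 0. */
static int health_after_failures(int lnet_health_sensitivity, int failures)
{
	int h = LNET_MAX_HEALTH_VALUE - lnet_health_sensitivity * failures;

	return h < 0 ? 0 : h;
}

/* The route is considered down once health falls below
 * LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage / 100. */
static bool route_down(int health, int router_sensitivity_percentage)
{
	return health < LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage / 100;
}
```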

Route Selection

Currently, a route is selected based on its priority and hops values; after that, the credits for the peer NI are evaluated. With Multi-Rail, there should be two evaluation factors in the selection process.
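The current ordering (priority, then hops, then peer NI credits) can be sketched as a comparator. The assumptions here are that a lower priority value and fewer hops are preferred and that more available credits win the final tie-break; struct route_cand, route_better() and select_route() are hypothetical names. This covers only the existing selection, not the Multi-Rail factors.

```c
#include <stdbool.h>
#include <stddef.h>

struct route_cand {
	unsigned int priority; /* lower value preferred (assumed) */
	unsigned int hops;     /* fewer hops preferred */
	int credits;           /* peer NI credits; more preferred */
};

/* True if route a should be chosen over route b. */
static bool route_better(const struct route_cand *a, const struct route_cand *b)
{
	if (a->priority != b->priority)
		return a->priority < b->priority;
	if (a->hops != b->hops)
		return a->hops < b->hops;
	return a->credits > b->credits;
}

/* Linear scan for the best candidate; NULL if there are none. */
static const struct route_cand *select_route(const struct route_cand *r, size_t n)
{
	const struct route_cand *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || route_better(&r[i], best))
			best = &r[i];
	return best;
}
```

Given candidates {priority 1, 2 hops, 5 credits}, {0, 3, 1} and {0, 3, 8}, the scan picks the last one: priority 0 beats 1, the hop counts tie, and 8 credits beat 1.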

...