...

Code already exists to handle the reply to the ping. When a REPLY is received, a function is called to analyze the NIs listed in the REPLY. The analysis checks the status of each peer interface. If the asymmetric router failure setting is enabled and any interface listed in the REPLY is down, the gateway is marked down.
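The REPLY analysis described above can be sketched as follows. This is a minimal illustration, not the LNet implementation: `ReplyNI`, `gateway_alive`, and the `asym_failure_check` flag are hypothetical names, and the sketch assumes that a gateway which answers the ping is considered up when the asymmetric failure check is disabled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplyNI:
    """One interface entry from a ping REPLY (hypothetical structure)."""
    nid: str
    up: bool


def gateway_alive(reply_nis: List[ReplyNI], asym_failure_check: bool) -> bool:
    """Decide whether the gateway stays up after a REPLY is received.

    Assumption: with the check disabled, receiving any REPLY is enough
    to consider the gateway alive.
    """
    if not asym_failure_check:
        return True
    # With the check enabled, a single down interface marks the gateway down.
    return all(ni.up for ni in reply_nis)
```

Note that this is the all-or-nothing gateway decision that the per-route approach discussed later is meant to replace.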

...

  1. Setting sensitivity to a value greater than 0 enables health tracking for the peer NIs.
  2. When sending messages, the healthiest peer NI is selected. This reduces the need to drop messages before they are passed to the LND.
    1. The current code has binary UP/DOWN behavior with no finer granularity. Is that a requirement?
      1. After discussion with Cray, this appears to have been introduced for Cray's gni network, which has health built into it. When the gnilnd reports that a peer is alive, the peer is known to be alive; likewise when it reports that the peer is dead.
      2. This has been considered in the requirements outlined under the "Multi-Rail Router Requirements" section.
  3. Unhealthy local and peer NIs are placed on their respective queues for recovery. For routers, this takes the place of querying peer NIs. Extra functionality can be added to remove a peer NI from the recovery queue if it has not been used for a peer_timeout length of time.
  4. There will be no need to ping a gateway peer NI separately to determine if it's back up. This will be done by the health code.
  5. When selecting a gateway peer NI, the health of the interfaces will be considered. If the intent is to avoid a gateway peer_ni that is less than fully healthy, a configuration parameter can be added to control that: router_sensitivity_percentage, described above.
  6. When a message is received or sent successfully on a peer_ni, that NI's health value is incremented, making it more likely to be used.
  7. If a router does not receive a message on a local NI for a configured period of time, the status of that local NI is brought down. The change will be discovered when the router is pinged.
  8. Instead of marking a gateway as up or down, mark a route as up or down. You can have multiple routes through the same gateway. Depending on which interfaces are up on the gateway, a subset of these routes could function through the gateway.
  9. Allow multiple routes to the same remote network over multiple gateways.
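
The health-based behavior in the list above can be sketched as follows. This is an illustration under stated assumptions, not the LNet code: all class and function names are hypothetical, `MAX_HEALTH = 1000` and the increment of 1 per success are assumed values, `min_health_pct` stands in for router_sensitivity_percentage, and a route is assumed to be up when the gateway has at least one NI with non-zero health on both the local and the remote network.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_HEALTH = 1000  # assumed ceiling for a fully healthy NI


@dataclass(eq=False)  # compare NIs by identity, not by field values
class PeerNI:
    nid: str
    net: str
    health: int = MAX_HEALTH


@dataclass
class Gateway:
    peer_nis: List[PeerNI]
    recovery_queue: List[PeerNI] = field(default_factory=list)

    def record_failure(self, ni: PeerNI, sensitivity: int) -> None:
        """Lower health and queue the NI for recovery (items 1 and 3)."""
        if sensitivity <= 0:
            return  # health tracking is off
        ni.health = max(0, ni.health - sensitivity)
        if ni not in self.recovery_queue:
            self.recovery_queue.append(ni)

    def record_success(self, ni: PeerNI) -> None:
        """Raise health after a successful send or receive (item 6)."""
        ni.health = min(MAX_HEALTH, ni.health + 1)
        if ni.health == MAX_HEALTH and ni in self.recovery_queue:
            self.recovery_queue.remove(ni)

    def select_peer_ni(self, net: str, min_health_pct: int = 0) -> Optional[PeerNI]:
        """Pick the healthiest NI on `net` at or above the threshold (items 2 and 5)."""
        threshold = MAX_HEALTH * min_health_pct / 100
        candidates = [ni for ni in self.peer_nis
                      if ni.net == net and ni.health >= threshold]
        return max(candidates, key=lambda ni: ni.health, default=None)


def route_is_up(gw: Gateway, local_net: str, remote_net: str) -> bool:
    """Per-route status instead of per-gateway status (item 8)."""
    def usable(net: str) -> bool:
        return any(ni.net == net and ni.health > 0 for ni in gw.peer_nis)
    return usable(local_net) and usable(remote_net)
```

With this shape, a gateway that loses its only interface on one remote network still carries routes to the other networks it serves, which is the point of tracking status per route rather than per gateway.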

...