...
- find a router through which the dst_nid can be reached. Router selection already considers router health using the existing mechanism: currently a router is either alive or dead, as discovered via router pings and controlled by tunables such as asynchronous route failure.
- If this is a resend and the peer_ni is unhealthy, fail the send.
- If this is an original send, use the peer_ni even if it is not healthy.
- select the best_ni to send from by going through all the local NIs that can reach the router NID
- consider local_ni health in the selection by choosing the local_ni with the best health value.
- If this is a resend, do not select a local_ni that has already been used.
- send over that path
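The local NI part of the steps above can be sketched in C as follows. This is only an illustration under stated assumptions: `struct sketch_local_ni`, its fields, and `sketch_select_local_ni()` are hypothetical names invented for this sketch and are not part of the LNet code base. It also assumes that a higher health value means a healthier interface and that ties go to the first candidate found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a local NI (not the real LNet structure). */
struct sketch_local_ni {
    int  health;         /* assumption: higher value means healthier */
    bool reaches_router; /* can this NI reach the selected router NID? */
    bool already_used;   /* used by an earlier attempt of this message? */
};

/*
 * Return the index of the local NI to send from, or -1 if none qualifies.
 * Only NIs that can reach the router are considered. On a resend, NIs that
 * were already used for this message are skipped. Among the remaining
 * candidates the one with the best health value wins (first one on a tie).
 */
static int sketch_select_local_ni(const struct sketch_local_ni *nis,
                                  size_t count, bool is_resend)
{
    int best = -1;
    size_t i;

    assert(nis != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (!nis[i].reaches_router)
            continue;
        if (is_resend && nis[i].already_used)
            continue;
        if (best < 0 || nis[i].health > nis[best].health)
            best = (int)i;
    }
    return best;
}
```

A caller would treat a return value of -1 on a resend as "no unused path left" and fail the send; how that failure is reported upward is outside the scope of this sketch.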
(Olaf): trying to rewrite the above in a way that incorporates the single source to NMR destination requirement, and highlights commonalities in the logic flow
- find route to dst_nid
- find peer_ni of router
  - no issue if peer_ni is healthy
  - try this peer_ni even if it is unhealthy if this is the 1st attempt to send this message
  - fail if resending to an unhealthy peer_ni
- pick the preferred NI for the dst_nid if set
  - otherwise pick a healthy local NI and make it the preferred NI for this dst_nid
- send over this path
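A minimal C sketch of the two decisions in this flow: whether the router's peer_ni may be used, and which local NI to send from for a non-Multi-Rail destination. All names here (`struct sketch_dst`, `SKETCH_NO_NI`, `sketch_peer_ni_usable()`, `sketch_pick_local_ni()`) are hypothetical and exist only for illustration. The sketch assumes local NI health is a simple boolean and, as in the list above, always returns the preferred NI once it is set; what should happen if the preferred NI later becomes unhealthy is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NO_NI (-1)

/* Hypothetical per-destination state for a non-Multi-Rail dst_nid. */
struct sketch_dst {
    int preferred_ni; /* index of the preferred local NI, or SKETCH_NO_NI */
};

/*
 * A healthy peer_ni is always usable. An unhealthy peer_ni is tried only
 * on the first attempt; a resend to an unhealthy peer_ni fails.
 */
static bool sketch_peer_ni_usable(bool peer_ni_healthy, bool is_resend)
{
    return peer_ni_healthy || !is_resend;
}

/*
 * Return the preferred local NI for dst if one is set. Otherwise pick the
 * first healthy local NI, record it as the preferred NI for this dst, and
 * return it. Returns SKETCH_NO_NI if no healthy local NI exists.
 */
static int sketch_pick_local_ni(struct sketch_dst *dst,
                                const bool *ni_healthy, size_t count)
{
    size_t i;

    assert(dst != NULL);

    if (dst->preferred_ni != SKETCH_NO_NI)
        return dst->preferred_ni;

    for (i = 0; i < count; i++) {
        if (ni_healthy[i]) {
            dst->preferred_ni = (int)i;
            return dst->preferred_ni;
        }
    }
    return SKETCH_NO_NI;
}
```

Keeping the preferred NI sticky is what gives the destination a single source NI across messages, which is the requirement the rewritten flow is meant to capture.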
Work Items
- Health Value Maintenance/Demerit system
- Selection based on Health Value and not resending over already used interfaces
- Handling the new events in IBLND and passing them to LNet
- Handling the new events in SOCKLND and passing them to LNet
- Adding LNet level transaction timeout and cancelling a resend on timeout
- Handling timeout case in ptlrpc
...