...

This approach would add the required LNet resiliency and avoid the many corner cases that would otherwise need to be addressed when receiving messages that have already been processed.

Resiliency vs. Reliability

Two concepts need to stay separate: reliability of RPC messages and LNet Resiliency. This feature attempts to add LNet Resiliency against failures of the local interface and of the immediate next-hop interface. End-to-end reliability ensures that upper-layer messages, namely RPC messages, are received and processed by the final destination, and that appropriate action is taken when they are not. End-to-end reliability is the responsibility of the application that uses LNet, in this case ptlrpc, which already has a mechanism for it.

Roughly, LNet is analogous to the IP layer and ptlrpc is analogous to the TCP layer.
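
The split between the two layers can be illustrated with a small toy model. This is only a sketch of the layering idea, not LNet or ptlrpc code: the names `Interface`, `lnet_send`, `rpc_send`, and `peer_replies` are hypothetical, and the real implementations are in C inside Lustre. The lower function retries across local interfaces (resiliency); the upper function resends when no reply arrives (end-to-end reliability).

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical stand-in for a local network interface."""
    name: str
    healthy: bool = True


def lnet_send(interfaces, msg, log):
    """Resiliency: try each local interface until one accepts the message."""
    for ni in interfaces:
        if ni.healthy:
            log.append(f"sent {msg!r} via {ni.name}")
            return True
        log.append(f"{ni.name} failed, trying next interface")
    return False  # every local interface failed


def rpc_send(interfaces, msg, peer_replies, max_resends, log):
    """Reliability: resend until the peer replies or attempts run out.

    peer_replies(attempt) is a hypothetical callback that reports whether
    the final destination processed and answered the given attempt.
    """
    for attempt in range(1 + max_resends):
        delivered = lnet_send(interfaces, msg, log)
        if delivered and peer_replies(attempt):
            return True
        log.append(f"attempt {attempt} got no reply")
    return False
```

In this model the lower layer hides a failed interface from the caller, while a lost or unprocessed message is only detected and handled by the upper layer, which mirrors the IP/TCP analogy above.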

O2IBLND

Overview

There are two types of events to account for:

...