...
In the following discussion, node will often be the shorthand for local node, while peer will be shorthand for peer node or remote node.
...
PUT,
...
ACK,
...
GET,
...
REPLY
Within LNet there are three cases of interest: PUT, PUT+ACK, and GET+REPLY.
- PUT: The simplest message type is a bare PUT message. This message is sent on the wire, and after that no further response is expected by LNet. In terms of error handling, this means that a failure to send can result in an error, but any kind of failure after the message has been sent will result in the message being dropped without notification. There is nothing LNet can do in the latter case; it will be up to the higher layers that use LNet to detect what is going on and take whatever corrective action is required.
- PUT+ACK: The sender of a PUT can ask for an ACK from the receiver. The ACK message is generated by LNet on the receiving node (the peer), and this is done as soon as the PUT has been received. In contrast to a bare PUT, this means LNet on the sender can track whether an ACK arrives, and if it does not arrive promptly it can take some action. Such an action can take two forms: inform the upper layers that the ACK took too long to arrive, or retry within LNet itself.
- GET+REPLY: With a GET message, there is always a REPLY from the peer. Prior to the GET, the sender and receiver agree on the MD that the data for the REPLY must be obtained from, so LNet on the receiving node can generate the REPLY message as soon as the GET has been received. Failure detection and handling is similar to the PUT+ACK case.
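The three cases differ mainly in whether LNet on the sender has a response to wait for. The following is a minimal Python sketch of that distinction, not LNet code; the names `Case`, `EXPECTED_RESPONSE`, and `lnet_can_detect_loss_after_send` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Case(Enum):
    """The three message cases discussed above (illustrative names)."""
    PUT = "PUT"
    PUT_ACK = "PUT+ACK"
    GET_REPLY = "GET+REPLY"


# Response message that LNet on the sender can wait for once the
# initial message has gone out on the wire, or None if there is none.
EXPECTED_RESPONSE = {
    Case.PUT: None,
    Case.PUT_ACK: "ACK",
    Case.GET_REPLY: "REPLY",
}


def lnet_can_detect_loss_after_send(case):
    """True when a response exists whose absence LNet itself can notice."""
    return EXPECTED_RESPONSE[case] is not None
```

For a bare PUT the function returns `False`, which is the situation where the upper layers have to do the detection themselves.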
The protocols used to implement LNet tend to be connection-oriented, and implement some kind of handshake or ack protocol that tells the sender that a message has been received. As long as the LND actually reports errors to LNet (not a given, alas), this means that in practice the sender of a message can reliably determine whether the message was successfully sent to another node. When the destination node is on the same LNet network, this is sufficient to enable LNet itself to detect failures even in the bare PUT case. But in a routed configuration this only guarantees that the LNet router received the message, and if the LNet router then fails to forward it, a bare PUT will be lost without trace.
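The reasoning in this paragraph can be condensed into a small truth function. This is a sketch under simplifying assumptions: the function name and the two hop labels (`"first_hop"` for a failure between the sender and the next node, `"beyond_router"` for a failure after an LNet router accepted the message) are invented for illustration and do not correspond to anything in the LNet code.

```python
def bare_put_loss_visible_to_lnet(lnd_reports_errors, failure_hop):
    """Can LNet on the sender notice that a bare PUT was lost?

    lnd_reports_errors: whether the LND passes send errors up to LNet.
    failure_hop: "first_hop" if the message failed to reach the next node
                 (the destination itself on a non-routed network, or the
                 router in a routed one); "beyond_router" if the router
                 received the message but failed to forward it.
    """
    if failure_hop not in ("first_hop", "beyond_router"):
        raise ValueError("unknown failure_hop: %r" % (failure_hop,))
    # The LND-level handshake only covers the first hop, so nothing
    # that happens after the router is visible to the sender.
    return lnd_reports_errors and failure_hop == "first_hop"
```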
...
Users of LNet that send bare PUT messages must implement their own methods to detect whether a message was lost. The general rule is simple: the recipient of a PUT is supposed to react somehow, and if the reaction doesn't happen within a set amount of time, the sender assumes that either the PUT was lost, or the recipient is in some other kind of trouble.
For our purposes PtlRPC is of interest. PtlRPC messages can be classified as Request+Response pairs. Both a Request and a Response are built from one or more GET or PUT messages. A node that sends a PtlRPC Request requires the receiver to send a Response within a set amount of time, and failing this the Request times out and PtlRPC takes corrective action.
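The "expect a reaction within a set time" rule can be sketched as a small deadline tracker. This is an illustrative model only and not how PtlRPC is implemented; the class `ReactionTracker` and its methods are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
class ReactionTracker:
    """Tracks sent messages and reports those with no reaction in time."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}  # message id -> deadline

    def sent(self, msg_id, now):
        """Record that a message went out at time `now`."""
        self.pending[msg_id] = now + self.timeout

    def reacted(self, msg_id):
        """Record that the recipient reacted; late or unknown ids are ignored."""
        self.pending.pop(msg_id, None)

    def expired(self, now):
        """Return (and forget) the ids whose deadline has passed."""
        lost = sorted(m for m, deadline in self.pending.items() if now >= deadline)
        for m in lost:
            del self.pending[m]
        return lost
```

The caller would poll `expired()` periodically and treat each returned id as "either the message was lost, or the recipient is in trouble", then take whatever corrective action applies.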
...
The interfaces that LNet provides to the upper layers should work as follows. Set up an MD (Memory Descriptor) to send data from (for a PUT) or receive data into (for a GET). An event handler is associated with the MD. Then call LNetGet() or LNetPut() as appropriate.
...
The caller of LNetPut() requests an ACK by using LNET_ACK_REQ as the value of the ack parameter.
A PUT with an ACK is similar to a GET+REPLY pair. The events in this case are LNET_EVENT_SEND and LNET_EVENT_ACK.
For a PUT, the LNET_EVENT_SEND indicates that the MD is no longer used by the LNet code and the caller is free to discard or re-use it.
As with send, LNET_EVENT_ACK is expected to only carry an error indication if there was a timeout before the ACK was received.
LNetPut() LNET_NOACK_REQ
The caller of LNetPut() requests no ACK by using LNET_NOACK_REQ as the value of the ack parameter.
A PUT without an ACK will only generate an LNET_EVENT_SEND, which indicates that the MD can now be re-used or discarded.
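The event sequences described for the two LNetPut() variants can be summarised in a short Python model. This is a sketch, not the LNet API: the string constants merely stand in for the C identifiers named in the text, and the helpers `expected_put_events` and `md_reusable` are hypothetical.

```python
# Stand-ins for the C identifiers mentioned in the text (not real bindings).
LNET_ACK_REQ = "LNET_ACK_REQ"
LNET_NOACK_REQ = "LNET_NOACK_REQ"
LNET_EVENT_SEND = "LNET_EVENT_SEND"
LNET_EVENT_ACK = "LNET_EVENT_ACK"


def expected_put_events(ack):
    """Events the MD's event handler should see for a PUT, in order."""
    if ack == LNET_ACK_REQ:
        return [LNET_EVENT_SEND, LNET_EVENT_ACK]
    if ack == LNET_NOACK_REQ:
        return [LNET_EVENT_SEND]
    raise ValueError("unknown ack value: %r" % (ack,))


def md_reusable(events_seen):
    """For a PUT, the MD may be re-used or discarded once SEND was seen."""
    return LNET_EVENT_SEND in events_seen
```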
...
- Node interface reports failure. This includes the case where the interface itself is healthy but detects that the cable connecting it to a switch, or the switch port, is not working properly.
- Peer interface not reachable. A peer interface that should be reachable from the node interface cannot be reached. Depending on the error this can result in a "fast" error or a timeout in the LND-level protocol.
- Some peer interfaces on a net not reachable. The node interface appears to be OK, but there are interfaces of several peers it cannot talk to.
- All peer interfaces on a net not reachable. The node interface appears to be OK, but cannot talk to any peer interface.
- All interfaces of a peer not reachable. All LNDs report errors when talking to a specific peer, but have no problem talking to other peers.
- PUT+ACK or GET+REPLY timeout. The LND gives no failure indication, but the ACK or REPLY takes too long to arrive.
- Dropped PUT. Everything appears to work, except it doesn't.
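The reachability-related scenarios in this list differ only in how many peer interfaces on a net a node interface fails to reach. As a rough illustration, the distinction could be modelled as below; the function name, the returned labels, and the thresholds (exactly one failure versus several versus all) are assumptions made for this sketch, not behaviour defined by LNet.

```python
def classify_net_reachability(reachable):
    """Classify what one node interface observes on one net.

    reachable: dict mapping a peer interface name to True/False,
    as seen from a node interface that itself reports no failure.
    """
    failed = [name for name, ok in reachable.items() if not ok]
    if not failed:
        return "ok"
    if len(failed) == len(reachable):
        return "all peer interfaces on a net not reachable"
    if len(failed) == 1:
        return "peer interface not reachable"
    return "some peer interfaces on a net not reachable"
```

Distinguishing "all interfaces of a peer not reachable" would need the same kind of map keyed by peer rather than by net, and the timeout and dropped-PUT scenarios cannot be derived from reachability at all, since the LND reports no failure in those cases.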
...
LNet might treat this as the "remote interface not reachable" case for all the interfaces of the remote node. That is, the handling would not differ much just because all interfaces of the remote node appear to be down, except for a log message indicating this.
...
PUT+ACK Or GET+REPLY Timeout
This is the case where the LND does not signal any problem, so the ACK for a PUT or REPLY for a GET should arrive promptly, with the only delays due to credit-based throttling, and yet it does not do so. Note that this assumes that where possible the LND layer already implements reasonably tight timeouts, so that LNet can assume the problem is somewhere else.
...
One argument for nevertheless implementing this facility in LNet is that it spares the upper layers from having to re-invent and re-implement this wheel time and again.
Dropped
...
PUT
No problem was signalled by the LND, and there is no ACK that we could time out waiting for. LNet does not have enough information to do anything, so the upper layers must do so instead.
...