Introduction
...
Details on how these structures are built are described in the following sections.
Primary NID
Both LNet and users of LNet like PtlRPC and LDLM assume that a peer is identified by a single NID. In order to minimize the impact of the changes to LNet on its users, a primary NID will be selected from a peer's NIDs, and this primary NID will be presented to the users of LNet.
The only hard limitation on the primary NID of a peer is that it must be unique within the cluster. The section on Primary NIDs below goes into more detail.
PTLRPC
The PtlRPC subsystem will be changed to tell LNet whether the messages it sends to a peer may go over any local NI/peer NI combination or whether a specific peer NI should be preferred. The distinction here is between a PtlRPC request, which can be sent over whichever path seems most suitable, and a PtlRPC response, which should be sent to the peer NI from which the request message was received.
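The request/response distinction can be illustrated with a minimal sketch. This is not the LNet or PtlRPC API: the names nid_t, NID_ANY, enum msg_kind and preferred_peer_ni() are hypothetical and exist only to show the decision PtlRPC would communicate to LNet.

```c
#include <stdint.h>

typedef uint64_t nid_t;            /* hypothetical stand-in for an LNet NID */
#define NID_ANY ((nid_t)~0ULL)     /* hypothetical "no preferred peer NI" value */

enum msg_kind { MSG_REQUEST, MSG_RESPONSE };

/*
 * Return the peer NI that the sender would ask LNet to prefer.
 * A request may use any path, so no preference is expressed.
 * A response prefers the peer NI the request was received from.
 */
static nid_t preferred_peer_ni(enum msg_kind kind, nid_t request_source)
{
    if (kind == MSG_RESPONSE)
        return request_source;
    return NID_ANY;
}
```

In this sketch a request yields NID_ANY, leaving path selection to LNet, while a response yields the NID recorded when the request arrived.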
...
We can either add an explicit CPT field to lnet_md_t and struct lnet_libmd, or build on the existing CPT-aware interfaces and modify how they pick the CPT to better match our requirements.
Primary NIDs
The assumption that a peer can be identified by a single, unique NID is deeply embedded in parts of the code. Unfortunately these include the public interfaces of LNet.
- match entries (struct lnet_match_info) have the peer's NID as one of the possible match criteria.
- events (lnet_event_t) identify the initiator peer by its NID.
For match entries we will translate from the source NID to the primary NID prior to checking for a match. There is an exception in early Discovery because then the primary NID of the peer is not yet known. However, this case is completely contained within LNet.
For events, LNet will provide the primary NID in the initiator field. Event handlers may also need the actual source NID, so a source field will be added to lnet_event_t.
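As a rough illustration of the translation described above, the following sketch maps a source NID to a primary NID and fills in both event fields. The types peer_sketch and event_sketch and the functions primary_nid_of() and fill_event() are hypothetical and do not reflect the real LNet structures. Falling back to the source NID when the peer is unknown is an assumption made here to model the early Discovery case, not something the design specifies.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_t;            /* hypothetical stand-in for an LNet NID */

struct peer_sketch {               /* hypothetical: one peer and its NIDs */
    nid_t        primary;
    const nid_t *nids;
    size_t       nnids;
};

struct event_sketch {              /* hypothetical: the two event fields discussed */
    nid_t initiator;               /* primary NID of the sending peer */
    nid_t source;                  /* NID the message actually came from */
};

/* Look up the primary NID for a source NID by scanning the known peers. */
static nid_t primary_nid_of(const struct peer_sketch *peers, size_t npeers,
                            nid_t src)
{
    for (size_t i = 0; i < npeers; i++)
        for (size_t j = 0; j < peers[i].nnids; j++)
            if (peers[i].nids[j] == src)
                return peers[i].primary;
    return src;                    /* assumption: unknown peer, use source NID */
}

/* Record both the primary NID and the actual source NID in the event. */
static void fill_event(struct event_sketch *ev,
                       const struct peer_sketch *peers, size_t npeers,
                       nid_t src)
{
    ev->initiator = primary_nid_of(peers, npeers, src);
    ev->source = src;
}
```

The same lookup would apply before match-entry checks, so that users of LNet only ever compare against primary NIDs.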
The primary user of LNet in the Lustre code is PtlRPC, together with the OBD and LDLM layers built on top of it, which are strongly intertwined with PtlRPC. (Both of these peek into PtlRPC data structures.)
- The c_peer field of struct ptlrpc_connection identifies the peer by a NID.
- The rq_peer field of struct ptlrpc_request identifies the peer by a NID.
The rq_peer field is set to the primary NID. Since we want PtlRPC to be able to route responses to a specific source NID, a new field, rq_source, is added for that purpose.
ptlrpc_uuid_to_peer() may need to be changed to map the selected peer NID to the primary NID of that peer.
target_handle_connect() is a place outside PtlRPC that peeks into PtlRPC data structures to find a peer's NID. Setting rq_peer to the primary NID should suffice.
ldlm_flock_deadlock() looks at c_peer when doing deadlock detection.
IOCTL Handling
Adding NI
Handling of the new ADD_NI IOCTL will be done in module.c:lnet_ioctl().
No parsing will be required there, as all the string parsing will be done in user space.
```
lnet_add_ni(nid, tunables...)
{
    net = NID2NET(nid);

    /* lnet_find_or_create_net()
     * If the net is not created already, create it.
     * If the net was just created, run the net selection rules using:
     * lnet_selection_run_net_rule()
     */
    rc = find_or_create_net(net, &n);
    if (rc != 0)
        return -rc;

    /* make sure that nid doesn't already exist in that net */
    rc = add_ni_2_net(nid, tunables);
    if (rc != 0) {
        /* delete net if empty */
        lnet_del_net(net);
        return -rc;
    }

    /* ni below refers to the NI just added by add_ni_2_net() */

    /* run applicable rules */

    /* lnet_selection_run_nid_rules()
     * Given the nid of the newly added ni, see if that nid matches any
     * defined rules and assign the priority accordingly.
     */
    if (lnet_selection_run_nid_rules(ni->nid, &ni->priority)) {
        /* print an error and increment error counters, but don't fail */
    }

    /* lnet_selection_run_peer_rules()
     * Given the newly added ni, see if any of the peer rules match the
     * new NI and create an association between that ni and any remote
     * peer which matches the rule. So if there already exists a rule
     * that matches both this new NI and an existing peer, then create
     * an association between the pair.
     */
    if (!lnet_selection_run_peer_rules(ni, 0)) {
        /* print an error and increment error counters, but don't fail */
    }

    /* start up the LND with user-specified tunables */
    rc = startup_lndni(ni, tunables...);
    if (rc != 0)
        return -rc;

    return 0;
}
```
...
Receiving a message may trigger Discovery.
Backward Compatibility
The features of the existing code noted in the Primary NIDs section imply that a multi-rail capable node should always use the same source NI when sending messages to a non-multi-rail capable node. The likely symptoms of failing to do this include spurious resets of PtlRPC connections, but also more subtle problems like failures to detect flock deadlocks.
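One way to picture this rule is a source NI that is pinned on first use for a non-multi-rail peer. The sketch below is only an illustration: struct peer_state and pick_source_ni() are hypothetical names, NIs are represented by small integer indexes, and pinning on first use is an assumption about how "always use the same source NI" could be enforced.

```c
#include <stdbool.h>

struct peer_state {                /* hypothetical per-peer bookkeeping */
    bool multi_rail;               /* peer advertised multi-rail capability */
    int  pinned_ni;                /* index of pinned source NI, -1 if none */
};

/*
 * Choose the local source NI for a message to the peer.
 * 'candidate' is whatever NI the normal selection logic would pick.
 * For a non-multi-rail peer, the first choice is remembered and reused
 * so that the peer always sees the same source NID.
 */
static int pick_source_ni(struct peer_state *p, int candidate)
{
    if (p->multi_rail)
        return candidate;
    if (p->pinned_ni < 0)
        p->pinned_ni = candidate;
    return p->pinned_ni;
}
```

With this approach a legacy peer keeps seeing one NID for the node, which avoids the connection resets and deadlock detection problems described above.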
Dynamic Peer Discovery
Dynamic Peer Discovery ("Discovery" for short) is the process by which a node can discover the network interfaces on which it can reach a peer, without being pre-configured. This involves sending a ping to the peer. The ping response carries a flag bit to indicate whether the peer is multi-rail capable. If it is, the node then pushes its own network interface information to the peer. This protocol distributes the network interface information to both nodes, and subsequently each node can exercise the peer's network interfaces as well as its own, as described in further detail in this section. Discovery can be enabled, disabled, or in verification mode. In verification mode, it cross-references the discovered peer NIDs with the configured NIDs and complains if there is a discrepancy, but continues to use the configured NIDs. cfg-085, dyn-005, dyn-010, dyn-015, dyn-020, dyn-025, dyn-030, dyn-035, dyn-040, dyn-045, dyn-050, dyn-055, dyn-060, dyn-065
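The three Discovery modes and the multi-rail flag check can be sketched as follows. All names here (enum discovery_mode, PING_FEAT_MULTI_RAIL, struct nid_list, nids_to_use()) are hypothetical, and the flag's bit position is invented for the example. The sketch also assumes that enabled mode uses the discovered NIDs and that NID lists contain no duplicates; neither assumption is stated by the design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_t;                    /* hypothetical stand-in for an LNet NID */

enum discovery_mode { DISC_DISABLED, DISC_ENABLED, DISC_VERIFY };

#define PING_FEAT_MULTI_RAIL 0x1u          /* hypothetical flag bit in the ping reply */

struct nid_list {
    const nid_t *nids;
    size_t       count;
};

/* True if the ping reply marks the peer as multi-rail capable. */
static bool peer_is_multi_rail(uint32_t ping_features)
{
    return (ping_features & PING_FEAT_MULTI_RAIL) != 0;
}

static bool contains(const struct nid_list *l, nid_t nid)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->nids[i] == nid)
            return true;
    return false;
}

/* Order-insensitive comparison; assumes neither list has duplicates. */
static bool same_nids(const struct nid_list *a, const struct nid_list *b)
{
    if (a->count != b->count)
        return false;
    for (size_t i = 0; i < a->count; i++)
        if (!contains(b, a->nids[i]))
            return false;
    return true;
}

/*
 * Pick which NID list to use for a peer. In verification mode the
 * configured list always wins, and *discrepancy reports a mismatch.
 */
static const struct nid_list *
nids_to_use(enum discovery_mode mode, const struct nid_list *configured,
            const struct nid_list *discovered, bool *discrepancy)
{
    *discrepancy = false;
    switch (mode) {
    case DISC_ENABLED:
        return discovered;                 /* assumption: discovered NIDs are used */
    case DISC_VERIFY:
        *discrepancy = !same_nids(configured, discovered);
        return configured;
    case DISC_DISABLED:
    default:
        return configured;
    }
}
```

A caller in verification mode would log a complaint when the discrepancy flag is set and then carry on with the configured list.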
...