Introduction
...
The first revision of this document targets sign-off by all stakeholders. As the implementation work is divided into phases, further documents will be created as needed to detail the design, and this document will be updated with links to them.
Reference Documents
Document Link |
---|
Multi-Rail Scope and Requirements Document |
Document Structure
This document is made up of the following sections:
...
Kernel Space: Describes the details of Kernel Space changes including the Dynamic Discovery Behavior
Acronym Table
Acronym | Description |
---|---|
LNet | Lustre Network |
NI | Network Interface |
RPC | Remote Procedure Call |
FS | File System |
o2ib | Infiniband Network |
TCP | Ethernet TCP-layer Network |
NUMA | Non-Uniform Memory Access |
RR | Round Robin |
CPT | CPU Partition |
CB | Channel Bonding |
NID | Network Identifier |
downrev | Node with no Multi-Rail |
uprev | Node with Multi-Rail |
Design Overview
System level
...
Code Block |
---|
eth[1,2,3], eth[1-4/2] |
Expression Structural Form | Description |
---|---|
Figure 4: syntax descriptor | An expression can be a number, a wildcard, a range, or a range with an increment. |
Before the built structural format is passed to the kernel it must be serialized, so that no pointers are passed between user space and kernel space.
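As a minimal sketch of what a pointer-free form could look like, the example below flattens each expression into a fixed-size record that can be copied across the user/kernel boundary as a plain array. The struct expr_rec layout, the EXPR_WILDCARD encoding, and expr_match() are hypothetical names used for illustration only, not the actual LNet structures. The sketch also assumes that an increment is counted from the low end of the range, so that eth[1-4/2] would match eth1 and eth3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat record: one per expression, no pointers inside. */
struct expr_rec {
	uint32_t lo;		/* first value matched */
	uint32_t hi;		/* upper bound of the range (inclusive) */
	uint32_t stride;	/* increment between matched values, >= 1 */
};

/* Assumed encoding: a wildcard is the full range with stride 1. */
#define EXPR_WILDCARD { 0, UINT32_MAX, 1 }

/* Return 1 if value matches any record in the array, 0 otherwise. */
int expr_match(const struct expr_rec *recs, size_t count, uint32_t value)
{
	size_t i;

	assert(recs != NULL || count == 0);
	for (i = 0; i < count; i++) {
		const struct expr_rec *r = &recs[i];

		if (r->stride == 0)
			continue;	/* malformed record, skip it */
		if (value >= r->lo && value <= r->hi &&
		    (value - r->lo) % r->stride == 0)
			return 1;
	}
	return 0;
}
```

Under these assumptions, [1,2,3] becomes three records with lo equal to hi, and [1-4/2] becomes the single record {1, 4, 2}.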
...
The state of a peer is a combination of the following bits of information. The flags can be found in the source code by prepending LNET_PEER_, so CONFIGURED becomes LNET_PEER_CONFIGURED.
- CONFIGURED: The peer was configured via DLC.
- DISCOVERED: The peer has been discovered.
- UNDISCOVERED: Peer discovery was disabled when the peer was created.
Configuration via DLC overrides peer discovery, but does not prevent the discovery algorithm from processing a peer. The algorithm complains if it finds differences between the configuration and what the peer reports. As such the CONFIGURED and DISCOVERED flags can both be set on a peer.
The UNDISCOVERED state is used to indicate that a peer has been seen by discovery, but not been updated because discovery is disabled. It signals that a peer only needs to be re-examined if discovery is enabled.
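The relationship between these three flags can be sketched as a bitmask. Only the LNET_PEER_ prefix comes from the text above; the bit positions and the helpers peer_mark_configured() and peer_needs_reexamination() are placeholders chosen for illustration and may not match the source.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values, for illustration only. */
#define LNET_PEER_CONFIGURED	(1U << 0)
#define LNET_PEER_DISCOVERED	(1U << 1)
#define LNET_PEER_UNDISCOVERED	(1U << 2)

/* Record a DLC configuration; any discovery result is left untouched,
 * so CONFIGURED and DISCOVERED can be set at the same time. */
void peer_mark_configured(unsigned int *state)
{
	assert(state != NULL);
	*state |= LNET_PEER_CONFIGURED;
}

/* A peer marked UNDISCOVERED only needs another look once discovery
 * has been enabled. Returns 1 if it should be re-examined. */
int peer_needs_reexamination(unsigned int state, int discovery_enabled)
{
	return discovery_enabled && (state & LNET_PEER_UNDISCOVERED) != 0;
}
```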
- QUEUED: Peer is queued for discovery.
- DISCOVERING: Discovery is active for the peer.
- DATA_PRESENT: Peer data is available to update the peer.
- NIDS_UPTODATE: Discovery has successfully updated the NIDs of the peer.
- PING_SENT: Discovery has sent a Ping to the peer and is waiting for the Reply.
- PUSH_SENT: Discovery has sent a Push to the peer and is waiting for the Ack.
- PING_FAILED: Sending a ping to the peer failed.
- PUSH_FAILED: Sending a push to the peer failed.
- PING_REQUIRED: Discovery must Ping the peer.
The QUEUED flag is used to determine whether a peer is on the ln_dc_request or ln_dc_working queues via its lp_dc_list member. A peer is queued by lnet_peer_queue_for_discovery() and dequeued by lnet_peer_discovery_complete().
The DISCOVERING flag indicates that peer discovery is looking at the peer. When it is cleared, one of DISCOVERED or UNDISCOVERED is set.
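The interplay of QUEUED and DISCOVERING can be illustrated with a simplified, single-threaded model. This is a sketch under stated assumptions, not the implementation: the peer_sketch and queue_sketch types and the sketch_* functions are hypothetical stand-ins, a single queue stands in for both ln_dc_request and ln_dc_working, the bit values are placeholders, and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values, for illustration only. */
#define LNET_PEER_DISCOVERED	(1U << 1)
#define LNET_PEER_UNDISCOVERED	(1U << 2)
#define LNET_PEER_QUEUED	(1U << 3)
#define LNET_PEER_DISCOVERING	(1U << 4)

struct peer_sketch {
	unsigned int state;
	struct peer_sketch *next;	/* stand-in for lp_dc_list */
};

struct queue_sketch {
	struct peer_sketch *head;	/* stand-in for the request queue */
	struct peer_sketch *tail;
};

/* Queue a peer at most once; QUEUED says it is already on a list.
 * Returns 1 if the peer was newly queued, 0 if it was already queued. */
int sketch_queue_for_discovery(struct queue_sketch *q, struct peer_sketch *p)
{
	assert(q != NULL && p != NULL);
	if (p->state & LNET_PEER_QUEUED)
		return 0;
	p->state |= LNET_PEER_QUEUED;
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	return 1;
}

/* Take the next peer to work on. It stays QUEUED (the working queue is
 * not modelled here) and gains DISCOVERING. Returns NULL if idle. */
struct peer_sketch *sketch_start_discovery(struct queue_sketch *q)
{
	struct peer_sketch *p;

	assert(q != NULL);
	p = q->head;
	if (p == NULL)
		return NULL;
	q->head = p->next;
	if (q->head == NULL)
		q->tail = NULL;
	p->next = NULL;
	p->state |= LNET_PEER_DISCOVERING;
	return p;
}

/* Finish work on a peer: clear QUEUED and DISCOVERING, then set exactly
 * one of DISCOVERED or UNDISCOVERED. */
void sketch_discovery_complete(struct peer_sketch *p, int discovery_enabled)
{
	assert(p != NULL);
	p->state &= ~(LNET_PEER_QUEUED | LNET_PEER_DISCOVERING |
		      LNET_PEER_DISCOVERED | LNET_PEER_UNDISCOVERED);
	p->state |= discovery_enabled ? LNET_PEER_DISCOVERED
				      : LNET_PEER_UNDISCOVERED;
}
```

Checking QUEUED before linking the peer makes a repeated request harmless, which matters if several code paths can ask for the same peer to be discovered.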
The DATA_PRESENT flag is set by the event handler for an incoming Push if it successfully stores the data, and by the event handler for an incoming Reply to a Ping. These event handlers run with spinlocks held, which is why we postpone the complex operation of updating the peer until the discovery thread can do it. The discovery thread processes the data and updates the peer by calling lnet_peer_data_present(), which clears the flag.
The NIDS_UPTODATE flag is used to indicate that the NIDs for the peer are believed to be known. It is cleared when data is received that indicates that the peer may have changed, like an incoming Push. If storing the data from an incoming Push fails we cannot set the DATA_PRESENT flag but do clear NIDS_UPTODATE to indicate that the peer must be re-examined.
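The split between the event handler and the discovery thread described above might look roughly as follows. This is a hedged sketch: peer_sketch, SKETCH_BUF_SIZE, sketch_push_event() and sketch_data_present() are hypothetical, the bit values are placeholders, and the assumption that a successful merge sets NIDS_UPTODATE is inferred from the flag descriptions rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder bit values, for illustration only. */
#define LNET_PEER_DATA_PRESENT	(1U << 5)
#define LNET_PEER_NIDS_UPTODATE	(1U << 6)

#define SKETCH_BUF_SIZE 64	/* hypothetical pre-allocated buffer size */

struct peer_sketch {
	unsigned int state;
	unsigned char data[SKETCH_BUF_SIZE];
	size_t data_len;
};

/* Event-handler side: only cheap work, since spinlocks are held. */
void sketch_push_event(struct peer_sketch *p, const void *buf, size_t len)
{
	assert(p != NULL && (buf != NULL || len == 0));
	/* An incoming Push means the peer may have changed. */
	p->state &= ~LNET_PEER_NIDS_UPTODATE;
	if (len > sizeof(p->data))
		return;		/* could not store; peer must be re-examined */
	if (len > 0)
		memcpy(p->data, buf, len);
	p->data_len = len;
	p->state |= LNET_PEER_DATA_PRESENT;
}

/* Discovery-thread side: consume stored data and clear the flag.
 * Returns 1 if data was processed, 0 if there was nothing to do. */
int sketch_data_present(struct peer_sketch *p)
{
	assert(p != NULL);
	if (!(p->state & LNET_PEER_DATA_PRESENT))
		return 0;
	p->state &= ~LNET_PEER_DATA_PRESENT;
	/* The expensive merge of p->data into the peer would happen here. */
	p->state |= LNET_PEER_NIDS_UPTODATE;	/* assumed outcome of a merge */
	return 1;
}
```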
The PING_SENT flag indicates that a Ping has been sent and we are waiting for a Reply message. The implication is that lp_ping_mdh is live and has an MD bound to it.
The PUSH_SENT flag indicates that a Push has been sent and we are waiting for an Ack message. The implication is that lp_push_mdh is live and has an MD bound to it.
The PING_FAILED flag indicates that an attempted Ping failed for some reason. In addition to LNet messaging failures, a Ping fails if the Reply does not fit in the pre-allocated buffer.
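The two Ping failure causes named above can be sketched as a single completion handler. The functions sketch_ping_sent() and sketch_ping_reply_event(), their parameters, and the bit values are hypothetical and only illustrate how PING_SENT, PING_FAILED and DATA_PRESENT might relate; the real event handlers may be structured quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values, for illustration only. */
#define LNET_PEER_DATA_PRESENT	(1U << 5)
#define LNET_PEER_PING_SENT	(1U << 7)
#define LNET_PEER_PING_FAILED	(1U << 9)

/* A Ping has gone out; we are now waiting for the Reply. */
void sketch_ping_sent(unsigned int *state)
{
	assert(state != NULL);
	*state |= LNET_PEER_PING_SENT;
}

/*
 * Completion of an outstanding Ping.
 *   msg_ok:    0 if LNet messaging reported a failure
 *   reply_len: size of the Reply the peer wants to return
 *   buf_len:   size of the pre-allocated buffer
 */
void sketch_ping_reply_event(unsigned int *state, int msg_ok,
			     size_t reply_len, size_t buf_len)
{
	assert(state != NULL);
	*state &= ~LNET_PEER_PING_SENT;		/* no longer waiting */
	if (!msg_ok || reply_len > buf_len)
		*state |= LNET_PEER_PING_FAILED;
	else
		*state |= LNET_PEER_DATA_PRESENT;
}
```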
The PUSH_FAILED flag indicates that an attempted Push failed for some reason. The node sending the Push only sees a failure if LNet messaging reports one.
The PING_REQUIRED flag indicates that a Ping is necessary to properly determine the state of a peer. Triggering a Ping is the mechanism by which discovery attempts to recover from any problems it may have encountered while processing a peer.
- MULTI_RAIL: This flag indicates whether a peer is running a multi-rail aware version of Lustre.
  - If MULTI_RAIL is set, then lp_node_seqno contains the last ping source sequence number of the node that has been received by the peer.
The following discussion must be updated – OW
- L: Local config sent to peer
- P: Peer config merged
- M: Multi-rail capable peer
- D: Data received from peer, not yet merged
- R: Reply to ping pending
- A: Ack pending
- Q: Queued for the discovery thread to work on
- C: Configured by DLC
- S: Size of MD buffers needs to be increased
...
- INIT - Initial state; transitory only
- CREATED - peer_ni created but no active connections exist
- ACTIVATING - 1st message sent to the peer_ni, but has not completed yet
- CONNECTED - 1st message sent successfully
- FAILED - A message (1st or after) has failed to send
- DELETING - A dynamic update or a config delete removes that peer_ni
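The states listed above can be written down as an enumeration with a pair of transition helpers. This is only a sketch: the enum and function names are hypothetical, and the transitions encode nothing beyond the one-line descriptions in the list. In particular, how a peer_ni leaves FAILED is not described here, so the sketch leaves that state unchanged on a later successful send.

```c
#include <assert.h>

/* Hypothetical names mirroring the peer_ni states listed above. */
enum peer_ni_state {
	PEER_NI_INIT,		/* transitory initial state */
	PEER_NI_CREATED,	/* created, no active connections */
	PEER_NI_ACTIVATING,	/* first message sent, not yet completed */
	PEER_NI_CONNECTED,	/* first message sent successfully */
	PEER_NI_FAILED,		/* a message failed to send */
	PEER_NI_DELETING,	/* removed by dynamic update or config delete */
};

/* State after a send has been started on the peer_ni. */
enum peer_ni_state peer_ni_on_send_start(enum peer_ni_state s)
{
	assert(s <= PEER_NI_DELETING);
	return s == PEER_NI_CREATED ? PEER_NI_ACTIVATING : s;
}

/* State after a send has completed; ok is non-zero on success. */
enum peer_ni_state peer_ni_on_send_done(enum peer_ni_state s, int ok)
{
	assert(s <= PEER_NI_DELETING);
	if (s == PEER_NI_DELETING)
		return s;		/* deletion is not undone by a send */
	if (!ok)
		return PEER_NI_FAILED;	/* first message or any later one */
	if (s == PEER_NI_ACTIVATING)
		return PEER_NI_CONNECTED;
	return s;
}
```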
Sign-off
Name | Status |
---|---|
Signed Off | |
Signed Off | |
Signed Off | |
Robert Read (optional) | |
SGI PAC | Signed Off |
Appendix
Various Comments and Older Notes
...