...
The user interface is recorded here.
Use Cases
Preferred Network
If a node can be reached on two networks, it is sometimes desirable to designate one of them as a fail-over network. Lustre currently has the concept of High Availability (HA), which allows servicenode NIDs to be defined as described in section 11.2 of the Lustre manual. The syntax described in that section can also be used to define two NIDs for the same peer. However, this approach suffers from a current limitation in the Lustre software: the NIDs are exposed to the layers above LNet. Ideally, network failure handling stays contained within LNet, and Lustre only has to worry about defining HA.
Given this, it is desirable to be able to define two LNet networks on a node, each of which can have multiple interfaces, and then to tell LNet to always use one network until it is no longer available, i.e. until all interfaces in that network are down.
In this manner we separate the functionality of defining fail-over pairs from defining fail-over networks.
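The fail-over behaviour described above can be sketched as follows. This is only an illustration, not LNet code: the `Interface` and `Network` classes, the `select_network` function, and the convention that a lower `priority` value means "more preferred" are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    name: str          # hypothetical interface name, e.g. "ib0"
    up: bool = True    # whether the interface is currently usable


@dataclass
class Network:
    name: str                     # hypothetical network name, e.g. "o2ib0"
    priority: int                 # assumed convention: lower value = more preferred
    interfaces: list = field(default_factory=list)

    def available(self):
        # A network counts as available while at least one interface is up.
        return any(iface.up for iface in self.interfaces)


def select_network(networks):
    """Return the most preferred available network, or None if all are down."""
    candidates = [net for net in networks if net.available()]
    if not candidates:
        return None
    return min(candidates, key=lambda net: net.priority)
```

With a preferred network holding two interfaces and a fail-over network holding one, `select_network` keeps returning the preferred network until both of its interfaces are marked down, and only then falls back to the other network.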
Preferred NIDs
Depending on the topology of the network on which the Lustre network is built, it might be necessary to assign priorities to specific interfaces that are connected to optimized paths. In this way messages don't take more hops than necessary to get to the destination. As an example, in a dragonfly topology as diagrammed below, a node can have multiple interfaces on the same network, but some interfaces are not optimized to go directly to the destination group. So if the selection algorithm operates without any rules, it could select a local interface that is less than optimal.
Therefore, giving priority to a local NID within a network is a way to ensure that messages always prefer the optimized paths.
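As a rough illustration of priority-based local NID selection, the sketch below picks the local NID with the best priority. The function name `select_local_nid`, the rule table format, the default priority of 100, and the "lower value wins" convention are assumptions for this example and are not taken from LNet. Breaking ties by list order is also a simplification.

```python
def select_local_nid(local_nids, priorities, default_priority=100):
    """Pick the local NID with the best (lowest) priority.

    local_nids: list of NID strings on one network (hypothetical values).
    priorities: dict mapping NID -> priority; this stands in for the
                user-defined rules. NIDs without a rule get default_priority.
    Ties are broken by position in local_nids (min() returns the first match).
    """
    if not local_nids:
        return None
    return min(local_nids, key=lambda nid: priorities.get(nid, default_priority))
```

With no rules every NID has the same priority, so the first one is returned regardless of whether it sits on an optimized path. Adding a single rule for the interface that connects directly to the destination group is enough to make it the preferred choice.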
[Diagram: dragonfly topology]
Preferred local/remote NID pairs
This is a more fine-grained method of specifying an exact path: rather than only assigning a priority to a local interface or a remote interface, it specifies concrete pairs of interfaces that are most preferred. A peer interface can be associated with multiple local interfaces if necessary, giving an N:1 relationship between local interfaces and remote interfaces.
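One possible way to represent such pairs is a table keyed by the remote NID, which makes the N:1 relationship explicit. The sketch below is an assumption-laden illustration: `build_pair_rules` and `preferred_locals` are hypothetical names, and falling back to all local NIDs when no pair rule matches is a choice made for this example rather than documented LNet behaviour.

```python
def build_pair_rules(pairs):
    """Group (local_nid, remote_nid) pairs by remote NID.

    Several local NIDs may map to the same remote NID (N:1).
    """
    rules = {}
    for local_nid, remote_nid in pairs:
        rules.setdefault(remote_nid, []).append(local_nid)
    return rules


def preferred_locals(rules, remote_nid, local_nids):
    """Return the local NIDs preferred for remote_nid.

    Only local NIDs that are actually present in local_nids are returned.
    If no pair rule applies, every local NID is a candidate (assumed fallback).
    """
    preferred = [nid for nid in rules.get(remote_nid, []) if nid in local_nids]
    return preferred or list(local_nids)
```

For a peer interface paired with two local interfaces, `preferred_locals` returns just those two. For a peer with no rule, it returns the full list of local interfaces, so selection proceeds as if no pair policy were defined.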
DLC APIs
The DLC library will provide the outlined APIs to expose a way to create, delete and show rules.
...