...
Depending on the topology on which the Lustre network is built, it may be necessary to assign priorities to specific interfaces that are connected to optimized paths, so that messages take no more hops than necessary to reach their destination. For example, in a dragonfly topology as diagrammed below, a node can have multiple interfaces on the same network, but some of those interfaces are not optimized to go directly to the destination group. If the selection algorithm operates without any rules, it could select a local interface that is less than optimal.
Each cloud in the diagram below represents a group of LNet nodes on the o2ib network. The admin should know which node interfaces resolve to a direct path to the destination group. Giving priority to a local NID within a network is therefore a way to ensure that messages always prefer the optimized paths.
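A minimal sketch of how a local NID priority could steer selection within a single network. The `LocalNI` type, the `DEFAULT_PRIORITY` value, the convention that a lower number means a more preferred interface, and the NID strings are all assumptions made for illustration; none of them come from the LNet source.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical value for an interface that no rule has touched.
DEFAULT_PRIORITY = 1 << 30


@dataclass
class LocalNI:
    nid: str
    credits: int
    priority: int = DEFAULT_PRIORITY  # assumed: lower value = more preferred


def pick_local_ni(nis: List[LocalNI]) -> LocalNI:
    """Pick the interface with the best priority; break ties on credits."""
    return min(nis, key=lambda ni: (ni.priority, -ni.credits))


# Without a rule, the interface with more credits wins.
no_rule = [
    LocalNI("10.0.0.1@o2ib", credits=8),
    LocalNI("10.0.0.2@o2ib", credits=2),
]
best_without_rule = pick_local_ni(no_rule)

# With a rule marking 10.0.0.2@o2ib as the optimized path, it wins instead.
with_rule = [
    LocalNI("10.0.0.1@o2ib", credits=8),
    LocalNI("10.0.0.2@o2ib", credits=2, priority=0),
]
best_with_rule = pick_local_ni(with_rule)
```

The point of the sketch is only that the priority is compared before any load-based criterion, so a rule can override what the default selection would have chosen.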
...
[Gliffy diagram]
In the above diagram you can have a set of clients on the OPA network and a set of clients on the MLX network. The servers have both OPA and MLX interfaces. The OPA clients are connected on o2ib1 and the MLX clients are connected on o2ib0. There is also a router that routes between MLX and OPA. In this scenario you might want to prefer the green path to avoid an extra hop through the router.
On the MLX clients you'd have a rule to prefer the MLX NID of the server and on the OPA clients you'd have a rule to prefer the OPA NID.
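The per-client rules described above can be sketched as a simple ordering of the server's NIDs. The helper names and the server addresses are hypothetical; only the network names (o2ib0 for MLX, o2ib1 for OPA) come from the text.

```python
from typing import List


def net_of(nid: str) -> str:
    """Return the network part of a NID of the form address@network."""
    return nid.rsplit("@", 1)[1]


def order_by_preferred_net(peer_nids: List[str], preferred_net: str) -> List[str]:
    """Order peer NIDs so that those on preferred_net come first.

    sorted() is stable, so NIDs keep their relative order otherwise.
    """
    return sorted(peer_nids, key=lambda nid: net_of(nid) != preferred_net)


# Hypothetical server with one OPA and one MLX interface.
server_nids = ["192.168.1.10@o2ib1", "10.10.0.10@o2ib0"]

# Rule on MLX clients: prefer the server's MLX NID (o2ib0).
mlx_view = order_by_preferred_net(server_nids, "o2ib0")

# Rule on OPA clients: prefer the server's OPA NID (o2ib1).
opa_view = order_by_preferred_net(server_nids, "o2ib1")
```

In both cases the non-preferred NID stays in the list, so the routed path remains available as a fallback.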
...
Preferred local/remote NID pairs
This is a finer-grained method of specifying an exact path: rather than only assigning a priority to a local interface or a remote interface, it specifies concrete pairs of interfaces that are most preferred. A peer interface can be associated with multiple local interfaces if necessary, giving an N:1 relationship between local interfaces and remote interfaces.
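One way to picture the N:1 relationship is a map from each peer NID to the set of local NIDs preferred for reaching it. This is a sketch under that assumption; the function names and NIDs are hypothetical and are not the DLC API.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# peer NID -> set of preferred local NIDs (N local interfaces : 1 peer interface)
preferred_local: DefaultDict[str, Set[str]] = defaultdict(set)


def add_pair_rule(local_nid: str, peer_nid: str) -> None:
    """Record that local_nid is a preferred interface for reaching peer_nid."""
    preferred_local[peer_nid].add(local_nid)


def is_preferred_pair(local_nid: str, peer_nid: str) -> bool:
    """Check whether a local/peer pair has been marked as preferred."""
    return local_nid in preferred_local.get(peer_nid, set())


# Two local interfaces are both preferred for the same peer interface.
add_pair_rule("10.0.0.1@o2ib", "10.0.1.1@o2ib")
add_pair_rule("10.0.0.2@o2ib", "10.0.1.1@o2ib")
```

Keeping the list on the peer side matches the selection step in which the peer interface is checked against the already chosen local interface.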
[Gliffy diagram]
Refer to Olaf's LUG 2016/LAD 2016 PPT for more context.
DLC APIs
The DLC library will provide the outlined APIs to expose a way to create, delete and show rules.
...
- Determine the best network to communicate to the destination peer by looking at all the LNet networks the peer is on.
- For each network go through all the local NIs and keep track of the best_ni based on:
  - NUMA distance
  - available credits
  - round robin
- As you visit each network select the best_ni from the network with the highest priority. Skip any networks which are lower priority than the "active" one. If there are multiple networks with the same priority then the best_ni is selected from amongst them using the stated criteria.
- Once the best_ni has been selected, select the best peer_ni available by going through the list of the peer_nis on the selected network. Select the peer_ni based on:
  - if the NID of the best_ni is on the preferred local NID list of the peer_ni. It is placed there through the application of the peer to peer rules.
  - available credits
  - round robin
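The steps above can be sketched as follows. This is an illustration under stated assumptions, not the LNet implementation: the `LocalNI` and `PeerNI` types, the `seq` counter used for round robin, the `DEFAULT_NET_PRIORITY` value, and the convention that a lower number means a higher priority are all hypothetical, as are the NIDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

DEFAULT_NET_PRIORITY = 1 << 30  # hypothetical: network with no rule applied


@dataclass
class LocalNI:
    nid: str
    net: str
    numa_distance: int
    credits: int
    seq: int = 0  # round-robin counter; lower = selected fewer times


@dataclass
class PeerNI:
    nid: str
    net: str
    credits: int
    seq: int = 0
    preferred_local_nids: Set[str] = field(default_factory=set)


def select_path(
    local_nis: List[LocalNI],
    peer_nis: List[PeerNI],
    net_priority: Dict[str, int],
) -> Optional[Tuple[LocalNI, PeerNI]]:
    # Only networks the peer is on are candidates.
    peer_nets = {p.net for p in peer_nis}
    candidates = [ni for ni in local_nis if ni.net in peer_nets]
    if not candidates:
        return None

    def prio(ni: LocalNI) -> int:
        return net_priority.get(ni.net, DEFAULT_NET_PRIORITY)

    # Skip networks with lower priority than the best one (lower value wins).
    best_prio = min(prio(ni) for ni in candidates)
    candidates = [ni for ni in candidates if prio(ni) == best_prio]

    # best_ni: NUMA distance, then available credits, then round robin.
    best_ni = min(candidates, key=lambda ni: (ni.numa_distance, -ni.credits, ni.seq))

    # best peer_ni on the selected network: preferred pair, credits, round robin.
    peers = [p for p in peer_nis if p.net == best_ni.net]
    best_peer = min(
        peers,
        key=lambda p: (best_ni.nid not in p.preferred_local_nids, -p.credits, p.seq),
    )

    best_ni.seq += 1
    best_peer.seq += 1
    return best_ni, best_peer


local_nis = [
    LocalNI("10.0.0.1@o2ib0", "o2ib0", numa_distance=1, credits=4),
    LocalNI("10.1.0.1@o2ib1", "o2ib1", numa_distance=0, credits=8),
]
peer_nis = [
    PeerNI("10.0.0.9@o2ib0", "o2ib0", credits=2,
           preferred_local_nids={"10.0.0.1@o2ib0"}),
    PeerNI("10.0.0.8@o2ib0", "o2ib0", credits=8),
    PeerNI("10.1.0.9@o2ib1", "o2ib1", credits=8),
]
chosen = select_path(local_nis, peer_nis, {"o2ib0": 0, "o2ib1": 1})
```

In this example o2ib0 has the higher network priority, so the o2ib1 interface is skipped even though it has a smaller NUMA distance, and the peer interface with the preferred pairing is chosen even though another peer interface has more credits.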
...
Misc
As an example, in a dragonfly topology as diagrammed below, a node can have multiple interfaces on the same network, but some of those interfaces are not optimized to go directly to the destination group. If the selection algorithm operates without any rules, it could select a local interface that is less than optimal.
Each cloud in the diagram below represents a group of LNet nodes on the o2ib network. The admin should know which node interfaces resolve to a direct path to the destination group. Giving priority to a local NID within a network is therefore a way to ensure that messages always prefer the optimized paths.
[Gliffy diagram]
...