The following assumes that underlying network connectivity has been tested (using iperf, for example) and was found to be working as expected for reliability and performance.
It is also assumed that LNet routing has been configured at least to the extent that the necessary routes have been added on the LNet nodes and that at least one node is designated as an LNet router.
Knowledge of lnet_selftest and configuring module parameters via modprobe is also assumed.
Useful Commands
Here's a list of commands which the verification procedure is going to rely on:
- lnetctl global show
- lnetctl net show -v 4
- lnetctl peer show -v 4
- lnetctl route show -v 4
- lnetctl routing show
- lnetctl discover
- lnetctl ping
- top
Scenarios
Two scenarios shall be considered:
Single-Hop LNet Routing
In this case, the setup looks like this:
A <-LNet1-> R <-LNet2-> B
Where A and B are Lustre endpoints and R is an LNet router. LNetX can be any LNet, for example "tcp0" or "o2ib100". The only requirement is that LNet1 and LNet2 are different.
Multi-Hop LNet Routing
TODO
LNet Configuration Verification
Routes
- Use "lnetctl route show -v 4" on A and B to make sure routes are listed as "UP"
- On R, use "lnetctl routing show" to make sure R is set up as a router
- On R, use "lnetctl peer show -v 4" to see that A and B are listed as peers. If they are configured to use R, they should be pinging it periodically even when there's no traffic.
Credits
- Use "lnetctl net show -v 4" to check the peer_credits parameter. This parameter limits the number of in-flight messages to the same peer. It should be set to the recommended value for the respective LND.
- Use "lnetctl net show -v 4" to check the credits parameter. This parameter limits the total number of in-flight messages. It may need to be increased to (peer_credits x number_of_peers), where number_of_peers is the number of local peers the node is expected to connect to, e.g. the number of routers.
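The credits sizing rule above is plain arithmetic, and a minimal sketch can make it concrete. The function name and the example numbers are hypothetical and only illustrate the (peer_credits x number_of_peers) formula; the recommended peer_credits value still has to come from the respective LND.

```python
def suggested_credits(peer_credits: int, number_of_peers: int) -> int:
    """Return peer_credits x number_of_peers, the sizing rule from the text."""
    if peer_credits < 1 or number_of_peers < 1:
        raise ValueError("peer_credits and number_of_peers must be positive")
    return peer_credits * number_of_peers


# Hypothetical example: a client with peer_credits=8 that talks to 4 routers.
print(suggested_credits(peer_credits=8, number_of_peers=4))  # 32
```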
Router Buffers
This is applicable only to the router node R. The buffers are used to hold the messages being forwarded.
There are "tiny", "small" and "large" buffers. If memory size allows, the numbers for each should be increased to ((number_of_peers_on_LNet1 x peer_credits_on_LNet1) + (number_of_peers_on_LNet2 x peer_credits_on_LNet2)).
"Small" buffers are 4K bytes, "large" are 1M, and "tiny" are only a few bytes.
Basic Connectivity Verification
- Use "lnetctl ping" to ping between A and B, repeat multiple times
- If A and B are Multi-Rail (MR), use "lnetctl net show -v 4" to make sure all of A's NIDs on LNet1 and all of B's NIDs on LNet2 are used in round-robin
Load-Testing
If all of the above looks good, lnet_selftest can be used to check LNet performance under load. A few reminders on lnet_selftest usage:
- To test between A and B, the nodes need to have discovered each other. If necessary, use "lnetctl discover" to perform discovery. Then use the respective primary NIDs as shown by "lnetctl peer show" to identify the nodes.
- The optimal transfer size is 1M, which is the LNet-level "MTU"
- The concurrency parameter can be varied to achieve better performance
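The reminders above can be sketched as a small helper that builds the "lst add_test" command line for a 1M bulk write test at a given concurrency. This is an illustration only: the batch and group names are hypothetical, and it assumes an lnet_selftest session, the two groups (identified by the primary NIDs of A and B) and the batches have already been created with the usual lst commands.

```python
def brw_add_test_command(batch, src_group, dst_group, concurrency, size="1M"):
    """Build one 'lst add_test' command line for a bulk write (brw) test."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return (
        f"lst add_test --batch {batch} --concurrency {concurrency} "
        f"--from {src_group} --to {dst_group} brw write size={size}"
    )


# Hypothetical sweep: one batch per concurrency value, each run on its own
# so that the results can be compared.
for c in (1, 2, 4, 8):
    print(brw_add_test_command(f"bulk_c{c}", "group_a", "group_b", c))
```

Printing the commands rather than executing them keeps the sketch usable on a machine without Lustre installed; the output can be reviewed and pasted into a session on the test node.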
Problem Isolation
If the performance results are not satisfactory, it may be helpful to isolate the problem to a particular node or LNet, for example:
- pick another node on the same LNet as A and run lnet_selftest between A and this node (A1)
- pick another node on the same LNet as B and run lnet_selftest between B and this node (B1)
- pick another router node and run lnet_selftest between R and the other router R1, exercising LNet1 and LNet2 NIDs as separate tests
Tuning SockLND
If at this point the performance results are not satisfactory, there's still a chance certain parameter adjustments can make improvements. This section discusses SockLND tuning.
conns_per_peer
For peers on "tcpx" LNets, check the conns_per_peer value in the "lnetctl net show -v 4" output. Heuristically determined optimal settings are 4 for links of 100Gbps and higher, 3 for 50Gbps links, 2 for 5-10Gbps links and 1 for anything less. In some situations, increasing this parameter beyond the recommended value may help improve performance.
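The heuristic above can be written as a small lookup. Note one assumption that is not in the text: link speeds that fall between the listed points (25Gbps, for example) are mapped to the next lower tier here, so treat the result as a starting value to be checked with lnet_selftest rather than a definitive setting. The function name is hypothetical.

```python
def suggested_conns_per_peer(link_gbps: float) -> int:
    """Map link speed in Gbps to the heuristic conns_per_peer starting value."""
    if link_gbps >= 100:
        return 4
    if link_gbps >= 50:
        return 3
    if link_gbps >= 5:
        # Assumption: speeds above 10 and below 50 Gbps also land here.
        return 2
    return 1


for speed in (1, 10, 50, 100, 200):
    print(speed, suggested_conns_per_peer(speed))
```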
nscheds
Use "top" to check on socklnd threads while lnet_selftest (or any other test, e.g. FIO) is running. If socklnd threads are seen to be fully loaded, it may be beneficial to increase the nscheds value. It makes sense to increase it to a value between conns_per_peer and (conns_per_peer x 2).
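A minimal sketch of the nscheds range suggested above, together with a helper that formats the chosen values as a modprobe options line. The helper names are hypothetical, and the sketch assumes both parameters are set on the ksocklnd module through a modprobe configuration file; verify this against the installed Lustre version before applying it.

```python
def nscheds_range(conns_per_peer: int):
    """Return the (low, high) range suggested in the text for nscheds."""
    if conns_per_peer < 1:
        raise ValueError("conns_per_peer must be at least 1")
    return conns_per_peer, conns_per_peer * 2


def ksocklnd_options_line(conns_per_peer: int, nscheds: int) -> str:
    """Format a modprobe options line (assumes the ksocklnd module)."""
    return f"options ksocklnd conns_per_peer={conns_per_peer} nscheds={nscheds}"


low, high = nscheds_range(4)
print(low, high)                       # 4 8
print(ksocklnd_options_line(4, high))
```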
Troubleshooting
- Check the MTU size configured for the interfaces facing a given LNet. It should match across the LNet nodes as well as the network switch configuration.
- Check CPU utilization/core affinitization on LNet nodes, especially "clients". If LNet is sharing the cores it has been assigned to use with some other process, the performance may suffer. If this is the case, the cpu_pattern and cpu_npartitions libcfs parameters may need to be changed.