Step 1
Load and initialize the LNet module:
modprobe lnet
lnetctl lnet configure
Step 2
Configure the local networks. For example, if the node has three IB interfaces (ib0, ib1 and ib2):
lnetctl net add --net o2ib --if ib0,ib1,ib2
Run this step on all nodes.
The first interface becomes the primary NID of the node.
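One way to check that all three interfaces were picked up is to count the NIDs reported by lnetctl net show. The helper below is only a sketch: count_local_nis is a hypothetical name, and it assumes the YAML-style output in which each local NI appears on a line of the form "- nid: <nid>".

```shell
# Hypothetical helper: count the local NIs in `lnetctl net show` output.
# Reads the output on stdin and counts lines of the form "- nid: ...".
# Assumption: each configured NI is listed once as "- nid: <nid>".
count_local_nis() {
    grep -c -E '^[[:space:]]*-[[:space:]]+nid:'
}

# Usage on a configured node (3 is expected for ib0, ib1 and ib2):
#   lnetctl net show --net o2ib | count_local_nis
```

If the count is lower than the number of interfaces passed to --if, inspect the full lnetctl net show output for the missing interface.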
Step 3
Verify Multi-Rail functionality
lnetctl discover <primary_nid>
This runs the Discovery protocol between the node and the peer identified by primary_nid. Look at the diagram here for more details on the discovery handshake.
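After discovery, the peer entry can be inspected with lnetctl peer show. The sketch below assumes the peer entry contains a "Multi-Rail: True" line when the peer was discovered as Multi-Rail capable; is_multi_rail is a hypothetical helper name, not part of lnetctl.

```shell
# Hypothetical helper: succeed if `lnetctl peer show` output on stdin
# reports the peer as Multi-Rail capable.
# Assumption: the peer entry carries a line "Multi-Rail: True".
is_multi_rail() {
    grep -q -E '^[[:space:]]*Multi-Rail:[[:space:]]*True'
}

# Usage after running discovery:
#   lnetctl peer show --nid <primary_nid> | is_multi_rail && echo "peer is Multi-Rail"
```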
Step 4
Verify Multi-Rail performance. Once the configuration is in place, run an lnet_selftest script to verify the performance. The measured performance should be the aggregate of the performance of each NI.
Here is a sample selftest script
Make the following changes to the script:
- LTO: add the NID(s) which initiate the operations
- LFROM: add the NID(s) which process the operations
- In the lst add_test line, make sure to remove the --distribute option if the test is a 1-1 test. Otherwise, --distribute should be set to the LTO:LFROM ratio. For example, if you're running a test between 2 LTO nodes and 3 LFROM nodes, use --distribute 2:3.
- There are two types of tests: read and write. It's important to note that LNet doesn't do any RDMA reads; it only does RDMA writes. In the write test, the LTO will RDMA write to LFROM, and in the read test, LFROM will RDMA write to LTO. In other words, in the read test the LTO sets up the receive buffers and requests that data be written to them.
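The --distribute rule above can be sketched as a small shell function. distribute_opt is a hypothetical helper, not part of the sample script; it takes the number of LTO and LFROM nodes and prints the option to place on the lst add_test line, or nothing for a 1-1 test. It does not reduce the ratio, so 2 and 4 yield 2:4.

```shell
# Hypothetical helper: print the --distribute option for lst add_test.
#   $1 = number of LTO nodes, $2 = number of LFROM nodes
# Prints nothing for a 1-1 test, where the option must be removed.
distribute_opt() {
    lto_count=$1
    lfrom_count=$2
    if [ "$lto_count" -eq 1 ] && [ "$lfrom_count" -eq 1 ]; then
        return 0
    fi
    printf '%s %s:%s\n' '--distribute' "$lto_count" "$lfrom_count"
}

# Usage:
#   distribute_opt 2 3    # 2 LTO nodes, 3 LFROM nodes
#   distribute_opt 1 1    # 1-1 test, prints nothing
```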
# load self test on each node involved in the test
modprobe lnet_selftest
# on the node managing the test run the selftest script (ex: st.sh)
./st.sh