Step 1

Load and initialize the LNet module:

modprobe lnet
lnetctl lnet configure

Step 2

Configure the local networks. For example, if the node has three IB interfaces, ib0, ib1 and ib2:

lnetctl net add --net o2ib --if ib0,ib1,ib2

Run this step on all nodes.

The first interface becomes the primary NID of the node.
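
Because this step is repeated on every node, it can help to wrap it in a small script. The following is a minimal sketch, not part of the original procedure: the helper name build_net_add and the DRY_RUN variable are hypothetical, and by default the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: build_net_add and DRY_RUN are illustrative names, not LNet tools.

# Print the Step 2 command for a network name and a list of interfaces.
build_net_add() {
    net=$1
    shift
    # Join the remaining arguments with commas: ib0 ib1 ib2 -> ib0,ib1,ib2
    ifaces=$(IFS=,; echo "$*")
    echo "lnetctl net add --net $net --if $ifaces"
}

cmd=$(build_net_add o2ib ib0 ib1 ib2)
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"    # default: show the command without touching LNet
else
    $cmd           # DRY_RUN=0: actually configure the interfaces
fi
```

With the default setting this prints the same lnetctl net add line shown above; setting DRY_RUN=0 would execute it instead. Keeping ib0 first in the list preserves it as the primary NID.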

Step 3

Verify Multi-Rail functionality:

lnetctl discover <primary_nid>

This runs the discovery protocol between the local node and the peer identified by <primary_nid>. Look at the diagram here for more details on the discovery handshake.

Step 4

Verify Multi-Rail performance. Once the configuration is in place, run an lnet_selftest script to verify the performance. The measured performance should be the aggregate of the performance of the individual NIs.

Here is a sample selftest script

Make the following changes to the script:

  1. LTO: add the NID(s) which initiate the operations
  2. LFROM: add the NID(s) which process the operations
  3. In the lst add_test line, remove the --distribute option if the test is a 1-1 test. Otherwise, set --distribute to the LTO:LFROM ratio. For example, for a test between 2 LTO nodes and 3 LFROM nodes, use --distribute 2:3.
  4. There are two types of tests: read and write. Note that LNet never does RDMA reads; it only does RDMA writes. In the write test, LTO RDMA-writes to LFROM; in the read test, LFROM RDMA-writes to LTO. In other words, in the read test LTO sets up the receive buffers and requests that data be written into them.
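
The --distribute rule in item 3 can be sketched as a small shell helper. This is an illustration only: the function name distribute_opt is hypothetical and is not taken from the sample script, the NIDs are placeholders, and it assumes LTO and LFROM are space-separated lists with one NID per node.

```shell
#!/bin/sh
# Sketch only: distribute_opt is a hypothetical helper, not part of lst.
# Assumes each argument is a space-separated NID list, one NID per node.

distribute_opt() {
    lto_count=$(echo "$1" | wc -w)
    lfrom_count=$(echo "$2" | wc -w)
    # Normalize the counts: some wc implementations pad them with spaces.
    lto_count=$((lto_count))
    lfrom_count=$((lfrom_count))
    if [ "$lto_count" -eq 1 ] && [ "$lfrom_count" -eq 1 ]; then
        return 0    # 1-1 test: print nothing so --distribute is omitted
    fi
    echo "--distribute ${lto_count}:${lfrom_count}"
}

# Placeholder NIDs for illustration.
LTO="10.0.0.1@o2ib 10.0.0.2@o2ib"
LFROM="10.0.0.3@o2ib 10.0.0.4@o2ib 10.0.0.5@o2ib"
distribute_opt "$LTO" "$LFROM"
```

For the 2-node and 3-node lists above this prints --distribute 2:3; for a single NID on each side it prints nothing, so the option can be left out of the lst add_test line.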


# load self test on each node involved in the test
modprobe lnet_selftest
# on the node managing the test run the selftest script (ex: st.sh)
./st.sh
