...
Interface Selection and Message Sending Requirements
Note: to test NUMA proximity, you can use Python's psutil to bind a process to a specific CPU, then perform a write/read operation on the file system from that CPU.
The CPU distances can be acquired from /proc/sys/lnet/cpu_partition_distance
The NUMA cpu list can be acquired from /sys/devices/system/node/node*/cpulist
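
The binding step described in the note could look roughly like the sketch below. The helper names (`parse_cpulist`, `bind_to_cpu`, `write_read_on_cpu`) are hypothetical and not part of any LNet tooling. The fallback to `os.sched_setaffinity` is an assumption for Linux hosts where psutil is not installed. The parser handles only the plain `a-b,c` cpulist form.

```python
import os


def parse_cpulist(text):
    """Expand a cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist (memory-only node) yields no CPUs
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def bind_to_cpu(cpu):
    """Pin the current process to a single CPU."""
    try:
        import psutil  # third-party; the approach named in the note above
        psutil.Process().cpu_affinity([cpu])
    except ImportError:
        # Assumption: Linux host, so the standard library call is available.
        os.sched_setaffinity(0, {cpu})


def write_read_on_cpu(cpu, path, payload=b"x" * 4096):
    """Bind to `cpu`, write `payload` to `path`, then read it back."""
    bind_to_cpu(cpu)
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the client before reading
    with open(path, "rb") as f:
        return f.read()
```

For example, `parse_cpulist(open("/sys/devices/system/node/node0/cpulist").read())` gives the CPUs of node 0, and one of them can be passed to `write_read_on_cpu` together with a file path on the mounted file system (the mount point is site specific). The read may be served from the client cache, so the write is the more reliable way to generate traffic.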
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
snd-005 | | UT-0220 | - Configure 3 NIs with equidistant NUMA distances
- Send three or more messages
- Dump statistics on each NI to verify that each NI was used to send messages
|
snd-010 | snd-015 | UT-0225 | - Configure 3 NIs closer to different NUMA nodes
- Dump the NI statistics
- Verify that each NI has the correct device CPT
|
snd-020 | | UT-0230 | - Configure 3 NIs with different NUMA distances
- Send messages
- Confirm through statistics that messages are being sent over the nearest NI (NUMA wise)
|
snd-020 | | UT-0235 | - Configure 2 NIs with different NUMA distances
- Send messages
- Confirm through statistics that messages are being sent over the nearest NI (NUMA wise)
- Add another NI which is closer NUMA wise than the current nearest
- Confirm through statistics that messages are now being sent over the newly added NI
|
snd-030 | | UT-0240 | - Configure 3 NIs, one EDR, one FDR and one QDR
- Set the NUMA range to a large value so that all NIs are considered for round-robin (RR) selection
- Start traffic
- Monitor statistics on each NI
- Confirm that EDR is preferred until it becomes saturated, then FDR is selected, then QDR
|
snd-030 | | UT-0245 | - Configure 3 NIs
- Set the NUMA range to a large value so that all NIs are considered for round-robin (RR) selection
- Start traffic
- Monitor statistics on each NI to confirm that all are being used
- Remove one of the NIs
- Confirm that that NI is no longer used for new messages
- Confirm that the other 2 NIs are being used.
- No messages should be dropped.
|
snd-035 | | UT-0250 | - Configure 3 NIs
- Configure a peer with 3 NIDs
- Send messages to the peer
- Confirm through statistics that peer NIDs are being used based on their available credits.
|
snd-040 | | UT-0255 | - Configure 3 NIs which are not equidistant, all on the same network
- Configure a peer with 3 NIDs, all on the same network
- Start traffic
- Confirm closest NUMA NI is being used
- Confirm peer NIDs are being used
- Set NUMA range to a large value
- Confirm all NIs are being used
- Confirm no change in traffic pattern to the peers
|
snd-045 | snd-070 | UT-0260 | - Configure NIs A, B and C
- Configure the peer with the same NIDs
- Send 1 message which requires a response from NI A
- Confirm that responses are being sent to the same NI
|
snd-050 | | UT-0265 | - Configure NIs A, B and C
- Configure the peer with the same NIDs
- Send 1 message which requires a response from NI A
- Bring down NI A
- Confirm that the response is sent to one of the other configured NIDs
|
snd-050 | | UT-0270 | - Configure an MR system
- Start traffic
- Monitor that traffic is being sent to all configured peers
- Bring down one of the peer NIDs
- Monitor that traffic is no longer sent to that peer NID
- No messages should be dropped
|
snd-050 | snd-060, snd-075 | UT-0275 | - Configure an MR system
- Start traffic
- Monitor that traffic is being sent to all configured peers
- Bring down one of the peer NIDs
- Monitor that traffic is no longer sent to that peer NID
- Bring up the peer NID again
- Monitor that traffic is being sent to it again
|
snd-055 | snd-060, snd-075 | UT-0280 | - Configure an MR system
- Start traffic
- Monitor that traffic is being sent to all configured peers
- Bring down one of the peer NIDs
- Monitor that traffic is no longer sent to that peer NID
- Bring down all peer NIDs
- Messages should fail
|
snd-055 | snd-060, snd-075, snd-085 | UT-0285 | - Configure an MR system
- Start traffic
- Monitor that traffic is being sent to all configured peers over all NIs
- Bring down the local NIs one by one
- Note that traffic migrates to the NIs that are still up until no NIs are left, at which point messages are dropped
|
snd-055 | snd-060, snd-075, snd-085 | UT-0290 | - Configure an MR system
- Start traffic
- Monitor that traffic is being sent to all configured peers over all NIs
- Bring down the local NIs one by one
- Note that traffic migrates to the NIs that are still up until no NIs are left, at which point messages are dropped
- Bring up the NIs again and confirm that they are being reused
|
snd-055 | snd-060, snd-075 | UT-0295 | - Configure two networks, tcp and o2ib
- Configure nodes to have multiple interfaces on each of the networks
- Start traffic over the o2ib network
- o2ib should be used
- Bring down the o2ib network
- Traffic should migrate to the tcp network
- No traffic should be dropped
|
snd-080 | | UT-0300 | - Configure an MR system
- Bring down an NI
- Confirm that the show output reports the NI as down
|
snd-080 | | UT-0305 | - TODO: how do we test device failure?
|
| | UT-0310 | - Configure an MR system
- Configure peers via DLC
- Run traffic
- Delete one of the peer_nis we're sending to via DLC
- Traffic going over that peer_ni should continue but no more traffic should use that NI
|
| | UT-0315 | - Configure an MR system
- Configure peers via DLC
- Run traffic
- Delete one of the peer_nis we're sending to via DLC
- Bring that peer_ni back
- Note that traffic to that peer stops and restarts with no traffic loss
- Repeat the deletion and reconfiguration of the peer_ni
|
| | UT-0320 | - Configure an MR system
- Configure peers via DLC
- Run traffic
- Delete the entire peer
- The peer should be recreated on the next message, but it won't be MR capable.
|
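
Several of the tests above confirm NI usage "through statistics". One way to make that check mechanical is to snapshot the per-NI send counters before and after the traffic run and compare them, as in the sketch below. The function names and the `{nid: send_count}` snapshot layout are assumptions made for illustration. How the counters are collected is left to the test harness.

```python
def send_deltas(before, after):
    """Per-NI increase in send count between two {nid: send_count} snapshots."""
    # An NI missing from `before` (e.g. added mid-test) starts from zero.
    return {nid: after[nid] - before.get(nid, 0) for nid in after}


def nis_used(before, after):
    """Sorted NIDs whose send count grew between the two snapshots."""
    return sorted(nid for nid, delta in send_deltas(before, after).items() if delta > 0)
```

For a test such as UT-0220 the expectation would be that `nis_used` returns every configured NI. For UT-0230 it would return only the NI nearest to the sending CPU.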
...