...

Interface Selection and Message Sending Requirements

Note: to test NUMA proximity, you can use Python's psutil to bind a process to a specific CPU, then execute a write/read operation to the file system from that CPU.

The CPU distances can be acquired from /proc/sys/lnet/cpu_partition_distance
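A minimal sketch of reading the distance table when preparing a test, in Python. The line layout it parses (`<cpt>\t: <cpt>:<distance> ...`) is an assumption about how this file is formatted and should be checked against the actual output on the node under test; `parse_cpt_distances` and `nearest_cpts` are hypothetical helper names.

```python
def parse_cpt_distances(text):
    """Parse a CPT distance table into {cpt: {other_cpt: distance}}.

    ASSUMPTION: each line looks like '0\t: 0:10 1:21', i.e. a CPT number,
    a colon, then space-separated '<cpt>:<distance>' pairs.
    """
    table = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        head = head.strip()
        if not sep or not head.isdigit():
            continue  # skip blank or unrecognised lines
        row = {}
        for pair in rest.split():
            other, _, dist = pair.partition(":")
            row[int(other)] = int(dist)
        table[int(head)] = row
    return table


def nearest_cpts(table, cpt):
    """Return the CPTs (other than `cpt` itself) at the smallest distance from `cpt`."""
    others = {k: v for k, v in table[cpt].items() if k != cpt}
    if not others:
        return []
    best = min(others.values())
    return sorted(k for k, v in others.items() if v == best)


if __name__ == "__main__":
    with open("/proc/sys/lnet/cpu_partition_distance") as f:
        distances = parse_cpt_distances(f.read())
    for cpt in sorted(distances):
        print(cpt, "nearest:", nearest_cpts(distances, cpt))
```

Knowing which partitions are nearest to each other helps decide which CPU to bind the test process to for the distance-based test cases.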

The NUMA cpu list can be acquired from /sys/devices/system/node/node*/cpulist
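The binding approach from the note above could be sketched as follows, in Python. It expands a kernel cpulist string such as `0-3,8`, binds the current process to one CPU with psutil (falling back to `os.sched_setaffinity` on Linux if psutil is not installed), and then writes and reads a file. The helper names and the mount path are hypothetical and used only for illustration.

```python
import os

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def parse_cpulist(text):
    """Expand a cpulist string such as '0-3,8,10-11' into a sorted list of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def pin_to_cpu(cpu):
    """Bind the current process to a single CPU."""
    if psutil is not None:
        psutil.Process().cpu_affinity([cpu])
    else:
        os.sched_setaffinity(0, {cpu})  # Linux-only fallback


def write_then_read(path, size=1 << 20):
    """Write `size` bytes to `path`, fsync, read them back; return the number of bytes read."""
    payload = b"x" * size
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        return len(f.read())


if __name__ == "__main__":
    with open("/sys/devices/system/node/node0/cpulist") as f:
        cpus = parse_cpulist(f.read())
    pin_to_cpu(cpus[0])
    # Hypothetical path: replace with a file on the file system under test.
    write_then_read("/mnt/lustre/numa_test.bin")
```

Note that the read-back may be served from the client page cache, so the write is the more reliable source of network traffic in this sketch.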

Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description
snd-005 | | UT-0220
  • Configure 3 NIs with equidistant NUMA distance
  • Send three or more messages
  • Dump statistics on each NI to verify that each NI was used to send messages
snd-010 | snd-015 | UT-0225
  • Configure 3 NIs closer to different NUMA nodes
  • Dump the NI statistics
  • Verify that each NI has the correct device CPT
snd-020 | | UT-0230
  • Configure 3 NIs with different NUMA distances
  • Send messages
  • Confirm through statistics that messages are being sent over the nearest NI (NUMA-wise)
snd-020 | | UT-0235
  • Configure 2 NIs with different NUMA distances
  • Send messages
  • Confirm through statistics that messages are being sent over the nearest NI (NUMA-wise)
  • Add another NI which is closer NUMA-wise than the current nearest
  • Confirm through statistics that messages are now being sent over the newly added NI
snd-030 | | UT-0240
  • Configure 3 NIs: one EDR, one FDR and one QDR
  • Set the NUMA range to a large value so all NIs are considered through round robin (RR)
  • Start traffic
  • Monitor statistics on each NI
  • Confirm that EDR is preferred until it becomes saturated, then FDR is selected, then QDR
snd-030 | | UT-0245
  • Configure 3 NIs
  • Set the NUMA range to a large value so all NIs are considered through RR
  • Start traffic
  • Monitor statistics on each NI to confirm all are being used
  • Remove one of the NIs
  • Confirm that the removed NI is no longer used for new messages
  • Confirm that the other 2 NIs are being used
  • No messages should be dropped
snd-035 | | UT-0250
  • Configure 3 NIs
  • Configure a peer with 3 NIDs
  • Send messages to the peer
  • Confirm through statistics that peer NIDs are being used based on their available credits.
snd-040 | | UT-0255
  • Configure 3 NIs which are not equidistant, all on the same network
  • Configure a peer with 3 NIDs, all on the same network
  • Start traffic
  • Confirm the closest NUMA NI is being used
  • Confirm peer NIDs are being used
  • Set the NUMA range to a large value
  • Confirm all NIs are being used
  • Confirm no change in traffic pattern to the peers
snd-045 | snd-070 | UT-0260
  • Configure NIs A, B and C
  • Configure the peer with the same NIDs
  • From NI A, send 1 message which requires a response
  • Confirm that the response is sent to the same NI
snd-050 | | UT-0265
  • Configure NIs A, B and C
  • Configure the peer with the same NIDs
  • From NI A, send 1 message which requires a response
  • Bring down NI A
  • Confirm that the response is sent to one of the other configured NIDs
snd-050 | | UT-0270
  • Configure a Multi-Rail (MR) system
  • Start traffic
  • Monitor that traffic is being sent to all configured peers
  • Bring down one of the peer NIDs
  • Monitor that traffic is no longer sent to that peer NID
  • No messages should be dropped
snd-050 | snd-060, snd-075 | UT-0275
  • Configure an MR system
  • Start traffic
  • Monitor that traffic is being sent to all configured peers
  • Bring down one of the peer NIDs
  • Monitor that traffic is no longer sent to that peer NID
  • Bring up the peer NID again
  • Monitor that traffic is being sent to it again
snd-055 | snd-060, snd-075 | UT-0280
  • Configure an MR system
  • Start traffic
  • Monitor that traffic is being sent to all configured peers
  • Bring down one of the peer NIDs
  • Monitor that traffic is no longer sent to that peer NID
  • Bring down all peer NIDs
  • Messages should fail
snd-055 | snd-060, snd-075, snd-085 | UT-0285
  • Configure an MR system
  • Start traffic
  • Monitor that traffic is being sent to all configured peers over all NIs
  • Bring down the local NIs one by one
  • Note that traffic migrates to the NIs that are still up until no NIs are left, at which point messages are dropped
snd-055 | snd-060, snd-075, snd-085 | UT-0290
  • Configure an MR system
  • Start traffic
  • Monitor that traffic is being sent to all configured peers over all NIs
  • Bring down the local NIs one by one
  • Note that traffic migrates to the NIs that are still up until no NIs are left, at which point messages are dropped
  • Bring up the NIs again and confirm that they are being reused
snd-055 | snd-060, snd-075 | UT-0295
  • Configure two networks, tcp and o2ib
  • Configure nodes to have multiple interfaces on each of the networks
  • Start traffic over the o2ib network
  • o2ib should be used
  • Bring down the o2ib network
  • Traffic should migrate to the tcp network
  • No traffic should be dropped
snd-080 | | UT-0300
  • Configure an MR system
  • Bring down an NI
  • Confirm that the show command output reports the NI as down
snd-080 | | UT-0305
  • TODO: how do we test device failure?
| | UT-0310
  • Configure an MR system
  • Configure peers via DLC
  • Run traffic
  • Delete one of the peer_nis we're sending to via DLC
  • Traffic already going over that peer_ni should continue, but no new traffic should use that peer_ni
| | UT-0315
  • Configure an MR system
  • Configure peers via DLC
  • Run traffic
  • Delete one of the peer_nis we're sending to via DLC
  • Bring that peer_ni back
  • Note that traffic stops and then restarts on that peer with no traffic loss
  • Repeat the deletion and reconfiguration of the peer_ni
| | UT-0320
  • Configure an MR system
  • Configure peers via DLC
  • Run traffic
  • Delete the entire peer
  • The peer should be recreated on the next message, but it will not be MR capable
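
Many of the test cases above say "confirm through statistics". One possible way to automate that check, sketched in Python, is to compare two counter snapshots taken before and after the traffic run. The snapshot layout (a dict mapping each NID to `send_count` and `drop_count`) and the helper names are assumptions made for illustration; how the counters are collected is left to the test harness.

```python
def ni_deltas(before, after):
    """Per-NI counter differences between two snapshots.

    ASSUMPTION: each snapshot maps NID -> {"send_count": int, "drop_count": int}.
    NIs present only in `before` (e.g. removed during the test) are not reported.
    """
    deltas = {}
    for nid, now in after.items():
        prev = before.get(nid, {})
        deltas[nid] = {key: value - prev.get(key, 0) for key, value in now.items()}
    return deltas


def used_nis(deltas):
    """NIDs whose send counter increased during the run."""
    return sorted(nid for nid, d in deltas.items() if d.get("send_count", 0) > 0)


def total_drops(deltas):
    """Total number of messages dropped across all NIs during the run."""
    return sum(d.get("drop_count", 0) for d in deltas.values())


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    before = {
        "10.0.0.1@o2ib": {"send_count": 100, "drop_count": 0},
        "10.0.0.2@o2ib": {"send_count": 100, "drop_count": 0},
    }
    after = {
        "10.0.0.1@o2ib": {"send_count": 180, "drop_count": 0},
        "10.0.0.2@o2ib": {"send_count": 100, "drop_count": 0},
    }
    d = ni_deltas(before, after)
    print("NIs used:", used_nis(d))
    print("drops:", total_drops(d))
```

For the nearest-NI tests, `used_nis` would be expected to contain only the nearest NI; for the round-robin tests, all configured NIs; and `total_drops` should be zero wherever a test states that no messages should be dropped.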

...