Comment: Added sketches for peer discovery unit tests. These sketches reflect the currently implemented functionality.

...

Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description | Behavior of Note
snd-005 UT-0220
  • Configure 3 NIs with equidistant NUMA distance
  • Send three or more messages
  • Dump statistics on each NI to verify that each NI was used to send messages
 
snd-010 snd-015 UT-0225
  • Configure 3 NIs closer to different NUMA nodes
  • dump the NI statistics
  • Verify that each NI has the correct device CPT
 
snd-020 UT-0230
  • Configure 3 NIs with different NUMA distances
  • Send messages
  • Confirm through statistics that messages are being sent over the nearest NI (NUMA wise)
 
snd-020 UT-0235
  • Configure 2 NIs with different NUMA distances
  • Send messages
  • Confirm through statistics that messages are being sent over the nearest NI (NUMA wise)
  • add another NI which is closer NUMA wise than the current nearest
  • confirm through statistics that messages are now being sent over the newly added NI
 
snd-030 UT-0240
  • Configure 3 NIs, one EDR, one FDR and one QDR
  • set the NUMA range to a large value so all NIs are considered through RR
  • start traffic
  • monitor statistics on each NI.
  • Confirm that EDR is preferred until it becomes saturated, then FDR is selected then QDR
 
snd-030 UT-0245
  • Configure 3 NIs
  • set the NUMA range to a large value so all NIs are considered through RR
  • start traffic
  • monitor statistics on each NI to confirm all are being used.
  • Remove one of the NIs
  • Confirm that that NI is no longer used for new messages
  • Confirm that the other 2 NIs are being used.
  • No messages should be dropped.
 
snd-035 UT-0250
  • Configure 3 NIs
  • Configure a peer with 3 NIDs
  • Send messages to the peer
  • Confirm through statistics that peer NIDs are being used based on their available credits.
 
snd-040 UT-0255
  • Configure 3 NIs, all on the same network, which are not equidistant
  • configure a peer with 3 NIDs all on the same network
  • start traffic
  • Confirm closest NUMA NI is being used
  • Confirm peer NIDs are being used
  • set NUMA range to a large value
  • Confirm all NIs are being used
  • Confirm no change in traffic pattern to the peers
 
snd-045 snd-070 UT-0260
  • Configure NIs A, B and C
  • Configure the peer with the same NIDs
  • Send 1 message which requires a response from NI A
  • Confirm that responses are being sent to the same NI
 
snd-050 UT-0265
  • Configure NIs A, B and C
  • Configure the peer with the same NIDs
  • Send 1 message which requires a response from NI A
  • bring down NI A
  • confirm that response is sent to one of the other configured NIDs
 
snd-050 UT-0270
  • Configure an MR system
  • Start traffic
  • monitor traffic is being sent to all configured peers
  • bring down one of the peer NIDs
  • monitor traffic is no longer sent to that peer NID
  • no messages should be dropped
 
snd-050 snd-060, snd-075 UT-0275
  • Configure an MR system
  • Start traffic
  • monitor traffic is being sent to all configured peers
  • bring down one of the peer NIDs
  • monitor traffic is no longer sent to that peer NID
  • bring up the peer NID again
  • monitor traffic is being sent to it again

When a peer NID is removed and then re-added, or when a new peer NID is added, its peer_ni->lpni_seq number starts at 0.

When credits are equal, the newly added peer NI will always be picked until its lpni_seq number catches up with the sequence numbers of the other peer NIs. See code below.

Code Block
		} else if (lpni->lpni_txcredits == best_lpni_credits) {
			/*
			 * The best peer found so far and the current peer
			 * have the same number of available credits let's
			 * make sure to select between them using Round
			 * Robin
			 */
			if (best_lpni) {
				if (best_lpni->lpni_seq <= lpni->lpni_seq)
					continue;
			}

This behavior will manifest itself in a low-bandwidth environment. In a high-bandwidth environment it is likely that the credits seen by the selection algorithm will differ, and the peer NI will be picked according to credits.

Another scenario to consider is when the lpni_seq number wraps. In a low-bandwidth environment this could cause the peer NI whose number wrapped to be picked until it catches up with the sequence numbers of the other peer NIs.

This in itself might not be significant, but it does raise the question of the benefit of having a sequence number in the first place. Does it give much of a functional advantage, or is the credits criterion enough?

The same issue is present with local NI sequence numbers.
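
The effect described above can be reproduced with a small user-space model. This is only a sketch, not LNet code: struct peer_ni_model, pick_peer_ni() and consecutive_picks() are hypothetical names, the tie-break mirrors the comparison quoted above, and it assumes that the selected peer NI has its sequence number incremented on every selection and that credits stay equal (the low-bandwidth case).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a peer NI; not the LNet struct. */
struct peer_ni_model {
	int txcredits;		/* models lpni_txcredits */
	unsigned int seq;	/* models lpni_seq */
};

/*
 * Pick the entry with the most credits. On a credit tie, keep the
 * current best unless the candidate has a strictly lower sequence
 * number. Assumption: the winner's sequence number is bumped on
 * every selection. Returns the index of the selected entry.
 */
size_t pick_peer_ni(struct peer_ni_model *nis, size_t n)
{
	size_t best = 0;
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++) {
		if (nis[i].txcredits < nis[best].txcredits)
			continue;
		if (nis[i].txcredits == nis[best].txcredits &&
		    nis[best].seq <= nis[i].seq)
			continue;
		best = i;
	}
	nis[best].seq++;
	return best;
}

/*
 * Count how many selections in a row land on entry `target`.
 * Stops at the first selection of another entry (which still bumps
 * that entry's sequence number) or after max_picks selections.
 */
unsigned int consecutive_picks(struct peer_ni_model *nis, size_t n,
			       size_t target, unsigned int max_picks)
{
	unsigned int count = 0;

	while (count < max_picks && pick_peer_ni(nis, n) == target)
		count++;
	return count;
}
```

With three entries that have equal credits and sequence numbers 100, 100 and 0, the model selects the third entry 100 times in a row before round robin resumes. Under the same assumptions a wrapped sequence number behaves like a newly added one, because an unsigned counter restarts at 0.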

snd-055 snd-060, snd-075 UT-0280
  • Configure an MR system
  • Start traffic
  • monitor traffic is being sent to all configured peers
  • bring down one of the peer NIDs
  • monitor traffic is no longer sent to that peer NID
  • bring down all peer NIDs
  • message should fail.
 
snd-055 snd-060, snd-075, snd-085 UT-0285
  • Configure an MR system
  • Start traffic
  • monitor traffic is being sent to all configured peers over all NIs
  • bring down the local NIs one by one
  • note traffic is migrated to the NIs still up, until no NIs are left then messages are dropped
 
snd-055 snd-060, snd-075, snd-085 UT-0290
  • Configure an MR system
  • Start traffic
  • monitor traffic is being sent to all configured peers over all NIs
  • bring down the local NIs one by one
  • note traffic is migrated to the NIs still up, until no NIs are left then messages are dropped
  • bring up the NIs again and confirm that NIs are being reused.
 
snd-055 snd-060, snd-075 UT-0295
  • Configure two networks tcp and o2ib
  • Configure nodes to have multiple interfaces on each of the networks
  • start traffic over the o2ib network
  • o2ib should be used
  • bring down the o2ib network
  • traffic should migrate to the tcp network.
  • no traffic should be dropped.
 
snd-080 UT-0300
  • Configure an MR system
  • bring down an NI
  • confirm that the show info shows the NI as down
 
snd-080 UT-0305
  • TODO: how do we test device failure?
 
  UT-0310
  • Configure an MR system
  • Configure peers via DLC
  • Run traffic
  • Delete one of the peer_nis we're sending to via DLC
  • Traffic going over that peer_ni should continue but no more traffic should use that NI
 
  UT-0315
  • Configure an MR system
  • Configure peers via DLC
  • Run traffic
  • Delete one of the peer_nis we're sending to via DLC
  • Bring that peer_ni back
  • Note traffic stops and starts on that peer with no traffic loss
  • Repeat the deletion and reconfiguration of the peer_ni
 
  UT-0320
  • Configure an MR system
  • Configure peers via DLC
  • Run traffic
  • Delete the entire peer
  • The peer should be recreated on the next message, but it won't be MR capable.
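
The NUMA-related tests in this table (UT-0220 through UT-0255) expect the nearest NI to carry all traffic by default, and all NIs to share traffic round robin once the NUMA range is set to a large value. The sketch below models that expectation so the statistics to look for are explicit. It is not the LNet implementation: struct ni_model, select_ni() and send_messages() are hypothetical, credits are ignored, and it assumes that any distance below the NUMA range is treated as equal to the range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a local NI; field names are illustrative only. */
struct ni_model {
	unsigned int numa_distance;	/* distance from the sending CPT */
	unsigned int seq;		/* bumped each time the NI is chosen */
	unsigned long sent;		/* per-NI message counter */
};

/*
 * Assumption: a distance below numa_range is treated as numa_range, so
 * all NIs inside the range compare equal and fall through to round
 * robin on seq.
 */
static unsigned int effective_distance(const struct ni_model *ni,
				       unsigned int numa_range)
{
	return ni->numa_distance < numa_range ? numa_range
					      : ni->numa_distance;
}

/* Select the NI for one message; returns the index of the chosen NI. */
size_t select_ni(struct ni_model *nis, size_t n, unsigned int numa_range)
{
	size_t best = 0;
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++) {
		unsigned int d = effective_distance(&nis[i], numa_range);
		unsigned int best_d = effective_distance(&nis[best],
							 numa_range);

		if (d > best_d)
			continue;
		if (d == best_d && nis[best].seq <= nis[i].seq)
			continue;
		best = i;
	}
	nis[best].seq++;
	nis[best].sent++;
	return best;
}

/* Send `count` messages and let the per-NI counters accumulate. */
void send_messages(struct ni_model *nis, size_t n,
		   unsigned int numa_range, unsigned int count)
{
	while (count-- > 0)
		select_ni(nis, n, numa_range);
}
```

In this model, three NIs at distances 10, 20 and 30 with a NUMA range of 0 send every message over the first NI. The same NIs with a NUMA range of 100 each carry one third of the messages.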
 

Dynamic NID Discovery

The unit tests for peer NID discovery depend on lctl ping not triggering discovery. To force discovery, use lctl discover. Note that some of the tests require the DLC configuration to include non-existing peer NIDs. These NIDs are marked with a *.

Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description
   Basic functionality 1-1: discovery of an MR peer via its primary.
  • MR Node with interfaces N1, N2
  • MR Peer with interfaces P1, P2, P3
  • Ping P1 from node
  • Ping P2 from node
  • Verify that node sees two different peers: P1, P2.
  • Discover P1 from node
  • Verify that node sees one MR peer with three NIDS: P1, P2, P3.
  • Verify that peer sees node as one MR peer with two NIDS: N1, N2.
   Basic functionality 1-2: discovery of an MR peer via a secondary.
  • MR Node with interfaces N1, N2
  • MR Peer with interfaces P1, P2, P3
  • Ping P1 from node
  • Ping P2 from node
  • Verify that node sees two different peers: P1, P2.
  • Discover P2 from node
  • Verify that node sees one MR peer with three NIDS: P1, P2, P3.
  • Verify that peer sees node as one MR peer with two NIDS: N1, N2.
   Basic functionality 1-3: discovery of an MR peer via a tertiary.
  • MR Node with interfaces N1, N2
  • MR Peer with interfaces P1, P2, P3
  • Ping P1 from node
  • Ping P2 from node
  • Verify that node sees two different peers: P1, P2.
  • Discover P3 from node
  • Verify that node sees one MR peer with three NIDS: P1, P2, P3.
  • Verify that peer sees node as one MR peer with two NIDS: N1, N2.
   

Compatibility 2-1: discovery of a non-MR peer via its primary.

  • MR Node with interface N1, N2.
  • Non-MR Peer with interfaces P1, P2, P3.
  • Ping P1 from node.
  • Ping P2 from node.
  • Verify that node sees two different peers: P1, P2.
  • Discover P1 from node
  • Verify that node sees one non-MR peer with three NIDS: P1, P2, P3.
  • Verify that peer sees node as one peer with one NID: N1.
   

Compatibility 2-2: discovery of a non-MR peer via a secondary.

  • MR Node with interface N1, N2.
  • Non-MR Peer with interfaces P1, P2, P3.
  • Ping P1 from node.
  • Ping P2 from node.
  • Verify that node sees two different peers: P1, P2.
  • Discover P2 from node
  • Verify that node sees one non-MR peer with three NIDS: P1, P2, P3.
  • Verify that peer sees node as one peer with one NID: N1.
   

Compatibility 2-3: discovery of a non-MR peer via a tertiary.

  • MR Node with interface N1, N2.
  • Non-MR Peer with interfaces P1, P2, P3.
  • Ping P1 from node.
  • Ping P2 from node.
  • Verify that node sees two different peers: P1, P2.
  • Discover P3 from node
  • Verify that node sees one non-MR peer with three NIDS: P1, P2, P3.
  • Verify that peer sees node as one peer with one NID: N1.
   

Interaction with DLC 3-1: DLC overrides Discovery of MR peer

  • MR node with interface N1
  • MR peer with interface P1, P2, P3
  • DLC configure MR peer on node with interfaces P1, P2, P4*.
  • Discover P1 from node.
  • Verify that node sees one MR peer with three NIDS: P1, P2, P4*.
  • Verify presence of error messages on node (error code is -EPERM):
    • Error adding NID P3 to peer P1: -1
    • Error deleting NID P3 from peer P1: -1
   

Interaction with DLC 3-2: DLC overrides Discovery of non-MR peer

  • MR node with interface N1
  • non-MR peer with interface P1, P2, P3
  • DLC configure non-MR peer on node with interfaces P1, P2, P4*.
  • Discover P1 from node.
  • Verify that node sees one non-MR peer with three NIDS: P1, P2, P4*.
  • Verify presence of error messages on node (error code is -EPERM):
    • Error adding NID P3 to peer P1: -1
    • Error deleting NID P3 from peer P1: -1
   

Interaction with DLC 3-3: DLC overrides Discovery of MR peer with primary conflict

  • MR node with interface N1
  • MR peer with interface P1, P2, P3
  • DLC configure MR peer on node with interfaces P2, P3, P4*.
  • Discover P2 from node.
  • Verify that node sees one MR peer with three NIDS: P2, P3, P4*.
  • Verify presence of error message on node (error code is -EEXIST):
    • Primary NID error P2 versus P1: -17
   

Interaction with DLC 3-4: DLC overrides Discovery of non-MR peer with primary conflict

  • MR node with interface N1
  • non-MR peer with interface P1, P2, P3
  • DLC configure non-MR peer on node with interfaces P2, P3, P4*.
  • Discover P2 from node.
  • Verify that node sees one MR peer with three NIDS: P2, P3, P4*.
  • Verify presence of error message on node (error code is -EEXIST):
    • Primary NID error P2 versus P1: -17
   

Interaction with DLC 3-5: "push MR bit" exception to DLC overrides Discovery

  • MR node with interface N1
  • MR peer with interface P1, P2, P3
  • DLC configure non-MR peer on node with interfaces P1, P2, P4*.
  • Discover N1 from peer.
  • Verify that node sees one MR peer with three NIDS: P1, P2, P4*.
  • Verify presence of error messages on node (error code is -EPERM):
    • Push says P1 is Multi-Rail, DLC says not
    • Error adding NID P3 to peer P1: -1
    • Error deleting NID P3 from peer P1: -1
   

Interaction with DLC 3-6: "push MR bit" exception to DLC overrides Discovery with primary conflict

  • MR node with interface N1
  • MR peer with interface P1, P2, P3
  • DLC configure non-MR peer on node with interfaces P2, P3, P4*.
  • Discover N1 from peer.
  • Verify that node sees one MR peer with three NIDS: P2, P3, P4*.
  • Verify presence of error message on node (error code is -EEXIST):
    • Push says P2 is Multi-Rail, DLC says not
    • Primary NID error P2 versus P1: -17
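
The DLC interaction tests above check log lines by their trailing numeric return code (-1 for -EPERM, -17 for -EEXIST). A small helper along the following lines could automate that check. It is a sketch under assumptions: parse_log_rc() and log_rc_matches() are hypothetical names, the message format is taken from the test descriptions above rather than from the LNet source, and Linux errno numbering is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the numeric return code that follows the last ':' in a log
 * line such as "Error adding NID P3 to peer P1: -1".
 * Returns 0 and stores the code in *rc on success, -1 if the line has
 * no ':' or no number after it.
 */
int parse_log_rc(const char *line, int *rc)
{
	const char *colon;
	char *end;
	long val;

	assert(line != NULL && rc != NULL);
	colon = strrchr(line, ':');
	if (colon == NULL)
		return -1;
	val = strtol(colon + 1, &end, 10);
	if (end == colon + 1)
		return -1;
	*rc = (int)val;
	return 0;
}

/*
 * Return 1 if the line ends in the negated errno value `err`
 * (for example err == EEXIST matches ": -17" on Linux), else 0.
 */
int log_rc_matches(const char *line, int err)
{
	int rc;

	if (parse_log_rc(line, &rc) != 0)
		return 0;
	return rc == -err;
}
```

For example, log_rc_matches("Primary NID error P2 versus P1: -17", EEXIST) returns 1 on Linux, while a line with no return code, such as the "Push says ..." message, does not match any errno.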

 

Debugging Requirements

Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description
dbg-005 dbg-010, dbg-015, dbg-020, dbg-025, dbg-030, dbg-035, dbg-080 UT-0325
  • dump per NI statistics
    • transmitted
    • received
    • dropped
    • timeouts
    • state
dbg-040 dbg-080, dbg-095 UT-0330
  • configure multiple NIs
  • run traffic
  • dump stats on all NIs
dbg-040 dbg-080 UT-0335
  • configure multiple NIs
  • run traffic
  • dump stats on all NIs
  • Filter on specific NID
dbg-045 dbg-080 UT-0340
  • dump LNet level statistics
dbg-050 dbg-080, dbg-100 UT-0345
  • configure multiple peers
  • start traffic
  • dump per peer statistics
dbg-110 UT-0350
  • configure multiple NIs
  • toggle their state from ACTIVE to DOWN
  • confirm that state change is being printed to console.
dbg-115 UT-0355
  • configure an MR system
  • start traffic
  • bring down an NI
  • confirm that messages indicating that another NI/peer is being used are printed.
dbg-120 UT-0360
  • Configure an MR system
  • run traffic
  • stop traffic
  • dump NI statistics
  • dump peer statistics
  • dump LNet level statistics
  • zero out stats
  • dump all statistics above to confirm they've been zeroed out.

...