...
Local Network Configuration
In-Range UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-020 | cfg-005, cfg-010, cfg-015, cfg-045, cfg-055, cfg-060, cfg-065 | UT-0005 |
|
UT-0010 |
| ||
UT-0015 |
| ||
UT-0020 |
| ||
cfg-025 | cfg-005, cfg-010, cfg-015, cfg-045, cfg-055, cfg-060, cfg-065 | UT-0025 |
|
cfg-035 | cfg-040, cfg-045, cfg-055, cfg-060, cfg-065 | UT-0030 |
|
UT-0035 |
| ||
UT-0040 |
| ||
UT-0045 |
| ||
UT-0050 |
|
|
| |||
cfg-060 | cfg-065 | UT-0055 | Go through the following lnetctl commands and exercise their parameters:
|
Out-of-Range UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-020 | cfg-005, cfg-010, cfg-015 | UT-0060 |
|
UT-0065 |
| ||
UT-0070 |
| ||
cfg-060 | cfg-065 | UT-0075 | Go through the following lnetctl commands and exercise their parameters, by providing out-of-range values:
|
UT-0080 |
|
Error UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
UT-0090 |
| ||
UT-0095 |
| ||
UT-0096 |
|
Go through the following lnetctl commands and exercise their parameters, by providing error values:
UT-0100 | Go through the following lnetctl commands and exercise their parameters, by providing error values:
| ||
UT-0105 | Delete a non-existent network. Should return -EINVAL | ||
UT-0110 | Delete a non-existent NID on tcp/o2ib. Should return -EINVAL |
Remote Peer Configuration
Expected Behavior
- A peer can be added by specifying a list of NIDs.
- The first NID shall be used as the primary NID. The rest of the NIDs will be added under the primary NID.
- A peer can be added by explicitly specifying the key NID, and then by adding a set of other NIDs, all done through one API call.
- If a key NID already exists, but it's not an MR NI, then adding that key NID from DLC shall convert that NI to an MR NI.
- If a key NID already exists, and it is an MR NI, then re-adding the key NID shall have no effect.
- If a key NID already exists as part of another peer, then adding that NID as part of another peer shall fail.
- If a NID is being added to a peer NI and that NID is non-MR, then that NID is moved under the peer and is made MR capable.
- If a NID is being added to a peer and that NID is an MR NID and part of another peer, then the operation shall fail.
- If a NID is being added to a peer and it is already part of that peer, then the operation is a no-op.
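The rules above can be read as a small state machine over a NID-to-peer map. The following is a minimal sketch of that reading in Python, intended only as a test oracle for the unit tests below. The class and method names (`PeerTable`, `add_peer`, `add_implicit`, `PeerConfigError`) are hypothetical and do not correspond to any LNet or DLC API. It also assumes the whole call is rejected before anything is changed when one NID is invalid, which the rules above do not state.

```python
from dataclasses import dataclass, field


class PeerConfigError(Exception):
    """Raised when an add would violate the expected-behavior rules."""


@dataclass
class Peer:
    primary: str
    multi_rail: bool
    nids: list = field(default_factory=list)  # primary NID is always first


class PeerTable:
    def __init__(self):
        self._owner = {}  # NID -> Peer that currently owns it

    def add_implicit(self, nid):
        """Model a non-MR peer that exists before any DLC configuration."""
        if nid not in self._owner:
            self._owner[nid] = Peer(nid, False, [nid])
        return self._owner[nid]

    def add_peer(self, key_nid, *other_nids):
        """Add a peer by key (primary) NID plus optional extra NIDs."""
        peer = self._owner.get(key_nid)
        # The key NID may not already sit under a different peer.
        if peer is not None and peer.primary != key_nid:
            raise PeerConfigError(
                f"{key_nid} already belongs to peer {peer.primary}")
        # Validate every extra NID first (assumption: all-or-nothing call).
        for nid in other_nids:
            owner = self._owner.get(nid)
            if owner is not None and owner is not peer and owner.multi_rail:
                raise PeerConfigError(
                    f"{nid} is an MR NID of peer {owner.primary}")
        if peer is None:
            peer = Peer(key_nid, True, [key_nid])
            self._owner[key_nid] = peer
        # Converts a non-MR key to MR; no effect if it is already MR.
        peer.multi_rail = True
        for nid in other_nids:
            if nid in peer.nids:
                continue  # already part of this peer: no-op
            # Unknown or non-MR NID: move it under this peer.
            self._owner[nid] = peer
            peer.nids.append(nid)
        return peer
```

A test can replay the same sequence of adds against this model and against the node under test, then compare the resulting primary NID and NID list per peer.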
In-Range UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-070 | UT-0115 |
| |
cfg-070 | UT-0120 |
| |
cfg-070 | UT-0125 |
| |
cfg-070 | UT-0130 |
| |
cfg-070 | UT-0131 |
| |
cfg-070 | UT-0135 |
| |
cfg-070 | UT-0140 |
| |
cfg-070 | UT-0145 |
|
| |||
cfg-075 | UT-0150 |
|
Out-of-Range UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
UT-0155 |
| ||
UT-0160 |
| ||
UT-0165 |
| ||
UT-0170 |
| ||
UT-0171 |
| ||
UT-0172 |
|
|
| ||
UT- |
0173 |
| ||
UT-0175 |
| ||
UT-0176 |
|
Error UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-070 | UT-0180 |
| |
cfg-070 | UT-0185 |
| |
cfg-080 | snd-065 | UT-0190 |
|
Policy Configuration
In-Range UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-090 | UT-0195 |
| |
cfg-090 | UT-0200 |
| |
cfg-090 | snd-025 | UT-0205 |
|
Error UT
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-090 | UT-0210 |
|
General Configuration
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
cfg-170 | UT-0215 |
|
Functional Requirements
Interface Selection and Message Sending Requirements
Note: to test NUMA proximity, you can use Python's psutil to bind a process to a specific CPU, then execute a write/read operation to the FS on that CPU.
The CPU distances can be read from /proc/sys/lnet/cpu_partition_distance.
The NUMA CPU list can be read from /sys/devices/system/node/node*/cpulist.
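As a rough illustration of the note above, the sketch below reads the per-node CPU lists from sysfs and pins the current process to one CPU using psutil's `Process.cpu_affinity()`. The helper names (`parse_cpulist`, `numa_cpus`, `bind_to_cpu`) are hypothetical, and the parser assumes the usual sysfs cpulist syntax of comma-separated CPU numbers and ranges (for example `0-3,8-11`).

```python
import glob
import os


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_cpus(pattern="/sys/devices/system/node/node*/cpulist"):
    """Return {'node0': [cpus...], ...} for every NUMA node found in sysfs."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        node = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[node] = parse_cpulist(f.read())
    return result


def bind_to_cpu(cpu):
    """Pin the calling process to a single CPU (Linux, requires psutil)."""
    import psutil  # imported here so the parsing helpers work without it
    psutil.Process().cpu_affinity([cpu])


# Typical use in a test script:
#   cpus = numa_cpus()["node1"]
#   bind_to_cpu(cpus[0])
#   ... perform the write/read against the file system, then dump NI stats ...
```

Binding before starting the I/O matters, because the I/O should be issued from the pinned CPU for the NUMA distance to that CPU's node to be the one exercised.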
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description | Behavior of Note |
---|---|---|---|---|
snd-005 | UT-0220 |
| ||
snd-010 | snd-015 | UT-0225 |
| |
snd-020 | UT-0230 |
| ||
snd-020 | UT-0235 |
| ||
snd-030 | UT-0240 |
|
- Configure 3 NIs with equidistant NUMA distance
- Send three or more messages
- Dump statistics on each NI to verify that each NI was used to send messages
- Configure 3 NIs closer to different NUMA nodes
- dump the NI statistics
- Verify that each NI has the correct device CPT
- Configure 3 NIs with different NUMA distances
- Send messages
- Confirm through statistics that messages are being sent over the nearest NI (NUMA wise)
- Configure 2 NIs with different NUMA distances
- Send messages
- Confirm through statistics that messages are being sent over the nearest NI (NUMA wise)
- add another NI which is closer NUMA wise than the current nearest
- confirm through statistics that messages are not being sent over the newly added NI
- Configure 3 NIs, one EDR, one FDR and one QDR
- set the NUMA range to a large value so all NIs are considered through RR
- start traffic
- monitor statistics on each NI.
- Confirm that EDR is preferred until it becomes saturated, then FDR is selected then QDR
|
| ||||
snd-030 | UT-0245 |
| ||
snd-035 | UT-0250 |
| ||
snd-040 | UT-0255 |
| ||
snd-045 | snd-070 | UT-0260 |
| |
snd-050 | UT-0265 |
| ||
snd-050 | UT-0270 |
|
- Configure an MR system
- Start traffic
- monitor traffic is being sent to all configured peers
- bring down one of the peer NIDs
- monitor traffic is no longer sent to that peer NID
- bring up the peer NID again
- monitor traffic is being sent to it again
- Configure an MR system
- Start traffic
- monitor traffic is being sent to all configured peers
- bring down one of the peer NIDs
- monitor traffic is no longer sent to that peer NID
- bring down all peer NIDs
- message should fail.
- Configure an MR system
- Start traffic
- monitor traffic is being sent to all configured peers over all NIs
- bring down the local NIs one by one
- note traffic is migrated to the NIs still up, until no NIs are left then messages are dropped
- Configure an MR system
- Start traffic
- monitor traffic is being sent to all configured peers over all NIs
- bring down the local NIs one by one
- note traffic is migrated to the NIs still up, until no NIs are left then messages are dropped
- bring up the NIs again and confirm that NIs are being reused.
- Configure two networks tcp and o2ib
- Configure nodes to have multiple interfaces on each of the networks
- start traffic over the o2ib network
- o2ib should be used
- bring down the o2ib network
- traffic should migrate to the tcp network.
- no traffic should be dropped.
- Configure an MR system
- bring down an NI
- confirm that the show info shows the NI as down
- TODO: how do we test device failure?
- Configure an MR system
- Configure peers via DLC
- Run traffic
- Delete one of the peer_nis we're sending to via DLC
- Traffic going over that peer_ni should continue but no more traffic should use that NI
- Configure an MR system
- Configure peers via DLC
- Run traffic
- Delete one of the peer_nis we're sending to via DLC
- Bring that peer_ni back
- Note traffic stops and starts on that peer with no traffic loss
- Repeat the deletion and reconfiguration of the peer_ni
- Configure an MR system
- Configure peers via DLC
- Run traffic
- Delete the entire peer
- The peer should be recreated on the next message, but it won't be MR capable.
Dynamic NID Discovery
...
| ||||||
snd-050 | snd-060, snd-075 | UT-0275 |
| When a peer NID is removed and then re-added, or when a new peer NID is added, its peer_ni->lpni_seq number starts at 0. If credits are equal, the newly added peer NI will always be picked until its lpni_seq number catches up with the sequence numbers of the other peer NIs. See code below.
This behavior will manifest itself in low-bandwidth environments. In high-bandwidth environments it is likely that the credits seen by the selection algorithm will differ, and the peer NI will be picked according to credits. Another scenario to consider is when the lpni_seq number wraps. In a low-bandwidth environment this could cause the peer NI whose number wrapped to be picked until it catches up with the other peer NIs' sequence numbers. This in itself might not be significant, but it does raise the question of the benefit of having a sequence number at all: does it give much of a functional advantage, or are the credits criteria enough? The same issue is present with local NI sequence numbers. | ||
snd-055 | snd-060, snd-075 | UT-0280 |
| |||
snd-055 | snd-060, snd-075, snd-085 | UT-0285 |
| |||
snd-055 | snd-060, snd-075, snd-085 | UT-0290 |
| |||
snd-055 | snd-060, snd-075 | UT-0295 |
| |||
snd-080 | UT-0300 |
| ||||
snd-080 | UT-0305 |
| ||||
UT-0310 |
| |||||
UT-0315 |
| |||||
UT-0320 |
|
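The sequence-number behavior noted for UT-0275 can be illustrated with a toy model. The sketch below is a simplified assumption of the selection rule as the note describes it (prefer more credits, break ties on the lower sequence number, increment the sequence number of the NI picked). It is not the LNet selection code, and `PeerNI` and `select_peer_ni` are hypothetical names. Credits are held constant to mimic a low-bandwidth run where credits are returned before the next send.

```python
from dataclasses import dataclass


@dataclass
class PeerNI:
    nid: str
    credits: int
    seq: int = 0  # a newly added peer NI starts at 0, as in the note


def select_peer_ni(peer_nis):
    """Pick the NI with the most credits; on a tie, the lowest seq wins."""
    best = max(peer_nis, key=lambda ni: (ni.credits, -ni.seq))
    best.seq += 1
    return best


def picks(peer_nis, count):
    """Return the NIDs chosen over `count` consecutive sends."""
    return [select_peer_ni(peer_nis).nid for _ in range(count)]


# Two established NIs with equal credits, plus one freshly added NI.
established = [PeerNI("nid-a", credits=8, seq=100),
               PeerNI("nid-b", credits=8, seq=100)]
fresh = PeerNI("nid-c", credits=8)
history = picks(established + [fresh], 100)
```

In this model `history` contains only the fresh NI until its sequence number reaches that of the others, after which the picks rotate across all three. A test for UT-0275 can look for the same pattern in the per-peer-NI send counters.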
Dynamic NID Discovery
The unit tests for peer NID discovery depend on lnetctl ping not triggering discovery. To force discovery, use lnetctl discover. Note that some of the tests require the DLC configuration to include non-existent peer NIDs. These NIDs are marked with a *.
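For scripted runs of these tests, the two commands named above can be wrapped in a small helper. This is a minimal sketch that assumes lnetctl is on the PATH and that both subcommands take the peer NID as their only argument. The function names are hypothetical, and the output is returned unparsed because its exact format is not specified here.

```python
import subprocess


def lnetctl_argv(action, nid):
    """Build the argument vector for 'lnetctl ping' or 'lnetctl discover'."""
    if action not in ("ping", "discover"):
        raise ValueError(f"unsupported action: {action}")
    return ["lnetctl", action, nid]


def run_lnetctl(action, nid, timeout=30):
    """Run the command and return (exit code, stdout, stderr)."""
    proc = subprocess.run(lnetctl_argv(action, nid),
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


# Typical sequence in a discovery test:
#   run_lnetctl("ping", nid)      # should not change the peer's NID list
#   run_lnetctl("discover", nid)  # forces discovery of the peer
```

Keeping the argument construction separate from execution lets the command lines be checked on a machine without LNet loaded.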
Tests with discovery enabled.
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-CFG-0001 |
| |
UT-DD-CFG-0002 |
| ||
UT-DD-CFG-0003 |
| ||
UT-DD-CFG-0004 |
| ||
UT-DD-CFG-0005 |
| ||
UT-DD-CFG-0006 |
| ||
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0001 | Basic functionality 1-1: discovery of an MR peer via its primary.
| |
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0002 | Basic functionality 1-2: discovery of an MR peer via a secondary.
| |
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0003 | Basic functionality 1-3: discovery of an MR peer via a tertiary.
| |
dyn-020 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040, dyn-055 | UT-DD-EN-0004 | Basic functionality 1-4: implicit discovery of an MR peer
|
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0005 | Basic functionality 1-5: discovery of an MR peer with > 16 interfaces. (This test exercises the code path that resizes the push buffers.)
| |
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0006 | Compatibility 2-1: discovery of a non-MR peer via its primary.
| |
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0007 | Compatibility 2-2: discovery of a non-MR peer via a secondary.
| |
dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0008 | Compatibility 2-3: discovery of a non-MR peer via a tertiary.
| |
dyn-020 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040, dyn-055 | UT-DD-EN-0009 | Compatibility 2-4: implicit discovery of an MR peer
|
dyn-060 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0010 | Interaction with DLC 3-1: DLC overrides Discovery of MR peer
|
dyn-060 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0011 | Interaction with DLC 3-2: DLC overrides Discovery of non-MR peer
|
dyn-060 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040, dyn-065 | UT-DD-EN-0012 | Interaction with DLC 3-3: DLC overrides Discovery of MR peer with primary conflict
|
dyn-060 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040, dyn-065 | UT-DD-EN-0013 | Interaction with DLC 3-4: DLC overrides Discovery of non-MR peer with primary conflict
|
dyn-060 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040 | UT-DD-EN-0014 | Interaction with DLC 3-5: "push MR bit" exception to DLC overrides Discovery
|
dyn-060 | dyn-005, dyn-015, dyn-025, dyn-030, dyn-035, dyn-040, dyn-065 | UT-DD-EN-0015 | Interaction with DLC 3-6: "push MR bit" exception to DLC overrides Discovery
|
Tests with discovery disabled. Note that disabling discovery does not fully disable it. The MR capable node will continue to process pushes, and if there is a problem with a push it will ping the originator to obtain the information.
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0001 | Discovery disabled 4-1: discovery of an MR peer via its primary
| |
dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0002 | Discovery disabled 4-2: discovery of an MR peer via a secondary
| |
dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0003 | Discovery disabled 4-3: discovery of an MR peer via a tertiary.
| |
dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0004 | Discovery disabled 4-4: implicit discovery of an MR peer
| |
dyn-020 | dyn-005, dyn-025, dyn-030, dyn-055 | UT-DD-DIS-0005 | Discovery disabled 4-5: implicit discovery of an MR peer. (This test shows that if discovery is enabled on either node or peer, it happens on both.)
|
dyn-020 | dyn-005, dyn-025, dyn-030, dyn-055 | UT-DD-DIS-0006 | Discovery disabled 4-6: implicit discovery of an MR peer, > 16 interfaces. (This test shows that if discovery is enabled on either node or peer, it happens on both, including retries required because buffers need to be extended.)
|
dyn-060 | dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0007 | Disabled with DLC 5-1: DLC overrides Discovery of MR peer
|
dyn-060 | dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0008 | Disabled with DLC 5-2: DLC overrides Discovery of non-MR peer
|
dyn-060 | dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0009 | Disabled with DLC 5-3: DLC overrides Discovery of MR peer with primary conflict
|
dyn-060 | dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0010 | Disabled with DLC 5-4: DLC overrides Discovery of non-MR peer with primary conflict
|
dyn-060 | dyn-005, dyn-025, dyn-030 | UT-DD-DIS-0011 | Disabled with DLC 5-5: "push MR bit" exception to DLC overrides Discovery
|
dyn-060 | dyn-005, dyn-025, dyn-030, dyn-065 | UT-DD-DIS-0012 | Disabled with DLC 5-6: "push MR bit" exception to DLC overrides Discovery
|
Debugging Requirements
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
dbg-005 | dbg-010, dbg-015, dbg-020, dbg-025, dbg-030, dbg-035, dbg-080 | UT-0325 |
|
dbg-040 | dbg-080, dbg-095 | UT-0330 |
|
dbg-040 | dbg-080 | UT-0335 |
|
dbg-045 | dbg-080 | UT-0340 |
|
dbg-050 | dbg-080, dbg-100 | UT-0345 |
|
dbg-110 | UT-0350 |
| |
dbg-115 | UT-0355 |
| |
dbg-120 | UT-0360 |
|
- dump per NI statistics
  - transmitted
  - received
  - dropped
  - timeouts
  - state
- configure multiple NIs
- run traffic
- dump stats on all NIs
- configure multiple NIs
- run traffic
- dump stats on all NIs
- Filter on specific NID
- dump LNet level statistics
- configure multiple peers
- start traffic
- dump per peer statistics
- configure multiple NIs
- toggle their state from ACTIVE to DOWN
- confirm that state change is being printed to console.
- configure an MR system
- start traffic
- bring down an NI
- confirm that messages indicating that another NI/peer is being used are printed.
- Configure an MR system
- run traffic
- stop traffic
- dump NI statistics
- dump peer statistics
- dump LNet level statistics
- zero out stats
- dump all statistics above to confirm they've been zeroed out.
Network interface Health
Backwards Compatibility Requirements
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
bck-025 | UT-0365 |
|
Performance Requirements
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
UT-0370 | Testing reconnects. In large clusters it is possible that servers might need to handle a burst of client connects. The performance of such scenarios needs to be quantified. | ||
Misc Error Scenarios
Primary Requirement ID | Secondary Requirement ID | Unit Test ID | Unit Test Description |
---|---|---|---|
| |||
| |||
| |||
| |||
| |||
| |||
|