...
Selection Algorithm Scenarios
Test # | Tag | Procedure | Script | Result |
---|
1 | SRC_SPEC_LOCAL_MR_DST | - MR Node
- MR Peer
- Send a ping
- REPLY for PING should always come on the same interface that PING was sent on.
- Check the TRACE in the logs to verify
- Repeat the test. A different local NI should be used for each new PING.
|
| pass |
2 | SRC_SPEC_LOCAL_MR_DST | - MR Node
- MR Peer
- Initiate discovery
- Node → PING → Peer
- Node ← PUSH ← Peer
- Node should respond with an ACK to the same interface as the one it received the PUSH on
- Check the TRACE in the logs to verify
- Repeat the test
- Peer's local_ni when sending the PUSH should be different.
|
| pass |
3 | SRC_SPEC_ROUTER_MR_DST | - MR Node
- NMR Router
- MR Peer
- Send a ping
- REPLY for PING should always come on the same interface that PING was sent on.
- Check the TRACE in the logs to verify
- Router should be used
- Repeat the test. A different local NI should be used for each new PING.
|
| pass |
4 | SRC_SPEC_ROUTER_MR_DST | |
| pass |
5 | SRC_SPEC_ROUTER_MR_DST | - MR Node
- MR Router
- MR Peer
- Send a ping
- REPLY for PING should always come on the same interface that PING was sent on.
- Check the TRACE in the logs to verify
- Repeat sending
- Router interfaces should be used in round robin, while the peer destination should remain constant.
- Repeat the test. A different local NI should be used for each new PING.
|
| pass |
6 | SRC_SPEC_ROUTER_MR_DST | |
| pass |
7 | SRC_SPEC_LOCAL_NMR_DST | - Same as 1 and 2
- Except that repeating the test will not result in a different local_ni being used.
|
| pass |
8 | SRC_SPEC_ROUTER_NMR_DST | - Same as 3 - 6
- Except that repeating the test will not result in different local NIs being used.
|
| pass |
9 | SRC_ANY_LOCAL_MR_DST | - MR Node
- MR Peer
- Send multiple PINGs
- PING REPLYs should come on the same interface
- Every PING will select new local/remote NIs
|
| pass |
10 | SRC_ANY_ROUTER_MR_DST | - MR Node
- NMR Router
- MR Peer
- Send Multiple PINGs
- Node will cycle over local_NIs
- Node will use the same destination NID as final destination
- Node will use the NMR Router
|
| pass |
11 | SRC_ANY_ROUTER_MR_DST | - MR Node
- MR Router
- MR Peer
- Send Multiple PINGs
- Node will cycle over local_NIs
- Node will use the same destination NID as final destination
- Node will use the different interfaces of the MR Router
- MR Router will cycle over the interfaces of the Final destination.
|
| pass |
12 | SRC_ANY_LOCAL_NMR_DST | - MR Node
- NMR Peer
- Send multiple PINGs
- Node will use same source/dst NID for all PINGs
|
| pass |
13 | SRC_ANY_ROUTER_NMR_DST | - MR Node
- NMR Router
- NMR Peer
- Send multiple PINGs
- Node will use the same source/dst NIDs for all PINGs
- Node will use the router interface
|
| pass |
14 | SRC_ANY_ROUTER_NMR_DST | - MR Node
- MR Router
- NMR Peer
- Send multiple PINGs
- Node will use the same source/dst NIDs for all PINGs
- Node will cycle through the Router's interfaces
|
| pass |
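The scenarios above reduce to two selection rules: with an MR peer the node round-robins over its local NIs (and an MR router does the same toward the final destination), while with an NMR peer the same source NID must be used for every message. A minimal sketch of that behavior, with illustrative names (`Node`, `select_local_ni` are assumptions, not the LNet source):

```python
# Illustrative model of the NI selection behavior exercised above.
# Not the real LNet implementation; names and structure are assumptions.

class Node:
    def __init__(self, local_nis, peer_is_mr):
        self.local_nis = local_nis      # e.g. ["10.9.10.3@tcp", "10.9.10.3@tcp1"]
        self.peer_is_mr = peer_is_mr    # does the peer advertise Multi-Rail?
        self._next = 0

    def select_local_ni(self):
        if not self.peer_is_mr:
            # NMR destination: the same local NI must be used for every
            # message, so repeating the test never changes the local_ni.
            return self.local_nis[0]
        # MR destination: round-robin across local NIs, so each new
        # PING goes out on a different interface.
        ni = self.local_nis[self._next % len(self.local_nis)]
        self._next += 1
        return ni

mr = Node(["ni0", "ni1"], peer_is_mr=True)
print([mr.select_local_ni() for _ in range(4)])   # ['ni0', 'ni1', 'ni0', 'ni1']

nmr = Node(["ni0", "ni1"], peer_is_mr=False)
print([nmr.select_local_ni() for _ in range(3)])  # ['ni0', 'ni0', 'ni0']
```

Verifying the TRACE lines in the logs, as the procedures above require, is what confirms the real code follows this cycling pattern.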
Error Scenarios
Synchronous Errors
Test # | Tag | Procedure | Script | Result |
---|
1 | Immediate Failure | - Send a PING
- Simulate an immediate LND failure (e.g. -ENOMEM)
- Message should not be resent
| lnetctl discover <nid> lctl net_drop_add with "-e local_error" lnetctl discover <nid> | pass |
Asynchronous Errors
Test # | Tag | Procedure | Script | Result |
---|
1 | LNET_MSG_STATUS_LOCAL_INTERRUPT LNET_MSG_STATUS_LOCAL_DROPPED LNET_MSG_STATUS_LOCAL_ABORTED LNET_MSG_STATUS_LOCAL_NO_ROUTE LNET_MSG_STATUS_LOCAL_TIMEOUT | - MR Node with Multiple interfaces
- Send a PING
- Simulate an <error>
- PING msg should be queued on resend queue
- PING msg will be resent on a different interface
- Failed interfaces' health value will be decremented
- Failed interface will be placed on the recovery queue
|
| Example: lctl net_drop_add -s 10.9.10.3@tcp -d 10.9.10.4@tcp -m GET -i 20 -e local_dropped
Key messages in debug log:
(lib-msg.c:762:lnet_health_check()) 10.9.10.3@tcp->10.9.10.4@tcp:GET:LOCAL_DROPPED - queuing for resend
(lib-msg.c:508:lnet_handle_local_failure()) ni 10.9.10.3@tcp added to recovery queue. Health = 950
(lib-move.c:2928:lnet_recover_local_nis()) attempting to recover local ni: 10.9.10.3@tcp | pass |
2 | Sensitivity == 0 | - Same setup as 1
- NI is not placed on the recovery queue
|
| pass |
3 | Sensitivity > 0 | - Same setup as 1
- NI is placed on the recovery queue
- Monitor network activity as NI is pinged until health is back to maximum
|
| pass |
4 | Sensitivity > 0 Buggy interface | - Same setup as 1
- NI is placed on recovery queue
- NI is pinged every 1 second
- Simulate ping failure every other ping
- NI's health should be decremented on failure
- NI should remain on the recovery queue
|
|
|
5 | Retry count == 0 | - Same setup as 1
- Message will not be retried and the message will be finalized immediately
|
| pass |
6 | Retry count > 0 | - Same setup as 1
- Message will be transmitted for a maximum of retry count or until the message expires
| Key messages in debug log: (lib-move.c:1715:lnet_handle_send()) TRACE: 10.9.10.3@tcp(10.9.10.3@tcp:<?>) -> 10.9.10.4@tcp(10.9.10.4@tcp:10.9.10.4@tcp) : GET try# 0 (lib-move.c:1715:lnet_handle_send()) TRACE: 10.9.10.3@tcp(10.9.10.3@tcp:<?>) -> 10.9.10.4@tcp(10.9.10.4@tcp:10.9.10.4@tcp) : GET try# 1 | pass |
7 | REPLY timeout | - Same setup as 1
- Except Use LNet selftest
- Simulate a local timeout
- Re-transmit
- No REPLY received
- Message is finalized and TIMEOUT event is propagated.
|
| pass |
8 | ACK timeout | - Same setup as 7 except simulate ACK timeout
|
| pass |
9 | LNET_MSG_STATUS_LOCAL_ERROR | - Same setup as 1
- Message is finalized immediately (not resent)
- Local NI is placed on the recovery queue
- Same procedure to recover the local NI
|
| pass |
10 | LNET_MSG_STATUS_REMOTE_DROPPED | - Same setup as 1
- Message is queued for resend depending on retry_count
- peer_ni is placed on the recovery queue (not if sensitivity == 0)
- peer_ni is pinged every 1 second
|
| pass |
11 | LNET_MSG_STATUS_REMOTE_ERROR LNET_MSG_STATUS_REMOTE_TIMEOUT LNET_MSG_STATUS_NETWORK_TIMEOUT | - Same setup as 1
- Message is not resent
- peer_ni recovery happens as outlined in previous cases
|
| pass |
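The asynchronous cases above share one pattern: a failed send decrements the interface's health by the configured sensitivity, queues the message for resend while retries remain (except for the non-retriable statuses such as LOCAL_ERROR, REMOTE_ERROR, REMOTE_TIMEOUT and NETWORK_TIMEOUT), and places the NI on the recovery queue unless sensitivity is 0. A rough model of those rules, assuming illustrative names (the same pattern applies to peer NIs for the remote statuses):

```python
# Rough model of the async health rules exercised above.
# Constants and structure are illustrative assumptions, not LNet source.

RETRIABLE = {"LOCAL_INTERRUPT", "LOCAL_DROPPED", "LOCAL_ABORTED",
             "LOCAL_NO_ROUTE", "LOCAL_TIMEOUT", "REMOTE_DROPPED"}
MAX_HEALTH = 1000

class NI:
    def __init__(self, sensitivity):
        self.health = MAX_HEALTH
        self.sensitivity = sensitivity
        self.on_recovery_queue = False

def health_check(ni, msg_retries, retry_count, status):
    """Return the message's fate and update the failed NI."""
    ni.health = max(0, ni.health - ni.sensitivity)
    if ni.sensitivity > 0:
        # Unhealthy NIs are pinged every second until back to maximum.
        ni.on_recovery_queue = True
    if status in RETRIABLE and msg_retries < retry_count:
        return "queued for resend"
    # Non-retriable status or retries exhausted: finalize immediately.
    return "finalized"

ni = NI(sensitivity=50)
print(health_check(ni, 0, 3, "LOCAL_DROPPED"))  # queued for resend
print(ni.health, ni.on_recovery_queue)          # 950 True
print(health_check(ni, 0, 3, "LOCAL_ERROR"))    # finalized
```

With sensitivity 0 the health never drops and the NI is never queued for recovery, matching test 2; with retry_count 0 every failure finalizes immediately, matching test 5.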
Random Failures
Test # | Tag | Procedure | Script | Result |
---|
1 | self test | - MR Node
- NMR Peer
- Self-test
- Randomize local NI failure
- Randomize Remote NI failure
| ip link set eth1 down | pass |
2 | self test | - MR Node
- MR Peer
- Self-test
- Randomize local NI failure
- Randomize Remote NI failure
|
| pass |
3 | self test | - MR Node
- MR Router
- NMR Peer
- Self-test
- Randomize local NI failure
- Randomize Remote NI failure
|
|
|
4 | self test | - MR Node
- MR Router
- MR Peer
- Self-test
- Randomize local NI failure
- Randomize Remote NI failure
|
|
|
5 | self test | - MR Node
- NMR Router
- NMR Peer
- Self-test
- Randomize local NI failure
- Randomize Remote NI failure
|
|
|
6 | self test | - MR Node
- NMR Router
- MR Peer
- Self-test
- Randomize local NI failure
- Randomize Remote NI failure
|
|
|
MR Router Testing
Test # | Tag | Procedure | Script | Result |
---|
1 | Discovery triggered on route add | - Bring up Router A with two interfaces
- Bring up Peer A and add network on tcp0
- Add router to tcp1 on peerA
- Observe that a discovery occurs from peer A→ Router A
|
| pass |
2 | Discovery triggered on interval | - Bring up Router A with two interfaces
- Bring up Peer A and add network on tcp0
- Add router to tcp1 on peerA
- Observe that a discovery occurs from peer A→ Router A
- Keep the two nodes up for 4 minutes
- Every router_interval_timeout a discovery should occur from peerA→ RouterA
|
| pass |
3 | Router tcp1 down due to no traffic | - Bring up Router A with two interfaces
- Bring up Peer A and add network on tcp0
- Add router to tcp1 on peerA
- Observe that a discovery occurs from peer A→ Router A
- Keep the two nodes up for 4 minutes
- Every router_interval_timeout a discovery should occur from peerA→ RouterA
- Since there is no traffic on tcp1 RouterA tcp1 should be down
- verify via: lnetctl net show -v
|
| pass |
4 | Router tcp1 comes up when peerB is brought up | - Bring up Router A with two interfaces
- Bring up Peer A and add network on tcp0
- Add router to tcp1 on peerA
- Observe that a discovery occurs from peer A→ Router A
- Keep the two nodes up for 4 minutes
- Every router_interval_timeout a discovery should occur from peerA→ RouterA
- Since there is no traffic on tcp1 RouterA tcp1 should be down
- verify via: lnetctl net show -v
- Bring up Peer B and add network on tcp1
- Add router to tcp on peer B
- Observe that a discovery occurs from peerB → RouterA
- Observe that a RouterA tcp1 is now up
|
| pass |
5 | Add route without router there | - Bring up Peer A and add network on tcp0
- Add route to tcp1 on peerA
- Observe that a discovery occurs but gets no response since the router is not up
- lnetctl route show -v # shows that router is down
- lnetctl peer show -v # shows the peer is down
- Bring up Router A with two interfaces: tcp0, tcp1
- After router_interval_timeout a discovery should verify that router A is up
- lnetctl route show -v # shows that the route is down because routerA's tcp1 network is down
- lnetctl peer show -v # shows the peer is up
- Bring up PeerB and add network on tcp1
- lnetctl route show -v # shows that router is up
|
| pass |
6 | traffic should trigger an attempt at router discovery | - Bring up Peer A and add network on tcp0
- Add route to tcp1 on peerA
- Observe that a discovery occurs but gets no response since the router is not up
- lnetctl route show -v # shows that router is down
- lnetctl peer show -v # shows the router is down
- Bring up Router A with two interfaces: tcp0, tcp1
- Bring up PeerB and add network on tcp1
- Before the router_interval_timeout expires do a:
- lnetctl discover PeerB@tcp1
- This should trigger a discovery of router A
- lnetctl peer show -v # shows the peer is up and multi-rail
- lnetctl route show -v # shows the route up
|
| pass |
7 | Ping should not trigger discovery of router | - Bring up Peer A and add network on tcp0
- Add route to tcp1 on peerA
- Observe that a discovery occurs but gets no response since the router is not up
- lnetctl route show -v # shows that router is down
- lnetctl peer show -v # shows the router is down
- Bring up Router A with two interfaces: tcp0, tcp1
- Bring up PeerB and add network on tcp1
- Before the router_interval_timeout expires do a:
- lnetctl ping PeerB@tcp1
- This should NOT trigger a discovery of router A
- ping should fail
- lnetctl peer show -v # shows the peer is down
- lnetctl route show -v # shows the route down
|
| pass |
8 | Multi-interface router even traffic distribution | - Bring up Router A with 4 interfaces. 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic using selftest
- Observe that traffic is distributed on all router interfaces evenly
|
| pass |
9 | Multi-interface router with one bad interface | - Bring up Router A with 4 interfaces. 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic using selftest
- Observe that traffic is distributed on all router interfaces evenly
- Enable health (sensitivity, retries)
- Add a PUT drop rule on the router to drop traffic on one of the interfaces in tcp0
- Observe that traffic goes to the other interfaces. There shouldn't be any drop in traffic.
- As long as the interface has less than optimal health, it should never be used for routing.
|
| pass |
10 | Multi-interface router with a bad interface that recovers | - Bring up Router A with 4 interfaces. 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic using selftest
- Observe that traffic is distributed on all router interfaces evenly
- Enable health (sensitivity, retries)
- Add a PUT drop rule on the router to drop traffic on one of the interfaces in tcp0
- Observe that traffic goes to the other interfaces. There shouldn't be any drop in traffic.
- As long as the interface has less than optimal health, it should never be used for routing.
- Remove the PUT drop rule from the router
- Eventually that interface should be healthy again
- Traffic should resume using that interface
|
| pass. Note: In an idle system the bad peer interface will be pinged once every second, causing its sequence number to go up. So when it comes back online it will not be used until the sequence numbers equalize. This will also be the case if the system is busy, but the issue will be reversed. |
11 | Multi-Router/Multi-interface setup | - Bring up router A with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up router B with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic using selftest
- Observe that traffic is distributed evenly on the interfaces of router A and B
|
| pass |
12 | Multi-Router/Multi-interface setup with failed gateway | - Bring up router A with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up router B with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic
- Observe that traffic is distributed evenly on the interfaces of router A and B
- Shutdown router A
- Observe that traffic is diverted to Router B with no drop in traffic.
|
| pass |
13 | Multi-Router/Multi-interface setup with router recovery | - Bring up router A with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up router B with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic
- Observe that traffic is distributed evenly on the interfaces of router A and B
- Shutdown router A
- Observe that traffic is diverted to Router B with no failure.
- Startup Router A
- Observe that traffic starts going through Router A again. There should be no drop in traffic
|
| Pass. Problem found, possibly with discovery:
1. Bring up two routers with 4 interfaces each, 2 on each network
2. Bring down one of the routers
3. Bring it up again but with only 2 of its interfaces on 1 network
4. The client goes berserk and keeps trying to discover it, toggling between state 0x139 and 39
There were a couple of issues here:
- The sequence numbers were getting misaligned when the router was brought down. This caused discovery not to work correctly.
- We restricted the router peer-NIs from being deleted, but we need to differentiate between configuration changes and discovery changes. The former should not allow deleting peer_nis from routers unless the route is removed first; the latter should allow peer updates because the peer itself is giving us new information. |
14 | router sensitivity < 100 | - Bring up router A with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up router B with 4 interfaces 2 on tcp0 and 2 on tcp1
- Bring up Peer A with interface on tcp0
- Bring up Peer B with interface on tcp1
- Run traffic
- Modify the router_sensitivity to 50%
- Add a drop rule on router A that impacts all of its interfaces
- Observe that traffic doesn't completely stop going to Router A until its health drops to 50% of the optimal value.
|
|
|
15 | Extra Health Testing | - Run through the health test cases above while there exists a multi-rail router.
|
|
|
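Tests 9 and 14 hinge on the router health threshold: a gateway interface keeps being used for routing only while its health is at or above router_sensitivity percent of the optimal value (at the default of 100%, any health drop disqualifies it). A sketch of that check, with assumed names and constants:

```python
# Sketch of the router-sensitivity eligibility check from tests 9 and 14.
# MAX_HEALTH and the function name are illustrative assumptions,
# not the LNet source.

MAX_HEALTH = 1000

def gateway_usable(health, router_sensitivity_pct):
    # Usable while health >= sensitivity% of the best possible health.
    return health >= MAX_HEALTH * router_sensitivity_pct / 100

# Default 100%: any health drop removes the interface from routing
# (test 9: "less than optimal health ... never used for routing").
print(gateway_usable(950, 100))  # False
# At 50% (test 14) it keeps routing until health falls below 500.
print(gateway_usable(950, 50))   # True
print(gateway_usable(400, 50))   # False
```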
User Interface
Test # | Tag | Procedure | Script | Result |
---|
1 | lnet_transaction_timeout | - Set lnet_transaction_timeout to a value < retry_count via lnetctl and YAML
- This should lead to a failure to set
- Set lnet_transaction_timeout to a value > retry_count via lnetctl and YAML
- lnet_lnd_timeout value should == lnet_transaction_timeout / retry_count
- Show value via "lnetctl global show"
| lnetctl set transaction_timeout <value> | pass |
2 | lnet_retry_count | - Set the lnet_retry_count to a value > lnet_transaction_timeout via lnetctl and YAML
- This should lead to a failure to set
- Set the lnet_retry_count to a value < lnet_transaction_timeout via lnetctl and YAML
- lnet_lnd_timeout value should == lnet_transaction_timeout / retry_count
- Show value via "lnetctl global show"
| lnetctl set retry_count <value> | pass |
3 | lnet_health_sensitivity | - Set the lnet_health_sensitivity from lnetctl and from YAML
- Show value via "lnetctl global show"
| lnetctl set health_sensitivity <value> | LU-11530 |
4 | NI statistics | - verify LNet health statistics
|
| pass |
5 | Peer NI statistics | - verify LNet health statistics for peer NIs
|
| pass |
6 | NI Health value | - verify setting the local NI health value
- lnetctl net set --nid <nid> --health <value>
- Redo from YAML
|
| LU-11529 |
7 | Peer NI Health value | - verify setting the peer NI health value
- lnetctl peer set --nid <nid> --health <value>
- Redo from YAML
|
| LU-11529 |
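Tests 1 and 2 encode an invariant among the tunables: retry_count may not exceed lnet_transaction_timeout, and lnet_lnd_timeout is derived as lnet_transaction_timeout / retry_count. A sketch of that validation (illustrative only; lnetctl and the kernel perform the real checks):

```python
# Sketch of the tunable relationship checked in UI tests 1 and 2.
# Function name is an illustrative assumption, not the lnetctl source.

def set_timeouts(transaction_timeout, retry_count):
    if retry_count > transaction_timeout:
        # Matches the expected failure to set in tests 1 and 2.
        raise ValueError("retry_count may not exceed lnet_transaction_timeout")
    # Each (re)transmission gets an equal share of the transaction timeout.
    lnd_timeout = transaction_timeout // max(retry_count, 1)
    return lnd_timeout

print(set_timeouts(50, 5))   # 10
try:
    set_timeouts(5, 50)      # rejected, as required by tests 1 and 2
except ValueError as e:
    print("rejected:", e)
```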
Testing Tools
The drop policy has been modified to drop outgoing messages with specific errors. This can be done via the following commands. Unfortunately, for details on these commands you'll need to look at the code. A combination of these commands on the different nodes should cover approximately 75% of the health code paths.
...