2.12 Discovery of Non-MR Peer May Yield Unreachable NID
If a non-MR peer (2.10.8) is discovered by a 2.12 MR peer, the following problem may occur: if the non-MR peer has LNets that are not defined on the MR peer, a NID on one of those undefined LNets may be listed as the peer's primary NID. This later causes communication problems when mounting.
Status
Open, reproducible in 2.12. Not an issue on master and 2.13.
Steps To Reproduce
Configuration
Prepare two nodes, one running a 2.10.8 build and the other a 2.12.4 build. Configure LNet similar to the following:
PeerA (2.10.8 non-MR):

    #> lnetctl net show
    net:
        - net type: lo
          local NI(s):
            - nid: 0@lo
              status: up
        - net type: o2ib
          local NI(s):
            - nid: 192.168.1.123@o2ib
              status: up
              interfaces:
                  0: ib0
        - net type: o2ib4
          local NI(s):
            - nid: 192.168.1.123@o2ib4
              status: up
              interfaces:
                  0: ib0

PeerB (2.12.4 MR):

    #> lnetctl net show
    net:
        - net type: lo
          local NI(s):
            - nid: 0@lo
              status: up
        - net type: o2ib4
          local NI(s):
            - nid: 192.168.1.105@o2ib4
              status: up
              interfaces:
                  0: ib0
Procedure
Run discovery of PeerA from PeerB and check the results:
Problem Behaviour:

    #> lnetctl discover 192.168.1.123@o2ib4
        - primary nid: 192.168.1.123@o2ib
          Multi-Rail: False
          peer ni:
            - nid: 192.168.1.123@o2ib4
            - nid: 192.168.1.123@o2ib

        - primary nid: 192.168.1.123@o2ib
          Multi-Rail: False
          peer ni:
            - nid: 192.168.1.123@o2ib4
              state: NA
            - nid: 192.168.1.123@o2ib

Expected Behaviour:

    #> lnetctl discover 192.168.1.123@o2ib4
        - primary nid: 192.168.1.123@o2ib4
          Multi-Rail: False
          peer ni:
            - nid: 192.168.1.123@o2ib4

        - primary nid: 192.168.1.123@o2ib4
          Multi-Rail: False
          peer ni:
            - nid: 192.168.1.123@o2ib4
Note that in the "problem" scenario, PeerA's primary NID is on the o2ib net, which is not defined on PeerB and is therefore not accessible from it.
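The check described in the note can be sketched in a few lines of Python. This is only an illustrative sketch, not part of any Lustre tooling: the function names are hypothetical, the sample strings are abridged from the outputs in this section, and the script assumes that a primary NID is reachable only if its LNet is defined locally (that is, no LNet routes are configured). It extracts each `primary nid` from saved `lnetctl discover` output and flags those whose net does not appear in the local `lnetctl net show` output.

```python
import re
from typing import List, Set


def nid_net(nid: str) -> str:
    """Return the LNet part of a NID, e.g. 'o2ib4' for '192.168.1.123@o2ib4'."""
    return nid.rsplit("@", 1)[1]


def local_nets(net_show_output: str) -> Set[str]:
    """Collect the 'net type' values from saved 'lnetctl net show' output."""
    return set(re.findall(r"net type:\s*(\S+)", net_show_output))


def primary_nids(discover_output: str) -> List[str]:
    """Collect the 'primary nid' values from saved 'lnetctl discover' output."""
    return re.findall(r"primary nid:\s*(\S+)", discover_output)


def unreachable_primaries(discover_output: str, net_show_output: str) -> List[str]:
    """List primary NIDs whose LNet is not defined locally.

    Assumption: without LNet routing, a NID on an undefined net is unreachable.
    """
    nets = local_nets(net_show_output)
    return [nid for nid in primary_nids(discover_output) if nid_net(nid) not in nets]


# Abridged sample text taken from this section (PeerB's nets, PeerA as discovered).
PEERB_NET_SHOW = """
- net type: lo
- net type: o2ib4
"""

PROBLEM_DISCOVER = """
- primary nid: 192.168.1.123@o2ib
  peer ni:
    - nid: 192.168.1.123@o2ib4
    - nid: 192.168.1.123@o2ib
"""

EXPECTED_DISCOVER = """
- primary nid: 192.168.1.123@o2ib4
  peer ni:
    - nid: 192.168.1.123@o2ib4
"""

if __name__ == "__main__":
    print("problem: ", unreachable_primaries(PROBLEM_DISCOVER, PEERB_NET_SHOW))
    print("expected:", unreachable_primaries(EXPECTED_DISCOVER, PEERB_NET_SHOW))
```

With the "problem" output the sketch reports `192.168.1.123@o2ib`; with the "expected" output the list is empty. In practice the two input strings would come from command output captured on PeerB.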
Resolution
A port of LU-11641 (Whamcloud Community Jira) resolves the issue.
References
DDN-1228, LU-13548 (Whamcloud Community Jira)