2.12 Discovery of Non-MR Peer May Yield Unreachable NID

If a non-MR peer (2.10.8) is discovered by a 2.12 MR peer, the following problem may occur: if the non-MR peer has LNets that are not defined on the MR peer, a NID on one of those undefined LNets may be listed as the primary NID. This later causes communication problems when mounting.
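The condition described above amounts to a set difference between the LNets of the two peers. The following is a minimal Python sketch of that check, not LNet code; the helper names `nid_net` and `undefined_nets` are hypothetical, and the NIDs are the ones used in the reproduction steps on this page.

```python
def nid_net(nid):
    """Return the LNet part of a NID string such as '192.168.1.123@o2ib4'."""
    addr, sep, net = nid.rpartition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {nid!r}")
    return net


def undefined_nets(peer_nids, local_nids):
    """Return the peer's LNets that have no local NI on this node."""
    local = {nid_net(n) for n in local_nids}
    return {nid_net(n) for n in peer_nids} - local


# NIDs from the reproduction setup (variable names are illustrative only).
peer_a = ["192.168.1.123@o2ib", "192.168.1.123@o2ib4"]  # 2.10.8 non-MR
peer_b = ["192.168.1.105@o2ib4"]                        # 2.12.4 MR

print(undefined_nets(peer_a, peer_b))  # {'o2ib'}
```

A non-empty result means the non-MR peer advertises at least one NID that the MR peer cannot reach, which is the precondition for the problem.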

Status

Open; reproducible in 2.12. Not an issue on master or 2.13.

Steps To Reproduce

Configuration

Prepare two nodes, one running a 2.10.8 build and the other a 2.12.4 build. Configure LNet similar to the following:

PeerA (2.10.8 non-MR)
Code Block
#> lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 192.168.1.123@o2ib
          status: up
          interfaces:
              0: ib0
    - net type: o2ib4
      local NI(s):
        - nid: 192.168.1.123@o2ib4
          status: up
          interfaces:
              0: ib0

PeerB (2.12.4 MR)
Code Block
#> lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib4
      local NI(s):
        - nid: 192.168.1.105@o2ib4
          status: up
          interfaces:
              0: ib0

Procedure

Run discovery of PeerA from PeerB and check the results:

Problem Behaviour
Code Block
#> lnetctl discover 192.168.1.123@o2ib4
discover:
    - primary nid: 192.168.1.123@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 192.168.1.123@o2ib4
        - nid: 192.168.1.123@o2ib

#> lnetctl peer show
peer:
    - primary nid: 192.168.1.123@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 192.168.1.123@o2ib4
          state: NA
        - nid: 192.168.1.123@o2ib
          state: NA

Expected Behaviour
Code Block
#> lnetctl discover 192.168.1.123@o2ib4
discover:
    - primary nid: 192.168.1.123@o2ib4
      Multi-Rail: False
      peer ni:
        - nid: 192.168.1.123@o2ib4

#> lnetctl peer show
peer:
    - primary nid: 192.168.1.123@o2ib4
      Multi-Rail: False
      peer ni:
        - nid: 192.168.1.123@o2ib4
          state: NA

Note that in the "problem" scenario, PeerA's primary NID is on the o2ib net, which is not defined on PeerB and is therefore not accessible from it.
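A check for this condition can be scripted against the peer table. The sketch below assumes the `lnetctl peer show` output has already been loaded into Python dictionaries with the same keys as in the output above (for example with a YAML parser); `affected_peers` is a hypothetical helper, not part of lnetctl, and it only detects the problem rather than reproducing the fix.

```python
def nid_net(nid):
    """Return the LNet part of a NID, e.g. 'o2ib4' for '192.168.1.123@o2ib4'."""
    return nid.rpartition("@")[2]


def affected_peers(peers, local_nets):
    """Yield (primary_nid, usable_nids) for each peer whose primary NID
    is on a net that has no local NI."""
    for peer in peers:
        primary = peer["primary nid"]
        if nid_net(primary) in local_nets:
            continue
        usable = [ni["nid"] for ni in peer.get("peer ni", [])
                  if nid_net(ni["nid"]) in local_nets]
        yield primary, usable


# Peer tables as seen on PeerB, transcribed from the two scenarios.
problem = [{
    "primary nid": "192.168.1.123@o2ib",
    "Multi-Rail": False,
    "peer ni": [
        {"nid": "192.168.1.123@o2ib4", "state": "NA"},
        {"nid": "192.168.1.123@o2ib", "state": "NA"},
    ],
}]
expected = [{
    "primary nid": "192.168.1.123@o2ib4",
    "Multi-Rail": False,
    "peer ni": [{"nid": "192.168.1.123@o2ib4", "state": "NA"}],
}]
local_nets = {"lo", "o2ib4"}  # nets configured on PeerB

print(list(affected_peers(problem, local_nets)))
# [('192.168.1.123@o2ib', ['192.168.1.123@o2ib4'])]
print(list(affected_peers(expected, local_nets)))
# []
```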

Resolution

A port of LU-11641 (Whamcloud Community Jira) resolves the issue.

References

DDN-1228, LU-13548 (Whamcloud Community Jira)