This page tracks issues that we run into while testing the Multi-Rail feature.

Problem List

  • With 17 interfaces configured, trying to discover on the 17th interface returns the error "no route to host". This is an intermittent issue; attempting to reproduce.
  • With 17 interfaces discovered, "show peer" hangs.
    • It looks like it is hanging in the ioctl call.
  • The "lnetctl discover" command hangs with discovery off. This happened once, so it is an intermittent issue. Will try to reproduce.
  • "lnetctl discover" discovers the peer even with discovery off.
  • Doug: I configured a Parallels VM with 16 interfaces (it won't let me add a 17th, as 16 is the limit). When I run "lctl network configure" with no YAML or module parameters, I get this error from ksocklnd: "Mar 7 14:01:16 centos-7 kernel: LNet: 5111:0:(socklnd.c:2652:ksocknal_enumerate_interfaces()) Ignoring interface virbr0 (too many interfaces)".
  • Doug: I configured a node with three interfaces on the same network using module parameters: 

    options lnet networks=tcp0(eth0,eth1,eth2)


    On "lctl network configure", they are all properly configured according to the logs: 

    Mar  7 14:04:46 centos-7 kernel: LNet: Added LNI 10.211.55.58@tcp [8/256/0/180]
    Mar  7 14:04:46 centos-7 kernel: LNet: Added LNI 10.211.55.60@tcp [8/256/0/180]
    Mar  7 14:04:46 centos-7 kernel: LNet: Added LNI 10.211.55.61@tcp [8/256/0/180]
    

    However, when I do an "lnetctl net show", I get this: 

    net:
        - net type: lo
          local NI(s):
            - nid: 0@lo
              status: up
        - net type: tcp
          local NI(s):
            - nid: 10.211.55.58@tcp
              status: up
              interfaces:
                  0: eth0
            - nid: 10.211.55.60@tcp
              status: up
              interfaces:
                  0: eth1
            - nid: 10.211.55.61@tcp
              status: up
              interfaces:
                  0: eth2
    

    All interfaces are treated as being on different NIs.  When I "lctl ping" from another node, it sees the 3 interfaces as being on the same lnet: 

    [root@centos-7 ~]# lctl ping 10.211.55.58@tcp
    12345-0@lo
    12345-10.211.55.58@tcp
    12345-10.211.55.60@tcp
    12345-10.211.55.61@tcp
    

    And, when I trigger discover on that other node, it sees these interfaces as being on the same lnet: 

    [root@centos-7 ~]# lnetctl discover --nid 10.211.55.58@tcp
    discover:
        - primary nid: 10.211.55.58@tcp
          Multi-Rail: True
          peer ni:
            - nid: 10.211.55.58@tcp
            - nid: 10.211.55.60@tcp
            - nid: 10.211.55.61@tcp
    

    That is not what I am expecting given the syntax of the module parameters or the logs.  Is this a bug?
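
To make the comparison above easier to repeat, here is a minimal sketch that groups the NIDs in pasted "lctl ping" output by network. It is not part of Lustre: the helper name group_nids_by_net is hypothetical, and it assumes each reply line has the form "<pid>-<address>@<net>", as in the output shown above. Any other line, such as the shell prompt, is skipped.

```python
import re
from collections import defaultdict

# Assumed reply-line format, taken from the "lctl ping" output above:
#   <pid>-<address>@<net>   e.g. 12345-10.211.55.58@tcp
NID_LINE = re.compile(r"^(\d+)-([^@\s]+)@(\w+)$")


def group_nids_by_net(ping_output):
    """Return {net: [address, ...]} for every NID line in the pasted output."""
    nets = defaultdict(list)
    for raw in ping_output.splitlines():
        match = NID_LINE.match(raw.strip())
        if match is None:
            continue  # shell prompt, blank line, or anything unexpected
        _pid, address, net = match.groups()
        nets[net].append(address)
    return dict(nets)


# Sample copied from the "lctl ping" output above.
SAMPLE = """\
[root@centos-7 ~]# lctl ping 10.211.55.58@tcp
12345-0@lo
12345-10.211.55.58@tcp
12345-10.211.55.60@tcp
12345-10.211.55.61@tcp
"""

result = group_nids_by_net(SAMPLE)
for net, addresses in result.items():
    print(net, addresses)
```

For the sample, all three addresses end up under the single key "tcp", which matches what the remote node reports: three NIDs on one network.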

