This page tracks issues that we run into while testing the Multi-Rail feature.
Problem List
- With 17 interfaces, trying to discover on the 17th interface returns the error "no route to host". This is an intermittent issue. Attempting to reproduce.
- With 17 interfaces discovered, "show peer" hangs.
  - It looks like it is hanging in the ioctl call.
- The "lnetctl discover" command hangs with discovery off. This happened once, so it is an intermittent issue. Will try to reproduce.
- "lnetctl discover" discovers the peer even with discovery off.
- Doug: I configured a Parallels VM with 16 interfaces (it won't let me do 17, as 16 is the limit). When I run "lctl network configure" with no YAML or module parameters, I get this error from ksocklnd: "Mar 7 14:01:16 centos-7 kernel: LNet: 5111:0:(socklnd.c:2652:ksocknal_enumerate_interfaces()) Ignoring interface virbr0 (too many interfaces)".
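Since several of the items above are intermittent hangs, it may help to run the commands under a timeout while trying to reproduce them. The following is only a minimal sketch: the helper name `run_with_timeout` and the 30-second default are assumptions, not part of any Lustre tooling. Note that if the command is blocked in an uninterruptible kernel call (as the ioctl hang above might be), killing it may not take effect and the helper itself could block.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd and classify the result as 'ok', 'failed', or 'hung'.

    Hypothetical helper for reproduction attempts; returns (status, output).
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child did not finish within timeout_s; subprocess.run
        # has already tried to kill it before raising.
        return "hung", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

For example, `run_with_timeout(["lnetctl", "net", "show"])` would return `("hung", "")` if the command did not complete in 30 seconds.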
Doug: I configured a node with three interfaces on the same network using module parameters:
options lnet networks=tcp0(eth0,eth1,eth2)
On "lctl network configure", they are all properly configured according to the logs:
    Mar 7 14:04:46 centos-7 kernel: LNet: Added LNI 10.211.55.58@tcp [8/256/0/180]
    Mar 7 14:04:46 centos-7 kernel: LNet: Added LNI 10.211.55.60@tcp [8/256/0/180]
    Mar 7 14:04:46 centos-7 kernel: LNet: Added LNI 10.211.55.61@tcp [8/256/0/180]
However, when I do an "lnetctl net show", I get this:
    net:
        - net type: lo
          local NI(s):
            - nid: 0@lo
              status: up
        - net type: tcp
          local NI(s):
            - nid: 10.211.55.58@tcp
              status: up
              interfaces:
                  0: eth0
            - nid: 10.211.55.60@tcp
              status: up
              interfaces:
                  0: eth1
            - nid: 10.211.55.61@tcp
              status: up
              interfaces:
                  0: eth2
All interfaces are treated as being on different NIs. When I "lctl ping" from another node, it sees the 3 interfaces as being on the same lnet:
    [root@centos-7 ~]# lctl ping 10.211.55.58@tcp
    12345-0@lo
    12345-10.211.55.58@tcp
    12345-10.211.55.60@tcp
    12345-10.211.55.61@tcp
And, when I trigger discover on that other node, it sees these interfaces as being on the same lnet:
    [root@centos-7 ~]# lnetctl discover --nid 10.211.55.58@tcp
    discover:
        - primary nid: 10.211.55.58@tcp
          Multi-Rail: True
          peer ni:
            - nid: 10.211.55.58@tcp
            - nid: 10.211.55.60@tcp
            - nid: 10.211.55.61@tcp
That is not what I am expecting given the syntax of the module parameters or the logs. Is this a bug?
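To compare what the different commands report, the NIDs can be grouped by the network part after the "@". This is a rough sketch only: `group_nids_by_net` is a hypothetical helper, and it assumes each NID has the form `address@net`, optionally preceded by the `12345-` prefix that "lctl ping" prints.

```python
from collections import defaultdict


def group_nids_by_net(nids):
    """Map each LNet network name (e.g. 'tcp') to the addresses seen on it."""
    nets = defaultdict(list)
    for nid in nids:
        # Drop a leading "12345-" style prefix if present (assumed format).
        nid = nid.split("-", 1)[-1]
        # Split "address@net" on the last "@".
        addr, _, net = nid.rpartition("@")
        nets[net].append(addr)
    return dict(nets)


ping_output = [
    "12345-0@lo",
    "12345-10.211.55.58@tcp",
    "12345-10.211.55.60@tcp",
    "12345-10.211.55.61@tcp",
]
print(group_nids_by_net(ping_output))
```

Applied to the "lctl ping" output above, all three addresses end up under the single key `tcp`, which is consistent with the remote node seeing them on the same LNet network.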