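By default, Linux lets any interface answer an ARP request for any IP address configured on the node (often called ARP flux). As a result, if a node has two network interfaces attached to the same network, the ARP reply for a given IP may carry the hardware address of a different interface from the one that owns that IP.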

This is a problem for o2iblnd, which resolves peer addresses over IPoIB. It can end up with the InfiniBand address of the wrong interface, and this causes connection problems.
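When diagnosing this, it helps to see which port actually answered the ARP request. An IPoIB hardware address is 20 bytes: 4 bytes of flags and queue pair number, then the 16-byte port GID. The last 8 bytes of that GID are the port GUID. Below is a minimal bash sketch that extracts the GUID so you can compare a peer's cached entry with the `link/infiniband` addresses shown for ib0 and ib1. The helper name `ipoib_guid` is hypothetical and not part of any existing tool.

```shell
# ipoib_guid ADDR
# Print the port GUID (last 8 bytes) of a 20-byte IPoIB hardware address.
# Hypothetical helper. Layout: 4 bytes flags/QPN + 16-byte port GID,
# whose low 8 bytes are the port GUID.
ipoib_guid() {
    local addr=$1
    # Refuse anything that is not 20 colon-separated bytes (e.g. an Ethernet MAC)
    if [ "$(printf '%s\n' "$addr" | awk -F: '{ print NF }')" -ne 20 ]; then
        echo "ipoib_guid: not a 20-byte IPoIB address: $addr" >&2
        return 1
    fi
    # Bytes 13..20 are the GUID
    printf '%s\n' "$addr" | cut -d: -f13-20
}

# Example, run on a peer node: which trevis-402 port answered for
# 192.168.2.2 (the ib1 address)?
#   lladdr=$(ip neigh show 192.168.2.2 | awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }')
#   ipoib_guid "$lladdr"
# ib1's GUID is 24:8a:07:03:00:93:9c:3c. Getting ib0's GUID
# (24:8a:07:03:00:93:9a:24) instead is the problem described above.
```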

To work around this, set the ARP-related sysctls and add per-interface routing tables and rules so that the Linux kernel replies with the correct hardware address.

I use trevis-402 as an example, but the same setup is needed on every other node with multiple interfaces of the same kind (MLX, OPA, ETH).
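To find nodes that have more than one interface of the same kind, here is a rough bash/awk sketch. It reads `ip -o link show` output and reports each link type that appears more than once. The helper name `dup_link_types` is hypothetical. It groups by link type only, so it does not tell MLX and OPA ports apart, and it does not check whether the interfaces are actually on the same network.

```shell
# dup_link_types
# Read `ip -o link show` output on stdin (one interface per line) and print
# "<link type> <count>" for every type seen more than once. Loopback is ignored.
dup_link_types() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^link\//) {
                t = substr($i, 6)          # "link/ether" -> "ether"
                if (t != "loopback") n[t]++
                break
            }
    }
    END { for (t in n) if (n[t] > 1) print t, n[t] }'
}

# Example:
#   ip -o link show | dup_link_types
```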

trevis-402 

Configuration

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1e:67:d3:f9:41 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1e:67:d3:f9:42 brd ff:ff:ff:ff:ff:ff
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP mode DEFAULT qlen 256
    link/infiniband 80:00:00:67:fe:80:00:00:00:00:00:00:24:8a:07:03:00:93:9a:24 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP mode DEFAULT qlen 256
    link/infiniband 80:00:00:67:fe:80:00:00:00:00:00:00:24:8a:07:03:00:93:9c:3c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

Setup

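# ARP settings for each IB interface: reply only for addresses configured on
# that interface (arp_ignore=1), use the best local address for the target as
# the source of ARP requests (arp_announce=2), keep arp_filter at its default (0),
# and disable reverse-path filtering (rp_filter=0)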
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.ib0.arp_ignore=1
sysctl -w net.ipv4.conf.ib0.arp_filter=0
sysctl -w net.ipv4.conf.ib0.arp_announce=2
sysctl -w net.ipv4.conf.ib0.rp_filter=0

sysctl -w net.ipv4.conf.ib1.arp_ignore=1
sysctl -w net.ipv4.conf.ib1.arp_filter=0
sysctl -w net.ipv4.conf.ib1.arp_announce=2
sysctl -w net.ipv4.conf.ib1.rp_filter=0

ip neigh flush dev ib0
ip neigh flush dev ib1
 
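# The named tables ib0 and ib1 must already be defined in /etc/iproute2/rt_tables,
# e.g. with the lines "200 ib0" and "201 ib1" (any unused table IDs will do)
ip route add 192.168.1.0/24 dev ib0 proto kernel scope link src 192.168.1.2 table ib0
ip route add 192.168.2.0/24 dev ib1 proto kernel scope link src 192.168.2.2 table ib1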
ip rule add from 192.168.1.2 table ib0
ip rule add from 192.168.2.2 table ib1
ip route flush cache
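On nodes with other interface names or addresses, the per-interface part of these steps can be generated. Below is a minimal bash sketch that prints the commands for one interface as a dry run, so you can review them before piping them to `sh`. The helper name `gen_arp_setup` is hypothetical. It also assumes a routing table with the same name as the interface already exists in /etc/iproute2/rt_tables.

```shell
# gen_arp_setup IFACE ADDR SUBNET
# Print (do not run) the ARP sysctl, neighbour flush, route and rule commands
# for one interface. Hypothetical helper; the table is assumed to be named
# after the interface in /etc/iproute2/rt_tables.
gen_arp_setup() {
    local iface=$1 addr=$2 subnet=$3
    echo "sysctl -w net.ipv4.conf.$iface.arp_ignore=1"
    echo "sysctl -w net.ipv4.conf.$iface.arp_filter=0"
    echo "sysctl -w net.ipv4.conf.$iface.arp_announce=2"
    echo "sysctl -w net.ipv4.conf.$iface.rp_filter=0"
    echo "ip neigh flush dev $iface"
    echo "ip route add $subnet dev $iface proto kernel scope link src $addr table $iface"
    echo "ip rule add from $addr table $iface"
}

# Example for trevis-402 (the all.rp_filter and route cache lines are not
# per-interface, so they are added separately):
#   {
#     echo "sysctl -w net.ipv4.conf.all.rp_filter=0"
#     gen_arp_setup ib0 192.168.1.2 192.168.1.0/24
#     gen_arp_setup ib1 192.168.2.2 192.168.2.0/24
#     echo "ip route flush cache"
#   } > setup.sh    # review setup.sh, then: sh setup.sh
```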

# Verify that the system ends up in the following state:
[root@trevis-402 ~]# ip route show table ib1
192.168.2.0/24 dev ib1  proto kernel  scope link  src 192.168.2.2

[root@trevis-402 ~]# ip route show table ib0
192.168.1.0/24 dev ib0  proto kernel  scope link  src 192.168.1.2
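These checks can also be scripted. Below is a minimal bash sketch with two hypothetical helpers, `check_arp_sysctls` and `route_table_ok`. The first reads the ARP sysctls from /proc/sys. The second inspects `ip route show table` output given on stdin. The optional ROOT argument exists only so the sysctl check can be pointed at a copy of the tree. To check the rules, look for `from 192.168.1.2 lookup ib0` and `from 192.168.2.2 lookup ib1` in `ip rule show` output.

```shell
# check_arp_sysctls IFACE [ROOT]
# Return 0 if the ARP sysctls from the Setup section are in effect for IFACE.
# ROOT defaults to /proc/sys. Hypothetical helper.
check_arp_sysctls() {
    local iface=$1 root=${2:-/proc/sys}
    local dir=$root/net/ipv4/conf/$iface
    local rc=0 pair key want got
    for pair in arp_ignore=1 arp_filter=0 arp_announce=2 rp_filter=0; do
        key=${pair%%=*}
        want=${pair#*=}
        got=$(cat "$dir/$key" 2>/dev/null) || got=missing
        if [ "$got" != "$want" ]; then
            echo "$iface: $key is $got, expected $want" >&2
            rc=1
        fi
    done
    # The kernel uses the larger of conf/all/rp_filter and the per-interface
    # value, so conf/all/rp_filter must be 0 as well.
    got=$(cat "$root/net/ipv4/conf/all/rp_filter" 2>/dev/null) || got=missing
    if [ "$got" != 0 ]; then
        echo "all: rp_filter is $got, expected 0" >&2
        rc=1
    fi
    return $rc
}

# route_table_ok SUBNET IFACE SRC
# Read `ip route show table <name>` output on stdin; succeed if it has a
# route for SUBNET on IFACE with source address SRC. Hypothetical helper.
route_table_ok() {
    awk -v s="$1" -v d="$2" -v a="$3" '
        $1 == s {
            dev_ok = src_ok = 0
            for (i = 2; i < NF; i++) {
                if ($i == "dev" && $(i + 1) == d) dev_ok = 1
                if ($i == "src" && $(i + 1) == a) src_ok = 1
            }
            if (dev_ok && src_ok) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example on trevis-402:
#   check_arp_sysctls ib0 && check_arp_sysctls ib1
#   ip route show table ib0 | route_table_ok 192.168.1.0/24 ib0 192.168.1.2
#   ip route show table ib1 | route_table_ok 192.168.2.0/24 ib1 192.168.2.2
```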
 
# Also flush the ARP cache on the other nodes (e.g. ip neigh flush dev <interface>), so that they do not keep stale entries with the wrong hardware address.
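Flushing the peers can be scripted too. Below is a minimal bash sketch that assumes root ssh access to each peer. The helper name `flush_peer_arp`, the `RUN` override and the host names in the example are hypothetical. The interface name passed in must be the peer's own IPoIB interface.

```shell
# flush_peer_arp IFACE HOST...
# Flush the neighbour (ARP) cache for IFACE on each HOST over ssh.
# Hypothetical helper; set RUN=echo to print the commands instead of running them.
flush_peer_arp() {
    local iface=$1 host
    shift
    for host in "$@"; do
        ${RUN:-ssh} "$host" ip neigh flush dev "$iface"
    done
}

# Example (placeholder host names):
#   flush_peer_arp ib0 peer-node-1 peer-node-2
```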