Customers sometimes report that Lustre mounts fail after a reboot. In many cases the cause turns out to be slow InfiniBand (IB) interface initialization: LNet fails to start properly because its assigned interfaces are still down.
This article (https://support.hpe.com/hpesc/public/docDisplay?docId=a00040272en_us&docLocale=en_US) provides background on at least one possible cause of the issue and recommends setting POST_START_DELAY=90 in /etc/openibd.conf as a workaround.
Another workaround is to use lnet.service with a start-up delay. The service looks like this:
/etc/systemd/system/multi-user.target.wants/lnet.service:

    [Service]
    Type=oneshot
    RemainAfterExit=true
    ExecStartPre=/usr/bin/sleep 90
    ExecStart=/sbin/modprobe lnet
    ExecStart=/usr/sbin/lnetctl lnet configure
    ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
    ExecStop=/usr/sbin/lustre_rmmod ptlrpc
    ExecStop=/usr/sbin/lnetctl lnet unconfigure
    ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs
Note the line "ExecStartPre=/usr/bin/sleep 90", which was added to give the network interfaces enough time to come up before LNet tries to use them at start-up. This can be improved by replacing the single long sleep with a loop that polls the interfaces at short intervals and gives up after a timeout.
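The polling idea could be sketched roughly as follows. This is only an illustration, not a tested production script: the function name wait_for_interfaces and the SYSFS_NET variable are hypothetical, and it assumes that an interface which is ready for LNet reports "up" in /sys/class/net/<interface>/operstate, which may not hold for every driver or fabric.

```shell
#!/bin/sh
# Hypothetical helper: wait until every named interface reports "up",
# or give up once the timeout has elapsed.

# Assumption: operstate under sysfs reflects interface readiness.
# SYSFS_NET is overridable so the function can be exercised without real hardware.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Usage: wait_for_interfaces TIMEOUT_SECONDS INTERVAL_SECONDS IFACE...
wait_for_interfaces() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while :; do
        pending=""
        for ifname in "$@"; do
            # A missing interface is treated the same as one that is down.
            state="$(cat "$SYSFS_NET/$ifname/operstate" 2>/dev/null)"
            [ "$state" = "up" ] || pending="$pending $ifname"
        done
        # All interfaces are up: succeed immediately.
        [ -z "$pending" ] && return 0
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for:$pending" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

A script that defines this function and ends with a call such as "wait_for_interfaces 90 2 ib0 ib1" could then be referenced from ExecStartPre in place of the fixed sleep. With those arguments the node waits at most about 90 seconds, but proceeds as soon as both interfaces are up.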
The workaround relies on "/etc/lnet.conf" containing a valid configuration for the node. The default lnet.conf typically appears to contain only the commented-out output of the "lnetctl export" command. Many users prefer a simplified lnet.conf that lists nothing but the interfaces to use for each LNet network, for example:
    net:
        - net type: o2ib1
          local NI(s):
            - interfaces:
                  0: ib0
        - net type: o2ib4
          local NI(s):
            - interfaces:
                  0: ib1
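With a simplified file like this, the interface names can be pulled out with a one-line filter, for example to decide which interfaces to wait for before starting LNet. The sketch below is a rough illustration only: list_lnet_interfaces is a hypothetical name, and it assumes every interface appears on its own line in the "N: name" form shown in the example. It is not a YAML parser and would need adjusting for other layouts.

```shell
#!/bin/sh
# Hypothetical helper: print the unique interface names found in a simplified
# lnet.conf, assuming each one is on its own line as "<index>: <name>".
# Usage: list_lnet_interfaces FILE
list_lnet_interfaces() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*:[[:space:]]*\([A-Za-z0-9_.-]\{1,\}\)[[:space:]]*$/\1/p' "$1" | sort -u
}
```

Commented-out lines, such as those in the default lnet.conf, do not match the pattern and are ignored.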