How do I build Lustre?
Refer to Walk-thru- Build Lustre MASTER on RHEL 7.3/CentOS 7.3 from Git
Also refer to Building Lustre/LNet Centos/RHEL 7.x for some quirks when building.
How do I load LNet?
Load the module
...
1. Make sure there's more than one router in the system and that the remaining routers can handle the load if one router is decommissioned.
   a. If you don't want the router to be used immediately after it comes back up, remove the routes pointing to that router from both the clients and the servers.
   On the decommissioned router run:
| Code Block |
|---|
watch -d -n 1 "lnetctl net show -v | grep -E 'send_|recv'" |
   That will show you when traffic to that router has stopped. Once there is no more traffic, proceed to step 2.
2. Unload the LNet modules on the router to be reconfigured.
3. On the nodes connected to and configured to use this router,
| Code Block |
|---|
lnetctl route show -v 4 |
should show that the router is down or, if you have removed the routes using that router, it should no longer appear in the list. Remote LNet ping should succeed because other routers should get used instead:
| Code Block |
|---|
lnetctl ping <remote nid> |
4. Make changes to the decommissioned router configuration and bring it back online.
5. Perform any testing required on the router, using LNet selftest, to verify correct operation. Once satisfied with the router's operation, move to step 6.
6. If you've removed the routes to that router from the clients and servers in step 1a, then re-add them.
7. On the nodes connected to and configured to use the router,
| Code Block |
|---|
lnetctl route show -v 4 |
may show that the router is still down. In that case, rediscover the router:
| Code Block |
|---|
lnetctl discover <router nid> |
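
The watch-based traffic check on the decommissioned router can also be scripted, which may be convenient when the drain is part of a larger maintenance script. The following is only a minimal sketch: `wait_until_stable` and `lnet_counters` are hypothetical helper names, not part of lnetctl, and the sketch assumes that "no more traffic" can be approximated by two consecutive samples of the counters being identical. If peers keep pinging the router, the counters may never settle completely, so treat the result as a hint rather than a guarantee.

```shell
#!/bin/sh
# wait_until_stable INTERVAL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND every INTERVAL seconds and returns
# once two consecutive runs print identical output.
wait_until_stable() {
    interval=$1
    shift
    prev=$("$@")
    while :; do
        sleep "$interval"
        cur=$("$@")
        if [ "$cur" = "$prev" ]; then
            return 0
        fi
        prev=$cur
    done
}

# Sample the LNet counters the same way the watch command on this page does.
lnet_counters() {
    lnetctl net show -v | grep -E 'send_|recv'
}

# Example (on the router being drained):
#   wait_until_stable 1 lnet_counters && echo "no traffic observed"
```

The polling command is passed as an argument so that a different counter source or a longer interval can be substituted without changing the loop.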
Which sysctl settings are optimal?
On systems using TCP, the default ARP cache thresholds may be too low:
| Code Block |
|---|
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024 |
Replace these with:
| Code Block |
|---|
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 16384
net.ipv4.neigh.default.gc_thresh3 = 32768 |
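
To keep the larger thresholds across reboots, they can be written to a sysctl configuration file and loaded from there. This is a sketch under assumptions: `write_arp_thresholds` is a hypothetical helper, and the file name `90-lnet-arp.conf` is only an example; any file under `/etc/sysctl.d/` that suits your distribution's conventions would do.

```shell
#!/bin/sh
# write_arp_thresholds FILE
# Hypothetical helper: writes the larger ARP cache thresholds from this
# page to FILE in sysctl.conf format.
write_arp_thresholds() {
    cat > "$1" <<'EOF'
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 16384
net.ipv4.neigh.default.gc_thresh3 = 32768
EOF
}

# Example (as root); the file name is an assumption:
#   write_arp_thresholds /etc/sysctl.d/90-lnet-arp.conf
#   sysctl -p /etc/sysctl.d/90-lnet-arp.conf
```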
There are other sysctl settings required for proper Multi-Rail (MR) operation. For these, refer to MR Cluster Setup.