Comment: fix build link



How do I build Lustre?

Refer to Walk-thru- Build Lustre MASTER on RHEL 7.3/CentOS 7.3 from Git

Also refer to Building Lustre/LNet Centos/RHEL 7.x for some quirks when building.

Building Lustre from Source

How do I load LNet

Load the module

...

  1. Make sure there is more than one router in the system and that the remaining routers can handle the load while one router is decommissioned.
      a. If you do not want the router to be used immediately after it comes back up, remove the routes pointing to that router from both the clients and the servers.
      b. On the router being decommissioned, run:

        Code Block
        watch -d -n 1 "lnetctl net show -v 4 | grep -E 'send_|recv'"

        This shows when traffic to that router has stopped. Once there is no more traffic, proceed to step 2.

  2. Unload the LNet modules on the router to be reconfigured.
  3. On the nodes connected to and configured to use this router,

      Code Block
      lnetctl route show -v 4

      should show that the router is down. If you have removed the routes using that router, it should no longer appear in the list.
      Remote LNet ping should succeed because the other routers are used instead:

      Code Block
      lnetctl ping <remote nid>
  4. Make changes to the decommissioned router's configuration and bring it back online.
  5. Perform any testing required on the router, using LNet selftest, to verify correct operation. Once satisfied with the router's operation, move to step 6.
  6. If you removed the routes to that router from the clients and servers in step 1a, re-add them.
  7. On the nodes connected to and configured to use the router,

      Code Block
      lnetctl route show -v 4

      may show that the router is still down. In that case, rediscover the router:

      Code Block
      lnetctl discover <router nid>
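
The traffic check in the procedure above is done by eye with watch. If you would rather script it, the following is a minimal sketch, not a tested tool: it samples the same filtered lnetctl output and returns once several consecutive samples are identical. The function names and the optional third argument (a replacement snapshot command, useful for dry runs on a machine without lnetctl) are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: block until the LNet traffic counters on this router stop changing.
# All function names here are illustrative; they are not part of lnetctl.

# Print the send/recv counters, using the same filter as the watch command above.
lnet_counters() {
    lnetctl net show -v 4 | grep -E 'send_|recv'
}

# wait_for_idle INTERVAL REQUIRED [SNAPSHOT_CMD]
# Returns once REQUIRED consecutive samples, taken INTERVAL seconds apart,
# are identical to the sample before them. SNAPSHOT_CMD defaults to lnet_counters.
wait_for_idle() {
    interval=$1
    required=$2
    snapshot_cmd=${3:-lnet_counters}
    stable=0
    previous=$($snapshot_cmd)
    while [ "$stable" -lt "$required" ]; do
        sleep "$interval"
        current=$($snapshot_cmd)
        if [ "$current" = "$previous" ]; then
            stable=$((stable + 1))
        else
            # Counters moved, so traffic is still flowing; start counting again.
            stable=0
        fi
        previous=$current
    done
}
```

For example, wait_for_idle 1 5 returns after the counters have been unchanged for five one-second samples in a row, at which point the LNet modules can be unloaded. Unchanged counters only indicate that no traffic was seen during the sampling window, so choose the window to suit your workload.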

Which sysctl settings are optimal?

On systems using TCP, the default settings for the ARP cache thresholds may be too low:

Code Block
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Replace these with:

Code Block
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 16384
net.ipv4.neigh.default.gc_thresh3 = 32768

There are other sysctl settings required for proper Multi-Rail (MR) operation. For these, refer to MR Cluster Setup.
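
One way to make the larger ARP thresholds persistent is to write them to a sysctl configuration file and load that file. The sketch below assumes a system with the procps sysctl utility; the function names are invented for this example, and the target file name is left to the caller.

```shell
#!/bin/sh
# Sketch: persist the ARP cache thresholds recommended above.
# Function names are illustrative only.

# Print the three recommended settings in sysctl.conf syntax.
arp_thresholds() {
    cat <<'EOF'
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 16384
net.ipv4.neigh.default.gc_thresh3 = 32768
EOF
}

# install_arp_thresholds FILE
# Write the settings to FILE; the caller then loads them with sysctl -p FILE.
install_arp_thresholds() {
    arp_thresholds > "$1"
}
```

As root, something like install_arp_thresholds /etc/sysctl.d/90-lnet-arp.conf followed by sysctl -p /etc/sysctl.d/90-lnet-arp.conf would apply the values, where 90-lnet-arp.conf is a hypothetical file name. Running sysctl -n net.ipv4.neigh.default.gc_thresh1 afterwards prints the value currently in effect.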