  1. Make sure there is more than one router in the system and that the remaining routers can handle the load while one router is decommissioned.
    a. If you do not want the router to be used immediately after it comes back up, remove the routes pointing to that router from both the clients and the servers.
    b. On the router being decommissioned, run:

      Code Block
      watch -d -n 1 "lnetctl net show -v | grep -E 'send_|recv'"

      This highlights the send and receive counters as they change, so you can see when traffic to that router has stopped. Once there is no more traffic, proceed to step 2.

  2. Unload the LNet modules on the router to be reconfigured.
  3. On the nodes connected to and configured to use this router,

    Code Block
    lnetctl route show -v 4

    should show that the router is down. If you removed the routes that use that router, it should no longer appear in the list.
    A remote LNet ping should still succeed, because the other routers are used instead:

    Code Block
    lnetctl ping <remote nid>
  4. Make the changes to the decommissioned router's configuration and bring it back online.
  5. Perform any required testing on the router, using LNet selftest, to verify correct operation. Once satisfied with the router's operation, move to step 6.
  6. If you removed the routes to that router from the clients and servers in step 1a, re-add them.
  7. On the nodes connected to and configured to use the router, 

    Code Block
    lnetctl route show -v 4

    may show that the router is still down. In that case, rediscover the router:

    Code Block
    lnetctl discover <router nid>
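
The traffic check in step 1 can also be scripted instead of watched by eye. The following is only a minimal sketch, not part of the documented procedure: the helper names `lnet_counters` and `wait_for_idle` are hypothetical, and it assumes that "no more traffic" can be approximated by the counter lines matched by the grep above staying identical for several consecutive samples.

```shell
#!/bin/sh
# Hypothetical helper: keep only the counter lines that the watch command
# in step 1 highlights (same grep pattern as in the procedure).
lnet_counters() {
    grep -E 'send_|recv'
}

# Hypothetical helper: return once the counters have not changed for
# `needed` consecutive samples taken `interval` seconds apart.
# Assumption: unchanged counters mean traffic through this router has stopped.
wait_for_idle() {
    interval="${1:-1}"
    needed="${2:-5}"
    quiet=0
    prev="$(lnetctl net show -v | lnet_counters)"
    while [ "$quiet" -lt "$needed" ]; do
        sleep "$interval"
        cur="$(lnetctl net show -v | lnet_counters)"
        if [ "$cur" = "$prev" ]; then
            quiet=$((quiet + 1))   # another quiet sample
        else
            quiet=0                # counters moved, start over
        fi
        prev="$cur"
    done
    return 0
}
```

After sourcing these definitions on the router being decommissioned, `wait_for_idle 1 10` would return after ten quiet one-second samples, at which point you could proceed to step 2. The loop has no timeout, so it never returns if the counters keep moving.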