Overview

This design adds two new features:

  1. Allow Lustre clients to continue using servers even if the servers' NIDs change during a boot cycle.
  2. Allow new Lustre clients to mount the file system over networks that have been added dynamically.

Requirements

Feature One Overview

When a Lustre client mounts the file system, it receives the Lustre log describing the servers and their NIDs. It then attempts to connect to each server on the first reachable NID. If any of these connections fails, the mount fails. After the mount, the server NIDs are expected to remain static. That assumption does not hold in an environment with dynamically assigned IP addresses. For example, if a server reboots and its NIC is assigned a different IP address, the server will use a different NID when Lustre comes up. The MGS will send an Imperative Recovery message to the client informing it of the new server NID. The client, however, will not use this server, because the NID reported by the Imperative Recovery protocol does not match the one in the Lustre log.
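The mismatch described above can be illustrated with a small model. This is only a sketch of the behavior as stated in this section, not Lustre source code: the class, the function name, the target name, and the IP addresses are all hypothetical and chosen for illustration.

```python
# Simplified model only; names are hypothetical and do not mirror Lustre source code.
from dataclasses import dataclass, field


@dataclass
class ClientTargetConfig:
    """What the client learned about one server target from the Lustre log at mount time."""
    target: str
    llog_nids: list = field(default_factory=list)


def accepts_ir_update(config, reported_nid):
    """Return True if the client would use the NID reported via Imperative Recovery.

    Models the current behavior described above: the reported NID is only
    usable when it matches a NID already recorded in the Lustre log.
    """
    return reported_nid in config.llog_nids


# The server was at 10.0.0.5 when the client mounted (addresses are made up).
ost = ClientTargetConfig(target="OST0000", llog_nids=["10.0.0.5@tcp"])

# Same NID after reboot: the client keeps using the server.
print(accepts_ir_update(ost, "10.0.0.5@tcp"))  # True

# The NIC got a different address, so the NID changed: the client rejects it.
print(accepts_ir_update(ost, "10.0.0.9@tcp"))  # False
```

Feature One amounts to making the second case succeed, so that a NID learned through Imperative Recovery can be used even though it is absent from the Lustre log.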

Solution

Feature Two Overview

LNet allows new NIDs and new networks to be added dynamically. For example, suppose a server initially starts with only tcp449 configured. Clients that are only on that network can mount the file system. During the lifetime of the server, however, other networks can be added, e.g. tcp450, tcp451, etc. Since these NIDs were not part of the initial configuration, they are not recorded in the llog. Clients that are only on these networks will not be able to mount the file system, since the NIDs provided in the llog give them no way to reach the servers.

This feature is intended to support this scenario. It can be used in multi-tenancy environments, where clients need to be segregated, with no traffic allowed between the different clients.
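The current limitation can be sketched as a NID selection over the llog contents. This is a minimal illustration under stated assumptions, not the actual client logic: the helper names and the IP address are hypothetical, and NIDs are treated as plain "address@network" strings.

```python
# Simplified model only; helper names are hypothetical, not Lustre or LNet APIs.

def nid_network(nid):
    """Return the network part of a NID string such as '10.0.0.5@tcp449'."""
    return nid.rsplit("@", 1)[1]


def first_reachable_nid(llog_nids, client_networks):
    """Return the first llog NID on a network the client is attached to, or None."""
    for nid in llog_nids:
        if nid_network(nid) in client_networks:
            return nid
    return None


# The llog was written when the server only had tcp449 configured.
llog_nids = ["10.0.0.5@tcp449"]

# A client on tcp449 finds a usable NID and can mount.
print(first_reachable_nid(llog_nids, {"tcp449"}))  # 10.0.0.5@tcp449

# tcp450 was added to the server later, so no llog NID is on that network.
print(first_reachable_nid(llog_nids, {"tcp450"}))  # None
```

A client attached only to tcp450 gets no candidate NID from the llog, which is why its mount fails today even though the server is reachable on that network.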

Solution

Description of Behavior