Overview

This design adds two new features:

- Allow Lustre clients to continue using servers whose NIDs change across a reboot.
- Allow new Lustre clients to mount the file system over networks that were added dynamically.

Requirements

Feature One Overview

When a Lustre client mounts a server, it receives the Lustre log (llog) describing the servers and their NIDs. It then attempts to connect to each server on the first reachable NID. If any of these connections fails, the mount fails. After the mount, the server NIDs are expected to remain static. This does not hold in an environment with dynamically assigned IP addresses. For example, if a server reboots and its NIC is assigned a different IP address, Lustre comes up with a different NID. The MGS sends an Imperative Recovery (IR) message to the client informing it of the new server NID. The client, however, will not use this server, because the NID in the Lustre log does not match the one reported by the Imperative Recovery protocol.

Solution

An IR log carrying the server's current NID information is sent to the client. When the client receives the IR log, it checks the entry against what it has already stored from the llog:
- If the entry is not there, add a new connection to the import.
- If the entry is there but the NID list is different, update the NID information with the latest NID information provided in the IR log.
Allowing clients to use server NIDs that were unknown at the initial mount could be considered a security risk at some sites. To address this:
- Add a new file-system-level module parameter to enable this feature. The feature is disabled by default.
  - lctl set_param mgc.*.dynamic_nids=1
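The reconciliation steps above can be sketched in a few lines. This is only an illustrative model, not the Lustre implementation: the `Import` class, the `apply_ir_entry` function, the target name, and the returned action strings are all hypothetical, and the sketch assumes that "different" means a plain list inequality.

```python
from dataclasses import dataclass, field


@dataclass
class Import:
    """Hypothetical stand-in for a client import: target name -> NID list."""
    connections: dict = field(default_factory=dict)


def apply_ir_entry(imp, target, ir_nids, dynamic_nids=False):
    """Reconcile one IR log entry with what the client stored from the llog.

    Returns a short string naming the action taken (hypothetical labels).
    """
    if not dynamic_nids:
        # Feature disabled (the default): the IR entry changes nothing.
        return "ignored"
    known = imp.connections.get(target)
    if known is None:
        # Entry not present: add a new connection to the import.
        imp.connections[target] = list(ir_nids)
        return "added"
    if known != list(ir_nids):
        # Entry present but NID list differs: take the IR log's NIDs.
        imp.connections[target] = list(ir_nids)
        return "updated"
    return "unchanged"
```

With `dynamic_nids` left at its default, a changed NID reported for a known target is ignored; with it enabled, the stored NID list is replaced by the one from the IR log.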

Feature Two Overview

LNet allows new NIDs and new networks to be added dynamically. For example, suppose a server initially starts with tcp449 only. Clients that are on that network can mount the server. During the lifetime of the server, however, other networks can be added, e.g. tcp450, tcp451, etc. Since the NIDs on these networks were not part of the initial configuration, they are not recorded in the llog. Clients that are only on these networks cannot mount the file system, because the NIDs provided in the llog give them no way to reach the servers.

This feature is intended to support this scenario. It can be used in multi-tenancy environments, where clients need to be segregated, with no traffic allowed between the different clients.
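The reachability problem described above can be illustrated with a small sketch. It assumes NIDs are written as `address@network`; the helper `usable_nids` and the addresses are made up for the example and are not part of Lustre or LNet.

```python
def usable_nids(client_nets, server_nids):
    """Return the server NIDs whose network the client is also on.

    NIDs are assumed to have the form 'address@network'.
    """
    return [nid for nid in server_nids
            if nid.rsplit("@", 1)[1] in client_nets]


# NIDs recorded in the llog at initial configuration (illustrative values).
llog_nids = ["192.168.1.10@tcp449"]
# NIDs known after tcp450 was added dynamically (illustrative values).
current_nids = ["192.168.1.10@tcp449", "192.168.2.10@tcp450"]

# A client that is only on tcp450 finds nothing usable in the llog,
# but would find a NID if it were given the current list.
from_llog = usable_nids({"tcp450"}, llog_nids)
from_current = usable_nids({"tcp450"}, current_nids)
```

Here `from_llog` is empty while `from_current` contains the tcp450 NID, which is the gap this feature is meant to close.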

Solution

When a NID (possibly on a new network) is added dynamically on a server via the lnetctl utility, all peers, including the MGS, are informed of the addition of the NID.
- Update the MGS internal IR log with the new NID information.
- Send the IR log to the currently mounted clients.
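As a rough model of the two steps above, the sketch below keeps an in-memory IR log and pushes a copy to every mounted client when a NID is added. The `Mgs` and `Client` classes and their method names are hypothetical and exist only for illustration; they do not correspond to actual Lustre data structures.

```python
class Client:
    """Hypothetical mounted client that records the last IR log it was sent."""

    def __init__(self):
        self.last_ir_log = None

    def receive_ir_log(self, ir_log):
        self.last_ir_log = ir_log


class Mgs:
    """Hypothetical MGS model: IR log maps a target name to its NID list."""

    def __init__(self):
        self.ir_log = {}
        self.mounted_clients = []

    def on_nid_added(self, target, nid):
        # Step 1: update the internal IR log with the new NID.
        nids = self.ir_log.setdefault(target, [])
        if nid not in nids:
            nids.append(nid)
        # Step 2: send a copy of the IR log to currently mounted clients.
        snapshot = {t: list(n) for t, n in self.ir_log.items()}
        for client in self.mounted_clients:
            client.receive_ir_log(snapshot)
```

Sending a copy rather than the live dictionary is a choice made for this sketch so that later updates do not silently change what a client already received.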
When a new client mounts the server:
- The MGS sends it the IR log.
- Prior to this patch, the server only checks the llog. The IR log is sent and processed, but no new connections are added.
  - With this feature, the client processes the IR log and creates new connections, but only at mount time.
  - After mount, new IR log notifications from the MGS will not alter the connections created unless dynamic_nids is enabled on the servers and the clients.
- The client updates its own internal NID database with the NIDs provided in the IR log and attempts to connect to the servers on the first reachable NID.
  - If the client is restricted to a specific network, then only the NIDs reported in the IR log that are on that network are processed.
  - If the client has LNet discovery disabled, then it uses only the first reachable NID reported in the IR log.
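The client-side rules for mount time can be sketched as follows. This is a simplified model under stated assumptions: NIDs have the form `address@network`, reachability is given as a plain set rather than probed, and the function `plan_mount` and its parameters are hypothetical names, not Lustre interfaces.

```python
def plan_mount(ir_nids, reachable, restrict_net=None, discovery=True):
    """Decide which NIDs to store and which NID to connect on.

    ir_nids      -- NIDs reported in the IR log, in order
    reachable    -- set of NIDs the client can reach (assumed known here)
    restrict_net -- if set, only NIDs on this network are processed
    discovery    -- False models a client with LNet discovery disabled
    Returns (stored_nids, connect_nid); connect_nid is None if none is reachable.
    """
    # Restricted clients only process NIDs on their own network.
    candidates = [n for n in ir_nids
                  if restrict_net is None or n.rsplit("@", 1)[1] == restrict_net]
    # Connect on the first reachable NID, if any.
    connect = next((n for n in candidates if n in reachable), None)
    if discovery:
        stored = candidates
    else:
        # Without discovery, only the first reachable NID is used.
        stored = [connect] if connect is not None else []
    return stored, connect
```

For example, a client restricted to tcp451 would store and connect on only the tcp451 NID, while a client with discovery disabled would keep just the first NID it can reach.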

Description of Behavior