Overview

Dynamic Discovery (DD) is a feature that allows a node to discover a peer's interfaces dynamically, without having to configure them explicitly. This is particularly useful for Multi-Rail (MR) configurations: a large cluster can contain hundreds of nodes, and configuring MR peers by hand on each node is error prone. DD is enabled by default and uses a new protocol based on LNet pings to discover the interfaces of remote peers on first message.

DD Protocol

When LNet on a node is asked to send a message to a peer, it first attempts to ping the peer. The reply to the ping contains the peer's NIDs as well as a feature bit indicating what the peer supports. DD adds a Multi-Rail feature bit. If the peer is Multi-Rail capable, it sets the MR bit in the ping reply. When the node receives the reply, it checks the MR bit; if the bit is set, the node pushes its own list of NIDs to the peer using a new PUT message, a "push ping". After this brief exchange, the node and the peer each have the other's list of interfaces. The MR algorithm can then use the peer's list of interfaces.

If the peer is not MR capable, it will not set the MR feature bit in the ping reply. The node will understand that the peer is not MR capable and will only use the interface provided by upper layers for sending messages.
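
The exchange described above can be modelled in a few lines of Python. This is only an illustrative sketch, not LNet's kernel implementation: the Node class, its method names, the MR_FEATURE_BIT value and the example NIDs are all hypothetical, and the initiating node is assumed to be MR capable.

```python
from dataclasses import dataclass, field

# Hypothetical value, chosen for illustration; not the real LNet constant.
MR_FEATURE_BIT = 0x1


@dataclass
class Node:
    """Toy model of an LNet node (assumption: not the kernel structure)."""
    name: str
    nids: list
    multi_rail: bool = True
    known_peers: dict = field(default_factory=dict)  # peer name -> NID list

    def handle_ping(self):
        """Reply to a ping with our NIDs and our feature bits."""
        features = MR_FEATURE_BIT if self.multi_rail else 0
        return list(self.nids), features

    def handle_push(self, sender_name, sender_nids):
        """Record the NID list that the sender pushed to us."""
        self.known_peers[sender_name] = list(sender_nids)

    def discover(self, peer, target_nid):
        """Ping the peer; push our own NIDs only if its MR bit is set."""
        peer_nids, features = peer.handle_ping()
        if features & MR_FEATURE_BIT:
            self.known_peers[peer.name] = peer_nids
            peer.handle_push(self.name, self.nids)  # the "push ping"
        else:
            # Non-MR peer: use only the NID supplied by the upper layer.
            self.known_peers[peer.name] = [target_nid]
        return self.known_peers[peer.name]


node = Node("node", ["10.0.0.1@tcp", "10.0.1.1@tcp"])
mr_peer = Node("mr_peer", ["10.0.0.2@tcp", "10.0.1.2@tcp"])
legacy_peer = Node("legacy_peer", ["10.0.0.3@tcp"], multi_rail=False)

print(node.discover(mr_peer, "10.0.0.2@tcp"))
print(node.discover(legacy_peer, "10.0.0.3@tcp"))
```

In this model the MR peer ends up knowing both of the node's NIDs, while the non-MR peer receives no push and the node keeps only the single NID it was asked to use.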

DD and User-space Configuration

It is possible to configure the peer manually while DD is running. Manual peer configuration always takes precedence over DD. If there is a discrepancy between the manual configuration and the dynamically discovered information, a warning is printed.
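
The precedence rule can be illustrated with a small Python sketch. It is a model of the stated behaviour only: the function name effective_interfaces, the use of Python's warnings module and the warning text are assumptions made for illustration, and LNet's actual warning message and comparison logic may differ.

```python
import warnings


def effective_interfaces(manual_nids, discovered_nids):
    """Return the NID list to use for a peer; manual configuration wins.

    Either argument may be None, meaning "no information from that source".
    """
    if manual_nids is None:
        return list(discovered_nids or [])
    if discovered_nids is not None and set(manual_nids) != set(discovered_nids):
        # Hypothetical message; the real warning text is not given in the source.
        warnings.warn(
            "manual peer configuration differs from discovered NIDs; "
            "using manual configuration"
        )
    return list(manual_nids)


# No manual configuration: the discovered list is used as is.
print(effective_interfaces(None, ["10.0.0.2@tcp", "10.0.1.2@tcp"]))

# Manual configuration present and different: it is used, and a warning is issued.
print(effective_interfaces(["10.0.0.2@tcp"], ["10.0.0.2@tcp", "10.0.1.2@tcp"]))
```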

Configuration

DD requires very little configuration: it can only be turned on or off.

lnetctl set discovery [0 | 1]

Initiating DD on Demand

It is possible to initiate the DD protocol on demand, without waiting for a message to be sent to the peer, using the following command:

lnetctl discover <peer_nid> [<peer_nid> ...]
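
When these commands are driven from a script, it can help to build and check the argument list before invoking lnetctl. The following Python sketch does only that; the helper names are hypothetical, the check that each NID has the form address@network is a minimal sanity test rather than full NID validation, and the example NIDs are invented.

```python
def set_discovery_command(enabled):
    """Build the argument list for 'lnetctl set discovery [0 | 1]'."""
    return ["lnetctl", "set", "discovery", "1" if enabled else "0"]


def discover_command(peer_nids):
    """Build the argument list for 'lnetctl discover <peer_nid> [<peer_nid> ...]'."""
    if not peer_nids:
        raise ValueError("at least one peer NID is required")
    for nid in peer_nids:
        address, separator, network = nid.partition("@")
        # Minimal sanity check only: both sides of the '@' must be non-empty.
        if not (address and separator and network):
            raise ValueError(f"not of the form address@network: {nid!r}")
    return ["lnetctl", "discover", *peer_nids]


print(" ".join(set_discovery_command(True)))
print(" ".join(discover_command(["10.0.0.2@tcp", "10.0.1.2@o2ib"])))
```

The returned list can be passed to subprocess.run on a host where lnetctl is installed and the caller has sufficient privileges.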