...

It is important to know how the customer is using the network with Lustre. 

...

  • which underlying network types are configured (IB, RoCE, Ethernet/TCP, etc.)
  • which LNet types are used (o2ib, tcp, kfi, etc.)
  • whether LNet routing is configured
  • number of servers and clients in the cluster, per type (there may be o2ib and tcp clients, for example)
  • whether the system is single-rail or multi-rail
  • which subnets are used (if there are multiple subnets)

Configuration

It should be made clear how Lustre and LNet modules are configured.

  • scripts under /etc/modprobe.d/ 
  • lnet.service
  • dynamic configuration by custom scripts
  • combination of the above
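
When module parameters are set under /etc/modprobe.d/, it can help to collect them into one structure per node so they can be reviewed or compared. The following is a minimal sketch, assuming the usual `options <module> <key>=<value>` line format; the function name `parse_modprobe_options` and the sample values are hypothetical and used for illustration only.

```python
import shlex


def parse_modprobe_options(text):
    """Collect 'options <module> key=value ...' lines into {module: {key: value}}.

    Simplification: everything after '#' is treated as a comment, so a '#'
    inside a quoted value would be cut off.
    """
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        tokens = shlex.split(line)  # splits on whitespace, strips quotes
        if len(tokens) < 2 or tokens[0] != "options":
            continue  # ignore alias, install, blacklist, etc.
        params = result.setdefault(tokens[1], {})
        for token in tokens[2:]:
            key, sep, value = token.partition("=")
            params[key] = value if sep else None
    return result


# Hypothetical file content, for illustration only.
sample = '''
# LNet networks for this node
options lnet networks="o2ib0(ib0),tcp0(eth0)"
options ko2iblnd peer_credits=32
'''

for module, params in parse_modprobe_options(sample).items():
    print(module, params)
```

This only covers static module options. Settings applied through lnet.service or custom scripts would have to be gathered separately.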

Separately, configuration outside of Lustre/LNet should also be checked:

  • Linux version
  • Driver version (e.g. MOFED)
  • Linux network configuration (network interfaces, MTU, routing tables, select sysctl settings)

Consistency

It is important to know whether all nodes of the same type are configured identically. Check the following across nodes of the same type:

  • Lustre version (lctl --version)
  • Lustre/LNet configuration
  • Linux network configuration

For example, an inconsistent MTU setting across the system is often responsible for bulk transfer failures.
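
Once a setting has been collected from every node of a given type, the consistency check reduces to finding the nodes that deviate from the majority. The following is a minimal sketch of that comparison; the function name `find_outliers`, the node names, and the MTU values are hypothetical, and how the values are gathered (ssh, pdsh, a site inventory tool) is left out.

```python
from collections import Counter


def find_outliers(settings):
    """Return (majority_value, {node: value}) for nodes deviating from the majority.

    `settings` maps a node name to the value observed on that node.
    """
    if not settings:
        return None, {}
    majority, _count = Counter(settings.values()).most_common(1)[0]
    outliers = {node: value for node, value in settings.items() if value != majority}
    return majority, outliers


# Hypothetical sample data, for illustration only.
mtu_by_node = {"oss01": 9000, "oss02": 9000, "oss03": 1500, "oss04": 9000}

majority, outliers = find_outliers(mtu_by_node)
print(f"majority MTU: {majority}")
for node, mtu in sorted(outliers.items()):
    print(f"{node}: MTU {mtu} differs from the majority")
```

The same helper can be applied to any per-node value, such as the Lustre version string or a kernel version, as long as the values are comparable for equality.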

Logs


Lnetctl Outputs


Scenarios

Connectivity

...