...

Sometimes a node can have different types of interfaces, for example MLX and OPA, and it is desirable to configure each of them with different tunables. To do that, the YAML configuration must be used.

Assume a node with two interfaces, one OPA and one MLX. Configure the MLX interface on o2ib and the OPA interface on o2ib1:

Code Block
#> cat networkConfig.yaml
 net:
    - net type: o2ib
      local NI(s):
        - interfaces:
              0: ib0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              conns_per_peer: 1
    - net type: o2ib1
      local NI(s):
        - interfaces:
              0: ib2
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
              conns_per_peer: 4
              ntx: 2048
 
#> lnetctl import < networkConfig.yaml

How can I configure routing?

...

Code Block
tiny_router_buffers # 512 min for each CPT
small_router_buffers # 4096 min for each CPT
large_router_buffers # 256 min for each CPT

They can also be changed dynamically via lnetctl. Note that the value you enter is divided among the configured CPTs; the minimum value restriction is enforced per CPT.

Code Block
lnetctl set tiny_buffers <value>
lnetctl set small_buffers <value>
lnetctl set large_buffers <value>

They can also be set via YAML configuration:

Code Block
 buffers:
    tiny: <value>
    small: <value>
    large: <value>
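
For example, on a node with 4 CPTs (an assumption for illustration), the per-CPT minimums listed above imply totals of at least 2048 tiny, 16384 small, and 1024 large buffers. The file name and values below are illustrative:

Code Block
#> cat routerBuffers.yaml
 buffers:
    tiny: 2048
    small: 16384
    large: 1024
#> lnetctl import < routerBuffers.yaml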

Other parameters of interest

Code Block
check_routers_before_use # Assume routers are down and ping them before use. Defaults to disabled.
avoid_asym_router_failure # Avoid asymmetrical router failures (0 to disable). Defaults to enabled
dead_router_check_interval # Seconds between dead router health checks (<= 0 to disable). Defaults to 60 seconds
live_router_check_interval # Seconds between live router health checks (<= 0 to disable). Defaults to 60 seconds
router_ping_timeout # Seconds to wait for the reply to a router health query. Defaults to 50 seconds
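
These are LNet module parameters, so one way to set them persistently is in a modprobe configuration file. A minimal sketch, assuming the example values below are appropriate for the cluster:

Code Block
# /etc/modprobe.d/lnet.conf (example values only)
options lnet check_routers_before_use=1 dead_router_check_interval=60 live_router_check_interval=60 router_ping_timeout=50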

What is a router pinger?

Whenever a route entry is configured on a node, the specified gateway is added to a list. The router pinger is a thread that periodically pings the gateways to ensure that they are up. The gateway entries are segregated into two categories: live gateways and dead gateways. dead_router_check_interval is the time interval used to ping the dead gateways, while live_router_check_interval is the time interval used to ping the live gateways.

The router pinger thread wakes up every second in either of the following cases:

  1. The node is a router
  2. There are gateways configured and either the live or dead check intervals are configured.

The pinger waits up to router_ping_timeout seconds for a gateway to respond to a ping health check.
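
For reference, a route entry (and therefore a gateway on the pinger's list) can be added with lnetctl; the network and gateway NID below are placeholders:

Code Block
# Add a route to o2ib1 through a gateway reachable on o2ib
lnetctl route add --net o2ib1 --gateway 192.168.1.1@o2ib
# Show configured routes and their state
lnetctl route show -v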

What is LNet Multi-Rail?

LNet Multi-Rail allows multiple interfaces to be used for sending LNet messages, which boosts performance. A follow-up feature, LNet Resiliency, is currently being worked on and is aimed at increasing resiliency.

Refer to: http://wiki.lustre.org/Multi-Rail_LNet for the Requirements, HLD and LUG presentations.

How can I configure multiple network interfaces per network?

Via command line:

Code Block
lnetctl net add --net <network> --if <list of comma separated interfaces>
# Example
lnetctl net add --net o2ib --if ib0,ib1

Via YAML configuration (the values of the tunables can be changed as desired):

Code Block
net:
    - net type: o2ib
      local NI(s):
        - interfaces:
              0: ib0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              conns_per_peer: 1
        - interfaces:
              0: ib1
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 0
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              conns_per_peer: 1
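
As in the earlier example, the YAML can be applied with lnetctl import (the file name is illustrative):

Code Block
#> lnetctl import < multiRail.yaml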

How can I statically configure Multi-Rail?

There are two steps to configuring multi-rail:

  1. Configuring the local network interfaces as shown above
  2. Configuring the Multi-rail enabled peers.

The first step ensures that the local node knows the different interfaces it can send messages over. The second step tells the local node which peers are Multi-Rail enabled and which interfaces of those peers to use.
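
As a sketch of the second step, a Multi-Rail peer can be declared from the command line with lnetctl peer add; the NIDs below are placeholders:

Code Block
# Declare a Multi-Rail peer: --prim_nid identifies the peer, --nid lists its other interfaces
lnetctl peer add --prim_nid 10.10.0.2@o2ib --nid 10.10.1.2@o2ib1
# Verify the peer and its NIDs
lnetctl peer show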

For more information on exact configuration examples, refer to: https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lnetmr

How can I dynamically configure Multi-Rail?

How should I configure a system to use LNet Multi-Rail?

Configuring the peers manually is an error-prone process. It is best if a node is able to discover its peers dynamically. The Dynamic Discovery feature allows a node to discover the interfaces of its peers the first time it communicates with them.

Whenever the local interface list changes, an update is sent to all connected peers.

This feature reduces the configuration burden to only configuring the local interfaces of the node.

For more information on the feature, refer to the Dynamic Behavior section of the HLD: http://wiki.lustre.org/images/b/bb/Multi-Rail_High-Level_Design_20150119.pdf
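
Dynamic Discovery can be toggled at runtime with lnetctl (assuming a Lustre release that includes the feature):

Code Block
# 1 = enable dynamic peer discovery, 0 = disable (fall back to static peer configuration)
lnetctl set discovery 1
# List peers and the interfaces discovered for them
lnetctl peer show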

Are there any specific considerations when configuring Linux routing for an LNet Multi-Rail node?

Refer to: MR Cluster Setup

Are there any specific routing considerations with Multi-Rail?

Refer to: Multi-Rail (MR) Routing

How can I test LNet performance?

lnet_selftest is available for performance testing. Refer to: https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lnetselftest

For a sample lnet_selftest script: self-test template script
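
As an illustration only, here is a minimal sketch following the lnet_selftest usage documented in the manual; the NIDs are placeholders and the lnet_selftest module must be loaded on all nodes involved:

Code Block
#!/bin/bash
# Minimal brw read test between one client and one server (example NIDs)
export LST_SESSION=$$
lst new_session rw_test
lst add_group clients 192.168.1.10@o2ib
lst add_group servers 192.168.1.20@o2ib
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from clients --to servers brw read size=1M
lst run bulk_rw
lst stat clients servers & sleep 30; kill $!
lst stop bulk_rw
lst end_session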

What are the best OPA LND tunables to use?

Code Block
net:
    - net type: o2ib1
      local NI(s):
        - interfaces:
              0: ib2
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
              conns_per_peer: 4
              ntx: 2048

What are the best HFI tunables to use with Lustre?

Code Block
options hfi1 krcvqs=8 piothreshold=0 sge_copy_mode=2 wss_threshold=70

...

Can you tell me more about how to configure LNet and QoS?

Refer to: Lustre QoS

How can I find out the service level of LNet traffic?

...