
How do I build Lustre on ... CentOS 7.x?

Refer to Walk-thru- Build Lustre MASTER on RHEL 7.3/CentOS 7.3 from Git

...

For more information on the feature, refer to the Dynamic Behavior section of the HLD: http://wiki.lustre.org/images/b/bb/Multi-Rail_High-Level_Design_20150119.pdf

How should I set up a ... cluster with a combination of Multi-Rail enabled nodes and non-Multi-Rail enabled nodes? (#TODO)


What should I do with regard to Multi-Rail if I upgrade from a non-Multi-Rail Lustre version (<= 2.10) to a Multi-Rail Lustre version (> 2.10)? (#TODO)


Are there any specific considerations when configuring ... Linux routing for LNet Multi-Rail nodes?

Refer to: MR Cluster Setup

...

For a sample lnet_selftest script, refer to: self-test template script

Is there a way to ... functionally test LNet?

We are currently working on a functional test tool, the LNet Unit Test Framework. Its documentation will be made available soon.

What ... are the best OPA LND tunables to use?

Code Block
net:
    - net type: o2ib1
      local NI(s):
        - interfaces:
              0: ib2
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
              conns_per_peer: 4
              ntx: 2048
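These values can also be expressed as ko2iblnd module options instead of lnetctl YAML. The sketch below is a minimal illustration, not an official conversion tool. It assumes that each module parameter has the same name as its YAML key, except that the YAML key peercredits_hiw maps to the module parameter peer_credits_hiw. It also assumes that module options act as defaults for every o2ib network rather than for o2ib1 alone. The helper to_modprobe_line is hypothetical.

```python
"""Render the OPA tunables above as a ko2iblnd modprobe options line.

Assumptions (not confirmed by this FAQ): module parameter names equal the
lnetctl YAML keys, except that YAML "peercredits_hiw" is the module
parameter "peer_credits_hiw"; module options are defaults for all o2ib NIs.
"""

# Values copied from the YAML above ("tunables" and "lnd tunables").
OPA_TUNABLES = {
    "peer_timeout": 180,
    "peer_credits": 128,
    "peer_buffer_credits": 0,
    "credits": 1024,
    "peercredits_hiw": 4,
    "map_on_demand": 32,
    "concurrent_sends": 256,
    "fmr_pool_size": 2048,
    "fmr_flush_trigger": 512,
    "fmr_cache": 1,
    "conns_per_peer": 4,
    "ntx": 2048,
}

# YAML keys whose module parameter name differs (assumed mapping).
YAML_TO_MODPARAM = {"peercredits_hiw": "peer_credits_hiw"}


def to_modprobe_line(tunables, module="ko2iblnd"):
    """Return an 'options <module> key=value ...' line for /etc/modprobe.d/."""
    parts = [
        f"{YAML_TO_MODPARAM.get(key, key)}={value}"
        for key, value in tunables.items()
    ]
    return f"options {module} " + " ".join(parts)


if __name__ == "__main__":
    print(to_modprobe_line(OPA_TUNABLES))
```

A line placed in a file under /etc/modprobe.d/ takes effect only the next time the ko2iblnd module is loaded.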

What ... are the best HFI tunables to use with ... Lustre?

Code Block
options hfi1 krcvqs=8 piothreshold=0 sge_copy_mode=2 wss_threshold=70
Note: TID RDMA

It is NOT recommended to enable the OPA TID RDMA feature (cap_mask=0x4c09a01cbba, where the 1 following 0x4c09a0 is the bit that enables TID RDMA). It can cause significant memory usage and service errors when there is a large number of connections.
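After reloading the hfi1 driver, you may want to confirm that the options above actually took effect. The sketch below is a minimal example of that check. It assumes the driver exposes these parameters as readable files under /sys/module/hfi1/parameters, the usual sysfs location for loadable-module parameters. The helper names are hypothetical.

```python
"""Compare an 'options hfi1 ...' line with the values the loaded driver reports."""
from pathlib import Path

# The recommended line from this FAQ.
RECOMMENDED = "options hfi1 krcvqs=8 piothreshold=0 sge_copy_mode=2 wss_threshold=70"


def parse_options_line(line):
    """Split 'options <module> k=v ...' into (module, {k: v}) with string values."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError(f"not a modprobe options line: {line!r}")
    params = dict(token.split("=", 1) for token in tokens[2:])
    return tokens[1], params


def find_mismatches(line, sysfs_root="/sys/module"):
    """Return {param: (expected, actual)} for params whose live value differs.

    Values are compared as strings, so array parameters such as krcvqs must
    read back exactly as written. actual is None if the file is missing or
    unreadable (assumed: the driver did not expose that parameter).
    """
    module, expected = parse_options_line(line)
    param_dir = Path(sysfs_root) / module / "parameters"
    mismatches = {}
    for name, want in expected.items():
        try:
            have = (param_dir / name).read_text().strip()
        except OSError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(RECOMMENDED)
    print("all hfi1 options applied" if not problems else problems)
```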

Can you tell me more about how to configure LNet and QoS?

...

  1. Check that you can ping your MGS first.
  2. Check that you can "lnetctl ping" the MGS NID.
    1. If you can "lnetctl ping" the MGS NID, check whether RDMA works using ib_write_bw:
      1. Start ib_write_bw (with no arguments) on the MGS. This starts a receiver process.
      2. On the client, run "ib_write_bw <MGS IP address>".
      3. This runs an RDMA traffic test independent of LNet. If it succeeds, RDMA is working and you can move on to further LNet debugging. Otherwise, contact your IB service provider.
    2. Next, run lnet_selftest to verify LNet traffic.
      1. See: https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lnetselftest
    3. If lnet_selftest works, verify that your MGS is set up properly.

If ib_write_bw works but LNet doesn't, check your o2iblnd configuration as described below.
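Each check above rules out a lower layer before the next one is attempted. The sketch below is a minimal illustration of that ordering. It assumes the standard ping, lnetctl and ib_write_bw (perftest) commands are on the PATH, and that an ib_write_bw receiver is already running on the MGS. The addresses and helper names are hypothetical.

```python
"""Run the MGS connectivity checks in order and stop at the first failure."""
import subprocess


def build_checks(mgs_ip, mgs_nid):
    """Return (description, command) pairs, lowest layer first."""
    return [
        ("IP ping to the MGS", ["ping", "-c", "3", mgs_ip]),
        ("LNet ping to the MGS NID", ["lnetctl", "ping", mgs_nid]),
        # Requires 'ib_write_bw' (no arguments) already running on the MGS.
        ("RDMA test independent of LNet", ["ib_write_bw", mgs_ip]),
    ]


def run_checks(checks, runner=subprocess.run):
    """Run each check; return the description of the first failure, or None."""
    for description, command in checks:
        try:
            result = runner(command, capture_output=True, text=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired):
            # Command missing, not executable, or hung: treat as a failure.
            return description
        if result.returncode != 0:
            return description
    return None


if __name__ == "__main__":
    # Hypothetical addresses; substitute your MGS IP address and NID.
    failed = run_checks(build_checks("192.168.1.10", "192.168.1.10@o2ib"))
    if failed is None:
        print("All checks passed: continue with lnet_selftest.")
    else:
        print(f"First failing check: {failed}")
```

The runner parameter exists so the ordering logic can be exercised without real hardware. In normal use the default subprocess.run is kept.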

I have a routed setup and my clients can't mount. What should I do? (#TODO)


How do I check ... my o2iblnd ... configurations on ... different nodes are compatible?

  1. Make sure peer_credits is the same across your nodes.
    1. peer_credits is negotiated dynamically, so the lower of the two peers' values is used on a connection. However, unless you intend to use different peer_credits on different nodes, it is recommended that all nodes have the same value.
  2. Make sure peer_credits_hiw is the same across your nodes.
    1. peer_credits_hiw defines the high-water mark: when the number of outstanding credits on a connection reaches this value, the credits are returned using a No-op message.
  3. Make sure concurrent_sends is the same across your nodes.
    1. concurrent_sends defines the number of concurrent transmits per connection.
  4. The recommended values for the above parameters are:
    1. peer_credits = 32
    2. peer_credits_hiw = 16
    3. concurrent_sends = 64
    4. Generally speaking, peer_credits_hiw should be half of peer_credits, and concurrent_sends should be twice peer_credits.
  5. Make sure conns_per_peer is the same across your nodes. It defines the number of IB connections to create to each peer.
    1. On OPA, it is recommended to set this value to 4.
    2. On MLX, it is recommended to leave it at the default value of 1.
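You can script this comparison by collecting the ko2iblnd parameters from each node and diffing them. The sketch below is a minimal illustration. It assumes the values are read from /sys/module/ko2iblnd/parameters on each node; gathering them from remote nodes (for example over ssh) is left out. It encodes the rule of thumb from item 4: peer_credits_hiw = peer_credits / 2 and concurrent_sends = 2 * peer_credits. The helper names and example values are hypothetical.

```python
"""Check o2iblnd parameters for consistency across nodes and against the rule of thumb."""
from pathlib import Path

PARAMS = ("peer_credits", "peer_credits_hiw", "concurrent_sends", "conns_per_peer")


def read_local_params(root="/sys/module/ko2iblnd/parameters"):
    """Read PARAMS from the locally loaded ko2iblnd module as ints (assumed sysfs path)."""
    return {name: int((Path(root) / name).read_text()) for name in PARAMS}


def inconsistent_params(nodes):
    """Return {param: {node: value}} for every param that differs between nodes."""
    differing = {}
    for name in PARAMS:
        values = {node: params.get(name) for node, params in nodes.items()}
        if len(set(values.values())) > 1:
            differing[name] = values
    return differing


def rule_of_thumb_warnings(params):
    """Flag deviations from hiw = peer_credits / 2 and concurrent_sends = 2 * peer_credits."""
    warnings = []
    credits = params["peer_credits"]
    if params["peer_credits_hiw"] != credits // 2:
        warnings.append(f"peer_credits_hiw is not half of peer_credits ({credits})")
    if params["concurrent_sends"] != 2 * credits:
        warnings.append(f"concurrent_sends is not twice peer_credits ({credits})")
    return warnings


if __name__ == "__main__":
    # Hypothetical values gathered from two nodes.
    nodes = {
        "client1": {"peer_credits": 32, "peer_credits_hiw": 16, "concurrent_sends": 64, "conns_per_peer": 1},
        "server1": {"peer_credits": 32, "peer_credits_hiw": 16, "concurrent_sends": 64, "conns_per_peer": 4},
    }
    print(inconsistent_params(nodes))
    for node, params in nodes.items():
        for warning in rule_of_thumb_warnings(params):
            print(f"{node}: {warning}")
```

Parameters left at their defaults may read back as 0 and be computed by the driver at runtime. Treat any warning as a prompt to investigate rather than as proof of a misconfiguration.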

What is map_on_demand and what should I set it to? (#TODO)

...