Negotiation

There are two parameters which are negotiated between peers on connection creation: 

...

  1. Remove the ability to configure map-on-demand via tunables or lnetctl/YAML
  2. Default max_send_wr to a multiple of a constant: 256.
  3. Keep the ability to dial down the number of fragments if the peer supports a lower number of fragments. I still don't think there is any actual need to set max_send_wr to anything less than a multiple of 256.
    1. The underlying assumption in the code was that FMR and FastReg both use only 1 fragment, which is no longer the case. If the number of fragments in the message is greater than the number of fragments supported by the peer (or the connection), what should we do? The only option is to divide the message into multiple TXs. I contacted Doug Ledford from Red Hat to see if there is a way to handle gaps in the buffers with FMR on MLX4. If we're able to do that, it will greatly reduce the complexity of the code.
  4. Optimize the case where the fragments have no gaps, so that in the FMR case we end up setting rd_nfrags to 1. This reduces resource usage on the card, since fewer work requests are needed.
  5. When there are gaps, flag the RDMA write as requiring gaps. When it comes time to map it, check that the connection can support the number of gaps; if it can't, fail with a clear message suggesting that the peer set the map-on-demand value to 256.
  6. Document the interactions between the ko2iblnd module parameters. Currently there is a spider web of dependencies between the different parameters. Each dependency needs to be justified and documented and removed if it's unnecessary.
  7. Create a simple calculator to calculate the impact of changing the parameters.
    1. For example if you set concurrent_sends to a value X, then how many work requests will be created?
      1. This will be handy to easily understand the cluster configuration without having to go through the pain of re-examining the code.
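
Items 4 and 5 above can be sketched as follows. This is only an illustration of the idea, not the o2iblnd code: the (offset, length) fragment representation, the 4096-byte page size, and the function names are all assumptions made for the example. A fragment list is treated as gap-free when every fragment except the first starts on a page boundary and every fragment except the last ends on one.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def has_gaps(frags, page_size=PAGE_SIZE):
    """Return True if the fragments cannot be treated as one contiguous region.

    Each fragment is a hypothetical (offset_in_page, length) pair.
    """
    last = len(frags) - 1
    for i, (offset, length) in enumerate(frags):
        # Every fragment but the first must start on a page boundary.
        if i > 0 and offset != 0:
            return True
        # Every fragment but the last must end on a page boundary.
        if i < last and (offset + length) % page_size != 0:
            return True
    return False


def frags_needed(frags, fmr=True):
    """Value rd_nfrags would take under item 4: 1 for gap-free FMR mappings."""
    if fmr and not has_gaps(frags):
        return 1
    return len(frags)


def check_mappable(frags, conn_max_frags, fmr=True):
    """Item 5: fail clearly if the connection cannot carry the fragments."""
    needed = frags_needed(frags, fmr)
    if needed > conn_max_frags:
        raise ValueError(
            "RDMA needs %d fragments but the connection supports %d; "
            "consider setting map-on-demand to 256 on the peer"
            % (needed, conn_max_frags)
        )
    return needed
```

For example, `frags_needed([(100, 3996), (0, 4096)])` is 1 because the two fragments are contiguous, while `check_mappable([(0, 100), (0, 4096)], conn_max_frags=1)` raises because the first fragment stops short of the page boundary.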

Ticket tracking changes

Jira (HPDD Community Jira): LU-10129

o2iblnd Calculator

I created a calculator for the o2iblnd tunables. Given the tunables of two peers, it calculates any adjustments that the o2iblnd will make to those tunables, as well as several connection attributes that might be of interest.
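
To give a feel for the kind of arithmetic involved, here is a minimal sketch. It is not taken from the tool. It assumes that the fragment count is dialed down to the lower of the two peers' values, and that send work requests scale as (max_frags + 1) * concurrent_sends; both the formula and the function names are assumptions and should be checked against o2iblnd_tun_calc.py.

```python
def negotiated_frags(local_max_frags, peer_max_frags):
    """Assumed rule: the connection uses the lower of the two fragment counts."""
    return min(local_max_frags, peer_max_frags)


def send_work_requests(max_frags, concurrent_sends):
    """Assumed formula: one work request per fragment plus one for the
    message itself, for each concurrent send."""
    return (max_frags + 1) * concurrent_sends


if __name__ == "__main__":
    # Hypothetical values: local supports 256 fragments, peer supports 32.
    frags = negotiated_frags(256, 32)
    print("fragments:", frags)
    print("send work requests:", send_work_requests(frags, concurrent_sends=8))
```

With these example values the connection uses 32 fragments and (32 + 1) * 8 = 264 send work requests.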

The kernel version pivots on 693, which is the RHEL 7.4 release. This is significant because that release no longer supports global memory regions, which affects the calculations.

For any version below 693, the calculations assume that global memory regions are supported.
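
The pivot can be expressed as a small check. The sketch below is an assumption about how one might derive it from a RHEL-style `uname -r` string such as `3.10.0-693.el7.x86_64`; the function names are hypothetical and the tool itself may simply take the build number as input.

```python
import re

RHEL74_BUILD = 693  # pivot described above


def kernel_build(release):
    """Extract the build number from a RHEL-style kernel release string."""
    match = re.match(r"\d+\.\d+\.\d+-(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1))


def assumes_global_mr(release):
    """True if the calculations should assume global memory regions."""
    return kernel_build(release) < RHEL74_BUILD
```

Under these assumptions, `assumes_global_mr("3.10.0-514.el7.x86_64")` is true and `assumes_global_mr("3.10.0-693.el7.x86_64")` is false.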

The tool is written in Python. Download here.

Python Requirements

Refer to http://pyforms.readthedocs.io/en/latest/ for more details.

Running it

Code Block
tar -zxvf lustre_2_10_54_o2iblnd_calc.tar.gz
cd lustre_2_10_54
python o2iblnd_tun_gui.py

Even just reading o2iblnd_tun_calc.py makes it easier to understand how the different tunables affect each other.