...

After looking at the 2.7 code base, it appears that map_on_demand had two real uses:

  1. Used as a flag to turn on the use of FMR or PMR. It would not really matter whether it was set to 1 or 256, since in the FMR case rd_nfrags == 1.
  2. Used to allocate the maximum size of the work request queue.

NOTE: init_qp_attr->cap.max_send_wr is set to IBLND_SEND_WRS(conn) on connection creation. That macro derives its value from ibc_max_frags, which reflects the value negotiated based on the configured map_on_demand.

...

Conclusion on map_on_demand

The main purpose of map_on_demand is to negotiate the size of the work request queue on the two sides of the QP. By setting it to, for example, 32, buffers with fewer than 32 fragments would be RDMAed using global memory regions (on RHEL 7.3 and earlier), while buffers with 32 or more fragments would use FMR/FastReg. When using FMR we need only 1 WR per RDMA transfer message. This is because the pages are mapped into the FMR pool with ib_fmr_pool_map_phys(), which maps the list of pages into a single FMR region, requiring only 1 WR to transfer.

A secondary intended use of map_on_demand appears to be controlling the maximum number of RDMA fragments transferred. However, when calculating rd_nfrags in kiblnd_map_tx(), no consideration is given to the negotiated max_frags value. The underlying assumption in the code is that if rd_nfrags exceeds the negotiated max_frags, FMR/FastReg can be used to map all the fragments into a single FMR/FastReg fragment, so the tunable has no real impact when FMR/FastReg is in use. That assumption is now broken by https://review.whamcloud.com/29290/, which handles gaps in the fragments by describing each fragment in the RD as a zero-based address; there can be up to 256 fragments (on x86_64), and if the negotiated max_frags is less than that, the write will fail. Given the usage described above, it is difficult to see the necessity of this tunable: it only complicates the code without adding any significant functionality.

When using FastReg we need 3 WRs in total: 1 for the RDMA transfer, 1 for the map operation, and 1 for the invalidate operation.

The benefit that map_on_demand provides, therefore, is the ability to reduce the size of the QP send work request queue.

However, given o2iblnd's backwards compatibility requirements, we need to be able to interoperate with older Lustre versions which use up to 256 fragments. We therefore decided to remove the map_on_demand configuration and default the maximum to 256 fragments. See the proposal below.

This has the advantage of reducing the complexity of the code; however, in many cases it would consume more memory than needed. This has been observed on OPA using TID-RDMA. See LU-10875 for more details.

Proposal

Overview

The way the RDMA write is done in the o2iblnd is as follows:

...