Purpose
This page describes how memory registration is structured in the o2iblnd code. The area is complex and a frequent source of confusion, particularly around kib_tx, kib_fmr_pool_set, kib_fmr_pool and kib_fmr. What follows is an attempt to describe it.
On Startup
- FMR/FastReg pools are created per cpt
- For FastReg, each pool holds fmr_pool_size kib_fast_reg_descriptor entries
- Each kib_fast_reg_descriptor will have LNET_MAX_IOV pages allocated using ib_alloc_fast_reg_page_list()
- ib_alloc_fast_reg_mr() is used to allocate the memory region
- A pool of transmits is also created per cpt
- Each transmit gets LNET_MAX_IOV pages associated with it for immediate messages.
- That page is subsequently mapped via kiblnd_dma_map_single()
- Each tx will have IBLND_MAX_RDMA_FRAGS scatterlist entries pointed to by tx->tx_frags
- Each tx will have IBLND_MAX_RDMA_FRAGS ib work request entries pointed to by tx->tx_wrq
- Each tx will have IBLND_MAX_RDMA_FRAGS ib_sge entries pointed to by tx->tx_sge
- Each tx will have a kib_rdma_desc, a wire structure with room for IBLND_MAX_RDMA_FRAGS fragments describing the data. It is pointed to by tx->tx_rd
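The per-tx allocations listed above can be pictured with a small user-space model. This is only a sketch, not the o2iblnd code: every `sketch_` name, the stand-in types, and the value 256 for the fragment limit are assumptions made for illustration, and `calloc()` stands in for the kernel allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed value for illustration; the real limit comes from the LNet headers. */
#define SKETCH_MAX_RDMA_FRAGS 256

/* Hypothetical stand-ins for the kernel types named in the list above. */
struct sketch_sg   { uint64_t dma_addr; uint32_t dma_len; }; /* scatterlist entry */
struct sketch_wr   { int opcode; };                          /* ib work request */
struct sketch_sge  { uint64_t addr; uint32_t length; };      /* ib_sge */
struct sketch_frag { uint64_t addr; uint32_t nob; };         /* one wire fragment */

/* Wire descriptor: a small header followed by the fragment array. */
struct sketch_rd {
    uint32_t nfrags;
    struct sketch_frag frags[SKETCH_MAX_RDMA_FRAGS];
};

struct sketch_tx {
    struct sketch_sg  *frags; /* scatterlist for the data pages */
    struct sketch_wr  *wrq;   /* work requests */
    struct sketch_sge *sge;   /* scatter/gather elements */
    struct sketch_rd  *rd;    /* wire descriptor */
};

/* Release everything owned by the tx; safe on a partially built tx. */
void sketch_tx_fini(struct sketch_tx *tx)
{
    assert(tx != NULL);
    free(tx->frags);
    free(tx->wrq);
    free(tx->sge);
    free(tx->rd);
    tx->frags = NULL;
    tx->wrq = NULL;
    tx->sge = NULL;
    tx->rd = NULL;
}

/* Allocate every per-tx array up front. Returns 0 on success, -1 on failure. */
int sketch_tx_init(struct sketch_tx *tx)
{
    assert(tx != NULL);
    tx->frags = calloc(SKETCH_MAX_RDMA_FRAGS, sizeof(*tx->frags));
    tx->wrq   = calloc(SKETCH_MAX_RDMA_FRAGS, sizeof(*tx->wrq));
    tx->sge   = calloc(SKETCH_MAX_RDMA_FRAGS, sizeof(*tx->sge));
    tx->rd    = calloc(1, sizeof(*tx->rd));
    if (!tx->frags || !tx->wrq || !tx->sge || !tx->rd) {
        sketch_tx_fini(tx);
        return -1;
    }
    return 0;
}
```

The point of allocating everything at pool creation time is that the transmit path never has to allocate memory: it only fills in arrays that already exist.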
Code Flow
/* When adding a new o2iblnd NET */
kiblnd_startup()
    /* allocate memory pools per cpt for the network */
    kiblnd_net_init_pools()
        /*
         * create the memory pools depending on the type of
         * memory registration available.
         * memory pools can be added to the list of pools as needed.
         * new ones added have a life time of 300 seconds. They are
         * cleaned up if they pass the deadline while idle.
         * Only the very first pool created on startup remains
         */
        kiblnd_create_fmr_pool()
            /*
             * always prefer FMR if supported
             * fps_pool_size is passed in as a parameter for
             * the FMR pool allocation
             */
            kiblnd_alloc_fmr_pool()
            /*
             * otherwise use Fast Reg
             * The fps_pool_size descriptors are allocated explicitly
             */
            kiblnd_alloc_freg_pool()
        /*
         * allocate pools of txs per cpt. Each TX is assigned a page,
         * which on x86-64 is 4K. That's why immediate messages
         * are used for message sizes < 4K: they fit in one page.
         * This raises a question with regards to PPC, which has 64K pages:
         * IBLND_MSG_SIZE didn't change. NOTE: ask James Simmons
         */
        kiblnd_init_poolset()
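The pool lifetime rule from the comment above (extra pools live for 300 seconds and are cleaned up once they are idle and past their deadline, while the first pool stays) can be sketched as a simple predicate. The struct, field names and function names below are hypothetical, and time is modelled as plain seconds rather than kernel jiffies; whether the real code refreshes the deadline on use is not stated in this page, so the sketch does not model that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lifetime of a dynamically added pool in seconds, as stated in the comment above. */
#define SKETCH_POOL_DEADLINE 300

/* Hypothetical model of one pool; not the kernel structure. */
struct sketch_pool {
    bool persistent; /* true only for the first pool created at startup */
    int  in_use;     /* number of mappings currently taken from this pool */
    long deadline;   /* time in seconds after which an idle pool may be freed */
};

/* Initialise a pool created at time 'now'. */
void sketch_pool_init(struct sketch_pool *p, bool first, long now)
{
    assert(p != NULL);
    p->persistent = first;
    p->in_use = 0;
    p->deadline = now + SKETCH_POOL_DEADLINE;
}

/* A pool may be freed only if it is not the first pool, is idle, and is past its deadline. */
bool sketch_pool_is_reapable(const struct sketch_pool *p, long now)
{
    assert(p != NULL);
    return !p->persistent && p->in_use == 0 && now >= p->deadline;
}
```

A pool that still has mappings outstanding is never reclaimed, no matter how old it is, which is why the idle check comes before the deadline check matters.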
Structures
On Transmit
- kiblnd_setup_rd_kiov() takes the actual kiov containing the data pages to transmit
- Map those pages into the tx->tx_frags scatterlist
- Call kiblnd_map_tx()
- Set up the kib_rdma_desc wire structure to describe the data and addresses in the device
- kiblnd_sg_dma_len() and kiblnd_sg_dma_address()
- kiblnd_fmr_pool_map() to do the actual mapping into the device.
- For FMR: ib_fmr_pool_map_phys()
- For FastReg: pick a free pool and use it for the mapping via ib_map_mr_sg()
- If no pool has room left, expand the list of pools
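Filling the wire descriptor from the mapped scatterlist, as the steps above describe, amounts to copying one DMA address and length per segment. The sketch below is a user-space approximation with hypothetical names and a deliberately small fragment limit; in the kernel the address and length would come from kiblnd_sg_dma_address() and kiblnd_sg_dma_len() rather than from plain struct fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small assumed limit to keep the example short; not the real fragment limit. */
#define SKETCH_MAX_FRAGS 8

/* Hypothetical stand-in for a DMA-mapped scatterlist entry. */
struct sketch_sg { uint64_t dma_addr; uint32_t dma_len; };

/* Hypothetical stand-in for one fragment of the wire descriptor. */
struct sketch_frag { uint64_t addr; uint32_t nob; };

struct sketch_rd {
    uint32_t nfrags;
    struct sketch_frag frags[SKETCH_MAX_FRAGS];
};

/*
 * Copy each mapped segment into the descriptor.
 * Returns the total number of bytes described, or -1 if the
 * segment count does not fit in the descriptor.
 */
long sketch_fill_rd(struct sketch_rd *rd, const struct sketch_sg *sg, int nsg)
{
    long total = 0;

    assert(rd != NULL && sg != NULL);
    if (nsg < 0 || nsg > SKETCH_MAX_FRAGS)
        return -1;

    rd->nfrags = 0;
    for (int i = 0; i < nsg; i++) {
        rd->frags[i].addr = sg[i].dma_addr;
        rd->frags[i].nob  = sg[i].dma_len;
        total += sg[i].dma_len;
        rd->nfrags++;
    }
    return total;
}
```

The peer only ever sees the descriptor, so the fragment count and the per-fragment lengths must agree exactly with what was mapped locally.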
RDMAing data therefore takes two steps:
- Allocating the pools in the device
- Mapping the physical memory to RDMA into these pools
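The two steps can be modelled as a pool set that hands out slots and grows when every pool is full. This is a minimal sketch under stated assumptions: the `sketch_` types and functions are hypothetical, a "slot" stands for one registration descriptor, and `calloc()` stands in for creating a pool in the device.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of one pool of registration descriptors. */
struct sketch_pool {
    int capacity;             /* descriptors allocated in this pool */
    int in_use;               /* descriptors currently mapped */
    struct sketch_pool *next;
};

/* Hypothetical model of the per-CPT pool set. */
struct sketch_poolset {
    int pool_size;            /* descriptors per pool */
    int npools;
    struct sketch_pool *head;
};

/* Step 1: allocate a pool and add it to the set. Returns NULL on failure. */
static struct sketch_pool *sketch_add_pool(struct sketch_poolset *ps)
{
    struct sketch_pool *p = calloc(1, sizeof(*p));

    if (p == NULL)
        return NULL;
    p->capacity = ps->pool_size;
    p->next = ps->head;
    ps->head = p;
    ps->npools++;
    return p;
}

/* Step 2: take a slot from the first pool with room, expanding the set if all are full. */
struct sketch_pool *sketch_map(struct sketch_poolset *ps)
{
    struct sketch_pool *p;

    assert(ps != NULL && ps->pool_size > 0);
    for (p = ps->head; p != NULL; p = p->next)
        if (p->in_use < p->capacity)
            break;
    if (p == NULL)
        p = sketch_add_pool(ps);
    if (p != NULL)
        p->in_use++;
    return p;
}

/* Free every pool in the set. */
void sketch_poolset_fini(struct sketch_poolset *ps)
{
    assert(ps != NULL);
    while (ps->head != NULL) {
        struct sketch_pool *p = ps->head;

        ps->head = p->next;
        free(p);
    }
    ps->npools = 0;
}
```

With a pool size of 2, the first two calls to `sketch_map()` are served by a single pool and the third call forces a second pool to be created.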
Code Flow
/* lnet calls kiblnd_send() as a callback */
kiblnd_send()
    /* for IOV */
    kiblnd_setup_rd_iov()
    /* for KIOV */
    kiblnd_setup_rd_kiov()
    /*
     * these two functions set up the scatter/gather list
     * given the kiov/iov
     */
        /*
         * map the scatter/gather list to physical memory pools
         * created on startup (look at the parent wiki page for
         * more details)
         */
        kiblnd_map_tx()
            kiblnd_fmr_map_tx()
                /* use the pool set for the CPT of the tx pool */
                /* Does the actual work of mapping the memory to be transmitted */
                kiblnd_fmr_pool_map()
Structures