Purpose
This page describes how memory registration is structured in o2iblnd. There is some confusion around how it works, particularly around kib_tx, kib_fmr_pool_set, kib_fmr_pool and kib_fmr. What follows is an attempt to describe this rather complex area of the code.
On Startup
- FMR/FastReg pools are created per cpt
- For FastReg, each pool will have fmr_pool_size kib_fast_reg_descriptor entries
- Each kib_fast_reg_descriptor will have LNET_MAX_IOV pages allocated using ib_alloc_fast_reg_page_list()
- ib_alloc_fast_reg_mr() is used to allocate the memory region
- A pool of transmits is also created per cpt
- Each transmit gets a page associated with it for immediate messages.
- That page is subsequently mapped via kiblnd_dma_map_single()
- Each tx will have IBLND_MAX_RDMA_FRAGS scatterlist entries pointed to by tx→tx_frags
- Each tx will have IBLND_MAX_RDMA_FRAGS ib work request entries pointed to by tx→tx_wrq
- Each tx will have IBLND_MAX_RDMA_FRAGS ib_sge entries pointed to by tx→tx_sge
- Each tx will have a kib_rdma_desc, a wire structure that holds up to IBLND_MAX_RDMA_FRAGS fragments describing the data. It is pointed to by tx→tx_rd
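The per-tx allocations listed above can be pictured with a small userspace sketch. It is not the o2iblnd code: every `sketch_` name, the struct layouts, and the value of `SKETCH_MAX_RDMA_FRAGS` are assumptions made for illustration, and `calloc()` stands in for the kernel's per-cpt allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed value for illustration; the real limit is IBLND_MAX_RDMA_FRAGS. */
#define SKETCH_MAX_RDMA_FRAGS 256

/* Hypothetical stand-ins for the kernel types named in the list above. */
struct sketch_sg   { void *page; unsigned int offset; unsigned int length; };
struct sketch_wrq  { int opcode; };
struct sketch_sge  { unsigned long long addr; unsigned int length; unsigned int lkey; };
struct sketch_frag { unsigned int nob; unsigned long long addr; };
struct sketch_rd {
    unsigned int       key;
    unsigned int       nfrags;
    struct sketch_frag frags[SKETCH_MAX_RDMA_FRAGS];
};

struct sketch_tx {
    void              *tx_msg;   /* buffer used for immediate messages */
    struct sketch_sg  *tx_frags; /* scatterlist entries */
    struct sketch_wrq *tx_wrq;   /* work request entries */
    struct sketch_sge *tx_sge;   /* scatter/gather elements */
    struct sketch_rd  *tx_rd;    /* wire descriptor for the fragments */
};

/* Release everything a tx owns; safe on a partially initialised tx. */
void sketch_tx_fini(struct sketch_tx *tx)
{
    free(tx->tx_msg);
    free(tx->tx_frags);
    free(tx->tx_wrq);
    free(tx->tx_sge);
    free(tx->tx_rd);
    tx->tx_msg = NULL;
    tx->tx_frags = NULL;
    tx->tx_wrq = NULL;
    tx->tx_sge = NULL;
    tx->tx_rd = NULL;
}

/* Allocate the message buffer and the per-fragment arrays for one tx. */
int sketch_tx_init(struct sketch_tx *tx, size_t msg_size)
{
    tx->tx_msg   = calloc(1, msg_size);
    tx->tx_frags = calloc(SKETCH_MAX_RDMA_FRAGS, sizeof(*tx->tx_frags));
    tx->tx_wrq   = calloc(SKETCH_MAX_RDMA_FRAGS, sizeof(*tx->tx_wrq));
    tx->tx_sge   = calloc(SKETCH_MAX_RDMA_FRAGS, sizeof(*tx->tx_sge));
    tx->tx_rd    = calloc(1, sizeof(*tx->tx_rd));

    if (!tx->tx_msg || !tx->tx_frags || !tx->tx_wrq ||
        !tx->tx_sge || !tx->tx_rd) {
        sketch_tx_fini(tx);
        return -1;
    }
    return 0;
}
```

Calling `sketch_tx_init(&tx, 4096)` once per tx at pool creation time mirrors the idea that all of these buffers are set up on startup rather than on the send path.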
Code Flow
| Code Block |
|---|
/* When adding a new o2iblnd NET */
kiblnd_startup()
    /* allocate memory pools per cpt for the network */
    kiblnd_net_init_pools()
        /*
         * Create the memory pools depending on the type of
         * memory registration available.
         * Memory pools can be added to the list of pools as needed.
         * Newly added pools have a lifetime of 300 seconds. They are
         * cleaned up if they pass the deadline while idle.
         * Only the very first pool created on startup remains.
         */
        kiblnd_create_fmr_pool()
            /*
             * Always prefer FMR if supported.
             * fps_pool_size is passed in as a parameter for
             * the FMR pool allocation.
             */
            kiblnd_alloc_fmr_pool()
            /*
             * Otherwise use Fast Reg.
             * We explicitly allocate fps_pool_size descriptors for the pool.
             */
            kiblnd_alloc_freg_pool()
        /*
         * Allocate pools of txs per cpt. Each TX is assigned a page,
         * which on x86-64 is 4K. That's why immediate messages
         * are used for message sizes < 4K: they fit in one page.
         * This raises a question with regards to PPC, which has 64K pages
         * while IBLND_MSG_SIZE didn't change. NOTE: ask James Simmons
         */
        kiblnd_init_poolset() |
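The pool lifetime rule described in the comments of the flow above (extra pools live for 300 seconds and are cleaned up once idle past the deadline, while the first pool stays) can be sketched as a simple predicate. This is a hedged illustration only: the struct, its fields and the function names are hypothetical, and time is modelled as plain seconds rather than kernel jiffies.

```c
#include <stdbool.h>

/* Lifetime of pools added after startup, taken from the comment above. */
#define SKETCH_POOL_LIFETIME_SEC 300

/* Hypothetical pool bookkeeping, not the real kib_fmr_pool layout. */
struct sketch_pool {
    bool      persistent; /* true only for the first pool created on startup */
    int       in_use;     /* mappings currently handed out from this pool */
    long long deadline;   /* time (seconds) after which an idle pool may go */
};

/* Initialise a pool created at time 'now' (seconds). */
void sketch_pool_init(struct sketch_pool *p, bool first, long long now)
{
    p->persistent = first;
    p->in_use = 0;
    p->deadline = now + SKETCH_POOL_LIFETIME_SEC;
}

/* A pool may be cleaned up only if it is not the first pool,
 * is idle, and has passed its deadline. */
bool sketch_pool_can_reap(const struct sketch_pool *p, long long now)
{
    if (p->persistent)
        return false;
    if (p->in_use > 0)
        return false;
    return now >= p->deadline;
}
```

A cleanup pass would walk the pool list and free every pool for which `sketch_pool_can_reap()` returns true; a busy pool is simply skipped and checked again later.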
Structures
[Gliffy diagram: MemoryPoolStructures]
On Transmit
- kiblnd_setup_rd_kiov() takes the actual kiov containing the data pages to transmit
- Map those pages into the tx→tx_frags scatter list
- call kiblnd_map_tx()
- Set up the kib_rdma_desc wire structure to describe the data and its addresses in the device
- kiblnd_sg_dma_len() and kiblnd_sg_dma_address()
- kiblnd_fmr_pool_map() to do the actual mapping into the device.
- For FMR: ib_fmr_pool_map_phys()
- For FastReg: pick a free pool and use it for mapping with ib_map_mr_sg()
- If you run out of pools, expand the list
There are therefore two steps to RDMAing data:
- Allocating the pools in the device
- Mapping the physical memory to RDMA into these pools
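The two steps can be sketched in userspace as follows: step one allocates a pool of descriptors up front, step two hands out a free descriptor at transmit time and grows the list of pools when none is free. All names here (`sketch_fpool`, `sketch_poolset_map()` and so on) are hypothetical, and a plain `in_use` flag stands in for the real device mapping.

```c
#include <stdlib.h>

/* Hypothetical descriptor and pool types, for illustration only. */
struct sketch_desc  { int in_use; };
struct sketch_fpool {
    struct sketch_fpool *next;
    struct sketch_desc  *descs;
};
struct sketch_poolset {
    struct sketch_fpool *head;
    int                  pool_size; /* descriptors per pool */
    int                  npools;
};

/* Create one pool and append it to the tail of the set. */
static struct sketch_fpool *sketch_pool_add(struct sketch_poolset *ps)
{
    struct sketch_fpool **link = &ps->head;
    struct sketch_fpool *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    p->descs = calloc((size_t)ps->pool_size, sizeof(*p->descs));
    if (!p->descs) {
        free(p);
        return NULL;
    }
    while (*link)
        link = &(*link)->next;
    *link = p;
    ps->npools++;
    return p;
}

/* Step 1: allocate the initial pool. */
int sketch_poolset_init(struct sketch_poolset *ps, int pool_size)
{
    ps->head = NULL;
    ps->pool_size = pool_size;
    ps->npools = 0;
    return sketch_pool_add(ps) ? 0 : -1;
}

/* Step 2: take a free descriptor, expanding the pool list if needed. */
struct sketch_desc *sketch_poolset_map(struct sketch_poolset *ps)
{
    struct sketch_fpool *p;
    int i;

    for (p = ps->head; p; p = p->next) {
        for (i = 0; i < ps->pool_size; i++) {
            if (!p->descs[i].in_use) {
                p->descs[i].in_use = 1;
                return &p->descs[i];
            }
        }
    }
    p = sketch_pool_add(ps); /* every pool is full: add another one */
    if (!p)
        return NULL;
    p->descs[0].in_use = 1;
    return &p->descs[0];
}

/* Return a descriptor to its pool. */
void sketch_poolset_unmap(struct sketch_desc *d)
{
    d->in_use = 0;
}

/* Free every pool in the set. */
void sketch_poolset_fini(struct sketch_poolset *ps)
{
    while (ps->head) {
        struct sketch_fpool *p = ps->head;

        ps->head = p->next;
        free(p->descs);
        free(p);
    }
    ps->npools = 0;
}
```

With a pool size of 2, the third call to `sketch_poolset_map()` finds no free descriptor and adds a second pool, which is the "expand the list" case from the transmit steps.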
Code Flow
| Code Block |
|---|
/* lnet calls kiblnd_send() as a callback */
kiblnd_send()
    /* for IOV */
    kiblnd_setup_rd_iov()
    /* for KIOV */
    kiblnd_setup_rd_kiov()
    /*
     * These two functions set up the scatter/gather list
     * given the kiov/iov.
     */
        /*
         * Map the scatter/gather list to the physical memory pools
         * created on startup (look at the parent wiki page for
         * more details).
         */
        kiblnd_map_tx()
            kiblnd_fmr_map_tx()
                /* Use the pool set for the CPT of the tx pool. */
                /* Does the actual work of mapping the memory to be transmitted. */
                kiblnd_fmr_pool_map() |
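The kiblnd_map_tx() step in the flow above describes each DMA-mapped scatter/gather entry in the wire structure. A minimal sketch of that copy, assuming simplified hypothetical types (the `sketch_` structs, field names and `SKETCH_MAX_FRAGS` are not the real o2iblnd definitions), might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fragment limit for illustration only. */
#define SKETCH_MAX_FRAGS 256

/* Hypothetical view of a DMA-mapped scatterlist entry. */
struct sketch_dma_sg {
    uint64_t dma_address;
    uint32_t dma_length;
};

/* Hypothetical wire descriptor: one (length, address) pair per fragment. */
struct sketch_rdma_frag { uint32_t nob; uint64_t addr; };
struct sketch_rdma_desc {
    uint32_t                key;
    uint32_t                nfrags;
    struct sketch_rdma_frag frags[SKETCH_MAX_FRAGS];
};

/* Copy the DMA length and address of each entry into the descriptor.
 * Returns 0 on success, -1 if there are too many fragments. */
int sketch_fill_rd(struct sketch_rdma_desc *rd,
                   const struct sketch_dma_sg *sg, size_t nsg, uint32_t key)
{
    size_t i;

    if (nsg > SKETCH_MAX_FRAGS)
        return -1;

    rd->key = key;
    rd->nfrags = 0;
    for (i = 0; i < nsg; i++) {
        rd->frags[i].nob = sg[i].dma_length;
        rd->frags[i].addr = sg[i].dma_address;
        rd->nfrags++;
    }
    return 0;
}

/* Total number of bytes described by the descriptor. */
uint64_t sketch_rd_size(const struct sketch_rdma_desc *rd)
{
    uint64_t total = 0;
    uint32_t i;

    for (i = 0; i < rd->nfrags; i++)
        total += rd->frags[i].nob;
    return total;
}
```

In the real code the lengths and addresses come from kiblnd_sg_dma_len() and kiblnd_sg_dma_address(); here they are plain struct fields so the sketch can run outside the kernel.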
Structures
[Gliffy diagram: TXMemorPoolStructure]