Purpose

This page describes how memory registration is structured in o2iblnd. There is some confusion around how it works, particularly around kib_tx, kib_fmr_pool_set, kib_fmr_pool and kib_fmr. Below is an attempt to describe this rather complex area of the code.

On Startup

  • FMR/FastReg pools are created per CPT
    • For FastReg, each pool will have fmr_pool_size kib_fast_reg_descriptor entries
      • Each kib_fast_reg_descriptor will have LNET_MAX_IOV pages allocated using ib_alloc_fast_reg_page_list()
      • ib_alloc_fast_reg_mr() is used to allocate the memory region
  • A pool of transmits is also created per CPT
    • Each transmit gets a page associated with it for immediate messages.
    • That page is subsequently mapped via kiblnd_dma_map_single()
    • Each tx will have IBLND_MAX_RDMA_FRAGS scatterlist entries pointed to by tx->tx_frags
    • Each tx will have IBLND_MAX_RDMA_FRAGS ib work request entries pointed to by tx->tx_wrq
    • Each tx will have IBLND_MAX_RDMA_FRAGS ib_sge entries pointed to by tx->tx_sge
    • Each tx will have a kib_rdma_descr, a wire structure which holds up to IBLND_MAX_RDMA_FRAGS fragments describing the data. It is pointed to by tx->tx_rd
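
The per-tx layout listed above can be pictured with a simplified user-space model. This is only a sketch under stated assumptions: every struct, field and function name below is a hypothetical stand-in (the kernel code uses its own scatterlist, work request and ib_sge types), and the value chosen for the fragment limit is assumed for illustration rather than taken from the o2iblnd headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed value for illustration; the real constant is defined by o2iblnd. */
#define MODEL_MAX_RDMA_FRAGS 256

/* Hypothetical stand-ins for the kernel types named in the list above. */
struct model_sg   { uint64_t dma_addr; uint32_t dma_len; };           /* scatterlist entry */
struct model_wrq  { int opcode; int num_sge; };                       /* ib work request */
struct model_sge  { uint64_t addr; uint32_t length; uint32_t lkey; }; /* ib_sge */
struct model_frag { uint32_t nob; uint64_t addr; };                   /* one wire fragment */

/* Wire descriptor: a small header followed by the fragment array. */
struct model_rdma_desc {
	uint32_t          key;
	uint32_t          nfrags;
	struct model_frag frags[];
};

/* One transmit descriptor holding the four per-tx arrays. */
struct model_tx {
	struct model_sg        *frags;
	struct model_wrq       *wrq;
	struct model_sge       *sge;
	struct model_rdma_desc *desc;
};

/* Release everything owned by the tx and reset its pointers. */
void model_tx_free(struct model_tx *tx)
{
	assert(tx != NULL);
	free(tx->frags);
	free(tx->wrq);
	free(tx->sge);
	free(tx->desc);
	tx->frags = NULL;
	tx->wrq = NULL;
	tx->sge = NULL;
	tx->desc = NULL;
}

/* Allocate zeroed arrays sized by the fragment limit; 0 on success. */
int model_tx_init(struct model_tx *tx)
{
	size_t desc_size = sizeof(*tx->desc) +
			   MODEL_MAX_RDMA_FRAGS * sizeof(struct model_frag);

	assert(tx != NULL);
	tx->frags = calloc(MODEL_MAX_RDMA_FRAGS, sizeof(*tx->frags));
	tx->wrq   = calloc(MODEL_MAX_RDMA_FRAGS, sizeof(*tx->wrq));
	tx->sge   = calloc(MODEL_MAX_RDMA_FRAGS, sizeof(*tx->sge));
	tx->desc  = calloc(1, desc_size);

	if (!tx->frags || !tx->wrq || !tx->sge || !tx->desc) {
		model_tx_free(tx);
		return -1;
	}
	return 0;
}
```

The point of the model is that all per-tx resources are sized once, at pool creation, so the transmit path never has to allocate memory for fragments.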

Code Flow

/* When adding a new o2iblnd NET */
kiblnd_startup()
	/* allocate memory pools per cpt for the network */
	kiblnd_net_init_pools()
		/*
		 * Create the memory pools depending on the type of
		 * memory registration available.
		 * Memory pools can be added to the list of pools as needed.
		 * New ones have a lifetime of 300 seconds and are
		 * cleaned up if they pass the deadline while idle.
		 * Only the very first pool created on startup remains.
		 */
		kiblnd_create_fmr_pool()
			/*
			 * always prefer FMR if supported
			 * fps_pool_size is passed in as a parameter for
			 * the FMR pool allocation
			 */
			kiblnd_alloc_fmr_pool()
			/*
			 * otherwise use Fast Reg
			 * fps_pool_size descriptors are explicitly
			 * allocated for the pool
			 */
			kiblnd_alloc_freg_pool()
		/*
		 * allocate pools of txs per CPT. Each TX is assigned a page,
		 * which on x86-64 is 4K. That's why immediate messages
		 * are used for message sizes < 4K: they fit in one page.
		 * This raises a question about PPC, which has 64K pages
		 * while IBLND_MSG_SIZE didn't change. NOTE: ask James Simmons
		 */
		kiblnd_init_poolset()
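
The pool lifetime rule described in the comments above can be modeled in a few lines. This is a hedged sketch, not the o2iblnd implementation: the struct, the function names and the use of plain integer seconds are hypothetical. Only the 300-second lifetime, the idle check, and the rule that the first pool is never reclaimed come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_POOL_LIFETIME 300 /* seconds, per the description above */

struct model_pool {
	bool persistent; /* true only for the pool created at startup */
	int  allocated;  /* descriptors currently in use */
	long deadline;   /* time after which an idle pool may be freed */
};

/* Initialise a pool created at time 'now' (in seconds). */
void model_pool_init(struct model_pool *p, bool first, long now)
{
	assert(p != NULL);
	p->persistent = first;
	p->allocated = 0;
	p->deadline = now + MODEL_POOL_LIFETIME;
}

/* A pool may be reclaimed only if it is not the first one,
 * has nothing mapped, and has passed its deadline. */
bool model_pool_is_idle(const struct model_pool *p, long now)
{
	if (p->persistent)
		return false;
	if (p->allocated != 0)
		return false;
	return now >= p->deadline;
}

/* Drop reclaimable pools from the array, keeping the order of the
 * survivors. Returns the number of pools that remain. */
size_t model_reap_pools(struct model_pool *pools, size_t n, long now)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (model_pool_is_idle(&pools[i], now))
			continue;
		pools[kept++] = pools[i];
	}
	return kept;
}
```

A pool that still has descriptors in use survives past its deadline; in this model it becomes reclaimable only once it is both idle and expired.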

Structures

MemoryPoolStructures


On Transmit

  • kiblnd_setup_rd_kiov() takes the actual kiov containing the data pages to transmit
    • Map those pages into the tx->tx_frags scatter list
  • call kiblnd_map_tx()
    • set up the kib_rdma_descr wire structure to describe the data and addresses in the device
      • kiblnd_sg_dma_len() and kiblnd_sg_dma_address()
  • kiblnd_fmr_pool_map() to do the actual mapping into the device.
    • For FMR: ib_fmr_pool_map_phys()
    • For FastReg: pick a free pool and use it for mapping with ib_map_mr_sg()
    • If you run out of pools then expand the list

Therefore there are two steps to RDMAing data.

  1. Allocating the pools in the device
  2. Mapping the physical memory to RDMA into these pools
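
The two steps, together with the "expand the list when you run out" rule, can be sketched as a small user-space model. All names here are hypothetical and the pool size is deliberately tiny. The real code maps memory through the verbs API, whereas this model only tracks which descriptor slots are busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_POOL_SIZE 2 /* descriptors per pool; assumed tiny for the demo */

struct model_fr_pool {
	bool busy[MODEL_POOL_SIZE];
	struct model_fr_pool *next;
};

struct model_poolset {
	struct model_fr_pool *head;
	size_t npools;
};

/* Step 1: allocate a pool and append it to the set. */
struct model_fr_pool *model_poolset_grow(struct model_poolset *ps)
{
	struct model_fr_pool *pool = calloc(1, sizeof(*pool));
	struct model_fr_pool **tail = &ps->head;

	if (pool == NULL)
		return NULL;
	while (*tail != NULL)
		tail = &(*tail)->next;
	*tail = pool;
	ps->npools++;
	return pool;
}

/* Step 2: claim a free descriptor, growing the set if every pool is full.
 * Returns the slot index and stores the owning pool, or -1 on failure. */
int model_poolset_map(struct model_poolset *ps, struct model_fr_pool **owner)
{
	struct model_fr_pool *pool;

	assert(ps != NULL && owner != NULL);
	for (pool = ps->head; pool != NULL; pool = pool->next) {
		for (int i = 0; i < MODEL_POOL_SIZE; i++) {
			if (!pool->busy[i]) {
				pool->busy[i] = true;
				*owner = pool;
				return i;
			}
		}
	}
	pool = model_poolset_grow(ps);
	if (pool == NULL)
		return -1;
	pool->busy[0] = true;
	*owner = pool;
	return 0;
}

/* Give a descriptor back to its pool. */
void model_poolset_unmap(struct model_fr_pool *owner, int slot)
{
	owner->busy[slot] = false;
}

/* Free every pool in the set. */
void model_poolset_free(struct model_poolset *ps)
{
	while (ps->head != NULL) {
		struct model_fr_pool *next = ps->head->next;

		free(ps->head);
		ps->head = next;
	}
	ps->npools = 0;
}
```

Searching the existing pools from the head first means the oldest pool is reused whenever it has room, which lets pools added later drain and go idle.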

Code Flow

/* lnet calls kiblnd_send() as a callback */
kiblnd_send()
	/* for IOV */
	kiblnd_setup_rd_iov()
	/* for KIOV */
	kiblnd_setup_rd_kiov()
	/*
	 * these two functions set up the scatter/gather list
	 * given the kiov/iov
	 */
		/*
		 * map the scatter/gather list to physical memory pools 
		 * created on startup (look at the parent wiki page for
		 * more details)
		 */
		kiblnd_map_tx()
			kiblnd_fmr_map_tx()
				/* use the pool set for the CPT of the tx pool */
				/* Does the actual work of mapping the memory to be transmitted */
				kiblnd_fmr_pool_map()
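
The scatter/gather setup done by the kiblnd_setup_rd_kiov() step in the flow above can be illustrated with a user-space model. This is a sketch under assumptions, not the kernel routine: the struct names, the integer page_id standing in for a page pointer, and the function signature are all hypothetical. It only shows the general technique of skipping a starting offset and then emitting one fragment per page until the requested byte count is covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a kiov entry and a scatter list entry. */
struct model_kiov { int page_id; unsigned int offset; unsigned int len; };
struct model_sg   { int page_id; unsigned int offset; unsigned int len; };

/*
 * Fill sg[] with the fragments covering 'nob' bytes of kiov[], starting
 * 'offset' bytes into the payload. Returns the number of fragments, or -1
 * if sg[] is too small or the kiov does not hold enough data.
 */
int model_setup_sg(const struct model_kiov *kiov, size_t nkiov,
		   unsigned int offset, unsigned int nob,
		   struct model_sg *sg, size_t max_sg)
{
	size_t i = 0;
	size_t nsg = 0;

	assert(kiov != NULL && sg != NULL);

	/* Skip whole entries that lie before the starting offset. */
	while (i < nkiov && offset >= kiov[i].len) {
		offset -= kiov[i].len;
		i++;
	}

	while (nob > 0) {
		unsigned int frag;

		if (i == nkiov || nsg == max_sg)
			return -1;

		/* Take the rest of this entry, capped by what is left. */
		frag = kiov[i].len - offset;
		if (frag > nob)
			frag = nob;

		sg[nsg].page_id = kiov[i].page_id;
		sg[nsg].offset = kiov[i].offset + offset;
		sg[nsg].len = frag;
		nsg++;

		nob -= frag;
		offset = 0; /* only the first fragment starts mid-entry */
		i++;
	}
	return (int)nsg;
}
```

Only the first fragment can start part-way into an entry and only the last can end early; every fragment in between covers a whole entry.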

Structures

TXMemorPoolStructure
