
Queue Depth and Credit Management Logic

  • When a connection is established on the passive or active side, conn→ibc_credits is set to the negotiated queue depth
    • For the passive side this happens in kiblnd_passive_connect()
    • For the active side this happens in kiblnd_check_connreply()
  • When posting a transmit, kiblnd_post_tx_locked() is passed a credit value.
    • The credit value depends on the message. It's either 0 or 1.
    • IBLND_MSG_PUT_NAK, IBLND_MSG_PUT_ACK, IBLND_MSG_PUT_DONE, IBLND_MSG_GET_DONE, and IBLND_MSG_NOOP (for the v2 protocol) require no credits
    • All other messages consume 1 credit.
    • If we fail to post, we return the credit to the connection (see the sketch after this list).
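
The rules above can be summarized roughly as follows. This is only a sketch: the message-type constants and the conn→ibc_version field are the real o2iblnd names, but the helper kiblnd_msg_credit() is hypothetical and merely illustrates the 0-or-1 decision passed to kiblnd_post_tx_locked().

/* Hypothetical helper summarizing the credit rules listed above */
static int
kiblnd_msg_credit(struct kib_conn *conn, int msgtype)
{
	switch (msgtype) {
	case IBLND_MSG_PUT_NAK:
	case IBLND_MSG_PUT_ACK:
	case IBLND_MSG_PUT_DONE:
	case IBLND_MSG_GET_DONE:
		return 0;	/* these never consume a credit */
	case IBLND_MSG_NOOP:
		/* NOOPs are free only on the v2 protocol */
		return (conn->ibc_version == IBLND_MSG_VERSION_1) ? 1 : 0;
	default:
		return 1;	/* everything else costs one credit */
	}
}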

 

  • In kiblnd_handle_rx(), if msg→ibm_credits is non-zero, we add these credits to the connection and attempt to send any queued messages.
  • After posting the rx we increment ibc_outstanding_credits by one if required. This tracks the credits we owe back to the other side.
  • If we have any messages to transmit over the connection, we return these credits (ibc_outstanding_credits) to the peer.
  • These credits are then added to the peer's conn→ibc_credits and used to send further messages (see the sketch after this list).
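
Putting the receive-side flow together, the sketch below is a simplification of kiblnd_handle_rx()/kiblnd_post_rx() (locking, error handling, and the reserved-credit case are omitted; the function and constant names are from the o2iblnd source, but the call sequence is condensed):

/* 1. Credits returned by the peer become usable immediately, so try to
 *    drain the connection's tx queues. */
if (msg->ibm_credits != 0) {
	conn->ibc_credits += msg->ibm_credits;
	kiblnd_check_sends(conn);
}

/* 2. Re-post the rx buffer.  IBLND_POSTRX_PEER_CREDIT means the freed
 *    receive slot is owed back to the peer, i.e. the post path bumps
 *    conn->ibc_outstanding_credits by one. */
kiblnd_post_rx(rx, IBLND_POSTRX_PEER_CREDIT);

/* 3. The next message we transmit carries the owed credits in its
 *    ibm_credits field, and the peer adds them to its own ibc_credits. */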

 

  • It is expected that the credits will remain <= ibc_queue_depth
    • Unfortunately, as implemented, the code does not make the relationship between conn→ibc_credits, conn→ibc_queue_depth, and conn→ibc_outstanding_credits clear.
    • Wouldn't it make more sense to simply ensure that the number of credits on the connection stays consistent, instead of passing the number of credits to accumulate back and forth? (Needs some thinking; see the sketch after this list.)
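
As an illustration only (this check does not exist in the code, and the exact invariant still needs the thinking mentioned above), the expectation could be made explicit with something along these lines:

/* Hypothetical sanity check: neither the credits we hold nor the
 * credits we owe back to the peer should ever exceed the negotiated
 * queue depth. */
LASSERT(conn->ibc_credits <= conn->ibc_queue_depth);
LASSERT(conn->ibc_outstanding_credits <= conn->ibc_queue_depth);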

Comments on Alexy's changes

Recently Alexy submitted the following patch:

https://review.whamcloud.com/#/c/28850

This changes the IBLND_SEND_WRS() macro as follows:

/* original code */
#define IBLND_SEND_WRS(c)	\
	((c->ibc_max_frags + 1) * kiblnd_concurrent_sends(c->ibc_version, \
							  c->ibc_peer->ibp_ni))

/* modified code */
#define IBLND_SEND_WRS(c)	\
	((c->ibc_max_frags + 1) * c->ibc_queue_depth)

The assumption being made here is that it is enough to set init_qp_attr→cap.max_send_wr to the negotiated number of fragments * the queue depth. That assumption is wrong: work requests (WRs) are actually consumed per fragment being sent, not per ib_post_send(). This might work for FastReg or FMR because they only use one fragment, but with the latest changes for FMR even that is no longer true.
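
To illustrate why WRs scale with fragments rather than with ib_post_send() calls, the sketch below is loosely modeled on the RDMA setup loop (kiblnd_init_rdma()); the variable names are simplified and this is not the exact code:

/* One send WR is chained per fragment of the transfer, so a single
 * ib_post_send() of a large RDMA can occupy many send-queue slots. */
while (resid > 0) {
	wrq = &tx->tx_wrq[tx->tx_nwrq];		/* one WR per fragment */

	/* ... fill in the sge, remote address and rkey for this fragment ... */
	wrq->wr.opcode = IB_WR_RDMA_WRITE;
	wrq->wr.next   = &(wrq + 1)->wr;

	tx->tx_nwrq++;
	resid -= frag_len;
}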

However, we still need to support the global memory region case, where we can have up to 256 fragments to send and up to kiblnd_concurrent_sends() transmits in flight at the same time. Therefore, if we make the change noted above, we will overflow the work request queue fairly quickly.
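
In other words, the send queue must be sized for the worst case expressed by the original macro. A hypothetical capacity check (not in the code) would look roughly like this:

/* Worst case for the global memory region: every in-flight transmit
 * may chain up to (ibc_max_frags + 1) WRs. */
int wrs_needed = (conn->ibc_max_frags + 1) *
		 kiblnd_concurrent_sends(conn->ibc_version,
					 conn->ibc_peer->ibp_ni);

if (wrs_needed > init_qp_attr->cap.max_send_wr)
	CWARN("send work request queue may overflow: need %d, have %u\n",
	      wrs_needed, init_qp_attr->cap.max_send_wr);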

Furthermore, the change in kiblnd_create_conn() to reduce both init_qp_attr→cap.max_send_wr and init_qp_attr→cap.max_recv_wr is problematic because we end up reducing a negotiated value. This creates a mismatch between the negotiated value and what is actually used for the queue pair being created. I'm unsure what impact that will have, or whether we open ourselves up to corner cases we have not considered.

Conclusion

Some of Alexy's patches are useful, but the above patch and https://review.whamcloud.com/#/c/28851/5 are problematic for the reasons outlined above.

Tasks
