Credit Management Logic

When a connection is established on the passive or active side conn→ibc_credits is set to the negotiated queue depth
- For the passive side this happens in kiblnd_passive_connect()
- For the active side this happens in kiblnd_check_connreply()
When posting a transmit, kiblnd_post_tx_locked() is passed a credit value.
- The credit value depends on the message. It's either 0 or 1.
- IBLND_MSG_PUT_NAK, IBLND_MSG_PUT_ACK, IBLND_MSG_PUT_DONE, IBLND_MSG_GET_DONE, IBLND_MSG_NOOP (for v2 protocol) require no credits
- All other messages consume 1 credit.
- If the message consumes a credit then subtract that from con->ibc_credits;
- If we fail to post the message we return the credit to the connection
- When packing the message msg→ibm_credits is set to conn→ibc_outstanding_credits ( conn→ibc_outstanding_credits is described below)
When handling a receive in kiblnd_handle_rx(), depending on the message we'll either increment conn→ibc_outstanding_credits or conn→ibc_reserved_credits by 1 after we call ib_post_recv() to receive buffers.
- The ibc_reserved_credits is used to transfer messaged for the ibc_tx_queue_rsrved queue to to the ibc_tx_queue for sending.
When it comes time to post a transmit, then as shown above, the ibc_outstanding_credits is assigned to msg→ibm_credits and sent to the peer

The general effect of that algorithm

When posting a transmit the connection credits is reduced by 1
When handling a message if the transmit is returning any credits then we add them back to the connection credits and attempt to send outstanding messages, which takes us to (1)
When we post buffers to receive we increment the ibc_outstanding_credits. This is passed in the next transmit to the peer. That takes us back to (2)

The underlying assumption in the connection management algorithm is that both sides are exchanging messages. If there is a change in the call flow where one side simply sends events with the other side not responding using IMMEDIATE messages, the initiating side will run out of credits and will be stuck since none of the credits are being returned.

It might be better to return the credit on an IMMEDIATE message once the tx is completed. When receiving immediate message do not increment ibc_outstanding_credits.

Queue Depth Negotiation

Queue depth is negotiated as follows:

active creates its qp and sends it's queue depth to the passive
passive creates its end of the qp and then sends back its own queue depth which should be <= of the active's
active receive's the passive's queue depth and sets that to the connection queue depth and credits. Now both ends have the same queue depth and starting credits.

Patch https://review.whamcloud.com/#/c/28850/5 leverages this algorithm by decreasing the active's queue depth when attempting to create the qp, then sending the adjusted queue depth to the passive. The passive creates its connection structure and reduces the queue depth if necessary, then sends it back to the active, which uses that as the connection queue depth and credits.

Concurrent Sends

Concurrent sends were intended to limit the maximum number of in-flight transfers for the entire system. However, we were multiplying the max_send_wr with concurrent sends which implied that it's per connection, which is not true.

It is better to remove concurrent sends tunable completely as that will simplify the code and instead rely on the queue depth to limit the in-flight transfers per connection.

The jury is still up on this change. It needs to be tested in the filed to see if it'll have a negative impact on performance.

Patches

https://review.whamcloud.com/28279 LU-9810 lnd: use less CQ entries for each connection
https://review.whamcloud.com/29995 LU-10129 lnd: rework map_on_demand behavior
https://review.whamcloud.com/30309 LU-10129 lnd: set device capabilities
https://review.whamcloud.com/30310 LU-10213 lnd: calculate qp max_send_wrs properly
https://review.whamcloud.com/30311 LU-9943 lnd: correct WR fast reg accounting
https://review.whamcloud.com/30312 LU-10291 lnd: remove concurrent_sends tunable