...
- IB/TCP/GNI re-send timeout
- LND transmit timeout
- The timeout to wait for before a transmit fails and lnet_finalize() is called with an appropriate error code. This will result in a resend.
- Transaction timeout
- The timeout after which LNet sends a timeout event for a missing REPLY/ACK.
- Message timeout
- The timeout after which LNet abandons resending a message.
- Resend interval
- The interval between each (re)send procedure.
- RPC timeout
- The INITIAL_CONNECT_TIMEOUT is set to 5 sec
- ldlm_timeout and obd_timeout are tunables and default to LDLM_TIMEOUT_DEFAULT and OBD_TIMEOUT_DEFAULT.
- transaction timeout
- Even after a PUT or a GET has been sent successfully, LNet still needs to wait for the ACK or REPLY, respectively.
- The transaction timeout defines the amount of time to wait before sending a timeout event upwards.
- This value is user-specified and defaults to the peer_timeout default (180s).
- This value can be overridden by the caller of LNetGet()/LNetPut().
IB/TCP/GNI re-send timeout < LND transmit timeout < LNet message timeout < LNet transaction timeout < RPC timeout.
A retry count can be specified. That's the number of times to resend after the LND transmit timeout expires.
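The ordering constraint above can be checked mechanically. The following is a minimal sketch only: `struct timeout_cfg` and `timeout_cfg_is_ordered()` are hypothetical names introduced for illustration, not part of LNet, and all values are assumed to be in seconds.

```c
#include <stdbool.h>

/* Hypothetical container for the timeouts listed above, in seconds. */
struct timeout_cfg {
	unsigned int resend;      /* IB/TCP/GNI re-send timeout */
	unsigned int lnd_tx;      /* LND transmit timeout */
	unsigned int message;     /* LNet message timeout */
	unsigned int transaction; /* LNet transaction timeout */
	unsigned int rpc;         /* RPC timeout */
};

/* Return true only if every timeout is strictly smaller than the next one. */
static bool timeout_cfg_is_ordered(const struct timeout_cfg *cfg)
{
	return cfg->resend < cfg->lnd_tx &&
	       cfg->lnd_tx < cfg->message &&
	       cfg->message < cfg->transaction &&
	       cfg->transaction < cfg->rpc;
}
```

A check of this kind could be run whenever one of the tunables is changed, so that a configuration where, for example, the message timeout exceeds the transaction timeout is rejected early.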
...
- refactor lnet_select_pathway() as described above.
- Health Value Maintenance/Demerit system
- Selection based on Health Value and not resending over already used interfaces unless none are available.
- Handling the new events in IBLND and passing them to LNet
- Handling the new events in SOCKLND and passing them to LNet
- Adding LNet level transaction timeout (or reuse the peer timeout) and cancelling a resend on timeout
- Handling timeout case in ptlrpc
Progress
Patches
...
Code Block |
---|
LNet Health Refactor lnet_select_pathway |
...
() |
...
add health value per ni
add lnet_health_range
handle local timeouts
When re-sending a message we don't need to ensure we send to the same peer_ni as the original send. There are two cases to consider:
MR peer: we can just use the current selection algorithm to resend a message
Non-MR peer: there will only be one peer_ni anyway and we'll need to use the same local NI when sending to a Non-MR peer.
Modify the LNDs to set the appropriate error code on timeout
handle tx timeout due to being stuck on the queues for too long
Due to a local problem.
At this point we should be able to handle trying different interfaces if there is an interface timeout
o2iblnd
socklnd
Introduce retry_count
Only resend up to the retry_count
This should be user configurable
Should have a max value of 5 retries
Rate limit resend rate
Introduce resend_interval
Make sure to pace out the resends by that interval
We need to guard against situations where an immediate failure triggers an immediate resend, causing a tight resend loop
Refactor the router pinger thread to handle resending.
lnet_finalize() queues those messages on a queue and wakes up the router pinger thread
The router pinger wakes up every second (or when woken up manually) and goes through the queue. It times out and fails any messages that have passed their deadline, checks that a message to be resent is not resent before its resend interval has elapsed, and resends any messages that are due.
Introduce an LND API to read the retransmit timeout.
Calculate the message timeout as follows:
message timeout = (retry count * LND transmit timeout) + (resend interval * retry count)
Message timeout is the timeout by which LNet abandons retransmits
This implies that LNet has detected some sort of a failure while sending a message
use the message timeout instead of the peer timeout as the deadline for the message
If the message times out, a failure event is propagated to the top layer.
o2iblnd
socklnd
handle local NIs down events from the LND.
NIs are flagged as down and are not considered as part of the selection process.
An NI can only come back up through another event from the LND.
o2iblnd
socklnd
Move the peer timeout from the LND to LNet.
It should still be per NI.
Add userspace support for setting retry count
Add userspace support for setting retransmit interval
Add peer_ni_healthvalue
This value will reflect the health of the peer_ni and should be initially set to the peer credits.
Modify the selection algorithm to select the peer_ni based on the average of the health value and the credits
Adjust the peer_ni health value due to failure/success
On Success the health value should be incremented if it's not at its maximum value.
On Failure the health value should be decremented (stays >= 0)
Failures will either be due to remote tx timeout or network error
Modify the LNDs to set the appropriate error code on tx timeout
o2iblnd
socklnd
Handle transaction timeout
Transaction timeout is the deadline by which LNet knows that a PUT or a GET did not receive the ACK or REPLY respectively.
When a PUT or a GET is sent successfully, it is put on a queue if it expects an ACK or a REPLY.
The router pinger will wake up every second and check whether these messages have received the expected response within the specified timeout. If not, the message is timed out.
Provide a mechanism to override the transaction timeout.
When sending a message the caller of LNetGet()/LNetPut() should specify a timeout for the transaction. If not provided then it defaults to the global transaction timeout.
Add a transaction timeout event to be sent to the upper layer.
Handle transaction timeout in the upper layer (ptlrpc)
Add userspace support for maximum transaction timeout
This was added in 2.11 to solve the blocked mount
Add the following statistics
The number of resends due to local tx timeout per local NI
The number of resends due to the remote tx timeout per peer NI
The number of resends due to a network timeout per local and peer NI
The number of local tx timeouts
The number of remote tx timeouts
The number of network timeouts
The number of local interface down events
The number of local interface up events.
The average time it takes to successfully send a message per peer NI
The average time it takes to successfully complete a transaction per peer NI |
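As an illustration of the task list above, the sketch below works through three of its rules: the message timeout formula, the per-message resend decision made on each router pinger pass, and the peer_ni health value adjustment. It is a sketch under stated assumptions, not the planned implementation: all names (`msg_timeout()`, `resend_check()`, `health_update()`, `struct pending_msg`, `RETRY_COUNT_MAX`) are hypothetical, times are plain seconds, and the health value is assumed to change in steps of 1, which the list does not specify.

```c
#include <stdbool.h>

#define RETRY_COUNT_MAX 5 /* the list caps retry_count at 5 retries */

/*
 * message timeout = (retry count * LND transmit timeout) +
 *                   (resend interval * retry count)
 */
static unsigned int msg_timeout(unsigned int retry_count,
				unsigned int lnd_tx_timeout,
				unsigned int resend_interval)
{
	if (retry_count > RETRY_COUNT_MAX)
		retry_count = RETRY_COUNT_MAX;
	return retry_count * lnd_tx_timeout + resend_interval * retry_count;
}

enum resend_action { RESEND_NOW, RESEND_WAIT, RESEND_FAIL };

/* Hypothetical bookkeeping for a message queued for resend. */
struct pending_msg {
	unsigned int retries;   /* resends already attempted */
	unsigned int last_send; /* time of the most recent send */
	unsigned int deadline;  /* first send time + message timeout */
};

/* Decide what one pinger pass should do with a queued message. */
static enum resend_action resend_check(const struct pending_msg *m,
				       unsigned int now,
				       unsigned int retry_count,
				       unsigned int resend_interval)
{
	/* Past the deadline or out of retries: fail the message. */
	if (now >= m->deadline || m->retries >= retry_count)
		return RESEND_FAIL;
	/* Pace resends to avoid a tight resend loop. */
	if (now - m->last_send < resend_interval)
		return RESEND_WAIT;
	return RESEND_NOW;
}

/* Increment on success up to max; decrement on failure, never below 0. */
static unsigned int health_update(unsigned int health, unsigned int max,
				  bool success)
{
	if (success)
		return health < max ? health + 1 : health;
	return health > 0 ? health - 1 : 0;
}
```

For example, with a retry count of 3, an LND transmit timeout of 10 seconds and a resend interval of 2 seconds, `msg_timeout(3, 10, 2)` gives (3 * 10) + (2 * 3) = 36 seconds. Here `max` would be the initial health value, which the list sets to the peer credits.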
O2IBLND Detailed Discussion
...