...

  1. IB/TCP/GNI re-send timeout
  2. LND transmit timeout
    1. The time to wait before a transmit fails and lnet_finalize() is called with an appropriate error code. This results in a resend.
  3. Transaction timeout
    1. Timeout after which LNet sends a timeout event for a missing REPLY/ACK.
  4. Message timeout
    1. Timeout after which LNet abandons resending a message.
  5. Resend interval
    1. The interval between each (re)send procedure.
  6. RPC timeout
    1. The INITIAL_CONNECT_TIMEOUT is set to 5 seconds.
    2. ldlm_timeout and obd_timeout are tunables and default to LDLM_TIMEOUT_DEFAULT and OBD_TIMEOUT_DEFAULT, respectively.
  7. Transaction timeout
    1. Once a PUT or a GET has been sent successfully, LNet needs to wait for the ACK or REPLY, respectively.
    2. The transaction timeout defines the amount of time to wait before sending a timeout event upwards.
    3. This value is user-specified and defaults to the peer_timeout default (180 s).
    4. This value can be overridden by the caller of LNetGet()/LNetPut().
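
A minimal sketch of the transaction timeout in item 7, assuming a per-message deadline that is armed once the PUT or GET has been sent and checked periodically. The names txn, txn_arm(), txn_expired() and TXN_TIMEOUT_DEFAULT_SEC are hypothetical, not LNet APIs, and a caller-supplied timeout of 0 is assumed to mean "use the default".

```c
#include <stdbool.h>
#include <time.h>

/* Assumed default, taken from the peer_timeout default quoted above. */
#define TXN_TIMEOUT_DEFAULT_SEC 180

/* Hypothetical bookkeeping for a PUT/GET that awaits an ACK/REPLY. */
struct txn {
	time_t deadline;      /* when to send a timeout event upwards */
	bool response_seen;   /* ACK (PUT) or REPLY (GET) has arrived */
};

/* Arm the timer once the PUT/GET has been sent successfully.
 * timeout_sec is the caller's override; 0 selects the default. */
static void txn_arm(struct txn *t, time_t now, unsigned int timeout_sec)
{
	unsigned int tmo = timeout_sec ? timeout_sec : TXN_TIMEOUT_DEFAULT_SEC;

	t->deadline = now + (time_t)tmo;
	t->response_seen = false;
}

/* Called on each periodic check: true means the transaction has
 * timed out and a timeout event should be sent to the upper layer. */
static bool txn_expired(const struct txn *t, time_t now)
{
	return !t->response_seen && now >= t->deadline;
}
```

With the default, a PUT sent at t = 1000 s would raise a timeout event at t = 1180 s unless its ACK arrives first; a caller passing 30 would see the event at t = 1030 s.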

IB/TCP/GNI re-send timeout < LND transmit timeout < LNet message timeout < LNet transaction timeout < RPC timeout.

A retry count can be specified: the number of times to resend after the LND transmit timeout expires.
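
The ordering above can be expressed as a sanity check on configured values, presumably so that each layer only gives up after the layer beneath it has had a chance to detect the failure and retry. The struct and function below are hypothetical, and all values are assumed to be in seconds.

```c
#include <stdbool.h>

/* Hypothetical container for the configured timeouts, in seconds. */
struct lnet_timeouts {
	unsigned int fabric_resend;  /* IB/TCP/GNI re-send timeout */
	unsigned int lnd_tx;         /* LND transmit timeout */
	unsigned int msg;            /* LNet message timeout */
	unsigned int txn;            /* LNet transaction timeout */
	unsigned int rpc;            /* RPC timeout */
};

/* True only if every layer's timeout is strictly shorter than the
 * timeout of the layer above it. */
static bool timeouts_ordered(const struct lnet_timeouts *t)
{
	return t->fabric_resend < t->lnd_tx &&
	       t->lnd_tx < t->msg &&
	       t->msg < t->txn &&
	       t->txn < t->rpc;
}
```

For instance, the illustrative values {1, 10, 33, 180, 300} pass, while raising the message timeout to 200 s (above a 180 s transaction timeout) fails the check.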

...

  • Refactor lnet_select_pathway() as described above.
  • Health Value Maintenance/Demerit system
  • Selection based on Health Value, without resending over already-used interfaces unless none are available.
  • Handling the new events in IBLND and passing them to LNet
  • Handling the new events in SOCKLND and passing them to LNet
  • Adding LNet level transaction timeout (or reuse the peer timeout) and cancelling a resend on timeout
  • Handling timeout case in ptlrpc

Progress

Patches

...

LNet Health
	Refactor lnet_select_pathway()

...

...


	Add a health value per NI
	Add lnet_health_range
	Handle local timeouts
		When resending a message, we don't need to send to the same peer_ni as the original send. There are two cases to consider:
			MR peer: we can just use the current selection algorithm to resend the message.
			Non-MR peer: there is only one peer_ni anyway, and we need to use the same local NI when sending to a Non-MR peer.
	Modify the LNDs to set the appropriate error code on timeout
		Handle tx timeouts due to being stuck on the queues for too long
			Due to a local problem.
		At this point we should be able to try different interfaces if there is an interface timeout
		o2iblnd
		socklnd
	Introduce retry_count
		Only resend up to the retry_count
		This should be user configurable
		Should have a max value of 5 retries
	Rate-limit resends
		Introduce resend_interval
			Make sure to pace out the resends by that interval
		We need to guard against situations where an immediate failure triggers an immediate resend, causing a tight resend loop.
	Refactor the router pinger thread to handle resending.
		lnet_finalize() places messages that need to be resent on a queue and wakes up the router pinger thread.
		The router pinger wakes up every second (or when woken up explicitly) and goes through the queue. It times out and fails any messages that have passed their deadline, checks that a message is not resent before its resend interval has elapsed, and resends any messages that need to be resent.
	Introduce an LND API to read the retransmit timeout.
		Calculate the message timeout as follows:
			message timeout = (retry count * LND transmit timeout) + (resend interval * retry count)
			Message timeout is the timeout by which LNet abandons retransmits
				This implies that LNet has detected some sort of a failure while sending a message
		Use the message timeout instead of the peer timeout as the deadline for the message.
		If the message times out, a failure event is propagated to the top layer.
		o2iblnd
		socklnd
	Handle local NI down events from the LND.
		NIs are flagged as down and are not considered as part of the selection process.
		They can only be brought back up by another event from the LND.
		o2iblnd
		socklnd
	Move the peer timeout from the LND to the LNet.
		It should still be per NI.
	Add userspace support for setting retry count
	Add userspace support for setting retransmit interval
	Add peer_ni_healthvalue
		This value will reflect the health of the peer_ni and should initially be set to the peer credits.
	Modify the selection algorithm to select the peer_ni based on the average of the health value and the credits
	Adjust the peer_ni health value on success/failure
		On success, the health value should be incremented if it is not at its maximum value.
		On failure, the health value should be decremented (it stays >= 0).
			Failures will either be due to remote tx timeout or network error
	Modify the LNDs to set the appropriate error code on tx timeout
		o2iblnd
		socklnd
	Handle transaction timeout
		Transaction timeout is the deadline by which LNet knows that a PUT or a GET did not receive the ACK or REPLY respectively.
		When a PUT or a GET is sent successfully, it is put on a queue if it expects an ACK or a REPLY.
		The router pinger wakes up every second and checks whether these messages have received the expected response within the specified timeout. If not, they are timed out.
	Provide a mechanism to override the transaction timeout.
		When sending a message, the caller of LNetGet()/LNetPut() should specify a timeout for the transaction. If none is provided, it defaults to the global transaction timeout.
	Add a transaction timeout event to be sent to the upper layer.
	Handle transaction timeout in the upper layer (ptlrpc)
	Add userspace support for maximum transaction timeout
		This was added in 2.11 to solve the blocked mount
	Add the following statistics
		The number of resends due to local tx timeout per local NI
		The number of resends due to the remote tx timeout per peer NI
		The number of resends due to a network timeout per local and peer NI
		The number of local tx timeouts
		The number of remote tx timeouts
		The number of network timeouts
		The number of local interface down events
		The number of local interface up events.
		The average time it takes to successfully send a message per peer NI
		The average time it takes to successfully complete a transaction per peer NI
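
A sketch of the retry count, resend pacing and message timeout items in the outline above, assuming the router pinger calls resend_check() for each queued message on every pass. All names here are hypothetical; MAX_RETRY_COUNT reflects the suggested cap of 5 retries, and times are in whole seconds.

```c
#include <time.h>

/* Assumed cap, from "Should have a max value of 5 retries". */
#define MAX_RETRY_COUNT 5

/* Hypothetical per-message resend state. */
struct resend_msg {
	time_t deadline;        /* first send time + message timeout */
	time_t last_send;       /* time of the most recent (re)send */
	unsigned int retries;   /* resends performed so far */
};

enum resend_action { RESEND_WAIT, RESEND_NOW, RESEND_FAIL };

/* Clamp a user-supplied retry count to the maximum. */
static unsigned int clamp_retry_count(unsigned int requested)
{
	return requested > MAX_RETRY_COUNT ? MAX_RETRY_COUNT : requested;
}

/* message timeout = (retry count * LND transmit timeout)
 *                 + (resend interval * retry count) */
static unsigned int message_timeout(unsigned int retry_count,
				    unsigned int lnd_tx_timeout,
				    unsigned int resend_interval)
{
	return retry_count * lnd_tx_timeout + resend_interval * retry_count;
}

/* One router pinger pass over a queued message: fail it once its
 * deadline passes or its retries are used up, hold it until the
 * resend interval has elapsed (no tight resend loop), else resend. */
static enum resend_action resend_check(struct resend_msg *m, time_t now,
				       unsigned int retry_count,
				       unsigned int resend_interval)
{
	if (now >= m->deadline || m->retries >= retry_count)
		return RESEND_FAIL;
	if (now - m->last_send < (time_t)resend_interval)
		return RESEND_WAIT;
	m->retries++;
	m->last_send = now;
	return RESEND_NOW;
}
```

For example, with a retry count of 3, an LND transmit timeout of 10 s and a resend interval of 1 s, the message timeout is 3 * 10 + 1 * 3 = 33 s.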

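The peer_ni health rules in the outline (start at the peer credits, increment on success up to a maximum, decrement on failure without going below 0, select on the average of health and credits) could look roughly like this. The struct is a simplified, hypothetical stand-in for the real peer_ni, and the maximum is assumed to equal the initial value.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for an LNet peer_ni. */
struct peer_ni {
	int health;       /* starts at the peer credits */
	int health_max;   /* assumption: cap at the initial value */
	int credits;      /* currently available credits */
};

static void peer_ni_init(struct peer_ni *p, int peer_credits)
{
	p->health = peer_credits;
	p->health_max = peer_credits;
	p->credits = peer_credits;
}

/* On success: increment unless already at the maximum. */
static void health_on_success(struct peer_ni *p)
{
	if (p->health < p->health_max)
		p->health++;
}

/* On failure (remote tx timeout or network error): decrement, floor 0. */
static void health_on_failure(struct peer_ni *p)
{
	if (p->health > 0)
		p->health--;
}

/* Pick the peer_ni with the highest average of health and credits.
 * Every average divides by 2, so comparing sums gives the same order
 * without integer-division rounding. Ties keep the earliest entry. */
static struct peer_ni *select_peer_ni(struct peer_ni *nis, size_t n)
{
	struct peer_ni *best = NULL;
	int best_sum = 0;

	for (size_t i = 0; i < n; i++) {
		int sum = nis[i].health + nis[i].credits;

		if (best == NULL || sum > best_sum) {
			best = &nis[i];
			best_sum = sum;
		}
	}
	return best;
}
```

A peer_ni that keeps failing drifts toward a health of 0, so an otherwise equal peer_ni with a better record is preferred without the failing one being excluded outright.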

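The per-peer-NI averages in the statistics list can be kept as a running sum and a count, which avoids floating point (generally avoided in kernel code). A sketch with hypothetical field names, covering only a subset of the listed counters:

```c
#include <stdint.h>

/* Hypothetical per-peer-NI statistics (subset of the list above). */
struct peer_ni_stats {
	uint64_t resend_remote_tx_timeout; /* resends due to remote tx timeout */
	uint64_t resend_network_timeout;   /* resends due to network timeout */
	uint64_t send_count;               /* successful sends measured */
	uint64_t send_time_sum_us;         /* total send time, microseconds */
	uint64_t txn_count;                /* completed transactions measured */
	uint64_t txn_time_sum_us;          /* total transaction time, us */
};

/* Integer mean; returns 0 when nothing has been recorded yet. */
static uint64_t mean_us(uint64_t sum, uint64_t count)
{
	return count ? sum / count : 0;
}

static void stats_record_send(struct peer_ni_stats *s, uint64_t elapsed_us)
{
	s->send_count++;
	s->send_time_sum_us += elapsed_us;
}

static void stats_record_txn(struct peer_ni_stats *s, uint64_t elapsed_us)
{
	s->txn_count++;
	s->txn_time_sum_us += elapsed_us;
}

static uint64_t stats_avg_send_us(const struct peer_ni_stats *s)
{
	return mean_us(s->send_time_sum_us, s->send_count);
}

static uint64_t stats_avg_txn_us(const struct peer_ni_stats *s)
{
	return mean_us(s->txn_time_sum_us, s->txn_count);
}
```

Keeping the sum rather than a stored average means the average can be recomputed exactly at read time, at the cost of one extra counter per metric.
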
O2IBLND Detailed Discussion

...