...

Code Block
static int lnet_handle_send_failure_locked(msg, local_nid, status)
{
	switch (status)
...
	case LNET_PEER_NI_ADDR_ERROR:
		lpni->stats.stat_addr_err++
		goto peer_ni_resend
		break
	case LNET_PEER_NI_UNREACHABLE:
		lpni->stats.stat_unreacheable++
		goto peer_ni_resend
		break
	case LNET_PEER_NI_CONNECT_ERROR:
		lpni->stats.stat_connect_err++
		goto peer_ni_resend
		break
	case LNET_PEER_NI_CONNECTION_REJECTED:
		lpni->stats.stat_connect_rej++
		goto peer_ni_resend
		break
	default:
		/* unexpected failure: fail the message */
		return

peer_ni_resend:
	lnet_send(msg, src_nid)
}

...

  • find peer_ni using dst_nid (non-MR, so this is the only peer_ni candidate)
    • no issue if peer_ni is healthy
    • on the first attempt to send this message, try this peer_ni even if it is unhealthy
    • fail if resending to an unhealthy peer_ni
  • pick the preferred local_NI for this peer_ni if set
    • If the preferred local_NI is not healthy, then find a healthy local NI and set it to be the preferred local_NI for this NMR peer.
    • TODO: What is the impact of switching the preferred NI?
    • The NMR peer might think that this message is coming from a different peer. Would that lead to the failure of the RPC message?
    • If so, should we fail resending the message when the preferred NI is set and not healthy, and let the upper layers deal with recovery?
    • otherwise if preferred local_NI is not set, then pick a healthy local NI and make it the preferred NI for this peer_ni
  • send over this path
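
The selection steps above can be sketched in C as follows. This is a minimal illustration, not the LNet implementation: the `sketch_*` types and the function name are hypothetical, and the sketch assumes the "switch the preferred NI" behaviour, which the TODO above still leaves open. Returning NULL when no healthy local NI exists is also an assumption, since the list does not cover that case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration; not the real LNet structures. */
struct sketch_ni {
	int  nid;
	bool healthy;
};

struct sketch_nmr_peer {
	struct sketch_ni  peer_ni;    /* the only peer_ni candidate (non-MR) */
	struct sketch_ni *preferred;  /* preferred local NI, NULL if not set */
};

/*
 * Return the local NI to send over, or NULL if the send should fail.
 * is_resend is false on the first attempt to send the message.
 */
static struct sketch_ni *
sketch_select_nmr_path(struct sketch_nmr_peer *peer,
		       struct sketch_ni *local_nis, size_t nr_local,
		       bool is_resend)
{
	size_t i;

	/* An unhealthy peer_ni is tried on the first attempt only. */
	if (!peer->peer_ni.healthy && is_resend)
		return NULL;

	/* Use the preferred local NI if it is set and healthy. */
	if (peer->preferred != NULL && peer->preferred->healthy)
		return peer->preferred;

	/*
	 * Preferred NI is unset or unhealthy: pick the first healthy local NI
	 * and make it the preferred NI (assumption: switching is allowed).
	 */
	for (i = 0; i < nr_local; i++) {
		if (local_nis[i].healthy) {
			peer->preferred = &local_nis[i];
			return peer->preferred;
		}
	}

	/* Assumption: with no healthy local NI, the send fails. */
	return NULL;
}
```

If the answer to the TODO is that switching breaks the RPC, the loop would be reached only when `peer->preferred` is NULL, and an unhealthy preferred NI would return NULL instead.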

...

  • find route to dst_nid
  • find peer_ni of router
    • no issue if peer_ni is healthy
    • on the first attempt to send this message, try this peer_ni even if it is unhealthy
    • fail if resending to an unhealthy peer_ni
  • pick the preferred local_NI for the dst_nid if set
    • TODO: Same question as above.
    • If the preferred local_NI is not healthy, fail sending the message and let the upper layers deal with recovery.
    • otherwise, if the preferred local_NI is not set, pick a healthy local NI and make it the preferred NI for this dst_nid
  • send over this path
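
The routed case above can be sketched the same way. Again this is only an illustration under stated assumptions: the `sketch_*` names, the flat route table keyed by `dst_nid`, and the return codes are all hypothetical, and the handling of "no healthy local NI" is not specified by the list. Unlike the direct non-MR sketch, an unhealthy preferred local NI fails the send here and leaves recovery to the upper layers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration; not the real LNet structures. */
struct sketch_local_ni {
	int  nid;
	bool healthy;
};

struct sketch_route {
	int  dst_nid;                       /* final destination NID */
	bool gw_healthy;                    /* health of the router's peer_ni */
	struct sketch_local_ni *preferred;  /* preferred local NI for dst_nid, NULL if not set */
};

enum sketch_rc {
	SKETCH_OK = 0,
	SKETCH_NO_ROUTE,
	SKETCH_GW_UNHEALTHY,
	SKETCH_PREF_UNHEALTHY,
	SKETCH_NO_LOCAL_NI
};

/* On SKETCH_OK, *out is set to the local NI to send over. */
static enum sketch_rc
sketch_select_routed_path(struct sketch_route *routes, size_t nr_routes,
			  struct sketch_local_ni *local_nis, size_t nr_local,
			  int dst_nid, bool is_resend,
			  struct sketch_local_ni **out)
{
	struct sketch_route *rt = NULL;
	size_t i;

	/* Find the route to dst_nid. */
	for (i = 0; i < nr_routes; i++) {
		if (routes[i].dst_nid == dst_nid) {
			rt = &routes[i];
			break;
		}
	}
	if (rt == NULL)
		return SKETCH_NO_ROUTE;

	/* An unhealthy router peer_ni is tried on the first attempt only. */
	if (!rt->gw_healthy && is_resend)
		return SKETCH_GW_UNHEALTHY;

	/* Preferred local NI is set: use it if healthy, otherwise fail. */
	if (rt->preferred != NULL) {
		if (!rt->preferred->healthy)
			return SKETCH_PREF_UNHEALTHY;
		*out = rt->preferred;
		return SKETCH_OK;
	}

	/* Not set: pick the first healthy local NI and make it preferred. */
	for (i = 0; i < nr_local; i++) {
		if (local_nis[i].healthy) {
			rt->preferred = &local_nis[i];
			*out = rt->preferred;
			return SKETCH_OK;
		}
	}

	/* Assumption: with no healthy local NI, the send fails. */
	return SKETCH_NO_LOCAL_NI;
}
```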

Work Items

...