Purpose

This document attempts to describe the current socklnd design and some proposed improvements to clean up the design and increase performance.

Overview

[Diagram: LNetSystemDiagram]

...

Below is an overview of the socklnd design.

Socklnd Overview

[Diagram: socklnd_overview]

Network Interface Management

LNet calls ksocknal_startup() on every lnet_ni that is created, either dynamically via lnetctl or from module parameters at initial startup.

...

Scheduler threads are created per CPT to service transmit and receive operations.

Peer and Connection Management

Peer and Route Management

When LNet requests that a message be sent by calling ksocknal_send(), ksocklnd creates a peer if one doesn't already exist. ksocklnd identifies a peer by its source and destination NIDs, which fits how Multi-Rail works: at the LNet level, messages going over the local Net to the same peer can traverse multiple peer_nis at the ksocklnd level.
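The find-or-create behaviour described above can be sketched as follows. This is a minimal user-space illustration, not the ksocklnd code: the sketch_peer types, the fixed-size table and sketch_peer_find_or_create() are hypothetical names, and NIDs are assumed to fit in a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PEERS 8

/* Hypothetical stand-in for a socklnd peer, keyed by a NID pair. */
struct sketch_peer {
	uint64_t src_nid;	/* local NID the message leaves from */
	uint64_t dst_nid;	/* remote NID the message is sent to */
	int      in_use;
};

/* Hypothetical fixed-size peer table (the real code is not assumed to look like this). */
struct sketch_peer_table {
	struct sketch_peer peers[SKETCH_MAX_PEERS];
};

/*
 * Return the peer for (src_nid, dst_nid), creating it on first use.
 * Returns NULL if the table is full.
 */
static struct sketch_peer *
sketch_peer_find_or_create(struct sketch_peer_table *tbl,
			   uint64_t src_nid, uint64_t dst_nid)
{
	struct sketch_peer *free_slot = NULL;
	size_t i;

	assert(tbl != NULL);

	for (i = 0; i < SKETCH_MAX_PEERS; i++) {
		struct sketch_peer *p = &tbl->peers[i];

		if (!p->in_use) {
			/* Remember the first free slot in case we must create. */
			if (free_slot == NULL)
				free_slot = p;
			continue;
		}
		if (p->src_nid == src_nid && p->dst_nid == dst_nid)
			return p;	/* peer already exists */
	}

	if (free_slot != NULL) {
		free_slot->src_nid = src_nid;
		free_slot->dst_nid = dst_nid;
		free_slot->in_use = 1;
	}
	return free_slot;
}
```

Because the key is the NID pair, two local interfaces talking to the same remote NID produce two distinct peers in this sketch, which mirrors how several peer_nis can carry traffic to one LNet-level peer.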

...

Therefore the use of ksock_route is purely to serve the legacy TCP bonding implementation, which has been superseded by the LNet Multi-Rail feature.

Connection Management

Once the routes are created, TCP sockets must be established with the remote peer. This is referred to in the code as "connecting routes". The process is triggered by ksocknal_launch_all_connections_locked().

...

Since TCP bonding is now deprecated, this code can be removed, simplifying the overall design of the socklnd code.

Sending Messages

LNet calls ksocknal_send() to send messages. This function will trigger the following steps:

  • Connect any extra routes that are not yet connected
  • If there are no connections, queue the transmit on the peer
  • If a connection to the peer exists, queue the transmit on that connection
  • A scheduler thread on the specified CPT will then pick up the transmit and send it.
    • The CPT is determined when the connection is created, by calling lnet_cpt_of_nid() with the peer's NID and the NI associated with the peer.
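The steps above can be sketched as a small decision function. This is an illustrative model only: send_peer, send_conn and send_one() are hypothetical names, a peer is assumed to have at most one connection, and "connecting a route" is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

enum send_action {
	SEND_QUEUED_ON_PEER,	/* no connection yet: transmit waits on the peer */
	SEND_QUEUED_ON_CONN,	/* transmit handed to an existing connection */
};

/* Hypothetical connection: remembers the CPT picked at creation time. */
struct send_conn {
	int cpt;
	int queued_txs;
};

/* Hypothetical peer with at most one connection, for brevity. */
struct send_peer {
	struct send_conn *conn;	/* NULL until a connection exists */
	int pending_routes;	/* routes not yet connected */
	int connects_started;	/* connection attempts launched so far */
	int queued_txs;		/* transmits waiting for a connection */
};

/*
 * Queue one transmit. On SEND_QUEUED_ON_CONN, *cpt is set to the CPT whose
 * scheduler thread is expected to pick the transmit up.
 */
static enum send_action send_one(struct send_peer *peer, int *cpt)
{
	assert(peer != NULL && cpt != NULL);

	/* Step 1: start connecting any routes that are still unconnected. */
	peer->connects_started += peer->pending_routes;
	peer->pending_routes = 0;

	/* Step 2: nothing to send on yet, so park the transmit on the peer. */
	if (peer->conn == NULL) {
		peer->queued_txs++;
		return SEND_QUEUED_ON_PEER;
	}

	/* Step 3: queue on the connection; its CPT was fixed at creation. */
	peer->conn->queued_txs++;
	*cpt = peer->conn->cpt;
	return SEND_QUEUED_ON_CONN;
}
```

Note that the CPT comes from the connection rather than from the caller, which is what makes the question in the next section possible: the sender's CPT plays no part in choosing the scheduler.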

CPT Confusion?

In lnet_get_best_ni(), one of the criteria used to determine the best NI to send from is NUMA distance. In that case we use the MD CPT, since we need to find the interface that is nearest, NUMA-wise, to the memory described by that MD. However, the LND scheduler that picks up the transmit and processes it could be associated with a CPT other than the MD CPT. Since TCP is not a zero-copy protocol, would that introduce a performance penalty?

Receiving Messages

When a connection is created, a set of callbacks is registered with the socket in the call to ksocknal_lib_set_callback().

...

This queues the connection on the scheduler associated with that thread, so that the data can be received from it. The message header is read first, and lnet_parse() is called. lnet_parse() can then call back into the LND to receive the payload data.
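The hand-off from socket callback to scheduler can be sketched as below. This is a simplified single-threaded model with no locking: rx_conn, rx_sched, rx_data_ready() and rx_sched_next() are hypothetical names, and the "queue a connection only once" flag is an assumption made for the sketch rather than something stated in this document.

```c
#include <assert.h>
#include <stddef.h>

#define RX_QUEUE_LEN 4

struct rx_sched;

/* Hypothetical connection bound to one scheduler. */
struct rx_conn {
	struct rx_sched *sched;
	int scheduled;	/* non-zero while on the scheduler's queue */
};

/* Hypothetical scheduler with a small fixed-size receive queue. */
struct rx_sched {
	struct rx_conn *queue[RX_QUEUE_LEN];
	size_t nqueued;
};

/* Socket callback: data arrived. Returns 1 if the conn was newly queued. */
static int rx_data_ready(struct rx_conn *conn)
{
	struct rx_sched *sched;

	assert(conn != NULL && conn->sched != NULL);
	sched = conn->sched;

	/* Already queued (or queue full): nothing more to do here. */
	if (conn->scheduled || sched->nqueued == RX_QUEUE_LEN)
		return 0;

	sched->queue[sched->nqueued++] = conn;
	conn->scheduled = 1;
	return 1;
}

/* One scheduler step: take the oldest queued conn, or NULL if none. */
static struct rx_conn *rx_sched_next(struct rx_sched *sched)
{
	struct rx_conn *conn;
	size_t i;

	assert(sched != NULL);
	if (sched->nqueued == 0)
		return NULL;

	conn = sched->queue[0];
	for (i = 1; i < sched->nqueued; i++)
		sched->queue[i - 1] = sched->queue[i];
	sched->nqueued--;
	conn->scheduled = 0;
	return conn;
}
```

In this model the callback does no reading itself; the scheduler step that dequeues the connection is where the header read and the subsequent lnet_parse() call would take place.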

socklnd Improvements

Remove TCP Bonding

  • Remove the storage of multiple IP addresses in the ksock_net
  • Remove all management code associated with the multiple IP addresses
  • Remove all the route constructs and the code which uses them
  • Connections should be associated directly with the peer
  • Hello messages can be kept for backwards compatibility; however, they will always include only one IP address
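One possible shape for the simplified data model, with connections hanging directly off the peer and no route objects in between, is sketched here. The flat_peer and flat_conn types and flat_peer_add_conn() are hypothetical and are not taken from the existing ksocklnd structures; an IPv4 address stored as a 32-bit integer is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection: one TCP socket, linked into its peer's list. */
struct flat_conn {
	int sock_fd;
	struct flat_conn *next;
};

/* Hypothetical peer: a single IP address and a direct list of connections. */
struct flat_peer {
	uint64_t nid;
	uint32_t ip;			/* the one address advertised in the hello */
	struct flat_conn *conns;	/* no route layer in between */
	unsigned int nconns;
};

/* Attach a connection directly to the peer (pushes onto the list head). */
static void flat_peer_add_conn(struct flat_peer *peer, struct flat_conn *conn)
{
	assert(peer != NULL && conn != NULL);

	conn->next = peer->conns;
	peer->conns = conn;
	peer->nconns++;
}
```

With this layout, finding a connection to send on is a walk over peer->conns, and there is no per-route state to keep in sync with it.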

Multiple Connections Per Peer

LU-12815 indicates that creating multiple virtual interfaces under the same interface, and then grouping them in one LNet network in a Multi-Rail (MR) configuration, increases performance. Creating multiple virtual interfaces has the following effects:

...