Purpose
This document attempts to describe the current socklnd design and some proposed improvements to clean up the design and increase performance.
Overview
...
Below is an overview of the socklnd design.
Figure: Socklnd Overview
Network Interface Management
LNet calls ksocknal_startup() on every lnet_ni that is created, whether dynamically via lnetctl or from module parameters at initial startup.
...
Scheduler threads are created per CPT and they are intended to serve transmit and receive operations.
Peer and Connection Management
Peer and Route Management
When LNet requests that a message be sent by calling ksocknal_send(), ksocklnd creates a peer if one doesn't already exist. ksocklnd identifies a peer by its source and destination NIDs, which fits how Multi-Rail works: at the LNet level, messages going over the local Net to the same peer can traverse multiple peer_nis at the ksocklnd level.
...
Therefore the use of ksock_route is purely to serve the legacy TCP bonding implementation, which has been superseded by the LNet Multi-Rail feature.
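The find-or-create behaviour described at the start of this section can be sketched in user-space C. This is a minimal illustration, not the socklnd code: every name below (toy_nid_t, toy_peer, toy_peer_table, toy_peer_find_or_create) is hypothetical, and a linked list stands in for whatever table the LND really uses. It only shows a peer being keyed on the (source NID, destination NID) pair and created on first use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the socklnd source. */
typedef uint64_t toy_nid_t;

struct toy_peer {
	toy_nid_t        src_nid; /* local NID the peer is reached from */
	toy_nid_t        dst_nid; /* remote peer NID */
	struct toy_peer *next;    /* simple singly linked peer table */
};

struct toy_peer_table {
	struct toy_peer *head;
	size_t           count;
};

/* Return the peer matching (src, dst), or NULL if none exists. */
struct toy_peer *
toy_peer_find(const struct toy_peer_table *tbl, toy_nid_t src, toy_nid_t dst)
{
	for (struct toy_peer *p = tbl->head; p != NULL; p = p->next) {
		if (p->src_nid == src && p->dst_nid == dst)
			return p;
	}
	return NULL;
}

/* Look the peer up and create it on first use.
 * Returns NULL only if allocation fails. */
struct toy_peer *
toy_peer_find_or_create(struct toy_peer_table *tbl, toy_nid_t src,
			toy_nid_t dst)
{
	struct toy_peer *p;

	assert(tbl != NULL);
	p = toy_peer_find(tbl, src, dst);
	if (p != NULL)
		return p;

	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return NULL;
	p->src_nid = src;
	p->dst_nid = dst;
	p->next = tbl->head; /* push onto the front of the list */
	tbl->head = p;
	tbl->count++;
	return p;
}
```

Because the key is the NID pair, two sends from different local NIDs to the same remote NID yield two distinct peers in this sketch, which mirrors the Multi-Rail observation above.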
Connection Management
Once the routes are created, TCP sockets must be created with the remote peer. This is referred to in the code as "connecting routes". The process is triggered by ksocknal_launch_all_connections_locked().
...
Since TCP bonding is now deprecated, this code can be removed, simplifying the overall design of the socklnd code.
Sending Messages
LNet calls ksocknal_send() to send messages. This function will trigger the following steps:
- Connect any extra routes if they aren't connected
- If there are no connections, then queue the transmit on the peer
- If a connection to the peer exists, then queue the transmit on that connection
- A scheduler thread on the specified CPT will then pick up the transmit and send it.
  - The CPT is identified when the connection is created by calling lnet_cpt_of_nid(), providing the peer's NID and the NI associated with the peer.
CPT Confusion?
In lnet_get_best_ni(), one of the criteria we use to determine the best NI to send from is NUMA. In that case we use the MD CPT, since we need to find the interface that is nearest, NUMA-wise, to the memory described by that MD. However, the LND scheduler which picks up the transmit and processes it could be associated with a CPT other than the MD CPT. Since TCP is not a zero-copy protocol, would that introduce a performance penalty?
Receiving Messages
When a connection is created, a set of callbacks is registered with the socket in the call to ksocknal_lib_set_callback().
...
This queues the connection from which the data is to be received on the scheduler associated with that thread. The message header is read in first and lnet_parse() is called. lnet_parse() can then call back into the LND to receive the payload data.
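The header-first receive order can be illustrated with a small two-stage state machine. This is a sketch under stated assumptions rather than the socklnd receive path: all rx_* names are hypothetical, the 4-byte big-endian length header is invented for the example, and the step where lnet_parse() calls back into the LND for the payload is collapsed into a simple state change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_HDR_SIZE    4  /* hypothetical header: payload length only */
#define RX_MAX_PAYLOAD 64

enum rx_state { RX_WANT_HEADER, RX_WANT_PAYLOAD };

struct rx_conn {
	enum rx_state state;
	uint8_t       hdr[RX_HDR_SIZE];
	uint8_t       payload[RX_MAX_PAYLOAD];
	size_t        have;        /* bytes gathered for the current stage */
	size_t        payload_len; /* learned from the header */
	unsigned      messages;    /* completed messages */
};

/* Stand-in for the header parse step: decode a big-endian length. */
size_t rx_parse_header(const uint8_t *hdr)
{
	return ((size_t)hdr[0] << 24) | ((size_t)hdr[1] << 16) |
	       ((size_t)hdr[2] << 8) | (size_t)hdr[3];
}

/* Consume len bytes that arrived on the socket, in any chunking.
 * Returns 0 on success, -1 if a header announces an oversized payload. */
int rx_feed(struct rx_conn *c, const uint8_t *buf, size_t len)
{
	while (len > 0) {
		int in_hdr = (c->state == RX_WANT_HEADER);
		size_t want = in_hdr ? RX_HDR_SIZE : c->payload_len;
		uint8_t *dst = in_hdr ? c->hdr : c->payload;
		size_t n = want - c->have;

		if (n > len)
			n = len;
		memcpy(dst + c->have, buf, n);
		c->have += n;
		buf += n;
		len -= n;
		if (c->have < want)
			break; /* stage incomplete: wait for more data */

		c->have = 0;
		if (in_hdr) {
			c->payload_len = rx_parse_header(c->hdr);
			if (c->payload_len > RX_MAX_PAYLOAD)
				return -1;
			if (c->payload_len == 0)
				c->messages++; /* header-only message */
			else
				c->state = RX_WANT_PAYLOAD;
		} else {
			c->messages++;
			c->state = RX_WANT_HEADER;
		}
	}
	return 0;
}
```

Feeding the bytes of one message in several pieces leaves the connection in RX_WANT_HEADER until all four header bytes have arrived, and only then does it learn how much payload to expect.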
socklnd Improvements
Remove TCP Bonding
- Remove the storage of multiple IP addresses in the ksock_net
- Remove all associated management code for the multiple IP addresses
- Remove all the route constructs and the code which uses the route constructs
- Connections should be associated directly with the peer
- The hello message can be kept for backwards compatibility; however, it will always include only one IP address
Multiple Connections Per Peer
LU-12815 indicates that creating multiple virtual interfaces under the same interface and then grouping them in one LNet in an MR configuration increases performance. By creating multiple virtual interfaces, the following effects take place:
...