This document attempts to describe the current socklnd design and some proposed improvements to clean up the design and increase performance.
LNet calls ksocknal_startup() for every lnet_ni that is created, either dynamically via lnetctl or from module parameters at initial startup.
A ksock_net block is created and assigned to the lnet_ni.ni_data field. On every API call from LNet into the socklnd, this field is used to look up the ksock_net.
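The ni_data pattern can be sketched as follows. Only the ni_data field comes from the text above; the structures are heavily simplified stand-ins, and sketch_startup(), sketch_net_of(), sketch_shutdown() and the npeers field are hypothetical names used for illustration, not the socklnd's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; the real structures have many more fields. */
struct lnet_ni {
	void *ni_data;      /* LND-private pointer; socklnd keeps its ksock_net here */
};

struct ksock_net {
	int npeers;         /* hypothetical field, for illustration only */
};

/* Startup (in the spirit of ksocknal_startup()): allocate the per-NI
 * block and attach it to ni_data. */
static int sketch_startup(struct lnet_ni *ni)
{
	struct ksock_net *net = calloc(1, sizeof(*net));

	if (net == NULL)
		return -1;
	ni->ni_data = net;
	return 0;
}

/* Every LND entry point recovers its ksock_net the same way. */
static struct ksock_net *sketch_net_of(struct lnet_ni *ni)
{
	assert(ni->ni_data != NULL);
	return ni->ni_data;
}

/* Shutdown: release the block and clear the pointer. */
static void sketch_shutdown(struct lnet_ni *ni)
{
	free(ni->ni_data);
	ni->ni_data = NULL;
}
```

Keeping the LND state behind an opaque pointer means LNet never needs to know the layout of ksock_net.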
Of particular interest is ksock_net.ksock_interface, an array of length LNET_INTERFACES_NUM. It is an array because of the legacy TCP bonding feature, which allowed multiple interfaces to be assigned to one ksock_net. However, since the Multi-Rail feature manages multiple network interfaces per network, there is no need to continue supporting TCP bonding.
Once a ksock_net block is created, it is added to the global network list, ksocknal_data.ksnd_nets.
This list is traversed when adding a new network. If the interface being added is already used by one of the configured networks, there is no need to create a new set of scheduler threads. If it is a new interface, the number of scheduler threads is increased, as long as it stays below the configured maximum number of scheduler threads, so that the additional threads can help process transmits and receives on the new interface.
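The thread-scaling rule above can be sketched as a simple check against the list of interfaces already in use. This is a minimal model under stated assumptions: interfaces are identified by name, the number of threads added per new interface (per_iface) is a hypothetical parameter, and all sketch_* names are illustrative rather than the socklnd's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_NETS 8

/* Hypothetical per-CPT scheduler bookkeeping. */
struct sketch_sched {
	int nthreads;       /* threads currently running */
	int nthreads_max;   /* configured upper bound */
};

/* Hypothetical stand-in for the global net list: interface names only. */
struct sketch_data {
	const char *ifnames[SKETCH_MAX_NETS];
	int nnets;
};

static bool sketch_iface_in_use(const struct sketch_data *d, const char *ifname)
{
	for (int i = 0; i < d->nnets; i++)
		if (strcmp(d->ifnames[i], ifname) == 0)
			return true;
	return false;
}

/* Register a new net. Extra threads are started only for an interface no
 * other net uses, and never beyond nthreads_max. Returns threads started. */
static int sketch_add_net(struct sketch_data *d, struct sketch_sched *s,
			  const char *ifname, int per_iface)
{
	int started = 0;

	assert(d->nnets < SKETCH_MAX_NETS);
	if (!sketch_iface_in_use(d, ifname)) {
		started = per_iface;
		if (s->nthreads + started > s->nthreads_max)
			started = s->nthreads_max - s->nthreads;
		s->nthreads += started;
	}
	d->ifnames[d->nnets++] = ifname;
	return started;
}
```

For example, with a maximum of 5 and 2 threads per interface, adding eth0, eth0 again, eth1 and eth2 starts 2, 0, 2 and then only 1 thread.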
Scheduler threads are created per CPT and they are intended to serve transmit and receive operations.
When LNet requests a message to be sent by calling ksocknal_send(), ksocklnd creates a peer if one does not already exist. ksocklnd identifies a peer by its source and destination NIDs, which fits the way Multi-Rail works: messages that LNet sends over the local network to the same peer can traverse multiple peer_nis at the ksocklnd level.
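Following the description above, a peer lookup keyed on the (source NID, destination NID) pair can be sketched as below. The flat 64-bit NID type, the fixed-size table and the function name are assumptions made for brevity; the real socklnd uses its own NID types and a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_nid_t;   /* hypothetical flat NID type */

struct sketch_peer_ni {
	sketch_nid_t src;   /* NID of the local NI */
	sketch_nid_t dst;   /* NID of the remote peer */
	int in_use;
};

#define SKETCH_MAX_PEERS 16
static struct sketch_peer_ni peers[SKETCH_MAX_PEERS];

/* Return the peer_ni for (src, dst), creating it on first use.
 * Returns NULL if the table is full. */
static struct sketch_peer_ni *sketch_find_or_create(sketch_nid_t src,
						    sketch_nid_t dst)
{
	struct sketch_peer_ni *free_slot = NULL;

	for (size_t i = 0; i < SKETCH_MAX_PEERS; i++) {
		if (peers[i].in_use && peers[i].src == src && peers[i].dst == dst)
			return &peers[i];
		if (!peers[i].in_use && free_slot == NULL)
			free_slot = &peers[i];
	}
	if (free_slot != NULL) {
		free_slot->src = src;
		free_slot->dst = dst;
		free_slot->in_use = 1;
	}
	return free_slot;
}
```

Because the source NID is part of the key, the same remote NID reached from two local NIs yields two distinct peer_ni entries, which is what lets Multi-Rail spread traffic to one peer across several of them.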
When a transmit is first launched and no peer exists yet, one is created. The following steps take place:
1. A ksock_peer_ni block is created and initialized.
2. A ksock_route block is created and initialized with the IP address and port.
Notes on routes and their use:
These restrictions defined by the code mean that only one route can exist between a specific local interface and a peer.
With legacy TCP bonding, there could be multiple routes to a peer_ni: one from each interface stored in the ksock_net.ksock_interface array.
Therefore the use of ksock_route is purely to serve the legacy TCP bonding implementation, which has been superseded by the LNet Multi-Rail feature.
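The one-route-per-interface restriction described in these notes can be sketched as a duplicate check on the local interface address. The route layout, table size and sketch_* names are hypothetical simplifications; only the rule itself comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ROUTES 8

/* Hypothetical, simplified route: local interface address -> peer IP/port. */
struct sketch_route {
	uint32_t local_ip;
	uint32_t peer_ip;
	uint16_t peer_port;
};

struct sketch_peer {
	struct sketch_route routes[SKETCH_MAX_ROUTES];
	int nroutes;
};

/* Add a route unless this peer already has one from the same local
 * interface. Returns 1 if added, 0 if rejected as a duplicate,
 * -1 if the table is full. */
static int sketch_add_route(struct sketch_peer *p, uint32_t local_ip,
			    uint32_t peer_ip, uint16_t peer_port)
{
	for (int i = 0; i < p->nroutes; i++)
		if (p->routes[i].local_ip == local_ip)
			return 0;
	if (p->nroutes == SKETCH_MAX_ROUTES)
		return -1;
	p->routes[p->nroutes].local_ip = local_ip;
	p->routes[p->nroutes].peer_ip = peer_ip;
	p->routes[p->nroutes].peer_port = peer_port;
	p->nroutes++;
	return 1;
}
```

With a single interface per ksock_net, this check can only ever admit one route per peer, which is why the route abstraction adds nothing once TCP bonding is gone.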
Once the routes are created, TCP sockets must be established with the remote peer. The code refers to this as "connecting routes", and the process is triggered by ksocknal_launch_all_connections_locked().
The route is placed on the ksocknal_data.ksnd_connd_routes queue. One of the connd threads then picks that up and starts the actual connection procedure by calling ksocknal_connect().
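The hand-off between queueing a route and a connd thread connecting it can be modeled as a FIFO. This single-threaded sketch deliberately omits the locking and wake-up that real connd threads need; sketch_connect() is a hypothetical placeholder for ksocknal_connect() and opens no socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route with an intrusive FIFO link. */
struct sketch_route {
	int id;
	int connected;
	struct sketch_route *next;
};

/* Stand-in for the ksocknal_data.ksnd_connd_routes queue. */
struct sketch_connd_queue {
	struct sketch_route *head, *tail;
};

/* Append a route at the tail of the queue. */
static void sketch_queue_route(struct sketch_connd_queue *q,
			       struct sketch_route *r)
{
	r->next = NULL;
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Placeholder for ksocknal_connect(): just marks the route connected. */
static void sketch_connect(struct sketch_route *r)
{
	r->connected = 1;
}

/* One iteration of a connd thread: dequeue the oldest route and connect
 * it. Returns the route handled, or NULL if the queue was empty. */
static struct sketch_route *sketch_connd_once(struct sketch_connd_queue *q)
{
	struct sketch_route *r = q->head;

	if (r == NULL)
		return NULL;
	q->head = r->next;
	if (q->head == NULL)
		q->tail = NULL;
	sketch_connect(r);
	return r;
}
```

Decoupling the caller from the connect step keeps the potentially blocking TCP handshake out of the send path.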
The number of sockets to create is configurable via the typed_conns module parameter. If this is set to 1, three sockets are created:
One connection is created per socket and added to the ksock_peer_ni's list of connections.
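The socket-to-connection mapping can be sketched as one list entry per socket. The three connection types are an assumption modeled on the control, bulk-in and bulk-out split used by the socklnd, and the single untyped connection when typed_conns is 0 is likewise assumed; the enum values, structures and function names are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical connection types (assumed control/bulk-in/bulk-out split). */
enum sketch_conn_type {
	SKETCH_CONN_ANY,
	SKETCH_CONN_CONTROL,
	SKETCH_CONN_BULK_IN,
	SKETCH_CONN_BULK_OUT,
};

struct sketch_conn {
	enum sketch_conn_type type;
	int sockfd;                 /* placeholder; no real socket is opened */
	struct sketch_conn *next;   /* link in the peer's connection list */
};

struct sketch_peer_ni {
	struct sketch_conn *conns;
	int nconns;
};

/* Create one connection (one socket) of the given type on the peer. */
static int sketch_add_conn(struct sketch_peer_ni *p, enum sketch_conn_type t)
{
	struct sketch_conn *c = calloc(1, sizeof(*c));

	if (c == NULL)
		return -1;
	c->type = t;
	c->sockfd = -1;
	c->next = p->conns;
	p->conns = c;
	p->nconns++;
	return 0;
}

/* typed_conns = 1: three typed connections; otherwise one untyped one. */
static int sketch_connect_peer(struct sketch_peer_ni *p, int typed_conns)
{
	static const enum sketch_conn_type typed[] = {
		SKETCH_CONN_CONTROL, SKETCH_CONN_BULK_IN, SKETCH_CONN_BULK_OUT,
	};

	if (!typed_conns)
		return sketch_add_conn(p, SKETCH_CONN_ANY);
	for (size_t i = 0; i < sizeof(typed) / sizeof(typed[0]); i++)
		if (sketch_add_conn(p, typed[i]) != 0)
			return -1;
	return 0;
}
```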
A hello message is sent by the active side of the connection. It contains the list of IP addresses stored in ksnd_data.ksock_interfaces to which we do not have routes yet. When the passive side receives the hello message, it sends its own hello in response. The active side receives that hello, which contains the list of the remote peer's IP addresses, and creates additional routes to these interfaces, to be used on demand when messages are sent.
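The active side's handling of the reply hello can be sketched as "add a route for every advertised address not already routed". The hello layout here (a bare array of IPv4 addresses) and all sketch_* names are hypothetical; the real hello message carries more fields.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_IPS 4

/* Hypothetical hello payload: only the advertised IPv4 addresses. */
struct sketch_hello {
	uint32_t ips[SKETCH_MAX_IPS];
	int nips;
};

struct sketch_peer {
	uint32_t route_ips[SKETCH_MAX_IPS];  /* remote IPs we have routes to */
	int nroutes;
};

static int sketch_has_route(const struct sketch_peer *p, uint32_t ip)
{
	for (int i = 0; i < p->nroutes; i++)
		if (p->route_ips[i] == ip)
			return 1;
	return 0;
}

/* Active side, on receiving the passive side's hello: add a route to each
 * advertised address we do not already have one to. Returns routes added. */
static int sketch_handle_hello(struct sketch_peer *p,
			       const struct sketch_hello *h)
{
	int added = 0;

	for (int i = 0; i < h->nips && p->nroutes < SKETCH_MAX_IPS; i++) {
		if (!sketch_has_route(p, h->ips[i])) {
			p->route_ips[p->nroutes++] = h->ips[i];
			added++;
		}
	}
	return added;
}
```

Everything in this exchange exists to discover extra interfaces for bonding; with one interface per net, the address list carries no useful information.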
Connection creation management is unduly complex due to TCP bonding. In fact, the hello message appears to exist primarily to pass around the peer's IP addresses.
Since TCP bonding is now deprecated, this code can be removed, simplifying the overall design of the socklnd code.
LNet calls ksocknal_send() to send messages. This function will trigger the following steps:
lnet_cpt_of_nid(), providing the peer's NID and the NI associated with the peer.
In lnet_get_best_ni(), one of the criteria used to determine the best NI to send from is NUMA locality. For that criterion we use the MD CPT, since we need to find the interface nearest, NUMA-wise, to the memory described by that MD. However, the LND scheduler that picks up and processes the transmit could be associated with a CPT other than the MD CPT. Since TCP is not a zero-copy protocol, would that introduce a performance penalty?
When a connection is created, a set of callbacks is registered with the socket by ksocknal_lib_set_callback().
When data is ready to be received, the call chain is ksocknal_data_ready() -> ksocknal_read_callback().
This queues the connection on the scheduler associated with that connection so the data can be received. The message header is read first and lnet_parse() is called; lnet_parse() can then call into the LND again to receive the payload data.
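The receive flow can be sketched as a socket callback that only queues the connection, plus a scheduler loop that does the actual work. This is a simplified model: no data is read, sketch_lnet_parse() is a hypothetical stand-in for lnet_parse() that always requests the payload, and the scheduler's list is LIFO for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_sched;

/* Hypothetical receive states for one message. */
enum sketch_rx_state { SKETCH_RX_HEADER, SKETCH_RX_PAYLOAD, SKETCH_RX_DONE };

struct sketch_conn {
	struct sketch_sched *sched;   /* scheduler this connection belongs to */
	int rx_ready;                 /* already queued for receive? */
	enum sketch_rx_state rx_state;
	struct sketch_conn *next;
};

struct sketch_sched {
	struct sketch_conn *rx_list;  /* connections with pending receives */
};

/* Callback path, in the spirit of ksocknal_data_ready() ->
 * ksocknal_read_callback(): do no I/O, just hand the conn to its scheduler. */
static void sketch_read_callback(struct sketch_conn *c)
{
	if (c->rx_ready)
		return;                   /* already queued */
	c->rx_ready = 1;
	c->next = c->sched->rx_list;
	c->sched->rx_list = c;
}

/* Stand-in for lnet_parse(): pretend the message has a payload and ask the
 * LND to receive it. */
static void sketch_lnet_parse(struct sketch_conn *c)
{
	c->rx_state = SKETCH_RX_PAYLOAD;
}

/* Scheduler thread body: "read" the header, pass it to LNet, then "read"
 * the payload if LNet asked for it. */
static void sketch_sched_run(struct sketch_sched *s)
{
	while (s->rx_list != NULL) {
		struct sketch_conn *c = s->rx_list;

		s->rx_list = c->next;
		c->rx_ready = 0;
		sketch_lnet_parse(c);     /* header has been read at this point */
		if (c->rx_state == SKETCH_RX_PAYLOAD)
			c->rx_state = SKETCH_RX_DONE;
	}
}
```

Keeping the socket callback trivial matters because it runs in the network stack's context, while the scheduler threads can block and copy data.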
socklnd Improvements
ksock_net
LU-12815 indicates that creating multiple virtual interfaces on top of the same interface, and then grouping them in one LNet network in a Multi-Rail configuration, increases performance. Creating multiple virtual interfaces has the following effects:
It would be better to make these benefits available without adding configuration complexity for the user. To do that, we can increase the number of connections to each peer and control it via the conns_per_peer module parameter.
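One way to picture conns_per_peer is a peer that opens N connections and spreads transmits across them. This is a hypothetical sketch: the structures and function names are illustrative, no sockets are opened, and round-robin selection is an assumption, since the proposal does not say how a connection is chosen for each transmit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified connection and peer; not the real socklnd types. */
struct sketch_conn {
	int sockfd;                  /* placeholder; no socket is opened */
	struct sketch_conn *next;    /* link in the peer's connection list */
};

struct sketch_peer {
	struct sketch_conn *conns;
	int nconns;
	struct sketch_conn *cursor;  /* round-robin position for transmits */
};

/* Open conns_per_peer connections to the peer. */
static int sketch_peer_connect(struct sketch_peer *p, int conns_per_peer)
{
	assert(conns_per_peer >= 1);
	for (int i = 0; i < conns_per_peer; i++) {
		struct sketch_conn *c = calloc(1, sizeof(*c));

		if (c == NULL)
			return -1;
		c->sockfd = -1;
		c->next = p->conns;
		p->conns = c;
		p->nconns++;
	}
	return 0;
}

/* Assumed policy: pick connections in round-robin order, wrapping at the
 * end of the list. Returns NULL if the peer has no connections. */
static struct sketch_conn *sketch_next_conn(struct sketch_peer *p)
{
	if (p->conns == NULL)
		return NULL;
	if (p->cursor == NULL || p->cursor->next == NULL)
		p->cursor = p->conns;
	else
		p->cursor = p->cursor->next;
	return p->cursor;
}
```

Spreading traffic over several TCP connections is what the virtual-interface setup achieved indirectly; here a single parameter controls it.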
ksock_conn to ksock_socket_conn
ksock_conn would be changed to encapsulate up to 3 ksock_socket_conn data structures. Each ksock_socket_conn would be the same as the current-day ksock_conn; it describes a socket connection to the peer.
ksock_peer should include a linked list of ksock_conn. The number of ksock_conn created will be controlled by the conns_per_peer module parameter.
conns_per_peer dynamically configurable