This document describes the current socklnd design and proposes improvements that simplify the design and increase performance.

Overview

[Diagram: LNetSystemDiagram]

LNet drives the LND through the set of APIs defined in struct lnet_lnd:

Code Block
struct lnet_lnd {
»·······/* fields initialized by the LND */
»·······__u32»··»·······»·······lnd_type;

»·······int  (*lnd_startup)(struct lnet_ni *ni);
»·······void (*lnd_shutdown)(struct lnet_ni *ni);
»·······int  (*lnd_ctl)(struct lnet_ni *ni, unsigned int cmd, void *arg);

»·······/* In data movement APIs below, payload buffers are described as a set
»······· * of 'niov' fragments which are in pages.
»······· * The LND may NOT overwrite these fragment descriptors.
	 * An 'offset' may specify a byte offset within the set of
	 * fragments to start from
»······· */

»·······/* Start sending a preformatted message.  'private' is NULL for PUT and
»······· * GET messages; otherwise this is a response to an incoming message
»······· * and 'private' is the 'private' passed to lnet_parse().  Return
»······· * non-zero for immediate failure, otherwise complete later with
»······· * lnet_finalize() */
»·······int (*lnd_send)(struct lnet_ni *ni, void *private,
»·······»·······»·······struct lnet_msg *msg);

»·······/* Start receiving 'mlen' bytes of payload data, skipping the following
»······· * 'rlen' - 'mlen' bytes. 'private' is the 'private' passed to
	 * lnet_parse().  Return non-zero for immediate failure, otherwise
»······· * complete later with lnet_finalize().  This also gives back a receive
»······· * credit if the LND does flow control. */
»·······int (*lnd_recv)(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
»·······»·······»·······int delayed, unsigned int niov,
»·······»·······»·······struct bio_vec *kiov,
»·······»·······»·······unsigned int offset, unsigned int mlen, unsigned int rlen);

»·······/* lnet_parse() has had to delay processing of this message
»······· * (e.g. waiting for a forwarding buffer or send credits).  Give the
»······· * LND a chance to free urgently needed resources.  If called, return 0
»······· * for success and do NOT give back a receive credit; that has to wait
»······· * until lnd_recv() gets called.  On failure return < 0 and
»······· * release resources; lnd_recv() will not be called. */
»·······int (*lnd_eager_recv)(struct lnet_ni *ni, void *private,
»·······»·······»·······      struct lnet_msg *msg, void **new_privatep);

»·······/* notification of peer down */
»·······void (*lnd_notify_peer_down)(lnet_nid_t peer);

»·······/* accept a new connection */
»·······int (*lnd_accept)(struct lnet_ni *ni, struct socket *sock);
};
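
The ops-table pattern above can be illustrated with a minimal user-space sketch. This is not the socklnd implementation: struct demo_ni, struct demo_msg, struct demo_lnd, demo_lnd_ops and upper_layer_send() are hypothetical stand-ins, and the table is cut down to three callbacks. It only shows how an LND might fill in the function pointers and how the upper layer dispatches through them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct lnet_ni and struct lnet_msg. */
struct demo_ni  { int ni_id; };
struct demo_msg { const char *payload; };

/* Cut-down ops table modelled on struct lnet_lnd. */
struct demo_lnd {
	unsigned int lnd_type;
	int  (*lnd_startup)(struct demo_ni *ni);
	void (*lnd_shutdown)(struct demo_ni *ni);
	int  (*lnd_send)(struct demo_ni *ni, void *private,
			 struct demo_msg *msg);
};

int demo_sends;	/* number of messages accepted by demo_send() */

static int demo_startup(struct demo_ni *ni)
{
	(void)ni;
	return 0;
}

static void demo_shutdown(struct demo_ni *ni)
{
	(void)ni;
}

/* Non-zero means immediate failure; a real LND would queue the message
 * on success and complete it later. */
static int demo_send(struct demo_ni *ni, void *private, struct demo_msg *msg)
{
	(void)ni;
	(void)private;
	if (msg == NULL || msg->payload == NULL)
		return -1;
	demo_sends++;
	return 0;
}

/* Fields initialized by the (hypothetical) LND. */
struct demo_lnd demo_lnd_ops = {
	.lnd_type     = 1,	/* arbitrary value for the sketch */
	.lnd_startup  = demo_startup,
	.lnd_shutdown = demo_shutdown,
	.lnd_send     = demo_send,
};

/* What the upper layer does: call through the table, never the LND directly.
 * 'private' is NULL here, as for PUT and GET messages. */
int upper_layer_send(struct demo_lnd *lnd, struct demo_ni *ni,
		     struct demo_msg *msg)
{
	assert(lnd != NULL && lnd->lnd_send != NULL);
	return lnd->lnd_send(ni, NULL, msg);
}
```

Because the caller only sees the table, a different LND can be substituted by supplying another struct with the same callbacks filled in.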

These APIs are called from within the context of the ptlrpc threads. In general, the LND establishes connections, transmits, and receives from the context of a pool of threads that it creates. The threads that handle transmits and receives are bound to particular CPU Partitions. Requests generated by calling the LND APIs are queued and then processed by one of the LND threads.
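
The queue-then-process model described above can be sketched as follows. This is a single-threaded simulation under stated assumptions, not the socklnd scheduler: the names demo_req, demo_sched, demo_enqueue() and demo_sched_run() are hypothetical, the number of CPU partitions is fixed at two, and the locking and wakeups that real LND threads need are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NCPTS 2	/* hypothetical number of CPU partitions */
#define DEMO_QLEN  8	/* per-scheduler queue capacity (power of two) */

struct demo_req {
	int id;
	int done;	/* set once a scheduler has processed the request */
};

/* One scheduler per CPU partition, each with its own request queue. */
struct demo_sched {
	struct demo_req *queue[DEMO_QLEN];
	unsigned int head;	/* next slot to process */
	unsigned int tail;	/* next free slot */
	unsigned int processed;
};

struct demo_sched demo_scheds[DEMO_NCPTS];

/* Caller context (e.g. the thread invoking an LND API): queue the request
 * on the scheduler for 'cpt' and return without doing the I/O. */
int demo_enqueue(unsigned int cpt, struct demo_req *req)
{
	struct demo_sched *s;

	assert(req != NULL);
	s = &demo_scheds[cpt % DEMO_NCPTS];
	if (s->tail - s->head == DEMO_QLEN)
		return -1;	/* queue full */
	s->queue[s->tail % DEMO_QLEN] = req;
	s->tail++;
	return 0;
}

/* One pass of a scheduler thread's loop: drain the queue for 'cpt'.
 * Returns the number of requests processed in this pass. */
unsigned int demo_sched_run(unsigned int cpt)
{
	struct demo_sched *s = &demo_scheds[cpt % DEMO_NCPTS];
	unsigned int n = 0;

	while (s->head != s->tail) {
		struct demo_req *req = s->queue[s->head % DEMO_QLEN];

		s->head++;
		req->done = 1;	/* stands in for the actual transmit/receive */
		s->processed++;
		n++;
	}
	return n;
}
```

The point of the split is that the calling thread returns as soon as the request is queued, while the work itself runs on a thread tied to one partition.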

Below is an overview of the socklnd design.

Socklnd Overview

[Diagram: socklnd_overview]

...

The number of sockets to create is configurable via the typed_conns module parameter. If this is set to 1 then three sockets will be created:

...

A connection is created per socket. This connection is added to the ksock_peer_ni list of connections.

A hello message is sent by the active side of the connection. This hello message contains the list of IP addresses stored in ksnd_data.ksock_interfaces to which we don't have routes yet. When the passive side receives the hello message, it sends its own hello as a response. The active side then receives that hello, which contains the list of the remote peer's IP addresses. It will then create additional routes to these interfaces; these routes are created on demand when messages are sent.
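
The route bookkeeping that follows a hello exchange can be sketched as below. This is an illustration under assumptions, not the socklnd code: struct demo_peer and demo_handle_hello() are hypothetical, addresses are plain IPv4 values in a fixed-size array, and routes are recorded as soon as the hello is processed rather than deferred until a message is sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ROUTES 16	/* hypothetical per-peer limit */

/* Hypothetical per-peer state: the IPv4 addresses we already have routes to. */
struct demo_peer {
	uint32_t routes[DEMO_MAX_ROUTES];
	size_t   nroutes;
};

static int demo_has_route(const struct demo_peer *peer, uint32_t ip)
{
	size_t i;

	for (i = 0; i < peer->nroutes; i++)
		if (peer->routes[i] == ip)
			return 1;
	return 0;
}

/* Process the IP list carried in the peer's hello: record a route for each
 * address we do not know yet. Returns the number of routes added. */
size_t demo_handle_hello(struct demo_peer *peer, const uint32_t *ips,
			 size_t nips)
{
	size_t i, added = 0;

	assert(peer != NULL);
	for (i = 0; i < nips; i++) {
		if (demo_has_route(peer, ips[i]))
			continue;	/* already routed, nothing to do */
		if (peer->nroutes == DEMO_MAX_ROUTES)
			break;		/* table full, ignore the rest */
		peer->routes[peer->nroutes++] = ips[i];
		added++;
	}
	return added;
}
```

Processing the same hello twice adds nothing the second time, since every advertised address already has a route.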

[Diagram: ksocknal_passive]

Connection creation management is unduly complex due to TCP bonding. In fact, the primary purpose of the hello message appears to be passing around the peer's IP addresses.

...