Overview

LNet is a virtual networking layer which allows Lustre nodes to communicate with each other.

System Level Overview

System Diagram

[Figure: LNet system diagram]

  • lnetctl: User space utility used to configure and query the LNet kernel module
  • DLC Library: User space library which communicates with the LNet kernel module, primarily via IOCTL
  • LNet IOCTL: Module which handles the IOCTLs and calls the appropriate callbacks in the LNet kernel module
  • PTLRPC: Kernel module which implements an RPC protocol. It's the primary user of LNet.
  • LNet: Kernel module which implements the Lustre Networking communication protocol
  • o2iblnd: Verbs driver. It executes RDMA operations via verbs
  • socklnd: TCP/IP driver. It sends/receives TCP messages
  • gnilnd: Cray/HPE driver not maintained by us

Block Level Overview

LNet Block Level Diagram

[Figure: LNet block level diagram]

The diagram above represents the different functional blocks in LNet. A quick overview will help in understanding the code.

  • When LNet starts up, it reads various module parameters and configures itself based on these values.
  • Further configuration can be added dynamically via the lnetctl utility.
  • The main APIs to request LNet to send messages are LNetPut() and LNetGet().
    • When a message is sent, a peer block is created to track messages to and from that peer.
    • When a message is received, a peer block is created to track messages to and from that peer.
  • When sending messages, LNet has to select the local and remote interfaces (i.e., the path the message will traverse to reach its destination). It does so through the selection algorithm.
    • In that process, it selects the local network interfaces and remote network interfaces for the destination peer.
  • Each peer has its own set of credits used to rate limit messages to it. LNet checks and manages these credits before sending the message.
    • When a message is sent a credit is consumed.
    • When a message is received a credit is returned.
  • If the destination peer is not on the same network as the node, LNet looks up a route to the final destination. If no route is present, the message cannot be sent.
  • If a node is acting as a router, it can receive messages for which it is not the final destination. It can then forward these messages to the final destination.
    • When a received message is to be forwarded, a router buffer is used to receive the message data. Router buffers have their own credits.
  • A fault injection module can be activated for testing. That module will simulate message send/receive failures.
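
The credit and routing steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not LNet code: `ToyPeer`, `ToyNode`, the starting credit count of 8, and the choice to queue a message when no credit is available are all hypothetical. The only detail borrowed from Lustre is the `address@network` form of a NID.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyPeer:
    """Hypothetical stand-in for a peer block; not an LNet structure."""
    nid: str
    credits: int = 8                          # assumed starting value
    queued: deque = field(default_factory=deque)


class ToyNode:
    """Hypothetical node that models route lookup and per-peer credits."""

    def __init__(self, local_nets, routes=None):
        self.local_nets = set(local_nets)     # networks this node sits on
        self.routes = dict(routes or {})      # remote network -> gateway NID
        self.peers = {}                       # NID -> ToyPeer

    def next_hop(self, dest_nid):
        """Return the NID the message is handed to, or raise if unroutable."""
        net = dest_nid.split("@", 1)[1]
        if net in self.local_nets:
            return dest_nid                   # same network: send directly
        if net in self.routes:
            return self.routes[net]           # remote network: use the gateway
        raise LookupError(f"no route to {net}")

    def send(self, dest_nid, msg):
        hop = self.next_hop(dest_nid)
        # A peer block is created the first time a peer is seen.
        peer = self.peers.setdefault(hop, ToyPeer(hop))
        if peer.credits == 0:
            peer.queued.append(msg)           # assumption: wait for a credit
            return "queued"
        peer.credits -= 1                     # sending consumes a credit
        return "sent"

    def credit_returned(self, nid):
        peer = self.peers[nid]
        if peer.queued:
            peer.queued.popleft()             # returned credit is reused at once
            return "sent"
        peer.credits += 1
        return "idle"


if __name__ == "__main__":
    node = ToyNode(["tcp0"], {"o2ib0": "10.0.0.1@tcp0"})
    print(node.next_hop("172.16.0.9@o2ib0"))  # routed through the gateway
    print(node.send("10.0.0.5@tcp0", "hello"))
```

In this sketch a message for a peer on another network is charged against the gateway's credits, because the gateway is the peer the node actually talks to. Whether real LNet accounts for it that way is not stated in this page.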

Useful Documentation


LNet Source Directory
  • lustre-release/lnet 
    • Top LNet directory
  • include 
    • lnet
      • Internal includes
    • uapi 
      • Includes used by user space and other kernel modules
  • klnds 
    • gnilnd 
      • Cray LNet Driver (LND). Developed and tested by Cray/HPE
    • o2iblnd 
      • IB LND used by Mellanox and Intel OmniPath. It uses the Verbs API
      • Only uses IPoIB for connection establishment
    • socklnd
      • Socket LND used for Ethernet interfaces. It uses TCP/IP
  • lnet 
    • LNet kernel source directory
  • selftest 
    • LNet selftest tool. Generates RDMA traffic. Runs in the kernel
  • utils 
    • User space tools including lnetctl and liblnetconfig.
    • lnetctl is a CLI used to configure LNet
    • liblnetconfig is the library used by lnetctl to communicate with the LNet kernel module
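
As a rough illustration of how lnetctl is typically driven, the sketch below builds a few common lnetctl command lines without running them. The helper function names are hypothetical, and the network, interface, and gateway values (`tcp0`, `eth0`, `o2ib0`, `192.168.1.1@tcp0`) are placeholders to be replaced with site-specific values.

```python
import shlex


def lnet_configure():
    """Bring up LNet after the kernel module is loaded."""
    return ["lnetctl", "lnet", "configure"]


def net_add(net, interface):
    """Add a local network on the given interface."""
    return ["lnetctl", "net", "add", "--net", net, "--if", interface]


def route_add(remote_net, gateway_nid):
    """Add a route to a remote network through a gateway NID."""
    return ["lnetctl", "route", "add", "--net", remote_net,
            "--gateway", gateway_nid]


def net_show():
    """Query the currently configured networks."""
    return ["lnetctl", "net", "show"]


def render(plan):
    """Turn a list of argv lists into shell-safe command strings."""
    return [shlex.join(argv) for argv in plan]


if __name__ == "__main__":
    # Placeholder values; nothing is executed, the commands are only printed.
    plan = [
        lnet_configure(),
        net_add("tcp0", "eth0"),
        route_add("o2ib0", "192.168.1.1@tcp0"),
        net_show(),
    ]
    for line in render(plan):
        print(line)
```

Each argv list can be passed to `subprocess.run(argv, check=True)` on a node where lnetctl is installed and the caller has root privileges.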

General Tips and Tricks

Title | Creator | Modified
Useful Links | Serguei Smirnov | Jun 03, 2024
Frequently Asked Questions | Amir Shehata | Jun 03, 2024
MLX Info and Tips | Amir Shehata | Aug 09, 2023
Crash course on Crash | Amir Shehata | Aug 03, 2023
MR Cluster Setup | Amir Shehata | May 19, 2023
Adhoc Lustre Tips | Amir Shehata | Jan 04, 2023
GIT tips | Amir Shehata | Jan 04, 2023
Loading hfi1.conf parameters on boot | Amir Shehata | Nov 23, 2022
Useful Lustre commands | Amir Shehata | Aug 31, 2022
Kernel Debugging Misc | Amir Shehata | Jun 10, 2022
Installing MOFED | Amir Shehata | Oct 05, 2021
Installing debug symbols on Ubuntu | Amir Shehata | Jun 18, 2021
Virsh cheat sheet | Amir Shehata | May 20, 2021
Issues to look out for | Amir Shehata | May 04, 2021
Kernel GDB live Debugging with KVM | Amir Shehata | Apr 24, 2021
Lustre QoS | Amir Shehata | Apr 22, 2021
Setting up a Failover Pair with virsh/virt-manager | Amir Shehata | Jul 23, 2020
self-test template script | Amir Shehata | Jul 12, 2020
Building Lustre | Amir Shehata | Jun 22, 2020
Changing and Building the Linux Kernel | Amir Shehata | May 14, 2020
Mounting Lustre using a File instead of dev | Amir Shehata | May 06, 2020
IB_WC_WR_FLUSH_ERR | Amir Shehata | Apr 22, 2020
LNet selftest | Serguei Smirnov | Feb 21, 2020
Creating a merge-commit | Sonia Sharma | Apr 23, 2019
Debugging a Deadlock | Amir Shehata | Apr 11, 2019
Site Configuration | Amir Shehata | Apr 04, 2019
Use Case for Multi-Rail | Amir Shehata | Sep 19, 2018
Installing IFS package for OPA | Amir Shehata | Sep 13, 2017
OPA Performance Configuration | Amir Shehata | Jun 28, 2017
proc files of interest | Amir Shehata | May 24, 2017
Recommended Development Environment | Amir Shehata | May 15, 2017

Presentations


Tasks


Medium
  • LNet Router Testing
    • We need to expand our testing of LNet. The link above lists a set of routing tests. We need to write LUTF scripts for them.
      • Benefits:
        • Learn how to configure LNet routers
        • Learn how to use the LUTF
        • Learn how to test LNet
        • Learn the code
  • LU-12041