- Created by Amir Shehata, last modified on Oct 16, 2020
Overview
LNet is a virtual networking layer which allows Lustre nodes to communicate with each other.
System Level Overview
System Diagram
- lnetctl: User space utility used to configure and query the LNet kernel module
- DLC Library: User space library which communicates with the LNet kernel module, primarily via IOCTL
- LNet IOCTL: Module which handles the IOCTLs and calls the appropriate callbacks in the LNet kernel module
- PTLRPC: Kernel module which implements an RPC protocol. It's the primary user of LNet.
- LNet: Kernel module which implements the Lustre Networking communication protocol
- o2iblnd: Verbs driver. It executes RDMA operations via verbs
- socklnd: TCP/IP driver. It sends/receives TCP messages
- gnilnd: Cray/HPE driver not maintained by us
Block Level Overview
LNet Block Level Diagram
The diagram above represents the different functional blocks in LNet. A quick overview will help in understanding the code.
- When LNet starts up it reads various module parameters and configures itself based on these values.
- Further configuration can be added dynamically via the lnetctl utility.
- The main APIs to request LNet to send messages are LNetPut() and LNetGet().
- When a message is sent a peer block is created to track messages to and from that peer.
- When a message is received a peer block is created to track messages to and from that peer.
- When sending messages LNet has to select the local and remote interfaces (i.e. the path the message will traverse to reach its destination). It does so through the selection algorithm.
- In that process it selects the local network interfaces and remote network interfaces for the destination peer.
- Each peer has its own set of credits used to rate limit messages to it. LNet checks and manages these credits before sending the message.
- When a message is sent a credit is consumed.
- When a message is received a credit is returned.
- If the destination peer is not on the same network as the node, LNet looks up a route to the final destination. If no route is present, the message cannot be sent.
- If a node is acting as a router, it can receive messages for which it is not the final destination. It can then forward these messages to the final destination.
- When a received message is to be forwarded, a router buffer is used to receive the message data. Router buffers have their own credits.
- A fault injection module can be activated for testing. That module will simulate message send/receive failures.
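The credit and routing steps in the list above can be illustrated with a small toy model. This is only a sketch in Python, not the LNet implementation (which is kernel C code). The names Node, Peer, send, on_receive, next_hop and peer_credits are hypothetical, the "address@network" strings are illustrative, and returning "queued" when credits run out is a simplifying assumption about how rate limiting behaves.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy peer block: tracks the credits left for one remote peer."""
    nid: str          # "address@network", e.g. "10.0.0.2@tcp"
    credits: int      # sends still allowed before we must wait
    max_credits: int  # upper bound so returned credits never overflow


@dataclass
class Node:
    """Hypothetical model of a node; not the real LNet data structures."""
    local_nets: set                             # networks this node is attached to
    routes: dict = field(default_factory=dict)  # remote network -> gateway NID
    peer_credits: int = 8                       # illustrative value only
    peers: dict = field(default_factory=dict)   # NID -> Peer

    def _peer(self, nid):
        # A peer block is created the first time a NID is seen,
        # whether we are sending to it or receiving from it.
        if nid not in self.peers:
            self.peers[nid] = Peer(nid, self.peer_credits, self.peer_credits)
        return self.peers[nid]

    def next_hop(self, dst_nid):
        # Same network: send directly. Otherwise look up a route.
        net = dst_nid.split("@", 1)[1]
        if net in self.local_nets:
            return dst_nid
        return self.routes.get(net)  # None when no route is configured

    def send(self, dst_nid):
        hop = self.next_hop(dst_nid)
        if hop is None:
            return "no route"        # message cannot be sent
        peer = self._peer(hop)
        if peer.credits == 0:
            return "queued"          # assumption: wait until a credit returns
        peer.credits -= 1            # sending consumes a credit
        return "sent via " + hop

    def on_receive(self, src_nid):
        peer = self._peer(src_nid)
        # Receiving returns a credit, capped at the configured maximum.
        peer.credits = min(peer.credits + 1, peer.max_credits)


node = Node(local_nets={"tcp"}, routes={"o2ib": "10.0.0.1@tcp"}, peer_credits=1)
print(node.send("10.0.0.2@tcp"))      # sent via 10.0.0.2@tcp
print(node.send("10.0.0.2@tcp"))      # queued
node.on_receive("10.0.0.2@tcp")
print(node.send("10.0.0.2@tcp"))      # sent via 10.0.0.2@tcp
print(node.send("192.168.1.5@o2ib"))  # sent via 10.0.0.1@tcp
print(node.send("10.0.0.9@gni"))      # no route
```

With a single credit per peer, the second send is held back until a receive returns the credit. A destination on a remote network is sent to the gateway recorded in the route table, and a destination with no route is rejected.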
Useful Documentation
LNet Source Directory
- lustre-release/lnet: Top LNet directory
  - include
    - lnet: Internal includes
    - uapi: Includes used by user space and other kernel modules
  - klnds
    - gnilnd: Cray LNet Driver (LND). Developed and tested by Cray/HPE
    - o2iblnd: IB LND used by Mellanox and Intel OmniPath. It uses the Verbs API
      - Only uses IPoIB for connection establishment
    - socklnd: Socket LND used for Ethernet interfaces. It uses TCP/IP
  - lnet: LNet kernel source directory
  - selftest: LNet selftest tool. Generates RDMA traffic. Runs in the kernel
  - utils: User space tools including lnetctl and liblnetconfig
    - lnetctl is a CLI used to configure LNet
    - liblnetconfig is the library used by lnetctl to communicate with the LNet kernel module
General Tips and Tricks
Title | Creator | Modified |
---|---|---|
Useful Links | Serguei Smirnov | Jun 03, 2024 |
Frequently Asked Questions | Amir Shehata | Jun 03, 2024 |
MLX Info and Tips | Amir Shehata | Aug 09, 2023 |
Crash course on Crash | Amir Shehata | Aug 03, 2023 |
MR Cluster Setup | Amir Shehata | May 19, 2023 |
Adhoc Lustre Tips | Amir Shehata | Jan 04, 2023 |
GIT tips | Amir Shehata | Jan 04, 2023 |
Loading hfi1.conf parameters on boot | Amir Shehata | Nov 23, 2022 |
Useful Lustre commands | Amir Shehata | Aug 31, 2022 |
Kernel Debugging Misc | Amir Shehata | Jun 10, 2022 |
Installing MOFED | Amir Shehata | Oct 05, 2021 |
Installing debug symbols on Ubuntu | Amir Shehata | Jun 18, 2021 |
Virsh cheat sheet | Amir Shehata | May 20, 2021 |
Issues to look out for | Amir Shehata | May 04, 2021 |
Kernel GDB live Debugging with KVM | Amir Shehata | Apr 24, 2021 |
Lustre QoS | Amir Shehata | Apr 22, 2021 |
Setting up a Failover Pair with virsh/virt-manager | Amir Shehata | Jul 23, 2020 |
self-test template script | Amir Shehata | Jul 12, 2020 |
Building Lustre | Amir Shehata | Jun 22, 2020 |
Changing and Building the Linux Kernel | Amir Shehata | May 14, 2020 |
Presentations
LUG and OFA presentations
Conference | Presentation | Video |
---|---|---|
LUG 2014 | ||
OFA 2016 | ||
LUG 2017 | ||
OFA 2017 | ||
LUG 2018 | ||
LUG 2018 | ||
LUG 2019 | ||
OFA 2020 | ||
Tasks
Medium
- LNet Router Testing
- We need to expand our testing of LNet. The link above lists a set of routing tests. We need to write LUTF scripts for them
- Benefits:
- Learn how to configure LNet routers
- Learn how to use the LUTF
- Learn how to test LNet
- Learn the code
- LU-12041