- Created by Amir Shehata, last modified by Serguei Smirnov on Oct 20, 2020
Overview
LNet is a virtual networking layer which allows Lustre nodes to communicate with each other.
System Diagram
- lnetctl: User space utility used to configure and query LNet kernel module
- DLC Library: User space library which communicates with LNet kernel module primarily via IOCTL
- LNet IOCTL: Module which handles the IOCTLs and calls appropriate callbacks in the LNet kernel module
- PTLRPC: Kernel module which implements an RPC protocol. It's the primary user of LNet.
- LNet: Kernel module which implements the Lustre Networking communication protocol
- o2iblnd: Verbs driver. It executes RDMA operations via verbs
- socklnd: TCP/IP driver. It sends/receives TCP messages
- gnilnd: Cray/HPE driver not maintained by us
LNet Block Level Diagram
The diagram above represents the different functional blocks in LNet. A quick overview of them will help in understanding the code:
- When LNet starts up it reads various module parameters and configures itself based on these values.
- Further configuration can be added dynamically via the lnetctl utility.
- The main APIs to request LNet to send messages are LNetPut() and LNetGet().
- When a message is sent a peer block is created to track messages to and from that peer.
- When a message is received a peer block is created to track messages to and from that peer.
- When sending messages, LNet has to select the local and remote interfaces (i.e. the path the message will traverse to reach its destination). It does so through the selection algorithm.
- In that process it selects the local network interfaces and remote network interfaces for the destination peer.
- Each peer has its own set of credits used to rate limit messages to it. LNet checks and manages these credits before sending the message.
- When a message is sent a credit is consumed.
- When a message is received a credit is returned.
- If the destination peer is not on the same network as the node, LNet looks up a route to the final destination. If no route is present, the message cannot be sent.
- If a node is acting as a router, it can receive messages for which it is not the final destination. It can then forward these messages to the final destination.
- When a received message is to be forwarded, a router buffer is used to receive the message data. Router buffers have their own credits.
- A fault injection module can be activated for testing. That module will simulate message send/receive failures.
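The send path described above (create a peer block, look up a route when the destination is on another network, then check credits) can be sketched as a toy model. This is only an illustration under stated assumptions, not the LNet kernel code: the class and method names (ToyLNet, next_hop, credit_returned), the credit count, and the choice to queue messages when no credit is available are all hypothetical. Only the NID notation (address@network) follows LNet convention.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy peer block: tracks credits and messages waiting for a credit."""
    nid: str
    credits: int
    queued: deque = field(default_factory=deque)


class ToyLNet:
    """Hypothetical model of the send path; not the real LNet API."""

    def __init__(self, local_nets, routes, peer_credits):
        self.local_nets = set(local_nets)  # networks this node is attached to
        self.routes = dict(routes)         # remote network -> gateway NID
        self.peer_credits = peer_credits   # assumed starting credits per peer
        self.peers = {}                    # NID -> Peer, created on first use

    def next_hop(self, dest_nid):
        """Return the NID to send to: the peer itself or a gateway."""
        net = dest_nid.split("@", 1)[1]
        if net in self.local_nets:
            return dest_nid
        gateway = self.routes.get(net)
        if gateway is None:
            # No route present: the message cannot be sent.
            raise LookupError(f"no route to network {net}")
        return gateway

    def send(self, dest_nid, msg):
        """Consume one credit for the next hop, or queue the message."""
        hop = self.next_hop(dest_nid)
        peer = self.peers.setdefault(hop, Peer(hop, self.peer_credits))
        if peer.credits == 0:
            peer.queued.append(msg)  # assumption: wait for a credit
            return "queued"
        peer.credits -= 1
        return "sent"

    def credit_returned(self, nid):
        """Return one credit; a queued message reuses it immediately."""
        peer = self.peers[nid]
        if peer.queued:
            peer.queued.popleft()
            return "sent"
        peer.credits += 1
        return "idle"
```

For example, with `ToyLNet(local_nets={"tcp"}, routes={"o2ib": "10.0.0.1@tcp"}, peer_credits=1)`, a send to `192.168.1.5@o2ib` is charged against the gateway `10.0.0.1@tcp`, and a second send to the same next hop is queued until `credit_returned()` is called for it.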
Useful Documentation
- lustre-release/lnet: Top LNet directory
  - include
    - lnet: Internal includes
    - uapi: Includes used by user space and other kernel modules
  - klnds
    - gnilnd: Cray LNet Driver (LND). Developed and tested by Cray/HPE
    - o2iblnd: IB LND used by Mellanox and Intel OmniPath. It uses the Verbs API
      - Only uses IPoIB for connection establishment
    - socklnd: Socket LND used for Ethernet interfaces. It uses TCP/IP
  - lnet: LNet kernel source directory
  - selftest: LNet selftest tool. Generates RDMA traffic. Runs in the kernel
  - utils: User space tools including lnetctl and liblnetconfig
    - lnetctl is a CLI used to configure LNet
    - liblnetconfig is the library used by lnetctl to communicate with the LNet kernel module
- Multi-Rail Scope and Requirements Document
  - A good overview of LNet as a whole
Title | Creator | Modified |
---|---|---|
Multi-Rail Routing | Amir Shehata | Aug 03, 2020 |
LNet Multi-Rail Health | Amir Shehata | Aug 03, 2020 |
Presentations | Amir Shehata | Mar 18, 2020 |
User Defined Selection Policy (UDSP) | Amir Shehata | Nov 29, 2019 |
Multi-Rail Main Feature | Amir Shehata | Nov 29, 2019 |
Multi-Rail Dynamic Discovery | Amir Shehata | Mar 22, 2018 |
Presentations
Conference | Presentation | Video |
---|---|---|
LUG 2014 | ||
OFA 2016 | ||
LUG 2017 | ||
OFA 2017 | ||
LUG 2018 | ||
LUG 2018 | ||
LUG 2019 | ||
OFA 2020 | ||
Tasks
- LNet Router Testing
  - We need to expand our testing of LNet. The link above lists a set of routing tests. We need to write LUTF scripts for them.
  - Benefits:
    - Learn how to configure LNet routers
    - Learn how to use the LUTF
    - Learn how to test LNet
    - Learn the code
- LU-12041