- Correlation between job stats and LNet stats
- During the DDN sponsor presentation: http://cdn.opensfs.org/wp-content/uploads/2018/04/Buisson-Multitenancy_v1.0.pdf an interesting question came up on how to correlate job stats with LNet stats.
- Currently LNet stats are based on peers and networks, which doesn't give a clear view of what job is triggering what traffic. Some correlations can be surmised from client NIDs, but it might be an interesting task to see how we can create a clear correlation between job stats and LNet stats. I believe that would be a helpful way to understand the operation of the system.
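As a starting point for the NID-based correlation mentioned above, here is a minimal sketch. It assumes we can already collect per-job records that include the client NID, plus per-peer LNet byte counters; the record layout and field names (`job_id`, `client_nid`, `bytes`, `recv_bytes`) are hypothetical and do not reflect real Lustre or LNet output. It splits each peer's LNet traffic across jobs in proportion to the bytes each job reported from that client.

```python
from collections import defaultdict

# Hypothetical samples: field names are illustrative, not real Lustre output.
job_stats = [
    {"job_id": "sim.1001", "client_nid": "10.0.0.5@o2ib", "bytes": 4_000_000},
    {"job_id": "sim.1001", "client_nid": "10.0.0.6@o2ib", "bytes": 1_000_000},
    {"job_id": "viz.2002", "client_nid": "10.0.0.6@o2ib", "bytes": 3_000_000},
]

# Hypothetical per-peer LNet counters as seen from a server, keyed by peer NID.
lnet_peer_stats = {
    "10.0.0.5@o2ib": {"recv_bytes": 4_200_000},
    "10.0.0.6@o2ib": {"recv_bytes": 4_400_000},
}


def attribute_lnet_traffic(job_stats, lnet_peer_stats):
    """Split each peer's LNet bytes across jobs, proportional to job bytes."""
    # Total job-reported bytes per client NID.
    per_nid_total = defaultdict(int)
    for rec in job_stats:
        per_nid_total[rec["client_nid"]] += rec["bytes"]

    attributed = defaultdict(float)
    for rec in job_stats:
        nid = rec["client_nid"]
        peer = lnet_peer_stats.get(nid)
        if peer is None or per_nid_total[nid] == 0:
            continue  # no LNet counters for this client, nothing to attribute
        share = rec["bytes"] / per_nid_total[nid]
        attributed[rec["job_id"]] += share * peer["recv_bytes"]
    return dict(attributed)


if __name__ == "__main__":
    for job, nbytes in attribute_lnet_traffic(job_stats, lnet_peer_stats).items():
        print(f"{job}: ~{nbytes:.0f} LNet bytes")
```

This is only an estimate: when several jobs share a client, the split is a guess, which is exactly why a first-class correlation inside the stats would be more useful.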
- RDMA issues in mixed mlx4 with Lustre 2.5 + FDR and mlx5 + Lustre 2.10 + EDR clusters
- Talked to someone from ExxonMobil (http://corporate.exxonmobil.com/) who was trying to use Lustre 2.5 with FDR servers and Lustre 2.10 with EDR clients
- He was hitting ECONNABORTED errors. After some discussion I concluded that the LND tunables on their servers and clients are not compatible.
- Suggested that he can adjust the map-on-demand/concurrent peers value and pointed him to

- Their work is quite interesting:
- They collect seismic data.
- Ships cross the ocean dragging long cords behind them. These cords have many sensors attached to them.
- The boats emit a sound wave, which travels to the bottom of the ocean and then bounces off the ocean floor.
- The reflection is captured by the sensors on the cords.
- They have another parallel FS on the ships which is used to store the data collected by the sensors.
- This data is stored on tapes, which are flown back to their labs, where the data is transferred to the Lustre FS and analyzed.
- Router configuration at CEA. Ability to select routers based on policy. Add this to the UDSP project requirements.
- Had a discussion with Aurelien Degremont from CEA about how they use LNet routers
- Currently they have multiple clients/servers going through a set of routers. Not all routers provide an optimal path. To get around that, they configure their client sets under different o2iblnd networks, and they have to configure their servers on the same LNets
- They assign routers efficiently to the different networks. But this results in a complex configuration.
- They'd like to be able to configure all clients on one network and all servers on another network (different fabrics) and then use the same pool of routers.
- In order to do that they need to specify a set of policies for router selection.
- This is very similar to the work being done on the UDSP project and can easily be integrated.
- Will be adding that requirement to the project.
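To make the requirement concrete, here is a minimal sketch of what policy-based router selection could look like. This is not the UDSP design; the `RouterPolicy` type, the NID patterns, and the fallback behaviour are all assumptions made for illustration. The idea is that all clients sit on one LNet and all servers on another, and a policy maps a set of source NIDs to the routers that give them the best path.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class RouterPolicy:
    """Hypothetical policy: sources matching src_pattern prefer these routers."""
    src_pattern: str   # glob over source NIDs, e.g. "10.1.*@o2ib1"
    dst_net: str       # destination LNet, e.g. "o2ib2"
    routers: tuple     # preferred router NIDs, best first


def select_router(policies, src_nid, dst_net, alive_routers):
    """Return the first live router preferred by a matching policy.

    Falls back to any live router (lowest NID string) when no policy
    matches or all preferred routers are down; returns None if no
    router is alive at all.
    """
    for policy in policies:
        if policy.dst_net == dst_net and fnmatchcase(src_nid, policy.src_pattern):
            for router in policy.routers:
                if router in alive_routers:
                    return router
    return min(alive_routers) if alive_routers else None


# Example: two client groups on the same LNet share one router pool,
# but each group prefers the routers closest to it.
POLICIES = [
    RouterPolicy("10.1.*@o2ib1", "o2ib2", ("10.1.0.1@o2ib1", "10.1.0.2@o2ib1")),
    RouterPolicy("10.2.*@o2ib1", "o2ib2", ("10.2.0.1@o2ib1",)),
]
```

With something along these lines the clients no longer need to be split across several o2iblnd networks just to steer them toward the right routers.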
- Talked with Sebastien Buisson from DDN after his Multi-Tenancy presentation (linked above). When using MR for routing, the entire system needs to be configured with MR in order for it to work across routers.
- NASA wants to export a directory from a secure Lustre FS to the public (over the internet). Currently they do that by exporting the directory over NFS to another client which mounts a different Lustre FS.
- Is there a way to do it within Lustre securely?
- sub-directory mount
- but it's not secure in case of compromised root client.
- Use VLANs to lock the UID and GID (as mentioned in the Multi-tenancy presentation above)
- but that doesn't prevent the client from guessing a FID and reading a different insecure file.
- Using nodemaps to remap GID/UID
- Using root squash?
- According to JohnH there is some work needed on the nodemaps in order to close the hole above.
- Separating metadata traffic from Data on Metadata (DoM) traffic (from discussions with Bob Ciotti from NASA and later with Andreas)
- There is a new feature to write data on metadata.
- There is a concern that data on metadata traffic can interfere with metadata traffic
- A suggestion came up to use UDSP to redirect traffic
- problem with that is that MDT might not have multiple interfaces
- And the upper layer must communicate the type of traffic to LNet, which creates an unwanted tight coupling between Lustre and LNet
- The real ask is to prioritize one type of traffic over the other. QoS is the best solution for that
- The final solution should look like this:
- Create multiple QPs between peers
- Each QP could be created with different service level
- Will need to look at how we can set the service level directly. Currently we use the port which indirectly changes the service level.
- For more QoS info: Lustre QoS
- Use opensm/opafm to map the service level to a virtual lane.
- Assign each virtual lane a different priority.
- In order to use that solution the users of the LNet API must provide a priority to each request (or set of requests).
- Based on that priority LNet can find a QP with a correlated service level, or create a new one.
- With this solution we can add the ability for Lustre to prioritize different types of traffic or even prioritize traffic from different job IDs.
- When I discussed this with Andreas, he pointed me to NRS TBF: https://jira.whamcloud.com/secure/attachment/14201/Lustre%20NRS%20TBF%20documentation%200.1.pdf
- There is already a mechanism to rate limit RPC messages using jobid/NID etc.
- So to start with, the proposal is to use QoS to prioritize traffic based on the portal the traffic is being sent to, which would provide enough flexibility to assign different priorities to MDT traffic vs DoM traffic.
- I'll be adding this feature to the LNet project roadmap.
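A minimal sketch of how the portal-based proposal could hang together: the portal determines a priority, the priority maps to a service level, and LNet looks up (or creates) a QP to that peer with that service level. Everything here is hypothetical: the portal names, the priority and service level values, and the `QPCache` type are placeholders, and a real implementation would create verbs QPs rather than Python objects. The service level to virtual lane mapping itself would still live in opensm/opafm.

```python
from dataclasses import dataclass, field

# Hypothetical portal names and priorities (0 = most urgent).
PORTAL_PRIORITY = {"mdt_request": 0, "dom_io": 1}
DEFAULT_PRIORITY = 1

# Hypothetical priority -> service level table. The subnet manager
# would map each service level to a virtual lane with its own priority.
PRIORITY_TO_SL = {0: 4, 1: 0}


@dataclass
class QueuePair:
    """Stand-in for a real QP; only records what it was created for."""
    peer_nid: str
    service_level: int


@dataclass
class QPCache:
    """Find a QP with the matching service level, or create a new one."""
    qps: dict = field(default_factory=dict)

    def get_qp(self, peer_nid, portal):
        priority = PORTAL_PRIORITY.get(portal, DEFAULT_PRIORITY)
        service_level = PRIORITY_TO_SL[priority]
        key = (peer_nid, service_level)
        if key not in self.qps:
            # First request at this service level for this peer.
            self.qps[key] = QueuePair(peer_nid, service_level)
        return self.qps[key]


if __name__ == "__main__":
    cache = QPCache()
    print(cache.get_qp("10.0.0.1@o2ib", "mdt_request"))
    print(cache.get_qp("10.0.0.1@o2ib", "dom_io"))
```

Keying on the portal keeps Lustre from having to tell LNet what kind of traffic it is sending, which avoids the tight coupling concern raised above.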
- Talked with a couple of people from Monash and Swinburne about how they want to use LNet in wide area networks:
- Related presentation: http://cdn.opensfs.org/wp-content/uploads/2018/04/Rao-Wide_Area_Throughput_LNet_Routers_ORNL-1.pdf
- While listening to this presentation, I didn't get the impression they were using router buffers correctly
- Basically, they are trying to connect two parts of a cluster which are geographically dispersed, Australia and North America, over LNet.
- Mount a Lustre FS over a wide area network
- This is interesting work to follow to see the performance they get and what type of LNet features can be developed to enhance this use case.
- Talked with Bob Ciotti from NASA, who had an interesting use case that he wanted to see if they could use LNet for

- LNet routing is flexible on their setup, but currently it's only used for Lustre traffic
- He has clients which do IP traffic over IB and he was looking for a way to use LNet routers to route that traffic.
- Currently that's not supported, but does present an interesting use case for LNet outside of Lustre.
- One of the potential future projects is to detect multi-hop router failure.
- Was looking at the gossip protocol to do that, but Isaac pointed me to another more efficient protocol called SWIM, which they are going to use in DAOS.
- It's worth tracking this project to see how we can integrate it in LNet
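For reference, here is a rough sketch of the basic SWIM probe step: ping the target directly, and if that fails ask up to k other members to ping it on our behalf before suspecting it. This is a simplified toy model under my own assumptions (a `can_reach` callback standing in for real pings and timeouts); it is not how DAOS implements it and says nothing yet about how it would fit into LNet.

```python
import random


def swim_probe(prober, target, members, can_reach, k=3, rng=random):
    """One SWIM-style probe of `target` from `prober`.

    can_reach(a, b) is a hypothetical stand-in for "a pinged b and got
    an ack within the timeout". Returns "alive" or "suspect".
    """
    # Step 1: direct ping.
    if can_reach(prober, target):
        return "alive"

    # Step 2: indirect ping through up to k randomly chosen helpers.
    helpers = [m for m in members if m not in (prober, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        if can_reach(prober, helper) and can_reach(helper, target):
            return "alive"

    # Nobody could confirm the target: suspect it rather than declare it dead.
    return "suspect"


if __name__ == "__main__":
    members = ["client", "r1", "r2", "r3"]
    links = {("client", "r1"), ("client", "r3"), ("r1", "r2")}

    def can_reach(a, b):
        return (a, b) in links or (b, a) in links

    # "client" has no direct path to "r2" but "r1" does.
    print(swim_probe("client", "r2", members, can_reach))
```

The indirect step is what makes this interesting for the multi-hop case: a router we cannot reach directly may still be confirmed alive by a member that can.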
- Talked with Kevin Harms (ANL) regarding a question he asked me after the presentation on whether we're planning to support OFI as well as verbs
- As far as I know, the current plan is that OPA2 will only support verbs
- Kevin said that there were some stability issues with OPA using verbs in other, non-Lustre workloads.
- Again, this is worth keeping track of to understand what kind of issues these are and whether/how they will impact LNet.