  • Correlation between job stats and LNet stats
  • RDMA issues in mixed clusters: mlx4 + Lustre 2.5 + FDR alongside mlx5 + Lustre 2.10 + EDR
  • Router configuration at CEA: ability to select routers based on policy. Add this to the UDSP project requirements.
  • MR routing: the entire system needs to be configured with MR in order for it to work across routers.
    • I thought the selection algorithm should care about the next hop, so we need to look into that.
  • NASA wants to export a directory from a secure Lustre FS to the public (over the internet). Currently they do that by exporting the directory over NFS to another client, which mounts a different Lustre FS.
    • Is there a way to do this securely within Lustre?
    • Sub-directory mount,
    • but it is not secure if root on the client is compromised.
    • Use VLANs to lock the UID and GID,
    • but that doesn't prevent the client from guessing a FID and reading a different, insecure file.
    • Using nodemaps to remap GID/UID.
    • Using root squash.
    • According to JohnH, some work is needed on nodemaps in order to close the hole above.
  • Separating metadata traffic from Data-on-Metadata traffic (from discussions with Bob Ciotti of NASA)
    • There is a new feature to write data on the metadata target (Data-on-MDT).
    • There is a concern that data-on-metadata traffic can interfere with metadata traffic.
    • A suggestion came up to use UDSP to redirect traffic.
      • The problem with that is that the MDT might not have multiple interfaces.
    • The real ask is to prioritize one type of traffic over the other. QoS is the best solution for that.
    • The final solution should look like this:
      • Create multiple QPs (queue pairs) with the MDT.
      • Give each QP a different service level.
      • Use OpenSM to map each service level to a virtual lane.
      • Assign each virtual lane a different priority.
    • In order to use that solution, users of the LNet API must provide a priority with each request.
    • Based on that priority, LNet can find a QP with the corresponding service level.
    • With this solution we can add the ability for Lustre to prioritize different types of traffic, or even prioritize traffic from different job IDs.
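
The QP selection step in the QoS proposal above can be sketched roughly as follows. This is only an illustrative model in Python, not LNet code: the `Priority` values, the `QueuePair` type, the `PRIORITY_TO_SL` table, and `select_qp` are all hypothetical names, and the service-level numbers are arbitrary. It assumes one QP per service level already exists toward the MDT.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priorities a caller of the LNet API might attach to a request."""
    BULK_DATA = 0   # e.g. data-on-metadata I/O
    METADATA = 1    # e.g. metadata requests


@dataclass(frozen=True)
class QueuePair:
    """Stand-in for an InfiniBand QP created with a fixed service level."""
    qp_num: int
    service_level: int  # InfiniBand service levels range from 0 to 15


# Assumed policy table: which service level carries each priority.
PRIORITY_TO_SL = {
    Priority.BULK_DATA: 0,
    Priority.METADATA: 1,
}


def select_qp(qps, priority):
    """Return the first QP whose service level matches the request priority."""
    wanted_sl = PRIORITY_TO_SL[priority]
    for qp in qps:
        if qp.service_level == wanted_sl:
            return qp
    raise LookupError(f"no QP with service level {wanted_sl}")


# One QP per service level toward the same MDT peer (numbers are made up).
mdt_qps = [
    QueuePair(qp_num=100, service_level=0),
    QueuePair(qp_num=101, service_level=1),
]

metadata_qp = select_qp(mdt_qps, Priority.METADATA)
bulk_qp = select_qp(mdt_qps, Priority.BULK_DATA)
print(metadata_qp.qp_num, bulk_qp.qp_num)
```

In this sketch the code only picks a QP. Mapping each service level to a virtual lane and giving each virtual lane a priority would still be done in the subnet manager configuration, as described in the notes.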