IB QoS

In standard IB terms, VL stands for Virtual Lane and SL for Service Level.

The way to specify the VL for o2iblnd is:

  1. use the ko2iblnd 'service' option, which indirectly sets the IB SL, i.e. service level
  2. configure OpenSM, or whichever SM is in use, to map the o2iblnd SL to a VL

    This has been working for years. LLNL gave several talks at LUGs about how to do Lustre QoS on IB, which involves using SL->VL mappings and, in some cases, simply using the server IP to distinguish Lustre traffic.

    The o2iblnd SL is set by the OFED RDMA CM, indirectly based on the o2iblnd service port (set via ko2iblnd option 'service', 987 by default) and its port space (RDMA_PS_TCP).
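To make the relationship concrete, the following is a minimal userspace sketch, not Lustre or kernel code, of how a port space and a service port combine into a single service ID: the port space is shifted left by 16 bits and the port is added. The helper name `o2ib_service_id` is hypothetical; the `RDMA_PS_TCP` value (0x0106) and the default port 987 are the ones given on this page.

```c
#include <stdint.h>

/* Port-space value for RDMA_PS_TCP, as listed in enum rdma_port_space. */
#define RDMA_PS_TCP 0x0106u

/*
 * Hypothetical helper (assumption: not part of any kernel or Lustre API).
 * Combines a port space and a 16-bit service port into a service ID in
 * host byte order: (port space << 16) + port.
 */
static uint64_t o2ib_service_id(uint16_t port_space, uint16_t service_port)
{
    return ((uint64_t)port_space << 16) + service_port;
}
```

With the default port, `o2ib_service_id(RDMA_PS_TCP, 987)` evaluates to 0x010603DB, which is the value a subnet manager rule would need to match. Changing the 'service' option changes only the low 16 bits.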

    QoS configuration on Infiniband

      1. The service_id is calculated based on the port space and service port used as the destination port
        1. The service port when connecting to a peer:
          1. Code Block
            static void
            kiblnd_connect_peer(struct kib_peer *peer)
            {
            ...
                    memset(&dstaddr, 0, sizeof(dstaddr));
                    dstaddr.sin_family = AF_INET;
                    dstaddr.sin_port = htons(*kiblnd_tunables.kib_service);
                    dstaddr.sin_addr.s_addr = htonl(LNET_NIDADDR(peer->ibp_nid));

                    kiblnd_peer_addref(peer);       /* cmid's ref */

                    if (*kiblnd_tunables.kib_use_priv_port) {
                            rc = kiblnd_resolve_addr(cmid, &srcaddr, &dstaddr,
                                                     *kiblnd_tunables.kib_timeout * 1000);
                    } else {
                            rc = rdma_resolve_addr(cmid,
                                                   (struct sockaddr *)&srcaddr,
                                                   (struct sockaddr *)&dstaddr,
                                                   *kiblnd_tunables.kib_timeout * 1000);
                    }
            ...
            }
        2. Once the address has been resolved successfully, rdma_resolve_route() is called
          1. Code Block
            rdma_resolve_route() -> cma_resolve_ib_route() -> cma_query_ib_route()
          2. Code Block
            path_rec.service_id = rdma_get_service_id(&id_priv->id,
                                                      cma_dst_addr(id_priv));

            __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
            {
                    if (addr->sa_family == AF_IB)
                            return ((struct sockaddr_ib *) addr)->sib_sid;

                    return cpu_to_be64(((u64)id->ps << 16) + be16_to_cpu(cma_port(addr)));
            }
            EXPORT_SYMBOL(rdma_get_service_id);

            /*
             * The port space is: RDMA_PS_TCP
             */
            enum rdma_port_space {
                    RDMA_PS_SDP   = 0x0001,
                    RDMA_PS_IPOIB = 0x0002,
                    RDMA_PS_IB    = 0x013F,
                    RDMA_PS_TCP   = 0x0106,
                    RDMA_PS_UDP   = 0x0111,
            };

            /* The service ID == 0x010603DB */
    1. Configure OpenSM to match the service ID with a specific QoS policy. From the OpenSM tree: doc/QoS_management_in_OpenSM.txt
      1. Code Block
        # in /etc/opensm/qos-policy.conf
         
            qos-levels
        
                # Having a QoS Level named "DEFAULT" is a must - it is applied to
                # PR/MPR requests that didn't match any of the matching rules.
                qos-level
                    name: DEFAULT
                    use: default QoS Level
                    sl: 0
                end-qos-level
        
                # the whole set: SL, MTU-Limit, Rate-Limit, PKey, Packet Lifetime
                qos-level
                    name: LustreTraffic
                    sl: 1
                end-qos-level
        
            end-qos-levels
        
        
            # Match rules are scanned in order of their appearance in the policy file.
            # First matched rule takes precedence.
            qos-match-rules
        
                # match by service id (the o2iblnd service ID for port 987)
                qos-match-rule
                    service-id: 0x010603db
                    qos-level-name: LustreTraffic
                end-qos-match-rule
        
            end-qos-match-rules
        
        

        The qos-policy above assigns any connection with service-id 0x010603db to service level 1.

        1. This has been verified by printing out the service_id. Note it's being printed in network byte order:
          1. Code Block
            (o2iblnd_cb.c:851:kiblnd_post_tx_locked())
                192.168.2.2@o2ib: cmid service_id = 0x010603db,
                sl = 0 ps = 0x106 mtu = 5 rate = 16
    2. The SL2VL and VLArb tables should be configured in: /etc/opensm/opensm.conf. Refer to the SL2VL Mapping and VL Arbitration section in doc/QoS_management_in_OpenSM.txt for more details.
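The two configuration stages described in this section, matching a service ID to a QoS level and then mapping the resulting SL to a VL, can be modelled as two small lookups. The sketch below is only a conceptual model under stated assumptions: the struct and function names are hypothetical, the table contents are illustrative rather than taken from a real opensm.conf, and it does not reflect how OpenSM is implemented internally.

```c
#include <stddef.h>
#include <stdint.h>

#define IB_NUM_SL 16    /* InfiniBand defines service levels 0..15 */

/* Hypothetical model of one qos-match-rule: a service ID and the SL it selects. */
struct qos_rule {
    uint64_t service_id;
    uint8_t sl;
};

/* Scan rules in order; the first match wins, otherwise use the DEFAULT level's SL. */
static uint8_t match_sl(const struct qos_rule *rules, size_t n,
                        uint64_t service_id, uint8_t default_sl)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].service_id == service_id)
            return rules[i].sl;
    }
    return default_sl;
}

/*
 * Illustrative SL-to-VL table (assumption, not a recommended setting):
 * SL 1 is mapped to VL 1, every other SL shares VL 0.
 */
static const uint8_t example_sl2vl[IB_NUM_SL] = { 0, 1 };

/* Return the VL for a given SL, or -1 if the SL is out of range. */
static int sl_to_vl(const uint8_t *table, unsigned int sl)
{
    if (sl >= IB_NUM_SL)
        return -1;
    return table[sl];
}
```

In this model, a rule list containing `{ 0x010603db, 1 }` sends Lustre connections to SL 1 and, through the example table, to VL 1, while unmatched traffic stays on SL 0 and VL 0. Actual bandwidth sharing between VLs is then governed by the VLArb tables, which this sketch does not cover.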

    OPA QoS

    OPA provides similar functionality to what is described above, but it is configured through the fabric manager, opa-fm.

    For more details refer to: https://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_OP_FabricSuite_Fabric_Manager_UG_H76468_v1_0.pdf section 4.

    Overview

    Configuring opa-fm is done through /etc/sysconfig/opafm.xml.

    The configuration of vFabrics consists of the following sections:

    1. Applications - describes applications that can run on one or more end nodes
    2. DeviceGroups - describes a set of end nodes in the fabric
    3. VirtualFabrics - defines a vFabric consisting of a group of applications, a set of devices, and the operating parameters for the vFabric

    Traffic can be matched to a specific application via the Service-ID.

    The application is defined within a Virtual Fabric.

    A QoS policy can be defined within a Virtual Fabric.

    Example

    Code Block
    <VirtualFabric>
      <Name>Storage</Name>
      <Enable>1</Enable>
      <Security>1</Security>
      <QOS>1</QOS>
      <Bandwidth>20%</Bandwidth>
      <!-- <Member>group_with_storage_targets</Member> -->
      <LimitedMember>All</LimitedMember>
      <Application>Storage</Application>
    </VirtualFabric>

    <Application>
      <Name>Storage</Name>
      <IncludeApplication>iSER</IncludeApplication>
      <IncludeApplication>Lustre</IncludeApplication>
    </Application>

    <Application>
      <Name>Lustre</Name>
      <!-- ServiceID = 0x000000000106XXXX where XXXX is socket "port number" -->
      <!-- <ServiceIDRange>0x0000000001060000-0x000000000106FFFF</ServiceIDRange> -->
      <!-- Default port 987 = 0x03DB -->
      <ServiceID>0x00000000010603DB</ServiceID>
    </Application>
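The comments in the Lustre application entry describe the service ID layout: a fixed 0x0106 prefix followed by the 16-bit port. As a rough sketch of that layout, the helpers below check whether a service ID falls inside the commented-out ServiceIDRange and extract the port from it. The helper and macro names are hypothetical and are not part of opa-fm or Lustre.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds copied from the commented-out <ServiceIDRange> in the example. */
#define LUSTRE_SID_FIRST 0x0000000001060000ULL
#define LUSTRE_SID_LAST  0x000000000106FFFFULL

/* Hypothetical helper: true if sid lies within the 0x0106XXXX range. */
static bool sid_in_lustre_range(uint64_t sid)
{
    return sid >= LUSTRE_SID_FIRST && sid <= LUSTRE_SID_LAST;
}

/* Hypothetical helper: the low 16 bits of the service ID carry the port. */
static uint16_t sid_port(uint64_t sid)
{
    return (uint16_t)(sid & 0xFFFF);
}
```

For the ServiceID in the example, 0x00000000010603DB, the range check succeeds and the extracted port is 987. A site that runs o2iblnd on a non-default port would either list that specific ServiceID or enable the range instead.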
    
    

    Here is a sample opafm.xml file

    QoS for RoCE

    https://community.mellanox.com/docs/DOC-2894

    Resources

    ...