IB QoS

In standard IB terms, VL stands for Virtual Lane and SL for Service Level.

o2iblnd traffic is assigned to a VL indirectly, through its SL, in three steps:

  1. use the ko2iblnd 'service' option, which indirectly sets the IB SL (service level)
    1. The service_id is calculated based on the port space and service port used as the destination port
      1. The service port when connecting to a peer:
          static void
          kiblnd_connect_peer(struct kib_peer *peer)
          {
          ...
                  memset(&dstaddr, 0, sizeof(dstaddr));
                  dstaddr.sin_family = AF_INET;
                  dstaddr.sin_port = htons(*kiblnd_tunables.kib_service);
                  dstaddr.sin_addr.s_addr = htonl(LNET_NIDADDR(peer->ibp_nid));

                  kiblnd_peer_addref(peer);        /* cmid's ref */

                  if (*kiblnd_tunables.kib_use_priv_port) {
                          rc = kiblnd_resolve_addr(cmid, &srcaddr, &dstaddr,
                                                   *kiblnd_tunables.kib_timeout * 1000);
                  } else {
                          rc = rdma_resolve_addr(cmid,
                                                 (struct sockaddr *)&srcaddr,
                                                 (struct sockaddr *)&dstaddr,
                                                 *kiblnd_tunables.kib_timeout * 1000);
                  }
          ...
          }
      2. When the address is resolved successfully then rdma_resolve_route() is called
        1. rdma_resolve_route() -> cma_resolve_ib_route() -> cma_query_ib_route()
        2. cma_query_ib_route() fills in the path record's service ID:

          path_rec.service_id = rdma_get_service_id(&id_priv->id,
                                                    cma_dst_addr(id_priv));

          __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
          {
                  if (addr->sa_family == AF_IB)
                          return ((struct sockaddr_ib *) addr)->sib_sid;

                  return cpu_to_be64(((u64)id->ps << 16) + be16_to_cpu(cma_port(addr)));
          }
          EXPORT_SYMBOL(rdma_get_service_id);

          /*
           * The port space is: RDMA_PS_TCP
           */
          enum rdma_port_space {
                  RDMA_PS_SDP   = 0x0001,
                  RDMA_PS_IPOIB = 0x0002,
                  RDMA_PS_IB    = 0x013F,
                  RDMA_PS_TCP   = 0x0106,
                  RDMA_PS_UDP   = 0x0111,
          };

          /*
           * With RDMA_PS_TCP (0x0106) and the default service port 987 (0x03DB):
           * service ID == (0x0106 << 16) + 0x03DB == 0x010603DB
           */
  2. configure OpenSM to match the service ID with a specific QoS policy. See doc/QoS_management_in_OpenSM.txt in the OpenSM tree.
    1. In /etc/opensm/qos-policy.conf:
       
          qos-levels
      
              # Having a QoS Level named "DEFAULT" is a must - it is applied to
              # PR/MPR requests that didn't match any of the matching rules.
              qos-level
                  name: DEFAULT
                  use: default QoS Level
                  sl: 0
              end-qos-level
      
              # the whole set: SL, MTU-Limit, Rate-Limit, PKey, Packet Lifetime
              qos-level
                  name: LustreTraffic
                  sl: 1
              end-qos-level
      
          end-qos-levels
      
      
          # Match rules are scanned in order of their appearance in the policy file.
          # First matched rule takes precedence.
          qos-match-rules
      
              # match on service id only
              qos-match-rule
                  service-id: 0x010603db
                  qos-level-name: LustreTraffic
              end-qos-match-rule
      
          end-qos-match-rules
      
      

      The policy above maps any connection with service-id 0x010603db to service level 1.

      1. This was verified by printing the service_id from kiblnd_post_tx_locked(). Note that it is printed in network byte order:

          (o2iblnd_cb.c:851:kiblnd_post_tx_locked())
          192.168.2.2@o2ib: cmid service_id = 0x010603db,
          sl = 0 ps = 0x106 mtu = 5 rate = 16
  3. configure the SL2VL and VLArb tables in /etc/opensm/opensm.conf. Refer to the SL2VL Mapping and VL Arbitration section in doc/QoS_management_in_OpenSM.txt for more details.
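
The service ID arithmetic from step 1 can be reproduced in user space, which is handy when the ko2iblnd 'service' option is changed from its default and the service-id in qos-policy.conf has to follow. The sketch below is only an illustration: o2ib_service_id() is a hypothetical helper, not a kernel or Lustre function, and it returns the value in host byte order (the kernel version additionally applies cpu_to_be64()).

```c
#include <assert.h>
#include <stdint.h>

/* Port space value copied from enum rdma_port_space in the excerpt above. */
#define RDMA_PS_TCP 0x0106

/*
 * Hypothetical helper (not part of the kernel or Lustre). It mirrors the
 * non-AF_IB branch of rdma_get_service_id(): the port space goes into
 * bits 16..31 and the destination port into bits 0..15.
 */
static uint64_t o2ib_service_id(unsigned int port_space, int service)
{
	/* Both fields are 16 bits wide; reject anything that would overflow. */
	assert(port_space <= 0xFFFF);
	assert(service > 0 && service <= 0xFFFF);

	return ((uint64_t)port_space << 16) + (uint64_t)service;
}
```

Calling o2ib_service_id(RDMA_PS_TCP, 987) yields 0x010603DB, the value used in the match rule above.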

OPA QoS

OPA provides similar functionality to what is described above, but it is configured through the fabric manager (opa-fm).

For more details refer to: https://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_OP_FabricSuite_Fabric_Manager_UG_H76468_v1_0.pdf section 4.

Overview

Configuring opa-fm is done through /etc/sysconfig/opa-fm.xml.

The configuration of vFabrics consists of the following sections:

  1. Applications - describes applications that can run on one or more end nodes
  2. DeviceGroups - describes a set of end nodes in the fabric
  3. VirtualFabrics - defines a vFabric consisting of a group of applications, a set of devices, and the operating parameters for the vFabric

Traffic can be matched to a specific application via the Service-ID.

The application is then referenced from a Virtual Fabric.

A QoS policy can be defined within a Virtual Fabric.

Example

<VirtualFabric>
<Name>Storage</Name>
<Enable>1</Enable>
<Security>1</Security>
<QOS>1</QOS>
<Bandwidth>20%</Bandwidth>
<!--  <Member>group_with_storage_targets</Member>  -->
<LimitedMember>All</LimitedMember>
<Application>Storage</Application>
</VirtualFabric>

<Application>
<Name>Storage</Name>
<IncludeApplication>iSER</IncludeApplication>
<IncludeApplication>Lustre</IncludeApplication>
</Application>

<Application>
<Name>Lustre</Name>
<!-- ServiceID = 0x000000000106XXXX where XXXX is the socket "port number" -->
<!-- <ServiceIDRange>0x0000000001060000-0x000000000106FFFF</ServiceIDRange> -->
<!-- Default port 987 = 0x03DB -->
<ServiceID>0x00000000010603DB</ServiceID>
</Application>
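
To check that a ServiceID written into the Lustre application really corresponds to the intended port, the 64-bit value can be split back into its port space and port. The following is a minimal sketch under the layout described in the comments above (0x000000000106XXXX); all three helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Bounds of the commented-out <ServiceIDRange> in the example above. */
#define LUSTRE_SID_FIRST UINT64_C(0x0000000001060000)
#define LUSTRE_SID_LAST  UINT64_C(0x000000000106FFFF)

/* Hypothetical helper: nonzero if sid falls inside the range above. */
static int sid_in_lustre_range(uint64_t sid)
{
	return sid >= LUSTRE_SID_FIRST && sid <= LUSTRE_SID_LAST;
}

/* Hypothetical helper: the socket port is the low 16 bits of the ID. */
static uint16_t sid_to_port(uint64_t sid)
{
	return (uint16_t)(sid & 0xFFFF);
}

/* Hypothetical helper: the port space is the next 16 bits up. */
static uint16_t sid_to_port_space(uint64_t sid)
{
	return (uint16_t)((sid >> 16) & 0xFFFF);
}
```

For the ServiceID in the example, 0x00000000010603DB, sid_to_port() gives 987 and sid_to_port_space() gives 0x0106. When printing such a value for the XML file, a format like "0x%016" PRIX64 from <inttypes.h> produces the 16-digit form used above.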

Here is a sample opafm.xml file

QoS for RoCE

https://community.mellanox.com/docs/DOC-2894

Resources