IB QoS

In standard IB terms, VL stands for Virtual Lane and SL for Service Level.

...

  1. Use the ko2iblnd 'service' option, which indirectly sets the IB SL (service level)
    1. The service_id is calculated based on the port space and service port used as the destination port
      1. The service port when connecting to a peer:
        1. Code Block
          static void
          kiblnd_connect_peer(struct kib_peer *peer)
          {
          ...
                  memset(&dstaddr, 0, sizeof(dstaddr));
                  dstaddr.sin_family = AF_INET;
                  dstaddr.sin_port = htons(*kiblnd_tunables.kib_service);
                  dstaddr.sin_addr.s_addr = htonl(LNET_NIDADDR(peer->ibp_nid));

                  kiblnd_peer_addref(peer);       /* cmid's ref */

                  if (*kiblnd_tunables.kib_use_priv_port) {
                          rc = kiblnd_resolve_addr(cmid, &srcaddr, &dstaddr,
                                                   *kiblnd_tunables.kib_timeout * 1000);
                  } else {
                          rc = rdma_resolve_addr(cmid,
                                                 (struct sockaddr *)&srcaddr,
                                                 (struct sockaddr *)&dstaddr,
                                                 *kiblnd_tunables.kib_timeout * 1000);
                  }
          ...
          }
      2. Once the address has been resolved successfully, rdma_resolve_route() is called:
        1. Code Block
          rdma_resolve_route() -> cma_resolve_ib_route() -> cma_query_ib_route()
        2. Code Block
          path_rec.service_id = rdma_get_service_id(&id_priv->id,
                                                    cma_dst_addr(id_priv));

          __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
          {
                  if (addr->sa_family == AF_IB)
                          return ((struct sockaddr_ib *) addr)->sib_sid;

                  return cpu_to_be64(((u64)id->ps << 16) + be16_to_cpu(cma_port(addr)));
          }
          EXPORT_SYMBOL(rdma_get_service_id);

          /*
           * The port space is: RDMA_PS_TCP
           */
          enum rdma_port_space {
                  RDMA_PS_SDP   = 0x0001,
                  RDMA_PS_IPOIB = 0x0002,
                  RDMA_PS_IB    = 0x013F,
                  RDMA_PS_TCP   = 0x0106,
                  RDMA_PS_UDP   = 0x0111,
          };
           
          /* The service ID == 0x010603DB */
  2. Configure OpenSM to match the service ID with a specific QoS policy. From the OpenSM tree: doc/QoS_management_in_OpenSM.txt
    1. Code Block
      # in /etc/opensm/qos-policy.conf
       
          qos-levels
      
              # Having a QoS Level named "DEFAULT" is a must - it is applied to
              # PR/MPR requests that didn't match any of the matching rules.
              qos-level
                  name: DEFAULT
                  use: default QoS Level
                  sl: 0
              end-qos-level
      
              # the whole set: SL, MTU-Limit, Rate-Limit, PKey, Packet Lifetime
              qos-level
                  name: LustreTraffic
                  sl: 1
              end-qos-level
      
          end-qos-levels
      
      
          # Match rules are scanned in order of their appearance in the policy file.
          # First matched rule takes precedence.
          qos-match-rules
      
              # show matching by destination group and service id
              qos-match-rule
                  service-id: 0x010603DB
                  qos-level-name: LustreTraffic
              end-qos-match-rule
      
          end-qos-match-rules
      
      

      The above qos-policy maps any connection with service-id 0x010603DB to service level 1.

      1. This has been verified by printing out the service_id. Note it's being printed in network byte order:
        1. Code Block
          (o2iblnd_cb.c:851:kiblnd_post_tx_locked())
              192.168.2.2@o2ib: cmid service_id = 0x010603db,
              sl = 0 ps = 0x106 mtu = 5 rate = 16
  3. The SL2VL and VLArb tables should be configured in: /etc/opensm/opensm.conf. Refer to the SL2VL Mapping and VL Arbitration section in doc/QoS_management_in_OpenSM.txt for more details.
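
The service ID derivation from step 1 can be sketched in userspace C. This is a minimal sketch, not code from Lustre or the kernel: the helper names service_id_from_port() and service_id_to_wire() are hypothetical, RDMA_PS_TCP is copied from the enum shown above, and 987 is assumed to be the ko2iblnd service port in use.

```c
#include <assert.h>
#include <stdint.h>

/* Port space value copied from enum rdma_port_space. */
#define RDMA_PS_TCP 0x0106u

/* Assumed ko2iblnd 'service' port (987 == 0x03DB). */
#define LUSTRE_SERVICE_PORT 987u

/* Compile-time check that the assumed inputs give the ID used in the policy. */
static_assert((((uint64_t)RDMA_PS_TCP << 16) + LUSTRE_SERVICE_PORT) ==
              0x010603DBULL, "unexpected service ID");

/*
 * Hypothetical helper: host-order equivalent of the non-AF_IB branch of
 * rdma_get_service_id(), i.e. (port space << 16) + destination port.
 */
static uint64_t service_id_from_port(uint16_t port_space, uint16_t port)
{
        return ((uint64_t)port_space << 16) + port;
}

/*
 * Hypothetical helper: write the service ID in network (big-endian) byte
 * order, most significant byte first, independent of host endianness.
 */
static void service_id_to_wire(uint64_t service_id, uint8_t out[8])
{
        for (int i = 0; i < 8; i++)
                out[i] = (uint8_t)(service_id >> (56 - 8 * i));
}
```

With port space 0x0106 and port 987 (0x03DB), service_id_from_port() yields 0x010603DB, the value matched by the qos-match-rule. In network byte order the eight bytes are 00 00 00 00 01 06 03 DB.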

OPA QoS

OPA provides functionality similar to what is described above, but it is configured through opa-fm.

For more details refer to: https://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_OP_FabricSuite_Fabric_Manager_UG_H76468_v1_0.pdf section 4.

Overview

Configuring opa-fm is done through /etc/sysconfig/opa-fm.xml.

The configuration of vFabrics consists of the following sections:

  1. Applications - describes applications that can run on one or more end nodes
  2. DeviceGroups - describes a set of end nodes in the fabric
  3. VirtualFabrics - defines a vFabric consisting of a group of applications, a set of devices, and the operating parameters for the vFabric

Traffic can be matched to a specific application via the Service-ID.

The application is defined within a Virtual Fabric.

A QoS policy can be defined within a Virtual Fabric.

Example

Code Block
<VirtualFabric>
  <Name>Storage</Name>
  <Enable>1</Enable>
  <Security>1</Security>
  <QOS>1</QOS>
  <Bandwidth>20%</Bandwidth>
  <!-- <Member>group_with_storage_targets</Member> -->
  <LimitedMember>All</LimitedMember>
  <Application>Storage</Application>
</VirtualFabric>

<Application>
  <Name>Storage</Name>
  <IncludeApplication>iSER</IncludeApplication>
  <IncludeApplication>Lustre</IncludeApplication>
</Application>

<Application>
  <Name>Lustre</Name>
  <!-- ServiceID = 0x000000000106XXXX where XXXX is socket "port number" -->
  <!-- <ServiceIDRange>0x0000000001060000-0x000000000106FFFF</ServiceIDRange> -->
  <!-- Default port 987 = 0x03DB -->
  <ServiceID>0x00000000010603DB</ServiceID>
</Application>
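
The relationship between the ko2iblnd service port and the ServiceID and ServiceIDRange values in the Lustre application can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, and the range bounds are taken from the commented-out ServiceIDRange element in the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bounds taken from the commented-out ServiceIDRange in the example. */
#define LUSTRE_SID_RANGE_START 0x0000000001060000ULL
#define LUSTRE_SID_RANGE_END   0x000000000106FFFFULL

/* Port 987 (0x03DB) must map to the ServiceID used in the example. */
static_assert(LUSTRE_SID_RANGE_START + 987 == 0x00000000010603DBULL,
              "unexpected ServiceID for port 987");

/* Hypothetical helper: true if sid has the form 0x000000000106XXXX. */
static bool sid_in_lustre_range(uint64_t sid)
{
        return sid >= LUSTRE_SID_RANGE_START && sid <= LUSTRE_SID_RANGE_END;
}

/*
 * Hypothetical helper: format the ServiceID for a given service port as a
 * 16-digit hex string. Returns the snprintf() result, which is the number
 * of characters needed excluding the terminating NUL.
 */
static int format_lustre_service_id(uint16_t port, char *buf, size_t len)
{
        uint64_t sid = LUSTRE_SID_RANGE_START + port;

        return snprintf(buf, len, "0x%016" PRIX64, sid);
}
```

If the ko2iblnd service port is changed from 987, the ServiceID element would need to change with it. Enabling the ServiceIDRange element instead would cover every port in the 0x0106 port space.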

Here is a sample opafm.xml file

QoS for RoCE

https://community.mellanox.com/docs/DOC-2894

Resources