I've been investigating how opafm.xml should be configured in order to direct Lustre traffic to the appropriate Virtual Fabric.

The final opafm.xml config should look like:

    <Application>
      <Name>LNet</Name>
      <ServiceID>0x00000000010603DB</ServiceID>
    </Application>

    <Application>
      <Name>Lustre</Name>
      <IncludeApplication>LNet</IncludeApplication>
    </Application>

    <VirtualFabric>
      <Name>Lustre</Name>
      <Enable>1</Enable>
      <PKey>0x0001</PKey>
      <Security>1</Security>
      <QOS>1</QOS>
      <Bandwidth>20%</Bandwidth>
      <Member>All</Member>
      <Application>Lustre</Application>
      <BaseSL>2</BaseSL>
      <Standby>0</Standby>
    </VirtualFabric>

The above configuration ties the LNet ServiceID to the Lustre Virtual Fabric, so path queries carrying that ServiceID can be matched against it.
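As a rough sanity check, the ServiceID value can be reproduced from the (ps << 16) + port formula used by rdma_get_service_id() in the kernel code on this page. This is only a sketch: the port space value (RDMA_PS_TCP = 0x0106) and the LNet acceptor port (987) are my assumptions taken from the upstream kernel and Lustre defaults, they are not stated anywhere on this page, and the helper name is made up.

```python
# Assumptions (not from this page): o2iblnd creates its CM IDs in the
# RDMA_PS_TCP port space and LNet listens on its default acceptor port.
RDMA_PS_TCP = 0x0106
LNET_ACCEPTOR_PORT = 987  # 0x03DB


def rdma_service_id(port_space: int, port: int) -> int:
    """Host-order equivalent of ((u64)id->ps << 16) + port."""
    return (port_space << 16) + port


sid = rdma_service_id(RDMA_PS_TCP, LNET_ACCEPTOR_PORT)

# Format the value the way it is written in the <ServiceID> element.
service_id_xml = f"0x{sid:016X}"
print(service_id_xml)
```

If LNet is configured with a different acceptor port, the low 16 bits change and the <ServiceID> in opafm.xml would presumably need to change with them.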

However, for the Virtual Fabric to be matched, its PKey must also match the PKey specified in the SA path record query. The kernel code below shows how that query is built:

__be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
{
	if (addr->sa_family == AF_IB)
		return ((struct sockaddr_ib *) addr)->sib_sid;

	return cpu_to_be64(((u64)id->ps << 16) + be16_to_cpu(cma_port(addr)));
}
EXPORT_SYMBOL(rdma_get_service_id);

static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
{
	return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9];
}

static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
			      struct cma_work *work)
{
	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
	struct ib_sa_path_rec path_rec;
	ib_sa_comp_mask comp_mask;
	struct sockaddr_in6 *sin6;
	struct sockaddr_ib *sib;

	memset(&path_rec, 0, sizeof path_rec);
	rdma_addr_get_sgid(dev_addr, &path_rec.sgid);
	rdma_addr_get_dgid(dev_addr, &path_rec.dgid);
	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
	path_rec.numb_path = 1;
	path_rec.reversible = 1;
	path_rec.service_id = rdma_get_service_id(&id_priv->id, cma_dst_addr(id_priv));

	comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
		    IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
		    IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID;

	switch (cma_family(id_priv)) {
	case AF_INET:
		path_rec.qos_class = cpu_to_be16((u16) id_priv->tos);
		comp_mask |= IB_SA_PATH_REC_QOS_CLASS;
		break;
	case AF_INET6:
		sin6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
		path_rec.traffic_class = (u8) (be32_to_cpu(sin6->sin6_flowinfo) >> 20);
		comp_mask |= IB_SA_PATH_REC_TRAFFIC_CLASS;
		break;
	case AF_IB:
		sib = (struct sockaddr_ib *) cma_src_addr(id_priv);
		path_rec.traffic_class = (u8) (be32_to_cpu(sib->sib_flowinfo) >> 20);
		comp_mask |= IB_SA_PATH_REC_TRAFFIC_CLASS;
		break;
	}

	id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device,
					       id_priv->id.port_num, &path_rec,
					       comp_mask, timeout_ms,
					       GFP_KERNEL, cma_query_handler,
					       work, &id_priv->query);

	return (id_priv->query_id < 0) ? id_priv->query_id : 0;
}
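To make the PKey part concrete: ib_addr_get_pkey() simply reads bytes 8 and 9 of the IPoIB broadcast hardware address, so the PKey placed in the query is whatever the IPoIB interface was created with. Below is a small sketch of the same extraction. The sample address is a typical IPoIB broadcast address that I made up for illustration, not one captured from the systems discussed here, and parse_hwaddr is a hypothetical helper.

```python
def parse_hwaddr(text: str) -> bytes:
    """Turn a colon-separated hex string into raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def ib_addr_get_pkey(broadcast: bytes) -> int:
    """Same arithmetic as the kernel helper: bytes 8 and 9, big-endian."""
    return (broadcast[8] << 8) | broadcast[9]


# Illustrative 20-byte IPoIB broadcast address (assumed, not from this page).
sample_brd = "00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff"

pkey = ib_addr_get_pkey(parse_hwaddr(sample_brd))
print(hex(pkey))
```

On a live node the broadcast address should be visible in the "brd" field of `ip link show` for the IPoIB interface, which gives a quick way to see which PKey the query will carry.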

opensm behaves differently: it only matches against the PKey if one is explicitly defined in the qos-config.conf file.

opafm, by contrast, generates a PKey if the <Security> field is set to 1. The generated PKey is unlikely to match the PKey in the query, so the match fails.

According to Renae Weber (renae.weber@intel.com):

If ipoib is using the Default VF and you do not specify a PKey for your Lustre VF,
it will use the same PKey* – which would seem to be consistent with what opensm is doing**.
It is when you enable Security in the Lustre VF, that it has an inconsistent PKey
(in the example below, it is 0x2) and the failure occurs.

*AMIR: As long as <Security>0</Security>
**AMIR: I'm not sure if that's 100% accurate. I think the behavior would still be different.
opafm.xml seems to require more work to configure correctly in that regard.

    <!-- A default VF with all Devices and Applications -->
    <!-- This is the default Virtual Fabric config -->
    <VirtualFabric>
      <Name>Default</Name>
      <Enable>1</Enable>
      <Security>0</Security>
      <QOS>0</QOS>
      <Member>All</Member>
      <Application>AllOthers</Application>
      <MaxMTU>Unlimited</MaxMTU>
      <MaxRate>Unlimited</MaxRate>
    </VirtualFabric>

    <VirtualFabric>
      <Name>Lustre</Name>
      <Enable>1</Enable>
      <Security>0</Security>
      <QOS>1</QOS>
      <Bandwidth>20%</Bandwidth>
      <Member>All</Member>
      <Application>Lustre</Application>
      <BaseSL>2</BaseSL>
      <Standby>0</Standby>
    </VirtualFabric>

[root@phwtpriv21 ~]# opasaquery -ovfinfo
vFabric Index: 0   Name: Default 
ServiceId: 0x0000000000000000  MGID: 0x0000000000000000:0x0000000000000000
PKey: 0x8001   SL: 0  Select: 0x0   PktLifeTimeMult: unspecified 
MaxMtu: unlimited  MaxRate: unlimited   Options: 0x00 
QOS: Disabled  PreemptionRank: 0  HoQLife:    8 ms
-------------------------------------------------------------------------------
vFabric Index: 1   Name: Lustre 
ServiceId: 0x0000000000000000  MGID: 0x0000000000000000:0x0000000000000000
PKey: 0x8001   SL: 2  Select: 0x2: SL   PktLifeTimeMult: unspecified 
MaxMtu: unlimited  MaxRate: unlimited   Options: 0x02: QoS 
QOS: Bandwidth:  20%  PreemptionRank: 0  HoQLife:    8 ms
-------------------------------------------------------------------------------
vFabric Index: 2   Name: Admin 
ServiceId: 0x0000000000000000  MGID: 0x0000000000000000:0x0000000000000000
PKey: 0x7fff   SL: 0  Select: 0x1: PKEY   PktLifeTimeMult: unspecified 
MaxMtu: unlimited  MaxRate: unlimited   Options: 0x01: Security 
QOS: Disabled  PreemptionRank: 0  HoQLife:    8 ms
-------------------------------------------------------------------------------
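A quick way to check this kind of output is to pull the name and PKey of each vFabric out of the `opasaquery -ovfinfo` text and compare them. The sketch below does that on an abbreviated copy of the output above (only the lines the parser needs). The function name is mine, and masking with 0x7FFF to drop the membership bit before comparing is my assumption about what counts as "the same PKey"; I have not confirmed that opafm compares PKeys this way.

```python
import re

# Abbreviated copy of the opasaquery output shown above.
SAMPLE = """\
vFabric Index: 0   Name: Default
PKey: 0x8001   SL: 0  Select: 0x0   PktLifeTimeMult: unspecified
vFabric Index: 1   Name: Lustre
PKey: 0x8001   SL: 2  Select: 0x2: SL   PktLifeTimeMult: unspecified
vFabric Index: 2   Name: Admin
PKey: 0x7fff   SL: 0  Select: 0x1: PKEY   PktLifeTimeMult: unspecified
"""


def parse_vfinfo(text: str) -> dict:
    """Map each vFabric name to its PKey as an integer."""
    pkeys = {}
    name = None
    for line in text.splitlines():
        m = re.match(r"vFabric Index:\s*\d+\s+Name:\s*(\S+)", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"PKey:\s*(0x[0-9a-fA-F]+)", line)
        if m and name is not None:
            pkeys[name] = int(m.group(1), 16)
    return pkeys


def base_pkey(pkey: int) -> int:
    """Drop the top (membership) bit, keeping the 15-bit base value."""
    return pkey & 0x7FFF


pkeys = parse_vfinfo(SAMPLE)
lustre_matches_default = base_pkey(pkeys["Lustre"]) == base_pkey(pkeys["Default"])
print(pkeys, lustre_matches_default)
```

With Security enabled and no explicit PKey, the Lustre entry would presumably show a different value (0x2 in the case Renae describes) and the comparison would come out false.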

The PKey used by the IPoIB interface can be configured as described in the link below:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-configure_ipoib_using_the_command_line
