Slide 1 - Introduction

Hello, my name is Amir Shehata, lead Lustre Networking Engineer at DDN Storage.

Today I'd like to talk to you about the Multi-Rail networking capabilities of the EXAScaler product, specifically in LNet, the Lustre Networking layer.

Slide 2 - Agenda

I'll start by explaining what Multi-Rail is and what it offers us.

I'll then walk through an example that shows how Multi-Rail simplifies EXAScaler network deployment and configuration.

From there, I'll highlight the benefits and share some configuration best practices.

Let's get started.

Slide 3 - What is Multi-Rail and Health

In this presentation, when I speak about Multi-Rail, I'm referring to two related features: Multi-Rail and Health.

First, let's touch on Multi-Rail. The best way to explain it is to look at how LNet worked before it.

If a node had multiple interfaces, each interface had to be configured on a different LNet network, like o2ib1, o2ib2, etc.

Multi-Rail does a couple of important things. It allows us to group homogeneous interfaces into the same network: instead of placing each of a node's interfaces on a different network, we can now configure them all on one network. This simplifies configuration tremendously.

This also allows LNet to use all the interfaces on that network. If we have two nodes, each with multiple interfaces, we can configure all of the interfaces on the same network, and LNet can use them in an Active-Active setup. In essence, LNet selects the best interface from the group and sends on it.
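To make "select the best interface" concrete, here is a toy Python model of Active-Active selection within one network. It is a simplified sketch under stated assumptions, not LNet's actual selection algorithm: it assumes each interface exposes a count of free send credits (the `Interface`, `MultiRailNet` and `credits` names are hypothetical), prefers the least-loaded interface, and rotates round-robin among ties so that every interface carries traffic.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Interface:
    name: str
    credits: int  # hypothetical: free send credits currently available on this interface


class MultiRailNet:
    """Toy model of one LNet network that groups several homogeneous interfaces."""

    def __init__(self, name, interfaces):
        self.name = name
        self.interfaces = list(interfaces)
        self._turn = count()  # ever-increasing counter used to rotate among ties

    def select(self):
        # Prefer the interface with the most free credits (the least loaded one).
        best = max(nif.credits for nif in self.interfaces)
        candidates = [nif for nif in self.interfaces if nif.credits == best]
        # Rotate among equally good interfaces so traffic is spread over all of them.
        return candidates[next(self._turn) % len(candidates)]


net = MultiRailNet("o2ib", [Interface("ib0", 8), Interface("ib1", 8)])
print([net.select().name for _ in range(4)])  # ['ib0', 'ib1', 'ib0', 'ib1']
```

With equal load the two interfaces alternate; if `ib0` had fewer free credits, every message in this model would go to `ib1` until the load evened out.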

If a node has heterogeneous interfaces, for example two IB interfaces and two OPA interfaces, we must group them into two different networks: the IB interfaces in o2ib and the OPA interfaces in o2ib1. However, LNet can still use all of these interfaces to communicate with peers that are on the same networks. This goes beyond standard bonding, which requires homogeneous interfaces.
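The heterogeneous case can be sketched the same way. The toy model below assumes hypothetical interface names (`opa0`, `opa1`) and a hypothetical fabric-to-network mapping taken from the example above. It groups interfaces into one network per fabric type, then lists which local interfaces can reach a peer: every interface on a network the two nodes share.

```python
from collections import defaultdict

# Hypothetical mapping that follows the example: IB on o2ib, OPA on o2ib1.
FABRIC_TO_NET = {"IB": "o2ib", "OPA": "o2ib1"}


def group_by_fabric(interfaces):
    """Group (interface, fabric) pairs into one LNet network per fabric type."""
    nets = defaultdict(list)
    for nif, fabric in interfaces:
        nets[FABRIC_TO_NET[fabric]].append(nif)
    return dict(nets)


def usable_interfaces(local_nets, peer_nets):
    """Every local interface on a network that the peer is also on."""
    shared = sorted(local_nets.keys() & set(peer_nets))
    return [nif for net in shared for nif in local_nets[net]]


nets = group_by_fabric([("ib0", "IB"), ("ib1", "IB"), ("opa0", "OPA"), ("opa1", "OPA")])
print(nets)  # {'o2ib': ['ib0', 'ib1'], 'o2ib1': ['opa0', 'opa1']}
print(usable_interfaces(nets, ["o2ib", "o2ib1"]))  # ['ib0', 'ib1', 'opa0', 'opa1']
```

A peer that is on both networks can be reached through all four interfaces, even though they sit on two different fabrics.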

So far we've talked about using all the interfaces in Active-Active mode, but what about resiliency? With Health, we don't need to sacrifice performance for resiliency. The LNet Health feature keeps track of the health of all configured networks and interfaces, and when LNet selects an interface to use, it picks the one with the best health.

To explain this further, let's say we have two interfaces on the o2ib network, IB0 and IB1. If LNet fails to send on IB0, it decrements the health of that interface and retries the send on IB1. It then keeps monitoring IB0 until it is sure the interface is healthy again. In the meantime, LNet keeps using IB1, the healthier interface.
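The health behavior just described can be modeled roughly as follows. This is a simplified sketch, not LNet's implementation: the health ceiling, the per-failure penalty, and the single-step recovery are illustrative values, and the names `Nic`, `send`, `recover`, `MAX_HEALTH` and `SENSITIVITY` are hypothetical. It shows the three steps from the example: pick the healthiest interface, decrement its health and retry elsewhere on failure, and restore health once monitoring shows the interface works again.

```python
from dataclasses import dataclass

MAX_HEALTH = 1000   # illustrative ceiling for a fully healthy interface
SENSITIVITY = 100   # illustrative amount subtracted on each failed send


@dataclass
class Nic:
    name: str
    health: int = MAX_HEALTH


def pick_healthiest(nics):
    # Choose the interface with the highest health value (the first one wins ties).
    return max(nics, key=lambda n: n.health)


def send(nics, try_send):
    """Send via the healthiest interface; on failure, penalize it and retry on another."""
    tried = set()
    while len(tried) < len(nics):
        nic = pick_healthiest([n for n in nics if n.name not in tried])
        tried.add(nic.name)
        if try_send(nic):
            return nic.name
        nic.health = max(0, nic.health - SENSITIVITY)  # decrement health on failure
    return None  # every interface failed


def recover(nic, ping_ok):
    # A successful probe restores health one step at a time (simplified).
    if ping_ok:
        nic.health = min(MAX_HEALTH, nic.health + SENSITIVITY)


ib0, ib1 = Nic("ib0"), Nic("ib1")
nics = [ib0, ib1]
print(send(nics, lambda nic: nic.name != "ib0"))  # ib1 (ib0 fails and is penalized)
print(send(nics, lambda nic: True))               # ib1 (still the healthier interface)
recover(ib0, ping_ok=True)
print(send(nics, lambda nic: True))               # ib0 (health restored, tie goes to first)
```

Because selection always favors the highest health value, a failing interface stops receiving traffic without any manual failover, and it rejoins the rotation once its health recovers.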

Slide 4 - Without MR

Now let's look at an example. The DGX-2, NVIDIA's AI system, can have up to 8 network interfaces. Without MR, each of these interfaces must be configured on its own LNet network. And since the EXAScaler servers don't have as many interfaces, we have to create aliases on the server interfaces and connect those aliases to the different networks. That is a complicated configuration, to say the least.

Slide 5 - With MR

Once we throw MR into the mix, the configuration becomes very simple. There is only one LNet network, and all the interfaces of all the nodes, clients and servers alike, are connected to it. Of course, this is only one possible configuration.
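The difference in configuration effort can be illustrated by generating the client-side commands for both cases. This sketch assumes the `lnetctl net add --net <net> --if <interfaces>` form described in the Lustre manual (check it against your release), uses hypothetical interface names `ib0` through `ib7` for the DGX-2 ports, and follows the o2ib1, o2ib2, ... naming used earlier. It does not model the server-side aliasing that the non-MR case also requires.

```python
def per_interface_config(ifaces):
    """Without Multi-Rail: each interface gets its own LNet network (o2ib1, o2ib2, ...)."""
    return [f"lnetctl net add --net o2ib{i} --if {nif}"
            for i, nif in enumerate(ifaces, start=1)]


def multi_rail_config(ifaces, net="o2ib"):
    """With Multi-Rail: all interfaces join one LNet network in a single command."""
    return [f"lnetctl net add --net {net} --if {','.join(ifaces)}"]


dgx_ifaces = [f"ib{i}" for i in range(8)]  # hypothetical names for the 8 DGX-2 ports

for cmd in per_interface_config(dgx_ifaces):
    print(cmd)  # one command and one network per port; every server must reach all 8
print(multi_rail_config(dgx_ifaces)[0])  # lnetctl net add --net o2ib --if ib0,ib1,ib2,ib3,ib4,ib5,ib6,ib7
```

Eight networks on the client side collapse into one, and the servers only need to join that single network instead of aliasing their interfaces across eight.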

Slide 6 - Dual Fabric

The key point I'm trying to make is that we can now match the LNet configuration to the underlying fabric. If we're dealing with two fabrics, for example OPA and IB, or even a single fabric split into two segments, Multi-Rail lets us mirror that layout. This makes configuration much more intuitive and less complicated.