Target release: Lustre 2.13
Epic:
Document status: DRAFT
Document owner:
Designer: Amir Shehata
Developers:
QA:

Scope

Problem Statement

The Multi-Rail (MR) and Health features add the capability of using all available interfaces. Instead of configuring each interface on a separate LNet, it is now possible to configure multiple interfaces on the same LNet. The MR code can iterate over all interfaces and select one to send a message based on one of the following criteria:

  1. Healthiest interface
  2. NUMA closeness
  3. Most available credits
  4. Round Robin

However, there is no way for the user to override these selection criteria. Such an override is particularly useful for redirecting traffic over specific paths depending on source and/or destination NIDs.
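The default selection described above can be sketched as a comparison over several keys. This is a minimal illustration, not the LNet implementation: the `Interface` fields and `select_interface` are hypothetical names, and applying the four criteria as successive tie-breakers in the listed order is an assumption that the text does not state.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Interface:
    name: str
    health: int         # higher means healthier
    numa_distance: int  # lower means closer to the sending CPU
    credits: int        # more available credits is better
    seq: int = 0        # when this interface was last picked (0 = never)


_pick_counter = count(1)


def select_interface(interfaces):
    """Pick an interface, using each criterion to break ties in the previous one."""
    best = min(
        interfaces,
        key=lambda ni: (-ni.health, ni.numa_distance, -ni.credits, ni.seq),
    )
    # Round robin: the interface just used moves behind its equals.
    best.seq = next(_pick_counter)
    return best
```

With two otherwise identical interfaces the picks alternate. Nothing in this sketch lets the user influence the result, which is the gap UDSP is meant to fill.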

Use Cases

There are two use cases outlined in LU-9121:

  1. The ability to select routers based on destination peer NIDs
  2. Assigning different interfaces to separate targets (MDT, OST) running on the same node

Other possible use cases:

  1. Assigning priority to different networks. For example, if you have an OPA and an IB network, you may want to always use the IB network as long as it is alive.
  2. Preferring a specific path to avoid a bottleneck in the fabric
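The network priority use case can be illustrated with a short sketch. The `Network` type, the `pick_network` helper, and the convention that a lower number means a more preferred network are all assumptions made for illustration; the source does not define how priorities are encoded.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    alive: bool
    priority: int  # hypothetical convention: lower value is preferred


def pick_network(networks):
    """Return the most preferred network that is alive, or None if all are down."""
    alive = [net for net in networks if net.alive]
    if not alive:
        return None
    return min(alive, key=lambda net: net.priority)
```

With an IB network at priority 0 and an OPA network at priority 1, traffic stays on IB until it is marked down, then moves to OPA.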

Work Overview

There are five primary tasks in this project:

  1. Reuse and improve the existing policy infrastructure, currently used for fault injection
    1. Some infrastructure already exists that allows error injection policies to be added. It can be enhanced to handle all policy-based code, so that both the UDSP and Fault Injection features can use it.
  2. Implement the kernel side feature on top of this infrastructure, based on IOCTL commands
    1. Once we have the infrastructure, we can add the different UDSP policies
  3. Implement user space controls to drive the kernel side LNet feature
    1. Each UDSP policy will have a corresponding user space management interface.
  4. Write a Unit Test Plan
    1. This should be a live document modified as the design progresses.
    2. I'm pondering making it part of the High-Level Design. Basically, each portion of the design will have a Test Plan section to be filled out.
  5. Create a set of LUTF scripts to test this feature.
    1. It is critical to create the test scripts alongside the development because, based on previous experience, the longer test scripts are delayed, the less likely they are to get done.

The High Level Design document will detail each of these tasks.

Requirements

These requirements are from the original MR Requirements document.

| ID | Class | Version | Status | Description |
|---|---|---|---|---|
| cfg-095 | REQUIRED | 1.0 | ACCEPTED | DLC shall provide APIs to configure User Defined Selection Policy (UDSP) |
| cfg-100 | REQUIRED | 1.0 | ACCEPTED | UDSP shall be comprised of a set of rules. |
| cfg-105 | REQUIRED | 1.0 | ACCEPTED | Only one UDSP shall be added/removed/modified per configuration operation |
| cfg-110 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define network priorities |
| cfg-115 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define interface priorities |
| cfg-120 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define one local NID to one remote NID mapping (1:1). |
| cfg-125 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define mapping priority. |
| cfg-130 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define many local NIDs to many remote NIDs mapping (N:N). |
| cfg-135 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define many local NIDs to one remote NID mapping (N:1). |
| cfg-140 | REQUIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define one local NID to many remote NIDs mapping (1:N). |
| cfg-145 | DESIRED | 1.0 | ACCEPTED | UDSP shall allow rules which define the number of messages that should be sent using one rule. This allows fine-grained control over traffic distribution. |
| cfg-150 | REQUIRED | 1.0 | ACCEPTED | UDSP rules shall provide the option to define relative rule priority |
| cfg-155 | REQUIRED | 1.0 | ACCEPTED | If UDSP rule priority is not defined it defaults to highest priority |
| cfg-160 | REQUIRED | 1.0 | ACCEPTED | lnetctl utility shall provide a command line front end interface to configure UDSP by calling the DLC APIs mentioned in the above requirements |
| cfg-165 | REQUIRED | 1.0 | ACCEPTED | lnetctl utility shall accept and parse YAML configuration files specifying UDSP configuration |
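To make the mapping and priority requirements (cfg-120 through cfg-155) more concrete, the following sketch models a rule as a pair of NID patterns plus a priority. It is an illustration under stated assumptions, not the DLC API. `PairRule` and `preferred_local_nids` are hypothetical names. Shell-style globs from Python's `fnmatch` stand in for whatever NID matching syntax the design adopts. Treating 0 as the highest priority is an assumed encoding. Falling back to all local NIDs when no rule matches is an assumed behaviour.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

HIGHEST_PRIORITY = 0  # assumption: 0 is highest, larger numbers are lower


@dataclass
class PairRule:
    local_nids: str   # glob over local NIDs; one NID or many
    remote_nids: str  # glob over remote NIDs; one NID or many
    priority: int = HIGHEST_PRIORITY  # cfg-155: unset priority is the highest

    def matches(self, local_nid, remote_nid):
        return (fnmatchcase(local_nid, self.local_nids)
                and fnmatchcase(remote_nid, self.remote_nids))


def preferred_local_nids(rules, local_nids, remote_nid):
    """Return the local NIDs selected by the best-priority matching rules.

    If no rule matches, every local NID is returned so that the default
    selection criteria apply unchanged.
    """
    hits = [(rule.priority, nid)
            for rule in rules
            for nid in local_nids
            if rule.matches(nid, remote_nid)]
    if not hits:
        return list(local_nids)
    best = min(priority for priority, _ in hits)
    preferred = []
    for priority, nid in hits:
        if priority == best and nid not in preferred:
            preferred.append(nid)
    return preferred
```

A 1:1 rule uses two literal NIDs, while 1:N, N:1, and N:N rules use a wildcard on one or both sides. The message count of cfg-145 is not modelled here.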