...

It is sometimes desirable to fine-tune the selection of local/remote NIs used for communication. For example, if there are currently two networks, an o2ib OPA network and a tcp MLX network, both will be used. Especially when the traffic volume is low, the credits criteria will be equivalent between the nodes, and both networks will be used in round-robin fashion. However, the user might want to use one network for all traffic and keep the other network free unless the preferred network goes down.

...

UDSPs are configured from lnetctl via either the command line or YAML config files and then passed to the kernel. Policies are applied to all local networks and remote peers and then stored in the kernel. During the selection process the policies are examined as part of the selection algorithm. Whenever new peers, peer NIs, local networks or local NIs are added, they are matched against the rules. The user interface is recorded here. Policies take the topmost priority in the selection process since they are user defined. The rest of the selection criteria are applied to the subset of interfaces which match the policies.

UDSP Rule Types

There are three UDSP rule types:

  1. Network rules
  2. NID rules
  3. Pair rules

Network Rules

These rules define the relative priority of the networks against each other. 0 is the highest priority. Networks with a higher priority (a lower numeric value) are preferred by the selection algorithm.
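The ordering described above can be illustrated with a minimal sketch. The structure and function names here (`example_net`, `example_pick_net`) are hypothetical and are not part of LNet. The sketch assumes that the first network wins a tie, whereas in the real selection algorithm a tie would fall through to the remaining selection criteria.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a network, for illustration only. */
struct example_net {
    const char *name;
    uint32_t    priority;   /* 0 is the highest priority */
};

/*
 * Return the index of the preferred network, i.e. the one with the
 * lowest numeric priority value. The first network wins ties.
 * Returns -1 if the list is empty.
 */
int example_pick_net(const struct example_net *nets, size_t count)
{
    int best = -1;

    for (size_t i = 0; i < count; i++) {
        if (best < 0 || nets[i].priority < nets[(size_t)best].priority)
            best = (int)i;
    }
    return best;
}
```

With a list such as tcp (priority 1) and o2ib (priority 0), the function returns the index of o2ib.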

NID Rules

These rules define the relative priority of individual NIDs. 0 is the highest priority. Once a network is selected, the NID with the highest priority is preferred. Note that NID priority is ranked below health. For example, given two NIDs, NID-A and NID-B, where NID-A has the higher priority but the lower health value, NID-B will still be selected. In that sense the policies serve as a hint to guide the selection algorithm.
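The NID-A/NID-B example can be expressed as a two-level comparison. This is a sketch only: `example_nid` and `example_nid_preferred` are hypothetical names, and the sketch assumes that a larger health value means a healthier NID.

```c
#include <stdint.h>

/* Hypothetical view of a NID, for illustration only. */
struct example_nid {
    const char *nid;
    int         health;     /* assumption: larger means healthier */
    uint32_t    priority;   /* 0 is the highest priority */
};

/*
 * Return nonzero if a should be preferred over b.
 * Health is compared first; priority only breaks ties in health.
 */
int example_nid_preferred(const struct example_nid *a,
                          const struct example_nid *b)
{
    if (a->health != b->health)
        return a->health > b->health;
    return a->priority < b->priority;
}
```

Comparing health first is what makes the priority a hint rather than a hard constraint: an unhealthy NID never wins purely because of its configured priority.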

Pair Rules


Design Principle

Rules shall be defined through user space and passed to the LNet module. LNet shall store these rules. The net priority and NI priority are kept as separate rules, stored in a separate data structure. Once they are configured they can be applied to the networks. The advantage is that rules are not strictly tied to the internal constructs: they can be applied whenever the internal constructs are created, and if the internal constructs are deleted the rules remain and can be automatically applied at a future time.

This makes configuration easy since a set of rules can be defined, like "all IB networks priority 1", "all Gemini networks priority 2", etc, and when a network is added, it automatically inherits these rules.

Selection policy rules are comprised of two parts:

  1. The matching rule
  2. The rule action

The matching rule is what's used to match a NID or a network. The action is what's applied when the rule is matched.
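A minimal sketch of the match/action split for network rules follows. All names are hypothetical. The matcher handles only the three pattern forms shown in the lnetctl examples (an exact name such as o2ib, a trailing wildcard such as o2ib*, and a number list such as o2ib[1,2]); it is not the actual LNet expression parser. The action is assumed to be "assign this priority", and the first matching rule is assumed to win.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule: a matching pattern plus a priority action. */
struct example_net_rule {
    const char  *pattern;
    unsigned int priority;
};

/* Return 1 if net matches pattern, 0 otherwise. */
int example_net_match(const char *pattern, const char *net)
{
    const char *star = strchr(pattern, '*');
    const char *open = strchr(pattern, '[');

    /* Trailing wildcard: compare only the prefix before '*'. */
    if (star != NULL && star[1] == '\0')
        return strncmp(pattern, net, (size_t)(star - pattern)) == 0;

    /* Number list: the prefix must match and the suffix must be listed. */
    if (open != NULL) {
        size_t prefix = (size_t)(open - pattern);
        const char *p;
        char *end;
        long want;

        if (strncmp(pattern, net, prefix) != 0 || net[prefix] == '\0')
            return 0;
        want = strtol(net + prefix, &end, 10);
        if (end == net + prefix || *end != '\0')
            return 0;
        for (p = open + 1; *p != '\0' && *p != ']'; ) {
            long v = strtol(p, &end, 10);

            if (end == p)
                return 0;               /* malformed list */
            if (v == want)
                return 1;
            p = (*end == ',') ? end + 1 : end;
        }
        return 0;
    }

    /* Plain name: exact match. */
    return strcmp(pattern, net) == 0;
}

/* Apply the action of the first matching rule, or return dflt if none match. */
unsigned int example_net_priority(const struct example_net_rule *rules,
                                  size_t count, const char *net,
                                  unsigned int dflt)
{
    for (size_t i = 0; i < count; i++) {
        if (example_net_match(rules[i].pattern, net))
            return rules[i].priority;
    }
    return dflt;
}
```

Because the lookup takes only a network name, it can be run whenever a network is created, which is how a newly added network would pick up rules that were configured earlier.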

A rule can be uniquely identified using the matching rule or an internal ID, which is assigned by the LNet module when the rule is added and returned to user space in the output of a show command.

cfg-100, cfg-105, cfg-110, cfg-115, cfg-120, cfg-125, cfg-130, cfg-135, cfg-140, cfg-160, cfg-165

lnetctl Interface

# Adding a network priority rule. If the NI under the network doesn't have
# an explicit priority set, it'll inherit the network priority:
lnetctl > selection net [add | del | show] -h
Usage: selection net add --net <network name> --priority <priority>
  
WHERE:
 
selection net add: add a selection rule based on the network priority
        --net: network string (e.g. o2ib or o2ib* or o2ib[1,2])
        --priority: Rule priority
 
Usage: selection net del --net <network name> [--id <rule id>]
  
WHERE:
 
selection net del: delete a selection rule given the network pattern or the id. If both
                   are provided they need to match or an error is returned.
        --net: network string (e.g. o2ib or o2ib* or o2ib[1,2])
        --id: ID assigned to the rule returned by the show command.
  
Usage: selection net show [--net <network name>]
 
WHERE:
 
selection net show: show selection rules and filter on network name if provided.
        --net: network string (e.g. o2ib or o2ib* or o2ib[1,2])
  
# Add a NID priority rule. All NIDs added that match this pattern shall be assigned
# the identified priority. When the selection algorithm runs it shall prefer NIDs with
# higher priority.
lnetctl > selection nid [add | del | show] -h
Usage: selection nid add --nid <NID> --priority <priority>
 
WHERE:
 
selection nid add: add a selection rule based on the nid pattern
        --nid: nid pattern which follows the same syntax as ip2net
        --priority: Rule priority
 
 
Usage: selection nid del --nid <NID> [--id <rule id>]
 
WHERE:
 
selection nid del: delete a selection rule given the nid pattern or the id. If both
                   are provided they need to match or an error is returned.
        --nid: nid pattern which follows the same syntax as ip2net
        --id: ID assigned to the rule returned by the show command.
 
 
Usage: selection nid show [--nid <NID>]
 
WHERE:
 
selection nid show: show selection rules and filter on NID pattern if provided.
        --nid: nid pattern which follows the same syntax as ip2net
 
# Adding a point-to-point rule. This creates an association between a local NI and a remote
# NID, and assigns a priority to this relationship so that it's preferred when selecting a pathway.
lnetctl > selection peer [add | del | show] -h
Usage: selection peer add --local <NID> --remote <NID> --priority <priority>
 
WHERE:
 
selection peer add: add a selection rule based on local to remote pathway
        --local: nid pattern which follows the same syntax as ip2net
        --remote: nid pattern which follows the same syntax as ip2net
        --priority: Rule priority
 
Usage: selection peer del --local <NID> --remote <NID> --id <ID>
 
WHERE:
 
selection peer del: delete a selection rule based on local to remote NID pattern or id
        --local: nid pattern which follows the same syntax as ip2net
        --remote: nid pattern which follows the same syntax as ip2net
        --id: ID of the rule as provided by the show command.
 
Usage: selection peer show [--local <NID>] [--remote <NID>]
 
WHERE:
 
selection peer show: show selection rules and filter on NID patterns if provided.
        --local: nid pattern which follows the same syntax as ip2net
        --remote: nid pattern which follows the same syntax as ip2net
 
# the output will be of the same YAML format as the input described below.

YAML Syntax

Each selection rule will translate into a separate IOCTL to the kernel.

# Configuring Network rules
selection:
    - type: net
      net: <net name or pattern. e.g. o2ib1, o2ib*, o2ib[1,2]>
      priority: <Unsigned integer where 0 is the highest priority>
 
# Configuring NID rules:
selection:
    - type: nid
      nid: <a NID pattern as described in the Lustre Manual ip2net syntax>
      priority: <Unsigned integer where 0 is the highest priority>
 
# Configuring Point-to-Point rules.
selection:
    - type: peer
      local: <a NID pattern as described in the Lustre Manual ip2net syntax>
      remote: <a NID pattern as described in the Lustre Manual ip2net syntax>
      priority: <Unsigned integer where 0 is the highest priority>
 
# to delete the rules, there are two options:
# 1. Whenever a rule is added it will be assigned a unique ID. Show command will display the
#    unique ID. The unique ID must be explicitly identified in the delete command.
# 2. The rule is matched in the kernel based on the matching rule, which acts as a unique identifier.
#    This means that there cannot exist two rules that have exactly the same matching criteria.
# Both options shall be supported.
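
The delete semantics described above (by ID, by matching rule, or by both, in which case they must refer to the same rule) can be sketched as follows. The names and the error codes are assumptions made for illustration. An ID of 0 is taken to mean "not provided", which relies on rule IDs being greater than 0.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stored rule, for illustration only. */
struct example_rule {
    unsigned int id;        /* greater than 0 once assigned */
    const char  *pattern;   /* the matching rule */
    int          in_use;    /* 0 once deleted */
};

/*
 * Delete the rule identified by pattern, id, or both.
 * pattern == NULL or id == 0 means "not provided".
 * Returns 0 on success, -EINVAL if nothing was provided or if the
 * pattern and id refer to different rules, -ENOENT if no rule matched.
 */
int example_rule_del(struct example_rule *rules, size_t count,
                     const char *pattern, unsigned int id)
{
    if (pattern == NULL && id == 0)
        return -EINVAL;

    for (size_t i = 0; i < count; i++) {
        struct example_rule *r = &rules[i];
        int pat_ok, id_ok;

        if (!r->in_use)
            continue;
        pat_ok = pattern != NULL && strcmp(r->pattern, pattern) == 0;
        id_ok = id != 0 && r->id == id;

        if (pattern != NULL && id != 0) {
            if (pat_ok != id_ok)
                return -EINVAL;     /* pattern and id disagree */
            if (!pat_ok)
                continue;
        } else if (!pat_ok && !id_ok) {
            continue;
        }
        r->in_use = 0;
        return 0;
    }
    return -ENOENT;
}
```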

Flattening rules

Rules will have serialize and deserialize APIs. The serialize API will flatten the rules into a contiguous buffer that will be sent to the kernel. On the kernel side the rules will be deserialized to be stored and queried. When user space queries the rules, the rules are serialized and sent up to user space, which deserializes them and prints them in YAML format.
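
To make the flattening idea concrete, here is a small round-trip sketch. The wire layout is an assumption chosen for illustration (a u32 rule count, then for each rule a u32 id, a u32 priority, a u32 pattern length and the pattern bytes, all in host byte order); it is not the layout used by LNet. The structure and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory rule, for illustration only. */
struct example_rule {
    uint32_t id;
    uint32_t priority;
    char     pattern[32];   /* NUL-terminated */
};

static size_t put_u32(unsigned char *buf, size_t off, uint32_t v)
{
    memcpy(buf + off, &v, sizeof(v));
    return off + sizeof(v);
}

/* Flatten rules into buf. Returns bytes written, or 0 if buf is too small. */
size_t example_serialize(const struct example_rule *rules, uint32_t count,
                         unsigned char *buf, size_t size)
{
    size_t off = 0;

    if (size < 4)
        return 0;
    off = put_u32(buf, off, count);
    for (uint32_t i = 0; i < count; i++) {
        uint32_t len = (uint32_t)strlen(rules[i].pattern);

        if (size - off < 12 + (size_t)len)
            return 0;
        off = put_u32(buf, off, rules[i].id);
        off = put_u32(buf, off, rules[i].priority);
        off = put_u32(buf, off, len);
        memcpy(buf + off, rules[i].pattern, len);
        off += len;
    }
    return off;
}

/* Rebuild rules from buf. Returns the rule count, or -1 if buf is malformed. */
int example_deserialize(const unsigned char *buf, size_t size,
                        struct example_rule *out, uint32_t max)
{
    uint32_t count, len;
    size_t off = 4;

    if (size < 4)
        return -1;
    memcpy(&count, buf, 4);
    if (count > max)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        if (size - off < 12)
            return -1;
        memcpy(&out[i].id, buf + off, 4);
        memcpy(&out[i].priority, buf + off + 4, 4);
        memcpy(&len, buf + off + 8, 4);
        off += 12;
        if (len >= sizeof(out[i].pattern) || size - off < len)
            return -1;
        memcpy(out[i].pattern, buf + off, len);
        out[i].pattern[len] = '\0';
        off += len;
    }
    return (int)count;
}
```

Carrying explicit lengths lets the receiving side validate every field against the buffer size before copying, which matters when the buffer crosses the user/kernel boundary.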

DLC API

Code Block
/* This is a common structure which describes an expression */
struct lnet_match_expr {
    __u32   lme_start;
    __u32   lme_end;
    __u32   lme_incr;
    char    lme_r_expr[0];
};

struct lnet_selection_descriptor {
    enum selection_type lsd_type;
    char                *lsd_pattern1;
    char                *lsd_pattern2;

    union {
        __u32           lsda_priority;
    } lsd_action_u;
};

/*
 * lustre_lnet_add_selection
 *   Add a selection policy rule.
 *
 *   selection - describes the selection policy rule
 *   seq_no - sequence number of the command
 *   err_rc - YAML structure of the resultant return code
 */
int lustre_lnet_add_selection(struct lnet_selection_descriptor *selection, int seq_no, struct cYAML **err_rc);

Selection Policies

There are four different types of rules that this HLD will address:

...

[Gliffy diagram: PreferredNID]

Preferred local/remote NID pairs

...

[Gliffy diagram: peer2peer]

Refer to Olaf's LUG 2016/LAD 2016 PPT for more context.

...

[Gliffy diagram: Serialized Net rule]

The remaining rule types are serialized in much the same way, except that the list of rules in the memory pointed to by rule_bulk contains the structure format pertinent to each rule type.

...

Code Block
/*
 * lnet_sel_rule_serialize()
 *   Serialize the rules pointed to by rules into the memory block that is
 *   provided. In order for this API to work in both kernel and user space,
 *   the bulk pointer needs to be passed in. When this API is called in the
 *   kernel, it is expected that the bulk memory is allocated in user space.
 *   This API is intended to be called from the kernel to serialize the
 *   rules before sending them to user space.
 *     rules [IN] - rules to be serialized
 *     rule_type [IN] - rule type to be serialized
 *     bulk_size [IN] - size of memory allocated.
 *     bulk [OUT] - allocated block of memory where the serialized rules are stored.
 */
int lnet_sel_rule_serialize(struct list_head *rules, enum lnet_sel_rule_type rule_type, __u32 *bulk_size, void __user *bulk);

/*
 * lnet_sel_rule_deserialize()
 *   Given a bulk of rule_type rules, deserialize and append rules to the
 *   linked list passed in. Each rule is assigned an ID > 0 if an ID is not
 *   already assigned.
 *     bulk [IN] - memory block containing serialized rules
 *     bulk_size [IN] - size of bulk memory block
 *     rule_type [IN] - type of rule to deserialize
 *     rules [OUT] - linked list to append the deserialized rules to
 */
int lnet_sel_rule_deserialize(void __user *bulk, __u32 bulk_size, enum lnet_sel_rule_type rule_type, struct list_head *rules);

...


Policy IOCTL Handling

Three new IOCTLs will need to be added: IOC_LIBCFS_ADD_RULES, IOC_LIBCFS_DEL_RULES, and IOC_LIBCFS_GET_RULES.

...

[Gliffy diagram: DragonFly Topology]

The diagram above was inspired by: https://www.ece.tufts.edu/~karen/classes/final_presentation/Dragonfly_Topology_Long.pptx

...