...

The user interface is recorded here.

Selection Policies

There are four different types of rules that this HLD will address:

  1. LNet Network priority rule
    1. This rule assigns a priority to a network. During selection, the network with the highest priority is preferred.
  2. Local NID rule
    1. This rule assigns a priority to a local NID within an LNet network. This NID is preferred during selection.
  3. Remote NID rule
    1. This rule assigns a priority to a remote NID within an LNet network. This NID is preferred during selection.
  4. Peer-to-peer rules
    1. This rule associates local NIs with peer NIs. When selecting a peer NI to send to, the one associated with the selected local NI is preferred.

These rules are applied differently in the kernel.

The network priority rule sets the priority value in struct lnet_net to the one defined in the rule. The local NID rule sets the priority value in struct lnet_ni to the one defined in the rule. The remote NID rule sets the priority value in struct lnet_peer_ni to the one defined in the rule. Peer-to-peer rules are implemented via a list of preferred NIDs kept in struct lnet_peer_ni. Once the local network and the best NI have been selected, we go through all the peer NIs on the same network and prefer the peer NI which has the best NI's NID on its preferred list, thereby preferring a specific pathway between the node and the peer.
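The peer NI step described above can be sketched in user-space C. This is a minimal illustration, not the kernel implementation: `struct peer_ni`, `nid_t`, `MAX_PREF_NIDS` and `select_peer_ni()` are hypothetical stand-ins, and two points are assumptions rather than statements from this HLD: a lower numeric priority is treated as the higher priority, and the preferred list is consulted only to break ties between peer NIs of equal priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PREF_NIDS 4

typedef uint64_t nid_t;	/* hypothetical stand-in for the kernel NID type */

/* Simplified stand-in for struct lnet_peer_ni; only the fields used here. */
struct peer_ni {
	nid_t nid;
	uint32_t net_id;
	uint32_t priority;		/* from a remote NID rule; assumed 0 = highest */
	nid_t pref_nids[MAX_PREF_NIDS];	/* from peer-to-peer rules */
	size_t pref_count;
};

/* True if local_nid is on the peer NI's preferred list. */
static bool peer_ni_prefers(const struct peer_ni *p, nid_t local_nid)
{
	for (size_t i = 0; i < p->pref_count; i++)
		if (p->pref_nids[i] == local_nid)
			return true;
	return false;
}

/*
 * Pick a peer NI on net_id after the local network and best NI are chosen.
 * Assumed ordering: the better priority wins; on equal priority, a peer NI
 * that lists best_ni_nid as preferred wins. Returns NULL if no peer NI is
 * on net_id.
 */
static const struct peer_ni *select_peer_ni(const struct peer_ni *peers,
					    size_t count, uint32_t net_id,
					    nid_t best_ni_nid)
{
	const struct peer_ni *best = NULL;
	bool best_pref = false;

	assert(peers != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		const struct peer_ni *p = &peers[i];
		bool pref;

		if (p->net_id != net_id)
			continue;
		pref = peer_ni_prefers(p, best_ni_nid);
		if (best == NULL || p->priority < best->priority ||
		    (p->priority == best->priority && pref && !best_pref)) {
			best = p;
			best_pref = pref;
		}
	}
	return best;
}
```

With two peer NIs of equal priority on the same network, the one whose preferred list contains the selected local NID is returned; if neither lists it, the first one encountered is kept.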

Each of these rules impacts a different part of the selection algorithm. The network rule impacts the selection of the local network. The local NID rule impacts the selection of the best NI to send from on the preferred network. Remote NID and peer-to-peer rules both impact the selection of the peer NI to send to.

It is possible to use both the local NID rule and the peer-to-peer rule to force messages to always take a specific path. For example, assuming a node with three interfaces 10.10.10.3, 10.10.10.4 and 10.10.10.5 and two rules as follows:

Code Block
selection:
    - type: nid
      local: true
      nid: 10.10.10.5
      priority: 0
 
selection:
    - type: peer
      local: 10.10.10.5
      remote: 10.10.10.6
      priority: 0

These two rules will always prefer sending messages from 10.10.10.5 to 10.10.10.6, as opposed to taking that path only occasionally, when the 10.10.10.5 interface happens to be selected (every third message, assuming round robin).

As another example, it is also possible to prioritize a set of local and remote NIs so that they are always preferred. Assume two peers:

  • PeerA: 10.10.10.2, 10.10.10.3 and 10.10.30.2
  • PeerB: 10.10.10.4, 10.10.10.5 and 10.10.30.3

We can set up the following rules:

Code Block
selection:
    - type: nid
      local: true
      nid: 10.10.10.*
      priority: 0
 
selection:
    - type: nid
      local: false
      nid: 10.10.10.*
      priority: 0

These rules will always prefer messages to be sent between the 10.10.10.* interfaces, rather than the 10.10.30.* interfaces.
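To make the wildcard behaviour concrete, here is a small user-space sketch that checks which of the PeerA and PeerB addresses a pattern such as 10.10.10.* would cover. `nid_pattern_match()` is a hypothetical helper written for this illustration; it handles only whole-octet `*` wildcards and plain decimal octets, which is an assumption and narrower than whatever expression syntax the real parser accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true if the dotted-quad addr matches pattern, where each pattern
 * octet is either a decimal number or a lone '*'.
 */
static bool nid_pattern_match(const char *pattern, const char *addr)
{
	char pat[4][4];
	unsigned int oct[4];
	char extra;

	assert(pattern != NULL && addr != NULL);

	/* Exactly four '.'-separated fields are required on both sides. */
	if (sscanf(pattern,
		   "%3[0123456789*].%3[0123456789*].%3[0123456789*].%3[0123456789*]%c",
		   pat[0], pat[1], pat[2], pat[3], &extra) != 4)
		return false;
	if (sscanf(addr, "%u.%u.%u.%u%c",
		   &oct[0], &oct[1], &oct[2], &oct[3], &extra) != 4)
		return false;

	for (int i = 0; i < 4; i++) {
		if (oct[i] > 255)
			return false;
		if (strcmp(pat[i], "*") == 0)
			continue;	/* wildcard octet matches anything */
		if (strchr(pat[i], '*') != NULL)
			return false;	/* mixed forms such as "1*" are not handled */
		if (strtoul(pat[i], NULL, 10) != oct[i])
			return false;
	}
	return true;
}
```

Under this sketch, 10.10.10.2, 10.10.10.3, 10.10.10.4 and 10.10.10.5 match the pattern while 10.10.30.2 and 10.10.30.3 do not, so only the former would receive the rule's priority.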

The question to answer is whether such restrictions are generally useful. One use case for such rules is debugging or characterizing the network. Another argument is that the clusters that use Lustre are so diverse that giving them flexibility over traffic control is a benefit, as long as the default behavior is optimal out of the box.

The following section outlines some real-life scenarios where these rules can be used.

Use Cases

Preferred Network

...

Code Block
/*
 * lustre_lnet_add_net_sel_pol
 *   Add a net selection policy. If there already exists a 
 *   policy for this net it will be updated.
 *      net - Network for the selection policy
 *      priority - priority of the rule
 */
int lustre_lnet_add_net_sel_pol(char *net, int priority);
 
/*
 * lustre_lnet_del_net_sel_pol
 *   Delete a net selection policy.
 *      net - Network for the selection policy
 *      id - [OPTIONAL] ID of the policy. This can be retrieved via a show command.
 */
int lustre_lnet_del_net_sel_pol(char *net, int id);
 
/*
 * lustre_lnet_show_net_sel_pol
 *   Show configured net selection policies.
 *      net - filter on the net provided.
 */
int lustre_lnet_show_net_sel_pol(char *net);
 
/*
 * lustre_lnet_add_nid_sel_pol
 *   Add a nid selection policy. If there already exists a 
 *   policy for this nid it will be updated. NIDs can be either
 *   local NIDs or remote NIDs.
 *      nid - NID for the selection policy
 *      local - is this a local NID
 *      priority - priority of the rule
 */
int lustre_lnet_add_nid_sel_pol(char *nid, bool local, int priority);
 
/*
 * lustre_lnet_del_nid_sel_pol
 *   Delete a nid selection policy.
 *      nid - NID for the selection policy
 *      local - is this a local NID
 *      id - [OPTIONAL] ID of the policy. This can be retrieved via a show command.
 */
int lustre_lnet_del_nid_sel_pol(char *nid, bool local, int id);
 
/*
 * lustre_lnet_show_nid_sel_pol
 *   Show configured nid selection policies.
 *      nid - filter on the NID provided.
 */
int lustre_lnet_show_nid_sel_pol(char *nid);
 
/*
 * lustre_lnet_add_peer_sel_pol
 *   Add a peer to peer selection policy. If there already exists a 
 *   policy for the pair it will be updated.
 *      src_nid - source NID
 *      dst_nid - destination NID
 *      priority - priority of the rule
 */
int lustre_lnet_add_peer_sel_pol(char *src_nid, char *dst_nid, int priority);
 
/*
 * lustre_lnet_del_peer_sel_pol
 *   Delete a peer to peer selection policy.
 *      src_nid - source NID
 *      dst_nid - destination NID
 *      id - [OPTIONAL] ID of the policy. This can be retrieved via a show command.
 */
int lustre_lnet_del_peer_sel_pol(char *src_nid, char *dst_nid, int id);


/*
 * lustre_lnet_show_peer_sel_pol
 *   Show peer to peer selection policies.
 *      src_nid - [OPTIONAL] source NID. If provided the output will be filtered
 *                on this value.
 *      dst_nid - [OPTIONAL] destination NID. If provided the output will be filtered
 *                on this value.
 */
int lustre_lnet_show_peer_sel_pol(char *src_nid, char *dst_nid);

...

Code Block
/*
 * describes a network:
 *  nw_id: can be the base network name, ex: o2ib or a full network id, ex: o2ib3.
 *  nw_expr: an expression to describe the variable part of the network ID
 *           ex: tcp* - all tcp networks
 *           ex: tcp[1-5] - resolves to tcp1, tcp2, tcp3, tcp4 and tcp5.
 */
struct lustre_lnet_network_descr {
	__u32 nw_id;
	struct cfs_expr_list *nw_expr;
};
 
/*
 * lustre_lnet_network_rule
 *   network rule
 *      nwr_link - link on rule list
 *      nwr_descr - network descriptor
 *      nwr_priority - priority of the rule.
 *      nwr_id - ID of the rule assigned while deserializing if not already assigned.
 */
struct lustre_lnet_network_rule {
	struct list_head nwr_link;
	struct lustre_lnet_network_descr nwr_descr;
	__u32 nwr_priority;
	__u32 nwr_id;
};
 
/*
 * lustre_lnet_nidr_range_descr
 *   nidr_expr - expression describing the IP part of the NID
 *   nidr_nw - a description of the network
 */
struct lustre_lnet_nidr_range_descr {
	struct list_head nidr_expr;
	struct lustre_lnet_network_descr nidr_nw;
};

/*
 * lustre_lnet_nidr_range_rule
 *  Rule for the nid range.
 *     nidr_link - link on the rule list
 *     nidr_descr - descriptor of the nid range
 *     nidr_priority - priority of the rule
 *     nidr_local - true if the rule applies to local NIDs, false for remote NIDs
 */
struct lustre_lnet_nidr_range_rule {
	struct list_head nidr_link;
	struct lustre_lnet_nidr_range_descr nidr_descr;
	int nidr_priority;
	bool nidr_local;
};

/*
 * lustre_lnet_p2p_rule
 *  Rule for the peer to peer.
 *     p2p_link - link on the rule list
 *     p2p_src_descr - source nid range
 *     p2p_dst_descr - destination nid range
 *     priority - priority of the rule
 */
struct lustre_lnet_p2p_rule {
	struct list_head p2p_link;
	struct lustre_lnet_nidr_range_descr p2p_src_descr;
	struct lustre_lnet_nidr_range_descr p2p_dst_descr;
	int priority;
};

...

Code Block
enum lnet_sel_rule_type {
	LNET_SEL_RULE_NET = 0,
	LNET_SEL_RULE_NID,
	LNET_SEL_RULE_P2P
};
 
struct lnet_expr {
	__u32 ex_lo;
	__u32 ex_hi;
	__u32 ex_stride;	
};
 
struct lnet_net_descr {
	__u32 nsd_net_id;
	struct lnet_expr nsd_expr;
};
 
struct lnet_nid_descr {
	struct lnet_expr nir_ip[4];
	struct lnet_net_descr nir_net;
};
 
struct lnet_ioctl_net_rule {
	struct lnet_net_descr nsr_descr;
	__u32 nsr_prio;
	__u32 nsr_id;
};

struct lnet_ioctl_nid_rule {
	struct lnet_nid_descr nir_descr;
	__u32 nir_prio;
	__u32 nir_id;
	bool nir_local;
};
 
struct lnet_ioctl_net_p2p_rule {
	struct lnet_nid_descr p2p_src_descr;
	struct lnet_nid_descr p2p_dst_descr;
	__u32 p2p_prio;
	__u32 p2p_id;
};
 
/* 
 * lnet_ioctl_rule_blk
 *  describes a set of rules of the same type to transfer to the kernel.
 *      rule_hdr - header information describing the total size of the transfer
 *      rule_type - type of rules included
 *      rule_size - size of each individual rule. Can be used to check backwards compatibility
 *      rule_count - number of rules included in the bulk.
 *      rule_bulk - pointer to the user space allocated memory containing the rules.
 */
struct lnet_ioctl_rule_blk {
	struct libcfs_ioctl_hdr rule_hdr;
	enum lnet_sel_rule_type rule_type;
	__u32 rule_size;
	__u32 rule_count;
	void __user *rule_bulk;
};
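As an illustration of how rule_size and rule_count might be used together, the following user-space sketch fills in a block header and performs the kind of size check a receiver could apply before walking the bulk buffer. All names here (`struct ioctl_hdr`, `struct rule_blk`, `struct net_rule`, `rule_blk_init()`, `rule_blk_compatible()`) are hypothetical stand-ins using `<stdint.h>` types in place of the kernel types, and it is an assumption that the header length covers the block structure plus the bulk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sel_rule_type { SEL_RULE_NET = 0, SEL_RULE_NID, SEL_RULE_P2P };

struct ioctl_hdr {		/* hypothetical stand-in for libcfs_ioctl_hdr */
	uint32_t len;		/* assumed: total size of the transfer in bytes */
	uint32_t version;
};

struct rule_blk {		/* mirrors the layout of lnet_ioctl_rule_blk */
	struct ioctl_hdr hdr;
	enum sel_rule_type type;
	uint32_t rule_size;
	uint32_t rule_count;
	void *rule_bulk;
};

struct net_rule {		/* simplified single-net rule for the example */
	uint32_t net_id;
	uint32_t prio;
	uint32_t id;
};

/* Fill in a block. Returns 0 on success, -1 if the total size overflows 32 bits. */
static int rule_blk_init(struct rule_blk *blk, enum sel_rule_type type,
			 void *rules, uint32_t rule_size, uint32_t rule_count)
{
	uint64_t total;

	assert(blk != NULL);

	total = (uint64_t)sizeof(*blk) + (uint64_t)rule_size * rule_count;
	if (total > UINT32_MAX)
		return -1;

	blk->hdr.len = (uint32_t)total;
	blk->hdr.version = 0;
	blk->type = type;
	blk->rule_size = rule_size;
	blk->rule_count = rule_count;
	blk->rule_bulk = rules;
	return 0;
}

/*
 * Receiver-side sanity check: the per-rule size must match what this build
 * expects, and the header length must agree with rule_size * rule_count.
 */
static bool rule_blk_compatible(const struct rule_blk *blk,
				size_t expected_rule_size)
{
	return blk->rule_size == expected_rule_size &&
	       blk->hdr.len == sizeof(*blk) +
			       (uint64_t)blk->rule_size * blk->rule_count;
}
```

A caller would build an array of rules of one type, pass it to `rule_blk_init()`, and hand the block to the ioctl; a receiver compiled against a different rule layout would see a mismatched rule_size and could reject the block instead of misreading it.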

...

The net which matches the rule will be assigned the priority defined in the rule.

NID Rule

If the local flag is set, attempt to match the local_nis; otherwise attempt to match the peer_nis. The local_ni or peer_ni which matches the NID shall be assigned the priority defined in the rule.
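The NID rule behaviour can be sketched as follows. This is a user-space illustration under stated assumptions: `struct expr` mirrors the lo/hi/stride fields of struct lnet_expr, a value is taken to match when it lies in [lo, hi] and is a whole number of strides from lo, and `struct ni`, `struct nid_rule` and `apply_nid_rule()` are hypothetical simplifications of the local NI and peer NI structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct expr {			/* mirrors ex_lo, ex_hi, ex_stride */
	uint32_t lo;
	uint32_t hi;
	uint32_t stride;
};

struct ni {			/* stand-in for either a local NI or a peer NI */
	uint8_t ip[4];
	uint32_t priority;
};

struct nid_rule {
	struct expr ip[4];	/* one expression per octet */
	uint32_t prio;
	bool local;
};

/* Assumed semantics: v is in [lo, hi] and (v - lo) is a multiple of stride. */
static bool expr_match(const struct expr *e, uint32_t v)
{
	if (v < e->lo || v > e->hi)
		return false;
	return e->stride <= 1 || (v - e->lo) % e->stride == 0;
}

/*
 * Apply the rule to the local list if rule->local is set, otherwise to the
 * peer list. Every matching NI gets the rule's priority. Returns the number
 * of NIs matched.
 */
static size_t apply_nid_rule(const struct nid_rule *rule,
			     struct ni *local_nis, size_t nlocal,
			     struct ni *peer_nis, size_t npeer)
{
	struct ni *list = rule->local ? local_nis : peer_nis;
	size_t count = rule->local ? nlocal : npeer;
	size_t matched = 0;

	assert(list != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		bool ok = true;

		for (int o = 0; o < 4 && ok; o++)
			ok = expr_match(&rule->ip[o], list[i].ip[o]);
		if (ok) {
			list[i].priority = rule->prio;
			matched++;
		}
	}
	return matched;
}
```

A rule equivalent to 10.10.10.* would use {10, 10, 1} for the first three octets and {0, 255, 1} for the last; with the local flag set only the local list is touched, and with it cleared only the peer list is.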

...