It is sometimes desirable to fine-tune the selection of the local/remote NIs used for communication. For example, if there are currently two networks, an OPA network and an MLX network, both will be used. Especially when the traffic volume is low, the credit criteria will be equivalent between the networks, and both will be used in round-robin. However, the user might want to use one network for all traffic and keep the other network free unless the first network goes down.
User Defined Selection Policies (UDSP) will allow this type of control.
UDSPs are configured via lnetctl, either on the command line or through YAML config files, and are then passed to the kernel. The policies are applied to all local networks and remote peers, then stored in the kernel. During the selection process the policies are examined as part of the selection algorithm.
Outlined below are the UDSP rule types.
These rules define the relative priority of the networks against each other; 0 is the highest priority. Networks with higher priorities will be selected by the selection algorithm, unless the network has no healthy interfaces. If there exists a usable interface on another network which is healthier than any available on the current network, then that interface will be used: health always trumps all other criteria.
lnetctl policy add --src *@<net type> --<action-type> <context dependent value>
ex: lnetctl policy add --src *@o2ib1 --priority 0
These rules define the relative priority of individual NIDs; 0 is the highest priority. Once a network is selected, the NID with the highest priority is preferred. Note that NID priority ranks below health: given two NIDs, NID-A and NID-B, where NID-A has the higher priority but the lower health value, NID-B will still be selected. In that sense the policies are there as a hint to guide the selection algorithm.
lnetctl policy add --src <ip>@<net type> --<action-type> <context dependent value>
ex: lnetctl policy add --src 10.10.10.2@o2ib1 --priority 1
These rules define the relative priority of paths; 0 is the highest priority. Once a destination NID is selected, the source NID with the highest priority is selected to send from.
lnetctl policy add --src <ip>@<net type> --dst <ip>@<net type> --<action-type> <context dependent value>
ex: lnetctl policy add --src 10.10.10.2@o2ib1 --dst 10.10.10.4@o2ib1 --priority 1
Router Rules define which set of routers to use. Within a network some paths can be more optimal than others. To gain more control over the path traffic takes, admins currently configure interfaces on different networks and split up the router pools among the networks. However, this results in complex configuration which is hard to maintain and error prone. It is much more desirable to configure all interfaces on the same network and then define which routers to use when sending to a remote peer. Router Rules provide this functionality.
lnetctl policy add --dst <ip>@<net type> --rte <ip>@<net type> --<action-type> <context dependent value>
ex: lnetctl policy add --dst 10.10.10.2@o2ib3 --rte 10.10.10.[5-8]@o2ib --priority 0
Below is the command line syntax for managing UDSPs:
# Adding a local network UDSP.
# If multiple local networks are available, each one can have a priority.
# The one with the highest priority is preferred.
lnetctl policy add --src *@<net type> --<action type> <action context sensitive value> --idx <value>

--src: defined in ip2nets syntax. The '*@<net type>' syntax indicates the
       network. This is not to be confused with '*.*.*.*@<net type>', which
       indicates all NIDs in this network.
--<action type>: 'priority' is the only implemented action type.
--<action context sensitive value>: a value specific to the action type. For
       'priority' it is a value in the range [0 - 255].
--idx: the index at which to insert the rule. If it is larger than the size of
       the policy list, the rule is appended to the end of the list. If not
       specified, the default behavior is to append to the end of the list.

# Adding a local NID UDSP.
# After a local network is chosen, if there are multiple NIs on the network
# the one with the highest priority is preferred.
lnetctl policy add --src <address descriptor>@<net type> --<action type> <action context sensitive value> --idx <value>

--src: the address descriptor defined in ip2nets syntax as described in the
       manual. <net type> is something like tcp1 or o2ib2.
--<action type>: 'priority' is the only implemented action type.
--<action context sensitive value>: a value specific to the action type. For
       'priority' it is a value in the range [0 - 255].
--idx: the index at which to insert the rule, as above.

# Adding a remote NID UDSP.
# When selecting a peer NID, select the one with the highest priority.
lnetctl policy add --dst <address descriptor>@<net type> --<action type> <action context sensitive value> --idx <value>

--dst: the address descriptor defined in ip2nets syntax as described in the
       manual. <net type> is something like tcp1 or o2ib2.
--<action type>: 'priority' is the only implemented action type.
--<action context sensitive value>: a value specific to the action type. For
       'priority' it is a value in the range [0 - 255].
--idx: the index at which to insert the rule, as above.

# Adding a NID pair UDSP.
# When this rule is flattened, the local NIDs which match the rule are added
# to a list on the peer NIs matching the rule. When selecting the peer NI,
# the one with the local NID being used on its list is preferred.
lnetctl policy add --src <address descriptor>@<net type> --dst <address descriptor>@<net type> --idx <value>

--src: the address descriptor defined in ip2nets syntax as described in the
       manual. <net type> is something like tcp1 or o2ib2.
--dst: the address descriptor defined in ip2nets syntax as described in the
       manual. <net type> is something like tcp1 or o2ib2. Destination NIDs
       can be local or remote.
--idx: the index at which to insert the rule, as above.

# Adding a Router UDSP.
# Similar to the NID pair UDSP: the router NIDs matching the rule are added
# to a list on the peer NIs matching the rule. When sending to a remote peer,
# the router which has its NID on the peer NI's list is preferred.
lnetctl policy add --dst <address descriptor>@<net type> --rte <address descriptor>@<net type> --idx <value>

--dst: the address descriptor defined in ip2nets syntax as described in the
       manual. <net type> is something like tcp1 or o2ib2.
--rte: the address descriptor defined in ip2nets syntax as described in the
       manual. <net type> is something like tcp1 or o2ib2.
--idx: the index at which to insert the rule, as above.

# Show all policies in the system.
# The policies are dumped in YAML form. Each policy is assigned an index,
# which is part of the policy YAML block.
lnetctl policy show

# To delete a policy, its index must be specified. The normal flow is to
# first show the list of policies, grab the index, and use it in the delete
# command.
lnetctl policy del --idx <value>

# Generally, the syntax is as follows:
lnetctl policy <add | del | show>
--src: ip2nets syntax specifying the local NID to match
--dst: ip2nets syntax specifying the remote NID to match
--rte: ip2nets syntax specifying the router NID to match
--priority: priority to apply to rule matches
--idx: index at which to insert the rule. By default the rule is appended to the end of the rule list
As of this writing only the "priority" action is implemented. However, it is feasible in the future to implement different actions to be taken when a rule matches. For example, we could implement a "redirect" action, which redirects traffic to another destination. Yet another example is a "lawful intercept" or "mirror" action, which mirrors messages to a different destination. This might be useful for keeping a standby server updated with all information going to the primary server. A lawful intercept action allows personnel authorized by a Law Enforcement Agency (LEA) to intercept file operations from targeted clients and send them to an LI Mediation Device.
udsp:
    - idx: <unsigned int>
      src: <ip>@<net type>
      dst: <ip>@<net type>
      rte: <ip>@<net type>
      action:
          - priority: <unsigned int>
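For example, a network rule giving the local o2ib1 network the highest priority would look as follows. This is an illustrative sample only; the index, network name and priority values are hypothetical, and a network rule carries only the src descriptor:

udsp:
    - idx: 0
      src: "*@o2ib1"
      action:
          - priority: 0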
There are three main operations which can be carried out on UDSPs, either from the command line or via YAML configuration: add, delete, and show.
The UI allows adding a new rule. With the optional idx parameter, the admin can specify where in the rule chain the new rule should be inserted. By default the rule is appended to the end of the list; any other value results in inserting the rule at that position.
When a new UDSP is added the entire UDSP set is re-evaluated: all Nets, NIs and peer NIs in the system are traversed and the rules re-applied. This is an expensive operation, but given that UDSP management should be a rare operation, it shouldn't be a problem.
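A minimal sketch of what this re-evaluation pass could look like is shown below. The helper names (lnet_udsp_match_net(), lnet_udsp_match_ni()) and the exact traversal are assumptions for illustration, not the final implementation:

/*
 * Illustrative sketch only: re-apply the entire UDSP list to all existing
 * constructs after an add or delete. The match helpers are hypothetical.
 */
static void
lnet_udsp_reapply_all(void)
{
	struct lnet_udsp *udsp;
	struct lnet_net *net;
	struct lnet_ni *ni;

	/* walk the policy list in order; rules are applied in list order */
	list_for_each_entry(udsp, &the_lnet.ln_udsp_list, udsp_on_list) {
		list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
			/* flatten a matching network rule onto the net */
			if (lnet_udsp_match_net(udsp, net))
				net->net_priority =
					udsp->udsp_action.udsp_priority;
			/* flatten a matching NID rule onto each NI */
			list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
				if (lnet_udsp_match_ni(udsp, ni))
					ni->ni_priority =
						udsp->udsp_action.udsp_priority;
		}
		/* peer NIs and router NID lists are traversed similarly */
	}
}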
The UI allows deleting an existing UDSP using its index, which can be shown using the show command. When a UDSP is deleted the entire UDSP set is re-evaluated: the Nets, NIs and peer NIs are traversed and the rules re-applied.
The UI allows showing existing UDSPs. The format of the YAML output is as follows:
udsp:
    - idx: <unsigned int>
      src: <ip>@<net type>
      dst: <ip>@<net type>
      rte: <ip>@<net type>
      action:
          - priority: <unsigned int>
All policies are stored in kernel space. All logic to add, delete and match policies will be implemented in kernel space. This complicates kernel space processing. Arguably, policy maintenance logic is not core to LNet functionality; what is core is the ability to select source and destination networks and NIDs in accordance with user definitions. However, the kernel can manage policies much more easily and with fewer potential race conditions than user space.
UDSPs are comprised of two parts: a matching rule and an action. The matching rule is what is used to match a NID or a network; the action is what is applied when the rule is matched.
A rule can be uniquely identified using an internal ID, which is assigned by the LNet module when the rule is added and returned to user space when the UDSPs are shown.
UDSPs shall be defined by administrators either via the LNet command line utility, lnetctl, or via a YAML configuration file. lnetctl parses the UDSP and stores it in an intermediary format, which is flattened and passed down to the kernel LNet module. LNet stores these UDSPs on a policy list. Once policies are added to LNet they are applied to existing networks, NIDs and routers. The advantage of this approach is that UDSPs are not strictly tied to the internal constructs, i.e. networks, NIDs or routers: they can be applied whenever the internal constructs are created, and if the constructs are deleted the policies remain and can be automatically applied at a future time.
This makes configuration easy since a set of UDSPs can be defined, like "all IB networks priority 1", "all Gemini networks priority 2", etc., and when a network is added it automatically inherits these rules.
Peers are normally not created explicitly by administrators: the ULP requests to send a message to a peer, or the node receives an unsolicited message from a peer, which results in creating a peer construct in LNet. It is feasible, especially for router policies, to have a UDSP which associates a set of clients within a specific NID range with a set of optimal routers. Having the policies stored and matched in the kernel aids in fulfilling this requirement.
Performance needs to be taken into account with this feature. It is not feasible to traverse the policy lists on every send operation; this would add unnecessary overhead. When rules are applied they have to be "flattened" onto the constructs they impact. For example, a Network Rule is added as follows: o2ib priority 0. This rule gives priority to using the o2ib network for sending. A priority field will be added to the network structure and set to 0 for the o2ib network. As the selection algorithm traverses the networks, which is part of the current code, the priority field is compared. This is a more optimal approach than examining the policies on every send to see if we get any matches.
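To make the flattening concrete, below is a hedged sketch of how a rule's NID descriptor could be matched against a NID while the rule is being applied. lnet_udsp_match_addr() is a hypothetical helper that walks the cfs_range_expr list making up ud_ip_range; the descriptor structure itself is defined below:

/*
 * Illustrative sketch, not the final implementation: check whether a NID
 * falls within a UDSP NID descriptor (net plus optional IP range list).
 */
static bool
lnet_udsp_nid_matches(struct lnet_ud_nid_descr *descr, lnet_nid_t nid)
{
	/* the net portion must match exactly, e.g. o2ib1 */
	if (LNET_NIDNET(nid) != descr->ud_net_id)
		return false;

	/* an empty range list means "any NID on the net", i.e. *@<net> */
	if (list_empty(&descr->ud_ip_range))
		return true;

	/* otherwise the address must fall within one of the ranges */
	return lnet_udsp_match_addr(&descr->ud_ip_range, LNET_NIDADDR(nid));
}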
It is important to define the order of rule operations, when there are multiple rules that apply to the same construct.
The order is defined by the selection algorithm logical flow:
/* lnet structure which will keep a list of UDSPs */
struct lnet {
	...
	struct list_head	ln_udsp_list;
	...
};

/* each NID range is defined as a net_id and an IP range */
struct lnet_ud_nid_descr {
	__u32			ud_net_id;
	struct list_head	ud_ip_range;
};

/* UDSP action types */
enum lnet_udsp_action_type {
	EN_LNET_UDSP_ACTION_PRIORITY = 0,
	EN_LNET_UDSP_ACTION_NONE = 1,
};
/*
* a UDSP rule can have up to three user defined NID descriptors
* - src: defines the local NID range for the rule
* - dst: defines the peer NID range for the rule
* - rte: defines the router NID range for the rule
*
* An action union defines the action to take when the rule
* is matched
*/
struct lnet_udsp {
	struct list_head		udsp_on_list;
	__u32				udsp_idx;
	struct lnet_ud_nid_descr	*udsp_src;
	struct lnet_ud_nid_descr	*udsp_dst;
	struct lnet_ud_nid_descr	*udsp_rte;
	enum lnet_udsp_action_type	udsp_action_type;
	union udsp_action {
		__u32	udsp_priority;
	} udsp_action;
};
/* The rules are flattened into the LNet structures as shown below */
struct lnet_net {
	...
	/* defines the relative priority of this net compared to others in the system */
	__u32 net_priority;
	...
};

struct lnet_remotenet {
	...
	/* defines the relative priority of the remote net compared to other remote nets */
	__u32 lrn_priority;
	...
};

struct lnet_ni {
	...
	/* defines the relative priority of this NI compared to other NIs in the net */
	__u32 ni_priority;
	...
};

struct lnet_peer_ni {
	...
	/* defines the relative peer_ni priority compared to other peer_nis in the peer */
	__u32 lpni_priority;
	/* defines the list of local NID(s) (>=1) which should be used as the source */
	union lpni_pref {
		lnet_nid_t	nid;
		lnet_nid_t	*nids;
	} lpni_pref;
	/*
	 * defines the list of router NID(s) to be used when sending to this peer NI
	 * if the peer NI is remote
	 */
	lnet_nid_t *lpni_rte_nids;
	...
};
/* UDSPs will be passed to the kernel via IOCTL */
#define IOC_LIBCFS_ADD_UDSP		_IOWR(IOC_LIBCFS_TYPE, 106, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_DEL_UDSP		_IOWR(IOC_LIBCFS_TYPE, 107, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_UDSP		_IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_UDSP_SIZE	_IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE)
/*
* api-ni.c will be modified to handle adding a UDSP
* All UDSP operations are done under mutex and exclusive spin
* lock to avoid constructs changing during application of the
* policies.
*/
int
LNetCtl(unsigned int cmd, void *arg)
{
...
case IOC_LIBCFS_ADD_UDSP: {
struct lnet_ioctl_config_udsp *config_udsp = arg;
mutex_lock(&the_lnet.ln_api_mutex);
/*
* add and do initial flattening of the UDSP into
* internal structures.
*/
rc = lnet_add_and_flatten_udsp(config_udsp);
mutex_unlock(&the_lnet.ln_api_mutex);
return rc;
}
case IOC_LIBCFS_DEL_UDSP: {
struct lnet_ioctl_config_udsp *del_udsp = arg;
mutex_lock(&the_lnet.ln_api_mutex);
/*
* delete the rule identified by index
*/
rc = lnet_del_udsp(del_udsp->udsp_idx);
mutex_unlock(&the_lnet.ln_api_mutex);
return rc;
}
case IOC_LIBCFS_GET_UDSP_SIZE: {
struct lnet_ioctl_config_udsp *get_udsp = arg;
mutex_lock(&the_lnet.ln_api_mutex);
/*
* get the UDSP size specified by idx
*/
rc = lnet_get_udsp_num(get_udsp);
mutex_unlock(&the_lnet.ln_api_mutex);
return rc;
}
case IOC_LIBCFS_GET_UDSP: {
struct lnet_ioctl_config_udsp *get_udsp = arg;
mutex_lock(&the_lnet.ln_api_mutex);
/*
* get the udsp at index provided. Return -ENOENT if
* no more UDSPs to get
*/
rc = lnet_get_udsp(get_udsp, get_udsp->udsp_idx);
mutex_unlock(&the_lnet.ln_api_mutex);
return rc;
}
...
}
IOC_LIBCFS_ADD_UDSP
The handler for IOC_LIBCFS_ADD_UDSP will perform the following operations:
Application of the rules is done under the api_mutex and the exclusive lnet_net_lock, to avoid having the peer or local net lists change while the rules are being applied.
The rules are iterated and applied whenever a UDSP is added or deleted, and whenever a matching construct (Net, NI, peer NI or route) is created.
IOC_LIBCFS_DEL_UDSP
The handler for IOC_LIBCFS_DEL_UDSP will remove the rule identified by its index from the UDSP list and re-apply the updated rule set. When the updated rule set is applied, all traces of deleted or modified rules are removed from the LNet constructs.
IOC_LIBCFS_GET_UDSP_SIZE
The handler for IOC_LIBCFS_GET_UDSP_SIZE will return the size of the UDSP specified by index.
IOC_LIBCFS_GET_UDSP
The handler for IOC_LIBCFS_GET_UDSP will serialize the rules on the UDSP list.
The GET call is done in two stages. First it makes a call to the kernel to determine the size of the UDSP at index. User space then allocates a block big enough to accommodate the UDSP and makes another call to actually get the UDSP.
User space iteratively fetches UDSPs until there are no more UDSPs to get.
User space prints the UDSPs in the YAML format specified here.
TODO: Another option is to have IOC_LIBCFS_GET_UDSP_NUM, which gets the total size needed for all UDSPs, and then user space can make one call to get all the UDSPs. However, this complicates the marshaling function. User space would also need to handle cases where the size of the UDSPs is too large for one call. The above proposal does more iterations to get all the UDSPs, but the code should be simpler, and since the number of UDSPs is expected to be small, the above proposal should be fine.
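A sketch of the resulting user-space loop is shown below. The payload field names (udsp_idx, udsp_size, udsp_bulk) and the printing helper are assumptions used for illustration only:

/* Illustrative two-stage GET loop; field and helper names are hypothetical. */
int idx = 0;
int rc = 0;

while (rc == 0) {
	struct lnet_ioctl_config_udsp get = { .udsp_idx = idx };

	/* stage 1: ask the kernel for the size of the UDSP at idx */
	rc = l_ioctl(LNET_DEV_ID, IOC_LIBCFS_GET_UDSP_SIZE, &get);
	if (rc != 0)
		break;		/* -ENOENT: no more UDSPs */

	/* stage 2: allocate a big-enough bulk and fetch the UDSP itself */
	get.udsp_bulk = calloc(1, get.udsp_size);
	if (get.udsp_bulk == NULL)
		break;
	rc = l_ioctl(LNET_DEV_ID, IOC_LIBCFS_GET_UDSP, &get);
	if (rc == 0)
		print_udsp_yaml(get.udsp_bulk);	/* hypothetical printer */
	free(get.udsp_bulk);
	idx++;
}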
/*
* select an NI from the Nets with highest priority
*/
struct lnet_ni *
lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt)
{
...
list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) {
...
struct lnet_net *net;
/* consider only highest priority peer_net */
peer_net_prio = peer_net->lpn_priority;
if (peer_net_prio > best_peer_net_prio)
continue;
else if (peer_net_prio < best_peer_net_prio)
best_peer_net_prio = peer_net_prio;
net = lnet_get_net_locked(peer_net->lpn_net_id);
if (!net)
continue;
/*
* look only at the Nets with the highest priority and disregard
* nets which have lower priority. Nets with equal priority are
* examined and the best_ni is selected from amongst them.
*/
net_prio = net->net_priority;
if (net_prio > best_net_prio)
continue;
else if (net_prio < best_net_prio) {
best_net_prio = net_prio;
best_ni = NULL;
}
best_ni = lnet_find_best_ni_on_spec_net(best_ni, peer,
					peer_net, md_cpt, false);
...
}
...
}
/*
* select the NI with the highest priority
*/
static struct lnet_ni *
lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *best_ni,
struct lnet_peer *peer, struct lnet_peer_net *peer_net,
int md_cpt)
{
...
ni_prio = ni->ni_priority;
if (ni_fatal) {
continue;
} else if (ni_healthv < best_healthv) {
continue;
} else if (ni_healthv > best_healthv) {
best_healthv = ni_healthv;
if (distance < shortest_distance)
shortest_distance = distance;
/*
* if this NI is lower in priority than the one already set then discard it
* otherwise use it and set the best priority so far to this NI's.
*/
} else if (ni_prio > best_ni_prio) {
continue;
} else if (ni_prio < best_ni_prio)
best_ni_prio = ni_prio;
}
...
}
/*
* When a UDSP rule associates local NIs with remote NIs, the list of local NIs NIDs
* is flattened to a list in the associated peer_NI. When selecting a peer NI, the
* peer NI with the corresponding preferred local NI is selected.
*/
bool
lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid)
{
...
}
/*
* select the peer NI with the highest priority first and then check
* if it's preferred.
*/
static struct lnet_peer_ni *
lnet_select_peer_ni(struct lnet_send_data *sd, struct lnet_peer *peer,
struct lnet_peer_net *peer_net)
{
...
ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, best_ni->ni_nid);
lpni_prio = lpni->lpni_priority;
if (lpni_healthv < best_lpni_healthv)
continue;
/*
* select the NI with the highest priority.
*/
else if (lpni_prio > best_lpni_prio)
continue;
else if (lpni_prio < best_lpni_prio)
best_lpni_prio = lpni_prio;
/*
* select the NI which has the best_ni's NID in its preferred list
*/
else if (!preferred && ni_is_pref)
preferred = true;
...
}
static int
lnet_handle_find_routed_path(struct lnet_send_data *sd,
lnet_nid_t dst_nid,
struct lnet_peer_ni **gw_lpni,
struct lnet_peer **gw_peer)
{
...
lpni = lnet_find_peer_ni_locked(dst_nid);
peer = lpni->lpni_net->lpn_peer;
list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) {
peer_net_priority = peer_net->lpn_priority;
if (peer_net_priority > peer_net_best_priority)
continue;
else if (peer_net_priority < peer_net_best_priority)
peer_net_best_priority = peer_net_priority;
lpni = NULL;
while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni)) != NULL) {
/* find best gw for this lpni */
lpni_prio = lpni->lpni_priority;
if (lpni_prio > lpni_best_prio)
continue;
else if (lpni_prio < lpni_best_prio)
lpni_best_prio = lpni_prio;
/*
* lnet_find_route_locked will be changed to consider the list of
* gw NIDs on the lpni
*/
gw = lnet_find_route_locked(NULL, lpni, sd->sd_rtr_nid);
...
/*
* if gw is MR then select best_NI. Increment the sequence number of
* the gw NI for Round Robin selection.
*/
...
}
}
...
}
After a UDSP is parsed in user space it needs to be marshalled and sent to the kernel. The kernel de-marshals the data and stores it in its own data structures. A UDSP is formed of the following pieces of information: the src, dst and rte NID descriptors that make up the matching rule, the action type and action value, and the index.
The data flow of a UDSP looks as follows:
[Figure: UDSP data flow]
The DLC library will provide the APIs outlined below to create, delete and show rules. Once rules are created and stored in the kernel they are assigned an ID. This ID is returned and shown by the show command, which dumps the rules; it can be referenced later to delete a rule. The process is described in more detail below.
/*
 * lustre_lnet_udsp_str_to_action
 *   Given a string format of the action, convert it to an enumerated type.
 *   action - string format of the action.
 */
enum lnet_udsp_action_type lustre_lnet_udsp_str_to_action(char *action);

/*
 * lustre_lnet_add_udsp
 *   Add a selection policy.
 *   src - source NID descriptor
 *   dst - destination NID descriptor
 *   rte - router NID descriptor
 *   type - action type
 *   action - union of the action
 *   idx - the index at which to insert the rule
 *   seq_no - sequence number of the request
 *   err_rc - [OUT] struct cYAML tree describing the error. Freed by
 *            the caller.
 */
int lustre_lnet_add_udsp(char *src, char *dst, char *rte,
			 enum lnet_udsp_action_type type,
			 union udsp_action *action, unsigned int idx,
			 int seq_no, struct cYAML **err_rc);

/*
 * lustre_lnet_del_udsp
 *   Delete a selection policy.
 *   idx - the index of the policy to delete
 *   seq_no - sequence number of the request
 *   err_rc - [OUT] struct cYAML tree describing the error. Freed by
 *            the caller.
 */
int lustre_lnet_del_udsp(int idx, int seq_no, struct cYAML **err_rc);

/*
 * lustre_lnet_show_udsp
 *   Show configured selection policies.
 *   seq_no - sequence number of the request
 *   show_rc - [OUT] struct cYAML tree containing the UDSPs
 *   err_rc - [OUT] struct cYAML tree describing the error. Freed by
 *            the caller.
 */
int lustre_lnet_show_udsp(int seq_no, struct cYAML **show_rc,
			  struct cYAML **err_rc);
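A usage sketch of the add API follows, assuming the union udsp_action type from the structures below; the UINT_MAX index (to request appending) and the return-code handling are illustrative:

/* Illustrative only: add "prefer network o2ib1" as a local network rule. */
struct cYAML *err_rc = NULL;
union udsp_action action = { .udsp_priority = 0 };
int rc;

rc = lustre_lnet_add_udsp("*@o2ib1", NULL, NULL,
			  EN_LNET_UDSP_ACTION_PRIORITY, &action,
			  UINT_MAX /* append to end of the list */,
			  -1 /* seq_no */, &err_rc);
if (rc != LUSTRE_CFG_RC_NO_ERR)
	cYAML_print_tree2file(stderr, err_rc);
cYAML_free_tree(err_rc);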
/* each NID range is defined as a net_id and an IP range */
struct lnet_ud_nid_descr {
	__u32			ud_net_id;
	struct list_head	ud_ip_range;
};

/* UDSP action types */
enum lnet_udsp_action_type {
	EN_LNET_UDSP_ACTION_PRIORITY = 0,
	EN_LNET_UDSP_ACTION_NONE = 1,
};
/*
* a UDSP rule can have up to three user defined NID descriptors
* - src: defines the local NID range for the rule
* - dst: defines the peer NID range for the rule
* - rte: defines the router NID range for the rule
*
* An action union defines the action to take when the rule
* is matched
*/
struct lnet_udsp {
	struct list_head		udsp_on_list;
	__u32				udsp_idx;
	struct lnet_ud_nid_descr	*udsp_src;
	struct lnet_ud_nid_descr	*udsp_dst;
	struct lnet_ud_nid_descr	*udsp_rte;
	enum lnet_udsp_action_type	udsp_action_type;
	union udsp_action {
		__u32	udsp_priority;
	} udsp_action;
};
struct cfs_range_expr {
struct list_head re_link;
__u32 re_lo;
__u32 re_hi;
__u32 re_stride;
};
struct lnet_ioctl_udsp {
	__u32	iou_idx;
	enum lnet_udsp_action_type iou_action_type;
	union {
		__u32	iou_priority;
	} iou_action;
	__u32	iou_src_dot_expr_count;
	__u32	iou_dst_dot_expr_count;
	__u32	iou_rte_dot_expr_count;
	char	iou_bulk[0];
};
The address is expressed as a list of cfs_range_expr structures, which need to be marshalled. For an IP address there are four of these structures; other address types can have a different number (Gemini, for example, has only one). The corresponding iou_[src|dst|rte]_dot_expr_count is set to the number of expressions describing the address. Each expression is then flattened into the structure. They have to be flattened in the order defined: SRC, DST, RTE.
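To illustrate, under the structure definitions above the bulk is laid out as the fixed lnet_ioctl_udsp header followed by the SRC, then DST, then RTE expressions. Below is a hedged sketch of flattening one descriptor; the helper name and the exact on-the-wire layout are assumptions:

/*
 * Illustrative sketch: append the numeric fields of each cfs_range_expr
 * describing one NID descriptor to the bulk, returning the new tail.
 * Only re_lo/re_hi/re_stride travel across the ioctl boundary; the list
 * linkage is rebuilt on the receiving side.
 */
static char *
lnet_udsp_marshal_descr(struct lnet_ud_nid_descr *descr, char *bulk)
{
	struct cfs_range_expr *expr;

	list_for_each_entry(expr, &descr->ud_ip_range, re_link) {
		__u32 vals[3] = { expr->re_lo, expr->re_hi, expr->re_stride };

		memcpy(bulk, vals, sizeof(vals));
		bulk += sizeof(vals);
	}
	return bulk;
}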
The kernel receives the marshalled data and forms its internal structures. The functions to marshal and de-marshal should be straightforward. Note that user space and kernel space use the same structures, which will be defined in a common location; for this reason the marshal and de-marshal functions will be shared.
Common functions that can be called from user space and kernel space will be created to marshal and de-marshal the UDSPs:
/*
 * lnet_get_udsp_size()
 *   Given the UDSP, return the size needed to flatten the UDSP.
 */
int lnet_get_udsp_size(struct lnet_udsp *udsp);

/*
 * lnet_udsp_marshal()
 *   Marshal the UDSP pointed to by udsp into the memory block provided.
 *   In order for this API to work in both kernel and user space the bulk
 *   pointer needs to be passed in. When this API is called in the kernel,
 *   it is expected that the bulk memory was allocated in user space. This
 *   API is intended to be called from the kernel to marshal the rules
 *   before sending them to user space. It will also be called from user
 *   space to marshal the UDSP before sending it to the kernel.
 *   udsp [IN] - udsp to marshal
 *   bulk_size [IN] - size of bulk
 *   bulk [OUT] - allocated block of memory where the serialized rules are
 *                stored
 */
int lnet_udsp_marshal(struct lnet_udsp *udsp, __u32 bulk_size,
		      void __user *bulk);

/*
 * lnet_udsp_demarshal()
 *   Given a bulk containing a single UDSP, de-marshal it and populate the
 *   udsp structure provided.
 *   bulk [IN] - memory block containing the serialized rules
 *   bulk_size [IN] - size of the bulk memory block
 *   udsp [OUT] - preallocated struct lnet_udsp
 */
int lnet_udsp_demarshal(void __user *bulk, __u32 bulk_size,
			struct lnet_udsp *udsp);
cfg-100, cfg-105, cfg-110, cfg-115, cfg-120, cfg-125, cfg-130, cfg-135, cfg-140, cfg-160, cfg-165
This section will be updated as development continues. The goal is to update the unit test cases with as much detail as possible. It might be better to have pointers to the actual test scripts in the test case table below. For now an example of a pseudo coded test script is outlined below.
This section defines common functions which will be used in many test cases. They are defined in pseudo-Python.
def add_verify_net(net_configs, destination):
    # all commands should be executed on destination
    redirect_to_dest(destination)
    for cfg in net_configs:
        lnetctl net add --net cfg['net'] --if cfg['intf']
        show_output = lnetctl net show
        if (cfg['net'] not in show_output) or \
           (show_output[cfg['net']].if_name != cfg['intf']):
            return FAILED
    return SUCCESS

def add_verify_policy(network_type, priority, destination):
    # all commands should be executed on destination
    redirect_to_dest(destination)
    lnetctl policy add --src *@network_type --priority priority
    show_output = lnetctl policy show
    if (network_type not in show_output) or \
       (show_output[network_type].priority != priority):
        return FAILED
    show_output = lnetctl net show --net network_type
    if (not show_output) or (show_output[network_type].priority != priority):
        return FAILED
    return SUCCESS

def generate_traffic(peer1, peer2):
    run_lnet_selftest(peer1, peer2)

def get_traffic_stats(peer1):
    # get traffic statistics and return them

def verify_traffic_on(stats1, stats2, net):
    # make sure that the bulk of the traffic is on net
| Policy | Test case |
|---|---|
| Network Rule | Add and verify local network policy. |
| | Verify traffic goes over the network with the highest priority. |
| | Verify traffic goes over the network with the healthiest local NI even though it might not be set to the highest priority. |
| | Delete local network policy and verify it has been deleted. |
| | Verify traffic returns to normal pattern when network policy is deleted. |
| | Error handling: add policy for non-existent network. |
| | Add and verify a remote network policy, i.e. messages will need to be routed to that network. |
| | Verify traffic is routed to the remote network with the highest priority. |
| | Verify traffic is routed to another available network given the highest priority remote network is not reachable. |
| | Delete remote network policy and verify it has been deleted. |
| | Verify traffic returns to normal pattern when remote network policy is deleted. |
| | Error handling: add policy for non-existent remote network. |
| NID Rules | Add and verify local NID rule. |
| | Verify traffic goes over the local NID with the highest priority. |
| | Verify traffic goes over the healthiest NID even if it has lower priority. |
| | Delete NID policy and verify it has been deleted. |
| | Verify traffic goes back to regular pattern after NID policy is deleted. |
| | Error handling: add policy for non-existent NID. |
| | Repeat the above tests for remote NIDs. |
| NID Pair Rules | Add and verify NID Pair Rule. TODO: how do you verify that a NID Pair rule has been applied correctly? We need to show the preferred NID list in the show command. This also applies to Router Rules. |
| | Verify traffic goes over the preferred local NIDs. |
| | Delete NID pair rule and verify it has been deleted. |
| | Verify traffic goes back to regular pattern after NID Pair policy is deleted. |
| | Error handling: add a policy that doesn't match any local NIDs. This should be a no-op. |
| Router Rules | Same set of tests as above but for routers. |
| Subsequent Addition | For each of the policy types, add a policy which doesn't match anything currently configured. Verify that the policy is added regardless. |
| | Add an LNet construct (Net, NI, Route) which matches an existing policy. Verify that the policy has been applied to the construct. TODO: show commands like net show, peer show, etc. should be modified to show the result of the policy application. |
| | Verify traffic adheres to policy. |
| | Delete LNet construct. Verify that the policy remains. |
| Dynamic Policy Addition | Run traffic. For each of the policy types, add a policy which should alter traffic. Verify traffic patterns change when the policy is added. |
| Policy Application Order | Add all types of policies; they should all match and be applied. Verify. Run traffic. Verify that policies are applied to traffic in the order of operations defined here. |
| Dynamic Policy Deletion | Add all types of policies. Run traffic. Verify that policies are applied to traffic in the order of operations defined. Delete the policies one at a time. Verify the traffic pattern changes with each policy deleted. |
If a node can be reached on two LNet networks, it is sometimes desirable to designate a fail-over network. Currently in Lustre there is the concept of High Availability (HA), which allows servicenode NIDs to be defined as described in the Lustre manual, section 11.2. Using the syntax described in that section, two NIDs to the same peer can also be defined. However, this approach suffers from a current limitation in the Lustre software, where the NIDs are exposed to layers above LNet. It is ideal to keep the handling of network failures contained within LNet and only let Lustre worry about defining HA.
Given this, it is desirable to have two LNet networks defined on a node, each of which could have multiple interfaces, and then have a way to tell LNet to always use one network until it is no longer available, i.e. all interfaces in that network are down.
In this manner we separate the functionality of defining fail-over pairs from defining fail-over networks.
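For example, two network rules could achieve this; the network names here are hypothetical, and the fail-over behavior follows from health trumping priority as described above:

lnetctl policy add --src *@o2ib --priority 0    # primary network
lnetctl policy add --src *@o2ib1 --priority 1   # fail-over network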
In a scenario where servers are being upgraded with new interfaces to be used with Multi-Rail, it is possible to add, for example, MLX-EDR interfaces to a server. The user might want existing QDR clients to continue using the QDR interface, while new clients can use the EDR interface or even both interfaces. By specifying rules on the clients that prefer specific interfaces, this behavior can be achieved.
[Figure: server upgrade scenario with QDR and EDR interfaces]
This is a finer-tuned method of specifying an exact path: not only can a priority be given to a local or remote interface, but concrete pairs of interfaces can be specified as most preferred. A peer interface can be associated with multiple local interfaces if necessary, giving an N:1 relationship between local and remote interfaces.
[Figure: NID pair selection]
Refer to Olaf's LUG 2016/LAD 2016 PPT for more context.
[Figure: fine-grained routing topology]
Client sets A and B are all configured on the same LNet network, for example o2ib. The servers are on a different LNet network, o2ib2. Due to the underlying network topology it is more efficient to route traffic from client set A over router set A and from client set B over router set B (in the figure, the green links are wider than the red links). UDSPs can be configured on the clients to specify the preferred set of router NIDs.
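A hypothetical configuration for this topology, with all addresses and ranges illustrative:

# on client set A: prefer router set A when sending to the servers on o2ib2
lnetctl policy add --dst *@o2ib2 --rte 10.10.10.[1-4]@o2ib --priority 0
# on client set B: prefer router set B
lnetctl policy add --dst *@o2ib2 --rte 10.10.10.[5-8]@o2ib --priority 0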
Based on , there is a need to select an interface based on the destination portal type.
TODO: This will need a new type of policy. However, I believe we might be crossing a gray area here. LNet will need to have an "understanding" about portal types in a sense. Another suggested solution proposed by Andreas Dilger: Why not just configure the MDS with NID1 and the OSS with NID2, and the client won't even know that they are on the same node?
https://www.ece.tufts.edu/~karen/classes/final_presentation/Dragonfly_Topology_Long.pptx