...
Refer to the PowerPoint presentation above for further discussion of the dragonfly topology.
Example
TBD: My current thinking is that, in a topology such as the one represented above, the system administrator would configure routing so that messages heading to a particular IP destination in a different group are routed to the correct edge router, and from there to the destination group. When LNet is layered on top of this topology there should be no need to specify a rule explicitly, as all necessary routing rules will be defined in the kernel's routing tables. The assumption here is that IP over InfiniBand (IPoIB) obeys the standard Linux routing rules.
Preferred local/remote NID pairs
This is a more fine-grained method of specifying an exact path: rather than only assigning a priority to a local interface or a remote interface, it specifies concrete pairs of interfaces that are most preferred. A peer interface can be associated with multiple local interfaces if necessary, giving an N:1 relationship between local interfaces and remote interfaces.
DLC APIs
The DLC library will provide the following APIs to create, delete and show rules.
| Code Block |
|---|
/*
* lustre_lnet_add_net_sel_pol
* Add a net selection policy. If there already exists a
* policy for this net it will be updated.
* net - Network for the selection policy
* priority - priority of the rule
*/
int lustre_lnet_add_net_sel_pol(char *net, int priority);
/*
* lustre_lnet_del_net_sel_pol
* Delete a net selection policy.
* net - Network for the selection policy
* id - [OPTIONAL] ID of the policy. This can be retrieved via a show command.
*/
int lustre_lnet_del_net_sel_pol(char *net, int id);
/*
* lustre_lnet_show_net_sel_pol
* Show configured net selection policies.
* net - filter on the net provided.
*/
int lustre_lnet_show_net_sel_pol(char *net);
/*
* lustre_lnet_add_nid_sel_pol
* Add a nid selection policy. If there already exists a
* policy for this nid it will be updated. NIDs can be either
* local NIDs or remote NIDs.
* nid - NID for the selection policy
* priority - priority of the rule
*/
int lustre_lnet_add_nid_sel_pol(char *nid, int priority);
/*
* lustre_lnet_del_nid_sel_pol
* Delete a nid selection policy.
* nid - NID for the selection policy
* id - [OPTIONAL] ID of the policy. This can be retrieved via a show command.
*/
int lustre_lnet_del_nid_sel_pol(char *nid, int id);
/*
* lustre_lnet_show_nid_sel_pol
* Show configured nid selection policies.
* nid - filter on the NID provided.
*/
int lustre_lnet_show_nid_sel_pol(char *nid);
/*
* lustre_lnet_add_peer_sel_pol
* Add a peer to peer selection policy. If there already exists a
* policy for the pair it will be updated.
* src_nid - source NID
* dst_nid - destination NID
* priority - priority of the rule
*/
int lustre_lnet_add_peer_sel_pol(char *src_nid, char *dst_nid, int priority);
/*
* lustre_lnet_del_peer_sel_pol
* Delete a peer to peer selection policy.
* src_nid - source NID
* dst_nid - destination NID
* id - [OPTIONAL] ID of the policy. This can be retrieved via a show command.
*/
int lustre_lnet_del_peer_sel_pol(char *src_nid, char *dst_nid, int id);
/*
* lustre_lnet_show_peer_sel_pol
* Show peer to peer selection policies.
* src_nid - [OPTIONAL] source NID. If provided the output will be filtered
* on this value.
* dst_nid - [OPTIONAL] destination NID. If provided the output will be filtered
* on this value.
*/
int lustre_lnet_show_peer_sel_pol(char *src_nid, char *dst_nid); |
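As an illustration of the N:1 pairing described earlier, the following is a minimal sketch of how a caller might use the peer-to-peer call to prefer several local NIDs for one remote NID. The body of `lustre_lnet_add_peer_sel_pol` here is only a stand-in stub so the sketch compiles on its own; it is not the DLC implementation. The helper `sketch_prefer_pairs`, the table `sketch_tbl`, the example NIDs, and the convention of returning 0 on success and -1 on failure are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_POL 16

/* Hypothetical in-memory record of one src/dst policy. */
struct sketch_peer_pol {
	char src[64];
	char dst[64];
	int priority;
};

static struct sketch_peer_pol sketch_tbl[SKETCH_MAX_POL];
static int sketch_cnt;

/*
 * Stand-in stub for the proposed DLC call (NOT the real implementation).
 * It follows the documented behavior: an existing policy for the same
 * pair is updated instead of duplicated.
 * ASSUMPTION: 0 means success, -1 means failure.
 */
int lustre_lnet_add_peer_sel_pol(char *src_nid, char *dst_nid, int priority)
{
	int i;

	for (i = 0; i < sketch_cnt; i++) {
		if (strcmp(sketch_tbl[i].src, src_nid) == 0 &&
		    strcmp(sketch_tbl[i].dst, dst_nid) == 0) {
			sketch_tbl[i].priority = priority;
			return 0;
		}
	}
	if (sketch_cnt == SKETCH_MAX_POL)
		return -1;
	snprintf(sketch_tbl[sketch_cnt].src, sizeof(sketch_tbl[0].src),
		 "%s", src_nid);
	snprintf(sketch_tbl[sketch_cnt].dst, sizeof(sketch_tbl[0].dst),
		 "%s", dst_nid);
	sketch_tbl[sketch_cnt].priority = priority;
	sketch_cnt++;
	return 0;
}

/*
 * Hypothetical helper: associate several local NIDs with one remote NID
 * (the N:1 relationship), all at the same priority.
 */
int sketch_prefer_pairs(char **local_nids, int nlocal, char *remote_nid,
			int priority)
{
	int i, rc;

	for (i = 0; i < nlocal; i++) {
		rc = lustre_lnet_add_peer_sel_pol(local_nids[i], remote_nid,
						  priority);
		if (rc != 0)
			return rc;
	}
	return 0;
}
```

Calling `sketch_prefer_pairs` a second time with the same NIDs and a different priority updates the existing entries rather than adding new ones, which mirrors the "it will be updated" wording in the API comments.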
...
| Code Block |
|---|
/*
* describes a network:
* nw_id: can be the base network name, ex: o2ib or a full network id, ex: o2ib3.
* nw_expr: an expression to describe the variable part of the network ID
* ex: tcp* - all tcp networks
* ex: tcp[1-5] - resolves to tcp1, tcp2, tcp3, tcp4 and tcp5.
*/
struct lustre_lnet_network_descr {
__u32 nw_id;
struct cfs_expr_list *nw_expr;
};
/*
* lustre_lnet_network_rule
* network rule
* nwr_link - link on rule list
* nwr_descr - network descriptor
* priority - priority of the rule.
*/
struct lustre_lnet_network_rule {
struct list_head *nwr_link;
struct lustre_lnet_network_descr nwr_descr;
int priority;
};
/*
* lustre_lnet_nidr_range_descr
* nidr_expr - expression describing the IP part of the NID
* nidr_nw - a description of the network
*/
struct lustre_lnet_nidr_range_descr {
struct list_head nidr_expr;
struct lustre_lnet_network_descr nidr_nw;
};
/*
* lustre_lnet_nidr_range_rule
* Rule for the nid range.
* nidr_link - link on the rule list
* nidr_descr - descriptor of the nid range
* priority - priority of the rule
*/
struct lustre_lnet_nidr_range_rule {
struct list_head *nidr_link;
struct lustre_lnet_nidr_range_descr nidr_descr;
int priority;
};
/*
* lustre_lnet_p2p_rule
* Rule for the peer to peer.
* p2p_link - link on the rule list
* p2p_src_descr - source nid range
* p2p_dst_descr - destination nid range
* priority - priority of the rule
*/
struct lustre_lnet_p2p_rule {
struct list_head *p2p_link;
struct lustre_lnet_nidr_range_descr p2p_src_descr;
struct lustre_lnet_nidr_range_descr p2p_dst_descr;
int priority;
}; |
IOCTL Data structures
| Code Block |
|---|
struct lnet_expr {
__u32 ex_lo;
__u32 ex_hi;
__u32 ex_stride;
};
struct lnet_net_descr {
__u32 nsd_net_id;
struct lnet_expr nsd_expr;
};
struct lnet_nid_descr {
struct lnet_expr nir_ip[4];
struct lnet_net_descr nir_net;
};
struct lnet_ioctl_net_rule {
struct lnet_net_descr nsr_descr;
__u32 nsr_prio;
};
struct lnet_ioctl_nid_rule {
struct lnet_nid_descr nir_descr;
__u32 nir_prio;
};
struct lnet_ioctl_net_p2p_rule {
struct lnet_nid_descr p2p_src_descr;
struct lnet_nid_descr p2p_dst_descr;
__u32 p2p_prio;
};
/*
* lnet_ioctl_rule_blk
* describes a set of rules of the same type to transfer to the kernel.
* rule_hdr - header information describing the total size of the transfer
* rule_type - type of rules included
* rule_size - size of each individual rule. Can be used to check backwards compatibility
* rule_bulk - pointer to the user space allocated memory containing the rules.
*/
struct lnet_ioctl_rule_blk {
struct libcfs_ioctl_hdr rule_hdr;
enum lnet_sel_rule_type rule_type;
__u32 rule_size;
void *rule_bulk;
}; |
Serialization/Deserialization
Both user space and kernel space will store the rules in the data structures described above. However, once user space has parsed and stored the rules, it will need to serialize them and send them to the kernel.
The serialization process will use the IOCTL data structures defined above. The process itself is straightforward. In both user space and the kernel the rules are stored in linked lists, but each rule has a deterministic size and form. For example, an IP is described as four struct cfs_range_expr structures, which can be translated into four struct lnet_expr structures.
On the receiving end the process is reversed to rebuild the linked lists.
Common functions that can be called from user space and kernel space will be created to serialize and deserialize the rules:
| Code Block |
|---|
/* TBD: parameters still to be defined */
int lnet_sel_rule_serialize();
int lnet_sel_rule_deserialize(); |
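Since the signatures above are still TBD, the following is only a rough sketch of the flattening idea, under stated assumptions: a linked list of rules is copied into a contiguous array of fixed-size records, and the array is turned back into a list on the receiving side. Everything prefixed `sketch_` is hypothetical. `struct sketch_range_expr` stands in for `struct cfs_range_expr` (its field names are assumed), a plain `next` pointer replaces `struct list_head`, `uint32_t` replaces `__u32`, and the network part of the descriptor is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors struct lnet_expr from the IOCTL structures. */
struct lnet_expr {
	uint32_t ex_lo;
	uint32_t ex_hi;
	uint32_t ex_stride;
};

/* Hypothetical fixed-size wire record: IP part and priority only. */
struct sketch_wire_rule {
	struct lnet_expr ip[4];
	uint32_t prio;
};

/* Stand-in for struct cfs_range_expr (field names are assumptions). */
struct sketch_range_expr {
	uint32_t re_lo;
	uint32_t re_hi;
	uint32_t re_stride;
};

/* Hypothetical in-memory rule kept on a singly linked list. */
struct sketch_rule {
	struct sketch_rule *next;
	struct sketch_range_expr ip[4];
	int priority;
};

void sketch_rules_free(struct sketch_rule *head)
{
	while (head != NULL) {
		struct sketch_rule *next = head->next;

		free(head);
		head = next;
	}
}

/* Flatten the list into buf; returns the rule count, or -1 if buf is too small. */
long sketch_rules_serialize(const struct sketch_rule *head,
			    struct sketch_wire_rule *buf, size_t max)
{
	size_t n = 0;

	for (const struct sketch_rule *r = head; r != NULL; r = r->next) {
		if (n == max)
			return -1;
		for (int i = 0; i < 4; i++) {
			buf[n].ip[i].ex_lo = r->ip[i].re_lo;
			buf[n].ip[i].ex_hi = r->ip[i].re_hi;
			buf[n].ip[i].ex_stride = r->ip[i].re_stride;
		}
		buf[n].prio = (uint32_t)r->priority;
		n++;
	}
	return (long)n;
}

/*
 * Rebuild a list in the same order as the array. Returns NULL on
 * allocation failure (and also for an empty array).
 */
struct sketch_rule *sketch_rules_deserialize(const struct sketch_wire_rule *buf,
					     size_t n)
{
	struct sketch_rule *head = NULL;
	struct sketch_rule **tail = &head;

	for (size_t k = 0; k < n; k++) {
		struct sketch_rule *r = calloc(1, sizeof(*r));

		if (r == NULL) {
			sketch_rules_free(head);
			return NULL;
		}
		for (int i = 0; i < 4; i++) {
			r->ip[i].re_lo = buf[k].ip[i].ex_lo;
			r->ip[i].re_hi = buf[k].ip[i].ex_hi;
			r->ip[i].re_stride = buf[k].ip[i].ex_stride;
		}
		r->priority = (int)buf[k].prio;
		*tail = r;
		tail = &r->next;
	}
	return head;
}
```

Because every record has the same size, the receiver can derive the number of rules from the total transfer size and the per-rule size, which is presumably what the `rule_size` field in `lnet_ioctl_rule_blk` is intended to support.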
Policy Application
Two new IOCTLs will need to be added: IOC_LIBCFS_ADD_RULES and IOC_LIBCFS_GET_RULES.
The handler for IOC_LIBCFS_ADD_RULES will perform the following operations:
- Rebuild the rules
- Iterate through all the local networks and apply the rules
- Iterate through all the peers and apply the rules.
- Store the rules
Application of the rules will be done under api_mutex_lock and the exclusive lnet_net_lock to avoid having the peer or local net lists changed while the rules are being applied.
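The "iterate and apply" steps can be sketched roughly as follows for NID rules. This is not the planned kernel code: it does no locking (the real handler would run under the locks named above), it ignores the network part of the descriptor, and the types `sketch_nid_rule` and `sketch_peer_ni` are hypothetical. Two further assumptions are labeled in the comments: an expression matches values from `ex_lo` to `ex_hi` inclusive in steps of `ex_stride`, and the first matching rule wins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct lnet_expr; uint32_t stands in for __u32. */
struct lnet_expr {
	uint32_t ex_lo;
	uint32_t ex_hi;
	uint32_t ex_stride;
};

/* Hypothetical reduced NID rule: IP part and priority only. */
struct sketch_nid_rule {
	struct lnet_expr ip[4];
	uint32_t prio;
};

/* Hypothetical stand-in for a peer interface with an IPv4 address. */
struct sketch_peer_ni {
	uint32_t addr;	/* IPv4 address in host byte order */
	uint32_t prio;	/* priority assigned by rule application */
};

/* ASSUMPTION: inclusive range, stepping by ex_stride starting at ex_lo. */
static bool sketch_expr_match(const struct lnet_expr *e, uint32_t v)
{
	if (e->ex_stride == 0 || v < e->ex_lo || v > e->ex_hi)
		return false;
	return (v - e->ex_lo) % e->ex_stride == 0;
}

/* Match each octet (most significant first) against its expression. */
static bool sketch_ip_match(const struct lnet_expr ip[4], uint32_t addr)
{
	for (int i = 0; i < 4; i++) {
		uint32_t octet = (addr >> (24 - 8 * i)) & 0xff;

		if (!sketch_expr_match(&ip[i], octet))
			return false;
	}
	return true;
}

/*
 * Walk all peers and assign the priority of the first matching rule.
 * ASSUMPTION: first match wins. Returns the number of peers matched.
 */
size_t sketch_apply_nid_rules(const struct sketch_nid_rule *rules,
			      size_t nrules,
			      struct sketch_peer_ni *peers, size_t npeers)
{
	size_t matched = 0;

	for (size_t p = 0; p < npeers; p++) {
		for (size_t r = 0; r < nrules; r++) {
			if (sketch_ip_match(rules[r].ip, peers[p].addr)) {
				peers[p].prio = rules[r].prio;
				matched++;
				break;
			}
		}
	}
	return matched;
}
```

The same walk would be repeated for a single newly added interface or peer, so that rules stored earlier also take effect on entries created later.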
There will be a separate list for each rule type. The rules are iterated and applied whenever:
- A local network interface is added.
- A remote peer/peer_net/peer_ni is added
...