https://review.whamcloud.com/#/c/33182

LU-11292 lnet: Discover routers on first use

Discover routers on first use. This brings the behavior when
interacting with routers in line with the behavior when dealing
with normal peers.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I8527e41daf2f5f6ab5f04aac1285aaa6cc4ee594
  • lnet_initiate_peer_discovery()
    • This function initiates peer discovery for the passed-in lpni and returns LNET_DC_WAIT.
    • It is called when we want to discover a peer on first use.
    • It is called when we want to discover a gateway on first use.
  • lnet_handle_find_routed_path()
    • Calls lnet_find_route_locked() to find a gateway.
    • If the gateway has not been discovered yet, discover it.
    • Increment the sequence number on the route only if the route is going to be used.
      • This helps in ensuring that the route sequence numbers remain sane.
  • lnet_find_route_locked()
    • Returns the route to use (use_route) and the previous route (prev_route).
    • It no longer increments the sequence number of the route, since finding the route doesn't equate to using the route.
      • Incrementing the route sequence number is delegated to the calling function.
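The split of responsibilities described above can be sketched as follows. This is a minimal illustration and not the code from the patch: the toy_route structure, its fields, and both toy_* functions are hypothetical stand-ins, and it assumes that a route whose gateway still needs discovery is not counted as used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a route (not struct lnet_route). */
struct toy_route {
	unsigned int seq;      /* bumped each time the route is actually used */
	int gw_discovered;     /* non-zero once the gateway has been discovered */
};

/* Finder: picks the route with the lowest sequence number, never modifies it. */
static struct toy_route *toy_find_route(struct toy_route *routes, size_t n)
{
	struct toy_route *best = NULL;
	size_t i;

	assert(routes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (best == NULL || routes[i].seq < best->seq)
			best = &routes[i];
	}
	return best;
}

/*
 * Caller: returns 1 if a route is used now, 0 if the gateway must be
 * discovered first (assumption: the route is not counted as used),
 * and -1 if there is no route at all.
 */
static int toy_handle_routed_path(struct toy_route *routes, size_t n,
				  struct toy_route **used)
{
	struct toy_route *route = toy_find_route(routes, n);

	*used = NULL;
	if (route == NULL)
		return -1;
	if (!route->gw_discovered)
		return 0;       /* discovery would be initiated here */

	route->seq++;           /* only the caller advances the sequence */
	*used = route;
	return 1;
}
```

Keeping the increment in the caller means a lookup that ends in "wait for discovery" leaves the sequence numbers untouched.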
https://review.whamcloud.com/#/c/33183/
LU-11298 lnet: use peer for gateway

The routing code uses peer_ni for a gateway. However with Multi-Rail
a gateway could have multiple interfaces on several different
networks. Instead of using a single peer_ni as the gateway we should
be using the peer and let the MR selection code select the best
peer_ni to send to.

This patch moves the gateway from peer_ni to peer. Much of the
code needs to be rewritten in the following patches to account
for that change. This patch disables the routing features by
disabling the code to add/delete routes.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ia7dab552268c4a7fbd7b88122b9a95363d155fd7

The routing code will change quite a bit, so this patch removes most of the current routing code; later patches reintroduce it.

This patch concentrates on switching the gateway from using lnet_peer_ni to using lnet_peer.

The design decision here is that a gateway is a node where LNet is started with the routing feature enabled. A gateway node can have multiple interfaces. To align routing with Multi-Rail, the code should first select a gateway peer, then use Multi-Rail to select the best peer_ni on that gateway.
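The second step of that selection, picking an interface on an already chosen gateway, might look roughly like the sketch below. The toy_peer and toy_peer_ni types are hypothetical simplifications, and "highest health on the local network" is only an assumed stand-in for the real Multi-Rail selection criteria.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for lnet_peer_ni and lnet_peer. */
struct toy_peer_ni {
	uint32_t net;    /* network this interface is on */
	int health;      /* higher is healthier */
};

struct toy_peer {
	struct toy_peer_ni *nis;
	size_t nr_nis;
};

/*
 * Given a gateway peer that was already selected, pick the interface
 * to send to: among the interfaces on local_net, take the healthiest.
 * Returns NULL if the gateway has no interface on that network.
 */
static struct toy_peer_ni *toy_select_best_ni(struct toy_peer *gw,
					      uint32_t local_net)
{
	struct toy_peer_ni *best = NULL;
	size_t i;

	assert(gw != NULL);
	for (i = 0; i < gw->nr_nis; i++) {
		struct toy_peer_ni *ni = &gw->nis[i];

		if (ni->net != local_net)
			continue;
		if (best == NULL || ni->health > best->health)
			best = ni;
	}
	return best;
}
```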

The following functions are removed in this patch and will be reintroduced in later patches:

lnet_is_route_alive()
lnet_rtr_addref_locked()
lnet_rtr_decref_locked()
lnet_shuffle_seed()
lnet_add_route_to_rnet()
lnet_add_route() # the bulk of the code is removed
lnet_check_routes() # the bulk of the code is removed
lnet_del_route() # the bulk of the code is removed
lnet_parse_rc_info() # the bulk of the code is removed
lnet_destroy_rc_data()
lnet_update_rc_data_locked()
lnet_router_check_interval()
lnet_ping_router_locked()
lnet_prune_rc_data()
lnet_compare_peers()

Key fields are moved from lnet_peer_ni to lnet_peer or deleted including:

lpni_rtrq # moved
lpni_rtr_list # moved
lpni_ping_notsent # deleted
lpni_ping_timestamp # deleted
lpni_ping_deadline # deleted
lpni_rtr_refcount # moved
lpni_healthy # remnant code, cleaned up
lpni_routes # moved

The lnet_route structure is changed in the following way:

struct lnet_peer *lr_gateway # this is now lnet_peer instead of lnet_peer_ni
__u32 lr_lnet # the local network of the route

It is no longer possible to determine the local network of the route by simply looking at the gateway peer, since the peer can have multiple interfaces on different networks. Therefore the route must now define both the local network and the remote network. This way we are able to select and compare routes properly.
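As a rough illustration of why the route carries its own local network, the sketch below models a route with a gateway peer, a remote network, and a local network. It is not the real struct lnet_route: toy_peer, toy_route, and toy_route_serves() are hypothetical, and the lr_net field name for the remote network is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder for a gateway peer. */
struct toy_peer {
	int id;
};

/* Hypothetical, simplified route: the gateway is a peer, not a peer_ni. */
struct toy_route {
	struct toy_peer *lr_gateway;
	uint32_t lr_net;    /* remote network (field name assumed here) */
	uint32_t lr_lnet;   /* local network used to reach the gateway */
};

/*
 * A route is a candidate only if it leaves through local_net and
 * reaches remote_net. The gateway alone cannot answer this, because
 * it may have interfaces on several networks.
 */
static int toy_route_serves(const struct toy_route *route,
			    uint32_t local_net, uint32_t remote_net)
{
	assert(route != NULL);
	return route->lr_lnet == local_net && route->lr_net == remote_net;
}
```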


The rest of the changes concentrate on removing the use of lnet_peer_ni as the gateway and replacing it with lnet_peer

In lib-move.c there are changes in both lnet_post_routed_recv_locked() and lnet_return_rx_credits_locked()

lnet_find_route_locked() is marked as "to be implemented". As a result, lnet_handle_find_routed_path(), which calls lnet_find_route_locked(), is also incomplete due to the removal of routing functionality. There are changes there, but they are mainly to avoid compilation problems. It will be re-implemented in a later patch.

Routing is disabled with this patch.

https://review.whamcloud.com/#/c/33184/
LU-11299 lnet: lnet_add/del_route()

Reimplemented lnet_add_route() and lnet_del_route() to use
the peer instead of the peer_ni.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3734098a81ab18d1d74220c691d96a9b9817e6da

NOTES: lnet_check_routes() is removed in this patch. We should move this into its own patch against ticket LU-10153, since the previous patch removes a bunch of functions. The reason for removing lnet_check_routes() is that we no longer restrict multiple routes on the same remote network.

This patch re-implements the following functions, which now use lnet_peer instead of lnet_peer_ni for the gateway

lnet_rtr_addref_locked()
lnet_rtr_decref_locked()
lnet_shuffle_seed()
lnet_add_route_to_rnet()
lnet_add_route()
lnet_del_route_from_rnet()
lnet_del_route()
https://review.whamcloud.com/#/c/33185/
LU-11300 lnet: router aliveness

A route is considered alive if the gateway is able to route
messages from the local to the remote net. That means that
at least one of the network interfaces on the remote net of
the gateway is viable.

Introduced the concept of sensitivity percentage. This defaults
to 100%. It holds a dual meaning:
1. A route is considered alive if at least one of its interfaces'
health is >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage.
100% means at least one interface has to be 100% healthy.
2. On a router consider a peer_ni dead if its health is not at least
LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage.
100% means the interface has to be 100% healthy.

Re-implemented lnet_notify() to decrement the health of the
peer interface if the LND reports a failure on that peer.

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ie97561fb70bf6a558bc90fa9266a6ba38fa3d293
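
The aliveness rules in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: all toy_* names are hypothetical, the maximum health value of 1000 is assumed for the example, the percentage is read as "divide by 100", and the amount by which a failure lowers health is a made-up parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed maximum health for this sketch; stands in for LNET_MAX_HEALTH_VALUE. */
#define TOY_MAX_HEALTH_VALUE 1000

/* Health an interface needs in order to count as alive. */
static int toy_health_threshold(int sensitivity_pct)
{
	assert(sensitivity_pct >= 0 && sensitivity_pct <= 100);
	return TOY_MAX_HEALTH_VALUE * sensitivity_pct / 100;
}

/* Meaning 2: on a router, a peer_ni is dead if it is below the threshold. */
static int toy_ni_is_alive(int health, int sensitivity_pct)
{
	return health >= toy_health_threshold(sensitivity_pct);
}

/* Meaning 1: a route is alive if at least one interface meets the threshold. */
static int toy_route_is_alive(const int *ni_health, size_t n,
			      int sensitivity_pct)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (toy_ni_is_alive(ni_health[i], sensitivity_pct))
			return 1;
	}
	return 0;
}

/*
 * Notification from the LND: on a reported failure, lower the health
 * of the interface, never going below zero. The decrement amount is
 * a hypothetical parameter of this sketch.
 */
static void toy_notify(int *health, int alive, int decrement)
{
	assert(health != NULL && decrement >= 0);
	if (alive)
		return;
	*health = (*health > decrement) ? *health - decrement : 0;
}
```

With the default sensitivity of 100, a single reported failure on an interface is enough to drop it below the threshold in this model, so a route stays alive only while some other interface is still fully healthy.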
