Overview

Neil Brown's notes (recorded here for easier reference):

  1. lnet_nid_t doesn't change size. I think that would be too disruptive.
  2. We introduce a new LND: "v6tcp" (or possibly "tcpng") which uses IPv6 addressing. This means that a network will require all nodes to be updated before any node can use IPv6. (This isn't essential, but at this stage it seems easiest.)
  3. The nid for "v6tcp" is not related to the IPv6 address - it is assigned separately and two-way lookup will be provided. The mechanism for assigning a nid is beyond the scope of LNet, but static config (e.g. via module parameter or ioctl) is one option.
  4. When a node receives an LNet message from a node on the same network, it records the source IPv6 address with the netid/nid for later use. This ensures that all routers know the address mapping for all active ip6tcp nodes on any of their networks (due to ping messages) and servers for higher level protocols (e.g. lustre mgs) will also learn the mapping. Any node receiving a request will be able to reply, because it cached the reply IP address.
  5. When a node wants to contact a peer on the same ip6tcp network, but doesn't already know the IPv6 address, it connects to a service that it does know -- possibly a router, possibly a mgs etc -- and uses a new request, specific to ipv6tcp, to request the address.
  6. When a node wants to make an initial connection to a router or a service, it is expected to have the IPv6 address, but not the nid. If the peer is on the same network, it establishes the connection with the known IPv6 and sends a message with a dest_nid set to the reserved value "0". The initial "Hello" will receive a reply with the correct nid as "src_nid", and that will be used for future messages.
  7. If the peer for an initial connection is on a different network, we will need something more involved. Probably a new LNet message which contains the target IPv6 address, and which will be forwarded by routers until it gets to a router that knows the answer. Again, the dest_nid will (probably) be zero.
  8. Part of the nid address space will be reserved for transient nids. These can be used to bootstrap contacting a nid-allocation service. A newly started node with no statically allocated nid can choose a random nid in this part of the address space and try to contact a configured router. If the router thinks that the nid is already in use (with some timeout), it will abort the connection. The client can then try a new nid. Once it has a transient nid, it can contact a nid server (maybe the mgs could provide this service eventually) and be given a stable nid.
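
Points 3, 4 and 6 above can be illustrated with a small sketch of the two-way nid/address lookup. This is only a model written in Python for brevity, not the proposed kernel implementation: the names NidAddressCache, learn, addr_for, nid_for, dest_nid_for_hello and ANY_NID are hypothetical, and the network name and addresses in the usage note are made up.

```python
import ipaddress

# Reserved dest_nid meaning "I do not know your nid yet" (point 6).
# The constant name is an assumption made for this sketch.
ANY_NID = 0


class NidAddressCache:
    """Two-way map between (net, nid) pairs and IPv6 addresses."""

    def __init__(self):
        self._addr_by_nid = {}   # (net, nid)  -> IPv6Address
        self._nid_by_addr = {}   # (net, addr) -> nid

    def learn(self, net, nid, addr):
        """Record the mapping seen on an incoming message (point 4)."""
        addr = ipaddress.IPv6Address(addr)
        # Drop any stale entries so both directions stay consistent.
        stale_addr = self._addr_by_nid.get((net, nid))
        if stale_addr is not None:
            self._nid_by_addr.pop((net, stale_addr), None)
        stale_nid = self._nid_by_addr.get((net, addr))
        if stale_nid is not None:
            self._addr_by_nid.pop((net, stale_nid), None)
        self._addr_by_nid[(net, nid)] = addr
        self._nid_by_addr[(net, addr)] = nid

    def addr_for(self, net, nid):
        """Return the cached IPv6 address for a nid, or None."""
        return self._addr_by_nid.get((net, nid))

    def nid_for(self, net, addr):
        """Return the cached nid for an IPv6 address, or None."""
        return self._nid_by_addr.get((net, ipaddress.IPv6Address(addr)))


def dest_nid_for_hello(cache, net, addr):
    """Pick dest_nid for an initial Hello: the cached nid, else 0."""
    nid = cache.nid_for(net, addr)
    return ANY_NID if nid is None else nid
```

In this model a node that knows only a router's IPv6 address would send its first Hello with dest_nid 0, then call learn() with the src_nid from the reply so that later messages carry the real nid.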

Tasks

Developer | Task
  • Revise the socket code - particularly using "struct sockaddr" instead of a __u32 address and an int port. My plan is to use the same socklnd code for both IPv4 and IPv6.
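
As a rough illustration of why a family-agnostic address type lets one code path serve both IPv4 and IPv6, here is a user-space Python sketch. It is an analogy to the "struct sockaddr" change, not the socklnd code itself: Endpoint, parse, sockaddr and open_socket are hypothetical names, and the addresses and port in the usage note are examples only.

```python
import ipaddress
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Family-agnostic peer address, loosely analogous to struct sockaddr."""
    family: int   # socket.AF_INET or socket.AF_INET6
    host: str     # normalized textual address
    port: int

    @classmethod
    def parse(cls, host, port):
        """Build an Endpoint from a textual IPv4 or IPv6 address."""
        ip = ipaddress.ip_address(host)
        family = socket.AF_INET if ip.version == 4 else socket.AF_INET6
        return cls(family, str(ip), port)

    def sockaddr(self):
        """Tuple form accepted by connect()/bind() for this family."""
        if self.family == socket.AF_INET6:
            return (self.host, self.port, 0, 0)  # flowinfo, scope_id
        return (self.host, self.port)


def open_socket(ep):
    """Create an unconnected TCP socket of the right family for ep."""
    return socket.socket(ep.family, socket.SOCK_STREAM)
```

With this shape, callers pass an Endpoint around instead of a 32-bit address plus a port, for example Endpoint.parse("192.168.122.100", 988) or Endpoint.parse("2001:db8::1", 988), and only sockaddr() and open_socket() need to know which family is in use.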


Feedback

  1. o2iblnd will not benefit from this solution? o2iblnd will still need to use IPv4, since it uses IBoIP for figuring out the NID?
  2. I'm not quite clear on the NID assignment. Currently, when you add a network it's automatically assigned a NID. For example if you do something like:
    1. [root@lustre01 ~]# lnetctl net add --net tcp --if eth0
      [root@lustre01 ~]# lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: tcp
            local NI(s):
              - nid: 192.168.122.100@tcp
                status: up
                interfaces:
                    0: eth0


      1. How would that work for IPv6, for example if eth0 has only an IPv6 address?
  3. It seems like the steps you're suggesting add a NID registry feature, i.e. a way of assigning NIDs to nodes? Or are you saying that each node will possibly have a module parameter which identifies its NID? From point 8 it appears that you mean the former.
  4. Just thinking about how Lustre is configured, an example here: Create and Mount a Lustre Filesystem
    1. The mgs NID has to be specified in the mkfs.lustre command line arguments for OSSs, OSTs, etc. It'll also need to be specified on the client mount command. If the MGS is on a tcp6 network, what would the NID be that you specify on the command line?
    2. Maybe we can require the MGS to always have an IPv4 address so we can still configure Lustre in the same way. The MGS can have other IPv6 interfaces, which can be discovered automatically via Multi-Rail. Then when sending to the peer we follow the Multi-Rail algorithm?
    3. Clients and OSSes may only have IPv6 addresses assuming:
      1. The MGS also supports IPv6
      2. There exists a router in the middle which can route between IPv6 and other networks: o2ib, tcp, gni
  5. From a user perspective I would see something like:
    1. [root@lustre01 ~]# lnetctl net add --net tcp --if eth0
      # Only succeeds if eth0 has an IPv4 address
    2. [root@lustre01 ~]# lnetctl net add --net tcp6 --if eth0
      # Only succeeds if eth0 has an IPv6 address
    3. For configuring tcp6 networks it would make sense to maintain the same NID assignment flow, i.e. the NID is assigned when a Network Interface is added:
      1. lnetctl net add --net tcp6 --if eth0 
      2. Load the ipv6 LND
      3. ipv6 communicates with a central NID registry to grab the next available NID. The NID registry (which can be the MGS) has to be configured before startup so that nodes can reach it.
        1. IPv6 NIDs do not contain the IPv6 address, but a symbolic name.
        2. Would that mean however, that we won't be able to configure a NID unless the NID registry service is up?
      4. The network interface addition would only succeed after the NID is assigned from the registry
      5. When the lnetctl net add returns, the NID is assigned, so when you do lnetctl net show afterwards you can get the NID. Of course this would also work with the network module parameter.
      6. What if there is no network specified in the module parameter and you bring up LNet either via:
        1. lnetctl lnet configure --all 
        2. lctl net up 
          1. What would happen if there is no IPv4 address? Will it default to IPv6? Currently the default is that it comes up with a tcp network and the first interface configured.
      7. If the NID registry is not up or not found, we can assign a temporary NID to the node which gets updated when the NID registry comes up (your point 8?).
  6. I wasn't clear on point 6 above
  7. Below are some sequence diagrams I'm thinking about:

registry_add_ni

registry_del_ni

registry_down

registry_send_without_nid

recv_without_nid
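
The registration flow from item 5 of the feedback, combined with the transient range from point 8 of the notes, might be modeled roughly as follows. This Python sketch does not reproduce the diagrams listed above; NidRegistry, pick_transient_nid, net_add and the TRANSIENT_* values are all assumptions made for illustration, and passing registry=None stands in for "the registry is not up or not found".

```python
import random

# Assumed location and size of the transient range; the notes do not fix these.
TRANSIENT_BASE = 0xFFFF0000
TRANSIENT_SIZE = 0x0000FFFF


class NidRegistry:
    """Hands out stable nids keyed by a node's symbolic name."""

    def __init__(self, first_nid=1):
        self._next = first_nid
        self._by_name = {}

    def add_ni(self, name):
        """Return the nid for name, allocating the next free one if new."""
        if name not in self._by_name:
            self._by_name[name] = self._next
            self._next += 1
        return self._by_name[name]

    def del_ni(self, name):
        """Forget the nid assigned to name, if any."""
        self._by_name.pop(name, None)


def pick_transient_nid(in_use, rng=random, max_tries=16):
    """Choose a random nid from the transient range that is not in use.

    in_use stands in for the router rejecting a nid it already knows.
    """
    for _ in range(max_tries):
        nid = TRANSIENT_BASE + rng.randrange(TRANSIENT_SIZE)
        if nid not in in_use:
            return nid
    raise RuntimeError("no free transient nid found")


def net_add(name, registry, in_use):
    """Return (nid, is_stable) for a newly added network interface."""
    if registry is not None:
        return registry.add_ni(name), True
    # Registry unreachable: fall back to a temporary nid (item 5.3.7).
    return pick_transient_nid(in_use), False
```

A node that received a transient nid would call net_add again once the registry becomes reachable and replace the temporary value with the stable one; how peers are told about that change is left open here, as it is in the notes.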

1 Comment

  1. I think there are a few things that could be done to implement IPv6, and/or simplify the use of lnet_nid_t to make it easier to change in the future.  LNet is not exactly my area, so take these comments with a grain of salt.  Also see comments on LU-10391, which is the tracking ticket for IPv6 development and should be used for patches against master.

    • In the Lustre configuration, the LNet NIDs are stored in some places as strings, and in other places as binary struct lnet_nid_t. It would be useful to remove the use of either the binary or ASCII forms of the NID from the Lustre configuration logs, and instead only use/store the target name (potentially ignoring existing NIDs in old logs). The client would get the target→NID mapping from the MGS using the "Imperative Recovery" list (see LU-19 for details, currently only used during recovery), which holds the current mapping of which nodes are mounting specific server targets. This approach has the benefit of removing hard-coded addresses from the Lustre configuration, which will simplify DHCP usage for servers, or dynamic target migration.
    • Have LNet routers that map between protocols that have IPv4 addresses and IPv6 addresses (lnet_nid128_t or whatever). That allows systems which do not need interoperability (new or "upgrade everything at once") to immediately use the newer protocol, while systems that need to interoperate with IPv4 addresses (i.e. every existing system today) can keep using the old NID format. It may mean that both tcp and tcp6 LNDs need to be installed on the same node in order to communicate with both IPv4 and IPv6 peers.