Neil Brown notes: (recorded here for easier reference)
- lnet_nid_t doesn't change size. I think that would be too disruptive.
- We introduce a new LND: "v6tcp" (or possibly "tcpng") which uses IPv6 addressing. This means that a network will require all nodes to be updated before any node can use IPv6. (This isn't essential, but at this stage seems easiest.)
- The nid for "v6tcp" is not related to the IPv6 address - it is assigned separately and two-way lookup will be provided. The mechanism for assigning a nid is beyond the scope of LNet, but static config (e.g. via module parameter or ioctl) is one option.
- When a node receives an LNet message from a node on the same network, it records the source IPv6 address with the netid/nid for later use. This ensures that all routers know the address mapping for all active v6tcp nodes on any of their networks (due to ping messages) and servers for higher level protocols (e.g. lustre mgs) will also learn the mapping. Any node receiving a request will be able to reply, because it cached the reply IP address.
- When a node wants to contact a peer on the same v6tcp network, but doesn't already know the IPv6 address, it connects to a service that it does know (possibly a router, possibly an mgs, etc.) and uses a new request, specific to v6tcp, to request the address.
- When a node wants to make an initial connection to a router or a service, it is expected to have the IPv6 address, but not the nid. If the peer is on the same network, it establishes the connection with the known IPv6 address and sends a message with dest_nid set to the reserved value "0". The initial "Hello" will receive a reply with the correct nid as "src_nid", and that will be used for future messages.
- If the peer for an initial connection is on a different network, we will need something more involved. Probably a new LNet message which contains the target IPv6 address, and which will be forwarded by routers until it gets to a router that knows the answer. Again, the dest_nid will (probably) be zero.
- Part of the nid address space will be reserved for transient nids. These can be used to bootstrap contacting a nid-allocation service. A newly started node with no statically allocated nid can choose a random nid in this part of the address space and try to contact a configured router. If the router thinks that the nid is already in use (with some timeout), it will abort the connection. The client can then try a new nid. Once it has a transient nid, it can contact a nid server (maybe mgs could provide this service eventually) and be given a stable nid.
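The address learning and the "dest_nid 0" Hello described in the notes above could be modelled roughly as follows. This is only a sketch under stated assumptions: the PeerTable class, its method names, and dest_nid_for_hello are hypothetical illustrations rather than LNet code, and a nid is simplified to a plain integer within a single network.

```python
import ipaddress

# Reserved dest_nid used when the IPv6 address is known but the nid is not
# (the value "0" comes from the notes; the constant name is hypothetical).
ANY_NID = 0


class PeerTable:
    """Hypothetical two-way nid <-> IPv6 cache for one v6tcp network."""

    def __init__(self):
        self._addr_by_nid = {}
        self._nid_by_addr = {}

    def learn(self, src_nid, src_addr):
        """Record the mapping seen on an incoming message."""
        if src_nid == ANY_NID:
            return  # the reserved value never identifies a peer
        addr = ipaddress.IPv6Address(src_addr)
        # Drop stale entries so both directions stay consistent.
        old_addr = self._addr_by_nid.get(src_nid)
        if old_addr is not None and old_addr != addr:
            del self._nid_by_addr[old_addr]
        old_nid = self._nid_by_addr.get(addr)
        if old_nid is not None and old_nid != src_nid:
            del self._addr_by_nid[old_nid]
        self._addr_by_nid[src_nid] = addr
        self._nid_by_addr[addr] = src_nid

    def addr_for(self, nid):
        """Return the cached IPv6 address for a nid, or None if unknown."""
        return self._addr_by_nid.get(nid)

    def nid_for(self, addr):
        """Return the cached nid for an IPv6 address, or None if unknown."""
        return self._nid_by_addr.get(ipaddress.IPv6Address(addr))


def dest_nid_for_hello(table, addr):
    """Pick dest_nid for a first message: the known nid, else ANY_NID."""
    nid = table.nid_for(addr)
    return ANY_NID if nid is None else nid
```

A miss in addr_for() is the case where the node would have to ask a router or mgs for the address. That request is not modelled here.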
- o2iblnd will not benefit from this solution? o2iblnd will still need to use IPv4, since it uses IPoIB for figuring out the NID?
- I'm not quite clear on the NID assignment. Currently, when you add a network it's automatically assigned a NID. For example if you do something like:
- How would that work for IPv6, for example if eth0 is an IPv6 interface?
- Seems like the steps you're suggesting add a NID registry feature? A way of assigning NIDs to nodes? Or are you saying that each node will possibly have a module parameter which identifies its NID? I think from point 8 it appears like you mean the former.
- Just thinking about how Lustre is configured, an example here: Create and Mount a Lustre Filesystem
- The mgs NID has to be specified in the mkfs.lustre command line arguments for OSSs, OSTs, etc. It'll also need to be specified on the client mount command. If the MGS is on a tcp6 network, what would the NID be that you specify on the command line?
- Maybe we can require the MGS to always have an IPv4 address so we can still configure Lustre in the same way. The MGS can have other IPv6 interfaces, which can be discovered automatically via Multi-Rail. Then when sending to the peer we follow the Multi-Rail algorithm?
- Clients and OSSes may only have IPv6 addresses assuming:
- The MGS also supports IPv6
- There exists a router in the middle which can route between IPv6 and other networks: o2ib, tcp, gni
- From a user perspective I would see something like:
- For configuring tcp6 networks it would make sense to maintain the same NID assignment flow, i.e. the NID is assigned when a network interface is added:
lnetctl net add --net tcp6 --if eth0
- Load the ipv6 LND
- The IPv6 LND communicates with a central NID registry to grab the next available NID. The NID registry (which can be the MGS) has to be configured before startup so that nodes can reach it.
- IPv6 NIDs do not contain the IPv6 address, but a symbolic name.
- Would that mean, however, that we won't be able to configure a NID unless the NID registry service is up?
- The network interface addition would only succeed after the NID is assigned from the registry
- When "lnetctl net add" returns, the NID is assigned, so when you do "lnetctl net show" afterwards you can get the NID. Of course this would also work with the network module parameter.
- What if there is no network specified in the module parameter and you bring up LNet via either:
lnetctl lnet configure --all
lctl net up
- What would happen if there is no IPv4 address? Will it default to IPv6? Currently the default is that it comes up with a tcp network and the first interface configured.
- If the NID registry is not up or not found, we can assign a temporary NID to the node which gets updated when the NID registry comes up (your point 8?).
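The transient NID idea (Neil's point 8) combined with the "registry not up" fallback might look roughly like the sketch below. Everything named here is an assumption made for illustration: the reserved range, the Router and Registry stand-ins, and bring_up() do not exist in LNet, and the notes do not say how large the transient range would be.

```python
import random

# Assumed layout: the notes only say "part of the nid address space".
TRANSIENT_BASE = 0xFFFF0000
TRANSIENT_COUNT = 0x10000


def is_transient(nid):
    return TRANSIENT_BASE <= nid < TRANSIENT_BASE + TRANSIENT_COUNT


class Router:
    """Stand-in router that remembers which address last used each nid."""

    def __init__(self, timeout=300.0):
        self.timeout = timeout
        self._owner = {}  # nid -> (addr, last_seen)

    def accept(self, nid, addr, now):
        """Refuse a nid that another address used within the timeout."""
        entry = self._owner.get(nid)
        if entry is not None:
            owner, seen = entry
            if owner != addr and now - seen < self.timeout:
                return False
        self._owner[nid] = (addr, now)
        return True


class Registry:
    """Stand-in NID registry handing out stable nids per address."""

    def __init__(self):
        self._by_addr = {}

    def allocate(self, addr):
        # Stable nids start at 1 here, well outside the transient range.
        return self._by_addr.setdefault(addr, len(self._by_addr) + 1)


def pick_transient_nid(router, addr, now, rng=random, max_tries=16):
    """Try random transient nids until the router accepts one."""
    for _ in range(max_tries):
        nid = TRANSIENT_BASE + rng.randrange(TRANSIENT_COUNT)
        if router.accept(nid, addr, now):
            return nid
    raise RuntimeError("no free transient nid found")


def bring_up(router, registry, addr, now, rng=random):
    """Return (nid, is_stable). registry=None models 'registry not up'."""
    nid = pick_transient_nid(router, addr, now, rng)
    if registry is None:
        return nid, False  # keep the temporary nid until a registry appears
    return registry.allocate(addr), True
```

With registry=None the node comes up on a temporary NID. Calling bring_up() again once the registry is reachable swaps in the stable one, which is the update step the question above is asking about.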
- I wasn't clear on point 6 above
- Below are some sequence diagrams I'm thinking about: