How do I build Lustre on CentOS 7.x?
Refer to Walk-thru- Build Lustre MASTER on RHEL 7.3/CentOS 7.3 from Git
...
For more information on the feature, refer to the Dynamic Behavior section of the HLD: http://wiki.lustre.org/images/b/bb/Multi-Rail_High-Level_Design_20150119.pdf
How should I set up a cluster with a combination of Multi-Rail enabled nodes and non-Multi-Rail enabled nodes? (#TODO)
What should I do with regard to Multi-Rail if I upgrade from a non-Multi-Rail Lustre version (<= 2.10) to a Multi-Rail Lustre version (> 2.10)? (#TODO)
Are there any specific considerations when configuring Linux routing for an LNet Multi-Rail node?
Refer to: MR Cluster Setup
...
For a sample lnet_selftest script: self-test template script
Is there a way to functionally test LNet?
We're currently working on a functional test tool, LNet Unit Test Framework. The documents will be made available soon.
What are the best OPA LND tunables to use?
| Code Block |
|---|
net:
    - net type: o2ib1
      local NI(s):
        - interfaces:
              0: ib2
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
              conns_per_peer: 4
              ntx: 2048 |
What are the best HFI tunables to use with Lustre?
| Code Block |
|---|
options hfi1 krcvqs=8 piothreshold=0 sge_copy_mode=2 wss_threshold=70 |
| Info |
|---|
| It is NOT recommended to enable the OPA TID RDMA feature (cap_mask=0x4c09a01cbba) as this can cause significant memory usage and service errors when there are a large number of connections. |
Can you tell me more about how to configure LNet and QoS?
...
- Check that you can ping your MGS first.
- Check that you can "lnetctl ping" the MGS NID.
  - If you're able to "lnetctl ping" the MGS NID, then check if you can RDMA using ib_write_bw:
    - Start ib_write_bw on your server. This will start a receiver process.
    - Run "ib_write_bw <MGS IP address>".
    - This will run an RDMA traffic test independent of LNet. If this works, then RDMA works and you can move on to further LNet debugging. Otherwise contact your IB service provider.
  - Next step is to run lnet_selftest to verify LNet traffic.
  - If lnet_selftest works, then verify your MGS is set up properly.

If ib_write_bw works, but LNet doesn't work, then check your o2iblnd configuration, as shown below.
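The first three checks above can be run in order by a small script. The following is a minimal sketch, not a Lustre tool: the function names are hypothetical, it assumes each command exits with a non-zero status on failure, and it assumes ib_write_bw has already been started as a receiver on the server. The lnet_selftest step is left out because it involves several commands.

```python
import subprocess


def run_command(argv):
    """Run a command and report whether it exited with status 0."""
    try:
        return subprocess.run(argv, capture_output=True, timeout=60).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # Command missing or hung: treat it as a failed check.
        return False


def first_failing_check(mgs_ip, mgs_nid, run=run_command):
    """Walk the checks in order; return the name of the first failure, or None."""
    checks = [
        ("ping", ["ping", "-c", "3", mgs_ip]),
        ("lnetctl ping", ["lnetctl", "ping", mgs_nid]),
        # Assumes an ib_write_bw receiver is already running on the server.
        ("ib_write_bw", ["ib_write_bw", mgs_ip]),
    ]
    for name, argv in checks:
        if not run(argv):
            return name
    return None
```

Calling `first_failing_check("<MGS IP address>", "<MGS NID>")` returns `None` when all three commands succeed, otherwise the name of the step to investigate. The `run` parameter exists only so the command runner can be swapped out, for example when trying the logic on a machine without LNet installed.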
I have a routed setup and my clients can't mount? (#TODO)
How do I check that my o2iblnd configurations on different nodes are compatible?
- Make sure peer_credits are the same across your nodes.
  - peer_credits are dynamically negotiated, such that the lowest peer_credits value is used. However, if it's not your intention to have different peer_credits across the different nodes, it is recommended to ensure they all have the same value.
- Make sure your peer_credits_hiw are the same across your nodes.
  - peer_credits_hiw defines the High Water Mark value. When it is reached, the outstanding credits on the connection are returned using a No-op message.
- Make sure your concurrent_sends are the same across your nodes.
  - concurrent_sends defines the number of concurrent transmits per connection.
- The recommended values for the above parameters are:
  - peer_credits = 32
  - peer_credits_hiw = 16
  - concurrent_sends = 64
- Generally speaking you want peer_credits_hiw to be half of peer_credits and concurrent_sends to be two times peer_credits.
- Make sure conns_per_peer are the same across the nodes. It defines the number of IB connections to create to one peer.
  - On OPA it is recommended to set this value to 4.
  - On MLX it is recommended to leave it at the default value of 1.
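The compatibility rules above can be checked mechanically once the values have been gathered from each node. The following is a minimal sketch, not part of Lustre: the function name `check_o2iblnd_tunables` and the input layout (a dict mapping a node name to its tunables, collected by hand) are assumptions made for illustration. It uses the parameter names as written in the list above.

```python
# Hypothetical helper, not part of Lustre: compares o2iblnd tunables that
# were collected by hand from each node.
CHECKED = ("peer_credits", "peer_credits_hiw", "concurrent_sends", "conns_per_peer")


def check_o2iblnd_tunables(nodes):
    """Return warning strings for a {node: {tunable: value}} dict."""
    warnings = []

    # 1. Every checked tunable should have the same value on all nodes.
    for name in CHECKED:
        values = {node: cfg[name] for node, cfg in nodes.items() if name in cfg}
        if len(set(values.values())) > 1:
            warnings.append(f"{name} differs across nodes: {values}")

    # 2. Rule of thumb: peer_credits_hiw is half of peer_credits and
    #    concurrent_sends is two times peer_credits.
    for node, cfg in nodes.items():
        credits = cfg.get("peer_credits")
        if credits is None:
            continue
        hiw = cfg.get("peer_credits_hiw")
        sends = cfg.get("concurrent_sends")
        if hiw is not None and hiw * 2 != credits:
            warnings.append(
                f"{node}: peer_credits_hiw={hiw} is not half of peer_credits={credits}"
            )
        if sends is not None and sends != credits * 2:
            warnings.append(
                f"{node}: concurrent_sends={sends} is not twice peer_credits={credits}"
            )
    return warnings
```

An empty list means the collected values agree across nodes and follow the rule of thumb. Any warning names the tunable or node to look at.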
What is map_on_demand and what should I set it to? (#TODO)
...