Testing a Lustre filesystem

The Lustre code available from the Whamcloud git repository contains tools to test a Lustre installation. Since the creation of Lustre in 2001, these tools have matured and multiplied. To date, three different test suites are available.

Test suite overview

This document assumes that you have a Linux kernel compiled with Lustre patches. Typical routes to getting a working Lustre kernel include:

By downloading a pre-built kernel from a provider.
By applying the Lustre patches and building your own kernel.
Details on both of these routes are provided on the wiki page: Putting together a Lustre filesystem.

Pre-requisites

The instructions on this page assume that you have the Lustre test suite installed. You can get it from source at http://git.whamcloud.com or as an RPM from the server builds at build.whamcloud.com.

`llmount.sh`

One of the simplest test suites consists of llmount.sh and llmountcleanup.sh. llmount.sh uses a collection of bash scripts to create a Lustre file system complete with MDS, MDT, OSS, OST and Client using loop devices on a single machine. llmountcleanup.sh tears down the work llmount.sh performed and should return your system to normal.

Once llmount.sh has completed successfully you should see the following:

[root@client-10 ~]# /build/lustre-release/lustre/tests/llmount.sh
Stopping clients: client-10.lab.whamcloud.com /mnt/lustre (opts:)
Stopping clients: client-10.lab.whamcloud.com /mnt/lustre2 (opts:)
Loading modules from /build/lustre-release/lustre/tests/..
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet options: 'networks=tcp0 accept=all'
Formatting mgs, mds, osts
Checking servers environments
Checking clients client-10.lab.whamcloud.com environments
Setup mgs, mdt, osts
Starting mds: -o loop  /tmp/lustre-mdt /mnt/mds
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-MDT0000
Starting ost1: -o loop  /tmp/lustre-ost1 /mnt/ost1
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-OST0000
Starting ost2: -o loop  /tmp/lustre-ost2 /mnt/ost2
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-OST0001
Starting client: client-10.lab.whamcloud.com: -o user_xattr,acl,flock client-10.lab.whamcloud.com@tcp:/lustre /mnt/lustre
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Using TIMEOUT=20
[root@client-10 ~]#

Configuring `llmount.sh`

llmount.sh takes its configuration from environment variables. To override the default values, copy /usr/lib64/lustre/tests/cfg/local.sh to a local file, modify your copy of local.sh, and then ensure that the system-wide llmount.sh sources your local.sh first.
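The override mechanism can be illustrated with a stand-in config file. This is a minimal sketch that assumes local.sh assigns its settings with the shell's `${VAR:-default}` idiom, so that a value already set by the caller wins. The stand-in file and the default of 2 are hypothetical and are not copied from the real local.sh.

```shell
# Create a stand-in config file (hypothetical; not the real local.sh).
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
# Use the caller's value if one is set, otherwise fall back to a default.
OSTCOUNT=${OSTCOUNT:-2}
EOF

# 1. No value set by the caller: the default from the file applies.
default_count=$(unset OSTCOUNT; . "$demo_cfg"; echo "$OSTCOUNT")

# 2. Value set by the caller: the file leaves it alone.
override_count=$(OSTCOUNT=4; . "$demo_cfg"; echo "$OSTCOUNT")

echo "default=$default_count override=$override_count"
rm -f "$demo_cfg"
```

If the real local.sh follows the same idiom, exporting a variable before running llmount.sh should have the same effect as editing a private copy of the file.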

Troubleshooting `llmount.sh`

llmount.sh falls over complaining ...

This error typically indicates a problem with an InfiniBand (IB) network. Even though llmount.sh does not connect to any external machines, the IB network must be working correctly. It is possible to switch to TCP for the purposes of running llmount.sh: select the network with export NETTYPE=tcp, and check that LNet is configured to use tcp in /etc/modules.conf. More details on LNet are available in the manual.

llmount.sh complains that a value is undefined

Before you run llmount.sh, it is necessary to set the debug size environment variable: export DEBUG_SIZE=256. Setting DEBUG_SIZE to this value ensures that enough space is allocated for logs for all the CPUs in the system. If DEBUG_SIZE is too small, the parameter setting will complain during llmount.sh.
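The two settings from the troubleshooting notes above can be collected in a small wrapper, so they are always in place before llmount.sh starts. This is only a sketch: the LLMOUNT variable is a name introduced here, and its default is the build-tree path used in the transcript earlier on this page, which may differ on your system.

```shell
# Settings taken from the troubleshooting notes above.
export NETTYPE=tcp       # run over TCP instead of InfiniBand
export DEBUG_SIZE=256    # space for debug logs

# LLMOUNT is a variable introduced for this sketch; adjust it to wherever
# llmount.sh lives on your system.
LLMOUNT=${LLMOUNT:-/build/lustre-release/lustre/tests/llmount.sh}

if [ -x "$LLMOUNT" ]; then
    "$LLMOUNT"
else
    echo "llmount.sh not found at $LLMOUNT; set LLMOUNT to its location" >&2
fi
```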

You will now have a Lustre filesystem available to you in user space at /mnt/lustre/.

You can test this by setting the stripe count to use all OSTs and writing a big file:

[root@client-10 ~]# lfs setstripe -c -1 /mnt/lustre
[root@client-10 ~]# lfs getstripe /mnt/lustre/
/mnt/lustre/
stripe_count:   -1 stripe_size:    0 stripe_offset:  -1
[root@client-10 ~]# dd if=/dev/zero of=/mnt/lustre/file.out bs=1MB count=400
400+0 records in
400+0 records out
400000000 bytes (400 MB) copied, 2.33261 seconds, 171 MB/s
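The manual check above can also be scripted. The following sketch writes a file of zeros, reads its size back, and reports whether every byte arrived. The function name smoke_test and the TEST_DIR variable are introduced here for illustration; only dd, wc, tr and rm are used, so it makes no assumptions about Lustre tools.

```shell
# smoke_test DIR [MB]: write MB mebibytes of zeros into DIR, read the size
# back, and succeed only if every byte arrived.
smoke_test() {
    dir=$1
    count_mb=${2:-4}
    file="$dir/smoke_test.$$"
    dd if=/dev/zero of="$file" bs=1048576 count="$count_mb" 2>/dev/null || return 1
    actual=$(wc -c < "$file" | tr -d ' ')
    rm -f "$file"
    [ "$actual" -eq $((count_mb * 1048576)) ]
}

# TEST_DIR is a name introduced for this sketch; it defaults to the client
# mount point used on this page.
TEST_DIR=${TEST_DIR:-/mnt/lustre}
if [ ! -d "$TEST_DIR" ]; then
    echo "skipped: $TEST_DIR does not exist"
elif smoke_test "$TEST_DIR" 4; then
    echo "smoke test passed in $TEST_DIR"
else
    echo "smoke test FAILED in $TEST_DIR"
fi
```

A size check only confirms that the write completed. It does not show how the file was striped; lfs getstripe remains the tool for that.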

Clean up after the tests:

/build/lustre-release/lustre/tests/llmountcleanup.sh
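When the test step is scripted, it is easy to forget the cleanup if a command fails part-way. One possible pattern, sketched below, is a helper that runs a command, always runs the cleanup script afterwards, and passes the command's exit status through. The names TESTS_DIR, cleanup and run_with_cleanup are introduced here; the path is the build-tree location used on this page.

```shell
# TESTS_DIR is a variable introduced for this sketch.
TESTS_DIR=${TESTS_DIR:-/build/lustre-release/lustre/tests}

# cleanup: tear down the filesystem created by llmount.sh, if the script exists.
cleanup() {
    if [ -x "$TESTS_DIR/llmountcleanup.sh" ]; then
        "$TESTS_DIR/llmountcleanup.sh"
    else
        echo "llmountcleanup.sh not found in $TESTS_DIR" >&2
    fi
}

# run_with_cleanup CMD [ARGS...]: run CMD, then clean up whether or not CMD
# succeeded, and return CMD's exit status.
run_with_cleanup() {
    status=0
    "$@" || status=$?
    cleanup
    return "$status"
}
```

For example, `run_with_cleanup dd if=/dev/zero of=/mnt/lustre/file.out bs=1MB count=400` would run the write test from above and then tear the filesystem down.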

`auster`

Auster is a large suite of functional tests for Lustre, with very good coverage of Lustre functionality. Built-in help is available with the -h option:

$ /usr/lib64/lustre/tests/auster -h
Usage auster [options]  suite [suite optoins] [suite [suite options]]
Run Lustre regression tests suites.
      -c CONFIG Test environment config file
      -d LOGDIR Top level directory for logs
      -D FULLLOGDIR Full directory for logs
      -f STR    Config name (cfg/<name>.sh)
      -g GROUP  Test group file (Overrides tests listed on command line)
      -S TESTSUITE First test suite to run allows for restarts
      -i N      Repeat tests N times (default 1). A new directory
                will be created under LOGDIR for each iteration.
      -k        Don't stop when subtests fail
      -R        Remount lustre between tests
      -r        Reformat (during initial configuration if needed)
      -s        SLOW=yes
      -v        Verbose mode
      -l        Send logs to the Maloo database after run
                  (can be done later by running maloo_upload.sh)
      -h        This help.

Suite options
These are suite specific options that can be specified after each suite on
the command line.
   suite-name  [options]
      --only LIST         Run only specific list of subtests
      --except LIST       Skip list of subtests
      --start-at SUBTEST  Start testing from subtest
      --stop-at SUBTEST   Stop testing at subtest
      --time-limit LIMIT  Don't allow this suite to run longer
                          than LIMT seconds. [UNIMPLEMENTED]

Example usage:
Run all of sanity and all of replay-single except for 70b with SLOW=y using
the default "local" configuration.

  auster -s sanity replay-single --except 70b

Run all tests in the regression group 5 times using large config.

  auster -f large -g test-groups/regression  -r 5

Run tests using auster script

Single node

 # cd /usr/lib[64]/lustre/tests
 # ./auster -rv runtests

Note: This is a very simple setup; not all tests can be run in this configuration.
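Options from the help text can be combined in the same way. The sketch below wraps the call in a small function, run_auster (a name introduced here), that prints the command instead of running it when auster is not installed, which is handy for checking a command line before a long run. The flags and suite names are the ones shown in the auster -h output; TESTS_DIR is an assumed install location.

```shell
# TESTS_DIR is a variable introduced for this sketch.
TESTS_DIR=${TESTS_DIR:-/usr/lib64/lustre/tests}

# run_auster ARGS...: run auster from its own directory, or print the
# command line as a dry run when auster is not available.
run_auster() {
    if [ -x "$TESTS_DIR/auster" ]; then
        (cd "$TESTS_DIR" && ./auster "$@")
    else
        echo "dry run: auster $*"
    fi
}

# -v: verbose, -k: keep going when subtests fail (both from auster -h).
# Runs all of sanity and all of replay-single except subtest 70b.
run_auster -v -k sanity replay-single --except 70b
```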

Multiple nodes

# cd /usr/lib[64]/lustre/tests

Edit cfg/local.sh. Minimum required variables: mds_HOST, ost_HOST, PDSH, MDSDEV (MDSDEV1 if Lustre 2.x), OSTCOUNT, OSTDEV#, MDS_MOUNT_OPTS, OST_MOUNT_OPTS.
See Lustre Test Tools Environment Variable for more information.
Make sure the partitions on the disks are set up.
If using real devices, make sure to set MDS_MOUNT_OPTS and OST_MOUNT_OPTS to "".
Edit cfg/ncli.sh if there is more than one client. Set RCLIENTS=<list of remote clients>.

# ./auster -rvf ncli runtests (or any other test suite)
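Before launching a multi-node run, it can help to confirm that the minimum variables listed above are actually set. The following is a sketch of such a check; the function names are introduced here, and the host names and device paths in the example are placeholders, not recommended values. A variable set to the empty string counts as set, since real devices call for empty mount options.

```shell
# is_set NAME: succeed if the variable NAME is set, even to an empty string.
is_set() {
    eval "[ \"\${$1+x}\" = x ]"
}

# check_required_vars: report which of the minimum variables are unset.
check_required_vars() {
    missing=""
    for name in mds_HOST ost_HOST PDSH OSTCOUNT MDS_MOUNT_OPTS OST_MOUNT_OPTS; do
        is_set "$name" || missing="$missing $name"
    done
    # Either MDSDEV or MDSDEV1 (Lustre 2.x) is acceptable.
    is_set MDSDEV || is_set MDSDEV1 || missing="$missing MDSDEV"
    # One OSTDEV<n> is expected per OST.
    i=1
    while [ "$i" -le "${OSTCOUNT:-0}" ]; do
        is_set "OSTDEV$i" || missing="$missing OSTDEV$i"
        i=$((i + 1))
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "all required variables are set"
}

# Example with placeholder values, run in a subshell to keep them contained.
(
    mds_HOST=mds1 ost_HOST=oss1 PDSH="pdsh -S -w"
    MDSDEV1=/dev/sdb OSTCOUNT=2 OSTDEV1=/dev/sdc OSTDEV2=/dev/sdd
    MDS_MOUNT_OPTS="" OST_MOUNT_OPTS=""
    check_required_vars
)
```

In practice the function would be called after sourcing your edited cfg/local.sh rather than after inline assignments.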

Test logs will be in /tmp/test_logs/YYYY-MM-DD
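Because the log directories are named by date, the most recent one can be found with a short helper. This sketch relies only on the YYYY-MM-DD naming stated above; the function name latest_log_dir is introduced here.

```shell
# latest_log_dir [BASE]: print the newest dated subdirectory of BASE.
latest_log_dir() {
    base=${1:-/tmp/test_logs}
    latest=""
    # YYYY-MM-DD names sort chronologically, and the shell expands the
    # pattern in sorted order, so the last match is the newest.
    for d in "$base"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
        if [ -d "$d" ]; then
            latest=$d
        fi
    done
    [ -n "$latest" ] && echo "$latest"
}

latest_log_dir /tmp/test_logs || echo "no test logs found"
```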

Subsequent runs do not need to reformat the filesystem, so the -r option can be omitted.
