The Lustre code available from the Whamcloud Git repository includes tools for testing a Lustre installation. Since Lustre was created in 2001, these tools have matured and multiplied. To date, 22 different test suites are run regularly and many more are available.

Test suite overview

This document assumes that you have a Linux kernel compiled with Lustre patches. Typical routes to getting a working Lustre kernel include:

  • Downloading a pre-built kernel from a provider.
  • Applying the Lustre patches and building your own kernel.
    Details on both of these routes are provided on the wiki page: Putting together a Lustre filesystem.

Pre-requisites

The instructions on this page assume that you have the Lustre test suite installed. You can get it from source at http://git.whamcloud.com/ or as RPMs from the server builds at build.whamcloud.com.

Configuring llmount.sh

llmount.sh is configured from environment variables stored in files in the $LUSTRE/tests/cfg/ directory. By default, the configuration for a test system is read from local.sh. To specify values other than the defaults (e.g. to use block devices instead of loopback files, or to change the number of OSTs being tested), the best approach is to set the NAME={lustre_fsname} variable and create a new configuration script $LUSTRE/tests/cfg/${lustre_fsname}.sh that contains the site-specific variables and sources $LUSTRE/tests/cfg/local.sh for the default values. This keeps site-specific values separate from files that are tracked in Git or shipped in RPM packages, while ensuring that any new or release-specific settings in local.sh still apply unless the site file has already set them. For example, a test configuration for a developer running on a single node might look like:

$ export NAME=testfs					# used by test-framework.sh to find the configuration file
$ cat lustre/tests/cfg/$NAME.sh
FSNAME=testfs
MDSDEVBASE=/dev/vg_testfs/lvmdt
OSTDEVBASE=/dev/vg_testfs/lvost
OSTCOUNT=${OSTCOUNT:-5}
MODOPTS_LIBCFS="libcfs_panic_on_lbug=0"
FAIL_ON_ERROR=${FAIL_ON_ERROR:-true}
export SHARED_DIRECTORY="/tmp"			# /tmp is shared for all services on the test node
. $LUSTRE/tests/cfg/local.sh			# source all of the other configuration defaults
unset OSTSIZE							# use the size of the lvost devices, not a fixed size
unset MDSSIZE							# use the size of the lvmdt devices, not a fixed size
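
The ${VAR:-default} form used in the file above is what lets a value already set in the environment take precedence over the default. A minimal sketch of that behaviour follows; show_ostcount is a hypothetical helper written for this illustration and is not part of the test framework.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: prints the OSTCOUNT value a
# cfg file would end up with when it uses the ${OSTCOUNT:-5} pattern.
show_ostcount() {
    # Keep the caller's value if it is set and non-empty, else use 5.
    echo "OSTCOUNT=${OSTCOUNT:-5}"
}

unset OSTCOUNT
show_ostcount                 # prints OSTCOUNT=5 (default applies)
OSTCOUNT=2 show_ostcount      # prints OSTCOUNT=2 (caller's value wins)
```

The same reasoning explains why the example sets its site-specific variables before sourcing local.sh.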

llmount.sh

One of the simplest test suites consists of llmount.sh (to format and mount a test filesystem) and llmountcleanup.sh (to unmount the filesystem). llmount.sh uses a collection of bash scripts to create a Lustre file system complete with MDS, MDT, OSS, OST and Client using loop devices on a single machine. llmountcleanup.sh tears down the work llmount.sh performed and should return your system to normal.

A successful run of llmount.sh produces output similar to the following:

[root@client-10 ~/lustre-git]# cd lustre/tests
[root@client-10 ~/lustre-git/lustre/tests]# ./llmount.sh
Stopping clients: client-10.lab.whamcloud.com /mnt/lustre (opts:)
Stopping clients: client-10.lab.whamcloud.com /mnt/lustre2 (opts:)
Loading modules from /build/lustre-release/lustre/tests/..
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet options: 'networks=tcp0 accept=all'
Formatting mgs, mds, osts
Checking servers environments
Checking clients client-10.lab.whamcloud.com environments
Setup mgs, mdt, osts
Starting mds: -o loop  /tmp/lustre-mdt /mnt/mds
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-MDT0000
Starting ost1: -o loop  /tmp/lustre-ost1 /mnt/ost1
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-OST0000
Starting ost2: -o loop  /tmp/lustre-ost2 /mnt/ost2
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-OST0001
Starting client: client-10.lab.whamcloud.com: -o user_xattr,acl,flock client-10.lab.whamcloud.com@tcp:/lustre /mnt/lustre
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Using TIMEOUT=20
[root@client-10 ~/lustre-git/lustre/tests]#
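
Besides reading the output, one way to confirm that the client mount exists is to look for it in the kernel mount table. The sketch below assumes the default mount point /mnt/lustre and that the client appears with filesystem type "lustre" in /proc/mounts; is_lustre_mounted is a hypothetical helper, not part of the test suite.

```shell
#!/bin/bash
# Illustrative helper (not part of the Lustre test suite): succeeds if the
# given mount point is listed with filesystem type "lustre" in a
# /proc/mounts-style table (fields: device mountpoint fstype options ...).
is_lustre_mounted() {
    local mountpoint=$1
    local table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "lustre" { found = 1 }
        END { exit !found }' "$table"
}

if is_lustre_mounted /mnt/lustre; then
    echo "Lustre client mount present at /mnt/lustre"
else
    echo "no Lustre client mount at /mnt/lustre"
fi
```

The optional second argument exists only so the function can be exercised against a saved copy of the mount table.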



Troubleshooting llmount.sh

llmount.sh falls over complaining that it cannot connect to the server

This error has multiple possible causes. One common problem is that the system hostname maps to localhost (127.0.0.1) instead of a real IP address. Another possible problem arises when testing on a single node over an InfiniBand (IB) network: even though llmount.sh does not connect to any external machines, the IB network must be working correctly. It is possible to switch to TCP for the purposes of running llmount.sh: select the network with NETTYPE=tcp in the config file and check that LNet is configured to use tcp in /etc/modprobe.d/lustre.conf. More details on LNet are available in the manual.
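
The hostname problem can be checked before running llmount.sh. The following is a minimal sketch, assuming getent and hostname are available on the node; is_loopback_addr is a hypothetical helper and llmount.sh itself does not perform this check.

```shell
#!/bin/bash
# Illustrative check (not part of llmount.sh): succeed if the address is in
# the IPv4 loopback range 127.0.0.0/8 or is the IPv6 loopback address ::1.
is_loopback_addr() {
    case $1 in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# getent consults the configured name-service sources (/etc/hosts, DNS, ...).
addr=$(getent hosts "$(hostname)" | awk '{ print $1; exit }')
if [ -z "$addr" ]; then
    echo "hostname does not resolve to any address"
elif is_loopback_addr "$addr"; then
    echo "hostname resolves to loopback address $addr"
else
    echo "hostname resolves to $addr"
fi
```

If the loopback case is reported, the usual place to look is the /etc/hosts entry for the node's hostname.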

llmount.sh complains that a value is undefined

Before you run llmount.sh, set the debug size variable in the configuration, for example DEBUG_SIZE=256. Setting DEBUG_SIZE to this value ensures that enough space is allocated for logs for all the CPUs in the system. If DEBUG_SIZE is too small, the parameter-setting step reports an error while llmount.sh runs.
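
As a sketch, the setting could be placed in the site configuration file alongside the other variables. The value 256 is the one used on this page, and it appears to correspond to the lnet.debug_mb=256 lines in the llmount.sh output shown earlier; treat that link, and the choice of value, as assumptions to check on your own system.

```shell
# In $LUSTRE/tests/cfg/$NAME.sh, with the other site-specific variables.
# Keeps a value already set in the environment, otherwise uses 256.
DEBUG_SIZE=${DEBUG_SIZE:-256}
```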

  1. Once llmount.sh has completed, a Lustre filesystem is available at /mnt/lustre/, or at /mnt/$FSNAME/ if FSNAME was set differently in your configuration.
  2. You can test this by striping across all OSTs and writing a large file:

    [root@client-10 ~/lustre-git/lustre/tests]# lfs setstripe -c -1 /mnt/lustre
    [root@client-10 ~/lustre-git/lustre/tests]# lfs getstripe /mnt/lustre/
    /mnt/lustre/
    stripe_count:   -1 stripe_size:    0 stripe_offset:  -1
    [root@client-10 ~/lustre-git/lustre/tests]# dd if=/dev/zero of=/mnt/lustre/file.out bs=1MB count=400
    400+0 records in
    400+0 records out
    400000000 bytes (400 MB) copied, 2.33261 seconds, 171 MB/s
    
  3. Clean up after the tests:

    ./llmountcleanup.sh
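
The write test in step 2 can also be scripted so that the file size is checked automatically. This is a minimal sketch to run before the clean-up in step 3; write_and_check is a hypothetical helper, and bs=1000000 is used as the numeric equivalent of the bs=1MB in the example above.

```shell
#!/bin/bash
# Illustrative sketch: write COUNT blocks of 1 MB (10^6 bytes) of zeros to
# DIR/file.out, confirm the resulting size, then remove the file.
write_and_check() {
    local dir=$1 count=$2
    local file=$dir/file.out
    local expected=$((count * 1000000))
    local actual

    dd if=/dev/zero of="$file" bs=1000000 count="$count" 2>/dev/null || return 1
    actual=$(( $(wc -c < "$file") ))
    rm -f "$file"
    [ "$actual" -eq "$expected" ]
}

# Usage against the test filesystem:
#   write_and_check /mnt/lustre 400 && echo "write verified"
```

The function returns non-zero if dd fails or if the size on disk differs from the number of bytes requested.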
    

auster

Auster is a large suite of functional tests for Lustre, driven by the auster script, and gives very good coverage of Lustre functionality. Help is available from the command line:

$ /usr/lib64/lustre/tests/auster -h
Usage auster [options]  suite [suite options] [suite [suite options]]
Run Lustre regression tests suites.
      -c CONFIG Test environment config file
      -d LOGDIR Top level directory for logs
      -D FULLLOGDIR Full directory for logs
      -f STR    Config name (cfg/<name>.sh)
      -g GROUP  Test group file (Overrides tests listed on command line)
      -S TESTSUITE First test suite to run allows for restarts
      -i N      Repeat tests N times (default 1). A new directory
                will be created under LOGDIR for each iteration.
      -k        Don't stop when subtests fail
      -R        Remount lustre between tests
      -r        Reformat (during initial configuration if needed)
      -s        SLOW=yes
      -v        Verbose mode
      -l        Send logs to the Maloo database after run
                  (can be done later by running maloo_upload.sh)
      -h        This help.

Suite options
These are suite specific options that can be specified after each suite on
the command line.
   suite-name  [options]
      --only LIST         Run only specific list of subtests
      --except LIST       Skip list of subtests
      --start-at SUBTEST  Start testing from subtest
      --stop-at SUBTEST   Stop testing at subtest
      --time-limit LIMIT  Don't allow this suite to run longer
                          than LIMT seconds. [UNIMPLEMENTED]


Example usage:
Run all of sanity and all of replay-single except for 70b with SLOW=y using
the default "local" configuration.

  auster -s sanity replay-single --except 70b

Run all tests in the regression group 5 times using large config.

  auster -f large -g test-groups/regression -i 5

Run tests using auster script

  • Single node
 # cd /usr/lib64/lustre/tests
 # ./auster -rv runtests

Note: This is a very simple setup; not all tests can be run in this configuration.

  • Multiple nodes
# cd /usr/lib64/lustre/tests
Edit cfg/testfs.sh and set at least the following variables: mds_HOST, ost_HOST, PDSH, MDSDEV1, OSTCOUNT, OSTDEV#, MDS_MOUNT_OPTS, OST_MOUNT_OPTS

See Lustre Test Tools Environment Variable for more information.
Make sure the partitions on the disks are set up.
If using real devices, make sure to set MDS_MOUNT_OPTS="" and OST_MOUNT_OPTS="".
If there is more than one client, set RCLIENTS=<list of remote clients>.


# ./auster -rvf testfs runtests (or any other test suite)
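
For the multi-node case, a cfg/testfs.sh covering the minimum variables listed above might look like the sketch below. Every hostname and device path here is hypothetical, and the PDSH value is an assumption about a typical pdsh-over-ssh setup; substitute the values for your own nodes.

```shell
# cfg/testfs.sh: illustrative values only; hostnames and devices are hypothetical
mds_HOST=mds-node                 # node running the MDS
ost_HOST=oss-node                 # node running the OSTs
PDSH="pdsh -S -Rssh -w"           # assumed remote command runner (pdsh over ssh)
MDSDEV1=/dev/sdb1                 # MDT block device on mds_HOST
OSTCOUNT=2
OSTDEV1=/dev/sdb1                 # one OSTDEV<n> entry per OST, on ost_HOST
OSTDEV2=/dev/sdc1
MDS_MOUNT_OPTS=""                 # empty when using real devices
OST_MOUNT_OPTS=""
RCLIENTS="client-2 client-3"      # only needed with more than one client
. $LUSTRE/tests/cfg/local.sh      # defaults for everything not set above
```

As in the single-node example, the site-specific values come first and local.sh is sourced last to fill in the remaining defaults.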

Test logs will be in /tmp/test_logs/YYYY-MM-DD.

Subsequent runs do not need to reformat the filesystem (the -r option).
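
The difference between a first run and later runs can be sketched as a small wrapper that adds -r only on request. auster_cmd is a hypothetical helper that just prints the command line; the options it uses (-r, -v, -f) are the ones listed in the help output above.

```shell
#!/bin/bash
# Illustrative wrapper (not part of the test suite): print the auster
# command line for a config and suite, adding -r (reformat) only when the
# third argument is "yes".
auster_cmd() {
    local config=$1 suite=$2 reformat=${3:-no}
    local opts=-v
    [ "$reformat" = yes ] && opts=-rv
    echo "./auster ${opts}f $config $suite"
}

auster_cmd testfs runtests yes   # prints: ./auster -rvf testfs runtests
auster_cmd testfs sanity         # prints: ./auster -vf testfs sanity
```

Printing the command rather than running it keeps the sketch safe to try on a machine without Lustre installed.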
