Purpose

This page describes the steps needed to build and test a Lustre 1.8 system (MGS, MDT, MDS, OSS, OST, client) on a CentOS 5 machine.

Prerequisites

  • A newly installed CentOS 5 machine with the name: client-10.
  • EPEL Repository: this is a convenient source for git.

Overview

Lustre 1.8 servers require a patched and compiled kernel. Patches are readily available in the Whamcloud git source repository. A test suite is included with the Lustre 1.8 source. This document walks through the steps of patching the kernel, building Lustre and running a basic test of the complete system.

Procedure

The procedure requires an OS that is set up for development: this includes the Lustre sources, the kernel source and the build tools. Once that is in place, a new kernel can be patched, compiled, run and tested. Further reading on building a CentOS RPM-based kernel is available on the CentOS site.

Provision Machine

Once CentOS 5.5 is freshly installed on client-10, log in as root.

  1. Install required kernel development tools.
    yum groupinstall "Development Tools"
    yum install rpm-build redhat-rpm-config unifdef gnupg quilt git
    
  2. Create a user named build with the home directory /build.
    useradd -d /build build
    
  3. Switch to the user build and change to the build $HOME directory.
    su build
    cd $HOME
    
  4. Get the 1.8 branch from the Whamcloud git account.
    git clone git://git.whamcloud.com/fs/lustre-release.git
    cd lustre-release
    git checkout --track -b b1_8 origin/b1_8
    
  5. Run sh ./autogen.sh
  6. Resolve any outstanding dependencies until autogen.sh completes successfully. Success will look like:
    [root@client-10 lustre-release]# sh ./autogen.sh 
    Checking for a complete tree...
    checking for automake-1.9 >= 1.9... found 1.9.6
    ...
    Running automake-1.9...
    configure.ac: installing `./install-sh'
    configure.ac: installing `./missing'
    configure.ac:9: installing `./config.guess'
    configure.ac:9: installing `./config.sub'
    Running autoconf...
    [root@client-10 lustre-release]#
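
Before running autogen.sh, it can save time to confirm that the build tools are on the PATH. The following is a minimal sketch, not part of the original procedure: the helper name missing_tools is hypothetical, and the command names checked are assumed from the packages in step 1 and from the autogen.sh output above.

```shell
# Print the name of each given command that is not found on PATH.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands assumed to be provided by the packages installed in step 1.
missing_tools rpmbuild unifdef quilt git automake autoconf
```

Any name printed points at a package that still needs to be installed with yum.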
    

Prepare the kernel source

In this walk-through, the kernel is built using rpmbuild, a tool specific to RPM-based distributions.

  1. Get the kernel source. First create the directory structure, then get the source from the RPM. Create a .rpmmacros file to install the kernel source in our user dir.
    cd $HOME
    mkdir -p kernel/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
    cd kernel
    echo '%_topdir %(echo $HOME)/kernel/rpmbuild' > ~/.rpmmacros
    
  2. Install the kernel source:
    rpm -i http://mirror.centos.org/centos/5/updates/SRPMS/kernel-2.6.18-194.32.1.el5.src.rpm 2>&1 | grep -v mockb
    
  3. Expand the source. Using rpmbuild will also apply CentOS patches.
    cd ~/kernel/rpmbuild/SPECS
    rpmbuild -bp --target=`uname -m` ./kernel-2.6.spec
    

    This should end with:
    ...
    Patch #20215 (xen-hvm-fix-up-suspend-resume-migration-w-pv-drivers.patch):
    + patch -p1 --fuzz=2 -s
    + echo 'Patch #20216 (xen-hvm-correct-accuracy-of-pmtimer.patch):'
    Patch #20216 (xen-hvm-correct-accuracy-of-pmtimer.patch):
    + patch -p1 --fuzz=2 -s
    + exit 0
    

At this point, we have the kernel source, with all the CentOS patches applied, residing in the directory /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
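
One way to confirm that the tree expanded correctly is to read the version fields from its top-level Makefile. This is a sketch only: the function name kernel_makefile_version is hypothetical, and it assumes the usual VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION assignments at the top of a 2.6 kernel Makefile.

```shell
# Print the kernel version assembled from the top-level Makefile fields,
# for example "2.6.18" followed by whatever EXTRAVERSION holds.
kernel_makefile_version() {
    awk -F'=' '
        /^VERSION[ \t]*=/      { gsub(/[ \t]/, "", $2); v = $2 }
        /^PATCHLEVEL[ \t]*=/   { gsub(/[ \t]/, "", $2); p = $2 }
        /^SUBLEVEL[ \t]*=/     { gsub(/[ \t]/, "", $2); s = $2 }
        /^EXTRAVERSION[ \t]*=/ { gsub(/[ \t]/, "", $2); e = $2 }
        END { print v "." p "." s e }
    ' "$1"
}

# Only query the tree if it exists on this machine.
src=/build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
if [ -f "$src/Makefile" ]; then
    kernel_makefile_version "$src/Makefile"
fi
```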

Patch the kernel source with the Lustre code.

  1. Add a unique build id so we can be certain our kernel is the one that booted. Edit ~build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/Makefile and change line 4, the EXTRAVERSION setting, to read:
    EXTRAVERSION = -lustre18
    
  2. Enter the directory /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64:
    cd /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
    
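  3. Overwrite the .config file with the RHEL 5 x86_64 kernel config shipped in /build/lustre-release/lustre/kernel_patches/kernel_configs/. The exact file name can differ between checkouts, so list that directory and adjust the command if needed: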
    cp /build/lustre-release/lustre/kernel_patches/kernel_configs/kernel-2.6.18-2.6-rhel5-x86_64-smp.config ./.config
    
  4. Link the Lustre series and patches:
    ln -s ~/lustre-release/lustre/kernel_patches/series/2.6-rhel5.series series
    ln -s ~/lustre-release/lustre/kernel_patches/patches patches
    
  5. Apply the patches to the kernel source using quilt:
    quilt push -av
    ...
    ...
    Applying patch patches/md-avoid-bug_on-when-bmc-overflow.patch
    patching file drivers/md/bitmap.c
    Hunk #1 succeeded at 1161 (offset 1 line).
    Hunk #3 succeeded at 1224 (offset 1 line).
    patching file include/linux/raid/bitmap.h
    
    Applying patch patches/jbd2_stats_proc_init-wrong-place.patch
    patching file fs/jbd2/journal.c
    Hunk #1 succeeded at 1042 (offset 143 lines).
    
    Now at patch patches/jbd2_stats_proc_init-wrong-place.patch
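
The Makefile edit in step 1 can also be scripted instead of done by hand. The sketch below is an illustration under assumptions: the function name set_extraversion is hypothetical, and it expects a single line beginning with EXTRAVERSION in the Makefile. The suffix must not contain a slash, because it is substituted into a sed expression.

```shell
# Rewrite the EXTRAVERSION line of a kernel Makefile in place.
# Usage: set_extraversion MAKEFILE SUFFIX
set_extraversion() {
    sev_makefile=$1
    sev_suffix=$2
    sev_tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then copy it back so the
    # original file keeps its ownership and permissions.
    sed "s/^EXTRAVERSION[[:space:]]*=.*/EXTRAVERSION = ${sev_suffix}/" \
        "$sev_makefile" > "$sev_tmp" && cat "$sev_tmp" > "$sev_makefile"
    sev_status=$?
    rm -f "$sev_tmp"
    return $sev_status
}

# Example, using the path from step 1:
# set_extraversion ~build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/Makefile -lustre18
```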
    

Build the new kernel as an RPM.

  1. Go into the kernel source directory and issue the following commands to build a kernel rpm.
    cd /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
    make oldconfig || make menuconfig
    make include/asm
    make include/linux/version.h
    make SUBDIRS=scripts
    make include/linux/utsrelease.h
    make
    make rpm
    
  2. A successful build will return:
    ...
    ...
    Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
    Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/kernel-2.6.18lustre18-root
    Wrote: /build/kernel/rpmbuild/SRPMS/kernel-2.6.18lustre18-1.src.rpm
    Wrote: /build/kernel/rpmbuild/RPMS/x86_64/kernel-2.6.18lustre18-1.x86_64.rpm
    Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.35163
    + umask 022
    + cd /build/kernel/rpmbuild/BUILD
    + cd kernel-2.6.18lustre18
    + exit 0
    rm ../kernel-2.6.18lustre18.tar.gz
    

NOTE: If the build asks you to generate more entropy, trigger some disk or keyboard I/O. One way to do this, from another terminal, is:

grep -Ri 'whamcloud' /usr
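
To see whether the entropy pool is actually running low, the kernel exposes the current estimate under /proc. This is a small sketch, with entropy_avail as a hypothetical helper name; it assumes a Linux kernel that provides /proc/sys/kernel/random/entropy_avail and falls back to "unknown" otherwise.

```shell
# Print the kernel's current entropy estimate in bits, or "unknown"
# when the proc file is not readable on this system.
entropy_avail() {
    if [ -r /proc/sys/kernel/random/entropy_avail ]; then
        cat /proc/sys/kernel/random/entropy_avail
    else
        echo unknown
    fi
}

echo "entropy available: $(entropy_avail)"
```

Running it before and after generating disk activity shows whether the pool is refilling.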

At this point, you should have a fresh kernel RPM /build/kernel/rpmbuild/RPMS/x86_64/kernel-2.6.18lustre18-1.x86_64.rpm

Configure and build Lustre

  1. Configure Lustre source
    [build@client-10 linux-2.6.18.x86_64]$ cd ~/lustre-release/
    [build@client-10 lustre-release]$ ./configure --with-linux=/build/kernel/rpmbuild/BUILD/kernel-2.6.18lustre18/
    ...
    ...
    LLCPPFLAGS:    -D__arch_lib__ -D_LARGEFILE64_SOURCE=1
    CFLAGS:        -g -O2 -Werror
    EXTRA_KCFLAGS: -include /build/lustre-release/config.h  -g -I/build/lustre-release/lnet/include -I/build/lustre-release/lnet/include -I/build/lustre-release/lustre/include
    LLCFLAGS:      -g -Wall -fPIC -D_GNU_SOURCE
    
    Type 'make' to build Lustre.
    
  2. make rpms:
    [build@client-10 lustre-release]$ make rpms
    ...
    ...
    Wrote: /build/kernel/rpmbuild/RPMS/x86_64/lustre-debuginfo-1.8.5.54-2.6.18_194.32.1.el5.lustre18_201103071000.x86_64.rpm
    Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.15638
    + umask 022
    + cd /build/kernel/rpmbuild/BUILD
    + cd lustre-1.8.5.54
    + rm -rf /var/tmp/lustre-1.8.5.54-root
    + exit 0
    make[1]: Leaving directory `/build/lustre-release'
    
  3. You should now have built the following, similarly named, rpms:
    ls ~build/kernel/rpmbuild/RPMS/x86_64/
    lustre-debuginfo-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
    lustre-tests-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
    lustre-source-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
    lustre-modules-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
    lustre-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
    lustre-ldiskfs-3.1.5-2.6.18_lustre18_201103081148.x86_64.rpm
    lustre-ldiskfs-debuginfo-3.1.5-2.6.18_lustre18_201103081148.x86_64.rpm
    kernel-2.6.18lustre18-1.x86_64.rpm
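
Because the timestamps in the file names change with every build, checking by package name prefix is more convenient than comparing full names. The following sketch assumes the naming pattern shown in the listing above (name, a hyphen, then a version starting with a digit); missing_rpms is a hypothetical helper, not a Lustre tool.

```shell
# Print each package name prefix that has no matching RPM in the directory.
# Usage: missing_rpms DIR NAME...
missing_rpms() {
    rpm_dir=$1
    shift
    for prefix in "$@"; do
        # ls fails when the glob matches nothing, which marks the package as missing.
        ls "$rpm_dir"/"$prefix"-[0-9]*.rpm >/dev/null 2>&1 || printf '%s\n' "$prefix"
    done
}

# Packages installed later in this walk-through.
missing_rpms /build/kernel/rpmbuild/RPMS/x86_64 kernel lustre lustre-modules lustre-ldiskfs
```

No output means all four packages were found.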
    

Installing the Lustre kernel and rebooting.

  1. As root, install the kernel:
    [root@client-10 ~]# rpm -ivh ~build/kernel/rpmbuild/RPMS/x86_64/kernel-2.6.18lustre18-1.x86_64.rpm
    
  2. Check that /boot/grub/menu.lst contains the correct default kernel to boot. This is typically 0:
    default=0
    
  3. Reboot.
  4. Connect with conman and watch the machine come up.
  5. View the login prompt with satisfaction:
    CentOS release 5.5 (Final)
    Kernel 2.6.18-lustre18 on an x86_64
    
    client-10.lab.whamcloud.com login:
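
After logging in, the same check can be made from the shell rather than from the console banner. This is a sketch under the assumption that the running kernel release contains the lustre18 tag set through EXTRAVERSION; kernel_has_tag is a hypothetical helper name.

```shell
# Succeed when the kernel release string contains the given tag.
# The release defaults to the running kernel (uname -r).
kernel_has_tag() {
    tag=$1
    release=${2:-$(uname -r)}
    case $release in
        *"$tag"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if kernel_has_tag lustre18; then
    echo "running the Lustre-patched kernel: $(uname -r)"
else
    echo "not running the Lustre-patched kernel: $(uname -r)"
fi
```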
    

Installing Lustre.

  1. Become root and change directory to /build/kernel/rpmbuild/RPMS/x86_64/
  2. Install the kernel modules (lustre-modules, lustre-ldiskfs) and the user space tools (lustre):
    rpm -ivh /build/kernel/rpmbuild/RPMS/x86_64/lustre-modules-1.8.5.54-2.6.18_lustre18_*.x86_64.rpm /build/kernel/rpmbuild/RPMS/x86_64/lustre-1.8.5.54-2.6.18_lustre18_*.x86_64.rpm /build/kernel/rpmbuild/RPMS/x86_64/lustre-ldiskfs-3.1.5-2.6.18_lustre18_*.x86_64.rpm
    

Installing e2fsprogs

e2fsprogs is needed to run the test suite.

  1. Download e2fsprogs from http://build.whamcloud.com/job/e2fsprogs/
  2. Install with rpm -ivh e2fsprogs

Testing Lustre

  1. As root, create a large enough debug buffer to contain the log for the total number of
  2. Run llmount.sh
    export DEBUG_SIZE=256
    /build/lustre-release/lustre/tests/llmount.sh
    
  3. You should see something like:
    [root@client-10 ~]# /build/lustre-release/lustre/tests/llmount.sh 
    Stopping clients: client-10.lab.whamcloud.com /mnt/lustre (opts:)
    Stopping clients: client-10.lab.whamcloud.com /mnt/lustre2 (opts:)
    Loading modules from /build/lustre-release/lustre/tests/..
    lnet.debug=0x33f1504
    lnet.subsystem_debug=0xffb7e3ff
    lnet options: 'networks=tcp0 accept=all'
    Formatting mgs, mds, osts
    Checking servers environments
    Checking clients client-10.lab.whamcloud.com environments
    Setup mgs, mdt, osts
    Starting mds: -o loop  /tmp/lustre-mdt /mnt/mds
    lnet.debug=0x33f1504
    lnet.subsystem_debug=0xffb7e3ff
    lnet.debug_mb=256
    Started lustre-MDT0000
    Starting ost1: -o loop  /tmp/lustre-ost1 /mnt/ost1
    lnet.debug=0x33f1504
    lnet.subsystem_debug=0xffb7e3ff
    lnet.debug_mb=256
    Started lustre-OST0000
    Starting ost2: -o loop  /tmp/lustre-ost2 /mnt/ost2
    lnet.debug=0x33f1504
    lnet.subsystem_debug=0xffb7e3ff
    lnet.debug_mb=256
    Started lustre-OST0001
    Starting client: client-10.lab.whamcloud.com: -o user_xattr,acl,flock client-10.lab.whamcloud.com@tcp:/lustre /mnt/lustre
    lnet.debug=0x33f1504
    lnet.subsystem_debug=0xffb7e3ff
    lnet.debug_mb=256
    Using TIMEOUT=20
    [root@client-10 ~]# 
    
  4. You now have a Lustre filesystem mounted at /mnt/lustre/.
  5. You can test it by setting the stripe count to use all OSTs and writing a big file:
    [root@client-10 ~]# lfs setstripe -c -1 /mnt/lustre
    [root@client-10 ~]# lfs getstripe /mnt/lustre/
    /mnt/lustre/
    stripe_count:   -1 stripe_size:    0 stripe_offset:  -1
    [root@client-10 ~]# dd if=/dev/zero of=/mnt/lustre/file.out bs=1MB count=400
    400+0 records in
    400+0 records out
    400000000 bytes (400 MB) copied, 2.33261 seconds, 171 MB/s
    
  6. Clean up after the tests:
    /build/lustre-release/lustre/tests/llmountcleanup.sh
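
To confirm the mount from step 4 before cleaning up, and to confirm afterwards that everything was unmounted, the mount table can be filtered by filesystem type. This is a sketch: lustre_mounts is a hypothetical helper, and it assumes Lustre mounts appear with type lustre in /proc/mounts, which may include the server targets as well as the client.

```shell
# Print the mount point of every entry whose filesystem type is "lustre".
# Reads /proc/mounts unless another mount table file is given.
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}

# Only query the live mount table where it exists.
if [ -r /proc/mounts ]; then
    lustre_mounts
fi
```

After llmountcleanup.sh completes, the function should print nothing.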
    

Congratulations, your mission is complete.

Troubleshooting.

  1. If InfiniBand is not working, you can switch to TCP by selecting the network type with export NETTYPE=tcp. The Lustre tests default to tcp, but the automatically provisioned machines have LNET set up to use o2ib.
