Purpose
Describes the steps needed to build and test a Lustre 1.8 system (MGS, MDT, MDS, OSS, OST, client) on a CentOS 5 Toro machine.
Prerequisites
- Account on Toro
- Reservations on the nodes: client-10, client-11, client-12
- CentOS 5 provisioned on client-10 using loadhudsonbuild.rb
Overview
Lustre 1.8 servers require a patched and compiled kernel. Patches are readily available in the Whamcloud git source repository. The test suite is included with the Lustre 1.8 source.
Procedure
The procedure requires an OS that is set up for development, including the Lustre source and kernel headers. Once that is in place, a new kernel can be patched, compiled, booted and tested. Building an RPM-based kernel is described in detail on the Lustre.org wiki.
Provision Machine
Once CentOS 5.5 is provisioned on client-10, log in as root.
- Install development tools:
yum groupinstall "Development Tools"
- Install a bunch of useful stuff:
yum install rpm-build redhat-rpm-config unifdef gnupg quilt git
- Create a user build with the home directory /build:
useradd -d /build build
- Switch to user
su build
- Change to the directory ~build
- Get the 1.8 branch (b1_8) from the Whamcloud git repository:
git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout --track -b b1_8 origin/b1_8
- Run
sh ./autogen.sh
- Resolve the outstanding dependencies until autogen.sh completes successfully. Success will look like:
[root@client-10 lustre-release]# sh ./autogen.sh
Checking for a complete tree...
checking for automake-1.9 >= 1.9... found 1.9.6
...
Running automake-1.9...
Running autoconf...
[root@client-10 lustre-release]#
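Before running autogen.sh it can save time to confirm that the tools installed by the yum commands above are actually on the PATH. The following is a minimal sketch; the function name missing_tools and the REQUIRED_TOOLS list are hypothetical choices for illustration (automake and autoconf are included because autogen.sh calls them), not part of the Lustre tree.

```shell
#!/bin/bash
# Hypothetical helper: print each named tool that is NOT found in PATH,
# one per line. Prints nothing when everything is present.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed tool list, based on the packages installed in this section.
REQUIRED_TOOLS="gcc make rpmbuild quilt git unifdef automake autoconf"
```

Usage: `missing=$(missing_tools $REQUIRED_TOOLS); [ -z "$missing" ] || echo "Still missing: $missing"`. Each missing name maps back to a yum install that still needs to happen.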
Prepare the kernel source
This section of the walk-thru is taken from http://wiki.centos.org/HowTos/Custom_Kernel
- Get the kernel source. First create the directory structure, then get the source from the RPM. Create a .rpmmacros file so the kernel source is installed in our user directory:
cd
mkdir -p kernel/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
cd kernel
echo '%_topdir %(echo $HOME)/kernel/rpmbuild' > ~/.rpmmacros
- Install the kernel source:
rpm -i http://mirror.centos.org/centos/5/updates/SRPMS/kernel-2.6.18-194.32.1.el5.src.rpm 2>&1 | grep -v mockb
- Expand the source. Using rpmbuild will also apply CentOS patches.
cd ~/kernel/rpmbuild/SPECS
rpmbuild -bp --target=`uname -m` ./kernel-2.6.spec
This should print a bunch of output and end with:
...
+ echo 'Patch #20216 (xen-hvm-correct-accuracy-of-pmtimer.patch):'
Patch #20216 (xen-hvm-correct-accuracy-of-pmtimer.patch):
+ patch -p1 --fuzz=2 -s
+ exit 0
At this point, we have a kernel source tree, with all the CentOS patches applied, in the directory /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
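The directory layout and .rpmmacros file from this section can also be set up with a small re-runnable function. This is a sketch under assumptions: setup_rpm_topdir is a hypothetical name, and unlike the echo command above it writes the literal home path instead of `%(echo $HOME)`, and appends the `%_topdir` line only if one is not already there, so running it twice does not clobber other macros.

```shell
#!/bin/bash
# Hypothetical helper: create HOME_DIR/kernel/rpmbuild/{BUILD,RPMS,...}
# and make HOME_DIR/.rpmmacros point %_topdir at it (added only once).
setup_rpm_topdir() {
    local home="$1"
    local topdir="$home/kernel/rpmbuild"
    local d
    for d in BUILD RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$topdir/$d"
    done
    # Append the macro only if the file has no %_topdir line yet
    # (a missing file makes grep fail, so the line is written).
    if ! grep -q '^%_topdir' "$home/.rpmmacros" 2>/dev/null; then
        printf '%%_topdir %s\n' "$topdir" >> "$home/.rpmmacros"
    fi
}
```

Usage as user build: `setup_rpm_topdir "$HOME"`, then `rpm --eval '%_topdir'` should print /build/kernel/rpmbuild before the source RPM is installed.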
Patch the kernel source with the Lustre code.
- Add a unique build ID so we can be certain our kernel is booted. Edit ~build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/Makefile and modify line 4, the EXTRAVERSION line, to read:
EXTRAVERSION = -lustre18
- Enter the directory /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
- Overwrite the .config file with the RHEL5 x86_64 kernel config from /build/lustre-release/lustre/kernel_patches/kernel_configs/:
cp /build/lustre-release/lustre/kernel_patches/kernel_configs/kernel-2.6.18-2.6-rhel5-x86_64-smp.config ./.config
- Link the Lustre series file and patches:
ln -s ~/lustre-release/lustre/kernel_patches/series/2.6-rhel5.series series
ln -s ~/lustre-release/lustre/kernel_patches/patches patches
- Apply the patches to the kernel source using quilt:
quilt push -av
...
Applying patch patches/jbd2_stats_proc_init-wrong-place.patch
patching file fs/jbd2/journal.c
Hunk #1 succeeded at 1042 (offset 143 lines).
Now at patch patches/jbd2_stats_proc_init-wrong-place.patch
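The two manual steps in this section, editing EXTRAVERSION and pointing quilt at the Lustre series, can be scripted and checked before `quilt push -av`. A minimal sketch, assuming hypothetical helper names set_extraversion and missing_patches; it relies on the quilt series format (one patch name per line, optionally followed by options such as -p1, with # comments).

```shell
#!/bin/bash
# Hypothetical helper: rewrite the "EXTRAVERSION = ..." line of a kernel
# Makefile. VALUE must not contain '/' or '&' (they are special to sed).
set_extraversion() {
    local mk="$1" value="$2"
    local tmp="$mk.tmp.$$"
    sed "s/^EXTRAVERSION[[:space:]]*=.*/EXTRAVERSION = $value/" "$mk" > "$tmp" &&
        mv "$tmp" "$mk"
}

# Hypothetical helper: print each patch named in SERIES that has no file
# in PATCHDIR. Blank lines and comments are skipped; the first field of
# each remaining line is taken as the patch name.
missing_patches() {
    local series="$1" dir="$2" name
    awk 'NF == 0 || $1 ~ /^#/ { next } { print $1 }' "$series" |
        while IFS= read -r name; do
            [ -f "$dir/$name" ] || printf '%s\n' "$name"
        done
}
```

Usage from the kernel source directory: `set_extraversion Makefile -lustre18` and `missing_patches series patches`; an empty result from the second call means every patch in the series resolves through the patches symlink.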
Build the new kernel as an RPM.
- Go into the kernel source directory and issue the following commands to build a kernel rpm.
cd /build/kernel/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64
make oldconfig || make menuconfig
make include/asm
make include/linux/version.h
make SUBDIRS=scripts
make include/linux/utsrelease.h
make rpm
- Make a coffee. NOTE: If you are asked to generate more entropy, you need to trigger some disk or keyboard I/O. I recommend running this in another terminal:
grep -Ri 'whamcloud' /usr
- As user build, change to the directory ~build/lustre-release
At this point, you should have a fresh kernel RPM /build/kernel/rpmbuild/RPMS/x86_64/kernel-2.6.18lustre18-1.x86_64.rpm
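The RPM name above has no dash before "lustre18" even though EXTRAVERSION is -lustre18; the example name suggests the dash is dropped when the kernel release becomes the RPM version (RPM version fields cannot contain '-'). The sketch below, with the hypothetical function kernel_rpm_name, reproduces that naming so the expected file can be checked for. The release number (1 in the example) is an assumption and may differ between builds.

```shell
#!/bin/bash
# Hypothetical helper: build the expected kernel RPM file name from the
# base version, EXTRAVERSION, release number and architecture.
# Dashes are removed from version+EXTRAVERSION, matching the example name.
kernel_rpm_name() {
    local version="$1" extra="$2" release="$3" arch="$4"
    local rpmver
    rpmver=$(printf '%s%s' "$version" "$extra" | sed 's/-//g')
    printf 'kernel-%s-%s.%s.rpm\n' "$rpmver" "$release" "$arch"
}
```

Usage: `ls /build/kernel/rpmbuild/RPMS/x86_64/$(kernel_rpm_name 2.6.18 -lustre18 1 x86_64)`.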
Configure and build Lustre
- Configure Lustre source
[build@client-10 lustre-release]$ ./configure --with-linux=/build/kernel/rpmbuild/BUILD/kernel-2.6.18lustre18/
...
EXTRA_KCFLAGS: -include /build/lustre-release/config.h -g -I/build/lustre-release/lnet/include -I/build/lustre-release/lnet/include -I/build/lustre-release/lustre/include
LLCFLAGS: -g -Wall -fPIC -D_GNU_SOURCE
Type 'make' to build Lustre.
- Make the RPMs:
[build@client-10 lustre-release]$ make rpms
...
Wrote: /build/kernel/rpmbuild/RPMS/x86_64/lustre-debuginfo-1.8.5.54-2.6.18_194.32.1.el5.lustre18_201103071000.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.15638
+ umask 022
+ cd /build/kernel/rpmbuild/BUILD
+ cd lustre-1.8.5.54
+ rm -rf /var/tmp/lustre-1.8.5.54-root
+ exit 0
make[1]: Leaving directory `/build/lustre-release'
- You should now have built the following RPMs:
ls ~build/kernel/rpmbuild/RPMS/x86_64/
lustre-debuginfo-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
lustre-tests-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
lustre-source-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
lustre-modules-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
lustre-1.8.5.54-2.6.18_lustre18_201103081147.x86_64.rpm
lustre-ldiskfs-3.1.5-2.6.18_lustre18_201103081148.x86_64.rpm
lustre-ldiskfs-debuginfo-3.1.5-2.6.18_lustre18_201103081148.x86_64.rpm
kernel-2.6.18lustre18-1.x86_64.rpm
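Rather than eyeballing the directory listing, a short check can report which of the expected Lustre packages were not produced. This is a sketch; missing_lustre_rpms is a hypothetical name and the package list is an assumption taken from the listing above (debuginfo packages are left out).

```shell
#!/bin/bash
# Hypothetical helper: print each expected Lustre package name that has
# no NAME-<digit>*.rpm file in RPMDIR. Requiring a digit after "NAME-"
# stops "lustre" from matching "lustre-modules-...".
missing_lustre_rpms() {
    local dir="$1" pkg f found
    for pkg in lustre lustre-modules lustre-ldiskfs lustre-tests lustre-source; do
        found=no
        for f in "$dir/$pkg"-[0-9]*.rpm; do
            # An unmatched glob stays literal, so test that the file exists
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$pkg"
        fi
    done
}
```

Usage: `missing_lustre_rpms ~build/kernel/rpmbuild/RPMS/x86_64`; no output means all five packages are there.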
Installing the Lustre kernel and rebooting.
- As root, install the kernel:
rpm -ivh ~build/kernel/rpmbuild/RPMS/x86_64/kernel-2.6.18lustre18-1.x86_64.rpm
- Check that /boot/grub/menu.lst sets the correct default kernel to boot. This is typically entry 0:
default=0
- Reboot:
reboot
- Connect with conman and watch the machine come up.
- View the login prompt with satisfaction:
CentOS release 5.5 (Final)
Kernel 2.6.18-lustre18 on an x86_64

client-10.lab.whamcloud.com login:
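The menu.lst check and the post-boot check in this section can both be scripted. A minimal sketch, assuming hypothetical helper names: grub_default_title prints the title of the entry GRUB legacy will boot (entries are counted from 0, and both `default=N` and `default N` are read), and running_lustre_kernel succeeds when the kernel release ends in -lustre18, the EXTRAVERSION chosen earlier.

```shell
#!/bin/bash
# Hypothetical helper: print the title of the default entry in a GRUB
# legacy menu.lst. With no "default" line, entry 0 is assumed.
grub_default_title() {
    awk '
        /^[ \t]*default[ \t=]/ { sub(/^[ \t]*default[ \t=]*/, ""); def = $0 + 0; next }
        $1 == "title"          { sub(/^[ \t]*title[ \t]+/, ""); titles[n++] = $0 }
        END                    { if (def < n) print titles[def] }
    ' "$1"
}

# Hypothetical helper: succeed if RELEASE (default: uname -r) ends in
# -lustre18.
running_lustre_kernel() {
    case "${1:-$(uname -r)}" in
        *-lustre18) return 0 ;;
        *)          return 1 ;;
    esac
}
```

Usage: run `grub_default_title /boot/grub/menu.lst` before rebooting; the title should mention 2.6.18-lustre18. After the reboot, `running_lustre_kernel && echo "Lustre kernel is up"`.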
Installing Lustre.
- Change to root and change directory into /build/kernel/rpmbuild/RPMS/x86_64/
- Install the modules (lustre-modules), the user space tools (lustre) and ldiskfs (lustre-ldiskfs):
rpm -ivh /build/kernel/rpmbuild/RPMS/x86_64/lustre-modules-1.8.5.54-2.6.18_lustre18_*.x86_64.rpm \
    /build/kernel/rpmbuild/RPMS/x86_64/lustre-1.8.5.54-2.6.18_lustre18_*.x86_64.rpm \
    /build/kernel/rpmbuild/RPMS/x86_64/lustre-ldiskfs-3.1.5-2.6.18_lustre18_*.x86_64.rpm
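To confirm the three packages actually landed, the output of `rpm -qa` can be filtered. A minimal sketch: missing_installed is a hypothetical helper that reads `rpm -qa` output on stdin (which prints NAME-VERSION-RELEASE per line) and reports each requested package name that does not appear.

```shell
#!/bin/bash
# Hypothetical helper: read "rpm -qa" output on stdin and print each
# package NAME given as an argument that is not installed. A name only
# matches when followed by "-<digit>", so "lustre" does not match
# "lustre-modules-...".
missing_installed() {
    local installed pkg
    installed=$(cat)
    for pkg in "$@"; do
        if ! printf '%s\n' "$installed" | grep -q "^$pkg-[0-9]"; then
            printf '%s\n' "$pkg"
        fi
    done
}
```

Usage: `rpm -qa 'lustre*' | missing_installed lustre lustre-modules lustre-ldiskfs`; empty output means all three are installed.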
Installing e2fsprogs
e2fsprogs is needed to run the test suite.
- Download e2fsprogs from http://build.whamcloud.com/job/e2fsprogs/
- Install with (-U upgrades the stock CentOS e2fsprogs package in place):
rpm -Uvh e2fsprogs*
Testing Lustre
- As root, create a large enough debug buffer to contain the log for the total number of
- Run llmount.sh:
export DEBUG_SIZE=256
/build/lustre-release/lustre/tests/llmount.sh
- You should see something like:
[root@client-10 ~]# /build/lustre-release/lustre/tests/llmount.sh
Stopping clients: client-10.lab.whamcloud.com /mnt/lustre (opts:)
Stopping clients: client-10.lab.whamcloud.com /mnt/lustre2 (opts:)
Loading modules from /build/lustre-release/lustre/tests/..
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet options: 'networks=tcp0 accept=all'
Formatting mgs, mds, osts
Checking servers environments
Checking clients client-10.lab.whamcloud.com environments
Setup mgs, mdt, osts
Starting mds: -o loop /tmp/lustre-mdt /mnt/mds
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-MDT0000
Starting ost1: -o loop /tmp/lustre-ost1 /mnt/ost1
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-OST0000
Starting ost2: -o loop /tmp/lustre-ost2 /mnt/ost2
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Started lustre-OST0001
Starting client: client-10.lab.whamcloud.com: -o user_xattr,acl,flock client-10.lab.whamcloud.com@tcp:/lustre /mnt/lustre
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=256
Using TIMEOUT=20
[root@client-10 ~]#
- Clean up after the tests:
/build/lustre-release/lustre/tests/llmountcleanup.sh
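When llmount.sh is run from a script, the "Started lustre-..." lines in its output are a convenient way to see which targets came up before cleaning up. A minimal sketch, assuming the default file system name "lustre" seen in the output above; started_targets is a hypothetical helper, and `grep -o` is a GNU grep option, which CentOS provides.

```shell
#!/bin/bash
# Hypothetical helper: read llmount.sh output on stdin and print each
# target reported as started (e.g. lustre-MDT0000), one per line.
# Assumes the file system name is "lustre".
started_targets() {
    grep -o 'Started lustre-[[:upper:]]*[[:digit:]]*' | sed 's/^Started //'
}
```

Usage: `/build/lustre-release/lustre/tests/llmount.sh 2>&1 | tee /tmp/llmount.log`, then `started_targets < /tmp/llmount.log`. For the run shown above this lists lustre-MDT0000, lustre-OST0000 and lustre-OST0001.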
Congratulations, your mission is complete.
Troubleshooting.
- The Lustre tests default to 'tcp', but the automatically provisioned machines use InfiniBand (o2ib). If InfiniBand is working, you can switch from tcp by selecting the network with:
export NETTYPE=o2ib
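Choosing NETTYPE can be automated with a simple heuristic: use o2ib when an IPoIB interface exists, otherwise fall back to tcp. This is only a sketch under stated assumptions: pick_nettype is a hypothetical helper, it assumes IPoIB interfaces are named ib0, ib1 and so on under /sys/class/net, and the presence of an interface does not prove that InfiniBand is actually working.

```shell
#!/bin/bash
# Hypothetical helper: print "o2ib" if any ib* interface exists in
# SYSFS_NET_DIR (default /sys/class/net), otherwise print "tcp".
# Assumes IPoIB interfaces are named ib0, ib1, ...
pick_nettype() {
    local netdir="${1:-/sys/class/net}" iface
    for iface in "$netdir"/ib*; do
        # An unmatched glob stays literal, so check that it exists
        if [ -e "$iface" ]; then
            echo o2ib
            return 0
        fi
    done
    echo tcp
}
```

Usage: `export NETTYPE=$(pick_nettype)` before running llmount.sh. If the tests then fail to connect, setting NETTYPE by hand is the safer fallback.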