Lustre TIPS:

Important links:

Building from source

6.3
https://wiki.hpdd.intel.com/display/PUB/Walk-thru-+Build+Lustre+MASTER+on+RHEL+6.3+from+Whamcloud+git

RHEL 8.0

Walk-thru- Build Lustre MASTER on RHEL 8.0/CentOS 8.0 from Git


Submitting Changes
https://wiki.lustre.org/Submitting_Changes

Using Gerrit
https://wiki.lustre.org/Using_Gerrit
review.whamcloud.com  <- This might change in the near future

When you check out a new Lustre codebase and have already built the kernel,
do the following:

    cd <lustre-build>
    sh ./autogen.sh
    ./configure --with-linux=/home/$USER/kernel/rpmbuild/BUILD/kernel-2.6.32.lustremaster/
    make rpms
# or just run "make" to build binaries and objects without RPMs

. Lustre's supported kernel version lags behind the latest kernel release.
. lustre/kernel_patches/which_patch tells you which kernel Lustre currently supports.
. Use dmesg to dump debug messages, which can help in finding out what's going on.
. to stop SELinux from enforcing, set SELINUX=permissive (or SELINUX=disabled to turn it off fully):
[root@host2a ~]# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.
SELINUX=permissive
# SELINUXTYPE= type of policy in use. Possible values are:
#       targeted - Only targeted network daemons are protected.
#       strict - Full SELinux protection.
SELINUXTYPE=targeted

# SETLOCALDEFS= Check local definition changes
SETLOCALDEFS=0
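
To check which mode a file like the one above selects without reading it by eye, here is a minimal sketch. The function name selinux_mode is made up for illustration; the path is the one shown above and may differ on other distributions.

```shell
# Print the value of the SELINUX= setting from a config file.
# selinux_mode is a hypothetical helper name, not a system command.
selinux_mode() {
    # Only the uncommented "SELINUX=" line matches; "SELINUXTYPE=" and
    # commented lines do not start with "SELINUX=".
    sed -n 's/^SELINUX=//p' "$1"
}

# Report the configured mode if the file exists on this host.
if [ -r /etc/sysconfig/selinux ]; then
    selinux_mode /etc/sysconfig/selinux
fi
```

This only reads the configured value; the mode the running kernel is actually in is reported by getenforce.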


Uninstalling Lustre
List the RPMs which are installed
rpm -qa --last | less
rpm -e --nodeps <rpm name>

lustre-tests-2.3.62-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:56:14 AM PDT
lustre-2.3.62-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:56:03 AM PDT
lustre-osd-ldiskfs-2.3.62-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:54:12 AM PDT
lustre-modules-2.3.62-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:53:12 AM PDT
lustre-ldiskfs-3.3.0-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:52:27 AM PDT

then reinstall
rpm -ivh --nodeps <rpm name>

Ex:

rpm -e --nodeps lustre-tests; rpm -e --nodeps lustre-modules; rpm -e --nodeps lustre-osd-ldiskfs; rpm -e --nodeps lustre; rpm -e --nodeps lustre-ldiskfs
rpm -ivh --nodeps lustre-ldiskfs-3.3.0-2.6.32.lustremaster_g2fdc87c.x86_64.rpm; rpm -ivh --nodeps lustre-modules-2.3.62-2.6.32.lustremaster_g2fdc87c.x86_64.rpm; rpm -ivh --nodeps lustre-osd-ldiskfs-2.3.62-2.6.32.lustremaster_g2fdc87c.x86_64.rpm; rpm -ivh --nodeps lustre-2.3.62-2.6.32.lustremaster_g2fdc87c.x86_64.rpm; rpm -ivh --nodeps lustre-tests-2.3.62-2.6.32.lustremaster_g2fdc87c.x86_64.rpm
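
Typing out the long one-liners above is error prone. As a sketch, the removal commands can be generated from the "rpm -qa --last" listing instead. The helper name lustre_uninstall_cmds is made up, and it only prints commands so they can be reviewed before being run.

```shell
# Turn "rpm -qa --last" output into one "rpm -e --nodeps" command per
# Lustre package. The commands are printed, never executed.
# lustre_uninstall_cmds is a hypothetical helper name.
lustre_uninstall_cmds() {
    # The first field of each line is the full package name.
    awk '$1 ~ /^lustre/ { print "rpm -e --nodeps " $1 }'
}

# Example with two lines copied from the listing above.
printf '%s\n' \
    'lustre-tests-2.3.62-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:56:14 AM PDT' \
    'lustre-2.3.62-2.6.32.lustremaster_g22b4247.x86_64 Wed 13 Mar 2013 07:56:03 AM PDT' |
    lustre_uninstall_cmds
```

On a real host the input would come from "rpm -qa --last | lustre_uninstall_cmds".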


How to unconfigure and remove a lustre kernel module
lctl network unconfigure
lustre_rmmod
modprobe lnet
lctl network up
lctl list_nids
cat /proc/sys/lnet/routes

Using lustre self test
. Refer to chapter 23 in the Lustre Manual
. modprobe lnet_selftest /* loads all required modules */

Sample test
export LST_SESSION=1236
lst new_session read/write
lst add_group servers 192.168.213.132@tcp
lst add_group readers 192.168.213.132@tcp
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from readers --to servers brw read check=simple size=1M
lst run bulk_rw
# display server stats for 30 seconds
lst stat servers & sleep 30; kill $!
# tear down
lst end_session


cp /home/ashehata/lustre-master/lnet/lnet/lnet.ko /lib/modules/2.6.32.lustremaster/updates/kernel/net/lustre/lnet.ko

cp /home/ashehata/lustre-master/libcfs/libcfs/libcfs.ko  /lib/modules/2.6.32.lustremaster/updates/kernel/net/lustre/libcfs.ko

cp /home/ashehata/lustre-master/lnet/selftest/lnet_selftest.ko /lib/modules/2.6.32.lustremaster/updates/kernel/net/lustre/lnet_selftest.ko

Configuration
 /etc/modprobe.d/lustre.conf #lustre config
 /var/log/messages #syslog messages
        ===== metacommands =======
        --device
        --ignore_errors
        ignore_errors
        ======== control =========
        help
        lustre_build_version
        exit
        quit
        ===== network config =====
        --net
        network
        net
        list_nids
        which_nid
        replace_nids
        interface_list
        peer_list
        conn_list
        active_tx
        route_list
        show_route
        ping
        ==== obd device selection ====
        device
        device_list
        dl
        ==== obd device operations ====
        activate
        deactivate
        abort_recovery
        set_timeout
        conf_param
        local_param
        get_param
        set_param
        list_param
        ==== debugging control ====
        debug_daemon
        debug_kernel
        dk
        debug_file
        df
        clear
        mark
        filter
        show
        debug_list
        modules
        ==== virtual block device ====
        blockdev_attach
        blockdev_detach
        blockdev_info
        ===  Pools ==
        pool_new
        pool_add
        pool_remove
        pool_destroy
        pool_list
        ===  Changelogs ==
        changelog_register
        changelog_deregister
        == device setup (these are not normally used post 1.4) ==
        attach
        detach
        setup
        cleanup
        dump_cfg
        ==== testing (DANGEROUS) ====
        --threads
        lookup
        readonly
        notransno
        add_uuid
        del_uuid
        add_peer
        del_peer
        add_conn 
        del_conn 
        disconnect
        push
        mynid
        fail
        test_create
        test_mkdir
        test_destroy
        test_rmdir
        test_lookup
        test_setxattr
        test_md_getattr
        getattr
        setattr
        create
        destroy
        test_getattr
        test_setattr
        test_brw
        lwt
        memhog
        getobjversion
        ==== LFSCK ====
        lfsck_start
        lfsck_stop
        ==== obsolete (DANGEROUS) ====
        cfg_device
        recover
        lov_getconfig
        llog_catlist
        llog_info
        llog_print
        llog_check
        llog_cancel
        llog_remove
        add_interface
        del_interface
        add_route
        del_route
        set_route
For more help type: help command-name
New OFED driver
On eric|brent.whamcloud.com
/scratch/repos/mellanox/el6/
Running Toro
loadjenkinsbuild -j lustre-master -d el6 -a x86_64 -t server -n wtm-[68,69,70] -p test --cobbleruri http://localhost/cobbler_api -r
https://wiki.hpdd.intel.com/display/ENG/Provisioning+Toro+from+a+Hudson+or+Local+Build
Rosso:
Host 192.55.65.102
   User ashehata 
   ProxyCommand nc -x proxy.jf.intel.com:1080 %h %p
Login and reserve as specified on:
https://wiki.hpdd.intel.com/display/ENG/Reserving+Time+on+Toro
conman to check for status
https://wiki.hpdd.intel.com/display/ENG/Lab+User+Guide#LabUserGuide-Conman
Connecting to brent through intel's vpn
=========================================
Johnn: here is what I do
[9:00:48 AM] Minh Diep: rjongalo-mobl3:.ssh minhdiep$ cat ~/bin/socks-gateway 
#!/bin/bash
case $1 in
     *.intel.com|192.168.*|127.0.*|localhost|10.*)
         METHOD="-X connect"
     ;;
     *)
         METHOD="-X 5 -x proxy-us.intel.com:1080"
     ;;
esac
  /usr/bin/nc $METHOD $* 
rjongalo-mobl3:.ssh minhdiep$ head -1 ~/.ssh/config
ProxyCommand ~/bin/socks-gateway %h %p
rjongalo-mobl3:.ssh minhdiep$

List the available lctl and lfs commands:
lctl --list-commands
lfs --list-commands

Launching xterm to have a black background and white writing
xterm -fg GhostWhite -bg grey0 

I screwed up my swap and had to remove it.  Here is the command that I had
to use:
mount / -o remount,rw (makes the mount read/write)


Login and reserve as specified on:
Reserve a machine

Installing Lustre
Create and Mount a Lustre Filesystem
Walk-thru- Creating a Lustre client

MGS,MGT,MDT
cat /etc/modprobe.conf
options lnet networks="o2ib0(ib0)"

Create a partition or use a raw disk

mkfs.lustre --fsname=lustrewt --mgs --mdt /dev/sda4
mkdir /mnt/mdt
mount -t lustre /dev/sda4 /mnt/mdt

OSS, OST
mkfs.lustre --ost --fsname=lustrewt --mgsnode=192.168.4.11@o2ib0 /dev/sda4
mkdir /mnt/oss
mount -t lustre /dev/sda4 /mnt/oss

Client
mount -t lustre 192.168.4.11@o2ib:/lustrewt /mnt
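
The client mount line is assembled from the MGS NID and the --fsname given to mkfs.lustre above. A small sketch that prints the command for a given set of values (client_mount_cmd is a made-up helper name, and nothing is mounted):

```shell
# Print (not run) the client mount command for a given MGS NID,
# filesystem name and mount point.
# client_mount_cmd is a hypothetical helper name.
client_mount_cmd() {
    mgs_nid=$1 fsname=$2 mnt=$3
    printf 'mount -t lustre %s:/%s %s\n' "$mgs_nid" "$fsname" "$mnt"
}

# Values taken from the example above.
client_mount_cmd 192.168.4.11@o2ib lustrewt /mnt
```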


REBASEing
git rebase -i HEAD~<num of commits>
# make changes
git add <files>
git commit -av
# use the same Change-Id in the commit message
git rebase --abort # aborts a bad rebase

Undo a commit and redo

$ git commit ...              (1)
$ git reset --soft HEAD^      (2)
$ edit                        (3)
$ git add ....                (4)
$ git commit -c ORIG_HEAD     (5)

1    This is what you want to undo

2    This is most often done when you remembered what you just committed is
incomplete, or you misspelled your commit message, or both. Leaves working
tree as it was before "reset".

3    Make corrections to working tree files.

4    Stage changes for commit.

5    "reset" copies the old head to .git/ORIG_HEAD; redo the commit by starting
with its log message. If you do not need to edit the message further, you can
give -C option instead.

Fixing the repo if it's tracking the wrong origin

git remote show origin
1. check out a separate branch (you can even just create one temporarily: git checkout -b temp)
2. delete old master: git branch -D master
3. recreate it as tracking: git branch -t master origin/master
(though you may be able to just rename your current master: git branch -m master temp)
The branch that had been master will still exist; it will be whatever you checked out in step 1.
You can rebase the branch created in step 1 onto master: git rebase master temp



$ git checkout master
$ git checkout -b A
# make changes
$ git add <files>
$ git commit -av
$ git log
# shows
#    commit Branch-A-C1

$ git checkout master
$ git checkout -b B
# make changes
$ git add <files>
$ git commit -av
$ git log
# shows
#    commit Branch-B-C2

$ git checkout -b C
# Make changes
$ git add <files>
$ git commit -av
$ git log
# shows
#    commit 3 Branch-C-C3
#    commit 2 Branch-B-C2

$ git checkout A
$ git checkout -b D
$ git merge C
# resolve conflicts
# make some more changes
$ git commit -av
$ git log
# shows
#   commit Branch-D-Merge+C4
#   commit 1 Branch-A-C1

Note that the log entries for commits 3 (Branch-C-C3) and 2 (Branch-B-C2) are lost

Now if I push the following into gerrit

Patch 1 from branch A
Patch 2 from branch B
Patch 3 from branch C
Patch 4 from branch D

What are the differences that gerrit will show for Patch 4?  I'm assuming since there is no log for commits 2 and 3 on branch D, then it'll show all the changes in Commits 2 and 3 and Merge+Change.

What I'm trying to do is only to get gerrit to show the new changes applied on Branch D only not the merged changes for branch C as well.

This way, the inspectors don't have to do double the work.

Any ideas or clarifications?
[10/11/2013 1:24:22 PM] Keith Mannthey: So..  For 4 you check out A then make D changes.  So when you push patch 4 it will be dependent on patch 1.  The other patches will be based on master so easy to land and review.
[10/11/2013 1:24:38 PM] Keith Mannthey: If patch 1 changes you will have to rebase patch 4.
[10/11/2013 1:24:42 PM] Keith Mannthey: Sound about right?
[10/11/2013 1:28:19 PM] Amir Shehata: Keith I'm not following.. the way I see it patch 4 will be dependent on patch 1 (because branch D has no log of the changes IDs in branches B and C), so Patch 4 will show the differences between A and B, C and D!
[10/11/2013 1:30:19 PM] Keith Mannthey: Are you diffing the branches or submitting them?
[10/11/2013 1:31:13 PM] Amir Shehata: from each branch when I'm planning to push a patch into gerrit
[10/11/2013 1:31:36 PM] Amir Shehata: and I want to predict how gerrit will show the difference for inspection
[10/11/2013 1:32:07 PM] Amir Shehata: so my assumption is that gerrit figures out the dependencies based on the Change-IDs in the log.  is that true?
[10/11/2013 1:34:54 PM | Edited 1:35:10 PM] Keith Mannthey: Yes it looks at the commits in your branch that you push.
[10/11/2013 1:36:46 PM] Amir Shehata: Is there a way to pull the change log when I merge?
[10/11/2013 1:37:15 PM] Amir Shehata: so when I merge C into D, is there a way to pull the changes log as well, so D now depends on A, B, and C?
[10/11/2013 1:38:44 PM] Amir Shehata: hmmm, I think at the end of the day it looks like I'll just go with a linear dependency, which is easiest.  That'll come with the disadvantage of rebasing branches when their dependencies change... but I'm not sure how to make them parallel
[10/11/2013 1:39:17 PM | Edited 1:39:31 PM] Keith Mannthey: git log will show you what is in your branch.
[10/11/2013 1:39:30 PM] Robert Read: Amir, when you created branch C, did you checkout master first or did you create it based on branch B?
[10/11/2013 1:41:43 PM] Amir Shehata: created it based on branch B
[10/11/2013 1:43:48 PM] Robert Read: so when you merged C on to D, then you will get the changes from B and C as well
[10/11/2013 1:44:01 PM] Robert Read: try this:  git log master..D
[10/11/2013 1:44:19 PM] Amir Shehata: is that from branch D or master?
[10/11/2013 1:44:31 PM] Robert Read: "master..D"
[10/11/2013 1:44:43 PM] Robert Read: that will show all changes that are on D that are not on master
[10/11/2013 1:45:04 PM] Amir Shehata: yeah, that only show the changes in A and D
[10/11/2013 1:45:16 PM] Amir Shehata: but D is the cumulative changes in B and C and D
[10/11/2013 1:45:24 PM] Amir Shehata: which is what I'm tryin to avoid
[10/11/2013 1:45:56 PM] Amir Shehata: cause I think once I push D into gerrit it'll show all the differences which will be a superset of the changes in A, B and C
[10/11/2013 1:46:15 PM] Robert Read: yes, i think something has gone wrong somewhere
[10/11/2013 1:46:50 PM] Amir Shehata: let me try creating branch D again, merge the conflict and commit
[10/11/2013 1:47:23 PM] Amir Shehata: problem is when I commit the merge, that script that checks for the commit format kicks in and demands a proper commit log, i.e. LU-nnn blah blah
[10/11/2013 1:50:20 PM] Robert Read: when i do this, i see all the changes  in my history for D
[10/11/2013 1:50:47 PM] Amir Shehata: I'm trying it again
[10/11/2013 1:52:59 PM] Robert Read: i did it again with a conflict between A and C and i still see all commits in D
[10/11/2013 1:53:29 PM] Amir Shehata: ok my bad... I did a stash/apply in the middle I think that might've screwed things up now
[10/11/2013 1:53:55 PM] Amir Shehata: now it looks like all the change IDs for all commits is there....
[10/11/2013 1:53:56 PM] Amir Shehata: thanks
[10/11/2013 1:58:10 PM | Edited 2:00:07 PM] Robert Read: you might find this helpful for visualizing what git is doing: http://rowanj.github.io/gitx/
[10/11/2013 1:58:37 PM] Robert Read: (if you're on a mac)
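
Robert's "git log master..D" check from the chat can be reproduced in a scratch repository. The sketch below rebuilds the A/B/C/D layout described above with one file per commit, so the merge has no conflict; the commit subjects, file names and the mkcommit helper are made up for the demo, and it says nothing about how Gerrit itself renders the change.

```shell
# Rebuild the A/B/C/D branch layout in a throwaway repository and list
# what D contains that master does not.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

# Hypothetical helper: create one file and commit it with the given subject.
mkcommit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

mkcommit base
git checkout -q -b A
mkcommit C1
git checkout -q master
git checkout -q -b B
mkcommit C2
git checkout -q -b C                      # C is created on top of B
mkcommit C3
git checkout -q A
git checkout -q -b D
git merge -q --no-edit C                  # each commit touches its own file

# Non-merge commits reachable from D but not from master.
git log --no-merges --format=%s master..D | sort
```

The listing should contain C1, C2 and C3, which matches the conclusion of the chat: after merging C into D, the commits from B and C are part of D's history.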

Debugging Kernel Crash dump
crash /boot/System.map-2.6.32.lustremaster vmlinux vmcore
--> vmlinux is located in ./BUILD/kernel-2.6.32.lustremaster/vmlinux
--> /var/crash/*/vmcore

http://people.redhat.com/anderson/crash_whitepaper/


RBP - Stack base pointer (always points in the stack)
Ex: 3585

in crash we did:
dis -r (ptlrpc_register_bulk+1130)
this disassembles ptlrpc_register_bulk() from its start up to the given address
from the disassembled code:
0xffffffffa097f910 <ptlrpc_register_bulk+32>:   mov    %rdi,-0x90(%rbp)
using gdb:
gdb ./lustre/ptlrpc/ptlrpc.ko
l *ptlrpc_register_bulk+32
(gdb) l *ptlrpc_register_bulk+32
0x47dd0 is in ptlrpc_register_bulk
(/home/ashehata/dlc_Jun17_2013/lustre/ptlrpc/niobuf.c:296).
291     /**
292      * Register bulk at the sender for later transfer.
293      * Returns 0 on success or error code.
294      */
295     int ptlrpc_register_bulk(struct ptlrpc_request *req)
296     {
297             struct ptlrpc_bulk_desc *desc = req->rq_bulk;
298             lnet_process_id_t peer;
299             int rc = 0;
300             int rc2;
Since line 296 is the opening brace, before the function body starts, this
instruction is saving the function's parameter (req, passed in RDI) on the
stack at RBP - 0x90:
from the register dump:
    [exception RIP: ptlrpc_register_bulk+1130]
    RIP: ffffffffa097fd5a  RSP: ffff8801a45c1b10  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff8800931ba000  RCX: 00051e339ee2b698
    RDX: 0000000000000000  RSI: ffffffffa09eb2c0  RDI: ffffffffa0a2d520
    RBP: ffff8801a45c1bd0   R8: 0000000000000000   R9: 00000000fffffff4
    R10: 0000000000000001  R11: 0000000000000000  R12: 00000000fffffff4
    R13: 00051e339ee2b698  R14: 0000000000000000  R15: 00051e339ee2b698
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
RBP = ffff8801a45c1bd0
RBP - 0x90 = ffff8801a45c1b40, and the value stored at that stack address is ffff880101d0a000
in crash we do:
struct ptlrpc_request ffff880101d0a000
this dumps the memory of the structure.
From there we can tell the rq_bulk pointer
then we do:
struct ptlrpc_bulk_desc 0xffff8800931ba000
which dumps the memory for this structure
from there we see that bd_export = 0x0 & bd_import = some value
After examining the code: __ptlrpc_free_bulk() checks if bd_export is not NULL
before freeing it.  Which creates the assumption that bd_export can be NULL
when bd_import is not NULL and vice versa.  Likely depending on the type of
bulk request.
Although in the code for ptlrpc_register_bulk() it grabs the peer:
peer = desc->bd_import->imp_connection->c_peer;
which assumes that bd_import is never NULL when we hit this function.  So it
could be that this function is only called when bd_import is not NULL
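
The stack slot used by the mov instruction above is RBP minus 0x90. Kernel addresses do not fit in a signed 64-bit shell integer, so this sketch subtracts from the low 32 bits only. hex_sub is a made-up helper name, and it assumes a 16-digit address without a 0x prefix and an offset small enough not to borrow from the high half.

```shell
# Subtract a small offset from a 64-bit hex address without overflowing
# shell arithmetic: only the low 32 bits take part in the subtraction.
# hex_sub is a hypothetical helper name.
hex_sub() {
    addr=$1 off=$2
    high=${addr%????????}      # everything except the last 8 hex digits
    low=${addr#"$high"}        # the last 8 hex digits
    result=$(( 0x$low - $off ))
    if [ "$result" -lt 0 ]; then
        echo "borrow from the high half is not handled" >&2
        return 1
    fi
    printf '%s%08x\n' "$high" "$result"
}

# RBP from the register dump above, minus the 0x90 offset from the mov.
hex_sub ffff8801a45c1bd0 0x90
```

This prints ffff8801a45c1b40, the stack address whose contents hold the saved req pointer.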

UNMOUNT LUSTRE
- fuser -m <path to device or directory>
--> tells you the PIDs of the processes using the directory
- kill the process or make the process stop using the resource
- umount <dir>


Starting the VMware's network config:
cd c:\Program Files (x86)\VMware\VMware Player
rundll32.exe vmnetui.dll VMNetUI_ShowStandalone


GIT Cheat sheet

To diff files in staging area
git diff --cached

rebasing
commit a change on top of what you already committed
# git rebase -i origin/master
mark the newest one with "fixup", then save and quit

Changing the comments on a commit
http://stackoverflow.com/questions/179123/how-do-i-edit-an-incorrect-commit-message-in-git


When installing a new system, packages sometimes fail dependency checks
rpm -ivh --force --nodeps libss-1.42.7.wc1-7.el6.x86_64.rpm
rpm -ivh --force --nodeps libcom_err-1.42.7.wc1-7.el6.x86_64.rpm

Using llog_*:
------------
lctl > dl
  0 UP osd-ldiskfs lustrewt-MDT0000-osd lustrewt-MDT0000-osd_UUID 8
  1 UP mgs MGS MGS 5
  2 UP mgc MGC192.168.120.140@tcp ae08a1ec-b8b4-8374-cf76-b807726ba094 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lustrewt-MDT0000-mdtlov lustrewt-MDT0000-mdtlov_UUID 4
  5 UP mdt lustrewt-MDT0000 lustrewt-MDT0000_UUID 5
  6 UP mdd lustrewt-MDD0000 lustrewt-MDD0000_UUID 4
  7 UP qmt lustrewt-QMT0000 lustrewt-QMT0000_UUID 4
  8 UP lwp lustrewt-MDT0000-lwp-MDT0000 lustrewt-MDT0000-lwp-MDT0000_UUID 5
lctl > device 1
lctl > llog_catlist
config log: $lustrewt-client
config log: $lustrewt-MDT0000
lctl > llog_print $lustrewt-client
- { event: attach, device: lustrewt-clilov, type: lov, UUID: lustrewt-clilov_UUID }
- { event: setup, device: lustrewt-clilov, UUID:  }
- { event: attach, device: lustrewt-clilmv, type: lmv, UUID: lustrewt-clilmv_UUID }
- { event: setup, device: lustrewt-clilmv, UUID:  }
- { event: add_uuid, nid: 192.168.120.140@tcp(0x20000c0a8788c), node: 192.168.120.140@tcp }
- { event: attach, device: lustrewt-MDT0000-mdc, type: mdc, UUID: lustrewt-clilmv_UUID }
- { event: setup, device: lustrewt-MDT0000-mdc, UUID: lustrewt-MDT0000_UUID, node: 192.168.120.140@tcp }
- { event: add_mdc, device: lustrewt-clilmv, mdt: lustrewt-MDT0000_UUID, index: 0, gen: 1, UUID: lustrewt-MDT0000-mdc_UUID }
- { event: new_profile, name: lustrewt-client, lov: lustrewt-clilov, lmv: lustrewt-clilmv }
lctl > llog_print $lustrewt-MDT0000
- { event: attach, device: lustrewt-MDT0000-mdtlov, type: lov, UUID: lustrewt-MDT0000-mdtlov_UUID }
- { event: setup, device: lustrewt-MDT0000-mdtlov, UUID:  }
- { event: attach, device: lustrewt-MDT0000, type: mdt, UUID: lustrewt-MDT0000_UUID }
- { event: new_profile, name: lustrewt-MDT0000, lov: lustrewt-MDT0000-mdtlov }
- { event: setup, device: lustrewt-MDT0000, UUID: lustrewt-MDT0000_UUID, node: 0, options: lustrewt-MDT0000-mdtlov, failout: f }
lctl > llog_cancel
cancel one record in log.
This command supports both positional and optional arguments
usage (positional args): llog_cancel <catalog id|catalog name> [log id] <index>
usage (optional args): llog_cancel --catalog <catalog id|catalog name> --log_id <log_id> --log_idx <index>
lctl > llog_cancel --catalog 1 --log_id 1 --log_idx 1
OBD_IOC_LLOG_CANCEL failed: Invalid argument
Restart the Jenkins Build
[10:50:32 AM] Amir Shehata: Hey Joshua, that would be great...
[10:50:38 AM] Amir Shehata: my account is "ashehata"
[10:51:31 AM] Joshua Kugler: V6xIWmAEVbBclMuL303QTPWyPlLOszyC
[10:54:42 AM] Amir Shehata: great, thanks that worked
[10:54:53 AM] Amir Shehata: How do I restart a build now?
[10:55:18 AM] Joshua Kugler:
http://build.whamcloud.com/gerrit_manual_trigger/?
[10:56:09 AM | Edited 10:56:12 AM] Amir Shehata: How do I trigger a gerrit
event?
[10:56:49 AM] Joshua Kugler: search for the review number in there.
[10:56:55 AM] Joshua Kugler: Then click on the patch number you want to build.
[10:56:59 AM] Joshua Kugler: And click trigger.
Accessing data uploaded to ftp site for Jenkins bugs:
1. add the following to ~/.ssh/config
Host eric.whamcloud.com
   User ashehata
   ProxyCommand nc -x proxy.jf.intel.com:1080 %h %p
2. ssh eric.whamcloud.com
3. cd /scratch/ftp/uploads

debugfs Usage

[root@localhost ~]# debugfs -c -R dump CONFIGS/lustrewt-MDT0000 ~/conf_params /dev/sdb
debugfs 1.42.6.wc2 (10-Dec-2012)
CONFIGS/lustrewt-MDT0000: No such file or directory while opening filesystem
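dump: Usage: dump_inode [-p] <file> <output_file>
(The attempt above fails because the request given to -R must be quoted as a single argument: debugfs -c -R "dump CONFIGS/lustrewt-MDT0000 /tmp/conf_params" /dev/sdb)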
[root@localhost ~]# debugfs /dev/sdb 
debugfs 1.42.6.wc2 (10-Dec-2012)
debugfs:  ls CONFIGS
 6291457  (12) .    2  (12) ..    6291458  (20) mountdata   
 80  (24) lustrewt-client    81  (4028) lustrewt-MDT0000   
debugfs:  cd CONFIGS
debugfs:  dump_inode lustrewt-MDT0000 /tmp/xyz
debugfs:  [root@localhost ~]# llog_reader /tmp/xyz 
Header size : 8192
Time : Sun Aug 10 23:59:56 2014
Number of records: 9
Target uuid : config_uuid 
-----------------------
#01 (224)marker   2 (flags=0x01, v2.5.53.0) lustrewt-MDT0000-mdtlov 'lov setup' Sun Aug 10 23:59:56 2014-
#02 (136)attach    0:lustrewt-MDT0000-mdtlov  1:lov  2:lustrewt-MDT0000-mdtlov_UUID  
#03 (176)lov_setup 0:lustrewt-MDT0000-mdtlov  1:(struct lov_desc)
                uuid=lustrewt-MDT0000-mdtlov_UUID  stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
#04 (224)marker   2 (flags=0x02, v2.5.53.0) lustrewt-MDT0000-mdtlov 'lov setup' Sun Aug 10 23:59:56 2014-
#05 (224)marker   3 (flags=0x01, v2.5.53.0) lustrewt-MDT0000 'add mdt' Sun Aug 10 23:59:56 2014-
#06 (128)attach    0:lustrewt-MDT0000  1:mdt  2:lustrewt-MDT0000_UUID  
#07 (120)mount_option 0:  1:lustrewt-MDT0000  2:lustrewt-MDT0000-mdtlov  
#08 (168)setup     0:lustrewt-MDT0000  1:lustrewt-MDT0000_UUID  2:0  3:lustrewt-MDT0000-mdtlov  4:f  
#09 (224)marker   3 (flags=0x02, v2.5.53.0) lustrewt-MDT0000 'add mdt' Sun Aug 10 23:59:56 2014-
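
To get a quick overview of a longer log, the llog_reader record lines can be counted by type. This is a sketch that assumes the "#NN (size)type" line format shown above; llog_record_types is a made-up helper name.

```shell
# Count llog_reader records by type. Record lines look like
#   #02 (136)attach    0:...
# so the type is the second field with the "(size)" prefix removed.
# llog_record_types is a hypothetical helper name.
llog_record_types() {
    awk '/^#[0-9]+ / {
             type = $2
             sub(/^\([0-9]+\)/, "", type)
             count[type]++
         }
         END { for (t in count) print count[t], t }' | sort -k2
}

# Example with three record lines in the format shown above.
printf '%s\n' \
    "#01 (224)marker   2 (flags=0x01, v2.5.53.0) lustrewt-MDT0000-mdtlov 'lov setup'" \
    '#02 (136)attach    0:lustrewt-MDT0000-mdtlov  1:lov  2:lustrewt-MDT0000-mdtlov_UUID' \
    "#04 (224)marker   2 (flags=0x02, v2.5.53.0) lustrewt-MDT0000-mdtlov 'lov setup'" |
    llog_record_types
```

On a real dump the input would come from "llog_reader /tmp/xyz | llog_record_types".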

Debugging test failures:
Amir: there are two strategies I use for finding existing bugs that may be related:
- search in Maloo for FAIL or TIMEOUT results on the same test/subtest to find a previously assigned bug, then see if the symptoms match the current failure
- search in Jira for the test script name and failed test number to find bugs previously raised on the issue to see if the symptoms match the current failure
[9:27:46 AM] Andreas Dilger: Then, when you've found the existing bug(s) that cause your test failure(s), you should Associate the failures in Maloo with the Jira ticket number (to make it easier for others to identify this failure in the future), or Raise a new bug if you think the failure is unrelated to any existing bugs
[9:29:24 AM] Andreas Dilger: Also, it is good to update the Jira ticket with a link to the test failure in Maloo, both to give the bug fixer more information, and to make it more clear when a test is failing repeatedly.

while this takes a bit of time, without doing this we don't make progress on finding and fixing intermittent test failures, and then everyone just ends up resubmitting all of their patches in a loop and never making progress.
[9:30:41 AM] Andreas Dilger: also, of course, properly investigating test failures will expose the bugs in the patches being submitted, which sometimes seem unrelated to the patch being tested, but sometimes are caused by the patches nonetheless

undoing a commit --amend
use the ref-log:

git branch fixing-things HEAD@{1}
git reset fixing-things

git rebase on latest and greatest
1. git pull on master
2. git rebase -i master dlc_6

How lustre build works

The autoconf program produces a configure script from either configure.in or configure.ac.
The automake program produces a Makefile.in from a Makefile.am.
The configure script is run to produce one or more Makefile files from Makefile.in files.
The make program uses the Makefile to compile the program.

In the main directory:
./configure.ac -> LB_CONFIGURE
LB_CONFIGURE is defined in ./config/lustre-build.m4
LB_CONFIGURE -> LN_CONFIGURE defined in ./lnet/autoconf/lustre-lnet.m4
LN_CONFIGURE -> multiple functions to configure lnet
LB_CONFIGURE -> LB_CONFIG_RPMBUILD_OPTIONS


Yes, use the following:
Test-Parameters: testlist=lnet-selftest,lnet-selftest,lnet-selftest, …
Repeat lnet-selftest as many times as you want it run. I don't know of a better way of specifying the same test other than writing it out.
[10/3/2014 2:02:07 PM] James Nunez: You should also use "fortestonly" in the Test-Parameters list. 
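
Writing the same test name out many times by hand is easy to get wrong. A sketch that builds the line for a given test name and repeat count follows; repeat_testlist is a made-up helper name, and the position of "fortestonly" on the line is an assumption based on the advice above.

```shell
# Build a Test-Parameters line that repeats one test a given number of
# times. repeat_testlist is a hypothetical helper name.
repeat_testlist() {
    name=$1 count=$2 list=
    i=0
    while [ "$i" -lt "$count" ]; do
        # Append a comma only when the list already has an entry.
        list=${list:+$list,}$name
        i=$((i + 1))
    done
    printf 'Test-Parameters: fortestonly testlist=%s\n' "$list"
}

repeat_testlist lnet-selftest 3
```

The printed line can then be pasted into the commit message.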


When creating an initial VM and installing Lustre, some packages might not be up to date and dependencies prevent the update:

 


[ashehata@localhost ~]$ cd Downloads/

...

[root@localhost Downloads]# 


To turn on Debug for everything on lustre, for code tracing

...