...

crash /boot/System.map-2.6.32.lustremaster vmlinux vmcore
--> vmlinux is located in ./BUILD/kernel-2.6.32.lustremaster/vmlinux
--> vmcore is located in /var/crash/*/vmcore

(NOTE: If the System.map version does not match the kernel in /BUILD/<kernel>, try running crash without the System.map argument.)

Collecting bits

  1. Download and install the kernel-debuginfo and kernel-debuginfo-common rpms.
    1. If you know which Gerrit review you are looking at, follow its links to the necessary rpms. The server builds include the kernel-debuginfo and kernel-debuginfo-common rpms.
      1. EXAScaler Release Versions
    2. Clients that use the stock kernel need the rpms from the CentOS debuginfo site.
      1. Example, CentOS 7: http://debuginfo.centos.org/7/x86_64/
  2. Download and install the lustre-debuginfo rpm for the appropriate build.
    1. Again, for a Gerrit review you can reach it through the build artifacts link on build.hpdd.intel.com.
  3. Instead of installing the rpms, you can extract them into your local debug directory:
    1. rpm2cpio <rpm> | cpio -idmv
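
When several debuginfo rpms have to be unpacked, the extraction step can be scripted. The following is a minimal sketch, assuming `rpm2cpio` and `cpio` are on the PATH; the function names `find_rpms` and `extract_rpm` and the directory names in the comment are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def find_rpms(directory):
    """Return the *.rpm files in `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.rpm"))


def extract_rpm(rpm, dest):
    """Unpack one rpm under `dest`, like: rpm2cpio <rpm> | cpio -idmv"""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # rpm2cpio writes a cpio archive to stdout; cpio reads it from stdin
    # and recreates the files relative to its working directory.
    with subprocess.Popen(["rpm2cpio", str(Path(rpm).resolve())],
                          stdout=subprocess.PIPE) as producer:
        subprocess.run(["cpio", "-idmv"], stdin=producer.stdout,
                       cwd=dest, check=True)
    if producer.returncode != 0:
        raise subprocess.CalledProcessError(producer.returncode, producer.args)


# Example (not run here): unpack every rpm in ./rpms into ./debug
# for rpm in find_rpms("rpms"):
#     extract_rpm(rpm, "debug")
```

Extracting into a private directory leaves the system's installed packages untouched, which is convenient when comparing several builds side by side.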

...

It is often necessary to print certain structures and their values when analyzing a dump. To do that, we need to find the pointer to the structure in memory, which requires some understanding of AMD64 assembly and register usage:

Reference: x86-64 ABI

Alternative reference: http://6.s081.scripts.mit.edu/sp18/x86-64-architecture-guide.html
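
As a quick reminder of the register usage that matters most here: under the System V AMD64 calling convention, the first six integer or pointer arguments travel in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, and any further ones go on the stack. The sketch below encodes only that rule. The helper name `arg_locations` is hypothetical, and it assumes every parameter is an integer or pointer that fits in one register and that the compiler kept the standard convention for the function.

```python
# Integer/pointer argument registers, in call order (System V AMD64 ABI).
INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def arg_locations(params):
    """Map parameter names to where they live at function entry."""
    locations = {}
    for index, name in enumerate(params):
        if index < len(INT_ARG_REGS):
            locations[name] = "%" + INT_ARG_REGS[index]
        else:
            # Stack arguments sit just above the return address on entry.
            offset = 8 * (index - len(INT_ARG_REGS) + 1)
            locations[name] = f"{offset}(%rsp)"
    return locations


# Parameters of lnet_handle_find_routed_path(), used as an example
# later on this page.
for name, where in arg_locations(
        ["sd", "dst_nid", "gw_lpni", "gw_peer"]).items():
    print(f"{name:8} -> {where}")
```

Knowing which register holds the pointer of interest on entry tells you which `mov` in the function prologue to look for in the disassembly.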

First, disassemble the function:

...

The same logic is used in o2iblnd; see kiblnd_setup_rd_iov().

...

Another debugging example:

Code Block
#### Try and find the CPT number being passed on the stack to lnet_initiate_peer_discovery()
crash> bt
PID: 8874   TASK: ffff881ff97d3f40  CPU: 7   COMMAND: "mdt_rdpg02_014"
 #0 [ffff881e4a6b3660] machine_kexec at ffffffff8105d77b
 #1 [ffff881e4a6b36c0] __crash_kexec at ffffffff81108742
 #2 [ffff881e4a6b3790] panic at ffffffff816a863f
 #3 [ffff881e4a6b3810] __warn at ffffffff8108ae7a
 #4 [ffff881e4a6b3850] warn_slowpath_fmt at ffffffff8108aedf
 #5 [ffff881e4a6b38b8] __list_add at ffffffff8134405c
 #6 [ffff881e4a6b38e0] lnet_initiate_peer_discovery at ffffffffc0b07bc7 [lnet]
 #7 [ffff881e4a6b3918] lnet_handle_find_routed_path at ffffffffc0b0b90d [lnet]
 #8 [ffff881e4a6b3998] lnet_select_pathway at ffffffffc0b0c2c0 [lnet]
 #9 [ffff881e4a6b3a98] lnet_send at ffffffffc0b0d115 [lnet]
#10 [ffff881e4a6b3ab8] LNetPut at ffffffffc0b0d56c [lnet]
#11 [ffff881e4a6b3b18] ptl_send_buf at ffffffffc0e00ff6 [ptlrpc]
#12 [ffff881e4a6b3bd0] ptlrpc_send_reply at ffffffffc0e043ab [ptlrpc]
#13 [ffff881e4a6b3c48] target_send_reply_msg at ffffffffc0dc335e [ptlrpc]
#14 [ffff881e4a6b3c68] target_send_reply at ffffffffc0dcd7de [ptlrpc]
#15 [ffff881e4a6b3cc0] tgt_request_handle at ffffffffc0e73d11 [ptlrpc]
#16 [ffff881e4a6b3d50] ptlrpc_server_handle_request at ffffffffc0e16c6b [ptlrpc]
#17 [ffff881e4a6b3df0] ptlrpc_main at ffffffffc0e1a63a [ptlrpc]
#18 [ffff881e4a6b3ec8] kthread at ffffffff810b4031
#19 [ffff881e4a6b3f50] ret_from_fork at ffffffff816c155d

crash> disas lnet_handle_find_routed_path

   0xffffffffc0b0b740 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffffc0b0b745 <+5>:     push   %rbp
   0xffffffffc0b0b746 <+6>:     mov    %rsp,%rbp
   0xffffffffc0b0b749 <+9>:     push   %r15
   0xffffffffc0b0b74b <+11>:    mov    %rdi,%r15 <--- RDI carries the first argument (sd); it is saved in r15
   0xffffffffc0b0b74e <+14>:    push   %r14
   0xffffffffc0b0b750 <+16>:    push   %r13
   0xffffffffc0b0b752 <+18>:    push   %r12
   0xffffffffc0b0b754 <+20>:    push   %rbx

static int
lnet_handle_find_routed_path(struct lnet_send_data *sd,
                             lnet_nid_t dst_nid,
                             struct lnet_peer_ni **gw_lpni,
                             struct lnet_peer **gw_peer)
...
        sd->sd_msg->msg_src_nid_param = sd->sd_src_nid;
        rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_rtr_nid,
                                          sd->sd_cpt);


crash> disas lnet_initiate_peer_discovery
Dump of assembler code for function lnet_initiate_peer_discovery:
   0xffffffffc0b07aa0 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffffc0b07aa5 <+5>:     push   %rbp
   0xffffffffc0b07aa6 <+6>:     mov    %rsp,%rbp
   0xffffffffc0b07aa9 <+9>:     push   %r15
   0xffffffffc0b07aab <+11>:    push   %r14
   0xffffffffc0b07aad <+13>:    push   %r13
   0xffffffffc0b07aaf <+15>:    push   %r12
   0xffffffffc0b07ab1 <+17>:    push   %rbx

# lnet_initiate_peer_discovery saves r15 (which still holds sd from the caller) on its stack, so we look for it in that frame

crash> bt -f
 #6 [ffff881e4a6b38e0] lnet_initiate_peer_discovery at ffffffffc0b07bc7 [lnet]
    ffff881e4a6b38e8: ffff881f4be670c0 (%rbx) ffff881f4be67000 (%r12) 
    ffff881e4a6b38f8: ffff881f8d3b2400 (%r13) ffff881f8d33bb40 (%r14) 
    ffff881e4a6b3908: ffff881e4a6b39f8 (%r15) ffff881e4a6b3990 (%rbp)
    ffff881e4a6b3918: ffffffffc0b0b90d (return address in caller)
crash> lnet_send_data ffff881e4a6b39f8
struct lnet_send_data {
  sd_best_ni = 0xffff881f8d3b2200, 
  sd_best_lpni = 0xffff881f9575f600, 
  sd_final_dst_lpni = 0xffff881f9575f600, 
  sd_peer = 0xffff881ff7f4cd00, 
  sd_gw_peer = 0x0, 
  sd_gw_lpni = 0x0, 
  sd_peer_net = 0x0, 
  sd_msg = 0xffff880e4cb6f400, 
  sd_dst_nid = 3659191877107818, 
  sd_src_nid = 1407546850803763, 
  sd_rtr_nid = 18446744073709551615, 
  sd_cpt = 2, 
  sd_md_cpt = 0, 
  sd_send_case = 25
}
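
The register labels in the `bt -f` listing above follow directly from the order of the pushes in the prologue. The sketch below reproduces that arithmetic. It assumes the prologue consists only of the pushes shown in the disassembly, with no other stack adjustment in between; the helper name `saved_register_slots` is hypothetical.

```python
WORD = 8  # bytes per stack slot on x86-64


def saved_register_slots(return_addr_slot, push_order):
    """Return {register: stack address} for registers pushed in a prologue.

    return_addr_slot: address of the slot holding the return address
                      into the caller.
    push_order:       registers in the order the prologue pushes them.
    """
    slots = {}
    addr = return_addr_slot
    for reg in push_order:
        addr -= WORD  # each push moves the stack pointer down one word
        slots[reg] = addr
    return slots


# Values taken from the lnet_initiate_peer_discovery frame above.
slots = saved_register_slots(
    0xffff881e4a6b3918,
    ["rbp", "r15", "r14", "r13", "r12", "rbx"],
)
for reg, addr in slots.items():
    print(f"%{reg:<4} saved at {addr:x}")
```

The slot computed for %r15 is ffff881e4a6b3908, and the value stored there (ffff881e4a6b39f8) is the pointer handed to `lnet_send_data` in the session above.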


Some Assembler Tidbits

Code Block
Most x86-64 assembler instructions, in the AT&T syntax shown on this page, perform the operation on the first argument
and store the result in the second argument.
 
Example:
mov $0xffffffff,%eax
 
moves the value 0xffffffff into the register %eax

...

GDB commands can be used to create scripts to dissect the crash dump. Attached are a few scripts, courtesy of Alexey Lyashkov. These scripts can be used as a basis for building other useful gdb functions that help in analyzing a crash dump: link.gdb, lnet-cpt.gdb, lnet.gdb, o2ib.gdb. I've also added more functionality to them. Also attached is a program that can extract Lustre logs from the dump file: Crash-tools.

Resources

Below are some resources that explain the registers and the architecture.

...