...
crash /boot/System.map-2.6.32.lustremaster vmlinux vmcore
--> vmlinux is located in ./BUILD/kernel-2.6.32.lustremaster/vmlinux
--> vmcore is located in /var/crash/*/vmcore
(NOTE: If the System.map and the kernel in ./BUILD/<kernel> do not match, try again without the System.map argument.)
Collecting bits
- Download and install the kernel-debuginfo and kernel-debuginfo-common RPMs.
  - If you know which Gerrit review you are looking at, follow its links to get the necessary RPMs. The server builds include the kernel-debuginfo and kernel-debuginfo-common RPMs.
  - For clients that use the stock kernel, you'll need to get them from the CentOS debuginfo site.
    - e.g. CentOS 7: http://debuginfo.centos.org/7/x86_64/
- Download and install the lustre-debuginfo RPM for the matching build.
  - Again, for a Gerrit review you can reach it through the build artifacts link on build.hpdd.intel.com.
- Instead of installing the RPMs, you can extract them into a local debug directory:
  - rpm2cpio <rpm> | cpio -idmv
...
It is often necessary to print certain structures and their field values while debugging. To do that we need to find the pointer to the structure in memory, which requires some understanding of AMD64 assembly and register usage:
- Reference: the x86-64 ABI (see Resources below)
- Alternative reference: http://6.s081.scripts.mit.edu/sp18/x86-64-architecture-guide.html
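To make the register usage concrete, here is a minimal C sketch of the two System V AMD64 rules this page relies on: the first six integer or pointer arguments arrive in %rdi, %rsi, %rdx, %rcx, %r8 and %r9 (later ones go on the stack), and %rbx, %rbp and %r12-%r15 are callee-saved, so a function that uses them has to preserve the caller's values, typically by pushing them in its prologue. The helper names int_arg_reg() and is_callee_saved() are hypothetical, and floating-point and struct-by-value arguments, which follow other rules, are ignored.

```c
#include <stddef.h>
#include <string.h>

/* System V AMD64 ABI: registers carrying integer/pointer arguments 1-6, in order. */
static const char *const int_arg_regs[] = {
	"rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Callee-saved registers: a function that uses one must restore it before
 * returning, so it typically pushes the caller's value in its prologue. */
static const char *const callee_saved_regs[] = {
	"rbx", "rbp", "r12", "r13", "r14", "r15"
};

/* Hypothetical helper: register carrying integer argument n (1-based),
 * or NULL if argument n is passed on the stack instead. */
static const char *int_arg_reg(int n)
{
	if (n < 1 || n > 6)
		return NULL;
	return int_arg_regs[n - 1];
}

/* Hypothetical helper: nonzero if reg (name without the '%') is callee-saved. */
static int is_callee_saved(const char *reg)
{
	size_t i;

	for (i = 0; i < sizeof(callee_saved_regs) / sizeof(callee_saved_regs[0]); i++)
		if (strcmp(reg, callee_saved_regs[i]) == 0)
			return 1;
	return 0;
}
```

For the call lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_rtr_nid, sd->sd_cpt) in the debugging example, int_arg_reg(4) returns "rcx", so the CPT arrives in %rcx. Argument registers are not preserved across calls, so by the time of the crash %rcx may well have been overwritten; that is why it is more reliable to recover a pointer the caller kept in a callee-saved register such as %r15.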
First, disassemble the function:
...
The same logic applies in the o2iblnd; follow it in kiblnd_setup_rd_iov().
...
Another debugging example:
```
# Try to find the CPT number passed to lnet_initiate_peer_discovery(), via the sd pointer saved on the stack

crash> bt
PID: 8874   TASK: ffff881ff97d3f40  CPU: 7   COMMAND: "mdt_rdpg02_014"
 #0 [ffff881e4a6b3660] machine_kexec at ffffffff8105d77b
 #1 [ffff881e4a6b36c0] __crash_kexec at ffffffff81108742
 #2 [ffff881e4a6b3790] panic at ffffffff816a863f
 #3 [ffff881e4a6b3810] __warn at ffffffff8108ae7a
 #4 [ffff881e4a6b3850] warn_slowpath_fmt at ffffffff8108aedf
 #5 [ffff881e4a6b38b8] __list_add at ffffffff8134405c
 #6 [ffff881e4a6b38e0] lnet_initiate_peer_discovery at ffffffffc0b07bc7 [lnet]
 #7 [ffff881e4a6b3918] lnet_handle_find_routed_path at ffffffffc0b0b90d [lnet]
 #8 [ffff881e4a6b3998] lnet_select_pathway at ffffffffc0b0c2c0 [lnet]
 #9 [ffff881e4a6b3a98] lnet_send at ffffffffc0b0d115 [lnet]
#10 [ffff881e4a6b3ab8] LNetPut at ffffffffc0b0d56c [lnet]
#11 [ffff881e4a6b3b18] ptl_send_buf at ffffffffc0e00ff6 [ptlrpc]
#12 [ffff881e4a6b3bd0] ptlrpc_send_reply at ffffffffc0e043ab [ptlrpc]
#13 [ffff881e4a6b3c48] target_send_reply_msg at ffffffffc0dc335e [ptlrpc]
#14 [ffff881e4a6b3c68] target_send_reply at ffffffffc0dcd7de [ptlrpc]
#15 [ffff881e4a6b3cc0] tgt_request_handle at ffffffffc0e73d11 [ptlrpc]
#16 [ffff881e4a6b3d50] ptlrpc_server_handle_request at ffffffffc0e16c6b [ptlrpc]
#17 [ffff881e4a6b3df0] ptlrpc_main at ffffffffc0e1a63a [ptlrpc]
#18 [ffff881e4a6b3ec8] kthread at ffffffff810b4031
#19 [ffff881e4a6b3f50] ret_from_fork at ffffffff816c155d

crash> disas lnet_handle_find_routed_path
   0xffffffffc0b0b740 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffffc0b0b745 <+5>:     push   %rbp
   0xffffffffc0b0b746 <+6>:     mov    %rsp,%rbp
   0xffffffffc0b0b749 <+9>:     push   %r15
   0xffffffffc0b0b74b <+11>:    mov    %rdi,%r15    <--- %rdi carries the first argument (sd); it is copied into %r15
   0xffffffffc0b0b74e <+14>:    push   %r14
   0xffffffffc0b0b750 <+16>:    push   %r13
   0xffffffffc0b0b752 <+18>:    push   %r12
   0xffffffffc0b0b754 <+20>:    push   %rbx
   ...

# Source of lnet_handle_find_routed_path() (source lines 2001-2005 and 2076-2078):
static int
lnet_handle_find_routed_path(struct lnet_send_data *sd,
                             lnet_nid_t dst_nid,
                             struct lnet_peer_ni **gw_lpni,
                             struct lnet_peer **gw_peer)
...
        sd->sd_msg->msg_src_nid_param = sd->sd_src_nid;
        rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_rtr_nid,
                                          sd->sd_cpt);

crash> disas lnet_initiate_peer_discovery
Dump of assembler code for function lnet_initiate_peer_discovery:
   0xffffffffc0b07aa0 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffffc0b07aa5 <+5>:     push   %rbp
   0xffffffffc0b07aa6 <+6>:     mov    %rsp,%rbp
   0xffffffffc0b07aa9 <+9>:     push   %r15
   0xffffffffc0b07aab <+11>:    push   %r14
   0xffffffffc0b07aad <+13>:    push   %r13
   0xffffffffc0b07aaf <+15>:    push   %r12
   0xffffffffc0b07ab1 <+17>:    push   %rbx
   ...

# lnet_initiate_peer_discovery() saves the caller's %r15 (set to sd by the
# caller) on the stack in its prologue, so now we look there

crash> bt -f
 #6 [ffff881e4a6b38e0] lnet_initiate_peer_discovery at ffffffffc0b07bc7 [lnet]
    ffff881e4a6b38e8: ffff881f4be670c0 (%rbx)  ffff881f4be67000 (%r12)
    ffff881e4a6b38f8: ffff881f8d3b2400 (%r13)  ffff881f8d33bb40 (%r14)
    ffff881e4a6b3908: ffff881e4a6b39f8 (%r15)  ffff881e4a6b3990 (%rbp)
    ffff881e4a6b3918: ffffffffc0b0b90d (return address in caller)

crash> lnet_send_data ffff881e4a6b39f8
struct lnet_send_data {
  sd_best_ni = 0xffff881f8d3b2200,
  sd_best_lpni = 0xffff881f9575f600,
  sd_final_dst_lpni = 0xffff881f9575f600,
  sd_peer = 0xffff881ff7f4cd00,
  sd_gw_peer = 0x0,
  sd_gw_lpni = 0x0,
  sd_peer_net = 0x0,
  sd_msg = 0xffff880e4cb6f400,
  sd_dst_nid = 3659191877107818,
  sd_src_nid = 1407546850803763,
  sd_rtr_nid = 18446744073709551615,
  sd_cpt = 2,
  sd_md_cpt = 0,
  sd_send_case = 25
}
```
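The stack arithmetic behind the bt -f step can be sketched as follows. The call instruction pushes the return address, and each push in the callee's prologue then stores 8 bytes just below the previous slot, so with the prologue push %rbp, %r15, %r14, %r13, %r12, %rbx the saved %r15 sits 16 bytes below the return-address slot. saved_reg_slot() and the push-order table are hypothetical helpers for illustration, and the sketch assumes nothing other than these 8-byte pushes moves %rsp between the call and the last push.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Registers pushed by the prologue of lnet_initiate_peer_discovery(),
 * in push order, as read from its disassembly. */
static const char *const lnet_ipd_pushes[] = {
	"rbp", "r15", "r14", "r13", "r12", "rbx"
};

/* Hypothetical helper: return the stack address where the prologue saved
 * `reg`, given the address of the frame's return-address slot, or 0 if the
 * prologue does not push `reg`. The i-th push (0-based) lands at
 * ret_slot - 8 * (i + 1), since the stack grows downward. */
static uint64_t saved_reg_slot(uint64_t ret_slot, const char *const *pushes,
			       size_t npushes, const char *reg)
{
	size_t i;

	for (i = 0; i < npushes; i++)
		if (strcmp(pushes[i], reg) == 0)
			return ret_slot - 8 * (uint64_t)(i + 1);
	return 0;
}
```

With the return-address slot ffff881e4a6b3918 from the bt -f output, the helper gives ffff881e4a6b3908 for %r15 and ffff881e4a6b38e8 for %rbx, matching the slots crash annotates. The saved value is whatever the caller held in %r15 at the time of the call, so it is worth checking in the caller's disassembly that %r15 was not reused after mov %rdi,%r15.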
Some Assembler Tidbits
```
Most two-operand x86-64 instructions, in the AT&T syntax printed by crash and gdb, take the source operand first and the destination second, and store the result in the second operand. Intel syntax, used in the Intel and AMD manuals, reverses the order.
Example: mov $0xffffffff,%eax moves the value 0xffffffff into the register %eax
```
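When reading long disassembly listings it can help to split operands mechanically. Below is a minimal sketch, using a hypothetical helper att_src_dst(), that splits a two-operand AT&T instruction into its source and destination. It tracks parentheses because memory operands such as 0x0(%rax,%rax,1) contain commas of their own, and it expects only the instruction text (no address or <+offset> prefix).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "mnemonic src,dst" (AT&T syntax) into its source
 * and destination operands. Commas inside parentheses belong to a memory
 * operand such as 0x0(%rax,%rax,1) and are not operand separators.
 * Returns 0 on success, or -1 if there are not exactly two operands or an
 * output buffer is too small. */
static int att_src_dst(const char *insn, char *src, size_t srclen,
		       char *dst, size_t dstlen)
{
	const char *ops = strpbrk(insn, " \t");	/* end of the mnemonic */
	const char *comma = NULL;
	const char *p;
	size_t n;
	int depth = 0;

	if (ops == NULL)
		return -1;			/* no operands at all */
	while (*ops == ' ' || *ops == '\t')
		ops++;
	for (p = ops; *p != '\0'; p++) {
		if (*p == '(') {
			depth++;
		} else if (*p == ')') {
			depth--;
		} else if (*p == ',' && depth == 0) {
			if (comma != NULL)
				return -1;	/* three operands, e.g. imul */
			comma = p;
		}
	}
	if (comma == NULL)
		return -1;			/* single operand, e.g. push */
	n = (size_t)(comma - ops);
	if (n >= srclen || strlen(comma + 1) >= dstlen)
		return -1;
	memcpy(src, ops, n);
	src[n] = '\0';
	strcpy(dst, comma + 1);
	return 0;
}
```

For example, att_src_dst("mov %rdi,%r15", ...) yields source %rdi and destination %r15, i.e. the first argument is copied into %r15, while a single-operand instruction such as nopl 0x0(%rax,%rax,1) is rejected.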
GDB Scripts
GDB commands can be used to write scripts that dissect the crash dump. Attached are a few scripts, courtesy of Alexey Lyashkov, to which I've added more functionality. Also attached is a program that can extract Lustre logs from the dump file: Crash-tools.
Resources
Below are some resources that explain the registers and the architecture.
- System V ABI (calling conventions): http://wiki.osdev.org/System_V_ABI and from there https://www.uclibc.org/docs/psABI-x86_64.pdf
  - see page 21
- Developer info, AMD: http://developer.amd.com/resources/developer-guides-manuals/
  - In particular: http://support.amd.com/TechDocs/24594.pdf ("AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions")
- Intel side: https://software.intel.com/en-us/articles/intel-sdm
  - In particular: https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf (Intel® 64 and IA-32 architectures software developer's manual combined volumes 2A, 2B, 2C, and 2D: Instruction set reference, A-Z)
- Assembler cheat-sheet available.
...