Linux Applications Debugging Techniques/Core files
A core dump is a snapshot of a program's memory, of the processor registers (including the program counter and the stack pointer) and of other OS and memory-management information, taken at a given point in time. This makes core dumps invaluable for capturing the state of rarely occurring races and other abnormal conditions.
Moreover, such rare events usually show up on heavily used production or QA machines, where gdb is not available and access to the machine is not easy. Worse, the heaviest users are usually the biggest clients (moneywise...). It is therefore important to plan ahead and collect as much forensic data as possible.
One can force a core dump at a chosen moment, either from within the program or from outside it. What a core cannot tell is how the application ended up in that state: a core is no replacement for a good log. Verbose logs and core files go hand in glove.
Prerequisites
For a process to be able to dump core, a few prerequisites have to be met:
- the core file size limit must permit it (see ulimit in the shell's manual, and setrlimit(2)), e.g.: ulimit -c unlimited. The limit can also be raised from within the program.
- the process must have write permission to the directory where the core is to be dumped (usually the current working directory of the process)
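Raising the limit from within the program can be sketched as follows, assuming Linux/glibc and a hypothetical helper name, enable_core_dumps(). An unprivileged process can raise its soft limit only up to its hard limit, so the sketch copies the hard limit into the soft one; when the hard limit is RLIM_INFINITY this is the in-program equivalent of ulimit -c unlimited.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core file size limit (RLIMIT_CORE)
 * as far as the hard limit allows. Raising the hard limit itself requires
 * privileges (CAP_SYS_RESOURCE), so it is left untouched.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;          /* soft limit := hard limit */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling it early in main(), before anything has a chance to crash, keeps the behaviour independent of how the program was launched.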
Where is my core?
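Usually the core is dumped in the current working directory of the process, but the OS can be configured otherwise through the kernel.core_pattern setting, a template in which %h stands for the host name, %e for the executable name and %p for the PID: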
# cat /proc/sys/kernel/core_pattern
%h-%e-%p.core
# sysctl -w "kernel.core_pattern=/var/cores/%h-%e-%p.core"
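A program can also record where its cores will go, for instance by logging the pattern at startup. The sketch below, whose function names are hypothetical, reads /proc/sys/kernel/core_pattern. Per core(5), a pattern starting with | means the kernel pipes the core to the named program (typically a crash collector) instead of writing a file, and a pattern without a / is relative to the process's working directory.

```c
#include <stdio.h>
#include <string.h>

/* Read the kernel's core file name template into buf, newline stripped.
 * Returns 0 on success, -1 if it cannot be read. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';     /* drop the trailing newline */
    return 0;
}

/* Log where cores will end up, e.g. to the application log at startup. */
void log_core_pattern(FILE *out)
{
    char pattern[256];

    if (read_core_pattern(pattern, sizeof pattern) != 0) {
        fprintf(out, "core_pattern: unreadable\n");
        return;
    }
    if (pattern[0] == '|')              /* core is piped to a helper program */
        fprintf(out, "cores are piped to: %s\n", pattern + 1);
    else
        fprintf(out, "core file pattern: %s\n", pattern);
}
```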
Dumping core from outside the program
One possibility is to use gdb, if available. This leaves the program running:
(gdb) attach <pid>
(gdb) generate-core-file <optional-filename>
(gdb) detach
Another possibility is to send the process a signal whose default action is to dump core, such as SIGABRT. This terminates the process, unless the signal is caught by a custom signal handler:
kill -s SIGABRT <pid>
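When the process to be signalled is a child of the signalling one (for instance a worker under a supervisor), the parent can also learn from waitpid() whether a core was actually written. A sketch with hypothetical helper names; WCOREDUMP() is not part of POSIX, but glibc provides it on Linux.

```c
#define _DEFAULT_SOURCE          /* for WCOREDUMP() in glibc */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGABRT, whose default action is "terminate and dump core".
 * A process with its own SIGABRT handler may not die. */
int request_core(pid_t pid)
{
    if (pid == getpid())
        return -1;               /* for the calling process itself, use abort() */
    if (kill(pid, SIGABRT) != 0) {
        perror("kill(SIGABRT)");
        return -1;
    }
    return 0;
}

/* Only valid when pid is a child of the caller: reap it and report how it
 * ended. Returns the terminating signal, 0 for a normal exit, -1 on error. */
int reap_and_report(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (!WIFSIGNALED(status))
        return 0;
    printf("pid %d: signal %d%s\n", (int)pid, WTERMSIG(status),
           WCOREDUMP(status) ? " (core dumped)" : " (no core)");
    return WTERMSIG(status);
}
```

If the report says "no core" although the child died of SIGABRT, the prerequisites above (core size limit, write permission) and the core pattern are the first things to check.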
Dumping core from within the program
Again, there are two possibilities: dump core and terminate the program, or dump core and continue:
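#include <stdlib.h>    /* abort() */
#include <sys/types.h> /* pid_t */
#include <sys/wait.h>  /* waitpid() */
#include <unistd.h>    /* fork() */
void dump_core_and_terminate(void)
{
/*
 * abort() raises SIGABRT, whose default action is to dump core.
 * Alternative (undefined behavior, typically crashes with SIGSEGV):
 * volatile char *p = NULL; *p = 0;
 */
abort();
}
void dump_core_and_continue(void)
{
pid_t child = fork();
if (child < 0) {
/*Parent: error, fork() failed and no core is dumped*/
}
else if (child == 0) {
dump_core_and_terminate(); /*Child: dumps core and dies*/
}
else {
/*Parent: reap the child so it does not linger as a zombie.
  This blocks until the child has finished dumping core.*/
waitpid(child, NULL, 0);
/*Parent: continue*/
}
}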
Note: use dump_core_and_continue() with care. In a multi-threaded program, the forked child contains a clone of only the thread that called fork() [Butenhof Ch5; re: threads & fork]. This has a number of implications, in particular with respect to mutexes, but the point here is that the core dumped by the child will contain information about that one thread only. If you need a core with all threads without aborting the process, try the Google coredumper library, even though it has not been maintained for years.
Shared libraries
To obtain a good call stack, gdb must load the same libraries that were loaded by the program that generated the core dump. If the machine on which the core is analyzed has different libraries (or has them in different places) than the machine on which the core was dumped, copy the libraries over to the analyzing machine in a directory layout that mirrors the dump machine. For instance:
$ tree .
.
|-- juggler-29964.core
|-- lib64
| |-- ld-linux-x86-64.so.2
| |-- libc.so.6
| |-- libm.so.6
| |-- libpthread.so.0
| `-- librt.so.1
...
At the gdb prompt:
(gdb) set solib-absolute-prefix ./
(gdb) set solib-search-path .
(gdb) file ../../../../../threadpool/bin.v2/libs/threadpool/example/juggler/gcc-4.1.2/debug/link-static/threading-multi/juggler
Reading symbols from /home/aurelian_melinte/threadpool/threadpool-0_2_5-src/threadpool/bin.v2/libs/threadpool/example/juggler/gcc-4.1.2/debug/link-static/threading-multi/juggler...done.
(gdb) core-file juggler-29964.core
Reading symbols from ./lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for ./lib64/librt.so.1
Reading symbols from ./lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for ./lib64/libm.so.6
Reading symbols from ./lib64/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for ./lib64/libpthread.so.0
Reading symbols from ./lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for ./lib64/libc.so.6
Reading symbols from ./lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for ./lib64/ld-linux-x86-64.so.2
Core was generated by `../../../../bin.v2/libs/threadpool/example/juggler/gcc-4.1.2/debug/link-static/'.
Program terminated with signal 6, Aborted.
#0 0x0000003684030265 in raise () from ./lib64/libc.so.6
(gdb) frame 2
#2 0x0000000000404ae1 in dump_core_and_terminate () at juggler.cpp:30
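Copying the right libraries requires knowing which ones the program actually loaded on the dump machine. One option, sketched here assuming glibc, is to have the program log them at startup with dl_iterate_phdr(); the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* dl_iterate_phdr() callback: print the load address and path of one object. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    FILE *out = data;

    (void)size;
    /* The main executable is usually reported with an empty name. */
    fprintf(out, "%#lx %s\n", (unsigned long)info->dlpi_addr,
            info->dlpi_name[0] ? info->dlpi_name : "(main program)");
    return 0;                           /* 0 = keep iterating */
}

/* Log every object (executable and shared libraries) loaded in the process. */
void log_loaded_libraries(FILE *out)
{
    dl_iterate_phdr(print_object, out);
}
```

The resulting list tells which files to copy into the mirrored directory tree before starting gdb on the analysis machine.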
Source Code
If the source files are no longer at the paths recorded at build time, point the debugger to them by substituting path prefixes:
(gdb) set substitute-path /from/path1 /to/path1
(gdb) set substitute-path /from/path2 /to/path2
analyze-cores
Here is a script that generates a basic report for each core file. Useful on the days when cores are raining on you:
#!/bin/bash
#
# A script to extract core-file information
#
if [ $# -ne 1 ]
then
echo "Usage: `basename $0` <for-binary-image>"
exit 1
else
binimg=$1
fi
# Cores modified during the last 24 hours
cores=`find . -name '*.core' -mtime -1`
#cores=`find . -name '*.core'`
for core in $cores
do
gdblogfile="$core-gdb.log"
rm -f $gdblogfile
bininfo=`ls -l $binimg`
coreinfo=`ls -l $core`
gdb -batch \
-ex "set logging file $gdblogfile" \
-ex "set logging on" \
-ex "set pagination off" \
-ex "printf \"**\n** Process info for $binimg - $core \n** Generated `date`\n\"" \
-ex "printf \"**\n** $bininfo \n** $coreinfo\n**\n\"" \
-ex "file $binimg" \
-ex "core-file $core" \
-ex "bt" \
-ex "info proc" \
-ex "printf \"*\n* Libraries \n*\n\"" \
-ex "info sharedlib" \
-ex "printf \"*\n* Memory map \n*\n\"" \
-ex "info target" \
-ex "printf \"*\n* Registers \n*\n\"" \
-ex "info registers" \
-ex "printf \"*\n* Current instructions \n*\n\"" -ex "x/16i \$pc" \
-ex "printf \"*\n* Threads (full) \n*\n\"" \
-ex "info threads" \
-ex "bt" \
-ex "thread apply all bt full" \
-ex "printf \"*\n* Threads (basic) \n*\n\"" \
-ex "info threads" \
-ex "thread apply all bt" \
-ex "printf \"*\n* Done \n*\n\"" \
-ex "quit"
done
An alternative worth exploring is btparser.
Canned user-defined commands
The same reporting can be canned as user-defined gdb commands, for instance in ~/.gdbinit:
define procinfo
printf "**\n** Process Info: \n**\n"
info proc
printf "*\n* Libraries \n*\n"
info sharedlib
printf "*\n* Memory Map \n*\n"
info target
printf "*\n* Registers \n*\n"
info registers
printf "*\n* Current Instructions \n*\n"
x/16i $pc
printf "*\n* Threads (basic) \n*\n"
info threads
thread apply all bt
end
document procinfo
Information about the debuggee.
end
define analyze
procinfo
printf "*\n* Threads (full) \n*\n"
info threads
bt
thread apply all bt full
end
analyze-pid
A script that generates a basic report and a core file for a running process:
#!/bin/bash
#
# A script to generate a core and a status report for a running process.
#
if [ $# -ne 1 ]
then
echo "Usage: `basename $0` <PID>"
exit 1
else
pid=$1
fi
gdblogfile="analyze-$pid.log"
rm -f $gdblogfile
corefile="core-$pid.core"
gdb -batch \
-ex "set logging file $gdblogfile" \
-ex "set logging on" \
-ex "set pagination off" \
-ex "printf \"**\n** Process info for PID=$pid \n** Generated `date`\n\"" \
-ex "printf \"**\n** Core: $corefile \n**\n\"" \
-ex "attach $pid" \
-ex "bt" \
-ex "info proc" \
-ex "printf \"*\n* Libraries \n*\n\"" \
-ex "info sharedlib" \
-ex "printf \"*\n* Memory map \n*\n\"" \
-ex "info target" \
-ex "printf \"*\n* Registers \n*\n\"" \
-ex "info registers" \
-ex "printf \"*\n* Current instructions \n*\n\"" -ex "x/16i \$pc" \
-ex "printf \"*\n* Threads (full) \n*\n\"" \
-ex "info threads" \
-ex "bt" \
-ex "thread apply all bt full" \
-ex "printf \"*\n* Threads (basic) \n*\n\"" \
-ex "info threads" \
-ex "thread apply all bt" \
-ex "printf \"*\n* Done \n*\n\"" \
-ex "generate-core-file $corefile" \
-ex "detach" \
-ex "quit"
Thread Local Storage
Thread-local storage (TLS) data can be rather difficult to access with gdb in core files: there is no live process, so gdb cannot call __tls_get_addr() to locate a thread's TLS block.
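One workaround, offered here as an assumption rather than an established recipe, is to have each thread publish the address of its thread-local variables in an ordinary global array when it starts. Globals live in the data segment, so gdb can read them from a core like any other memory. The names tls_counter, tls_counter_addr, MAX_TLS_SLOTS and register_thread_tls() are hypothetical.

```c
#include <stdatomic.h>

#define MAX_TLS_SLOTS 64   /* hypothetical capacity; size it for your threads */

/* Example thread-local variable we want to inspect post mortem. */
static _Thread_local int tls_counter;

/* Plain global array of per-thread addresses: readable from a core file
 * without calling __tls_get_addr(). */
int *tls_counter_addr[MAX_TLS_SLOTS];
static atomic_int tls_slots_used;

/* Call once at the start of each thread. Returns the slot used, or -1 if
 * the array is full. Slots are never recycled. */
int register_thread_tls(void)
{
    int slot = atomic_fetch_add(&tls_slots_used, 1);

    if (slot >= MAX_TLS_SLOTS)
        return -1;
    tls_counter_addr[slot] = &tls_counter;  /* this thread's instance */
    return slot;
}
```

After loading the core, the value for the first registered thread can then be printed with:
(gdb) print *tls_counter_addr[0]
Since slots are never recycled, entries belonging to threads that have already exited point to freed memory.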