System Crashes (Overview)
System crashes can occur due to hardware malfunctions, I/O problems, and software errors.
If the system crashes, it will display an error message on the console,
and then write a copy of its physical memory to the dump
device. The system will then reboot automatically. When the system reboots, the savecore
command is executed to retrieve the data from the dump device and write
the saved crash dump to your savecore directory. The saved crash dump files
provide invaluable information to your support provider to aid in diagnosing the problem.
ZFS Support for Swap Devices
If you select a ZFS root file system during an initial installation or
use live upgrade to migrate from a UFS root file system to
a ZFS root file system, a swap area is created on a ZFS
volume in the ZFS root pool. The swap area size is based on
1/4 to 1/2 of physical memory.
For example:
# swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 253,3 16 8257520 8257520
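The swaplo and blocks columns are counted in 512-byte blocks, so the size of the swap area can be checked with shell arithmetic. The following is a minimal sketch; blocks_to_mb is a hypothetical helper name, not a Solaris command.

```shell
# Convert a count of 512-byte blocks (as reported by swap -l) to Mbytes.
# blocks_to_mb is a hypothetical helper, not a Solaris command.
blocks_to_mb() {
    # 2048 blocks of 512 bytes make 1 Mbyte; integer division rounds down
    echo $(( $1 / 2048 ))
}

# swaplo (16) plus blocks (8257520) from the example output above
blocks_to_mb $(( 16 + 8257520 ))
```

For the example output above, this prints 4032, that is, a swap volume of about 4 Gbytes.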
A ZFS volume is also created for the dump device. Currently, the
swap area and the dump device must reside on separate ZFS volumes.
If you need to modify your ZFS swap area after installation, then
use the swap command as in previous Solaris releases. For more information, see
Chapter 21, Configuring Additional Swap Space (Tasks), in System Administration Guide: Devices and File Systems.
For information about managing dump devices, see Managing System Crash Dump Information.
x86: System Crashes in the GRUB Boot Environment
If a system crash occurs on an x86 based system in
the GRUB boot environment, it is possible that the SMF service that manages
the GRUB boot archive, svc:/system/boot-archive:default, might fail on the next system reboot. To troubleshoot
this type of problem, see x86: What to Do if the SMF Boot Archive Service Fails During a System Reboot. For more information on GRUB based
booting, see Booting an x86 Based System by Using GRUB (Task Map) in System Administration Guide: Basic Administration.
System Crash Dump Files
The savecore command runs automatically after a system crash to retrieve the crash
dump information from the dump device and writes a pair of files called
unix.X and vmcore.X, where X identifies the dump sequence number. Together, these files
represent the saved system crash dump information.
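Because each saved dump consists of a unix.X and vmcore.X pair, it can be useful to check which sequence numbers are complete. The following is a minimal sketch; list_dump_pairs is a hypothetical helper name, and the directory argument is whatever savecore directory is configured on your system.

```shell
# List the sequence numbers for which both unix.X and vmcore.X exist.
# list_dump_pairs is a hypothetical helper; $1 is the savecore directory.
list_dump_pairs() {
    dir=$1
    for f in "$dir"/unix.*; do
        [ -f "$f" ] || continue          # no match: the glob stays literal
        n=${f##*.}                       # text after the last dot
        if [ -f "$dir/vmcore.$n" ]; then
            echo "$n"
        fi
    done
    return 0
}
```

For example, list_dump_pairs /var/crash/hostname prints one line per complete pair found in that directory.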
Crash dump files are sometimes confused with core files, which are images of
user applications that are written when the application terminates abnormally.
Crash dump files are saved in a predetermined directory, which, by default, is
/var/crash/hostname. In previous Solaris releases, crash dump files were overwritten when a system
rebooted, unless you manually enabled the system to save the images of physical
memory in a crash dump file. Now, the saving of crash dump files
is enabled by default.
System crash information is managed with the dumpadm command. For more information, see
The dumpadm Command.
Saving Crash Dumps
You can examine the control structures, active tables, memory images of a live
or crashed system kernel, and other information about the operation of the kernel
by using the mdb utility. Using mdb to its full potential requires a
detailed knowledge of the kernel, and is beyond the scope of this manual.
For information on using this utility, see the mdb(1) man page.
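As a starting point, mdb can be given the unix.X and vmcore.X files of a saved dump and asked for a short summary. The following is a minimal sketch; dump_summary is a hypothetical helper name, and it assumes that the ::status and ::msgbuf dcmds are enough for a first look.

```shell
# Print a summary and the console message buffer of a saved crash dump.
# dump_summary is a hypothetical helper; $1 is the savecore directory,
# $2 is the dump sequence number X in unix.X and vmcore.X.
dump_summary() {
    dir=$1
    n=$2
    if [ ! -f "$dir/unix.$n" ] || [ ! -f "$dir/vmcore.$n" ]; then
        echo "dump $n not found in $dir" >&2
        return 1
    fi
    # ::status and ::msgbuf are mdb dcmds; mdb reads them from standard input
    printf '%s\n' '::status' '::msgbuf' | mdb "$dir/unix.$n" "$dir/vmcore.$n"
}
```

For example, dump_summary /var/crash/hostname 0 examines the first saved dump in the default savecore directory.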
Additionally, crash dumps saved by savecore can be useful to send to a
customer service representative for analysis of why the system is crashing.
The dumpadm Command
Use the dumpadm command to manage system crash dump information in the Solaris
Operating System.
The dumpadm command enables you to configure crash dumps of the operating system. The dumpadm configuration parameters include the dump content, dump device, and the directory in which crash dump files are saved.
Dump data is stored in compressed format on the dump device. Kernel crash dump images can be as big as 4 Gbytes or more. Compressing the data means faster dumping and less disk space needed for the dump device.
The savecore command runs in the background when a dedicated dump device, not the swap area, is part of the dump configuration. This means a booting system does not wait for the savecore command to complete before going to the next step. On large memory systems, the system can be available before savecore completes.
System crash dump files, generated by the savecore command, are saved by default.
The savecore -L command is a new feature that enables you to get a crash dump of the live, running Solaris OS. This command is intended for troubleshooting a running system by taking a snapshot of memory during some bad state, such as a transient performance problem or service outage. If the system is up and you can still run some commands, you can execute the savecore -L command to save a snapshot of the system to the dump device, and then immediately write out the crash dump files to your savecore directory. Because the system is still running, you can use the savecore -L command only if you have configured a dedicated dump device.
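Because savecore -L requires a dedicated dump device, it can help to check the current configuration before taking a live dump. The following is a minimal sketch; has_dedicated_dump is a hypothetical helper name, and it assumes that dumpadm marks a dedicated device with the text (dedicated) on its Dump device line.

```shell
# Succeed only if dumpadm output on stdin reports a dedicated dump device.
# has_dedicated_dump is a hypothetical helper; the "(dedicated)" marker is
# an assumption about the dumpadm output format.
has_dedicated_dump() {
    grep 'Dump device:.*(dedicated)' >/dev/null
}

# Typical use as root on a running system:
#   dumpadm | has_dedicated_dump && savecore -L
```

If the check fails, configure a dedicated dump device with dumpadm before running savecore -L.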
The following table describes dumpadm's configuration parameters.
Dump Parameter | Description
dump device | The device that stores dump data temporarily as the system crashes. When the dump device is not the swap area, savecore runs in the background, which speeds up the boot process.
savecore directory | The directory that stores system crash dump files.
dump content | Type of memory data to dump.
minimum free space | Minimum amount of free space required in the savecore directory after saving crash dump files. If no minimum free space has been configured, the default is one Mbyte.
For more information, see dumpadm(1M).
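Each parameter in the table above corresponds to a dumpadm option: -c for the dump content, -d for the dump device, -s for the savecore directory, and -m for the minimum free space. The following is a minimal sketch that only prints a command line for review; show_dumpadm_cmd is a hypothetical helper name, and the device, directory, and 10% values are example assumptions, not recommendations.

```shell
# Print a dumpadm command line for the four parameters in the table.
# show_dumpadm_cmd is a hypothetical helper; it only prints the command
# so that it can be reviewed before being run as root.
show_dumpadm_cmd() {
    content=$1
    device=$2
    dir=$3
    minfree=$4
    # -c dump content, -d dump device, -s savecore directory, -m minimum free space
    echo "dumpadm -c $content -d $device -s $dir -m $minfree"
}

# Example values only (assumed device path and directory)
show_dumpadm_cmd kernel /dev/zvol/dsk/rpool/dump /var/crash/hostname 10%
```

Running dumpadm with no arguments displays the current configuration, which is a useful check before and after any change.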
Dump configuration parameters are managed by the dumpadm command.
How the dumpadm Command Works
During system startup, the dumpadm command is invoked by the svc:/system/dumpadm:default service to
configure crash dump parameters.
Specifically, dumpadm initializes the dump device and the dump content through the /dev/dump
interface.
After the dump configuration is complete, the savecore script looks for the location
of the crash dump file directory. Then, savecore is invoked to check
for crash dumps and check the content of the minfree file in
the crash dump directory.
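The minfree check described above can be sketched as follows. This is an illustration of the idea, not the logic of savecore itself; enough_space is a hypothetical helper name, and the sketch assumes that the minfree file holds a single number of Kbytes.

```shell
# Decide whether saving a dump would leave the required free space.
# enough_space is a hypothetical helper. $1 is the savecore directory,
# $2 is the free space in Kbytes expected to remain after the save.
# Assumption: the minfree file holds a single number of Kbytes.
enough_space() {
    dir=$1
    remaining_kb=$2
    min_kb=1024                          # default of one Mbyte
    if [ -r "$dir/minfree" ]; then
        min_kb=$(cat "$dir/minfree")
    fi
    [ "$remaining_kb" -ge "$min_kb" ]
}
```

For example, enough_space /var/crash/hostname 2048 succeeds when no minfree file exists, because 2048 Kbytes exceeds the one Mbyte default.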
Dump Devices and Volume Managers
For accessibility and performance reasons, do not configure a dedicated dump device that is under the control
of a volume management product such as Solaris Volume Manager.
You can keep your swap areas under the control of Solaris Volume Manager,
and this is a recommended practice, but keep your dump device separate.
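A candidate dump device can be screened by its path before it is configured. The following is a minimal sketch; is_svm_device is a hypothetical helper name, and the /dev/md/ prefix test is a heuristic for Solaris Volume Manager metadevices that does not cover other volume management products.

```shell
# Succeed if a device path looks like a Solaris Volume Manager metadevice.
# is_svm_device is a hypothetical helper; the /dev/md/ prefix test is a
# heuristic and does not cover other volume management products.
is_svm_device() {
    case $1 in
        /dev/md/*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, is_svm_device /dev/md/dsk/d10 succeeds, which suggests choosing a different device for the dump configuration.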