openSolaris 2008 - Detecting Memory Corruption - Solaris Modular Debugger Guide

Detecting Memory Corruption

One of the primary debugging facilities of the allocator is that it includes algorithms to recognize data corruption quickly. When corruption is detected, the allocator immediately panics the system. This section describes how the allocator recognizes data corruption. You must understand this to be able to debug these problems.

Memory abuse typically falls into one of the following categories:

Writing past the end of a buffer
Accessing uninitialized data
Continuing to use a freed buffer
Corrupting kernel memory

Keep these problems in mind as you read the next three sections. They will help you to understand the allocator's design, and enable you to diagnose problems more efficiently.

Freed Buffer Checking: `0xdeadbeef`

When the KMF_DEADBEEF (0x2) bit is set in the flags field of a kmem_cache, the allocator tries to make memory corruption easy to detect by writing a special pattern into all freed buffers. This pattern is 0xdeadbeef. Since a typical region of memory contains both allocated and freed memory, sections of each kind of block will be interspersed. The following example is from the kmem_alloc_24 cache:

0x70a9add8:     deadbeef        deadbeef
0x70a9ade0:     deadbeef        deadbeef
0x70a9ade8:     deadbeef        deadbeef
0x70a9adf0:     feedface        feedface
0x70a9adf8:     70ae3260        8440c68e
0x70a9ae00:     5               4ef83
0x70a9ae08:     0               0
0x70a9ae10:     1               bbddcafe
0x70a9ae18:     feedface        139d
0x70a9ae20:     70ae3200        d1befaed
0x70a9ae28:     deadbeef        deadbeef
0x70a9ae30:     deadbeef        deadbeef
0x70a9ae38:     deadbeef        deadbeef
0x70a9ae40:     feedface        feedface
0x70a9ae48:     70ae31a0        8440c54e

The buffers at 0x70a9add8 and 0x70a9ae28 are filled with 0xdeadbeefdeadbeef, which shows that these buffers are free. The buffer redzones are filled with 0xfeedfacefeedface, which indicates they are untouched (no buffer overrun has occurred). See the following section for an explanation of redzones. At 0x70a9ae00 an allocated buffer is located between the two free buffers.

Redzone: `0xfeedface`

Note the pattern 0xfeedface in the buffer shown in the previous section. This pattern is known as the redzone indicator. This pattern enables the allocator (and a programmer debugging a problem) to determine whether the boundaries of a buffer have been violated. Following the redzone is some additional information. The content of that data depends on other factors (see Memory Allocation Logging). The redzone and its suffix are collectively called the buftag region. Figure 9-1 summarizes this information.

Figure 9-1 The Redzone

The buftag is appended to each buffer in a cache when any of the KMF_AUDIT, KMF_DEADBEEF, or KMF_REDZONE flags is set in that buffer's cache. The content of the buftag depends on whether KMF_AUDIT is set.

Decomposing the memory region presented above into distinct buffers is now simple:

0x70a9add8:     deadbeef        deadbeef  \
0x70a9ade0:     deadbeef        deadbeef   +- User Data (free)
0x70a9ade8:     deadbeef        deadbeef  /
0x70a9adf0:     feedface        feedface  -- REDZONE
0x70a9adf8:     70ae3260        8440c68e  -- Debugging Data

0x70a9ae00:     5               4ef83     \
0x70a9ae08:     0               0          +- User Data (allocated)
0x70a9ae10:     1               bbddcafe  /
0x70a9ae18:     feedface        139d    -- REDZONE
0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data

0x70a9ae28:     deadbeef        deadbeef  \
0x70a9ae30:     deadbeef        deadbeef   +- User Data (free)
0x70a9ae38:     deadbeef        deadbeef  /
0x70a9ae40:     feedface        feedface  -- REDZONE
0x70a9ae48:     70ae31a0        8440c54e  -- Debugging Data

The buffers at 0x70a9add8 and 0x70a9ae28 are filled with 0xdeadbeefdeadbeef, which shows that these buffers are free. The buffer redzones are filled with 0xfeedfacefeedface, which indicates they are untouched (no buffer overrun has occurred).

0xbaddcafe: Buffer is allocated but uninitialized (see Uninitialized Data: 0xbaddcafe).
0xdeadbeef: Buffer is free.
0xfeedface: Buffer limits were respected (no overflow).

In the allocated buffer beginning at 0x70a9ae00, the situation is different. Recall from Allocator Basics that there are two allocation types:

The client requested memory using kmem_cache_alloc(9F), in which case the size of the requested buffer is equal to the bufsize of the cache.
The client requested memory using kmem_alloc(9F), in which case the size of the requested buffer is less than or equal to the bufsize of the cache. For example, a request for 20 bytes will be fulfilled from the kmem_alloc_24 cache. The allocator enforces the buffer boundary by placing a marker, the redzone byte, immediately following the client data:
```
0x70a9ae00:     5               4ef83     \
0x70a9ae08:     0               0          +- User Data (allocated)
0x70a9ae10:     1               bbddcafe  /
0x70a9ae18:     feedface        139d    -- REDZONE
0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data
```

The 0xfeedface value at 0x70a9ae18 is followed by a 32-bit word containing what seems to be a random value. This number is actually an encoded representation of the size of the buffer. To decode this number and find the size of the allocated buffer, use the formula:

size = redzone_value / 251

So, in this example,

size = 0x139d / 251 = 20 bytes.

This indicates that the buffer requested was of size 20 bytes. The allocator performs this decoding operation and finds that the redzone byte should be at offset 20. The redzone byte is the hex pattern 0xbb, which is present at 0x729084e4 (0x729084d0 + 0t20) as expected.

Figure 9-2 Sample `kmem_alloc(9F)` Buffer

This graphic depicts a sample kmem_alloc buffer. The redzone byte, uninitialized data, and debugging data are marked.

Figure 9-3 shows the general form of this memory layout.

Figure 9-3 Redzone Byte

This graphic shows the redzone byte being written after the end of the user data region. The redzone byte is determined by decoding the index.

If the allocation size is the same as the bufsize of the cache, the redzone byte overwrites the first byte of the redzone itself, as shown in Figure 9-4.

Figure 9-4 Redzone Byte at the Beginning of the Redzone

This overwriting results in the first 32-bit word of the redzone being 0xbbedface, or 0xfeedfabb depending on the endianness of the hardware on which the system is running.

Note - Why is the allocation size encoded this way? To encode the size, the allocator uses the formula (251 * size + 1). When the size decode occurs, the integer division discards the remainder of '+1'. However, the addition of 1 is valuable because the allocator can check whether the size is valid by testing whether (size % 251 == 1). In this way, the allocator defends against corruption of the redzone byte index.

Uninitialized Data: `0xbaddcafe`

You might be wondering what the suspicious 0xbbddcafe at address 0x729084d4 was before the redzone byte got placed over the first byte in the word. It was 0xbaddcafe. When the KMF_DEADBEEF flag is set in the cache, allocated but uninitialized memory is filled with the 0xbaddcafe pattern. When the allocator performs an allocation, it loops across the words of the buffer and verifies that each word contains 0xdeadbeef, then fills that word with 0xbaddcafe.

A system can panic with a message such as:

panic[cpu1]/thread=e1979420: BAD TRAP: type=e (Page Fault)
rp=ef641e88 addr=baddcafe occurred in module "unix" due to an
illegal access to a user address

In this case, the address that caused the fault was 0xbaddcafe: the panicking thread has accessed some data that was never initialized.

Associating Panic Messages With Failures

The kernel memory allocator emits panic messages corresponding to the failure modes described earlier. For example, a system can panic with a message such as:

kernel memory allocator: buffer modified after being freed
modification occurred at offset 0x30

The allocator was able to detect this case because it tried to validate that the buffer in question was filled with 0xdeadbeef. At offset 0x30, this condition was not met. Since this condition indicates memory corruption, the allocator panicked the system.

Another example failure message is:

kernel memory allocator: redzone violation: write past end of buffer

The allocator was able to detect this case because it tried to validate that the redzone byte (0xbb) was in the location it determined from the redzone size encoding. It failed to find the signature byte in the correct location. Since this indicates memory corruption, the allocator panicked the system. Other allocator panic messages are discussed later.

Detecting Memory Corruption

Freed Buffer Checking: 0xdeadbeef

Redzone: 0xfeedface