Red Hat Enterprise Linux 4: Debugging with gdb
Prev		Next

Appendix E. The GDB Agent Expression Mechanism

In some applications, it is not feasable for the debugger to interrupt the program's execution long enough for the developer to learn anything helpful about its behavior. If the program's correctness depends on its real-time behavior, delays introduced by a debugger might cause the program to fail, even when the code itself is correct. It is useful to be able to observe the program's behavior without interrupting it.

Using GDB's trace and collect commands, the user can specify locations in the program, and arbitrary expressions to evaluate when those locations are reached. Later, using the tfind command, she can examine the values those expressions had when the program hit the trace points. The expressions may also denote objects in memory -- structures or arrays, for example -- whose values GDB should record; while visiting a particular tracepoint, the user may inspect those objects as if they were in memory at that moment. However, because GDB records these values without interacting with the user, it can do so quickly and unobtrusively, hopefully not disturbing the program's behavior.

When GDB is debugging a remote target, the GDB agent code running on the target computes the values of the expressions itself. To avoid having a full symbolic expression evaluator on the agent, GDB translates expressions in the source language into a simpler bytecode language, and then sends the bytecode to the agent; the agent then executes the bytecode, and records the values for GDB to retrieve later.

The bytecode language is simple; there are forty-odd opcodes, the bulk of which are the usual vocabulary of C operands (addition, subtraction, shifts, and so on) and various sizes of literals and memory reference operations. The bytecode interpreter operates strictly on machine-level values -- various sizes of integers and floating point numbers -- and requires no information about types or symbols; thus, the interpreter's internal data structures are simple, and each bytecode requires only a few native machine instructions to implement it. The interpreter is small, and strict limits on the memory and time required to evaluate an expression are easy to determine, making it suitable for use by the debugging agent in real-time applications.

E.1. General Bytecode Design

The agent represents bytecode expressions as an array of bytes. Each instruction is one byte long (thus the term bytecode). Some instructions are followed by operand bytes; for example, the goto instruction is followed by a destination for the jump.

The bytecode interpreter is a stack-based machine; most instructions pop their operands off the stack, perform some operation, and push the result back on the stack for the next instruction to consume. Each element of the stack may contain either a integer or a floating point value; these values are as many bits wide as the largest integer that can be directly manipulated in the source language. Stack elements carry no record of their type; bytecode could push a value as an integer, then pop it as a floating point value. However, GDB will not generate code which does this. In C, one might define the type of a stack element as follows:
union agent_val { LONGEST l; DOUBLEST d; };

where LONGEST and DOUBLEST are typedef names for the largest integer and floating point types on the machine.

By the time the bytecode interpreter reaches the end of the expression, the value of the expression should be the only value left on the stack. For tracing applications, trace bytecodes in the expression will have recorded the necessary data, and the value on the stack may be discarded. For other applications, like conditional breakpoints, the value may be useful.

Separate from the stack, the interpreter has two registers:

pc: The address of the next bytecode to execute.
start: The address of the start of the bytecode expression, necessary for interpreting the goto and if_goto instructions.

Neither of these registers is directly visible to the bytecode language itself, but they are useful for defining the meanings of the bytecode operations.

There are no instructions to perform side effects on the running program, or call the program's functions; we assume that these expressions are only used for unobtrusive debugging, not for patching the running code.

Most bytecode instructions do not distinguish between the various sizes of values, and operate on full-width values; the upper bits of the values are simply ignored, since they do not usually make a difference to the value computed. The exceptions to this rule are:

memory reference instructions (refn): There are distinct instructions to fetch different word sizes from memory. Once on the stack, however, the values are treated as full-size integers. They may need to be sign-extended; the ext instruction exists for this purpose.
the sign-extension instruction (ext n): These clearly need to know which portion of their operand is to be extended to occupy the full length of the word.

If the interpreter is unable to evaluate an expression completely for some reason (a memory location is inaccessible, or a divisor is zero, for example), we say that interpretation "terminates with an error". This means that the problem is reported back to the interpreter's caller in some helpful way. In general, code using agent expressions should assume that they may attempt to divide by zero, fetch arbitrary memory locations, and misbehave in other ways.

Even complicated C expressions compile to a few bytecode instructions; for example, the expression x + y * z would typically produce code like the following, assuming that x and y live in registers, and z is a global variable holding a 32-bit int:
reg 1 reg 2 const32 address of z ref32 ext 32 mul add end

In detail, these mean:

reg 1: Push the value of register 1 (presumably holding x) onto the stack.
reg 2: Push the value of register 2 (holding y).
const32 address of z: Push the address of z onto the stack.
ref32: Fetch a 32-bit word from the address at the top of the stack; replace the address on the stack with the value. Thus, we replace the address of z with z's value.
ext 32: Sign-extend the value on the top of the stack from 32 bits to full length. This is necessary because z is a signed integer.
mul: Pop the top two numbers on the stack, multiply them, and push their product. Now the top of the stack contains the value of the expression y * z.
add: Pop the top two numbers, add them, and push the sum. Now the top of the stack contains the value of x + y * z.
end: Stop executing; the value left on the stack top is the value to be recorded.

Prev	Home	Next
File-I/O remote protocol extension		Bytecode Descriptions