Tail-call Optimization
When one function ends by calling another function, the compiler can engage in
tail-call optimization, in which the function being called reuses the caller's stack frame. This
procedure is most commonly used in the SPARC architecture, where the compiler reuses the
caller's register window in the function being called in order to minimize register
window pressure.
The presence of this optimization causes the return probe of the calling function
to fire before the entry probe of the called function. This ordering can
be confusing. For example, if you wanted to record all functions called from
a particular function, and any functions that those functions call, you might
use the following script:
fbt::foo:entry
{
        self->traceme = 1;
}

fbt:::entry
/self->traceme/
{
        printf("called %s", probefunc);
}

fbt::foo:return
/self->traceme/
{
        self->traceme = 0;
}
However, if foo() ends in an optimized tail-call, the tail-called function, and therefore
any functions that it calls, will not be captured: the return probe of foo() fires
first and clears self->traceme before the entry probe of the tail-called function
fires. The kernel cannot be deoptimized on the fly, and DTrace does not misrepresent
how code is structured. Therefore, you should be aware of when tail-call optimization
might be used.
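The effect of the probe ordering can be illustrated with a small model. The following C sketch is not DTrace code. It replays two hypothetical sequences of probe firings through the same flag logic that the script uses. The event structure, the function names, and both sequences are assumptions made up for illustration.

```c
#include <string.h>

/* Hypothetical model of one fbt probe firing. */
enum probe_kind { PROBE_ENTRY, PROBE_RETURN };

struct event {
	const char *func;
	enum probe_kind kind;
};

/* Firing order when foo() calls bar() without tail-call optimization. */
static const struct event normal_order[] = {
	{ "foo", PROBE_ENTRY }, { "bar", PROBE_ENTRY },
	{ "bar", PROBE_RETURN }, { "foo", PROBE_RETURN },
};

/* Firing order when the call to bar() is tail-call optimized. */
static const struct event tailcall_order[] = {
	{ "foo", PROBE_ENTRY }, { "foo", PROBE_RETURN },
	{ "bar", PROBE_ENTRY }, { "bar", PROBE_RETURN },
};

/*
 * Mirror the script's three clauses: set the flag at foo entry, count
 * every entry seen while the flag is set, clear the flag at foo return.
 */
static int
count_traced_entries(const struct event *ev, int n)
{
	int traceme = 0;
	int traced = 0;

	for (int i = 0; i < n; i++) {
		int is_foo = (strcmp(ev[i].func, "foo") == 0);

		if (ev[i].kind == PROBE_ENTRY && is_foo)
			traceme = 1;
		if (ev[i].kind == PROBE_ENTRY && traceme)
			traced++;
		if (ev[i].kind == PROBE_RETURN && is_foo)
			traceme = 0;
	}
	return (traced);
}
```

In this model, count_traced_entries(normal_order, 4) counts two entries, foo and bar, while count_traced_entries(tailcall_order, 4) counts only foo, because the flag is already cleared when the entry of bar is seen.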
Tail-call optimization is likely to be used in source code similar to the
following example:
return (bar());
Or in source code similar to the following example:
(void) bar();
return;
Conversely, function source code that ends like the following example cannot have
its call to bar() optimized, because the call to bar() is not a
tail-call:
bar();
return (rval);
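The patterns above can be put side by side in complete functions. The following is a minimal sketch in which foo_tail(), foo_not_tail(), and the body of bar() are hypothetical names and code chosen only for illustration. Whether a compiler actually emits a tail-call for foo_tail() depends on the compiler, the target architecture, and the optimization level.

```c
/* Hypothetical callee, used only for illustration. */
static int
bar(int x)
{
	return (x + 1);
}

/*
 * Tail-call candidate: nothing remains to be done in foo_tail() after
 * bar() returns, so the compiler may let bar() reuse the caller's frame.
 */
int
foo_tail(int x)
{
	return (bar(x));
}

/*
 * Not a tail-call: foo_not_tail() must still return rval after bar()
 * returns, so control has to come back to this function.
 */
int
foo_not_tail(int x)
{
	int rval = x * 2;

	(void) bar(x);
	return (rval);
}
```

Both functions behave identically whether or not the optimization is applied. Only the order in which the fbt entry and return probes fire is affected.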
You can determine whether a call has been tail-call optimized using the following
technique:
While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.
After DTrace has stopped, use mdb(1) to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.
Due to the instruction set architecture, tail-call optimization is far more common on
SPARC systems than on x86 systems. The following example uses mdb to
discover tail-call optimization in the kernel's dup() function:
# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'
While this command is running, run a program that calls dup(2),
such as a bash process. The command should produce output similar to
the following example:
dup+0x10
^C
Now examine the function with mdb:
# echo "dup::dis" | mdb -k
dup:            sra       %o0, 0, %o0
dup+4:          mov       %o7, %g1
dup+8:          clr       %o2
dup+0xc:        clr       %o1
dup+0x10:       call      -0x1278       <fcntl>
dup+0x14:       mov       %g1, %o7
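The output shows that dup+0x10 is a call to the fcntl() function
and not a ret instruction. The mov instruction in the delay slot at dup+0x14
restores the return address that was saved in %g1 at dup+4, so fcntl() returns
directly to the caller of dup(). Therefore, the call to fcntl() is an example
of tail-call optimization.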