TOM Highlight: debugging TOM programs

Debugging TOM programs

The TOM compiler compiles TOM source to C, intended for compilation by GNU CC. Every effort is made to ensure that tools available for programs written in C can be applied to TOM programs as well. TOM programs can be profiled using gprof or Quantify, and tested using PureCoverage. You won't need a malloc debugger, though Purify is known to work. Other testing products probably work as well; those mentioned here have actually been used on TOM programs. (If you know of any developer tool that apparently does or does not work with TOM, please mail tiggr at gerbil.org.)

TOM programs can be debugged using the GNU debugger, GDB, provided GNU CC on your system supports GDB debugging (this includes all machines currently supported by TOM). Using an unmodified GDB, debugging TOM code is at least as easy as debugging C code. If GDB is unavailable on your system, you can probably get the same functionality using any other debugger, modulo the differences between that debugger and GDB. What follows is a collection of remarks and hints on the subject; they are the result of more than a year of experience with using GDB on TOM programs.

Example input to GDB is prefixed by the usual GDB prompt, `(gdb)'. Shell input is prefixed by `$'.

Fool proof

To usefully run programs under a debugger, you need to pass the -g option to the C compiler for every C source file you compile. In the usual setup of TOM compilation (using the TOM makefiles) -g is already passed to the C compiler automatically.

The TOM compiler emits #line directives to point the C compiler and the debugger to the actual TOM source file instead of the intermediate C file emitted by the TOM compiler. Thus, any error message emitted by the C compiler (which should not happen or you'd have encountered a bug in the TOM compiler) and every action in the debugger will be concerned with the TOM source. You never see the intermediate C file. Throw it away if you feel like it; it won't matter.
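What a #line directive does can be checked in plain C. The following is only a sketch: the file name Cow.t and the line number 42 are made up for illustration, and the directives the TOM compiler really emits are not reproduced here.

```c
/* Pretend this is C emitted from a (hypothetical) TOM source file Cow.t.
   The directive makes the compiler treat the next line as line 42 of
   Cow.t, whatever the name and line of the C file really are. */
#line 42 "Cow.t"
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }
```

Debug information is derived from the same file and line bookkeeping, which is why the debugger shows the original source rather than the intermediate C file.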

Local variables

Every TOM method has a corresponding C function (see Methods, below). Local variables in a method map directly to the C local variables understood by GDB. When compiling with optimization turned on, GCC can map multiple local variables to the same stack slot or register; it can even eliminate local variables. If this hurts, compile with optimization turned off: i.e. do not pass -O or -O2 to the compiler. With the TOM makefiles, simply invoke make with extra CFLAGS, as in


$ make CFLAGS=-O0

Class variables

Class variables, which are the closest thing TOM has to global variables, come in three flavors: local (to a thread), static, and normal. Thread-local variables won't be discussed here.

Static variables in TOM correspond to global variables in C, with some simple prefixing to ensure a unique name. For example, the static variable num_cows in the class MyClass in the unit MyUnit is available in the debugger as c_MyUnit_MyClass_num_cows. If an extension named MyExtension of MyClass defines a static variable num_bulls, it will be available as c_MyUnit_MyClass_MyExtension_num_bulls in the debugger. Note that the name of the unit containing the extension is irrelevant; extension names must be unique within a class anyway.
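The prefixing rule can be written down as a small C helper. This is a sketch that covers only the simple case described above; the function static_var_symbol is hypothetical and not part of TOM, and any further mangling the compiler may apply to unusual names is ignored.

```c
#include <stdio.h>

/* Hypothetical helper: builds the debugger name of a static class
   variable from its unit, class, optional extension, and variable name.
   Pass NULL for `extension` when the class itself defines the variable.
   Returns what snprintf returns (the length of the full name). */
static int static_var_symbol(char *buf, size_t size,
                             const char *unit, const char *cls,
                             const char *extension, const char *var)
{
    if (extension != NULL)
        return snprintf(buf, size, "c_%s_%s_%s_%s", unit, cls, extension, var);
    return snprintf(buf, size, "c_%s_%s_%s", unit, cls, var);
}
```

Called with ("MyUnit", "MyClass", NULL, "num_cows") it produces c_MyUnit_MyClass_num_cows; with the extension "MyExtension" and the variable "num_bulls" it produces c_MyUnit_MyClass_MyExtension_num_bulls.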

A normal class variable is part of the class object and of the class object of every subclass. They can be examined in a way similar to how instance variables can be examined. For this purpose, you need a pointer to the class object, which you can retrieve in two different ways, again using the MyUnit.MyClass as an example:

&_md_c_MyUnit_MyClass: This is the actual address of the class object. It will never change, since the class object is a statically allocated C struct.
_mr_c_MyUnit_MyClass: This refers to the class object in the same way that invoking a class method does. The value of this pointer is thus affected by posing. Without posing, its value is &_md_c_MyUnit_MyClass.

To refer to the meta class object, replace the c (for Class) by an m (for Meta class), as in _mr_m_MyUnit_MyClass and &_md_m_MyUnit_MyClass.
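The relation between the two symbols can be modelled in a few lines of C. This is a sketch only: the struct, the names md_MyClass, mr_MyClass and md_Poser, and the function pose_as_MyClass are all invented for illustration, and how the TOM runtime really implements posing is not shown here.

```c
/* Toy class object; a real one holds far more than a name. */
struct class_object { const char *name; };

/* Stand-ins for _md_c_MyUnit_MyClass and for some posing class:
   statically allocated structs, so their addresses never change. */
static struct class_object md_MyClass = { "MyClass" };
static struct class_object md_Poser = { "Poser" };

/* Stand-in for _mr_c_MyUnit_MyClass: the reference that is followed
   when a message is sent to the class. */
static struct class_object *mr_MyClass = &md_MyClass;

/* Hypothetical posing: only the reference changes, the struct stays put. */
static void pose_as_MyClass(struct class_object *poser)
{
    mr_MyClass = poser;
}
```

Before pose_as_MyClass is called, mr_MyClass equals &md_MyClass; afterwards the two differ, which is the situation in which it matters which of the two symbols you hand to the debugger.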

Methods

Every method implementation is a C function to GDB. Given the class implementing the method and the selector to which the method corresponds, the name of the C function is easily deduced. For example, the alloc method implemented by the tom.State class corresponds to the function c_tom_State_r_alloc. The prefix is the same as for class variables, except that for instance methods, the leading c should be replaced by an i. The suffix of the method name is the mangled name of the corresponding selector. Once you've seen a few examples, you'll get the hang of it; see the TOM language reference manual for a discussion of the selector name mangling scheme.

To invoke a method from the debugger, there are several options. The simplest is of course to directly invoke the function implementing the method. For example, to allocate a new State object:
(gdb) print c_tom_State_r_alloc (_mr_c_tom_State, 0)
$3 = (struct trt_instance *) 0x2002380
The first argument is the receiver, in this case the State class object. The second argument is the second implicit argument, the selector. Since we `know' that the alloc method won't use the selector, we can safely pass 0. If we were to pass the actual selector, it would be &_sd_r_alloc, i.e. the mangled selector name preceded by _sd_. Any arguments to the method come after the selector.

An advantage of directly invoking the implementation is that you know which implementation you invoke. In fact, you've just done method binding by hand. This is not free of perils: you can make errors, for example forgetting that the object is (an instance of) a subclass that actually overrides that method. To overcome this problem, the function send_msg, or its shorthand s, can be used, which first checks that the receiver is a valid object and then dispatches the message as usual.
(gdb) p s(_mr_c_tom_State,&_sd_r_alloc)
$4 = (struct trt_instance *) 0x2002390
Keep in mind that this is a little more tricky than a direct call, since the types of the arguments are not known to the compiler, as send_msg employs va_arg to retrieve the arguments to be passed to the method implementation.
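The difference between the two ways of calling can be illustrated with a toy model in C. Everything in it is an assumption made for the sake of the example: the struct layouts, the magic number used as a validity check, and the names counter_add and toy_send_msg do not come from the TOM runtime.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

#define OBJECT_MAGIC 0x544f4d   /* hypothetical marker of a valid object */

struct selector { const char *name; };
struct object;
typedef int (*method_fn)(struct object *self, const struct selector *cmd, int amount);
struct object { int magic; int total; method_fn add; };

static const struct selector sel_add = { "add" };

/* Method implementation: receiver first, selector second, then the
   explicit argument. The selector is unused, so a direct caller may
   pass NULL for it. */
static int counter_add(struct object *self, const struct selector *cmd, int amount)
{
    (void)cmd;
    self->total += amount;
    return self->total;
}

/* Toy dispatcher: checks the receiver, then fetches the argument with
   va_arg. The compiler cannot check that the caller really passed an
   int. Returns -1 for an invalid receiver or an unknown selector. */
static int toy_send_msg(struct object *receiver, const struct selector *cmd, ...)
{
    va_list ap;
    int amount;

    if (receiver == NULL || receiver->magic != OBJECT_MAGIC)
        return -1;
    if (cmd == NULL || strcmp(cmd->name, "add") != 0)
        return -1;
    va_start(ap, cmd);
    amount = va_arg(ap, int);
    va_end(ap);
    return receiver->add(receiver, cmd, amount);
}
```

A direct call such as counter_add(&c, NULL, 2) is type-checked against the prototype. A call through toy_send_msg is not: passing, say, a double where an int is expected compiles without complaint and yields garbage, which is the trickiness referred to above.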

What is left is a word about multiple return values. All but the first are passed as invisible last arguments. In the following example, a Cons cell is allocated and initialized with the two previously allocated State objects.
(gdb) p c_tom_Cons_r_with__rr_ (_mr_c_tom_Cons, 0, $3, $4)
$6 = (struct i_tom_Cons *) 0x20023a0
We now want to invoke the decons method which returns the two objects referenced by the Cons object, and which is declared as:
(Any, Any) decons;

The first return value will be returned as normal; for the second return value to be returned by-reference, we need some space, for instance by calling malloc.
(gdb) p malloc (sizeof (void *))
$7 = 3380356
And now we can retrieve the actual objects:
(gdb) p i_tom_Cons__rr__decons ($6, 0, $7)
$8 = (struct i__builtin__Any *) 0x2002380
(gdb) x/x $7
0x339484: 0x02002390
Obviously, invoking a method with multiple return values isn't all that trivial. Luckily, you won't need to perform this exercise in the debugger very often since most methods return a single value.
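Seen from C, the calling convention for a second return value looks roughly as follows. This is a sketch under assumptions: struct cons and cons_decons are simplified stand-ins, not the code the TOM compiler generates for tom.Cons.

```c
/* Simplified stand-in for a Cons cell holding two object references. */
struct cons { void *car; void *cdr; };

/* Models `(Any, Any) decons': the first value is returned normally,
   the second is stored through an extra, trailing pointer argument.
   The selector argument is unused here, so callers may pass 0. */
static void *cons_decons(struct cons *self, const void *cmd, void **second)
{
    (void)cmd;
    *second = self->cdr;
    return self->car;
}
```

In C the caller would simply pass the address of a local variable for second. The debugger session above has no such variable at hand, which is why it obtains the space from malloc instead.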

Examining objects

At compile time, every TOM object is declared to the C compiler as a C struct. Unless you have modified the runtime library to not provide the functions described below, you should never need to look at an object directly, for instance by using


(gdb) print *my_obj_ptr

for several reasons:

The struct only reflects information known to the compiler at compile time while compiling the source file containing the current method. Objects can be amended and extended at compile, link, and run time and the odds are that the order of the variables in the object will actually be different from the ordering of tags in the struct.
With multiple inheritance, the order of direct superclasses at various places in the tools and runtime library is almost random. This means that it will be very likely, extremely likely, for the compiler to have a different view of the contents of an object than the object itself at run time.
Need a stronger reason? Here's one: It Just Doesn't Work. Do Not Try This Anywhere, Kids.

The TOM runtime library has two functions which provide much more functionality than what you get by staring at a struct: print_object, or d for short, lets an object descriptively print itself; dump_object, or u for short, dumps the instance variables of an object, recursively to the level specified. Normally you'll use the short names d and u, unless a local variable hides the name, in which case GDB will complain `called object is not a variable'.

Both functions send their output to the err stream declared by the stdio class. Also, they check that the address of the object passed to them actually denotes an object. Instead of causing a fatal signal, they'll moan about an address that does not denote an object, or that denotes a dead object.

void print_object (tom_object o);

Invokes the object's write method, which is declared as


OutputStream
  write OutputStream s;

This is similar to simply printing an object to a stream; the following call also invokes the write method:


[[stdio err] print my_obj];

Objects are free to implement the write method as they like. For example, an array prints its elements, and a string its characters. The default implementation of the method prints the object's address and class:


(gdb) call d(c_tom_stdio_err)
#<instance 020020a0 ByteStream>

An object can add extra information to this by implementing the writeFields method. For example, this is what a File outputs:


(gdb) finish
Value returned is $2 = (struct i_tom_File *) 0x20002f0
(gdb) call d($)
#<instance 020002f0 File filename=TestParse.tp flags=256>

void dump_object (tom_object o, int simple, int level);

If print_object does not provide enough information, dump_object can be called. This function calls the dump simple method, which is implemented by the tom.All instance and is not intended to be overridden. There are, however, several ways in which an object can adjust the way it is dumped.

Dumping an object outputs the object's variables:
(gdb) call u($,1,1)
#(File 020002f0 asi=0 descriptor=3 name="TestParse.tp" flags=256)
The level argument restricts the level up to which object references are traversed. If a variable references another object, that object is dumped with one level less to traverse. If however the level would become 0, the object is simply printed, which should cause only limited, albeit descriptive, output.
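The level rule amounts to a depth-limited recursion. The following C sketch mimics it on a toy linked structure; struct node, append and dump_node are invented for illustration, the output format only loosely imitates the real one, and cycles between objects are not handled.

```c
#include <stdio.h>
#include <string.h>

struct node {
    const char *name;    /* stands in for the object's class and address */
    int value;           /* a variable that is not an object reference */
    struct node *next;   /* an object reference, or NULL */
};

/* Appends text to the NUL-terminated string in out (capacity cap). */
static void append(char *out, size_t cap, const char *text)
{
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%s", text);
}

/* Dumps n's variables. A referenced object is dumped with one level
   less; if that would make the level 0, it is only printed briefly. */
static void dump_node(char *out, size_t cap, const struct node *n, int level)
{
    char number[32];

    snprintf(number, sizeof number, "%d", n->value);
    append(out, cap, "(");
    append(out, cap, n->name);
    append(out, cap, " value=");
    append(out, cap, number);
    if (n->next != NULL) {
        append(out, cap, " next=");
        if (level - 1 <= 0) {
            append(out, cap, "#<");
            append(out, cap, n->next->name);
            append(out, cap, ">");
        } else {
            dump_node(out, cap, n->next, level - 1);
        }
    }
    append(out, cap, ")");
}
```

For a chain A, B, C with values 1, 2, 3 and an initially empty buffer, level 1 yields (A value=1 next=#<B>) and level 2 yields (A value=1 next=(B value=2 next=#<C>)).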

If an object responds positively to the question dump_simple_p, it will not be dumped in the usual way. Instead, its dump_simple method is invoked. String objects use this to simply print themselves instead of their instance variables, which aren't all that interesting anyway (an int and a pointer).

If the argument simple to dump_object is not 0, and an object responds positively to the question dump_self_p, it will be given the opportunity to dump itself; to do so, it must implement the method dumpSelf indent simple level to. Array objects use this to actually dump their elements instead of having their instance variables dumped, which are just as boring as a string's ivars.

See the documentation for the All instance for more information on these methods.

(This highlight is a slight revamp of the developer's corner on the old TOM site.)
