TOM Highlight: Glueing TOM and C

Mail:
tiggr at gerbil.org


At the lower levels of abstraction, it is often necessary to glue code written in one programming language to code written in another. Most languages can interface to C and, in fact, the TOM compiler translates TOM code to C code. Therefore, TOM provides extensive support for interaction between TOM code and C code.

There are two ways of mixing C code with a TOM program. One is straightforward and could be called elegant, certainly compared with the other, which is a hack. The straightforward way is to implement TOM methods in C. To inform the compiler of this setup, the method is qualified extern:
extern double cos double arg;
To the TOM compiler (tomc), this declaration doubles as a definition: a method declared extern cannot have a method body. Although the C (or other language) function that actually implements the method is beyond the control of tomc, that function must be provided, or the resulting program will not link.

To implement the cos method in C, we need to know a little more about the name of the C function that implements a given method. In general, this name has the following structure:
ic_unit_extension-name_mangled-selector
where each element has the following meaning:

ic: This is i for an instance method; c for a class method.
unit: The name of the unit containing this method definition. If the method is defined in a class, it is the unit containing the class. If the method is defined in an extension, it is the unit containing the extension, which is not necessarily equal to the unit containing the class.
For example, the too unit defines a Proxy extension to the State class, which itself is defined in the tom unit. For methods defined in this extension, the unit element will be too, not tom.
extension-name: This is the class name for a method defined in a class, i.e., in the main extension, or the composite name Foo_Bar for the Bar extension of the Foo class.
mangled-selector: This is the mangled selector name, i.e., the name of the selector after it has been mangled to fit the restrictions imposed on a C identifier: all characters that are not allowed in such an identifier are replaced by an underscore `_'. Given the kinds of characters that can occur in a selector name, this means that every `(', `)', `-', or `:' is replaced by an `_'.

Selector names

Before we can continue implementing the extern method, we need to know how the name of a selector is constructed. (See the new selector syntax highlight for the current way of getting a selector in TOM; what is explained here are the selector names as used internally by the compiler and the run-time library, which is what we need for C interfacing.) This is best explained by starting from the method that is invoked when a message with that selector is sent to an object. When the cos method is invoked, the selector contains its name, cos, plus an encoding of its return type and of its argument type. Table 1 lists the TOM types and, for each type, the character used to encode a value of that type.

Table 1: type encodings

type        encoding
void        v
boolean     o
byte        b
char        c
int         i
long        l
float       f
double      d
pointer     p
selector    s
reference   r
dynamic     x

The elements that make up the selector name are separated by underscores (`_'), and they appear in the same order as in the method declaration: return type, name, argument type. The selector name of our cos method, which returns a double and accepts a double argument, thus becomes:
d_cos_d
A few words need to be said about the reference and dynamic types in Table 1. First, reference is not a TOM type: no type in the language is written as reference. A reference stands for a reference to an object, any object. Thus, at the level of selectors and selector names, all objects are equal, whatever their class.

The encoding of the dynamic type occurs only in method names (in the mangled selector part), never in the selector of a message. For example, the name of the function implementing the instance method
void print dynamic a;
of the Bar class in the foo unit will be
i_foo_Bar_v_print_x
but when a message is sent that will invoke this method, the actual arguments of the message are known, and the selector passed to the method will convey their types. Thus, when invoking the method like this:
foo.Bar mybar = ...; [mybar print FALSE];
the selector that is passed to the method will be v_print_o, showing that for the dynamic formal argument, the actual argument passed is a single boolean.

A tuple can also be passed as the actual argument for a formal dynamic argument. How is it encoded? The encoding of a tuple type starts with a `(', followed by the encodings of its elements, followed by a `)'. For the following invocation of print
[mybar print (3.14e0, 9876543210, FALSE, 1234567890, 1.6d-19)];
the selector passed to the method will be
v_print_(floid)
Tuples can of course also occur in a method name, and hence, in mangled form, in the name of a C function implementing that method. The following method
extern double atan2 (double, double) (x, y);
responds to the selector d_atan2_(dd), and the C function implementing this method in the Math class of the C library unit is c_C_Math_d_atan2__dd_.

Type names

Before we can implement our cos method, we must know how to denote the TOM types in C. Table 2 lists the TOM types and for each type the equivalent type to be used in C. These types are defined in <tom/trt.h>.

Table 2: C types for TOM types

TOM type    C type
void        void
boolean     tom_byte
byte        tom_byte
char        tom_char
int         tom_int
long        tom_long
float       tom_float
double      tom_double
pointer     void *
selector    selector
reference   tom_object
dynamic     ...

The triple dots given as the C type for the TOM dynamic type refer to the triple dots used in a C function declaration to denote a variable number of arguments, and to the use of <stdarg.h>. However, that is a very hairy issue we will not delve into right now.

External implementation

With the information in Table 2, we are finally able to write our cos method, assuming it is a class method of the Math class in the C unit:


#include <math.h>
#include <C-r.h>

tom_double
c_C_Math_d_cos_d (tom_object self, selector cmd, tom_double arg)
{
  return cos (arg);
}

The conversion from the tom_double arg to the (C) double accepted by the function cos is handled by the C compiler, as is the conversion of the result of cos to the value that is returned. (On all machines currently supporting TOM, a tom_double is simply a double, making the conversion rather easy.)

A few things can be said about this code:

The inclusion of <C-r.h> isn't strictly necessary in this example, but for less trivial implementations, including the file unit-r.h is mandatory. This file is the resolver output and contains vital information about the classes and selectors defined in the unit. It also includes the resolver output of the units on which the unit depends, plus the TOM runtime header file <tom/trt.h>. The latter is mandatory, as it provides the C equivalents of the TOM types. You will often also want to include <tom/util.h>, which contains less elementary information for interfacing with TOM code and the TOM Run Time library (trt).
The first argument to a method implementation is always the (implicit) receiver object. In C you should always declare its type to be tom_object, even if you believe it will be something more specific. The tom_object type is pretty opaque, being defined as follows (in <tom/trt.h>):
typedef struct trt_instance
{
  /* The class of this object. */
  struct trt_class *isa;

  /* The flags needed by the runtime. */
  tom_int asi;
} *tom_object;
The isa field points to the class of the object; the asi field is used by trt to store (1) whether the object is an instance, a class, or a meta class, and (2) information for the garbage collector. Any instance (or non-static class) variables of the object are not directly accessible by dereferencing self; a future TOM highlight will shed light on how that can be achieved.
The selector argument cmd is the second implicit argument to every method invocation. The arg is the first `real' argument.

The hack

As promised at the start of this highlight, there is also a hack for writing functionality in C. This hack exploits the fact that the output of the TOM compiler is C.

Normally, the TOM compiler ignores anything enclosed within <foo> and </foo>, for any tag name foo, regarding it as a comment (comments do not nest). The advantage of this commenting scheme is that special comments, i.e., `comments' with more meaning than just a remark on the code that follows, can be qualified by their tag. For example, the TOM documentation generator extracts comments enclosed in <doc> and </doc>, regarding them as documentation on the class, variable, or method that follows. It skips all other comments, including the <copyright> and </copyright> comments at the top of the source files of the TOM library units, since copyright information is of no interest to a reader who wants to learn how a certain class works.

The single exception to this rule is text enclosed in <c> and </c>, which is not treated as a comment. Instead, the enclosed text is copied verbatim to the output, so it had better be literal C code, which is exactly what the construct is meant for.

Our cos method can now be written as follows:
<c>
#include <math.h>
</c>

<doc> Return the cosine of the argument {arg}. </doc>
double (result) cos double arg
{
  <c> result = cos (arg); </c>
}
Again, a few notes:

Do not use nasties such as the C return statement in your C code. Instead, assign a value to the method's return value (result in this example).
You can include header files like <math.h> (at the global level, not within a method), but you cannot include the resolver output as was done in the external implementation of cos. This has some implications that increase the complexity of including C code this way, as will be explained in a future highlight.
If your C code starts with a declaration, it should open its own block.
The C code is included literally, so it does not need to be a fully delimited entity. For example, the following implementation of cos is `legal':
double (result) cos double arg
{
  <c> { double a = arg; result = cos (arg); </c>
  return;
  <c> } </c>
}

Conclusions

There is a lot more to mixing TOM with C, but within the limited space (and especially the limited time I have) available for a TOM highlight, what has been discussed here serves as a solid basis and a good starting point. As always, the eager reader is referred to the sources for more information.
