The TOM Tome

6.3. Glueing TOM and C

At the lower levels of abstraction, it is often necessary to glue code written in one programming language to code written in another. Most languages can interface to C and, in fact, the TOM compiler translates TOM code to C code. Therefore, TOM provides extensive support for interaction between TOM code and C code.

6.3.1. C functions for TOM methods

There are two ways of mixing C code with a TOM program. One is straightforward and could be called elegant, certainly with respect to the other one, which is a hack. The straightforward mix of TOM code and C code is by implementing TOM methods in C. To inform the compiler of this setup, the method is qualified extern.

extern double
  cos double arg;

To a TOM compiler, this declaration doubles as a definition: a method declared extern cannot have a method body. Though the actual C (or other language) function implementing this method is beyond the control of tesla or tomc, the function must be provided, or the resulting program will not link.

To implement the cos method in C, we need to know a little more about the name of a C function that implements a given method. In general, the C function name of a method has the following structure:

ic_unit_extension-name_mangled-selector

where each element has the following meaning:

ic

This is i for an instance method; c for a class method.

unit

The name of the unit containing this method definition. If the method is defined in a class, it is the unit containing the class. If the method is defined in an extension, it is the unit containing the extension, which is not necessarily equal to the unit containing the class.

For example, the too unit defines a Proxy extension to the State class, which itself is defined in the tom unit. For methods defined in this extension, the unit element will be too, not tom.

extension-name

This is the class name for a method defined in a class, i.e., in the main extension, or the composite name Foo_Bar for the Bar extension of the Foo class.

mangled-selector

This is the mangled selector name, i.e., the name of the selector after it has been mangled to fit the restrictions imposed on a C identifier: all characters that are not allowed in such an identifier are replaced by an underscore `_'. Given the kinds of characters that can occur in a selector name, this means that every `(', `)', `-', or `:' is replaced by an `_'.

6.3.2. Selector names

Before we can continue implementing the extern method, we need to know how the name of a selector is constructed. This is best explained starting with the method that is invoked when a message with that selector is sent to an object. Suppose the cos method is invoked; the selector then contains its name, cos, and an encoding of its return type and argument type. Table 6-1 lists all TOM types and, for each type, the character that is used to encode it.

Table 6-1. type encodings

type	encoding	type	encoding	type	encoding
void	`v`	int	`i`	pointer	`p`
boolean	`o`	long	`l`	selector	`s`
byte	`b`	float	`f`	reference	`r`
char	`c`	double	`d`	dynamic	`x`

The selector name is made up of method names and encoded argument types, preceded by the encoded return type. Each type is enclosed in parentheses. The selector name of our double-returning cos method accepting a double argument becomes:

(d)cos(d)
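The encoding rule can be sketched in C. This is only an illustration of Table 6-1, not code from the TOM runtime: the names tom_encodings, encode_type, and build_selector are hypothetical, and the sketch assumes a selector with a single name part, such as cos.

```c
#include <stdio.h>
#include <string.h>

/* Type encodings copied from Table 6-1. */
struct tom_encoding { const char *type; char code; };

static const struct tom_encoding tom_encodings[] = {
  {"void", 'v'},    {"boolean", 'o'},  {"byte", 'b'},      {"char", 'c'},
  {"int", 'i'},     {"long", 'l'},     {"float", 'f'},     {"double", 'd'},
  {"pointer", 'p'}, {"selector", 's'}, {"reference", 'r'}, {"dynamic", 'x'},
};

/* Return the encoding character for a TOM type name, or '\0' if unknown. */
static char
encode_type (const char *type)
{
  size_t i;

  for (i = 0; i < sizeof tom_encodings / sizeof tom_encodings[0]; i++)
    if (strcmp (tom_encodings[i].type, type) == 0)
      return tom_encodings[i].code;
  return '\0';
}

/* Hypothetical helper: write "(r)name(args)" into out for a selector
   with one name part.  Returns 0 on success, -1 on an unknown type
   or if out is too small.  */
static int
build_selector (char *out, size_t size, const char *ret, const char *name,
                const char *const *args, size_t nargs)
{
  char codes[32];
  char r = encode_type (ret);
  size_t i;
  int n;

  if (r == '\0' || nargs >= sizeof codes)
    return -1;
  for (i = 0; i < nargs; i++)
    {
      codes[i] = encode_type (args[i]);
      if (codes[i] == '\0')
        return -1;
    }
  codes[nargs] = '\0';

  n = snprintf (out, size, "(%c)%s(%s)", r, name, codes);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Called with the return type "double", the name "cos", and the single argument type "double", build_selector produces the string (d)cos(d).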

A few words need to be said about the reference and dynamic types in Table 6-1. First, a reference is not a TOM type: there is not a type in the language that has the concrete syntactic representation reference. A reference stands for a reference to an object, any object. Thus, at the level of selectors and selector names, all objects are equal.

The encoding of the dynamic type only occurs in method names (in the mangled selector part), never in the selector of a message. For example, the name of the function implementing the instance method

void
  print dynamic a;

of the Bar class in the foo unit will be

i_foo_Bar_v_print_x_

but when a message is sent that will invoke this method, the actual arguments of the message are known, and the selector passed to the method will convey their types. Thus, when invoking the method like this:

foo.Bar mybar = ...;
[mybar print FALSE];

the selector that is passed to the method will be (v)print(o), showing that for the dynamic formal argument, the actual argument passed is a single boolean.

As an example of the encoding of a tuple, for the following invocation of print

[mybar print (3.14e0, 9876543210, FALSE, 1234567890, 1.6d-19)];

the selector passed to the method will be

(v)print(floid)

Tuples can of course also occur in a method name, and hence, in mangled form, in the name of a C function implementing that method. The following method

extern double
  atan2 (double, double) (x, y);

responds to the selector (d)atan2(dd), and the C function implementing this method in the Math class of the C library unit is c_C_Math_d_atan2_dd_.
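The step from a selector name to a C function name can be sketched as follows. The helper names mangle_selector and tom_function_name are hypothetical and not part of the TOM tools. The sketch relies on an observation drawn from the examples in this section: the selector's leading parenthesis, once mangled to an underscore, doubles as the separator after the extension name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Replace every character that cannot appear in a C identifier by '_'.
   Returns 0 on success, -1 if out is too small.  */
static int
mangle_selector (const char *selector, char *out, size_t size)
{
  size_t len = strlen (selector);
  size_t i;

  if (len >= size)
    return -1;
  for (i = 0; i < len; i++)
    {
      unsigned char ch = (unsigned char) selector[i];
      out[i] = (isalnum (ch) || ch == '_') ? (char) ch : '_';
    }
  out[len] = '\0';
  return 0;
}

/* Hypothetical helper: build the C function name for a method.
   kind is 'i' (instance method) or 'c' (class method); extension is
   the class name, or Foo_Bar for the Bar extension of Foo.
   Returns 0 on success, -1 if a buffer is too small.  */
static int
tom_function_name (char *out, size_t size, char kind, const char *unit,
                   const char *extension, const char *selector)
{
  char mangled[128];
  int n;

  if (mangle_selector (selector, mangled, sizeof mangled) != 0)
    return -1;
  /* The mangled selector already starts with '_', so no separator
     is added between the extension name and the selector.  */
  n = snprintf (out, size, "%c_%s_%s%s", kind, unit, extension, mangled);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With kind 'c', unit "C", extension "Math", and selector "(d)atan2(dd)", this yields c_C_Math_d_atan2_dd_; with kind 'i', unit "foo", extension "Bar", and selector "(v)print(x)", it yields i_foo_Bar_v_print_x_.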

6.3.3. Type names

Before we can implement our cos method, we must know how to denote the TOM types in C. Table 6-2 lists the TOM types and for each type the equivalent type to be used in C. These types are defined in <tom/trt.h>.

Table 6-2. C types for TOM types

TOM type	C type	TOM type	C type	TOM type	C type
void	void	int	tom_int	pointer	void *
boolean	tom_byte	long	tom_long	selector	selector
byte	tom_byte	float	tom_float	reference	tom_object
char	tom_char	double	tom_double	dynamic	`...`

The triple dots given as the C type for the TOM dynamic type refer to the triple dots used in a C function to denote a variable number of arguments, together with the use of <stdarg.h>. However, that is a very hairy issue we will not delve into right now.
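For readers unfamiliar with <stdarg.h>, the following is a plain C sketch of reading a variable argument list guided by a string of the encoding characters from Table 6-1. It does not show how TOM passes dynamic arguments or how a selector is inspected; the function sum_encoded is hypothetical, and the sketch assumes that a TOM long corresponds to a C long long.

```c
#include <stdarg.h>

/* Hypothetical helper: add up the numeric arguments described by
   codes, a string of encoding characters such as "floid".  */
static double
sum_encoded (const char *codes, ...)
{
  va_list ap;
  double total = 0.0;
  int ok = 1;

  va_start (ap, codes);
  for (; ok && *codes != '\0'; codes++)
    switch (*codes)
      {
      case 'o': case 'b': case 'c': case 'i':
        /* Types narrower than int are promoted to int when passed
           through `...'.  */
        total += va_arg (ap, int);
        break;
      case 'l':
        /* Assumption: a TOM long is a C long long.  */
        total += (double) va_arg (ap, long long);
        break;
      case 'f': case 'd':
        /* A float is promoted to double when passed through `...'.  */
        total += va_arg (ap, double);
        break;
      default:
        /* Unknown or non-numeric encoding: stop reading.  */
        ok = 0;
        break;
      }
  va_end (ap);
  return total;
}
```

The point to take away is the default argument promotion: a function reading `...' must ask for int where a byte or char was passed, and for double where a float was passed.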

6.3.4. External implementation

From the information in Table 6-2, we are finally able to write our cos method, supposedly for the Math class of the C unit:

#include <math.h>
#include <C-r.h>

tom_double
c_C_Math_d_cos_d_ (tom_object self, selector cmd, tom_double arg)
{
  return cos (arg);
}

The conversion from the tom_double arg to the (C) double accepted by the function cos is handled by the C compiler, as is the conversion of the result of cos to the value that is returned. (On all machines currently supporting TOM, a tom_double is simply a double, making the conversion rather easy.)

A few things can be said about this code:

The inclusion of <C-r.h> isn't strictly necessary in this example, but in the case of less trivial implementations, including the file unit-r.h is mandatory. This file is the resolver output and contains vital information about the classes and selectors that are defined in the unit. It also includes the resolver output of the units on which the unit depends, plus the TOM runtime header file <tom/trt.h>. The latter is mandatory (for the C equivalent definitions of the TOM types). Often you will also find use for including <tom/util.h> which contains less elementary information for interfacing with TOM code and the TOM Run Time library (trt).
The first argument to a method implementation is always the (implicit) receiver object. In C you should always declare the type to be a tom_object, even if you think you know that it will be something more specific. The tom_object type is pretty opaque, being defined as follows (in <tom/trt.h>):

typedef struct trt_instance
{
  /* The class of this object.  */
  struct trt_class *isa;

  /* The flags needed by the runtime.  */
  tom_int asi;
} *tom_object;

The isa is the pointer to the class of the object; the asi field is used by trt to store (1) whether the object is an instance, a class, or a meta class, and (2) information for the garbage collector. Any instance (or non-static class) variables of the object are not directly available by dereferencing self; a future TOM highlight will shed light on how that can be achieved.
The selector argument cmd is the second implicit argument to every method invocation. The arg is the first `real' argument.
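The same recipe applies to other methods. As a sketch, suppose the Math class of the C unit also declared a hypothetical method `extern int abs int arg;`. Its selector would be (i)abs(i) and its C function c_C_Math_i_abs_i_. The typedefs below are stand-ins for what <tom/trt.h> provides, written only so that the fragment compiles outside a TOM build; in real code, include the resolver output instead.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the definitions in <tom/trt.h>.  These are assumptions
   for illustration only; real code includes <C-r.h> instead.  */
typedef int tom_int;
typedef struct trt_instance *tom_object;
typedef void *selector;

/* Implementation of the hypothetical method
     extern int
       abs int arg;
   in the Math class of the C unit.  */
tom_int
c_C_Math_i_abs_i_ (tom_object self, selector cmd, tom_int arg)
{
  /* The two implicit arguments are not needed here.  */
  (void) self;
  (void) cmd;

  return abs (arg);
}
```

As with cos, the receiver and the selector come first and are simply ignored; only arg carries the value supplied by the caller.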

6.3.5. The hack

As promised at the start of this highlight, there is also a hack to write functionality in C. This hack uses the fact that the output of Tesla is actually C.

Normally, the TOM compiler ignores anything enclosed within <foo> and </foo>, regarding it as comment (which does not nest). The flexibility of this commenting scheme is that special comments, i.e., `comments' with more meaning than just some remark on the code to follow, can be qualified. For example the TOM documentation generator extracts comments enclosed in <doc> and </doc>, regarding them as documentation on the class, variable, or method to follow. It skips all other comments, including the <copyright> and </copyright> at the top of the TOM library units source files, since copyright information is not interesting for the reader that wants to learn how a certain class works.

The single exception to the above rule is that text enclosed within <c> and </c> is not taken to be comment. Instead, the enclosed text is copied verbatim to the output, implying that the text had better be literal C code, which is actually what it was meant for.

Our cos method can now be written as follows:

<c>
#include <math.h>
</c>

<doc> Return the cosine of the argument {arg}.  </doc>
double (result)
  cos double arg
{
<c>
  result = cos (arg);
</c>
}

Again, a few notes:

Do not use any nasties like the C return statement in your C code. Instead, assign a value to the return value of the method, as is done in this example.
You can include header files like <math.h> (at the global level; not within a method) but you cannot include the resolver output as was done with the external implementation of cos. This has some implications that increase the complexity of including C code like this.
If your C code starts with a declaration, it should start its own block.
The C code is included literally, so it does not need to be a fully delimited entity. For example, the following implementation of cos is `legal':

double (result)
  cos double arg
{
<c>
  {
    double a = arg;
    result = cos (arg);
</c>
  return;
<c>
  }
</c>
}
