At the lower levels of abstraction, it is often necessary to glue
code written in one programming language to code written in
another. Most languages can interface to C and, in fact, the TOM
compiler translates TOM code to C code. Therefore, TOM provides
extensive support for interaction between TOM code and C code.
There are two ways of mixing C code with a TOM program. One is
straightforward and could be called elegant, certainly with
respect to the other one, which is a hack. The straightforward
mix of TOM code and C code is by implementing TOM methods in C.
To inform the compiler of this setup, the method is qualified
extern.
|
extern double
cos double arg;
|
To the TOM compiler (tomc), this declaration doubles as a
definition: a method declared extern cannot have a
method body. Although the actual C (or other language) function
implementing this method is beyond the control of tomc, the
function must be provided, or the resulting program
will not link.
To implement the cos method in C, we need to know a little
more about the name of a C function that implements a given
method. In general, the C function name of a method has the
following structure:
|
ic_unit_extension-name_mangled-selector
|
where each element has the following meaning:
ic
- This is i for an instance method; c for a
class method.
unit
- The name of the unit containing this method definition. If the method
is defined in a class, it is the unit containing the class. If
the method is defined in an extension, it is the unit containing
the extension, which is not necessarily equal to the unit
containing the class.
For example, the too unit defines
a Proxy extension to the State class,
which itself is defined in the tom unit. For methods
defined in this extension, the unit element will be
too, not tom.
extension-name
- This is the class name for a method defined in a class, i.e., in the
main extension, or the composite name
Foo_Bar for the Bar extension of the
Foo class.
mangled-selector
- This is the mangled selector name, i.e., the name of the selector
after it has been mangled to fit the restrictions imposed on a C
identifier: all characters that are not allowed in such an
identifier are replaced by an underscore `_'. Given the kinds of
characters that can occur in a selector name, this means that
every `(', `)', `-', or `:' is replaced by an `_'.
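The naming scheme above can be sketched in C as follows. This is only an illustration of the rule as described here, not code taken from tomc; the function names mangle_selector and method_function_name are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Replace every character that is not allowed in a C identifier
   (for selector names: '(', ')', '-' and ':') by an underscore.
   Hypothetical helper, not part of tomc or trt.  */
void
mangle_selector (const char *selector, char *out, size_t size)
{
  size_t i;

  if (size == 0)
    return;
  for (i = 0; selector[i] != '\0' && i + 1 < size; i++)
    {
      char ch = selector[i];

      out[i] = (strchr ("()-:", ch) != NULL) ? '_' : ch;
    }
  out[i] = '\0';
}

/* Build ic_unit_extension-name_mangled-selector.  KIND is 'i' for an
   instance method and 'c' for a class method.  Names that do not fit
   in OUT are silently truncated by snprintf.  */
void
method_function_name (char kind, const char *unit, const char *extension,
                      const char *selector, char *out, size_t size)
{
  char mangled[128];

  mangle_selector (selector, mangled, sizeof mangled);
  snprintf (out, size, "%c_%s_%s_%s", kind, unit, extension, mangled);
}
```

Called with 'c', the unit C, the extension name Math, and the selector name d_atan2_(dd), this yields c_C_Math_d_atan2__dd_.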
Selector names
Before we can continue implementing the extern
method, we need to know how the name of a selector is constructed
(but see the new selector syntax
highlight for the current way of getting a selector in TOM; what
is explained here are the selector names as used internally by the
compiler and run-time library, which is what we need for C
interfacing). This is best explained starting with the method
that is invoked when a message with that selector is sent to an
object. Suppose the cos method is invoked, then the
selector contains its name, cos , and an encoding of
its return type and argument type. Table 1
lists the TOM types and for each type the character that is used
to encode a value of that type.
Table 1: type encodings
type      | encoding | type   | encoding | type      | encoding
void      | v        | int    | i        | pointer   | p
boolean   | o        | long   | l        | selector  | s
byte      | b        | float  | f        | reference | r
char      | c        | double | d        | dynamic   | x
The elements that make up the selector name are separated by
underscores (`_'), and their order is maintained. The selector
name of our cos method, which returns a double and
accepts a double argument, becomes:
|
d_cos_d
|
A few words need to be said about the reference and dynamic types
in table 1. First, a reference is
not a TOM type: there is not a type in the language that has the
concrete syntactic representation reference . A
reference stands for a reference to an object, any
object. Thus, at the level of selectors and selector names, all
objects are equal.
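As a small sketch of how such a selector name is assembled from the encodings in table 1, consider the following C fragment. The enum tom_type_kind and the function simple_selector_name are invented for this illustration, and the sketch only covers a method with one name part and one argument.

```c
#include <stdio.h>

/* Hypothetical enumeration of the types listed in table 1.  */
enum tom_type_kind
{
  TK_VOID, TK_BOOLEAN, TK_BYTE, TK_CHAR, TK_INT, TK_LONG, TK_FLOAT,
  TK_DOUBLE, TK_POINTER, TK_SELECTOR, TK_REFERENCE, TK_DYNAMIC
};

/* The encoding characters of table 1, in the order of the enum.  */
static const char type_encoding[] = "vobcilfdpsrx";

/* Selector name of a method with a single name part and a single
   argument: return encoding, name, and argument encoding, separated
   by underscores.  */
void
simple_selector_name (enum tom_type_kind ret, const char *name,
                      enum tom_type_kind arg, char *out, size_t size)
{
  snprintf (out, size, "%c_%s_%c",
            type_encoding[ret], name, type_encoding[arg]);
}
```

For a double returning method named cos with a double argument, the result is d_cos_d.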
The encoding of the dynamic type only occurs in method names (in the
mangled selector part), never in the selector of a message. For
example, the name of the function implementing the instance method
print of the Bar class in the foo unit, which returns void and
takes a single dynamic argument, will be
|
i_foo_Bar_v_print_x
|
but when a message is sent that will invoke this method, the
actual arguments of the message are known, and the selector passed
to the method will convey their types. Thus, when invoking the
method like this:
|
foo.Bar mybar = ...;
[mybar print FALSE];
|
the selector that is passed to the method will be
v_print_o, showing that for the dynamic formal
argument, the actual argument passed is a single boolean.
A tuple can also be passed as the actual argument for a formal dynamic
argument. The encoding of a tuple type
starts with a `(', followed by the encoding of its elements,
followed by `)'. For the following invocation of
print
|
[mybar print (3.14e0, 9876543210, FALSE, 1234567890, 1.6d-19)];
|
the selector passed to the method will be
|
v_print_(floid)
|
since the tuple holds a float, a long, a boolean, an int, and a double.
Tuples can of course also occur in a method name, and hence, in
mangled form, in the name of a C function implementing that
method. The following method
|
extern double
atan2 (double, double) (x, y);
|
responds to the selector d_atan2_(dd), and the C
function implementing this method in the Math class
of the C library unit is
c_C_Math_d_atan2__dd_.
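To read an argument encoding back, including tuples, a small decoder along the following lines can help while debugging. It is a sketch based only on table 1 and the tuple rule given above; type_name_for_code, append_piece and describe_encoding are names invented here and do not exist in trt.

```c
#include <string.h>

/* Name of the type encoded by CODE (table 1), or NULL if unknown.  */
const char *
type_name_for_code (char code)
{
  switch (code)
    {
    case 'v': return "void";
    case 'o': return "boolean";
    case 'b': return "byte";
    case 'c': return "char";
    case 'i': return "int";
    case 'l': return "long";
    case 'f': return "float";
    case 'd': return "double";
    case 'p': return "pointer";
    case 's': return "selector";
    case 'r': return "reference";
    case 'x': return "dynamic";
    default: return NULL;
    }
}

/* Append PIECE to the string in OUT; fail with -1 if it does not fit.  */
static int
append_piece (char *out, size_t size, const char *piece)
{
  if (strlen (out) + strlen (piece) + 1 > size)
    return -1;
  strcat (out, piece);
  return 0;
}

/* Expand an encoding such as "(dd)" into "(double, double)".
   Returns 0 on success, -1 on an unknown code or a too small buffer.  */
int
describe_encoding (const char *enc, char *out, size_t size)
{
  int need_comma = 0;

  if (size == 0)
    return -1;
  out[0] = '\0';
  for (; *enc != '\0'; enc++)
    {
      const char *piece = (*enc == '(') ? "("
                          : (*enc == ')') ? ")"
                          : type_name_for_code (*enc);

      if (piece == NULL)
        return -1;
      /* A comma separates an element from the one before it.  */
      if (need_comma && *enc != ')' && append_piece (out, size, ", ") < 0)
        return -1;
      if (append_piece (out, size, piece) < 0)
        return -1;
      need_comma = (*enc != '(');
    }
  return 0;
}
```

Applied to the argument part of d_atan2_(dd), that is (dd), it produces (double, double).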
Type names
Before we can implement our cos method, we must know
how to denote the TOM types in C. Table
2 lists the TOM types and for each type the equivalent type to
be used in C. These types are defined in
<tom/trt.h>.
Table 2: C types for TOM types
TOM type | C type   | TOM type | C type     | TOM type  | C type
void     | void     | int      | tom_int    | pointer   | void *
boolean  | tom_byte | long     | tom_long   | selector  | selector
byte     | tom_byte | float    | tom_float  | reference | tom_object
char     | tom_char | double   | tom_double | dynamic   | ...
The triple dots as the C type for the TOM dynamic type actually refer
to the triple dots used in a C function to denote a variable
number of arguments and the use of <stdarg.h>.
However, that is a very hairy issue we will not delve into right
now.
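For readers unfamiliar with <stdarg.h>, the following plain C sketch shows the general mechanism: a function walks an encoding string and fetches one variable argument per character. It uses C's own int and double rather than the tom_ types, and it says nothing about how tomc actually passes dynamic arguments; the function name format_dynamic and the choice to handle only `i', `d' and `o' are assumptions made for this example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format the variable arguments described by ENCODING into OUT, each
   followed by a space.  Only 'i' (int), 'd' (double) and 'o' (boolean)
   are handled.  Returns 0 on success, -1 on an unknown encoding
   character or when OUT is too small.  Hypothetical helper.  */
int
format_dynamic (char *out, size_t size, const char *encoding, ...)
{
  va_list ap;
  size_t used = 0;
  int status = 0;

  if (size == 0)
    return -1;
  out[0] = '\0';
  va_start (ap, encoding);
  for (; *encoding != '\0' && status == 0; encoding++)
    {
      int n;

      switch (*encoding)
        {
        case 'i':
          n = snprintf (out + used, size - used, "%d ", va_arg (ap, int));
          break;
        case 'd':
          n = snprintf (out + used, size - used, "%g ",
                        va_arg (ap, double));
          break;
        case 'o':
          /* Small integer types are promoted to int when passed
             through a variable argument list.  */
          n = snprintf (out + used, size - used, "%s ",
                        va_arg (ap, int) ? "TRUE" : "FALSE");
          break;
        default:
          n = -1;
          break;
        }
      if (n < 0 || (size_t) n >= size - used)
        status = -1;
      else
        used += (size_t) n;
    }
  va_end (ap);
  return status;
}
```

The caller is responsible for passing arguments that match the encoding string; C cannot check this, which is a large part of what makes the issue hairy.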
External implementation
From the information in table 2, we are
finally able to write our cos method, supposedly for
the Math class of the C unit:
|
#include <math.h>
#include <C-r.h>

tom_double
c_C_Math_d_cos_d (tom_object self, selector cmd, tom_double arg)
{
  return cos (arg);
}
|
The conversion from the tom_double arg
to the (C) double accepted by the function
cos is handled by the C compiler, as is the
conversion of the result of cos to the value that is
returned. (On all machines currently supporting TOM, a
tom_double is simply a double, making
the conversion rather easy.)
A few things can be said about this code:
- The inclusion of <C-r.h> isn't strictly necessary
in this example, but in the case of less trivial implementations,
including the file unit-r.h is mandatory.
This file is the resolver output and contains vital information
about the classes and selectors that are defined in the
unit. It also includes the resolver output of the units
on which the unit depends, plus the TOM runtime header
file <tom/trt.h>. The latter is mandatory (for
the C equivalent definitions of the TOM types). Often you will
also find use for including <tom/util.h>, which
contains less elementary information for interfacing with TOM code
and the TOM Run Time library (trt).
- The first argument to a method implementation is always the (implicit)
receiver object. In C you should always declare the type to be a
tom_object, even if you think you know that it will be
something more specific. The tom_object type is
pretty opaque, being defined as follows (in
<tom/trt.h>):
|
typedef struct trt_instance
{
  /* The class of this object. */
  struct trt_class *isa;
  /* The flags needed by the runtime. */
  tom_int asi;
} *tom_object;
|
The isa is the pointer to the class of the object;
the asi field is used by trt to store (1) whether the
object is an instance, a class, or a meta class, and (2)
information for the garbage collector. Any instance (or
non-static class) variables of the object are not directly
available by dereferencing self; a future TOM
highlight will shed light on how that can be achieved.
- The selector argument cmd is the second
implicit argument to every method invocation. The
arg is the first `real' argument.
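The argument layout described in these notes can be tried out without a TOM installation by using stand-in type definitions. In the sketch below, the typedefs replace what <tom/trt.h> would normally provide, and the method itself (a class method `extern int abs int arg;' in a class Bar of a unit foo) is hypothetical, chosen only because it needs no library support.

```c
#include <stddef.h>

/* Stand-ins for definitions normally taken from <tom/trt.h>, so that
   this sketch compiles on its own.  The real definitions may differ.  */
typedef int tom_int;
typedef struct trt_instance *tom_object;
typedef struct trt_selector *selector;

/* Hypothetical class method `extern int abs int arg;' of the Bar
   class in the foo unit, named according to the scheme given
   earlier.  */
tom_int
c_foo_Bar_i_abs_i (tom_object self, selector cmd, tom_int arg)
{
  /* The two implicit arguments are not needed for this computation.  */
  (void) self;
  (void) cmd;
  return arg < 0 ? -arg : arg;
}
```

Because the function ignores self and cmd, it can be called directly from C with NULL for both implicit arguments, which is convenient for testing it in isolation.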
The hack
As promised at the start of this highlight, there also is a hack
to write functionality in C. This hack uses the fact that the
output of the TOM compiler actually is C.
Normally, the TOM compiler ignores anything enclosed within
<foo> and </foo>, regarding
it as a comment (which does not nest). The flexibility of this
commenting scheme is that special comments, i.e., `comments' with
more meaning than just some remark on the code to follow, can be
qualified. For example, the TOM documentation generator extracts
comments enclosed in <doc> and
</doc>, regarding them as documentation on the
class, variable, or method to follow. It skips all other
comments, including the <copyright> and
</copyright> at the top of the TOM library
units' source files, since copyright information is not interesting
for the reader who wants to learn how a certain class works.
The single exception to the above rule is that text enclosed within
<c> and </c> is not
taken to be a comment. Instead, the enclosed text is copied
verbatim to the output, implying that the text had better be literal C
code, which is actually what it was meant for.
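The `copied verbatim' rule can be illustrated with a few lines of C that extract only the text between <c> and </c> from a piece of source. This is a toy illustration and not how tomc is implemented: it ignores everything outside those sections instead of compiling it, and the function name copy_c_sections is made up.

```c
#include <string.h>

/* Copy the text of every <c> ... </c> section in SRC to OUT, dropping
   everything else.  Returns the number of sections copied, or -1 if
   OUT is too small or a section is not closed.  Hypothetical helper.  */
int
copy_c_sections (const char *src, char *out, size_t size)
{
  static const char open_tag[] = "<c>";
  static const char close_tag[] = "</c>";
  size_t used = 0;
  int count = 0;

  if (size == 0)
    return -1;
  out[0] = '\0';
  for (;;)
    {
      const char *start = strstr (src, open_tag);
      const char *end;
      size_t len;

      if (start == NULL)
        break;
      start += strlen (open_tag);
      end = strstr (start, close_tag);
      if (end == NULL)
        return -1;
      len = (size_t) (end - start);
      if (used + len + 1 > size)
        return -1;
      /* The section body is copied exactly as written.  */
      memcpy (out + used, start, len);
      used += len;
      out[used] = '\0';
      src = end + strlen (close_tag);
      count++;
    }
  return count;
}
```

Note that text in <doc> sections is skipped, since <doc> does not match the <c> tag.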
Our cos method can now be written as follows:
|
<c>
#include <math.h>
</c>
<doc> Return the cosine of the argument {arg}. </doc>
double (result)
cos double arg
{
<c>
result = cos (arg);
</c>
}
|
Again, a few notes:
- Do not use any nasties like the C return statement in
your C code. Instead, assign a value to the return value of the
method, as is done in this example.
- You can include header files like <math.h> (at the
global level; not within a method) but you cannot include the
resolver output as was done with the external implementation of
cos. This has some implications that increase the
complexity of including C code like this, as will be explained in
a future highlight.
- If your C code starts with a declaration, it should start its own
block.
- The C code is included literally, so it does not need to be a fully
delimited entity. For example, the following implementation of
cos is `legal':
|
double (result)
cos double arg
{
<c>
{
double a = arg;
result = cos (a);
</c>
return;
<c>
}
</c>
}
|
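The advice about declarations and about not using return can be pictured with a hand-written C function that stands in for a method body. This is not actual tomc output, and the function cube_sketch is invented; the sketch merely assumes that other statements may precede the copied text, in which case older C dialects require a declaration to sit at the start of a block.

```c
/* Hand-written stand-in for a method body; not actual tomc output.  */
double
cube_sketch (double arg)
{
  double result;

  result = 0.0;          /* some statement precedes the copied text */
  {                      /* so the copied text opens its own block ... */
    double a = arg;      /* ... before it declares a variable */

    result = a * a * a;  /* assign the result instead of returning */
  }
  return result;         /* the surrounding method does the return */
}
```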
Conclusions
There is a lot more to mixing TOM with C, but within the limited
space (and especially my limited time) available for a TOM highlight, what
has been discussed here serves as a solid basis and good starting
point. The eager reader is pointed at the sources for more
information, as always.