On the Flexibility of Programming Languages

by Pieter J. Schoenmakers <tiggr at gerbil.org>
Programmers Without Deadlines
Eindhoven, the Netherlands

Abstract

This paper explores the difference between flexibility of design and flexibility of construction, and how this affects programming and programming languages. Including the right amount of flexibility in the design of a program is a delicate task best suited to humans. The flexibility of construction, on the other hand, is defined by the programming language that is used. We define what a programming language should offer in support of flexibility of construction and explore the advantages of having this flexibility of code. We conclude with a short taste of TOM, an object-oriented programming language that has been developed to provide flexibility of code.

Introduction

According to Merriam Webster's on-line dictionary, flexibility is the noun corresponding to the adjective flexible:

flex·i·ble

capable of being flexed: PLIANT
yielding to influence: TRACTABLE
characterized by a ready capability to adapt to new, different, or changing requirements <a flexible foreign policy> <a flexible schedule>

In the context of programming, program design, or design in general, a design exhibits some flexibility if it can adapt to new, different, or changing requirements. Setting aside the term `requirements', which puts too much stress on the design process, flexibility refers to the unexpectedly many ways in which a design can be used.

Flexibility

The flexibility in a design is important. Flexibility provides the design with room to grow, to cater for requirement changes during development and for additional requirements that the users will want in the future. Putting the right amount of flexibility into a design is an important part of the act of designing: it requires human creativity. We can call this kind of flexibility flexibility by design.

During the design process, the designer has to make choices about the construction he is creating and about the building blocks he will be using. The choice of these building blocks determines part of the flexibility that will be exhibited by the final product. We can call this flexibility by construction.

As an example, if I order a wall, the supplier can deliver a concrete wall or a brick wall. Both walls fit the requirements, yet if I change my mind later on, e.g., because I want windows in my wall, adding windows to the brick wall will be easier than to the concrete one. The bricks offer more flexibility than the concrete.

As another example, if a problem needs to be solved by writing a C program, some code needs to be written as the solution. The code is designed by the programmer, and the flexibility of the solution is determined by its design. If the code is packed into a single function, the code can only be applied to solve that particular problem. If the solution is split into several sub-solutions to sub-problems, and the code is split into various functions, each of which solves a sub-problem, those functions may later be applied to solve the same sub-problem in other problems. Put differently, while the design of the solution is not different, making the sub-solutions accessible makes them usable for solving other problems. Multiple functions exhibit more flexibility than a single function.
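As a hedged illustration of the paragraph above, the following C sketch splits one problem (how far does the largest value lie above the mean?) into sub-solutions. The names sum, mean, and max_above_mean are our own inventions for this example. Only max_above_mean is specific to the original problem; sum and mean remain available to other problems.

```c
#include <assert.h>
#include <stddef.h>

/* Sub-problem 1: add up n values. Reusable wherever a total is needed. */
double sum(const double *values, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Sub-problem 2: the mean, built on top of sum(). */
double mean(const double *values, size_t n)
{
    assert(n > 0);  /* the mean of no values is undefined */
    return sum(values, n) / (double)n;
}

/* The original problem: distance between the largest value and the mean. */
double max_above_mean(const double *values, size_t n)
{
    assert(n > 0);
    double largest = values[0];
    for (size_t i = 1; i < n; i++)
        if (values[i] > largest)
            largest = values[i];
    return largest - mean(values, n);
}
```

For the values 1, 2, 3, and 6, sum yields 12, mean yields 3, and max_above_mean yields 3. Had everything been written inside max_above_mean, the summing and averaging code would exist but would not be reachable from anywhere else.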

Flexibility of programming languages

Analogous to flexibility in general, the flexibility of programming languages refers to the unexpectedly many ways in which utterances in the language can be used. The flexibility in the design of a program is offered through source code: modifying a program's source code adapts the program's design. The flexibility by construction of a programming language is the flexibility of code. It dictates the possibilities for code reuse without source code modification or recompilation.

For example, consider libraries. A library consists of object code accompanied by an interface declaration somewhere in /usr/include. A library can not be recompiled, either because its source code is missing or unmodifiable or, in the case of shared libraries, because the library must be redistributed before changes take effect. As a result, the flexibility of source code is absent. Since the library's design can not be adjusted to fit reuse in our program, it is the library's flexibility of construction that determines its successful reuse.

Flexibility in C

To illustrate flexibility of code, consider the following piece of C code:

printf ("Hello, world\n");

The level of flexibility that can be exhibited by this piece of code depends on how printf was declared. With K&R in mind, we expect this code to cause a string to be printed on stdout, usually our terminal. Now, if printf was declared as a function pointer, as in:

int (*printf) (char *, ...);

our piece of code can be made to do new things that we did not envision when we wrote it. For example, it can make a window pop up on a screen with the indicated text contained in it. All that needs to be done is to make printf point at some other function with the same interface.
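A minimal sketch of this idea follows. Since the real printf can not be redeclared as a pointer in a conforming program, the pointer is given the hypothetical name emit; emit_to_buffer, captured, and greet are likewise assumptions made for the illustration. Capturing the text in a buffer stands in for the pop-up window.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a function-pointer printf. By default it
   points at the real printf. */
int (*emit)(const char *, ...) = printf;

/* Where the alternative implementation leaves its text. */
char captured[128];

/* Another function with the same interface: instead of writing to
   stdout it formats the text into the captured buffer. */
int emit_to_buffer(const char *format, ...)
{
    assert(format != NULL);
    va_list args;
    va_start(args, format);
    int written = vsnprintf(captured, sizeof captured, format, args);
    va_end(args);
    return written;
}

/* "Our piece of code": it stays the same whatever emit points at. */
int greet(void)
{
    return emit("Hello, world\n");
}
```

After the assignment emit = emit_to_buffer, a call to greet no longer prints anything but leaves the greeting in captured, even though greet itself was neither modified nor recompiled.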

In reality, printf is not declared as a function pointer, but as a function:

int printf (char *, ...);

The call in our piece of code to this function binds our code rather strongly to a piece of code named printf. An invocation through an interface---its declaration---has bound us to an implementation. Only outside the language, by resorting to linker tricks that are platform dependent and even then obscure, is it possible to change the function that is invoked from the printf in the C library to some other function.

Though the flexibility offered by function pointers is considerable, they have one important drawback: a name is a function pointer rather than a function only if the source code declared it so in advance. Function pointers thus depend on timely source code annotations and are therefore flexibility by design.

Object-oriented flexibility

Object-oriented programming languages have a natural flexibility in polymorphism: when we invoke a method M of an instance of a class C, we can not be sure whether M itself will be invoked, or a method of a subclass of C that overrides M.

Java employs the final qualifier, which exists to restrict this flexibility:

classes: A final class can not be subclassed. As a result, if we invoke a method M of a final class C, we can be sure that we invoke that method M on an instance of that class C.
Java requires final classes for the secure deployment of applets. For example, instances of the Java String class are unmodifiable strings. String is final to prevent subclasses from violating the property of unmodifiability, and making "/tmp/foo" appear as "/etc/passwd" at unfortunate moments.
methods: A final method can not be overridden by subclasses: a final method disables polymorphism. As a result, if we invoke a final method M of a class C, we can be sure that we invoke the method M on an instance of C or one of its subclasses: M will not have been overridden.
This use of final resembles C++ methods that are not virtual. However, final imposes a much heavier restriction: whereas a non-virtual method disables dynamic method binding, it still allows the method to be overridden by subclasses, which is not allowed by final.

To summarize, in Java the natural flexibility of object-oriented programming languages is killed by the final qualifier. This severely hampers reuse of the classes involved.

Past, present, and future code

The examples of flexibility in C and Java in the previous sections are concerned with restrictions to flexibility which affect programmers who do not have access to the source code. The difference between having source code access to a certain piece of code and not having such access is the difference between two programming teams.

If we look at the code that is present in a running program, we can discern one or more libraries, the program proper, and any plug-ins it may have loaded at run time (see figure 1). The libraries support the program: the program uses the services offered by the libraries. In turn, the program supports the plug-ins.

Figure 1: Code support relation.

In general, each library, program, and plug-in is developed by a separate team. Even in an Open Source world, a program team does not have modifying access to the shared libraries being used: when they modify the source of a shared library, the shared library has become part of the program.

Each node in the code support relation represents a design: within the node, the source code provides the flexibility of design. Between nodes, the flexibility of the programming language used determines the flexibility of the construction.

Abstracting from the kind of code a team designs---library, program, or plug-in---we can determine a taxonomy of code, which is depicted in figure 2.

Figure 2: A taxonomy of code.

The developer of present code has modifying source access. Past code is unmodifiable and unrecompilable: to the program developer it consists of the shared libraries; to the plug-in developer it also includes the program. Traversing the graph from present code to past code crosses a boundary of source code modification.

What we think of as present code is past code to the developers of future code. Going from present code to future code crosses a boundary of anticipation: we can not foresee, anticipate, and implement all needs of all future code. The difference between present code and future code is the difference between two designs. Present code might be useful to future code if the present design does not conflict with the future design too much.

It is clear that source code annotations like C++ virtual or Java final impose the design of present code upon future code. This allows the designers of present code to patronize the designers of future code, which is counterproductive, irrespective of the reasons underlying the present code designer's decision.

Encapsulation

An important reason underlying the single-design approach which allows the designer of present code to impose restrictions on future code is that of encapsulation, the reasoning being `if we do not tell about implementation aspect X we are able to change it later on, without affecting future code.'

As an example, consider the following declaration of the FILE type in an ANSI C <stdio.h> header file, and a few accompanying functions:

typedef struct stdio_file_struct FILE;

FILE *fopen (const char *, const char *);
int fprintf (FILE *, const char *, ...);

The FILE type is fully opaque: nothing is known about the contents of the struct stdio_file_struct. Future code can not refer to the contents of a FILE and, as a result, does not require recompilation if we were to change those contents. This encapsulation gives the C library some flexibility without affecting the programs that depend on it. More generally, encapsulation gives flexibility to past code.
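The same technique can be sketched for a small type of our own. The Counter type and its four functions are hypothetical names chosen for this illustration. In a real library the upper part would live in a header and the lower part in a separately compiled source file; both are shown together here so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* What future code sees (the "header"): an incomplete type and
   functions that operate upon pointers to it. */
typedef struct counter_struct Counter;

Counter *counter_new(void);
void counter_increment(Counter *counter);
int counter_value(const Counter *counter);
void counter_free(Counter *counter);

/* What only the library sees. Fields can be added or reordered here
   without recompiling clients, since clients never see the layout and
   can not take sizeof (Counter). */
struct counter_struct {
    int value;
};

Counter *counter_new(void)
{
    return calloc(1, sizeof (Counter));  /* zero-filled; NULL on failure */
}

void counter_increment(Counter *counter)
{
    assert(counter != NULL);
    counter->value++;
}

int counter_value(const Counter *counter)
{
    assert(counter != NULL);
    return counter->value;
}

void counter_free(Counter *counter)
{
    free(counter);
}
```

Client code can create, increment, query, and free a Counter, but it can neither declare one on the stack nor embed one in a struct of its own, which is exactly the restriction on forward flexibility discussed next.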

The C example presented here, while great for backward flexibility, severely hampers forward flexibility. If FILE were a class, we would not be able to subclass it, since we do not have information about its contents. Even though the contents can be marked private, rendering them inaccessible, a compiler for an object-oriented language needs information about the size of superclasses to be able to create a subclass. However, dependence on such information instantly makes the code of the subclass fragile with respect to changes affecting the size of the superclass. Put differently, encapsulation through private qualifiers does not aid backward flexibility, even though that was the goal of the encapsulation.

Backward flexibility

Backward flexibility is the flexibility to change your mind; to change, as the developer of past code, aspects of your implementation without affecting future code. Code is not affected if it does not require recompilation.

Backward flexibility is dictated by the fragility of code. Fragile code requires recompilation sooner than flexible code. Fragile code is a problem in environments where recompilation is a burden:

When all programs depending on a shared library require recompilation, an incompatible version change of the shared library occurs. This means that until a program is recompiled (and redistributed!), it will not enjoy bug fixes and feature additions to the shared library, unless the shared library is really important enough to warrant separate maintenance, such as libc5 and the other few libraries in the Debian oldlibs section.
During development of a program or library, every object file in present code that requires recompilation because of a dependency on fragile code makes recompilation take longer, stretching the edit-compile-link-run-crash cycle.

As an example of a fragile programming language, C++ can be mentioned: changes in the size of the member variables in a class, or in the number of virtual methods, quickly require recompilation of all subclasses and client code.
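The size dependence can be made visible in plain C, where a "subclass" is approximated by a struct that embeds its "superclass" as its first member. This is only an analogy under our own assumptions, not a description of any C++ compiler; the struct names and the two versions of shape are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The "superclass" as shipped in version 1 of a library. */
struct shape_v1 { int x, y; };

/* The same "superclass" after the library author added a field. */
struct shape_v2 { int x, y, color; };

/* A "subclass" embeds its superclass. The compiler bakes the size of
   the superclass into the offset of every field that follows it. */
struct circle_on_v1 { struct shape_v1 base; int radius; };
struct circle_on_v2 { struct shape_v2 base; int radius; };

/* Offset of radius as compiled against version 1. */
size_t radius_offset_v1(void)
{
    size_t offset = offsetof(struct circle_on_v1, radius);
    assert(offset >= sizeof (struct shape_v1));  /* lies past the base */
    return offset;
}

/* Offset of radius as compiled against version 2. */
size_t radius_offset_v2(void)
{
    size_t offset = offsetof(struct circle_on_v2, radius);
    assert(offset >= sizeof (struct shape_v2));
    return offset;
}
```

The two offsets differ, so subclass code compiled against version 1 would read and write radius at the wrong place in an object laid out by version 2. The only remedy is recompilation, which is what makes such code fragile.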

Forward flexibility

Where backward flexibility offers the flexibility to change your mind, forward flexibility can be thought of as providing the flexibility to change the previous designer's mind. Forward flexibility is the flexibility that enables old code to do new things.

Flexibility in object-oriented languages

The flexibility of construction in an object-oriented programming language has everything to do with code reuse. An inflexible construction leaves little room to apply the code to a new problem.

Class reuse

In contemporary, single-dispatch, object-oriented programming languages including Java, C++, Objective-C, and Eiffel, the unit of code is the method and the unit of code reuse is the class.

Suppose we require some functionality and a particular class has drawn our interest as possibly fitting those requirements. Several possibilities exist at that moment:

as is

The class fully fits our requirements: we can use it as is.

subclass

The class almost fits our requirements: we create a subclass of it and use that subclass in our application. This approach is frequently used: normally all classes are created as a subclass of their superclasses.

operate upon

We can write one or more functions and pass objects of the class as an argument. The functions can implement the required functionality by operating upon the objects. This is Plan B: though inherently non-object-oriented, we can still get the work done.

wrap

We could wrap the objects: for each object of the class, a wrapper object is maintained that stands for the wrapped object. To the outside world, it provides an object-oriented view; on the inside it operates upon the wrapped object.

Each of these possibilities comes with its own shortcomings:

as is: Most classes are developed because of the shortcomings of existing classes. To use classes from past code as-is is not necessarily possible.
subclass: Subclassing requires that past code does not already allocate instances of the superclass. If it does, subclassing is not an option, since we would end up with two kinds of objects: those of the desired subclass and those of a superclass.
operate upon: Operating upon objects depends on functionality provided by past code. It is possible for the class to not provide enough functionality to support this approach. Apart from that, this approach too closely resembles a return to programming with abstract data types.
wrap: This approach suffers the same drawback, since part of its implementation is to operate upon the wrapped objects. In addition, the administrative overhead of maintaining wrappers can be prohibitive.

The problem is that a class provides only as many possibilities for reuse as allowed by its design. Put differently, the class---the unit of reuse---provides only as much flexibility as provided by the class---the unit of design. While this suits planned reuse, it hampers unplanned reuse.

A reuse example

Reuse problems stem from incompatibilities: a class provided by past code does not implement some desired functionality or does not conform to some desired type. In [cecil] an excellent example is given that concerns `a text processing application [which] may add specialized tab-to-space conversion behavior for strings and other collections of characters defined in the standard library.'

It is very well possible that if such behavior can not be added to the string class, the class can not be used. It is even possible that the library providing the string class can not be used as a result. If it is possible to add the desired tab-to-space method to the class, a lot of effort spent reimplementing, debugging, and testing a new string class can be saved.

It is possible that a library developer envisions use of his library by many string manipulation applications: he adds tab-to-space behavior to the string class in his library. Unfortunately, for example, in our application we require a variable tab setting, something which the library-provided tab-to-space does not have. In addition, past code invokes the tab-to-space method, so implementing our variable tab-to-space conversion under a different name is not a solution. We need to replace the tab-to-space method.

This kind of flexibility can not be catered for by a design. It is flexibility of code.

Flexibility through extensibility

Extensibility provides flexibility of construction. Extensible classes can be adjusted to make them fit particular situations. Extensibility of classes enables reuse of those classes beyond what was anticipated in their design. Extensibility enables unplanned reuse.

Extensibility of classes encompasses the ability to do the following, without recompiling past code:

add methods,
replace methods,
add state (instance variables), as possibly needed by new methods, and
add superclasses/supertypes.

The flexibility offered by extensibility is not part of the design of the classes: it is flexibility by construction. Extensibility requires the unit of code to not be fragile with respect to changes in the entities with which it interacts: it must neither break when units of code are added or replaced, nor when the data that it manipulates is moved. An example of the latter is a change in the size of an object when extra state is introduced.
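To make adding and replacing methods concrete, here is a rough C sketch in which a "class" is nothing more than a table of named methods that is consulted at call time. It is not how TOM or any other language implements extensibility; the table layout, the method name nextTabStop, and all function names are assumptions made for this illustration, loosely following the tab-to-space example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 8

typedef int (*method_fn)(int);

struct method { const char *name; method_fn fn; };

/* A hypothetical "class": a table of named methods. */
struct class_table {
    struct method methods[MAX_METHODS];
    size_t count;
};

/* Add a method, or replace it if the name is already bound.
   Returns 0 on success, -1 if the table is full. */
int class_set_method(struct class_table *cls, const char *name, method_fn fn)
{
    assert(cls != NULL && name != NULL && fn != NULL);
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, name) == 0) {
            cls->methods[i].fn = fn;          /* replace */
            return 0;
        }
    }
    if (cls->count == MAX_METHODS)
        return -1;
    cls->methods[cls->count].name = name;     /* add */
    cls->methods[cls->count].fn = fn;
    cls->count++;
    return 0;
}

/* Dynamic method binding: look the method up by name at call time.
   Stores the outcome in *result; returns -1 if the name is unbound. */
int class_invoke(const struct class_table *cls, const char *name,
                 int arg, int *result)
{
    assert(cls != NULL && name != NULL && result != NULL);
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, name) == 0) {
            *result = cls->methods[i].fn(arg);
            return 0;
        }
    }
    return -1;
}

/* Past code's method: tab stops every 8 columns. */
int next_stop_8(int column) { return (column / 8 + 1) * 8; }

/* Future code's replacement: tab stops every 4 columns. */
int next_stop_4(int column) { return (column / 4 + 1) * 4; }

/* Past code that invokes the method; it is never recompiled. */
int past_code_next_stop(const struct class_table *cls, int column)
{
    int stop = column;  /* unchanged if the method is unbound */
    (void)class_invoke(cls, "nextTabStop", column, &stop);
    return stop;
}
```

With next_stop_8 bound, past_code_next_stop reports 8 as the stop following column 1. After rebinding the same name to next_stop_4, the very same past code reports 4. The price is a lookup on every invocation, which is the overhead of dynamic method binding.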

Extensibility incurs overhead: it requires dynamic method binding and what we can call dynamic state binding. This overhead may be undesirable, but luckily a compiler may remove it where possible. However, libraries that are delivered in object form must remain fully extensible, otherwise efficiency decisions by the library compiler may negatively affect future code. When we know that there will not be future code, for example when we are building a single program that will run from a ROM and will not accommodate plug-ins, the compiler can very well remove extensibility. With more source code available to scrutinize, the compiler can do a better job. The best job can be done by a whole-program compiler, which digests all past and present source code of the program to be built, including that of libraries.
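The paper does not spell out a mechanism for dynamic state binding here. One possible reading, sketched below under that assumption, is that the offset of each instance variable is held in a variable determined at load time instead of being a constant compiled into client code. The layout struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical run-time layout of a class: just its instance size. */
struct layout { size_t instance_size; };

/* Reserve room for one more int-sized instance variable and return the
   offset at which it lives. Meant to run at load time, so that
   extensions can add state after past code was compiled. */
size_t layout_add_int(struct layout *layout)
{
    assert(layout != NULL);
    size_t offset = layout->instance_size;
    layout->instance_size += sizeof (int);
    return offset;
}

/* Allocate a zero-filled instance of whatever size the layout has now. */
void *instance_new(const struct layout *layout)
{
    assert(layout != NULL);
    return calloc(1, layout->instance_size ? layout->instance_size : 1);
}

/* Accessors take the offset as a run-time value; memcpy avoids making
   assumptions about alignment. */
int get_int(const void *object, size_t offset)
{
    int value;
    memcpy(&value, (const char *)object + offset, sizeof value);
    return value;
}

void set_int(void *object, size_t offset, int value)
{
    memcpy((char *)object + offset, &value, sizeof value);
}
```

If past code registers a length variable and an extension later registers a tab-width variable, both obtain distinct offsets into the same instance, and neither needs to know the final object size at compile time. The extra indirection on every access is the overhead the paragraph above refers to.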

Extensibility during development is a positive thing, since it reduces the fragility of code and the recompiles thus induced. Extensibility during deployment can be eliminated by a compiler where it is not needed. Moreover, because extensibility is flexibility of code, removing it is a mechanical task for a compiler, whereas the creation of flexibility of design requires a human. This is an important difference.

To summarize, extensibility makes classes extensible entities. To the developer of future code it provides forward flexibility. To the developer of past code it provides backward flexibility.

TOM

TOM has been designed as an object-oriented programming language that provides extensible classes in support of unplanned reuse of code. A thorough examination of TOM and its design goals is given in [tom]. In this section, we give a short introduction to TOM.

Consider the following TOM program, which is explained in detail below:

implementation class
HelloUniverse: State

int
  main Array arguments
{
  [[self new] hello];
}

end;

implementation instance
HelloUniverse

void
  hello
{
  [[[stdio out] print "Hello, universe!"] nl];
}

end;

This program is defined by a class HelloUniverse which inherits from the standard-library class State. It defines one class method, which returns an int, is named main, and accepts one argument of type Array that is called arguments. This Array is also a standard-library class.

In the body of the main method we discern two method invocations, those things with square brackets. (TOM method invocations resemble Objective-C and Smalltalk method invocations but they lack the colons.) The first method invocation invokes the new method on self. In this example, self is the HelloUniverse class object, which inherits the new method from the State class. This method returns a newly allocated and initialized instance of the receiving class, a HelloUniverse object in our case. On this object we subsequently invoke the hello method. That method is only defined later on in our program, but that is OK, since the TOM compiler will know where to expect it.

With the semicolon, the method invocation statement is finished and that concludes the main method. Isn't there a return statement missing? The answer is `no': like all variables in TOM, the return value defaults to 0, and that suffices for our purpose.

The second part of our program defines the instances of the HelloUniverse class. Since the class inherits from State, the instances resemble State instances. However, we add one method, hello, which does not return a useful value, as indicated by void.

The hello method asks the stdio standard-library class for the out stream, which is better known as stdout in C. To this stream, the string `Hello, universe!' is printed. The print methods of the standard-library streams return the stream itself, so the next method call invokes the nl method of the out stream, which emits a new line and flushes the output.

Suppose the program is delivered to us in the form of an executable and interface information, much like a familiar C library is usually delivered. The interface can be generated from the executable if it is lacking, so that is no obstacle. Also suppose that we think that our first TOM program should emit a more modest attribution. We can write the following extension to our HelloUniverse class.

implementation class
HelloUniverse extension Modesty

end;

implementation instance
HelloUniverse extension Modesty

void
  hello
{
  [[[stdio out] print "Hello, world"] nl];
}

end;

This extension defines a hello method, which overrides the one previously defined. Now, if the executable program is called hello and the object file containing this extension is called hellofix.so, we can run a modified version of the program by running it like this:

bash$ ./hello :extend=hellofix.so
Hello, world

The :extend option is a library option that dynamically loads the indicated object into the running program before invoking the main method. This way we have extended the program to do more suitable things, without needing to modify the original source code, or even recompile it. That's extensibility.

This example is of course a simple example showing only how a method may be replaced. There is more to TOM than this, as you may want to discover by visiting http://gerbil.org/tom.

Conclusions

Flexibility is an important part of every design. We can discern flexibility by design and flexibility by construction. Adding to a design the right amount of flexibility is an important part of the designer's job, irrespective of whether the designer is busy on software or something else. The design of flexibility is a human's job.

Flexibility by construction, on the other hand, depends on the versatility of the building blocks from which a design is made. This versatility is a result of the design flexibility of those building blocks. With respect to computer software, flexibility by construction is the flexibility of code.

TOM is an object-oriented programming language that has been designed to provide flexibility of code. In TOM, classes are extensible entities; a class can be adjusted to make it fit a particular reuse situation. The extensibility of classes fosters the unplanned reuse of code.

References

cecil: Craig Chambers, The Cecil Language---Specification and Rationale, report, University of Washington, Seattle, WA, March 1997.
tom: Pieter J. Schoenmakers, Supporting the Evolution of Software, Ph.D. thesis, Eindhoven University of Technology, Eindhoven, the Netherlands, July 1999.
