Typed Collections (2002)

Improved reliability of software using the collections framework.

Introduction

Scope of this whitepaper
This whitepaper briefly discusses the main difference between Java and C++ collections, and then introduces an alternative for Java that can be used immediately. This alternative helps to improve the reliability of Java software that depends on Java's collections framework. A reference implementation can be found at http://www.javagazette.com/frameworks/typedcollections/.

Collections framework
Since its first release in 1995, Java has featured several collection classes that allow Java software to collect objects in various ways. Without collections, a Java program could not easily maintain references to a group of Java objects. Initially Java offered only simple collections such as Vector, Hashtable and Stack. These were later extended into a more advanced and better thought-out "collections framework".

Other languages
The notion of collections is not unique to Java; practically every programming language needs some support for this task. For example, C and C++ have various "container frameworks", which are often extended for personal use. Like the collections framework in Java, these containers let you maintain pointers or references to allocated memory.

Problem statement

Java's collection classes have no way to guarantee the class of the objects they contain. Example:

    A software application needs to maintain a list of employees.
    In Java code terms: an ArrayList with Employee objects.
    Problem: the ArrayList cannot guarantee that it only collects Employee objects. The Java application can still put an object of any other class in there, say a Car object!

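This seriously affects the reliability of Java software using plain collections, because such a mistake goes unnoticed until the object is retrieved from the collection and cast to the expected class.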
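The problem can be reproduced in a few lines. The following is a minimal sketch, not taken from the reference implementation: the Employee and Car classes follow the example above, but their contents and the PlainListProblem helper are assumptions made for illustration. It is written for a current Java compiler, so List<Object> stands in for the untyped list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes, for illustration only.
class Employee {
    final String name;
    Employee(String name) { this.name = name; }
}

class Car { }

class PlainListProblem {
    // Builds a "list of employees" the way a plain collection allows it.
    static List<Object> buildEmployees() {
        List<Object> employees = new ArrayList<Object>();
        employees.add(new Employee("Alice"));
        employees.add(new Car());   // accepted: nothing ties the list to Employee
        return employees;
    }

    // Typical consumer code: casts every element to Employee.
    static int countNamed(List<Object> employees) {
        int count = 0;
        for (Object o : employees) {
            Employee e = (Employee) o;   // throws ClassCastException for the Car
            if (e.name != null) {
                count++;
            }
        }
        return count;
    }
}
```

The insert succeeds silently; the error only appears in countNamed, possibly far away from the code that added the wrong object.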

Existing solutions

There are two existing solutions that work around the problem that Java's collections framework does not ensure the class of objects in a collection.

Parameterized collections
In C++ there is a way to specify the class of objects in a C++ container. This is called "parameterized containers", or "template containers". Template containers work as follows: the C++ compiler recognizes a template container declaration in the C++ code, and generates a new container class specifically for that declaration.

Example:

    When applying the "list of employees" example,
    this would translate to C++ code that looks like this: List<Employee>

The same trick that C++ uses here is also possible for Java. A reference implementation already exists: the "Pizza compiler" project. This project resulted in a Java compiler that recognizes parameterized collection declarations like the one in the C++ example above. The advantage is that the type checking is declarative, and that errors are detected early in the development process (namely at compile time). The disadvantage is that it requires a custom pre-compiler or Java compiler that supports the added declarations.
Required: special Java (pre-)compiler
Disadvantage: not standardized, so code that uses the newly supported declarations depends fully on the proprietary compiler that is required to (pre-)compile it.

Encapsulation of collections
A simple and effective way to ensure that a collection always contains the correct class of objects is to encapsulate the collection in a 'wrapper' class that does these checks. The 'wrapper' class would:

  1. provide an interface that only supports the class of objects that are supposed to end up in the collection
  2. encapsulate a collection that stores those objects

This has the advantage that it is easy to implement in any Java application, and that errors are detected early (at compile time) through the wrapper's interface. However, a wrapper has to be written for every class of objects the application collects, which duplicates functionality.
Required: 'wrapper' classes in application layer
Disadvantage: any software that uses collections can of course build its own logic to encapsulate the way objects are inserted in the collection. This is error-prone, though; it would be better if the collection itself could guarantee the class of objects that it collects.
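A wrapper of this kind might look like the sketch below. The class name EmployeeList and its three methods are hypothetical and only illustrate the two points listed above; a real application would expose whatever subset of the list operations it needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes, for illustration only.
class Employee { }
class ParttimeEmployee extends Employee { }

// Hypothetical wrapper: the only way in or out is through Employee-typed methods.
class EmployeeList {
    // The encapsulated collection is never exposed to callers.
    private final List<Object> items = new ArrayList<Object>();

    // The parameter type lets the compiler reject anything that is not an Employee.
    void add(Employee employee) {
        items.add(employee);
    }

    Employee get(int index) {
        // Safe cast: add(Employee) is the only method that inserts elements.
        return (Employee) items.get(index);
    }

    int size() {
        return items.size();
    }
}
```

With this wrapper, a call such as add(new Car()) does not compile. The cost is visible as well: a list of Car objects would need its own, nearly identical CarList class.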

Alternative approach: typed collections

So far, two solutions for ensuring the class of objects in a collection have been mentioned. One requires a special Java compiler; the other leaves the work to the developer. To keep the advantages of parameterized collections but remove the disadvantages just mentioned, we need an alternative approach to the problem.

Compile-time vs. run-time
Before continuing, it is important to distinguish the two moments at which type checking for collections can take place.

  1. compile-time
    The compiler ensures that no Java code adds an object to a collection unless that object has an "is a" relationship with the type of objects that the collection guarantees to collect.
    Attempts to add other types of objects do not compile.

  2. run-time
    At run-time, the collection class or the application ensures that only objects with an "is a" relationship with the expected type are added to the collection.
    Attempts to add other types of objects result in exceptions.

Run-time type checking
The solutions mentioned so far perform type checking at compile-time. The alternative is to move the check to run-time.
Disadvantage: errors are detected at a later stage, so type errors can surface while the application is running.
Advantages: incorrectly typed objects still never end up in the collection, which is what we need to improve the reliability of software that uses collections. Also, because the functionality is generic, there is no need to reinvent the wheel for every type check, and no proprietary compiler is required.

Extending the collections framework
Given these advantages, the proposal is to move type checking to run-time.

So how to do this?

The proposal is to extend the existing collections framework with the capability of type checking during run-time:

  • use the collections framework like we are used to
  • during inserts and updates, the extended collections framework ensures that the inserted/updated object is of the correct class
  • if a type error is detected, the typed collection framework throws an IllegalArgumentException indicating the method argument is not of the correct type

This means that every collection class that exists in the collections framework needs to be subclassed and extended with the type-checking support. To indicate that a collection is typed, it needs to implement the following new interface:

public interface TypedCollection {
	public Class getAllowedType();
}

By implementing this interface, the subclasses of each collection indicate that they support type checking at run-time, and allow you to check what the allowed type actually is. The allowed type of a typed collection cannot be changed once the collection has been constructed, as this could result in unexpected side effects. To ensure that a typed collection keeps the same type during its lifetime, subclasses that implement the "TypedCollection" interface have constructors that take the "allowedType" as an argument. It is that allowedType that is returned by the "getAllowedType" method.
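To make the idea concrete, here is a minimal sketch of what a typed ArrayList subclass could look like. It is not the reference implementation: the check helper, the decision to reject null, and the choice of overridden methods are assumptions. The interface is repeated without "public" and with Class<?> instead of Class so that the sketch compiles cleanly as a single file on a current Java compiler.

```java
import java.util.ArrayList;
import java.util.Collection;

// The interface from the text, repeated so the sketch is self-contained.
interface TypedCollection {
    Class<?> getAllowedType();
}

// Sketch of a run-time checked list; not the reference implementation.
class TypedArrayList extends ArrayList<Object> implements TypedCollection {
    private static final long serialVersionUID = 1L;

    // Fixed at construction; there is deliberately no setter.
    private final Class<?> allowedType;

    TypedArrayList(Class<?> allowedType) {
        if (allowedType == null) {
            throw new IllegalArgumentException("allowedType must not be null");
        }
        this.allowedType = allowedType;
    }

    public Class<?> getAllowedType() {
        return allowedType;
    }

    // Assumption: null is rejected too, because isInstance(null) is false.
    private void check(Object o) {
        if (!allowedType.isInstance(o)) {
            throw new IllegalArgumentException("Expected " + allowedType.getName()
                    + " but got " + (o == null ? "null" : o.getClass().getName()));
        }
    }

    @Override public boolean add(Object o) { check(o); return super.add(o); }

    @Override public void add(int index, Object o) { check(o); super.add(index, o); }

    @Override public Object set(int index, Object o) { check(o); return super.set(index, o); }

    @Override public boolean addAll(Collection<? extends Object> c) {
        for (Object o : c) check(o);   // validate everything before inserting anything
        return super.addAll(c);
    }

    @Override public boolean addAll(int index, Collection<? extends Object> c) {
        for (Object o : c) check(o);
        return super.addAll(index, c);
    }

    // Not every mutation path is covered here; for example, replaceAll on
    // newer JDKs would need the same treatment in a complete implementation.
}
```

Checking the whole argument of addAll before delegating keeps the list unchanged when one element is rejected, which seems the least surprising behavior for callers.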

Samples
The proposed extension of the collections framework looks interesting. How can it be used in practice?

Example:

    Let's take the "list of employees" example again.
    Declaring the list:

       ArrayList myList = new TypedArrayList(Employee.class);
    

As you can see, we pass the allowed type to the constructor of the extended ArrayList class. From then on, the ArrayList object knows what the "allowedType" is, and can check whether objects that are added to the collection have an "instanceof" relationship with the allowedType.
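Because the allowed type is only known as a Class object at run-time, the instanceof operator itself cannot be used for this check. One way to express it, shown here as a sketch, is the standard Class.isInstance method. The AllowedTypeCheck helper and the three domain classes are hypothetical.

```java
// Hypothetical domain classes, for illustration only.
class Employee { }
class ParttimeEmployee extends Employee { }
class Car { }

class AllowedTypeCheck {
    // Run-time counterpart of "candidate instanceof AllowedType",
    // usable when the allowed type is only available as a Class object.
    static boolean isAllowed(Class<?> allowedType, Object candidate) {
        return allowedType.isInstance(candidate);
    }
}
```

With Employee.class as the allowed type, isAllowed accepts an Employee and a ParttimeEmployee, and rejects a Car as well as null.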

Example:

    Adding a ParttimeEmployee to the list is okay:

       myList.add(new ParttimeEmployee());
    
    Adding a Car object to the list is not okay of course:
       // this throws an IllegalArgumentException...
       myList.add(new Car());