What every Java programmer should know about generics and type erasure

Posted November 18th, 2014 by adminguy

Image of the List interface from the JDK source.


 
Generics in Java resemble templates in C++. ... The syntax is deliberately similar and the semantics are deliberately different. ... Semantically, Java generics are defined by erasure, whereas C++ templates are defined by expansion.
- From the book Java Generics and Collections, also quoted in this SO answer.
 
 
Before generics were introduced in the language, a very popular Java interview question was 
 
"What are the advantages of using collections as compared to simple arrays?" 
 
and part of the expected answer was 
 
"Java arrays can only hold one type of object. An array of strings can only hold String objects, while a Collection can hold any type of object".
 
A few years of putting strings, dogs, carrots and employees in the same collection object made developers realize this wasn't really an advantage. Imagine a collection containing all of the aforementioned types of objects. When your code fetches an object from the collection, it has no way of knowing what it just fetched. Developers either had to use instanceof hacks, or make sure, with threatening comments, that everyone put only one type of object in that collection.
 
// PLEASE DO NOT PUT ANYTHING BUT EMPLOYEE OBJECTS IN THIS LIST
// IF YOU DISOBEY THIS WARNING, I WILL MAKE SURE YOU CODE IN ASSEMBLY
// LANGUAGE FOR THE REST OF YOUR LIFE
List employees = new ArrayList();
 
code example 1
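 
To make the problem concrete, here is a minimal sketch of what a raw list allows. The Employee class and the countEmployees helper are hypothetical names made up for this illustration; they are not from any library.

```java
import java.util.ArrayList;
import java.util.List;

class RawListDemo {
    // Hypothetical stand-in for the article's Employee type.
    static class Employee {
        final String name;
        Employee(String name) { this.name = name; }
    }

    // The "instanceof hack": count employees, skipping whatever else is in there.
    @SuppressWarnings("rawtypes")
    static int countEmployees(List items) {
        int count = 0;
        for (Object item : items) {
            if (item instanceof Employee) {
                count++;
            }
        }
        return count;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        List employees = new ArrayList();      // raw type: accepts any Object
        employees.add(new Employee("Alice"));
        employees.add("a carrot");             // compiles without complaint
        employees.add(Integer.valueOf(42));    // so does this

        System.out.println(countEmployees(employees)); // 1

        try {
            // A blind cast on the String element fails only when the code runs.
            Employee e = (Employee) employees.get(1);
            System.out.println(e.name);
        } catch (ClassCastException ex) {
            System.out.println("ClassCastException at runtime");
        }
    }
}
```

The compiler accepts every line here. The mistake surfaces only when the cast is executed.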
 
 
A few developers heeded such warnings, but many put all sorts of things in collections without their fingers quivering or showing any signs of guilt or remorse. Something had to be done to tame these unruly beasts. Generics were the answer. Generics allowed developers to declare their collection objects thus:
 
 
List<Employee> employees = new ArrayList<Employee>();
 
code example 2
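 
As a rough sketch of what this declaration buys you, the snippet below reads from a typed list without a cast. The Employee class and the firstName helper are hypothetical, written only for this illustration. The commented-out line is one the compiler would reject.

```java
import java.util.ArrayList;
import java.util.List;

class TypedListDemo {
    // Hypothetical stand-in for the article's Employee type.
    static class Employee {
        final String name;
        Employee(String name) { this.name = name; }
    }

    // The parameter type tells the compiler what get() returns.
    static String firstName(List<Employee> employees) {
        Employee first = employees.get(0); // no cast needed in the source
        return first.name;
    }

    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<Employee>();
        employees.add(new Employee("Alice"));
        // employees.add("i am crazy"); // would not compile: a String is not an Employee
        System.out.println(firstName(employees)); // Alice
    }
}
```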
 
 
With generics in place, anyone who wrote employees.add(new String("i am crazy")); would be disciplined with a compiler error. But generics only arrived in J2SE 5.0 (internal version 1.5). By that time a lot of code had already been written against the earlier, non-generic form of the Collections library, and deployed. Had generics been introduced as an incompatible change so late in the evolution of Java, all legacy code that referred to the collections library would have had to be rewritten. That would have been a very expensive exercise, and would have pissed off a lot of Java's users. It would also have gone against Sun's original commitment to backwards compatibility. 
 
A few compromises had to be made when generics were introduced in J2SE 5.0, in order to maintain backwards compatibility. The workable compromise was to introduce generics with type erasure. Let's look at a code snippet to understand what this means.
 
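import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class GenericsErasure {
    public static void main(String[] args) {
        // The type argument <String> exists only in the source code.
        List<String> list = new ArrayList<String>();
        list.add("Hello");
        Iterator<String> iter = list.iterator();
        while (iter.hasNext()) {
            String s = iter.next(); // no cast written here
            System.out.println(s);
        }
    }
}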
 
code example 3
 
 
Type erasure works at the bytecode level. The compiler first checks the generic types, then removes them from the code it emits: each type parameter is replaced with its bound (Object when there is none), and casts are inserted where they are needed. To rephrase it once again, the additional type information (generics) is present in the Java source code for the compiler to use, but the compiled instructions do not carry it. 
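 
One quick way to observe erasure from inside a running program is to compare the runtime classes of two lists declared with different type arguments. This is only a minimal sketch; the class name ErasureAtRuntime and the helper sameRuntimeClass are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureAtRuntime {
    // True when both lists share one runtime class, whatever their type arguments were.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();

        System.out.println(sameRuntimeClass(strings, numbers)); // true
        System.out.println(strings.getClass().getName());       // java.util.ArrayList
    }
}
```

At runtime there is just one ArrayList class. The String and Integer type arguments are nowhere to be found on the objects themselves.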
 
To better understand what this means, we will take the bytecode and decompile it. There are several decompilers - we used JAD. Here's the decompiled code. 
 
public class GenericsErasure
{
    public GenericsErasure()
    {
    }
    public static void main(String args[])
    {
        List list = new ArrayList();
        list.add("Hello");
        String s;
        for(Iterator iter = list.iterator(); iter.hasNext(); System.out.println(s))
            s = (String)iter.next();
 
    }
}
 
code example 4
 
 
Notice anything interesting? The code reads List list and not List<String> list, and a (String) cast has appeared in front of iter.next() even though the source never wrote one. The compiled client code does not have any information related to generics. 
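 
The (String) cast in the decompiled loop is the compiler's doing, and it can be made visible without a decompiler. The sketch below sneaks an Integer into a List<String> through a raw reference, then reads it back. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InsertedCastDemo {
    // Returns true if reading the polluted list as a String throws ClassCastException.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean castFailsOnRead() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;             // raw alias to the very same list
        raw.add(Integer.valueOf(42));   // unchecked call: nothing stops this at runtime

        try {
            String s = strings.get(0);  // the compiler-inserted (String) cast runs here
            return false;               // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFailsOnRead()); // true
    }
}
```

The add succeeds because the list itself knows nothing about String. The failure comes later, at the read, which is where the compiler placed the cast.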
 
The Java 1.5 runtime does not expect the client bytecode to contain any additional type information. Such type information is used only by the compiler to enforce compile-time type safety. And this is where the magic happens. Because the bytecode is not expected to contain any generic type information, all the legacy code which was written against the old, non-generified Collections classes continues to run on the new runtime. This allowed system administrators to upgrade the runtime for deployed software to J2SE 5.0 without having to worry about any old code breaking. 
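 
The same property lets old-style and new-style code work on the same objects. The sketch below only approximates the real situation, because both styles are compiled together here rather than the legacy half being an old, already-deployed class file. The joinLegacy method is a hypothetical example of pre-generics code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LegacyInterop {
    // Written in the pre-generics style: raw List, raw Iterator, explicit cast.
    @SuppressWarnings("rawtypes")
    static String joinLegacy(List items) {
        StringBuffer sb = new StringBuffer();
        for (Iterator it = items.iterator(); it.hasNext();) {
            String s = (String) it.next();
            sb.append(s);
            if (it.hasNext()) {
                sb.append(",");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Written in the generics style, then handed to the legacy method unchanged.
        List<String> names = new ArrayList<String>();
        names.add("Hello");
        names.add("World");
        System.out.println(joinLegacy(names)); // Hello,World
    }
}
```

After erasure both halves deal in plain List and Iterator, so neither needs to know how the other was written.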
 
It's pretty neat that type erasure solved the backwards compatibility problem and let generics into the Java language, but this came at a price, as we will see in future blog posts.
 
 
Technical Word Power
 
Before we say goodbye we would like to upload a technical word to your memory. Do you know what the term orthogonality means?
 
Orthogonality is the property that means "Changing A does not change B". An example of an orthogonal system would be a radio, where changing the station does not change the volume and vice-versa. You can read more about orthogonality on Wikipedia. This thread on Stack Overflow also explains orthogonality with real world examples in the context of software development.
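 
For the programmers in the audience, the radio can be sketched in a few lines of Java. The Radio class below is entirely hypothetical and exists only to illustrate the idea: each setter touches exactly one field.

```java
class Radio {
    private double stationMhz;
    private int volume;

    Radio(double stationMhz, int volume) {
        this.stationMhz = stationMhz;
        this.volume = volume;
    }

    // Changing the station leaves the volume alone.
    void setStation(double stationMhz) { this.stationMhz = stationMhz; }

    // Changing the volume leaves the station alone.
    void setVolume(int volume) { this.volume = volume; }

    double getStation() { return stationMhz; }
    int getVolume() { return volume; }

    public static void main(String[] args) {
        Radio radio = new Radio(98.5, 3);
        radio.setStation(101.1);
        System.out.println(radio.getVolume()); // 3
    }
}
```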