Comparing Interned Strings in Java

Published in Java on 2-Dec-2011

Recently, I came across a conversation thread in one of the LinkedIn discussion groups about comparing strings in Java. The discussion started with what seems like a fairly simple question: what is the output of the following snippet of code?

The Code

public class Test {
    public static void main(String[] args) {
        String a = "Hello";
        String b = "Hello";

        if (a == b) {
            System.out.println("a == b");
        } else {
            System.out.println("a != b");
        }
    }
}

Note: This is a reasonable approximation of the code being discussed; it may not be verbatim.

What I found most amazing was that the discussion over the output of this simple piece of code went on for days, with dozens and dozens of posts, even after the right answer had been given and had been clearly explained. As such, it seems that what is going on in this bit of code may not be as widely understood as I thought it would be.

The Answer

This code in fact prints a == b. So, where does the confusion come from?

In every introductory Java textbook we are taught that we should never compare objects using the == operator. Instead, we should compare objects using the .equals() method.

When you instantiate an object, that object is placed in a section of memory called the heap. A local variable, like the ones in the snippet above, lives in a separate section of memory called the stack, and its value is a reference to the object: essentially, the address in the heap where the object can be located. When we use the == operator on objects, we are comparing these references, that is, the objects' locations in memory, not the objects' contents.

This is relevant because if you instantiate two objects they are going to have different addresses in the heap. So, even if you create two objects that are otherwise identical, comparing the objects using the == operator will return false.
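To make the distinction concrete, here is a small sketch that creates two String objects with identical contents using new and then compares them both ways. The class EqualityDemo and its methods are hypothetical names chosen for illustration.

```java
// A minimal sketch; EqualityDemo and its methods are hypothetical names.
class EqualityDemo {

    // Two separately constructed String objects, compared by reference.
    static boolean sameReference() {
        String first = new String("Hello");  // new always allocates a fresh object
        String second = new String("Hello"); // ...and so does this one
        return first == second;              // compares references, not contents
    }

    // The same kind of objects, compared by content.
    static boolean sameContent() {
        String first = new String("Hello");
        String second = new String("Hello");
        return first.equals(second);         // compares the characters themselves
    }

    public static void main(String[] args) {
        System.out.println("first == second:      " + sameReference()); // false
        System.out.println("first.equals(second): " + sameContent());   // true
    }
}
```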

This is where the confusion comes from. In the snippet at the top of this post, we have two variables, a and b, with identical contents (both are set to "Hello"). If, as described above, two separately created objects each have their own memory address, we would expect the == operator to return false. However, the snippet prints a == b.

The Explanation

Java has a performance optimization called Interned Strings. Every String literal (a sequence of characters surrounded by double quotation marks) is automatically interned. String-Valued Constant expressions, such as several String literals concatenated together, are automatically interned as well.

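Interning a String places it in a cache of Strings maintained by the Virtual Machine. In HotSpot up to Java 6, that cache lived in a section of memory called the permanent generation; starting with Java 7, interned Strings are stored in the main part of the heap. Interned Strings are not necessarily kept for the life of the Virtual Machine: one that is no longer referenced can be garbage collected, although a String literal stays cached for as long as a loaded class that uses it remains loaded.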

Whenever a String literal is used, if that literal has already been interned, the String in the interned cache will be used, rather than instantiating a new String object.

Strings that are not created from String literals, such as Strings created with the new operator, built at run time, or read in from input streams, are not interned, even if they are identical to a String that already exists in the interned cache.
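As a rough illustration, the sketch below builds "Hello" at run time with a StringBuilder, standing in for a String read from an input stream. RuntimeStringDemo and its methods are hypothetical names. Comparing the result with the literal using == should give false, while .equals() gives true.

```java
// A minimal sketch with hypothetical names: a String assembled at run time,
// standing in for one read from an input stream.
class RuntimeStringDemo {

    // StringBuilder.toString() allocates a new String object,
    // and nothing here interns it.
    static String buildAtRuntime() {
        return new StringBuilder("Hel").append("lo").toString();
    }

    static boolean sameObjectAsLiteral() {
        return buildAtRuntime() == "Hello";      // two different objects
    }

    static boolean sameContentAsLiteral() {
        return buildAtRuntime().equals("Hello"); // identical characters
    }

    public static void main(String[] args) {
        System.out.println("== literal:      " + sameObjectAsLiteral());  // false
        System.out.println("equals(literal): " + sameContentAsLiteral()); // true
    }
}
```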

So, when we compare two String variables that were both assigned String literals with the same content, they will in fact refer to the same object in memory. This is why the first code snippet prints a == b: both variables point to the same object.
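The sharing is not limited to a single class. The Java Language Specification states that identical String literals in different classes refer to the same String object. The following sketch checks this using a nested class, which is compiled as a class of its own; LiteralSharingDemo and Greeter are hypothetical names.

```java
// A minimal sketch; LiteralSharingDemo and Greeter are hypothetical names.
class LiteralSharingDemo {

    // A nested class is compiled to its own class file, so its use of the
    // "Hello" literal is separate from the one in the enclosing class.
    static class Greeter {
        static String greeting() {
            return "Hello";
        }
    }

    static boolean literalsShareOneObject() {
        String local = "Hello";
        return local == Greeter.greeting(); // both refer to the interned "Hello"
    }

    public static void main(String[] args) {
        System.out.println("same object: " + literalsShareOneObject()); // true
    }
}
```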

Modifying the code in a small but key way can completely change the outcome, however. Consider the following modification, noting the change to how a is created:

public class Test {
    public static void main(String[] args) {
        String a = new String("Hello");
        String b = "Hello";

        if (a == b) {
            System.out.println("a == b");
        } else {
            System.out.println("a != b");
        }
    }
}

Here, you can see that variable a is created using the new operator. The literal "Hello" passed to the constructor is itself interned, just like any other String literal, but the new operator always creates a brand-new String object containing a copy of those characters, and that new object is not interned. When we subsequently assign the String literal "Hello" to b, "Hello" is already in the interned cache, so b is set to refer to that cached String. Executing this code results in the output a != b, because the two variables no longer refer to the same object in memory: a refers to the newly created copy, while b refers to the String held in the interned cache.
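The sketch below reworks the snippet slightly, storing the literal in its own variable before handing it to the constructor, so that all three references can be compared. Because every occurrence of the literal "Hello" refers to the same interned object, this should behave just like the snippet above. NewStringDemo is a hypothetical name.

```java
// A minimal sketch; NewStringDemo is a hypothetical name.
class NewStringDemo {

    // Returns {argument == b, a == b, a.equals(b)}.
    static boolean[] compare() {
        String argument = "Hello";       // the interned literal handed to the constructor
        String a = new String(argument); // new always creates a separate copy
        String b = "Hello";              // the same literal again: the cached object
        return new boolean[] { argument == b, a == b, a.equals(b) };
    }

    public static void main(String[] args) {
        boolean[] results = compare();
        System.out.println("argument == b: " + results[0]); // true
        System.out.println("a == b:        " + results[1]); // false
        System.out.println("a.equals(b):   " + results[2]); // true
    }
}
```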

While the Virtual Machine takes care of interning String literals automatically, you may choose to force other Strings to be interned. This is accomplished by invoking the .intern() method on a non-interned String. This method checks the interned cache for a String equal to it (as determined by the .equals() method). If one exists, .intern() returns a reference to that cached String; if not, the String is added to the cache and a reference to it is returned. Either way, the result refers to the single cached copy of that value.

The following code illustrates this behavior:

public class Test {
    public static void main(String[] args) {
        String a = new String("Hello").intern();
        String b = "Hello";

        if (a == b) {
            System.out.println("a == b");
        } else {
            System.out.println("a != b");
        }
    }
}

The output of this code will again be a == b. Although the new operator still creates a separate String object, calling .intern() on it returns the "Hello" String that is already in the interned cache, and it is that cached reference, not the new object, that gets assigned to a. The literal assigned to b refers to the same cached String, so both variables point to the same object.

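As such, in terms of the object the variable ends up referring to, defining a String variable and assigning it a String literal value, as in:

String a = "Hello";

is equivalent to:

String a = new String("Hello").intern();

The two forms are not literally interchangeable, though: the second still relies on the "Hello" literal and allocates an extra String object that is immediately discarded. Both simply leave a pointing at the interned "Hello".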
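To see both halves of the .intern() contract in isolation, the following sketch builds "Hello" at run time and interns it. As an assumption of the sketch, it assigns the literal first so that "Hello" is guaranteed to be in the cache before .intern() is called; if it were not, .intern() would add the run-time String itself to the cache and return it. InternDemo and its methods are hypothetical names.

```java
// A minimal sketch; InternDemo and its methods are hypothetical names.
class InternDemo {

    static String buildAtRuntime() {
        return new StringBuilder("Hel").append("lo").toString(); // fresh, non-interned object
    }

    // intern() finds the equal String already in the cache and returns it.
    static boolean internReturnsCachedString() {
        String literal = "Hello";          // puts "Hello" in the cache first
        String runtime = buildAtRuntime();
        return runtime.intern() == literal;
    }

    // intern() returns a reference; it does not change the String it is called on.
    static boolean originalStillSeparate() {
        String literal = "Hello";
        String runtime = buildAtRuntime();
        runtime.intern();                  // result deliberately ignored
        return runtime == literal;
    }

    public static void main(String[] args) {
        System.out.println("runtime.intern() == literal: " + internReturnsCachedString()); // true
        System.out.println("runtime == literal:          " + originalStillSeparate());     // false
    }
}
```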

Notes on String-Valued Constant Expressions

As mentioned above, in addition to String literals, String-Valued Constant expressions are also automatically interned. When a variable is set to a String-Valued Constant expression such as "Hello" + " World", the compiler concatenates the parts at compile time and the result is interned as a single value. So, for example, the following code prints a == b:

public class Test {
    public static void main(String[] args) {
        String a = "Hello" + " World";
        String b = "Hello World";

        if (a == b) {
            System.out.println("a == b");
        } else {
            System.out.println("a != b");
        }
    }
}
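String-Valued Constant expressions are not restricted to String literals alone. Per the Java Language Specification, a String literal concatenated with a primitive literal such as an int is also a constant expression, and so is a chain of several literals. The following sketch checks both cases; ConstantExpressionDemo and its methods are hypothetical names.

```java
// A minimal sketch; ConstantExpressionDemo and its methods are hypothetical names.
class ConstantExpressionDemo {

    // A String literal plus an int literal is still a constant expression,
    // so the compiler folds it into the single literal "Answer: 42".
    static boolean mixedConstantIsInterned() {
        String a = "Answer: " + 42;
        String b = "Answer: 42";
        return a == b;
    }

    // Any number of literal parts are folded into one interned String.
    static boolean multiPartConstantIsInterned() {
        String a = "Hel" + "lo" + ", " + "World";
        String b = "Hello, World";
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(mixedConstantIsInterned());     // true
        System.out.println(multiPartConstantIsInterned()); // true
    }
}
```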

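However, it is worth noting that if part of the expression is an ordinary variable, the concatenation is performed at run time and the result is not interned, even if that variable itself refers to an interned String. (The exception is a variable declared final and initialized with a constant expression, known as a constant variable, which the compiler folds into the constant expression.) The following example prints a != b: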

public class Test {
    public static void main(String[] args) {
        String x = "X"; // an ordinary (non-final) variable
        String a = "Hello " + x + " World";
        String b = "Hello X World";

        if (a == b) {
            System.out.println("a == b");
        } else {
            System.out.println("a != b");
        }
    }
}
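The sketch below contrasts three variations on the example above: the ordinary variable, a variable declared final with a constant initializer (a constant variable, which the compiler folds in), and an explicit call to .intern() on the run-time result. VariableConcatenationDemo and its methods are hypothetical names.

```java
// A minimal sketch; VariableConcatenationDemo and its methods are hypothetical names.
class VariableConcatenationDemo {

    // An ordinary local variable is not a constant, so the concatenation
    // happens at run time and produces a new, non-interned String.
    static boolean plainVariable() {
        String x = "X";
        String a = "Hello " + x + " World";
        return a == "Hello X World";
    }

    // A final variable initialized with a constant expression is a constant
    // variable, so the whole expression is folded at compile time.
    static boolean finalConstantVariable() {
        final String x = "X";
        String a = "Hello " + x + " World";
        return a == "Hello X World";
    }

    // Interning the run-time result explicitly yields the cached String.
    static boolean internedResult() {
        String x = "X";
        String a = ("Hello " + x + " World").intern();
        return a == "Hello X World";
    }

    public static void main(String[] args) {
        System.out.println("plain variable: " + plainVariable());         // false
        System.out.println("final constant: " + finalConstantVariable()); // true
        System.out.println("intern():       " + internedResult());        // true
    }
}
```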

Conclusion

In truth, it is pretty rare that a developer will have to deal with, much less worry about, the intricacies of interned Strings. Most of the time this topic is largely an interesting bit of trivia. As long as the general rule of thumb is followed, comparing object contents with the .equals() method, everything will work as expected. Interning simply serves as an optimization: identical literals share a single String object, which saves memory and can make comparisons cheaper. The .equals() method of the String class checks whether the two references are identical (using ==) before any other test, so when both refer to the same object it can skip comparing the actual contents of the Strings, which would be more time consuming.
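The short-circuit described above can be sketched as follows. This is a hypothetical helper written for illustration, not the JDK's actual String.equals() source, but it follows the same idea: check identity with == first, and only fall back to comparing characters when the references differ.

```java
// A minimal sketch; identityFirstEquals is a hypothetical helper,
// not the JDK's String.equals() implementation.
class IdentityFirstEqualsDemo {

    static boolean identityFirstEquals(String left, String right) {
        if (left == right) {
            return true;  // same object (or both null): no need to look at the characters
        }
        if (left == null || right == null || left.length() != right.length()) {
            return false; // cheap checks before the character-by-character loop
        }
        for (int i = 0; i < left.length(); i++) {
            if (left.charAt(i) != right.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String interned = "Hello";
        String copy = new String("Hello");
        System.out.println(identityFirstEquals(interned, "Hello")); // true, via the == fast path
        System.out.println(identityFirstEquals(interned, copy));    // true, via the loop
        System.out.println(identityFirstEquals(interned, "Hellp")); // false
    }
}
```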

That being said, if a developer were writing an application that compared a large number of String objects that were not created as String literals, and many of those String objects were likely to be identical, it may very well be worthwhile to explicitly intern those Strings via the .intern() method in order to potentially improve the performance of the application.
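As a minimal sketch of that idea, assuming the Strings arrive as a space-separated line of text, the hypothetical helpers below create a fresh String per token and optionally intern it, then count how many distinct objects (by identity, using an IdentityHashMap) end up in the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// A minimal sketch; readTokens and distinctInstances are hypothetical helpers.
class InternTokensDemo {

    // Simulates Strings arriving at run time: each token becomes a fresh
    // String object, even when its contents repeat an earlier token.
    static List<String> readTokens(String input, boolean intern) {
        List<String> tokens = new ArrayList<>();
        for (String token : input.split(" ")) {
            String copy = new String(token); // force a distinct object per token
            tokens.add(intern ? copy.intern() : copy);
        }
        return tokens;
    }

    // Counts distinct objects by identity (==), not by content.
    static int distinctInstances(List<String> tokens) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(tokens);
        return identities.size();
    }

    public static void main(String[] args) {
        String line = "red green red blue green red";
        System.out.println("without intern(): " + distinctInstances(readTokens(line, false))); // 6
        System.out.println("with intern():    " + distinctInstances(readTokens(line, true)));  // 3
    }
}
```

With interning, equal tokens share a single object, so the identity check at the start of .equals() succeeds immediately for them. Whether that pays off in a real application is something to measure rather than assume.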

I hope I have been able to shed some light on the inner workings of interned Strings and what their purpose is. It always pays to know what is going on under the hood.

References

Java Language Specification Section 3.10.5

About the Author


Daniel Morton is a Software Developer with Shopify Plus in Waterloo, Ontario and the co-owner of Switch Case Technologies, a software development and consulting company.  Daniel specializes in Enterprise Java Development and has worked and consulted in a variety of fields including WAN Optimization, Healthcare, Telematics, Media Publishing, and the Payment Card Industry.