Java/C++ Performance, STL, Strings
Many years ago I wrote Why Java Will Always Be Slower than C++. Like any article that says something “bad” about somebody’s favorite programming language, it was controversial, and a torrent of email spit came my way.
The problem mostly boils down to Java bytecode not supporting “value” types (a misleading name; “embedded” types would be better) other than primitive types such as int and double. Closely related is the JVM’s lack of support for generics. Gosling proposed both in his (now deleted) paper on numerics, but neither was ever implemented.
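To make “embedded” concrete, here is a minimal C++ sketch. `Point`, `Segment` and `BoxedSegment` are hypothetical types made up for illustration, and `BoxedSegment` only emulates what a JVM does with object fields (each field is a reference to a separate heap object). It is not real JVM code. The embedded `Segment` stores its two points inline, so an array of segments is one contiguous run of doubles.

```cpp
#include <memory>
#include <vector>

// Embedded ("value") layout: the Points live inside the Segment itself,
// with no object header and no pointer to chase.
struct Point { double x, y; };
struct Segment { Point a, b; };

// Assumed emulation of the Java layout: every Point is its own heap object,
// and the container holds references to heap-allocated segments.
struct BoxedSegment {
    std::unique_ptr<Point> a, b;
};

// Walks memory linearly: one contiguous block of doubles.
double sum_x(const std::vector<Segment>& segs) {
    double s = 0;
    for (const Segment& seg : segs) s += seg.a.x + seg.b.x;
    return s;
}

// Same computation, but three pointer dereferences per element
// (segment, then each of its two points), each a potential cache miss.
double sum_x(const std::vector<std::unique_ptr<BoxedSegment>>& segs) {
    double s = 0;
    for (const auto& seg : segs) s += seg->a->x + seg->b->x;
    return s;
}
```

Both functions compute the same result. The difference lies only in how many separate memory locations the loop has to touch.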
I was reminded of my article yesterday while reading the page Why We Chose C++ Over Java on the Hypertable wiki. They needed some large data structures, and Java came out 2-3 times slower than C++ in their tests. A comment on that page about microbenchmarks, by vicaya, hits the nail right on the head.
Reading that text also reminded me how awesome STL is if you are obsessed with performance. Yes, the begin()/end() thing is full of pain. Yes, iterators need to be replaced [pdf] with ranges. But if you have a large data structure that you want to pack as tightly as possible, so that it makes better use of your caches, and that you want to access as “lightweightly” as possible, it’s hard to beat STL.
(Example: I find myself writing the following pattern over and over. You have a large dictionary. If a key exists in it, you want to tweak the corresponding value slightly; if it doesn’t, you want to create a new key/value pair. With a .NET Dictionary this is a TryGetValue call followed by a separate write to the dictionary. With STL, if the key exists, find returns an iterator pointing to the entry, and you can tweak the value in place instead of traversing the path to it again with an independent write.)
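A minimal sketch of that pattern, assuming a `std::unordered_map<std::string, int>` as the dictionary and a hypothetical “tweak” that adds a delta. The first function follows the find-then-modify-through-the-iterator approach described above. The second uses C++17 `try_emplace`, which is newer than this post. It returns an iterator to the new or existing element, so both the hit and the miss case need only one lookup.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: add `delta` to the value under `key`,
// or create the entry with value `delta` if the key is absent.
void add_or_insert(std::unordered_map<std::string, int>& dict,
                   const std::string& key, int delta) {
    auto it = dict.find(key);      // one hash lookup
    if (it != dict.end()) {
        it->second += delta;       // hit: tweak the value in place via the iterator
    } else {
        dict.emplace(key, delta);  // miss: create a new key/value pair
    }
}

// C++17 variant: try_emplace inserts only when the key is absent and always
// returns an iterator to the element, so no second lookup is ever needed.
void add_or_insert_17(std::unordered_map<std::string, int>& dict,
                      const std::string& key, int delta) {
    auto [it, inserted] = dict.try_emplace(key, delta);
    if (!inserted) {
        it->second += delta;
    }
}
```

In the first version the miss case still performs a second lookup inside `emplace`. The second version avoids that, which matters when misses are common.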
And right after reading the Hypertable article I looked at some C++ code and was reminded how much C++ strings suck (I’m talking about performance here, but they suck on so many different levels). Java and .NET basically get it right: strings are immutable, and a separate StringBuffer/StringBuilder class represents a mutable string. C++ strings are mutable, are copy-on-write in common implementations such as GCC’s libstdc++, and have an absolutely awful performance profile. Back in the ’90s there was a proposal to make strings immutable and use ropes for mutable string buffers, but the C++ committee didn’t want to break compatibility. Big mistake.
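A rough sketch of the Java/.NET split, written in C++ under stated assumptions. `ImmutableString` and `join` are hypothetical names, not a real library API. The immutable value shares one never-modified buffer through `std::shared_ptr<const std::string>`, so copying it copies a pointer. A plain `std::string` plays the StringBuilder role while the text is being assembled.

```cpp
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical immutable string: the characters are shared and never
// modified after construction, so copies are cheap and always safe.
class ImmutableString {
public:
    explicit ImmutableString(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    std::string_view view() const { return *data_; }
    std::size_t size() const { return data_->size(); }

    // True if both values point at the same underlying buffer.
    bool shares_buffer_with(const ImmutableString& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;
};

// std::string used purely as a mutable builder, then frozen once at the end.
ImmutableString join(std::initializer_list<std::string_view> parts, char sep) {
    std::size_t total = 0;
    for (std::string_view p : parts) total += p.size() + 1;  // upper bound
    std::string builder;
    builder.reserve(total);  // a single allocation up front
    bool first = true;
    for (std::string_view p : parts) {
        if (!first) builder += sep;
        builder.append(p.data(), p.size());
        first = false;
    }
    return ImmutableString(std::move(builder));  // no mutation after this point
}
```

For example, `join({"a", "bc", "d"}, ',')` yields a value whose `view()` is `"a,bc,d"`, and copying it shares the buffer. The atomic reference count in `shared_ptr` has its own cost, so this shows the shape of the interface rather than a tuned implementation.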

18. December 2009 at 14:50
> Closely related to that is JVM’s lack of support for generics.
Presumably what you’re referring to here is the fact that generics are implemented using type erasure? I assume you know that generic types were added to the language in Java 1.5?
20. December 2009 at 09:06
Donal,
Yes. Java’s/JVM’s support for generics is a joke.
For reference types this is really not that bad, because class-to-interface casts are fast, and for generic interface dispatch both the JVM and the CLR ditch C++-style vtables in favor of self-modifying code that checks the last-used type and falls back to a hashtable, in order to make better use of caches.
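A toy sketch of the “last-used type, then hashtable” idea (a monomorphic inline cache with a hashtable fallback). Real VMs patch machine code at each call site. Here the cache is just a struct, and `Object`, the integer class ids, `CallSiteCache` and the interface method `sides` are all hypothetical.

```cpp
#include <unordered_map>

// Hypothetical object model: each object carries a small class id,
// standing in for the class pointer in a JVM/CLR object header.
constexpr int kTriangle = 1;
constexpr int kSquare = 2;
struct Object { int class_id; };

using SidesFn = int (*)(const Object*);
int triangle_sides(const Object*) { return 3; }
int square_sides(const Object*) { return 4; }

// Slow path: a global hashtable from class id to the implementation
// of the hypothetical interface method `sides`.
const std::unordered_map<int, SidesFn>& sides_table() {
    static const std::unordered_map<int, SidesFn> table = {
        {kTriangle, &triangle_sides},
        {kSquare, &square_sides},
    };
    return table;
}

// One cache per call site: remembers the last receiver class and its target.
struct CallSiteCache {
    int cached_class = -1;
    SidesFn cached_target = nullptr;
    int misses = 0;

    int call(const Object* receiver) {
        if (receiver->class_id == cached_class) {
            return cached_target(receiver);  // fast path: one compare, direct call
        }
        ++misses;  // slow path: hashtable lookup, then re-prime the cache
        cached_target = sides_table().at(receiver->class_id);
        cached_class = receiver->class_id;
        return cached_target(receiver);
    }
};
```

Calling `site.call` three times with a triangle and then once with a square gives two misses: the first call and the class change. All other calls take the single-compare fast path, which is the common case at call sites that see one receiver type.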
But for value types it’s awful. Boxing value types because you want to store them in a container and then doing a virtual dispatch on them is a crapload of overhead.
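The cost can be sketched in C++ by emulating boxing. This is an assumption for illustration: `Boxed` and `BoxedInt` mimic a JVM container of `Integer`, where every element is a separate heap object reached through a pointer and read through a virtual call. They are not a real JVM structure. The unboxed version keeps the ints packed contiguously with no indirection and no dispatch.

```cpp
#include <memory>
#include <vector>

// Emulated box: one heap object per value, accessed through a virtual call.
struct Boxed {
    virtual ~Boxed() = default;
    virtual long value() const = 0;
};
struct BoxedInt : Boxed {
    explicit BoxedInt(int v) : v_(v) {}
    long value() const override { return v_; }
    int v_;
};

// Unboxed: values packed contiguously, read directly.
long sum_unboxed(const std::vector<int>& xs) {
    long s = 0;
    for (int x : xs) s += x;
    return s;
}

// Boxed: per element, a pointer dereference plus a virtual dispatch.
long sum_boxed(const std::vector<std::unique_ptr<Boxed>>& xs) {
    long s = 0;
    for (const auto& x : xs) s += x->value();
    return s;
}
```

Both functions return the same sum. The boxed one additionally pays for a heap allocation per element when the container is filled, and for an indirect call per element when it is read.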
Dejan
3. February 2010 at 07:12
Thanks for having the guts to speak the truth in the face of what was once a mere marketing machine and is now the only gospel so many have grown up with.
Regardless of “who is right”, the kind of thoughtfulness you show is the way such things should be done.