David R. MacIver's Blog
Data structures in Java
I’ve been playing with implementing some simple data structures in Java, for later use in the lazy strings project. The results are... a bit depressing, frankly. Clearly I need to learn to optimise better, as right now I’m creating specialised data structures for certain types which turn out to perform worse than the general purpose implementations in the standard library. :-)
In particular, I threw together a simple implementation of a Trie, and it performs significantly worse than just using a TreeMap. As the number of collisions increases the relative performance of the Trie improves somewhat, eventually beating the TreeMap, but still in most situations the performance is pretty poor.
It also doesn’t help that the Trie ends up bloating the keys significantly, as you end up having to create far too many objects per character. I can definitely cut back on the object creation, but even one object per character may prove to be too much to get an efficient implementation in Java.
Comments
fusiongyro on 2007-08-05 02:29:00:
Java is usually depressing. :)
I don’t know much about what you’re doing but you might consider using
the Flyweight pattern from GoF. They talk about representing characters
in a word processor as objects, but using Flyweight to, instead of
creating a fresh object for each character, instead creating an object
for the first instance of any given character and then just pointing
back to the ones you’ve already created. It should work if it’s
sufficient for you to point at the character objects from inside your
data structure.
If it can be done and you try it, it should improve both your resident
size and your performance.
David R. MacIver on 2007-08-05 10:20:00:
Ha. Yes, that it is. I think the problem is mostly not Java though,
it’s more that I’m not yet very good at optimising things and that a
naive implementation of a cunning datastructure can be less efficient
than a heavily optimised version of a more normal datastructure.
Thanks for the suggestion. I like the idea, but it sounds like it will
only work when the objects in question are immutable. The problem here
isn’t Character objects (I’m storing char primitives on the nodes) but
the actual nodes of the decision tree, which are mutable and thus can’t
be reused like that.
I’m going to try to solve the problem by collapsing multiple nodes into
one, so that when there are long runs of characters without branches
they’ll be stored on a single node. This should hopefully reduce the
number of objects by a very large factor (exponentially, in some
cases).
David R. MacIver on 2007-08-05 11:50:00:
Update: I’ve implemented that now, and my Trie is now only twice as slow as TreeMap. :-)