David R. MacIver's Blog: Turn your toString methods inside out

Turn your toString methods inside out

14 October 2007

All examples in this post will be written in a pseudo-dialect of Scala. Hopefully they should be easy to translate into your favourite programming language (or Java). I also haven’t bothered to compile any of them as they’re mostly not entirely valid. Feel free to point out errors.

Consider the following code:

class List[T]{
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = "[";

```scala
while(it hasNext){
  result = result + (it next);
  if (it hasNext) result = result + ", ";
}
result + "]"
```scala
  }
}

What’s wrong with it?

Well, as you presumably know, concatenating two strings of length m and n is an O(m + n) operation (In Haskell or ML it would be an O(m) operation, so this can be made more efficient, but the basic point will still remain). This means we’ve accidentally made an O(n^2) toString algorithm. Oops.

So, the traditional response is:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = new StringBuilder();

```scala
while(it hasNext){
  result.append(it next);
  if (it hasNext) result.append(", ");
}
result.append("]").toString;
```scala
  }
}

Great! We’ve removed all those expensive string concatenations.

Now, what happens if we call toString on a List[List[String]]? Umm...

Now, consider the following code snippet:

  println(myReallyLongList);
```scala

Let's unpack what's going on in it.

```scala
  val it = myReallyLongList.elements;
  var result = new StringBuilder();

  while(it hasNext){
    result.append(it next);
    if (it hasNext) result.append(", ");
  }
  println(result.append("]").toString);

So, we’ve created a big intermediate string via a StringBuilder, then printed it, discarding the string after that. Right?

Wouldn’t it be great if we’d written the following code instead?

  val it = myReallyLongList.elements;

  while(it hasNext){
    print(it next);
    if (it hasNext) print(", ");
  }
  println("]");

No intermediate structures created at all. And note that the code used to print is almost exactly the same as the code used to append to the StringBuilder.

Conveniently there’s a useful little interface in java.lang which people tend to ignore. If not, we’d have had to write wrappers. In particular this is a superclass of Writer, PrintStream, StringBuilder and StringBuffer. So, let’s rewrite the above code:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable){
    val it = this.elements;

```scala
while(it hasNext){
  ap.append(... // err. What do we do here?
```scala
```scala

We could just do ap.append(it next toString). But that doesn't solve the first
problem - when we nest these things we're creating a lot of intermediate
strings and then immediately throwing them away, not to mention having once
again introduced a hidden O(n\^2) factor. Sadness. :(

Let's do the following:

```scala
  trait Append{
    def appendTo(ap : Appendable) : Appendable;

```scala
override def toString = appendTo(new java.lang.StringBuilder()) toString;    
```scala
  }

  object Appending{
    def append(any : AnyRef, ap : Appendable){
      if (any.isInstanceOf[Append]) any.asInstanceOf[Append].appendTo(ap);
      else ap.append(any toString)
    }
  }

Now we can write it as:

class List[T] extends Append{
  import Appending._;
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable) = {
    val it = this.elements;
    while(it hasNext){
      append(it next, ap);
      if (it hasNext) ap append(", ");
    }
    ap.append("]");
  }
}

Now, no matter how deeply we nest things, we’ll get things printed in a manner with completely consistent performance - no hidden gotchas.

There are also other benefits to structuring things this way. If you make everything work based on an API that looks like this you’ll tend to write things which work by injecting filters in reading and writing code. And, hey, suddenly all your code works completely transparently when you discover that you need to work with things that are e.g. read off the network, backed by something on the file system, etc. and really need a streaming version of the library.

Also note that I’m not saying “Strings are bad”. There are a lot of cases where what you need really is a persistently available string. Then, by all means, use toString! But even then this is helpful, as your toString code will work a lot better and more consistently than it might otherwise have done.

Comments

google.com on 2007-10-15 00:04:00:

In C++, this approach has been the standard way to do these things since at least 1991.

- Eelis

David R. MacIver on 2007-10-15 22:42:00:

Oh well. I didn’t expect it was new. :-) Just a good idea which seems to be largely ignored in the Java (and .NET, and Scala...) communities

Maybe I should learn C++. This isn’t the first time one of my posts has been responded to with “Yeah, we already do that in C++”.

Jan on 2008-02-19 21:09:00:

This is also the way things are done in Haskell! Take a look at the type class Show and the type ShowS. Wouldn’t it be nice if toString() was implemented as this.appendTo(new StringBuilder()) or something like that?

David R. MacIver on 2008-02-19 21:18:00:

Well, it’s pretty near the way things work in Haskell, yes.

The Haskell approach is more analagous to always using a StringBuilder to build up your intermediate strings then doing toString only at the end than it is to using the Appendable.

In order to replicate this more exactly in Haskell you’d probably want to define a (Monad m) => MonadPrint m type class with a function print :: String -> m () and define instances for IO and State (String -> String) (the latter would encapsulate the current showS stuff).

Jan on 2008-02-19 21:39:00:

True! But when I did Haskell on a more regular basis, I didn’t like polluting my types with IO or other monads - I tried to confine them to main as far as possible.

Arguably, the ShowS approach fits in nicer with the stance of first preparing your output in a pure way and only then interacting with the outside world.

Also, the idiom of writing

shows (CompoundThingy a b) = shows ‘{’ . shows a . shows ‘,’ . shows b. shows ‘}’

is nice and opens up possibilities for abstracting the boilerplate away. (Sorry for bad formatting)

David R. MacIver on 2008-02-19 22:01:00:

Sure. I don’t have a problem with the showS approach - it goes most of the way towards solving the problem and is vastly simpler than a full solution would be. I’m just saying, it’s not quite the same thing. :-)

Monads aren’t inherently impure though. A state monad is just a wrapper around a parameter passing function after all.