Thursday, May 28, 2009

Scala's Projections

While reading a cool posting by Chris Smith regarding F#, I noticed something he did not remark on, which is very common in functional programming languages. In his code, he makes a sequence from 3 to 3,628,800, and then does a lot of stuff on it. He transforms everything into strings, computes stuff from that string, etc. Here's a Scala equivalent:


object Factorial {
private [Factorial] class Fact(n : Int) {
def ! : Int = if (n <= 1) 1 else (n * new Fact(n-1).!)
}
implicit def toFact(n : Int) : Fact = new Fact(n)
}

import Factorial._

println((3 to (10!))
map (n => (n, (n.toString
map (_ asDigit)
map (_ !)
reduceLeft (_+_))))
map (p => (p._1, p._1 == p._2))
filter (_._2)
map (_._1)
reduceLeft (_+_)
)


Go ahead, open an interpreter and try it. The factorial stuff was just for fun -- a simpler definition would work more efficiently, in fact. Now, type the following:

(3 to 3628800).toList

Depending on your Java memory settings, you'll get an Out of Memory error. That's just too much data to store! So, how come you can not only generate that(*), but actually do a bunch of stuff on it?

This magic happens because those map and filter functions are non-strict. A strict function will compute every value and return its result. A non-strict function will return immediately, and each value will only be computed on demand.

As it happens, you are probably familiar with this concept at another level. Take, for instance, the following two declarations:

val ex1 = 2 + 3
def ex2 = { 2 + 3 }

It must be obvious that ex1 == ex2. It should be clear, too, that "2 + 3" was evaluated before its assignment to ex1, but that, in ex2's case, it only gets evaluated when you use ex2 somewhere (like in "ex1 == ex2"). In Scala, another variation exists:

lazy val ex3 = 2 + 3

With that declaration, ex3 will only be evaluated when you use it. For example, when the interpreter calls toString on it to display its result. But try this:

lazy val ex3 = { println("here!"); 2 + 3 }; println("Not there yet")

You'll see that the second println statement is executed before the first one. Only afterwards, when the interpreter calls toString, is the first one executed. For many, if not most, functional languages, that's how ALL expressions work: lazily.

Anyway, back to ex2, please note that ex2 is a strict function, because it computes "2 + 3" before it returns that value. Then again, there isn't really anything you can do with an Integer that does not require evaluation. But think about a List. You do not need to compute the whole list just to call an isEmpty method on it, for instance. Just the head will do.

This is what happens in our example. The expression "3 to (10!)" generates a Range. If you look up Scala's Library documentation on the class Range you'll see that it inherits from Projection, and that Projection has a few "non-strict" methods, and that they return Projections too. Among them are map and filter, the very same methods I'm using in most of that code.

The method reduceLeft, though, is strict. So, what happens is this:

  1. reduceLeft gets the "head" of the projection it receives, forcing map (_._1) to compute it
  2. map gets its value from filter (_._2)
  3. filter now starts iterating on the projection it received from the map above it, searching for the first value satisfying its predicate.
  4. as each element tested by filter, both maps above it, and the range itself, compute the next value
  5. after filter finds its first element, it delivers it, and it goes all the way back to reduceLeft
  6. reduceLeft checks if the tail of the projection it received is empty
  7. and that forces all the maps and filters to act again
Now, as soon as each element gets consumed by reduceLeft, they get disposed by the garbage collector, because they won't be used anymore. The elements still to be consumed haven't been computed yet, so they don't take any memory either. The only elements taking space in memory are the ones in the act of being processed. And, of course, the data structures used by the projection to make its magic possible.

Compare that to this code:

println((3 to (10!)).force
map (n => (n, (n.toString
map (_ asDigit)
map (_ !)
reduceLeft (_+_))))
map (p => (p._1, p._1 == p._2))
filter (_._2)
map (_._1)
reduceLeft (_+_)
)

Here, we use the method force to get a strict sequence. Even before the first map is executed, a complete sequence of all numbers between 3 and 3,628,800 will be stored in memory. Or, rather, will fill the heap and cause an exception.

In conclusion, I'd like reinforce that Scala is NOT non-strict by default. It has non-strict collections which can be used within limits. In other functional languages, not even reduceLeft would cause anything to be computed: it would be println causing that. Still, Scala's Projections are powerful tools, which the smart Scala programmer will use wisely to his or her advantage.


(*) I had originally used "3 to 3628800" instead of "(3 to 3628800).toList", which causes out of memory error on Scala 2.7.4 just fine, but Scala 2.8 is a bit smarter about it, and Range's toString method avoids fully evaluating it, for reasons I hope are now obvious. Thus, I changed the example to something which forces a strict evaluation of the range.