Wednesday, October 28, 2009

Strict Ranges?

Note: This article was created before Scala 2.8.0 was out, and is mostly related to a problem that is no longer present as of that version. The code in here is supposed to be run on Scala 2.7.x unless otherwise noted, and the output of Scala 2.8.0 usually refers to some work-in-progress version of it, not the final version.

A common mistake newcomers to Scala usually make is to treat Scala ranges as strict. It's easy to see why they make such mistake, because the very expression "strict", though relatively easy to find in the Scala API documentation, is unfamiliar to them.

So, before getting into the problem, I'll make a brief detour. In functional languages, there is a concept of "strict functions" and "non-strict functions". Informally, a "strict function" is one which always evaluates its argument. For programmers from a non-functional background, that's most of the functions they ever saw, but not all. For instance, the trinary operator, "test ? f1 : f2", does not evaluate all its arguments, f1 and f2, even though it always evaluate at least one of them.

Now, to see where that can lead to trouble, let's first see an example of non-strictness in action, with Scala 2.7:

scala> object X {
| val x = -10 to 10 map (2 / _)
| }
defined module X

scala> object Y {
| val y = List.range(-10, 10) map (2 / _)
| }
defined module Y

scala> Y.y(0)
java.lang.ArithmeticException: / by zero
at Y$$anonfun$1.apply(<console>:5)
at Y$$anonfun$1.apply(<console>:5)
at scala.List.map(List.scala:812)
at Y$.<init>(<console>:5)
at Y$.<clinit>(<console>)
at .<init>(<console>:6)
at .<clinit>(<console>)
at RequestResult$.<init>(<console>:3)
at RequestResult$.<clinit>(<console>)
at RequestResult$result(<console>)
at sun.reflect.Nativ...
scala> X.x(0)
res2: Int = 0

We see an error happens when the map on the list is applied, but not when the map on the range is applied. That's because the function "map" for Range (the result of "x to y" or "x until y" expressions) is non-strict. I can get the value for any part of that Range, except the eleventh element, without throwing an exception.

But let say we have a function expensiveComputation, which takes quite a while to process. What happens, then, if I do this:

scala> import scala.actors.Futures._
import scala.actors.Futures._

scala> val m = 1 to 1000 map (i => future(expensiveComputation(i)))
m: RandomAccessSeq.Projection[scala.actors.Future[Int]] = RangeM(<function>, <function>, <function>,
<function>, <function>, <function>, <function>, <function>, <function>, <function>, <function>, <fu
nction>, <function>, <function>, <function>, <function>, <function>, <function>, <function>, <functi
on>, <function>, <function>, <function>, <function>, <function>, <function>, <function>...

If you haven't seen it before, the "future" function uses threads to compute the function you passed to it. Your program can go on to do other stuff, while all that work is being done concurrently.

Except, of course, that's not what's happening. No computation is being done, because Range's non-strict map hasn't evaluated its arguments. In fact, each time I call "apply" to get the result of the computation, the function I passed to map (the future) will be computed again! For example:

scala> def expensiveComputation(n: Int) = {
|   println("Starting expensive computation")
|   val x = n * 2
|   println("Finished expensive computation")
|   x
| }
expensiveComputation: (Int)Int

scala> val m = 1 to 10 map (i => future(expensiveComputation(i)))
(output from "m.toString" skipped)

scala> m(0).apply
Starting expensive computation
Finished expensive computation
res12: Int = 2

scala> m(0).apply
Starting expensive computation
Finished expensive computation
res13: Int = 2

To see where that can be a problem to beginners, take a look at the code in this question at Stack Overflow:

def ttest() = {
val threads =
for (i <- 1 to 5)
yield new Thread() {
override def run() {
println("going to sleep")
Thread.sleep(1000)
println("awake now")
}
}

threads.foreach(t => t.start())
threads.foreach(t => t.join())
println("all done")
}

Can you imagine what happens?

Well, gladly, Scala 2.8 should have a strict Range, so newcomers will be spared the trouble and lost debugging time. Here's what this will look like with Scala 2.8:

scala> val m = 1 to 10 map (i => future(expensiveComputation(i)))
Starting expensive computation
Finished expensive computation
Starting expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
m: scala.collection.immutable.Vector[scala.actors.Future[Int]] = NewVectorImpl(<function0>, <function0>, <function0>, <f
unction0>, <function0>, <function0>, <function0>, <function0>, <function0>, <function0>)
Finished expensive computation
Starting expensive computation
Finished expensive computation

scala> Finished expensive computation


scala> m(0).apply
res11: Int = 2

scala> m(0).apply
res12: Int = 2

Also, if you try out the first code snippet on Scala 2.8, "x.x(0)" will also throw an exception. Now, if we want the previous behavior, we can just call the "view" method. It will work not only on Range, but on pretty much all standard collections, as it is defined on the class IterableLike.

scala> val m = (1 to 10 view) map (i => future(expensiveComputation(i)))
m: scala.collection.SeqView[scala.actors.Future[Int],Seq[_]] = SeqViewM(...)

scala> m.toList
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Starting expensive computation
Starting expensive computation
Starting expensive computation
Finished expensive computation
Starting expensive computation
Finished expensive computation
Finished expensive computation
Finished expensive computation
Finished expensive computation
Starting expensive computation
Starting expensive computation
Finished expensive computation
res5: List[scala.actors.Future[Int]] = List(, , , , , 
, , , , )
Finished expensive computation

scala> m(0).apply
Starting expensive computation
Finished expensive computation
res6: Int = 2

scala> m(0).apply
Starting expensive computation
Finished expensive computation
res7: Int = 2

And, for the first example:

scala> object X {
| val x = (-10 to 10 view) map (2 / _)
| }
defined module X

scala> object Y {
| val y = List.range(-10, 10).view map (2 / _)
| }
defined module Y

scala> Y.y(0)
res0: Int = 0

scala> X.x(0)
res1: Int = 0

And, speaking of Scala 2.8 changes, there is one that is also likely to reduce confusion among newcomers, and is related to what we spoke of. It's related to how guards on for-comprehensions work. For instance, take a look at this code:

scala> def find(n: Int, l: List[Int]) = {
| var found = false
| for (el <- l; if !found) found = el == n
| found
| }
find: (Int,List[Int])Boolean

scala> find(5, List.range(1, 10))
res14: Boolean = false

This code doesn't work as expected because of how for-comprehensions are translated:

l.filter(el => !found).foreach(el => found = el == n)

Since "filter" for List is a strict method, this just doesn't work, as "filter" evaluates before "foreach". However, Range's "filter" on Scala 2.7 is non-strict. Does it work? Yes:

scala> def findR(n: Int, l: Range) = {
| var found = false
| for (el <- l; if !found) found = el == n
| found
| }
findR: (Int,Range)Boolean

scala> findR(5, 1 to 10)
res17: Boolean = true

But, back to the first example, as of Scala 2.8 the for-comprehension code will translate like this:

l.withFilter(el => !found).foreach(el => found = el == n)

This new function, "withFilter", will be non-strict for standard Scala collections. There's no way, however, to ensure that third-party collections will preserve the non-strictness. Hopefully, people will act in a sensible manner! :-) At any rate, the "find" function, as written, will work on Scala 2.8.

I hope this helps people working with Scala 2.7, and hope even more for Scala 2.8 to arrive soon! As a parting thought, though, I'd like to briefly speak of "<-", seen in for-comprehensions. I have seen people calling it many things. Martin Odersky's (et al) Programming in Scala book, however, calls it "in". So "for (el <- l)" ought to read "for el in l". So, if you weren't sure what to call it, now you know! Happy coding!



8 comments:

  1. In the first code snippet, which version of Scala are you running that under? I just downloaded the 10/28 build of 2.8 and X.x(0) is causing a ArithException.

    ReplyDelete
  2. Scala 2.7.4. Yes, on Scala 2.8 it will cause errors, because Range became strict on Scala 2.8 -- why is the reason for this blog, in fact. I'll make a note about the Scala version, to be clear.

    ReplyDelete
  3. Thanks for the answer. Should have read the rest of the blog before asking questions w/ obvious answers. BTW: I really enjoy your blog entries. Keep them coming.

    ReplyDelete
  4. In your X and Y examples you probably mean calling X.x() and Y.y() on something other than zero, because regardless of range strictness if you call these functions on 0 you get division on zero.

    ReplyDelete
  5. You get division on X.x(10) and Y.y(10).

    ReplyDelete
  6. I mean, you get division by zero on X.x(10) and Y.y(10), as the range goes from -10 (index 0) to 10 (index 20).

    ReplyDelete
  7. Daniel, the view example seems to have a copy-paste-#fail in the output (if it is supposed to be 2.8)

    //Örjan

    ReplyDelete
  8. Örjan,

    Thanks for reporting it. I corrected the example I think you were speaking of (one which displayed wrong output anyway). Scala 2.8.0 had not been released yet, and some changes in the REPL made that example look different -- though the explanations on how the code is supposed to work are still correct.

    I also put in a note at the beginning, since this article is effectively useless nowadays. Newcomers are using Scala 2.8, and people using Scala 2.7 are rarely newcomers anymore. :-)

    ReplyDelete