Thursday, February 16, 2012

The average Scala problem

One of my long-term favourite interview sections is the bit where we develop a Java method to calculate the average of a set of numbers. Oddly, even after 5 years or more, I still hear different things coming back. Recently, for instance, the first thing the candidate said was that they'd think about breaking the summation across several threads. Now, regardless of whether optimising for performance with concurrency before the interface is defined is a good sign, it set me thinking about how hard this would be in Scala. In Java it's obviously possible, but I wouldn't have considered it, because the thread management would obscure the limited functional code (I think). Scala, though, with its share-nothing, message-based approach, could maybe do it with a small amount of verbosity.

Before I look at the concurrent solution, 'average' is an interesting thing to do in Scala anyway, because it takes you into the functional arena. With a traversable collection, you have the foldLeft method. You provide an initial value and an operator, and foldLeft applies the operator to each item in the collection in turn, with the result carried forward. So to sum a collection, you provide foldLeft with 0.0 (the starting value for the running total) and the binary operator '+', like this:

def averageStraight(values: Array[Double]): Double = {
  values.foldLeft(0.0)(_ + _) / values.length
}

You can probably do this more economically (in fact, Scala provides a sum method anyway, as part of the traversable operations for numeric types, but where's the fun in that?). The compiler will often figure out the type, but here it seems I need to be quite explicit about the fact that I'm using Doubles. So, foldLeft takes a first parameter list consisting of the initial value; the second parameter list then contains just the '+' operator - note the use of _ as an unnamed (...and untyped) placeholder parameter.
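For what it's worth, the underscore version is just shorthand; here's the same fold with an explicitly named two-argument function, plus the built-in sum shortcut (the method names here are mine, for illustration):

```scala
// The '_ + _' shorthand expands to an explicit two-argument function:
def averageExplicit(values: Array[Double]): Double =
  values.foldLeft(0.0)((runningTotal, value) => runningTotal + value) / values.length

// And the built-in shortcut, for when the fun runs out:
def averageWithSum(values: Array[Double]): Double =
  values.sum / values.length
```

Both produce the same result; foldLeft is just the more general tool.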

So, having got over the obstacle of the 'normal' version, this is what I wrote for the concurrent version:

def averageConcurrent(values: Array[Double], blocks: Int): Double = {
  import scala.actors.Actor._ // brings actor, self and receive into scope

  val caller = self

  val blockSize = values.length / blocks

  (0 until blocks).foreach(block => actor {
    val from = block * blockSize
    // the last block absorbs any remainder left over by the integer division
    val until = if (block == blocks - 1) values.length else from + blockSize
    caller ! values.slice(from, until).foldLeft(0.0)(_ + _)
  })

  var sum = 0.0

  for (i <- 0 until blocks) {
    receive {
      case partialSum: Double => sum += partialSum
    }
  }

  sum / values.length
}

We pass in the array of values to average, and the number of concurrent actors we want. We figure out how much of the master array each actor should sum (the final actor may end up with a different number of elements, making the code a tad ugly), then kick off each actor with its section of the array. Then we just wait for the partial sums to come back as messages, and divide by the total number of elements.
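To make the slicing arithmetic concrete, here's the set of block boundaries it produces for a small array - a standalone sketch (blockRanges is my name, not part of the code above):

```scala
// For 10 values in 3 blocks: blockSize = 10 / 3 = 3, so the blocks cover
// indices [0,3) and [3,6), and the last block absorbs the remainder, [6,10).
def blockRanges(length: Int, blocks: Int): Seq[(Int, Int)] = {
  val blockSize = length / blocks
  (0 until blocks).map { block =>
    val from = block * blockSize
    val until = if (block == blocks - 1) length else from + blockSize
    (from, until)
  }
}
```

The half-open ranges line up exactly with what slice(from, until) expects.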

Using an array of 5 million doubles, the straight version takes 0.03 seconds. I can't understand the parallel performance though - with a single block (i.e., all the overhead, and none of the benefit), it takes ~0.2 seconds, or about 10x as long. As the number of blocks increases, the time doesn't seem to change much, although it keeps a downward trend up to several hundred blocks. There's little or no I/O here, so I expected it to top out when the number of threads hit the number of cores (this is on a MacBook Air, i5 - it reports 2 cores, but maybe it does hyperthreading (or is that just Windows?) - anyway, definitely not hundreds of cores!). The timings are rubbish though - running under IntelliJ, other things in the background etc. - I need to re-run on a clean, idle system.
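For the record, the timings come from nothing more sophisticated than System.nanoTime - a sketch of the harness (the time helper is my own naming, and a proper benchmark would need JIT warm-up and many repeats):

```scala
// Crude wall-clock timing: run the block once and print elapsed seconds.
def time[A](label: String)(block: => A): A = {
  val start = System.nanoTime
  val result = block
  val elapsed = (System.nanoTime - start) / 1e9
  println(f"$label: $elapsed%.3f s")
  result
}

val values = Array.fill(5000000)(math.random)
time("warm-up")(values.foldLeft(0.0)(_ + _) / values.length)  // discard first run
time("straight")(values.foldLeft(0.0)(_ + _) / values.length)
```

Even this naive harness makes the ~10x gap obvious, but the caveats above stand.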

Final thought: would it really be so horrid in Java? A fair comparison needs the Java version, methinks.