Before I look at the concurrent solution, 'average' is an interesting thing to do in Scala anyway, because it takes you into the functional arena. With a traversable collection, you have the foldLeft method. You provide an initial value and an operator, and foldLeft applies the operator to each item in the collection in turn, carrying the result forward. So to sum a collection, you provide foldLeft with 0.0 (the starting value for the running total) and the binary operator '+', like this:
def averageStraight(values: Array[Double]): Double = {
  values.foldLeft(0.0)(_ + _) / values.length
}
You can probably do this more economically (in fact, Scala provides a sum method on numeric collections anyway, but where's the fun in that?). The compiler will often figure out the types, but here it seems I need to be quite explicit about the fact that I'm using Doubles. So, foldLeft takes a first parameter list consisting of the initial value; the second parameter list contains just the '+' operator - note the use of _ as an unnamed (and untyped) placeholder parameter.
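For completeness, the economical version using the built-in sum would look something like this (averageSum is just my name for it):

```scala
// Relies on the sum method Scala provides for numeric collections
def averageSum(values: Array[Double]): Double =
  values.sum / values.length
```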
So, having got over the obstacle of the 'normal' version, this is what I wrote for the concurrent version:
// Uses the scala.actors library: import scala.actors.Actor._
def averageConcurrent(values: Array[Double], blocks: Int): Double = {
  val caller = self
  val blockSize = values.length / blocks
  (0 until blocks).foreach(block => actor {
    val from = block * blockSize
    // the last block absorbs any remainder left over by the integer division
    val until = if (block == blocks - 1) values.length else from + blockSize
    caller ! values.slice(from, until).foldLeft(0.0)(_ + _)
  })
  var sum = 0.0
  // expect exactly one message back from each actor
  for (i <- 1 to blocks) {
    receive {
      case partialSum: Double => sum += partialSum
    }
  }
  sum / values.length
}
We pass in the array of values to average and the number of concurrent actors we want. We figure out how much of the master array each actor should sum (the final actor may get an odd-sized block when the division isn't exact, which makes the code a tad ugly), then kick off each actor with its section of the array. Then we just wait for the partial sums to come back as messages, and divide by the total number of elements.
Using an array of 5 million doubles, the straight version takes 0.03 seconds. I can't understand the parallel performance, though: with a single thread (i.e., all the overhead and none of the benefit), it takes ~0.2 seconds, or roughly 10x as long. As the number of blocks increases, the time doesn't change much, although it keeps a downward trend up to several hundred blocks. There's little or no I/O here, so I expected it to top out when the number of threads hit the number of cores (this is on a MacBook Air i5 - it reports 2 cores, but maybe it does hyperthreading (or is that just Windows?) - anyway, definitely not hundreds of cores!). The timings are rubbish though - running under IntelliJ, other things in the background, etc. - I need to re-run on a clean, idle system.
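The timings would be easier to trust with a repeatable harness. Here's a minimal sketch (the time helper is my own invention, not anything standard); running each measurement several times in a row also gives the JIT a chance to warm up:

```scala
// Minimal timing helper: runs the by-name block once and prints elapsed seconds
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime
  val result = body
  println(label + ": " + (System.nanoTime - start) / 1e9 + " s")
  result
}

// e.g. time("straight")(averageStraight(values))
//      time("concurrent, 4 blocks")(averageConcurrent(values, 4))
```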
Final thought, would it really be so horrid in Java? A fair comparison needs the java version, methinks.