Wednesday, October 10, 2012

Scala, recursion and meta-function literals

I'm three weeks into the Functional Programming in Scala course at Coursera.org, and I've just knocked out this week's assignment. I was so shocked by part of the solution that I've reproduced it here (with enough changes to comply with Coursera's honour code, natch).
object GetThis {

  def matchAny(predicates: List[Int => Boolean]): Int => Boolean =
    (x: Int) =>
      if (predicates.isEmpty) false
      else if (predicates.head(x)) true
      else matchAny(predicates.tail)(x)

  def matchAll(predicates: List[Int => Boolean]): Int => Boolean =
    (x: Int) =>
      if (predicates.isEmpty) true
      else if (predicates.head(x)) matchAll(predicates.tail)(x)
      else false

  val possible = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 99)

  val conditions = List(
    (x: Int) => x % 2 == 0,
    (x: Int) => x % 5 == 0,
    (x: Int) => x > 5
  )

  val matchedAny = possible.filter(matchAny(conditions))

  val matchedAll = possible.filter(matchAll(conditions))
}
So, you can see that I

  • create a list of integers, called 'possible'
  • build a list of function literals that specify functional maps from ints to booleans (which is pretty cool, but not the really cool bit)
  • use the filter method on the list of possible integers (which is a tiny bit cool, but not the really cool bit). 
The magical bit happens with the matchAny / matchAll functions (meta-functions, perhaps). They take a list of predicates (each predicate maps ints to booleans) and combine them into a single predicate-applying function. Both are defined recursively, testing the predicate at the front of the list and recursing with the tail. So these functions are returning a function (I think) that is defined in terms of further applications of itself. That is the really cool bit :)
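For comparison, the same combining trick can be written without explicit recursion, because List already knows how to fold predicates together via exists and forall. This is just a sketch, and matchAnyLib / matchAllLib are names I've made up here:

```scala
// A non-recursive sketch of the same combinators: List's built-in
// exists and forall test each predicate against x in one step.
def matchAnyLib(predicates: List[Int => Boolean]): Int => Boolean =
  (x: Int) => predicates.exists(p => p(x))

def matchAllLib(predicates: List[Int => Boolean]): Int => Boolean =
  (x: Int) => predicates.forall(p => p(x))
```

Filtering the same 'possible' list with these gives the same results as the recursive versions.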

Thursday, February 16, 2012

The average scala problem

One of my long-term favourite interview sections is the bit where we develop a Java method to calculate the average of a set of numbers. Oddly, even after 5 years or more, I still hear different things coming back. Recently, for instance, the first thing the candidate said was that they'd think about breaking the summation across several threads. Now, regardless of whether optimising for performance with concurrency before the interface is defined is a good sign, it set me thinking about how hard this would be in Scala. In Java it's obviously possible, but I wouldn't have considered it, because the thread management would obscure the limited functional code (I think). Scala though, with its share-nothing, message-based approach, could maybe do it with only a small amount of verbosity.

Before I look at the concurrent solution, 'average' is an interesting thing to do in Scala anyway, because it takes you into the functional arena. With a traversable collection, you have the foldLeft method. You provide an initial value and an operator, and foldLeft applies the operator to each item in the collection in turn, with the result carried forward. So to sum a collection, you give foldLeft 0.0 (the starting value for the running total) and the binary operator '+', like this

def averageStraight(values: Array[Double]): Double = {
  values.foldLeft(0.0)(_ + _) / values.length
}

You can probably do this more economically (in fact, Scala provides a sum method anyway, as part of traversable for numerics, but where's the fun in that?). The compiler will often figure out the type, but here it seems I need to be quite explicit about the fact I'm using Doubles. So, foldLeft's first parameter list consists of the initial value; the second parameter list contains just the '+' operator - note the use of _ as an unnamed (...and untyped) placeholder parameter.
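To make the underscore shorthand concrete, here's the same fold written with an explicit two-argument function (the value names are mine, just for illustration):

```scala
// foldLeft carries a running total through the collection:
// ((0.0 + 2.0) + 4.0) + 6.0 = 12.0. The (_ + _) form is
// shorthand for the explicit (acc, v) => acc + v below.
val sample = Array(2.0, 4.0, 6.0)
val total = sample.foldLeft(0.0)((acc, v) => acc + v) // 12.0
val average = total / sample.length                   // 4.0
```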

So, having got over the obstacle of the 'normal' version, this is what I wrote for the concurrent version:

// relies on scala.actors: import scala.actors.Actor._
def averageConcurrent(values: Array[Double], blocks: Int): Double = {
  val caller = self

  val blockSize = values.length / blocks

  (0 until blocks).foreach(block => actor {
    val from = block * blockSize
    // the last block absorbs any remainder left by the integer division
    val until = if (block == blocks - 1) values.length else from + blockSize
    caller ! values.slice(from, until).foldLeft(0.0)(_ + _)
  })

  var sum = 0.0

  // collect exactly one partial sum per actor
  for (i <- 1 to blocks) {
    receive {
      case partialSum: Double => sum += partialSum
    }
  }

  sum / values.length
}

We pass in the array of values to average and the number of concurrent actors we want. We work out how much of the master array each actor should sum (the final actor's block may be a different size, which makes the code a tad ugly), and then kick off each actor with its section of the array to sum. Then we just wait for the partial sums to come back as messages, and divide by the total number of elements.
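The slicing arithmetic can be sanity-checked on its own, away from the actor machinery. This little helper ('partitions' is my name, not part of the original) computes the (from, until) pairs, with the last block taking any remainder:

```scala
// Split `length` elements into `blocks` contiguous index ranges.
def partitions(length: Int, blocks: Int): Seq[(Int, Int)] = {
  val blockSize = length / blocks
  (0 until blocks).map { block =>
    val from = block * blockSize
    // the last block absorbs the remainder of the integer division
    val until = if (block == blocks - 1) length else from + blockSize
    (from, until)
  }
}
```

For example, partitions(10, 3) gives (0,3), (3,6), (6,10) - the ranges abut with no gaps or overlaps, so every element is summed exactly once.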

Using an array of 5 million doubles, the straight version takes 0.03 seconds. I can't understand the parallel performance though - with a single thread (i.e., all the overhead and none of the benefit), it takes ~0.2 seconds, roughly 7x as long. As the number of blocks increases, the time doesn't seem to change much, although it keeps a downward trend up to several hundred blocks. There's little or no I/O here, so I expected it to top out when the number of threads hit the number of cores (this is on a MacBook Air, i5 - it reports 2 cores, but maybe it does hyperthreading (or is that just Windows?) - anyway, definitely not hundreds of cores!). The timings are rubbish though - running under IntelliJ, with other things in the background, etc. - I need to re-run on a clean, idle system.

Final thought, would it really be so horrid in Java? A fair comparison needs the java version, methinks.

Saturday, January 28, 2012

A day in bed with my laptop...

A day in bed. I got about half way through Thursday, and the shivering/shaking got too much, so I scampered home to recuperate. 15 hours sleep, a chat with NHS Direct and then a GP (amoxycillin), and things started to become bearable. But how to pass the time lolling around under the duvet...?

Scala of course. A brainy relative sent me a link to a chat about how the Guardian has shifted its website from java to scala, and I'd spotted Programming In Scala on a desk at my client's office. Last summer's brief-but-unconsummated flirting with Programming Groovy had left me with a rumbling desire to do something different-yet-the-same with my java experience. Serendipitous, eh?

Anyway, armed with scala 2.9.1 & IntelliJ IDEA's scala plugin, I ploughed through the Scala chapter in Bruce Tate's Seven Languages in Seven Weeks in pretty short order. The part that really stood out was the concurrency aspect. There's an example script that pulls 4 websites' homepages and counts the size of the content. The script does it sequentially first, and then concurrently - and to get the concurrency, the code fires off 4 messages to itself, which it then processes (...but magically now in parallel). Here's what that bit looks like:

def getPageConcurrently() = {
  val caller = self

  urls.foreach{ url => actor { caller ! (url, PageLoader.getPageSize(url)) } }

  for (i <- 1 to urls.size) { 
    receive { case (url, size) => println("Size for " + url + ": " + size) }
  }
}

Now you can't tell me that's not beautiful!

There's a lot I don't understand. For instance, my first thought was to optimise out that 'caller' val and just inline 'self' - but that didn't work. Why ever not? How many threads is it using? Does the thread pool expand if messages need to be sent but all current threads are busy? Maybe it contracts when they're idle.

Oh, and the foldLeft method - that's going to be lots of fun.