Saturday, January 28, 2012

A day in bed with my laptop...

A day in bed. I got about half way through Thursday, and the shivering/shaking got too much, so I scampered home to recuperate. 15 hours sleep, a chat with NHS Direct and then a GP (amoxycillin), and things started to become bearable. But how to pass the time lolling around under the duvet...?

Scala of course. A brainy relative sent me a link to a chat about how the Guardian has shifted its website from java to scala, and I'd spotted Programming In Scala on a desk at my client's office. Last summer's brief-but-unconsummated flirting with Programming Groovy had left me with a rumbling desire to do something different-yet-the-same with my java experience. Serendipitous, eh?

Anyway, armed with scala 2.9.1 & Intellij IDEA's scala plugin, I ploughed through the Scala chapter in Bruce Tate's Seven Languages in Seven Weeks in pretty short order. The part that really stood out was the concurrency aspect. There's an example script that pulls 4 websites' homepages and counts the size of the content. The script does it sequentially first, and then concurrently - and to to get the concurrency, the code fires off 4 messages to itself, which it then processes (...but magically now in parallel). Here's what that bit looks like:

def getPageConcurrently() = {
  val caller = self

  urls.foreach{ url => actor { caller ! (url, PageLoader.getPageSize(url)) } }

  for (i <- 1 to urls.size) { 
    receive { case (url, size) => println("Size for " + url + ": " + size) }
  }
}

Now you can't tell me that's not beautiful!

There's a lot I don't understand. For instance, my first though was to optimise out that 'caller' var, and just inline 'self' - but that didn't work. Why ever not? How many threads is it using? Does the thread pool expand if messages need to be sent but all current threads are busy? Maybe it contracts when they're idle.

Oh, and the foldLeft operator - that's going to be lots of fun.

2 comments:

  1. Not that I have a clue about Scala but the caller = self looks very much like a closure

    ReplyDelete
    Replies
    1. I nearly understand it now :). The foreach line creates a new 'actor' for each url. The fragment in the braces is the code that is passed as-is (unexecuted, by-name) to the new actor. Creating actors like this instantiates the actor, and starts it executing in a separate thread. When these actors execute (now in their separate thread), the code is in a new context - and that's why we have to copy the value of the special variable 'self' into 'caller' - if we don't do that, 'self' will be evaluated in the actor code, where it will refer to the actor, not to context that created the actor. I think it's this binding that, as you say, makes it a 'closure'.

      Then, all the actor code is send a message (using the ! method (scala methods can have symbol names)) back to the caller - the message is made of a pair (2-tuple) of values, evaluated in parallel in the actor.

      The calling code code just sits in a loop until it has received the requisite number of messages, printing a diagnostic for each, and that's it! Simples!

      Delete