09 Nov

Writing parallel code in a readable way

Although I often have embarrassingly parallel (data-parallel) problems, and I have an 8-CPU box at work, I used to frown on writing parallel computing code when doing exploratory coding. We now have fantastic parallel computing facilities in Python (amongst others, multiprocessing, IPython, and Parallel Python). However, in my opinion, there are two reasons to hesitate to use them, especially when the code is very immature (which is almost always my case, in research settings):

  1. It makes the code look less like the ideas it is trying to express. Peter Norvig made a pretty convincing case at SciPy 2009 for scientific code reading like math.
  2. Because parallel computing in Python runs out of process, it is simply harder to debug (though I hear that the IPython guys are working on that).

I have progressively developed a tiny tool to address both problems, at least for my embarrassingly parallel problems. I address the second problem by having a trivial switch to run my code without importing any fancy parallel computing tools. And I address the first problem with syntactic sugar that lets me write map/reduce code that actually looks like standard procedural code:

results = Parallel(n_jobs=2)(delayed(my_calculation)(data1, data2, parameter1=1, parameter2=2)
                             for data1 in store1 for data2 in store2)
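
For comparison, the purely sequential version of the same computation is just the nested comprehension below; with ‘n_jobs’ set to 1, the ‘Parallel’ call above behaves exactly like it, without importing any parallel computing library (‘my_calculation’, ‘store1’ and ‘store2’ being, of course, the placeholders of the snippet above):

results = [my_calculation(data1, data2, parameter1=1, parameter2=2)
           for data1 in store1 for data2 in store2]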

There are several tricks here:

  1. I use a ‘delayed’ decorator that creates the argument list and keyword argument dictionary for me, so that I can type something that really looks like a function call. Also, the decorator checks whether the function and the arguments can be pickled, because if they cannot, the parallel computing libraries will raise errors, sometimes with hard-to-understand messages.
  2. I use a list comprehension to create the list that the map/reduce is applied to. List comprehensions are really readable, and very powerful.
  3. The ‘Parallel’ object hides all the cleverness. If the ‘n_jobs’ parameter is set to 1, it does not call any parallel computing library; if it is set to -1, all the CPUs are used. The object instantiates the parallel computing context and also destroys it. While this is inefficient, it is great for catching errors early. Finally, while I have implemented this only for the multiprocessing module, any fork/join-based parallel computing library could be encapsulated the same way, thus providing a uniform API for multi-node parallel computing or single-computer shared memory (as multiprocessing uses the Unix fork call, and all modern Unices implement copy-on-write of memory pages, you get some shared memory for free without worrying about race conditions). A rough sketch of how these pieces fit together follows this list.
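
To give an idea of what happens under the hood, here is a rough sketch of how ‘delayed’ and ‘Parallel’ could be written on top of the multiprocessing module (a simplified illustration, not the actual code, which is linked below):

import pickle

def delayed(function):
    # Capture a call as a (function, args, kwargs) tuple, checking that
    # everything can be pickled so that errors surface early and clearly.
    def argument_capture(*args, **kwargs):
        pickle.dumps((function, args, kwargs))
        return function, args, kwargs
    return argument_capture

class Parallel(object):

    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs

    def __call__(self, iterable):
        jobs = list(iterable)  # each item is a (function, args, kwargs) tuple
        if self.n_jobs == 1:
            # Plain in-process execution: no parallel computing library is
            # imported, which keeps debugging simple.
            return [function(*args, **kwargs)
                    for function, args, kwargs in jobs]
        # Only import multiprocessing when it is actually needed
        from multiprocessing import Pool, cpu_count
        n_jobs = cpu_count() if self.n_jobs == -1 else self.n_jobs
        pool = Pool(n_jobs)
        try:
            async_results = [pool.apply_async(function, args, kwargs)
                             for function, args, kwargs in jobs]
            return [result.get() for result in async_results]
        finally:
            pool.close()
            pool.join()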

The code can be found here. The license is BSD, please use and abuse at your will :).

8 Responses to “Writing parallel code in a readable way”

  1. J Says:

    I may try that code, thanks. Parallel processing is an area where I think Python is still lacking. In particular, OpenMP for Python (or something similar) would be great.

  2. luispedro Says:

    Let me plug my own framework, jug:

    http://luispedro.org/software/jug

    It works on any set of processors that either share a filesystem (including through NFS) or that can connect to a redis database server (which is also very easy to set up). If you want another backend, it’s pretty trivial to do, as long as you can support a sort of dictionary interface and locking.
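
    To give a flavour of it, a minimal jug script looks roughly like this (just a sketch built around the TaskGenerator decorator; ‘store1’ and ‘store2’ stand in for whatever data you iterate over, as in the post above):

        from jug import TaskGenerator

        @TaskGenerator
        def my_calculation(data1, data2):
            # stand-in for the real, expensive computation
            return data1 * data2

        store1 = range(3)  # dummy data for the example
        store2 = range(4)
        results = [my_calculation(data1, data2)
                   for data1 in store1 for data2 in store2]

    You then run several ‘jug execute’ processes, one per CPU (or per machine sharing the store), and they split the tasks between them.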

  3. gael Says:

    Indeed, Luis, I do believe it would be trivial to write the same wrapping interface for jug. In fact, I would encourage you to do so: it makes it easier for the user to adopt a new framework if the interface resembles something he already knows.

  4. Mathieu Says:

    To me, this post is related to another post of yours: http://gael-varoquaux.info/blog/?p=83, i.e. the idea that your code can be broken down into tasks that can be run on different CPUs and whose results can be cached. I’m dreaming of a consistent framework that takes care of both issues and that is easy to integrate into my existing code.

  5. gael Says:

    Somewhat related, indeed. However, I have moved a bit away from the ideas of the old blog post (a framework for managing the lifecycle of scientific objects, which would give, amongst other things, data-parallel computing for free). The reason I have (temporarily) shied away from the complete framework is that, like all frameworks, it was trapping me and making my code hard to reuse. Also, the resulting code did not look at all like math, but more like over-designed object-oriented code, which is hard to read.

    I have gone the way of considering that tasks are functions and objects are just about anything, and that I use things like list comprehensions to feed the latter to the former. For persistence, I am reasonably happy with a very simple memoize approach optimized for scientific computing (http://packages.python.org/joblib/). I believe that the idea can be carried a bit further, adding some tracing functionality for debugging and better use of the cache. More on that later, when I find more time to work on it.
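
    Concretely, the memoize pattern boils down to something like this (a quick sketch with joblib’s ‘Memory’ object; the cache directory and the function are, of course, placeholders):

        from joblib import Memory
        memory = Memory('/tmp/my_cache', verbose=0)

        @memory.cache
        def my_calculation(data1, data2):
            # expensive computation: calling it again with the same
            # arguments reloads the result from disk instead of recomputing
            return data1 * data2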

  6. luispedro Says:

    jug does both what joblib does and the parallelisation.

    It even does a bit of logging, but the output gets very messy when there are multiple processors.

    The code still looks like Python. In fact, in most projects, I mostly write pure Python functions that can be used anywhere, and jug only comes into the picture at the top-level script.

  7. Mathieu’s log » Blog Archive » Easy parallelization with data decomposition Says:

    [...] I came across this blog post which introduced me to the new multiprocessing module in Python 2.6, a module to execute multiple [...]

  8. Mathieu Says:

    Here’s a follow-up:

    http://www.mblondel.org/journal/2009/11/27/easy-parallelization-with-data-decomposition/

    I suggest an alternative decorator to parallelize list comprehensions.
