------------------------------------------------
Painless Concurrency: The multiprocessing Module
------------------------------------------------

.. class:: endnote

+-------------------------------------------+-------------------------------------------+
| .. image:: ../processing/yo-small.gif     |**Author:** Roberto Alsina                 |
|    :class: right foto                     |                                           |
|                                           |The author has been around Python for a    |
|                                           |while, and is finally getting the hang of  |
|                                           |it.                                        |
|                                           |                                           |
|                                           |**Blog:** http://lateral.netmanagers.com.ar|
|                                           |                                           |
|                                           |**twitter:** @ralsina                      |
|                                           |                                           |
|                                           |**identi.ca:** @ralsina                    |
+-------------------------------------------+-------------------------------------------+

.. raw:: pdf

    Spacer 0 1cm

Sometimes when you are working on a program you run into one of the classic
problems: your user interface blocks. You are performing some long task and
the window "freezes", jams, and doesn't update until the operation is over.

Sometimes you can live with it, but in general it makes the application look
amateurish or badly written.

The traditional solution for this problem is making your program
multithreaded, so that more than one thread runs in parallel.

You pass the expensive operation to a secondary thread, do what it takes so
the application looks alive, wait until the thread ends, and move on.

Here is a toy example:

.. code-block:: python
    :include: ../processing/demo_threading_1.py.en

It produces this output::

    $ python demo_threading_1.py
    Starting the main program
    Launching thread
    Thread has been launched
    The thread is still running
    Starting to work
    The thread is still running
    The thread is still running
    The thread is still running
    The thread is still running
    The thread is still running
    The thread is still running
    Finished working
    Program ended

It's tempting to say "threading is nice!" but... remember this was a toy
example. It turns out that using threads in Python has some caveats.

* You are not using multiple cores.
  Since there is a global lock in the interpreter (the GIL), Python
  instructions are executed one at a time, even when they are spread over
  several threads. The exception is I/O: while one thread is waiting on I/O,
  the interpreter can keep running the others.

* It's easy to shoot yourself in the foot.

  Paraphrasing Jamie Zawinski, if when you see a problem you think "I'll fix
  it using threads"... now you have two problems.

* There is no way to forcibly interrupt a thread!

  That makes it possible to lock your app in complicated ways.

* It's harder to debug multithreaded apps, especially when it comes to race
  conditions and deadlocks.

So, what can we do? Use processes instead of threads. Let's see an example
that's suspiciously similar to the previous one:

.. code-block:: python
    :include: ../processing/demo_processing_1.py.en

Yes, the only change is ``import multiprocessing`` instead of
``import threading`` and ``Process`` instead of ``Thread``. Now the ``worker``
function runs in a separate Python interpreter. Since these are separate
processes, the operating system can run each one on a different core, so the
program may be much faster on a modern computer.

I mentioned deadlocks earlier. You may believe that with a little care, if you
place locks around variables, you can avoid them. Well, no. Let's see two
functions ``f1`` and ``f2`` which use two variables ``x`` and ``y`` protected
by locks ``lockx`` and ``locky``.

.. code-block:: python
    :include: ../processing/demo_threading_2.py.en

If you run it, it locks. All variables are protected with locks and it still
locks! What's happening is that ``f1`` has acquired the lock for ``x`` and is
waiting for the lock for ``y``, while ``f2`` has acquired the lock for ``y``
and is waiting for the lock for ``x``. Since neither one is going to give the
other what it needs, both are stuck.

Trying to debug this sort of thing in non-trivial programs is awful, because
it only happens when things occur in a given order and with a certain timing.
It may happen 100% of the time on one computer and never on another which is a
bit faster (or slower).
Add to this that many Python data structures (like dictionaries) are not
reentrant, so you need to protect many variables, and these scenarios become
more common.

How would this work with ``multiprocessing``? Since separate processes do not
share resources by default, there are no problems with resource contention,
and no deadlocks.

When you use multiple processes, one way to handle this example is passing
around the values you need. Your functions then will have no "side effects",
making it more like functional programming in Lisp or Erlang.

Example:

.. code-block:: python
    :include: ../processing/demo_processing_2.py.en

Why am I not using any locks? Because the ``x`` and ``y`` of ``f1`` and ``f2``
are not the same as in the main program. They are copies. Why would I want to
lock a copy?

If there is a case where a resource needs to be accessed sequentially,
``multiprocessing`` provides locks, semaphores, etc. with the same semantics
as ``threading``.

Or you can create a process to manage that resource and pass it data via a
queue (``Queue`` or ``Pipe`` classes) and voilà, the access is now sequential.

In general, with a little care in your program's design, ``multiprocessing``
has all the benefits of multithreading with the bonus of taking advantage of
your hardware, and avoiding some headaches.

Note: The ``multiprocessing`` module is available as part of the standard
library in Python 2.6 or later. For earlier versions, you can get the
``processing`` module via PyPI.
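
To make the "process that manages a resource" idea a bit more concrete, here
is a minimal sketch. The names ``counter_manager``, ``client`` and ``main``
are made up for this illustration and are not part of ``multiprocessing``; the
only library pieces used are ``Process`` and ``Queue``. One process owns a
counter and everybody else sends it increments through a queue, so the updates
are applied one at a time without any lock:

.. code-block:: python

    import multiprocessing


    def counter_manager(requests, results):
        """Own the counter: apply increments in the order they arrive."""
        total = 0
        while True:
            amount = requests.get()
            if amount is None:  # sentinel (a convention of this sketch)
                break
            total += amount
        results.put(total)


    def client(requests, amount, times):
        """Ask the manager to add ``amount`` to the counter, ``times`` times."""
        for _ in range(times):
            requests.put(amount)


    def main():
        requests = multiprocessing.Queue()
        results = multiprocessing.Queue()

        # The only process that ever touches the counter.
        manager = multiprocessing.Process(
            target=counter_manager, args=(requests, results))
        manager.start()

        # Several clients feed the same queue at the same time.
        clients = [
            multiprocessing.Process(target=client, args=(requests, 1, 1000))
            for _ in range(4)
        ]
        for c in clients:
            c.start()
        for c in clients:
            c.join()

        requests.put(None)  # all clients are done: tell the manager to stop
        total = results.get()
        manager.join()
        return total


    if __name__ == "__main__":
        print(main())

The ``None`` sentinel is just a convention chosen here to tell the manager
when to stop. The ``if __name__ == "__main__"`` guard matters on platforms
where child processes are started by importing the main module again
(Windows, for example).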