tushman.io

The musings of an insecure technologist

Python | Multiprocessing and Interrupts

tl;dr: If handling interrupts is important, use a SyncManager (not multiprocessing.Manager) to handle shared state

I just hit the learning curve pretty hard with Python's multiprocessing, but I came through it and wanted to share what I learned.

Preliminary Thoughts

The bulk of this post is going to be about using the multiprocessing library, but first a few preliminary thoughts:

Multiprocessing and threading are hard (especially in Python):

It starts off all hunky-dory, but trust me, factor in time to hit the wall … hard.

pstree is your friend.

Stop what you are doing, pick your favorite package manager, and install pstree (for me: brew install pstree).

It's super useful for dealing with subprocesses and seeing what is going on.

In particular, pstree -s <string> shows only the branches containing processes whose command line contains the string. So much better than ps.

Output looks like this:

➜  pstree -s python
-+= 00001 root /sbin/launchd
 \-+= 00221 jtushman /sbin/launchd
   \-+= 00446 jtushman /Applications/iTerm.app/Contents/MacOS/iTerm -psn_0_188462
     \-+= 07770 root login -fp jtushman
       \-+= 07771 jtushman -zsh
         \-+= 46662 jtushman python multi3.py
           |--- 46663 jtushman python multi3.py
           |--- 46664 jtushman python multi3.py
           |--- 46665 jtushman python multi3.py
           |--- 46666 jtushman python multi3.py
           \--- 46667 jtushman python multi3.py

Know your options

There is more than one parallelization framework / paradigm out there for Python. Make sure you pick the right one for you before you dive in. To name a few:

Important: Unless you are a ninja, do not mix paradigms. For example, if you are using the multiprocessing library, do not use threading.local.

The Main Story

Axiom One: All child processes get SIGINT

Note: I will use SIGINT, KeyboardInterrupt, and Ctrl-C interchangeably

Consider the following:

from multiprocessing import Process, Manager
from time import sleep

def f(process_number):
    try:
        print "starting thread: ", process_number
        while True:
            print process_number
            sleep(3)
    except KeyboardInterrupt:
        print "Keyboard interrupt in process: ", process_number
    finally:
        print "cleaning up thread", process_number

if __name__ == '__main__':

    processes = []

    manager = Manager()

    for i in xrange(4):
        p = Process(target=f, args=(i,))
        p.start()
        processes.append(p)

    try:
        for process in processes:
            process.join()
    except KeyboardInterrupt:
        print "Keyboard interrupt in main"
    finally:
        print "Cleaning up Main"

The abbreviated output you get is as follows:

^C
Keyboard interrupt in process:  3
Keyboard interrupt in process:  0
Keyboard interrupt in process:  2
cleaning up thread 3
cleaning up thread 0
cleaning up thread 2
Keyboard interrupt in process:  1
cleaning up thread 1
Keyboard interrupt in main
Cleaning up Main

The main takeaways are:

  • The keyboard interrupt gets sent to every subprocess as well as to the main process (Ctrl-C signals the terminal's whole foreground process group)
  • The order in which the handlers run is non-deterministic
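To see why every child gets the signal, here is a minimal sketch that checks which process group each child lands in. It is written for Python 3 on a POSIX system and explicitly asks for the "fork" start method, which is an assumption on my part (it mirrors what the Python 2 code in this post does on a Mac or Linux box). The names report_group and child_groups are made up for illustration.

```python
import multiprocessing
import os


def report_group(queue):
    # Runs in the child: report the process group this child belongs to.
    queue.put(os.getpgid(0))


def child_groups(count=2):
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    workers = [ctx.Process(target=report_group, args=(queue,))
               for _ in range(count)]
    for worker in workers:
        worker.start()
    # Collect one answer per child before joining them.
    groups = [queue.get(timeout=10) for _ in workers]
    for worker in workers:
        worker.join()
    return groups


if __name__ == "__main__":
    print("parent group:", os.getpgid(0))
    print("child groups:", child_groups())
```

Every child should report the parent's process group id. Since Ctrl-C is delivered to the foreground process group rather than to a single process, that shared group is what makes the interrupt show up everywhere at once.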

Axiom Two: Beware multiprocessing.Manager (time to share memory between processes)

If it is possible in your stack to rely on a database, such as Redis, for keeping track of shared state, I recommend it. But if you need a pure Python solution, read on:

multiprocessing managers bill themselves as:

Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.

The key takeaway there is that the Manager actually kicks off a server process to manage state. It's like firing up your own little (not battle-tested) private database. And if you Ctrl-C your Python process, the manager process will get the signal too and shut itself down, causing all sorts of weirdness.
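The "server process" part is easy to check for yourself. The sketch below (Python 3, POSIX, "fork" start method assumed; inspect_manager is a hypothetical helper name) starts a Manager, counts the extra child processes that appear, and confirms that what you get back from manager.list() is a proxy rather than a real list.

```python
import multiprocessing
from multiprocessing.managers import SyncManager


def inspect_manager():
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    before = {p.pid for p in multiprocessing.active_children()}
    manager = ctx.Manager()  # returns an already started SyncManager
    try:
        shared = manager.list([1, 2, 3])
        # Any child that was not there before belongs to the manager.
        extra = [p for p in multiprocessing.active_children()
                 if p.pid not in before]
        return {
            "is_sync_manager": isinstance(manager, SyncManager),
            "server_processes": len(extra),
            "proxy_is_plain_list": isinstance(shared, list),
            "contents": list(shared),  # copied out through the proxy
        }
    finally:
        manager.shutdown()


if __name__ == "__main__":
    print(inspect_manager())
```

The one extra child is the manager's server. Every read or append on the proxy is a round trip to that process, which is why losing it breaks the shared list.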

Consider the following:

from multiprocessing import Process, Manager
from time import sleep

def f(process_number, shared_array):
    try:
        print "starting thread: ", process_number
        while True:
            shared_array.append(process_number)
            sleep(3)
    except KeyboardInterrupt:
        print "Keyboard interrupt in process: ", process_number
    finally:
        print "cleaning up thread", process_number

if __name__ == '__main__':

    processes = []

    manager = Manager()
    shared_array = manager.list()

    for i in xrange(4):
        p = Process(target=f, args=(i, shared_array))
        p.start()
        processes.append(p)

    try:
        for process in processes:
            process.join()
    except KeyboardInterrupt:
        print "Keyboard interrupt in main"

    for item in shared_array:
        # raises "socket.error: [Errno 2] No such file or directory"
        print item

Try running that and interrupting it with a Ctrl-C. You will get a weird error:

socket.error: [Errno 2] No such file or directory when trying to access the shared_array. That's because the Manager process has been interrupted, so the proxy has nothing left to connect to.

There is a solution!

Axiom Three: Explicitly use multiprocessing.managers.SyncManager to share state

and use the signal module to have the SyncManager ignore the interrupt signal (SIGINT)

Consider the following code:

from multiprocessing import Process
from multiprocessing.managers import SyncManager
import signal
from time import sleep

# initializer for SyncManager
def mgr_init():
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    print 'initialized manager'

def f(process_number, shared_array):
    try:
        print "starting thread: ", process_number
        while True:
            shared_array.append(process_number)
            sleep(3)
    except KeyboardInterrupt:
        print "Keyboard interrupt in process: ", process_number
    finally:
        print "cleaning up thread", process_number

if __name__ == '__main__':

    processes = []

    # now using SyncManager instead of Manager
    manager = SyncManager()
    # explicitly starting the manager, and telling it to ignore the interrupt signal
    manager.start(mgr_init)
    try:
        shared_array = manager.list()

        for i in xrange(4):
            p = Process(target=f, args=(i, shared_array))
            p.start()
            processes.append(p)

        try:
            for process in processes:
                process.join()
        except KeyboardInterrupt:
            print "Keyboard interrupt in main"

        for item in shared_array:
            # we still have access to it!  Yay!
            print item
    finally:
        # to be safe -- explicitly shutting down the manager
        manager.shutdown()

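Main takeaways here are:

  • Explicitly creating and starting a SyncManager, because multiprocessing.Manager() hands back one that is already started and leaves no chance to pass an initializer
  • Having its initializer tell the manager process to ignore the interrupt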
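If you want to convince yourself without reaching for Ctrl-C, here is a rough Python 3 sketch of the same idea that sends SIGINT straight at the manager's server process and then keeps using the shared list. It assumes POSIX and the "fork" start method, and the names ignore_sigint and manager_survives_sigint are hypothetical. As far as I can tell, recent Python 3 releases already ignore SIGINT inside the manager's server process, so there the initializer is belt and braces.

```python
import multiprocessing
import os
import signal
import time
from multiprocessing.managers import SyncManager


def ignore_sigint():
    # Runs inside the manager's server process before it serves requests.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def manager_survives_sigint():
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    before = {p.pid for p in multiprocessing.active_children()}
    manager = SyncManager(ctx=ctx)
    manager.start(ignore_sigint)
    try:
        shared = manager.list(["before"])
        # The only new child is the manager's server process.
        server = [p for p in multiprocessing.active_children()
                  if p.pid not in before][0]
        os.kill(server.pid, signal.SIGINT)  # simulate the interrupt
        time.sleep(0.5)                     # give the signal time to land
        shared.append("after")              # still reachable through the proxy
        return list(shared)
    finally:
        manager.shutdown()


if __name__ == "__main__":
    print(manager_survives_sigint())
```

Only the manager's server is signalled here, not the whole process group, so the main process never sees a KeyboardInterrupt in this sketch.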

I will do a future post on gracefully shutting down child processes (once I figure that out ;-)

Thanks to @armsteady, who showed me the like on StackOverflow (link)
