tushman.io

The musings of an insecure technologist

PyCon Proposal - Pragmatic Concurrency

Description

Hmmmm, how can I make this faster? I have an idea, I’ll just run it in parallel.

Luckily I am working with Python, and we have PEP20:

There should be one— and preferably only one —obvious way to do it.

So what is the obvious way to do it:

There are at least four popular packages to do this: multiprocessing, subprocess, threading, gevent …

:FacePalm:

This talk will cover the main concurrency paradigms, show you the pros and cons of each, and give you a framework for picking the right solution for your project.

Objectives

Attendees will learn the main multiprocessing options in both Python 2.7 and Python 3, and will leave with a framework for determining which approach is best for them.

Detailed Abstract

Concurrency is hard. As a lay developer, there is a lot of ramping up to figure out how to solve what seem like simple problems:

“How do I check the status of 1000 URLs?”

“how can I run my test suite in parallel?”

“I have millions of jobs on a queue — what is the best way to spawn workers to process them?”

With Python you have many options, and each one does a certain thing well. Here we will explain the tools in our toolbelt so you can pick the right tool for the problem you are trying to solve.

threading: interface for threads, mutexes and queues

multiprocessing: similar to threading but offers local and remote concurrency (with some gotchas)

subprocess: allows you to spawn new processes with minimal memory-sharing support, but great for a lot of things

gevent: a coroutine-based Python networking library that uses greenlets
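As a taste of the trade-offs the talk walks through, here is a minimal multiprocessing sketch in Python 3; check is a hypothetical stand-in for real work such as fetching one URL’s status:

```python
from multiprocessing import Pool

def check(n):
    # stand-in for real work, e.g. fetching one URL's status code
    return n * n

if __name__ == '__main__':
    # four worker processes split the jobs; Pool handles the queueing
    with Pool(4) as pool:
        results = pool.map(check, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The same shape of problem can be solved with threading or gevent; which one wins depends on whether the work is CPU-bound or I/O-bound, which is exactly the decision framework the talk covers.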

Outline

  1. Background — This is hard
    1. Threads
    2. Processes
    3. Pipes
    4. GIL
  2. Subprocesses
    1. How to use
    2. Joins
    3. Pipes
    4. Good use cases
  3. Multiprocessing
    1. How to use
    2. Sharing Memory (SyncManager)
    3. Handling Interrupts
    4. Good use cases
  4. Gevent
    1. How to use
    2. Monkey Patching
    3. Good Use Cases
  5. Threading
    1. How to use
    2. Locks, Conditions, Timers
    3. Good Use Cases
  6. Summary
    1. “Do not cross the streams”
    2. Decision Framework
    3. What about Tulip (asyncio)?

Additional Notes:

  • My parallelized version of lettuce is open sourced here
  • I have other open-source libraries; you can find them here
  • This is my first time speaking at PyCon. I have spoken at Boston Python. My slides for that talk are here
  • I sometimes write about Python. My blog is here

PyCon Proposal - Pragmatic Behavior Driven Development


Pragmatic Behavior Driven Development

Description

Love your test suite again (or for the first time).

Have you ever met a developer who loves their test suite for their web app? There is a subtle air of confidence surrounding them. They stand a bit taller; walk with a bit of a swagger.

This talk will put you on that path by showing an approach of Behavior Driven Development using lettuce and selenium.

Detailed Abstract

Behavior Driven Development (BDD) is a development process based on Test-Driven Development – but makes a significant modification. With TDD – the main goal was to achieve test coverage (what percentage of your code is covered by tests). With BDD the driving question is “What percentage of my user stories are covered?” The main test unit is the user story.

Story: Returns go to stock  

In order to keep track of stock 
As a store owner 
I want to add items back to stock when they're returned  

Scenario 1: Refunded items should be returned to stock 
Given a customer previously bought a black sweater from me 
And I currently have three black sweaters left in stock 
When he returns the sweater for a refund 
Then I should have four black sweaters in stock

With an application that leverages BDD, you will have a set of feature files, and each feature file will have a collection of stories written in a specific format: Given, When, Then.

Python has some tooling that helps turn this format into a fully automated test framework. We use lettuce to process feature files and selenium to drive the web browser.
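Under the hood, tools like lettuce map each Given/When/Then line to a Python function via regular expressions. A stripped-down sketch of that idea (illustrative only, not lettuce’s actual API):

```python
import re

STEPS = []

def step(pattern):
    # register a function to handle Gherkin lines matching the pattern
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

def run_line(line, context):
    # find the first registered step whose pattern matches the line
    for regex, fn in STEPS:
        match = regex.match(line)
        if match:
            fn(context, *match.groups())
            return True
    return False

@step(r"I currently have (\d+) black sweaters left in stock")
def set_stock(context, count):
    context['stock'] = int(count)

@step(r"he returns the sweater for a refund")
def return_item(context):
    context['stock'] += 1

ctx = {}
run_line("I currently have 3 black sweaters left in stock", ctx)
run_line("he returns the sweater for a refund", ctx)
print(ctx['stock'])  # 4
```

lettuce does the same matching, but loads the steps from your step-definition files and reports each line as passing or failing.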

Outline:

  1. Intro (5 mins)
    1. Who am I?
    2. 20 years of development and testing, and what I have seen over time
    3. Explain why BDD is an important evolution
  2. Writing your first Test (5 mins)
    1. Explain the Gherkin Format (Given, When, Then)
    2. Write your first test – watch it fail
  3. Given / When / Then (15 mins)
    1. Given: Using Factories to set up your data
    2. When: Trigger Events
    3. Then: Writing “Deterministic” assertions
  4. Setting up your testing environment (10 mins)
    1. The terrain.py file
    2. Work around javascript timing issues with a selenium adaptor
  5. Other tools to flesh out your test suite (5 mins)
    1. Coverage
    2. Flake8
    3. Travis (or CI)
    4. Unit Test Frameworks (such as nose)

Additional Notes:

  • My parallelized version of lettuce is open sourced here
  • I have other open-source libraries; you can find them here
  • This is my first time speaking at PyCon. I have spoken at Boston Python. My slides for that talk are here
  • I sometimes write about Python. My blog is here

Describing Descriptors Descriptively

Is it ironic that the documentation for descriptors is not very descriptive?

Descriptors are one of my favorite Python features, but it took me too long to discover them. The documentation and tutorials that I found were too complex for me. So I would like to offer a different approach: a code-first approach.

Agenda

  • Definition
  • A Problem that Descriptors Can Solve
  • CODE!! Solution to the Problem
  • Reflection on the Code, and an explanation of how we used Descriptors

Definition

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol. Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an object, it is said to be a descriptor.

Read that and stash it in your brain for a few minutes. By the end of this article you’ll grok it.

A Problem that Descriptors Can Solve

Imagine that you need to consume a 3rd-party API that returns JSON documents. Often solutions to this problem look like this …

response = requests.get(USER_API_URI, params={'id': 111})
print(response.json()['results'][0]['name'])
print('barf')

I dislike this solution. It’s concise, but it breaks separation of concerns. The code consuming the API should not be concerned with the exact path and location of the data element in the JSON document.

Proposed Solution with Descriptors:

Our goal is to write code like this:

user = UserAPI.get_by_id(111)
print(user.name)
print(user.street_address)

CODE!! Solution to the Problem

# Hey guys this is a descriptor -- Woot!!
class Extractor(object):

    def __init__(self, *path):
        self.path = path

    def __get__(self, instance, owner):
        return _extract(instance.json_blob, *self.path)


def _extract(doc, *keys):
    """
    Digs into nested dicts or lists; if anything along the way
    is missing, simply returns None.
    """
    end_of_chain = doc
    for key in keys:
        if isinstance(end_of_chain, dict) and key in end_of_chain:
            end_of_chain = end_of_chain[key]
        elif (isinstance(end_of_chain, (list, tuple)) and isinstance(key, int)
                and -len(end_of_chain) <= key < len(end_of_chain)):
            end_of_chain = end_of_chain[key]
        else:
            return None

    return end_of_chain

Now look how elegant our code can be:

class JSONResponse(object):

    def __init__(self, json_blob):
        self.json_blob = json_blob


class User(JSONResponse):

    name = Extractor('result', 'username')
    street_address = Extractor('result', 'address', 'street')
    status = Extractor('result', 'status')


class UserAPI(object):

    @classmethod
    def get_by_id(cls, id):
        response = requests.get(USER_GET_ENDPOINT, params={'id': id})
        return User(response.json())

And now our code is warm and fuzzy:

user = UserAPI.get_by_id(111)
print(user.name)
print(user.street_address)

Reflection on the Code

The real magic of descriptors happens with the signatures of __get__(), __set__(), and __delete__():

  • object.__get__(self, instance, owner)
  • object.__set__(self, instance, value)
  • object.__delete__(self, instance)

Each of these signatures contains a reference to instance, which is the instance of the owner’s class. So in our example:

  • instance will be an instance of the User class
  • owner will be the User class
  • self is the instance of the Descriptor, which in our case holds the path attribute
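To make those signatures concrete, here is a toy data descriptor (hypothetical, not from our example) that exercises both __get__ and __set__:

```python
class Shouted(object):
    """Tiny descriptor: stores a value per instance, returns it upper-cased."""

    def __init__(self, default=''):
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class itself
            return self
        return instance.__dict__.get('_shouted', self.default).upper()

    def __set__(self, instance, value):
        instance.__dict__['_shouted'] = value


class Greeting(object):
    text = Shouted('hello')

g = Greeting()
print(g.text)       # HELLO
g.text = 'good day'
print(g.text)       # GOOD DAY
```

Because Shouted defines __set__, it is a data descriptor, so both reads and writes of text are routed through it.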

Let’s take a look at our example where we made a descriptor, Extractor.

  • user = UserAPI.get_by_id(111)

Here we get an instance of a User object, which has the json_blob stored on it from the GET request

  • print(user.name)

Now we call name on that object, which we defined: name = Extractor('result','username'). At this point when we call name it is going to use the Extractor descriptor to extract the value from the json_blob.

The concern of extracting data from a JSON blob is nicely contained in our Descriptor. I think this is one of many great ways to use descriptors to DRY up your code.

Hope this is helpful!


Explaining Bouncer and Method_decorators

Thank you Boston Python for the opportunity to present at “July Presentation Night: What I Built at Work”.

I would like to elaborate on one of the questions asked during the Q&A after my presentation. It was a question about bouncer.

When I shared the following code:

from bouncer import authorization_method
from bouncer.constants import READ, EDIT, MANAGE, ALL

@authorization_method
def authorize(user, they):

    if user.is_admin:
        they.can(MANAGE, ALL)
    else:
        they.can(READ, ALL)
        they.cannot(READ, 'TopSecretDocs')

        def if_author(article):
            return article.author == user

        they.can(EDIT, 'Article', if_author)

Someone asked: what are user and they? It was a really good question and deserves a better explanation.

The first thing to consider is @authorization_method. This is a method decorator, a really nice Python feature, particularly when you are writing framework code.

A method decorator is a method that takes in a method as an argument and returns a mutated method. (pause to re-read that)
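A minimal, stand-alone example of that idea (hypothetical names, nothing bouncer-specific):

```python
def shout(fn):
    # takes a function, returns a new function that upper-cases the result
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return 'hello {}'.format(name)

print(greet('boston'))  # HELLO BOSTON
```

The @shout line is just sugar for greet = shout(greet); bouncer’s @authorization_method works the same way, except that instead of wrapping the function it stashes it away for later use.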

Let’s take a look at this specific implementation:

# in bouncer/__init__.py

def get_authorization_method():
    return _authorization_method

_authorization_method = None

def authorization_method(original_method):
    """The method that will be injected into the authorization target to perform authorization"""
    global _authorization_method
    _authorization_method = original_method
    return original_method

So in the instance of our authorization_method, we receive a function and store it in the global variable _authorization_method. We can make use of this function later in the application’s execution.

For example, in my talk I showed the can method:

jonathan = User(name='jonathan',admin=False)
marc = User(name='marc',admin=False)

article = Article(author=jonathan)

print can(jonathan, EDIT, article)   # True
print can(marc, EDIT, article)       # False

# Can Marc view articles in general?
print can(marc, VIEW, Article)       # True

can is defined as follows:

def can(user, action, subject):
    """Checks if a given user has the ability to perform the action on a subject

    :param user: A user object
    :param action: an action string, typically 'read', 'edit', 'manage'.  Use bouncer.constants for readability
    :param subject: the resource in question.  Either a Class or an instance of a class.  Pass the class if you
                    want to know if the user has general access to perform the action on that type of object.  Or
                    pass a specific object, if you want to know if the user has the ability to that specific instance

    :returns: Boolean
    """
    ability = Ability(user, get_authorization_method())
    return ability.can(action, subject)

When can is called, it builds an Ability using the logic in the method we decorated (stored) with @authorization_method.

Having said that, let me explain what they and they.can are.

they is a RuleList

# in bouncer/models.py

class RuleList(list):
    def append(self, *item_description_or_rule, **kwargs):
        # Will check if it is a Rule or a description of a rule
        # construct a rule if necessary then append
        if len(item_description_or_rule) == 1 and isinstance(item_description_or_rule[0], Rule):
            item = item_description_or_rule[0]
            super(RuleList, self).append(item)
        else:
            # try to construct a rule
            item = Rule(True, *item_description_or_rule, **kwargs)
            super(RuleList, self).append(item)

    # alias append
    # so you can do things like this:
    #     @authorization_method
    # def authorize(user, they):
    #
    #     if user.is_admin:
    #         # self.can_manage(ALL)
    #         they.can(MANAGE, ALL)
    #     else:
    #         they.can(READ, ALL)
    #
    #         def if_author(article):
    #             return article.author == user
    #
    #         they.can(EDIT, Article, if_author)
    can = append

RuleList is a Python list with two tweaks:

  1. It overrides append to handle input of either Rules or things it can construct into a Rule
  2. It aliases append (can = append), which allows us to have the desired syntax they.can(READ, ALL)

I am pretty pleased with this; I really like the they.can(READ, ALL) syntax. Some may argue that it is not pythonic since I could be more explicit — but in this case I think ease of readability trumps style.

But if you don’t agree, no worries you can use the following equivalent syntax:

@authorization_method
def authorize(user, abilities):

    if user.is_admin:
        abilities.append(MANAGE, ALL)
    else:
        abilities.append(READ, ALL)

        # See I am using a string here
        abilities.append(EDIT, 'Article', author=user)

Both work!

Hopefully this clarifies things. Feel free to ping me with additional questions.


Addendum

There has been a fair bit of discussion in my office about the grammatical correctness of they. Uncannily, xkcd comes to the rescue once again:

image

Boston Python | Monarch and Bouncer Notes

The Boston Python group was nice enough to have me speak about things I built at work.


Module Properties | the Proxy Pattern

Have you ever tried to add a @property to a module?

I was working on an authentication_manager, and I wanted usage of the module to look like this:

from login_manager import current_user

def do_something_with_a_user():
    print(current_user.name)
    current_user.speak()
    current_user.say_hi('lauren')

current_user is going to return the current User if it exists. In other words, it needs to call a function.

This would be pretty trivial if I were okay with using parens all over the place:

print(current_user().name)
# gross ----------^^
current_user().speak()
# gross ----^^
current_user().say_hi('lauren')
# gross ----^^

But that makes me want to barf. Luckily Python gives us the tools to clean this up. We are going to use the Proxy pattern to solve it. At its simplest, we can do something like so:

class Proxy(object):

    def __init__(self, local):
        self.local = local

    def __getattr__(self, name):
        return getattr(self.local(), name)

# aliasing for better syntax        
module_property = Proxy

class User(object):
    """Contrived User Object"""

    def __init__(self, **kwargs):
        self.name = kwargs.get('name', 'billy')

    def speak(self):
        print("Well hello there!")

    def say_hi(self, to_whom):
        print("Hi there {}".format(to_whom))

@module_property
def current_user():
    return User()

With this we have come close to achieving our goal:

from login_manager import current_user, User

if __name__ == '__main__':
    print(current_user.name)
    current_user.speak()
    current_user.say_hi('lauren')

This simple Proxy class that we defined takes a function and stores it; when an attribute is accessed, the function is executed and the attribute lookup is passed through to the result. __getattr__ is a pretty special feature of Python.

The big gotcha with this is that current_user does not return a User object (like the built-in @property will return), it is going to return the Proxy object. So without a little bit of additional care you might run into issues.
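A compact illustration of that gotcha, re-using the simple Proxy class from above:

```python
class Proxy(object):
    def __init__(self, local):
        self.local = local

    def __getattr__(self, name):
        # forward attribute access to the object the function returns
        return getattr(self.local(), name)

class User(object):
    name = 'billy'

current_user = Proxy(lambda: User())

print(current_user.name)               # attribute access is forwarded: billy
print(isinstance(current_user, User))  # False -- it is still a Proxy
```

So plain attribute access works beautifully, but anything that inspects the object itself (isinstance, type, identity checks) sees the Proxy.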

The werkzeug team has developed a fully featured Proxy within the werkzeug project. If you are using werkzeug, you can find it with: from werkzeug.local import LocalProxy

It takes the proxy pattern further by overriding all of the Python object methods, such as __eq__, __le__, __str__ and so on, to use the proxied object as the underlying target.
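A minimal illustration of why that matters (hypothetical sketch, not werkzeug’s actual implementation): dunder methods are looked up on the type, so __getattr__ alone does not catch them, and each one has to be forwarded explicitly.

```python
class DunderProxy(object):
    def __init__(self, local):
        self._local = local

    def __getattr__(self, name):
        return getattr(self._local(), name)

    # forward special methods that __getattr__ cannot catch
    def __eq__(self, other):
        return self._local() == other

    def __str__(self):
        return str(self._local())

p = DunderProxy(lambda: 42)
print(str(p))   # 42
print(p == 42)  # True
```

werkzeug’s LocalProxy does this for essentially the whole dunder protocol, which is what makes it feel like the real object.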

If you are not using werkzeug, I have created a mini library with the extracted proxy code. You can find it here: http://github.com/jtushman/proxy_tools

Or install it like so:

pip install proxy_tools

And use it like so:

# your_module/__init__.py
from proxy_tools import module_property

@module_property
def current_user():
    return User.find_by_id(request['user_id'])

# Then elsewhere
from your_module import current_user
print(current_user.name)

Now, I am sure there was a very good reason why the python-powers-that-be chose not to add the @property syntax to modules. But for the time being I have found it useful and elegant.

Parallelize Your Lettuce Tests to Win Friends and Influence Others

tl;dr: I forked the lettuce package to use multiprocessing; tests run more than 4x faster on my MBP

I am a fan of Gabriel Falcão’s lettuce Behavior-Driven Development (BDD) tool. We have been using it on my team for 6+ months now. Recently our test suite’s completion time crossed the 10-minute line, which had a bunch of negative effects, as you can imagine:

  • people writing fewer tests
  • people running the test suite less frequently
  • people spending more time watching a test suite run than coding, …

We are all using relatively modern MBPs with 4 cores, and we might as well make the most of them. Here is my fork of lettuce that allows you to take advantage of all of your cores:

https://github.com/jtushman/lettuce

I have made two main modifications (you will find the lion’s share of my modifications in this file):

  • I created a ParallelRunner (I have left the main runner alone), which kicks off processes to pull the scenarios off a queue
  • After each run I store the run times of each test in a .scenarios file, so in subsequent runs I can sort them longest to shortest
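The sort order matters: with a fixed pool of workers, starting the slowest scenarios first minimizes the idle tail at the end of the run. A sketch with hypothetical timing data (not the actual .scenarios file format):

```python
# hypothetical scenario -> seconds timings (not the real .scenarios format)
timings = {'login': 42.0, 'checkout': 118.5, 'search': 7.3, 'signup': 65.1}

# queue the slowest scenarios first so workers finish at roughly the same time
queue = sorted(timings, key=timings.get, reverse=True)
print(queue)  # ['checkout', 'signup', 'login', 'search']
```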

My test suite used to take 12 minutes; now it takes 2 minutes — REJOICE!

Usage

lettuce tests -p 4 -v 2

-p: stands for parallel. You can set it to however many processes you like; I find that the number of cores is a good default.

-v: the same verbosity parameter, but I recommend setting it to 2 when using parallelization; otherwise the steps will interlace and not make much sense.

In your terrain.py file, there are two new callbacks:

@before.batch and @after.batch

which you should use to set up and tear down each process. I use mine to fire up flask, selenium and mongo. Also note that I set a port_number attribute on world, which you can use to set up process-specific servers. For example:

@before.batch
def batch_setup():
    settings.MONGO_DATABASE_NAME = 'testing__{}'.format(world.port_number)
    mongoengine.connect(settings.MONGO_DATABASE_NAME, host=settings.MONGO_HOST, port=settings.MONGO_PORT,
                        username=settings.MONGO_USERNAME, password=settings.MONGO_PASSWORD)
    clear_database()

Caveats

For this to work, all of your tests need to be isolated; they cannot depend on each other (which I think is best practice anyway). This means in your tests you should not use world at all. Use scenario instead.

To do this, in your terrain file add the following:

class ScenarioState(object): pass

@before.each_scenario
def setup_scenario(scenario):
    world.scenario = ScenarioState()

def scenario():
    return world.scenario

And I use this all the time in my steps to refer to state from previous steps

@step(u'Given a user exists with one account')
def given_a_user_exists(step):
    scenario().current_user = UserFactory.create()

@step(u'And the user has a dog')
def user_has_a_dog(step):
    scenario().current_user.dog = DogFactory.create()

Hope you guys find this useful!

Python | Multiprocessing and Interrupts

tl;dr: If handling interrupts is important, use a SyncManager (not multiprocessing.Manager) to handle shared state

I just hit the learning curve pretty hard with Python’s multiprocessing, but I came through it and wanted to share my learnings.

Preliminary Thoughts

The bulk of this post is going to be around using the multiprocessing library, but first a few preliminary thoughts:

Multiprocessing and threading are hard (especially in Python):

It starts off all hunky-dory, but trust me, factor in time to hit the wall … hard.

pstree is your friend.

Stop what you are doing, pick your favorite package manager, and install pstree (for me: brew install pstree).

It’s super-useful for dealing with sub-processes and seeing what is going on.

In particular, pstree -s <string> shows the branches containing processes whose command lines contain the string. So much better than ps.

Output looks like this:

➜  pstree -s python
-+= 00001 root /sbin/launchd
 \-+= 00221 jtushman /sbin/launchd
   \-+= 00446 jtushman /Applications/iTerm.app/Contents/MacOS/iTerm -psn_0_188462
     \-+= 07770 root login -fp jtushman
       \-+= 07771 jtushman -zsh
         \-+= 46662 jtushman python multi3.py
           |--- 46663 jtushman python multi3.py
           |--- 46664 jtushman python multi3.py
           |--- 46665 jtushman python multi3.py
           |--- 46666 jtushman python multi3.py
           \--- 46667 jtushman python multi3.py

Know your options

There is more than one parallelization framework / paradigm out there for Python. Make sure you pick the right one for you before you dive in. To name a few: multiprocessing, subprocess, threading, gevent.

Important: Unless you are a ninja, do not mix paradigms. For example, if you are using the multiprocessing library, do not use threading.local.

The Main Story

Axiom One: All child processes get SIGINT

Note: I will use SIGINT, KeyboardInterrupt, and Ctrl-C interchangeably

Consider the following:

from multiprocessing import Process, Manager
from time import sleep

def f(process_number):
    try:
        print "starting thread: ", process_number
        while True:
            print process_number
            sleep(3)
    except KeyboardInterrupt:
        print "Keyboard interrupt in process: ", process_number
    finally:
        print "cleaning up thread", process_number

if __name__ == '__main__':

    processes = []

    manager = Manager()

    for i in xrange(4):
        p = Process(target=f, args=(i,))
        p.start()
        processes.append(p)

    try:
        for process in processes:
            process.join()
    except KeyboardInterrupt:
        print "Keyboard interrupt in main"
    finally:
        print "Cleaning up Main"

The abbreviated output you get is as follows:

^C
Keyboard interrupt in process:  3
Keyboard interrupt in process:  0
Keyboard interrupt in process:  2
cleaning up thread 3
cleaning up thread 0
cleaning up thread 2
Keyboard interrupt in process:  1
cleaning up thread 1
Keyboard interrupt in main
Cleaning up Main

The main takeaways are:

  • the keyboard interrupt gets sent to each subprocess and to the main execution
  • the order in which they run is non-deterministic

Axiom Two: Beware multiprocessing.Manager (when it is time to share memory between processes)

If it is possible in your stack to rely on a database such as redis for keeping track of shared state, I recommend it. But if you need a pure Python solution, read on.

multiprocessing.Manager bills itself as:

Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.

The key takeaway there is that the Manager actually kicks off a server process to manage state. It is like firing up your own little (not battle-tested) private database. And if you Ctrl-C your Python process, the manager will get the signal and shut itself down, causing all sorts of weirdness.

Consider the following:

from multiprocessing import Process, Manager
from time import sleep

def f(process_number, shared_array):
    try:
        print "starting thread: ", process_number
        while True:
            shared_array.append(process_number)
            sleep(3)
    except KeyboardInterrupt:
        print "Keyboard interrupt in process: ", process_number
    finally:
        print "cleaning up thread", process_number

if __name__ == '__main__':

    processes = []

    manager = Manager()
    shared_array = manager.list()

    for i in xrange(4):
        p = Process(target=f, args=(i, shared_array))
        p.start()
        processes.append(p)

    try:
        for process in processes:
            process.join()
    except KeyboardInterrupt:
        print "Keyboard interrupt in main"

    for item in shared_array:
        # raises "socket.error: [Errno 2] No such file or directory"
        print item

Try running that and interrupting it with a Ctrl-C. You will get a socket.error: [Errno 2] No such file or directory when trying to access the shared_array. And that is because the Manager process has been interrupted.

There is a solution!

Axiom Three: Explicitly use multiprocessing.managers.SyncManager to share state

and use the signal module to have the SyncManager ignore the interrupt signal (SIGINT).

Consider the following code:

from multiprocessing import Process
from multiprocessing.managers import SyncManager
import signal
from time import sleep

# initializer for SyncManager
def mgr_init():
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    print 'initialized manager'

def f(process_number, shared_array):
    try:
        print "starting thread: ", process_number
        while True:
            shared_array.append(process_number)
            sleep(3)
    except KeyboardInterrupt:
        print "Keyboard interrupt in process: ", process_number
    finally:
        print "cleaning up thread", process_number

if __name__ == '__main__':

    processes = []

    # now using a SyncManager vs a Manager
    manager = SyncManager()
    # explicitly starting the manager, and telling it to ignore the interrupt signal
    manager.start(mgr_init)
    try:
        shared_array = manager.list()

        for i in xrange(4):
            p = Process(target=f, args=(i, shared_array))
            p.start()
            processes.append(p)

        try:
            for process in processes:
                process.join()
        except KeyboardInterrupt:
            print "Keyboard interrupt in main"

        for item in shared_array:
            # we still have access to it!  Yay!
            print item
    finally:
        # to be safe -- explicitly shutting down the manager
        manager.shutdown()

The main takeaways here are:

  • explicitly using and starting a SyncManager (instead of a Manager)
  • on its initialization, having it ignore the interrupt signal

I will do a future post on gracefully shutting down child threads (once I figure that out ;–)

Thanks to @armsteady, who showed me the light on StackOverflow (link).

Dict Digger

In the age of SaaS and working with 3rd-party APIs, developers often have to navigate complex objects (arrays of hashes of arrays of hashes) (I am looking at you, AdWords API).

I wanted a nice way to avoid doing None checks and “does this key exist?” checks over and over again.

So I made a (very) simple utility to help with it: dict_digger.

It works like this …

pip install dict_digger
import dict_digger

h = {
    'a': {
         'b': 'tuna',
         'c': 'fish'
     },
     'b': {}
}

result = dict_digger.dig(h, 'a', 'b')
print result  # prints 'tuna'

result = dict_digger.dig(h, 'c', 'a')
print result  # prints None
# Important!!  Does not throw an error, just returns None

# but if you like
result = dict_digger.dig(h, 'c', 'a', fail=True)
# raises a KeyError

# it also supports nested lists, so ...

nested = {
    'a': ['tuna', 'fish'],
    'b': {}
}
result = dict_digger.dig(nested, 'a', 0)
print result  # prints 'tuna'
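Under the hood, a dig like this is essentially a guarded loop; a minimal sketch of the idea (illustrative only, not the actual dict_digger source):

```python
def dig(obj, *keys, fail=False):
    # walk down the nested structure one key at a time
    current = obj
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            if fail:
                raise
            return None
    return current

print(dig({'a': {'b': 'tuna'}}, 'a', 'b'))  # tuna
print(dig({'a': {'b': 'tuna'}}, 'c', 'a'))  # None
```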

Alternatively, you could do the following:

try:
    result = h['c']['a']
except KeyError:
    result = None

Find it on github here

Shuffling Team Seating

I think it is good to shuffle the team around. It helps with cross-pollination, and keeps the team area neat. Here is the function that we use to reseat our team, making sure that no one sits next to someone they were already sitting next to.

Note: This only works with teams greater than four. Assign each space in your office a number, then run the following. The first person in the output array goes in space 1, and so on.

import random

def all_perms(elements):
    if len(elements) <=1:
        yield elements
    else:
        for perm in all_perms(elements[1:]):
            for i in range(len(elements)):
                #nb elements[0:1] works in both string and list contexts
                yield perm[:i] + elements[0:1] + perm[i:]


def find_position(key,lizt):
    return [i for i,x in enumerate(lizt) if x == key][0]

def new_neighbors(some_list):
    list_size = len(some_list)
    for new_neighbor_list in all_perms(some_list):
        print new_neighbor_list
        too_many_neighbors = False
        for i,team_member in enumerate(new_neighbor_list):
            #find position in initial list
            position_in_original_list = find_position(team_member,some_list)
            original_neighbors = []
            original_neighbors.append(some_list[(position_in_original_list+1) % list_size])
            original_neighbors.append(some_list[(position_in_original_list-1) % list_size])

            new_neighbors = []
            new_neighbors.append(new_neighbor_list[(i+1) % list_size])
            new_neighbors.append(new_neighbor_list[(i-1) % list_size])

            delta = len(set(new_neighbors) - set(original_neighbors))
            #print "for {} comparing: {} with {} = {}".format(team_member,original_neighbors,new_neighbors,delta)

            if not delta == 2:
               too_many_neighbors = True
               break

        if too_many_neighbors == False:
            return new_neighbor_list
    else:
        print "No Matches"

    return []


# Usage      
team = ['JT','FS','MC','MA','FD']
new_seating = new_neighbors(team)
print new_seating
# >> ['MC', 'JT', 'MA', 'FS', 'FD']
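For the curious, the hand-rolled all_perms can be replaced with itertools.permutations from the standard library; a condensed sketch of the same neighbor check (same team, Python 3):

```python
from itertools import permutations

team = ['JT', 'FS', 'MC', 'MA', 'FD']

def neighbors(seating, i):
    # the two people adjacent to position i, with wrap-around
    n = len(seating)
    return {seating[(i - 1) % n], seating[(i + 1) % n]}

def ok(candidate, original):
    # nobody keeps either of their original neighbors
    pos = {p: i for i, p in enumerate(original)}
    return all(not (neighbors(candidate, i) & neighbors(original, pos[p]))
               for i, p in enumerate(candidate))

new_seating = next((list(p) for p in permutations(team) if ok(list(p), team)), [])
print(new_seating)
```

Same idea, fewer moving parts: generate permutations, keep the first one where nobody retains an old neighbor.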

To end with a quote to motivate:

Everyday I’m shufflin’ — LMFAO

(you can play that music as you are shufflin’ seats)