Tuesday, September 6, 2016

Solving Multi-Core Python

tl;dr I proposed a solution for Python's weak multi-core story (note I didn't say "support"), but didn't have time to follow through.  I still have hope, both for the project as a whole and for several standalone parts.

[Update 8/2018] I picked this back up after a break and am aiming for Python 3.8.  See this project and PEP 554.  <fingers_crossed/>

Contents:
  Backstory
  The Proposal
  The Outcome
  The Details


Backstory

For the longest time I've heard the argument that multi-processing is the Pythonic way to get multi-core.  Threads are an anti-pattern, some say.  They are a thing only because Microsoft made them a thing, some say.  I've heard anecdotal evidence from people that actually do parallel (not just concurrent) computing (e.g. scientific, render farms) that they favor multi-processing approaches, especially because they typically go multi-host anyway.

You could also argue that in practice nearly all concurrent programming is IO-bound and that asyncio solves that for us.  Recent-ish discussions (e.g. PEP 492) lead me to believe that it's not nearly that simple.

Ultimately, the merits of multi-processing and asyncio over multi-core threading, whether or not they make a valid or sufficient argument, do not mitigate the popular and pervasive *sentiment* that Python is weak when it comes to leveraging multiple cores and handling concurrency (or, more accurately, computational parallelism).  And perception "is 9/10ths of the law" (or arguably more).

Folks looking for a solution are going to search for one that matches the model in their brain.  Much like with organic molecules, if the conceptual bind points don't line up they aren't going to connect with what Python is offering.  The power of Python is that it maps well onto our brains.  Though concurrency/parallelism isn't well suited to our brains in the first place, it is one key place where Python doesn't do a good job of matching conceptual expectations at large.

The Proposal

In short, Python's multi-core story is murky at best.  Not only can we be more clear on the matter, we can improve Python's support.  The result of any effort must make multi-core (and concurrency/parallelism) support in Python obvious, unmistakable, and undeniable (and keep it Pythonic).

Early in 2015 I'd reached my limits with all the criticisms and misunderstandings.  So in the spirit of open source I resolved to do something about it.  Since this was not an area of expertise for me I did a lot of reading and reached out to experts I know (thanks to Guido, Nick, Sarah, Graham, and others).  In a few months I felt like I had a good enough understanding and a good solution.

In June of 2015 I introduced my solution on the python-ideas mailing list.  The gist is to use CPython's existing subinterpreters to isolate GIL-free execution threads with a CSP front end.  My hope was to finish the first stage of work in time for Python 3.6 (i.e. right about now).  The reception was generally positive.  There was even discussion on reddit.  I was encouraged.

The Outcome

Going in I knew it would be a challenging project.  However, the solution I proposed was tractable in the desired time frame, building on a lot of existing parts and decomposing into manageable stages.  The blockers were well understood, mainly involving subinterpreter bugs and PEP 432.  I also received several solid offers for help.  In October I even went to PyCon UK to coordinate some of the efforts.

At the same time it became clear that my life was getting too busy to make much progress.  In early 2016 I decided to table the project.  I wasn't giving up yet and hoped to get back to it.  Furthermore, many parts of the project stand on their own as useful features.  That's about where things are right now.

The Details

This is where I try to summarize my proposal and relevant information.

Summary

  • expose subinterpreters in Python (a low-level stdlib module)
  • support passing objects between subinterpreters
  • add subinterpreter serial-execution mode
  • add a high-level CSP module to the stdlib
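To make the shape of that model concrete, here is a minimal sketch of the CSP style (isolated workers that communicate only over channels).  It is not the proposed API: the "interpreters" and "csp" modules are only proposals here, so this stand-in uses the existing threading and queue modules, and the names producer, squarer, run_pipeline, and END are hypothetical.  Because these are ordinary threads under the GIL, it shows the programming model only, not any multi-core speedup.

```python
import queue
import threading

# Assumption: None marks the end of a stream.  It is one of the immutable
# objects a channel would be allowed to carry.
END = None


def producer(out_chan, items):
    """Send each item down the channel, then signal completion."""
    for item in items:
        out_chan.put(item)
    out_chan.put(END)


def squarer(in_chan, out_chan):
    """Receive ints and send back their squares; no other state is shared."""
    while True:
        item = in_chan.get()
        if item is END:
            out_chan.put(END)
            return
        out_chan.put(item * item)


def run_pipeline(items):
    """Wire two workers together with channels and collect the results."""
    first, second = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=producer, args=(first, items)),
        threading.Thread(target=squarer, args=(first, second)),
    ]
    for worker in workers:
        worker.start()
    results = []
    while True:
        item = second.get()
        if item is END:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Calling run_pipeline([1, 2, 3]) pushes each value through both workers in order.  In the proposal, each worker would instead run in its own subinterpreter and the queues would be replaced by channels.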

Phases

  1. resolve blockers
  2. add "interpreters" module
  3. minimal multi-core solution
    • subinterpreter "serial execution mode" (no GIL)
    • channels supporting immutable objects
    • no extension modules
  4. csp module
  5. expanded support
    • support more types in channels
    • performance optimization
    • extension module support (PEP 489 compliant only)

Requirements

  • "make multi-core support in Python obvious, unmistakable, and undeniable (and keep it Pythonic)"
  • no significant impact on single-threaded performance
  • maintain backward compatibility (C-API, etc.)
  • (pseudo-)compatibility with multiprocessing/threading/concurrent.futures APIs
  • a multi-core concurrency model/approach that fits our brains
  • Python APIs
  • supportable on other Python implementations

Blockers

Standalone Improvements

  • faster/cleaner interpreter startup?
  • better multiple-interpreters-per-process support
    • named subinterpreters
    • interpreters module (a la threading)
      • _interpreters module (a la _thread)
    • more efficient sharing between interpreters (e.g. builtins)
    • faster/cleaner subinterpreter startup
      • share some modules
      • leverage object sharing
    • refactor C-API to take interpreter arg
    • leaner subinterpreters?
  • refcounts in own memory page
  • factor out pickle-independent parts of multiprocessing
  • better object immutability
    • truer immutability?
    • immutable mode?
    • frozen objects
    • issue #24991: Define instance mutability explicitly on type objects
    • several PEPs
  • isolated object graphs
    • "isolated" object
    • memory model (all in same page)
    • related to RDM project

Specific Additions


  • channels (a la queue)
    • object sharing between interpreters
      • immutable objects: int, float, str, tuple, bool, frozenset, complex, bytes, None
        • containers (tuple, frozenset) must hold only immutable objects
      • types that implement __shared__
      • "frozen" objects
      • read-only views
      • "owned" objects (transfer ownership)
    • C channels
    • in own module?
  • PEP-489 slot for subinterpreter support?
  • subinterpreter "serial execution mode"
    • add mode management
    • start each in own thread
    • disallow threading
    • disallow forking
    • eliminate GIL within each subinterpreter
  • csp module
    • inspired by python-csp
    • shared-nothing "thread" concurrency model
    • uses subinterpreters in serial execution mode by default
  • object ownership
  • "Local Interpreter Lock"
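
The immutable-object rule for channels listed above can be sketched as a simple recursive check.  This is only an illustration under stated assumptions: is_shareable is a hypothetical helper, not part of any proposed module, and it ignores the other sharing options in the list (__shared__, frozen objects, read-only views, owned objects).  It also assumes exact type matches, so subclasses (which could carry mutable attributes) are rejected.

```python
# Types taken from the list above; None is matched via its type.
_IMMUTABLE_SCALARS = (int, float, str, bool, complex, bytes, type(None))
_IMMUTABLE_CONTAINERS = (tuple, frozenset)


def is_shareable(obj):
    """Return True if obj could cross a channel under the immutable-only rule.

    Assumption: only exact types count, so a subclass of int or tuple
    is treated as not shareable.
    """
    if type(obj) in _IMMUTABLE_SCALARS:
        return True
    if type(obj) in _IMMUTABLE_CONTAINERS:
        # Containers must hold only immutable objects, checked recursively.
        return all(is_shareable(item) for item in obj)
    return False
```

For example, is_shareable((1, "a", (2.0, None))) is True, while is_shareable((1, [2])) is False because the tuple holds a list.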

Specific Changes

  • drop GIL between interpreters?

Python Alternatives

  • threading
  • multiprocessing
  • asyncio
  • STM (Armin Rigo, PyPy)
  • pyparallel (Trent Nelson)
  • dask
  • gilectomy (Larry Hastings, CPython)
  • otherwise remove the GIL
  • better concurrency primitives for threaded programming
  • add multi-core support to the asyncio event loop
  • better documentation
  • Jython
  • IronPython
  • other Python implementations
  • fibers
  • do nothing

17 comments:

  1. This is great, will it be possible to have subinterpreters with different python binaries (e.g. [cpython <-> [jython] ] ) ?

