Faster Python: Mark Shannon, author of newly endorsed plan, speaks to The Register

Python creator Guido van Rossum last week introduced a project to make CPython, the official implementation, five times faster in four years. Now Mark Shannon – one of the three initial members of the project – has opened up about the why and the how. Shannon, formerly a research engineer at Semmle, a code security analysis …

  1. Pascal Monett Silver badge

    Making Python faster

    If Python is based on C, which is the fastest programming language after pure Assembler, then how can you make it faster?

    You can make any code slow if you write it badly, but there is no compiler optimization for a programmer who doesn't know how to code.

    1. PerlyKing Silver badge

      Re: Making Python faster

      CPython is written in C, but I wouldn't say Python is based on C. It's an imperative language, but designed to be Object Oriented.

      I imagine it's just like any other application - it can be fast or slow. Start by writing it to be correct, then work on improving performance.

      1. Anonymous Coward
        Anonymous Coward

        Re: Making Python faster

        It's a Super Language Object Written environment.

      2. Anonymous Coward
        Anonymous Coward

        Re: Making Python faster

        "... but designed to be Object Oriented."

        Exactly. Why is there even a comparison between C and Python? Trying to pimp Python over C doesn't make sense; it's like a performance comparison between a pencil and a paint brush. The comparison should be C++ and Python, at which point pull in Boost and redraw the comparison, although it won't be looking great for Python.

        FWIW, if you already have your crunch routines written in ASM or C, then any language can be the glue code. C++ is just as valid as Python, and Python is just as valid as Haskell, and Haskell is just as.... it really doesn't matter. I know if I were doing serious ML, I would seriously try to avoid ALL scripting languages. I wouldn't care what you'd tell me that scripting language is written in... hard pass.

    2. Flocke Kroes Silver badge

      Re: Making Python faster

      In Python, every value is an object, which is mostly harmless for big complicated objects but really hurts for simple things like integers. Imagine a function has come up with the result 12345. C can put that in a register and return to the caller (and will do something stupid if the value being returned is too big to fit in the register). Python allocates some memory, stores a reference count of 1 at the start of that memory, followed by a pointer to class int, then a number giving the amount of memory needed to store the value of the int (Python handles really big integers without making the programmer think hard), followed by the value 12345, and finally returns a pointer to the allocated memory. This process is so painful that Python is already optimised by having instances for small integers prepared in advance at known locations, so a small integer can be returned by incrementing the reference count of the right one and returning a pointer to it.
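The object overhead described above is easy to observe from Python itself. A small sketch (the sizes and the caching range are CPython implementation details, not language guarantees):

```python
import sys

# Even a tiny integer is a full heap object in CPython:
# reference count + type pointer + digit count + the digits themselves.
print(sys.getsizeof(12345))   # typically 28 bytes on a 64-bit build, vs 4-8 for a C int

# CPython pre-builds instances for small ints (-5..256), so producing a
# small value just bumps a refcount on the cached object...
a = int("100")                # computed at runtime, not a folded constant
b = 100
print(a is b)                 # True: both names point at the cached instance

# ...while each large result is a fresh allocation.
c = int("12345678901")
d = int("12345678901")
print(c is d)                 # False: two separately allocated objects
```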

      From here things go further downhill. C is a typed language, so the caller knows that an integer was returned. It can copy this integer wherever it is required, use it in arithmetic or simply forget about it. In Python the caller only knows that some subclass of object was returned. Something "simple" like a + b gets expanded to type(a).__add__(a, b). Luckily __add__ is optimised and does not require a dictionary lookup. int.__add__ has to check the type of b, and if int.__add__ does not understand it the interpreter tries type(b).__radd__(b, a) instead. Simple copying is not that bad: add one to the reference count and store a pointer where required. Forgetting about an object, however, is not trivial: the interpreter subtracts one from the reference count and, if the result is zero, calls type(a).__del__ and then deallocates the memory assigned to the object.
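The dispatch described here can be sketched in Python itself. This is a simplification (the real C-level logic also handles subclass precedence and operands that define neither method), with a hypothetical Metres class to trigger the reflected path:

```python
def add(a, b):
    # Roughly what the interpreter does for a + b:
    result = type(a).__add__(a, b)
    if result is NotImplemented:
        # int.__add__ did not understand b, so try the reflected method.
        result = type(b).__radd__(b, a)
    if result is NotImplemented:
        raise TypeError(f"unsupported operand types: {type(a)} and {type(b)}")
    return result

class Metres:
    def __init__(self, value):
        self.value = value
    def __radd__(self, other):       # reached only via the NotImplemented path
        return Metres(other + self.value)

print(add(1, 2))                 # 3: int.__add__ handles it directly
print(add(1, Metres(2)).value)   # 3: via Metres.__radd__
```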

      Everything being a subclass of object (almost always) makes creating new software in Python easier than in C - at the cost of making the CPU do a huge amount of work.

      1. GrumpenKraut Silver badge

        Re: Making Python faster

        > ...at the cost of making the CPU do a huge amount of work.

        Doing quite low level stuff containing just memory reads/writes, integer ops and the occasional if, once in C and once in Python: Python was slower by a factor of more than 70 by my measurement.

        Python certainly has its place, but raw computation is not it.
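That sort of gap is easy to reproduce with a rough micro-benchmark (the factor of 70 above is the commenter's own measurement; exact ratios depend heavily on machine and workload):

```python
import timeit

def py_sum(n):
    # Pure-Python loop: every iteration boxes an int and dispatches __add__.
    total = 0
    for i in range(n):
        total += i
    return total

n = 100_000
loop_t = timeit.timeit(lambda: py_sum(n), number=20)
# The same reduction performed by C code inside the interpreter:
builtin_t = timeit.timeit(lambda: sum(range(n)), number=20)

print(f"pure Python loop: {loop_t:.3f}s, C-level sum(): {builtin_t:.3f}s")
# The C path is typically several times faster; a dedicated C loop over a
# raw array (no boxed ints at all) widens the gap much further.
```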

        1. Tim Parker

          Re: Making Python faster

          Well exactly, it's about using the right tool for the job. If you're just checking numbers, then Python may not be that tool - if you're quickly knocking up a cross-platform network application or GUI, then C may not be that one either.

        2. Anonymous Coward
          Anonymous Coward

          Re: Making Python faster

          I was going to add that if you're looking for speed in Python then chances are you chose the wrong language. Python is a glue language, primarily used to shuttle data between libraries written in C that do the grunt work. This it does well.

          Like any language you will have compromises - it's about choosing the ones you're happy to live with. Python is like a multi-tool - it can do many things ok but isn't really the best at anything.

          Personally I like C# as my go-to language, R for analytics (data.table is awesome), and Python as more palatable than Perl for data munging. Perl is great, but I prefer Python.

      2. Version 1.0 Silver badge
        Happy

        Re: Making Python faster

        BASICally (sic) Python makes it easy for programmers with a PhD in an area other than programming (e.g. Life Sciences) to write code, because it does all the work that a programmer otherwise has to think about and get right when implementing an algorithm. That "easy to write code" quality is a big advantage for its users, but I wouldn't write Python code to fly a drone around on Mars ... I wouldn't write it in BASIC, FORTRAN or COBOL either.

    3. Charlie Clark Silver badge

      Re: Making Python faster

      As Shannon said: the Python code is likely to be trivial and therefore reliable. For many things like IO it will be making the necessary calls to C libraries, but you won't have to reinvent the wheel for handling encodings, line endings, etc. Hence, the total time spent to write and run the program can easily be orders of magnitude greater in C than in Python.

      1. Munchausen's proxy
        Flame

        Re: Making Python faster

        -- " the Python code is likely to be trivial and therefore reliable."

        Haha. Until you import a package from the community.

        https://xkcd.com/1987/

        1. amanfromMars 1 Silver badge

          Re: Making Python faster with a Communications PACT

          Haha. Until you import a package from the community.

          https://xkcd.com/1987/ .... Munchausen's proxy

          Quite so, Munchausen's proxy. It is an interesting defence shield and proven almighty effective to date in matters that attack to lay waste and try permanently degrade fundamental systems.

          You'll have to imagine what it does with stealth in future operations, but you can be sure they are not inconsequential and foreboding even whenever so often thought to be best forbidden and/or at times better forgotten, because of what can so easily be simply done to extremely complicated machines to deliver radical change in staged rushes/tsunami waves of free to air novel informative instruction for Persistent ACTive Cyber Treats.

          What are you looking for to do with all the new virtual tools that are at our disposal/beck and call? Anything new and radically different and exciting and troubling?

        2. Charlie Clark Silver badge

          Re: Making Python faster

          Note, I was talking particularly about the standard library but there are also many excellent Python libraries out there.

          But you can end up with the following dilemma. Focus on your own code and rely on external libraries wherever possible, and you are open to problems in code you don't control, whether poor design decisions, performance or exploits. Go the NIH (not invented here) route and you get complete control and all of the responsibility. Now, contrary to their own opinion, many programmers are good programmers only in certain areas; the list of simple but fatal bugs that C is prone to is long, and it would be a brave programmer who said they had never made one - and it may take a long time to find out. While I'm not a fan of release early, release often, at the end of the day only running code matters.

          Using external libraries does present a risk, but a risk that can be mitigated because it is known. It might be added that Python presents perhaps a lower risk here than many other languages, because a great deal of the work (IO, networking, etc.) ends up being done by the standard library, which has been vetted extensively, or by similarly well-vetted libraries such as LibXML2. This could be coincidental, but the persistently low number of CVEs for Python code suggests otherwise. In some situations you can, of course, have a library vetted for issues, and I think Google has started doing this for some popular code.

          Python's promiscuity makes it relatively easy to use high performance libraries for hotspot code, letting you pick and choose which code to optimise and then write your own C, C++, Fortran (very popular for some scientists) or even language-du-jour Rust and provide a Python API for it relatively easily.

          In my own experience I've benchmarked XML implementations and gone with the standard library for reading and lxml (and therefore LibXML2) for writing, used and then dropped an external calendar library; and recently benchmarked JSON implementations because JSON can be very memory intensive.

          And there are still a great many tasks for which Python isn't suited: Google decided to go with Go for very large scale systems work, and Dimitri Fontaine switched to LISP for pgloader largely because of Python's known problems with multiprocessor systems. That was before asyncio and concurrent.futures, so it might be interesting to see some of his earlier work redone using the newer techniques.

        3. bazza Silver badge

          Re: Making Python faster

          -- " the Python code is likely to be trivial and therefore reliable."

          Or some end user takes it into their head to modify the python code...

    4. thames

      Re: Making Python faster

      If you're referring to the discussion near the end of the article, better algorithms will often beat better compilers. As you said, "there is no compiler optimization for a programmer who doesn't know how to code". If the programming language makes it easier and faster to write your program, then you can spend more time getting the algorithm right, resulting in a faster program.

      There's no guarantee that your particular problem is the sort to which better algorithms can be applied, but many are. In those cases the "better programmer" is the one who understands algorithmic principles and can pick the best one, rather than the one with the most encyclopedic knowledge of a particular compiler's performance quirks.
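A toy example of the asymmetry (hypothetical functions; the closed form is Gauss's summation formula):

```python
def sum_loop(n):
    # O(n): a clever compiler can only make this faster by a constant factor.
    total = 0
    for i in range(n):
        total += i
    return total

def sum_formula(n):
    # O(1): picking a better algorithm beats any amount of compiler optimisation.
    return n * (n - 1) // 2

print(sum_loop(1000), sum_formula(1000))   # both 499500
```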

  2. Anonymous Coward
    Anonymous Coward

    A Statement of the Obvious.............

    Quote: "...The community did not enjoy the backward compatibility issues introduced by Python 3..."

    *

    No sh*t Sherlock!!!!

    1. Anonymous Coward
      Anonymous Coward

      Re: A Statement of the Obvious.............

      Probably the biggest reason for the change of heart was the offer of 5 years guaranteed employment.

  3. Draco
    Windows

    Nary a natter of Nuitka?

    In all the recent articles about projects focussed on speeding up Python, I was surprised none mentioned Nuitka - a Python compiler. Granted, it's a small and (probably) obscure project that I stumbled across a few years ago and which, when I last checked, was still plodding along.

    1. Anonymous Coward
      Anonymous Coward

      Re: Nary a natter of Nuitka?

      >it's a small and obscure project

      Answered your own question there guv.

    2. thames
      Boffin

      Re: Nary a natter of Nuitka?

      Most of the "let's just add a JIT / static compiler to Python" projects end up with something that is significantly faster than CPython in certain types of application, but on average somewhat slower in all the others. That's why so many attempts to replace CPython look promising on simple benchmarks to start with, but run out of steam once enough is implemented to run a wide range of real world programs.

      The low level changes to data structures and other internals they are talking about in this project are the sort of thing needed to actually get major performance increases. These sorts of changes may also make it possible to get better performance from static and JIT compilers, not just a faster interpreter.

      In other words, the real work isn't just gluing a JIT or C code translator onto the side. That has been tried many times and a JIT usually results in slower performance, not faster, as well as devouring huge amounts of RAM.

      PyPy (Python written in Python) did all of this already and got better performance. Their description of how PyPy works is that they wrote a method of creating interpreters for any language, from which you get a specialising JIT compiler as a side effect for no additional effort. They also improved the internal data structures and memory layouts. However, they did all of this at the expense of being incompatible with most of the existing Python libraries written in C - a large proportion of the popular libraries out there.

      The project being discussed here is apparently going to try to accomplish many of the things PyPy did, but in a way which doesn't break compatibility with existing C libraries.

      One of the things which has made Python so successful is that, unlike many languages, it doesn't insist that all libraries be written in the language itself. VM based JITs are particularly prone to this pitfall.

      Instead, CPython is written in a way which makes it straightforward to use libraries written in C, with much less call overhead than in many other languages. So if there is demand for a library to do something, it has been common to simply find an existing one written in C, write a Python binding for it, and you now have a Python library with minimal effort. Since C has become the lingua franca of the programming world, this gives Python a huge selection of very fast libraries. It is also why many experienced Python programmers see no issue with writing bits of an application in C when that would be of benefit.
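The low binding overhead is visible even without writing any C of your own: ctypes in the standard library can call into an existing compiled library directly. A sketch, assuming a POSIX-ish system where the C runtime can be located:

```python
import ctypes
import ctypes.util

# Load the platform C library (falling back to the process's own symbols).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature of abs() and call it like a Python function.
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]
print(libc.abs(-42))   # 42, computed by compiled C code
```

Real bindings such as NumPy use the C API rather than ctypes for still lower per-call overhead, but the principle is the same.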

      However, anything Python related which breaks compatibility with a large proportion of those C libraries tends to be doomed to failure in the market, which is why PyPy hasn't been a huge success.

      The people working on the project discussed in this article have been around the block multiple times on this subject, so they are no doubt well aware of the issues and will try to avoid them.

    3. Mowserx

      Re: Nary a natter of Nuitka?

      I have been using Nuitka to compile Python scripts for Windows and macOS. I hope they keep maintaining it.

  4. amanfromMars 1 Silver badge

    What to do with the language? Does it show lead with more than empty promises available?

    the idea is that we shouldn't privilege a workload,

    Any support to a workload certainly gifts and deserves every privilege available, and lavishly supplied to be at its best fully enjoyed as vital and virile. ...... an addictive attraction to add sparkle to life with death to boredom and program slavery having a huge following and following with leading novel developments.

    Sterling Stirling Services for Specialist Virtual Office Space Operations would be one of its ACTive IT Guises in any Earthly Iterations. In One of those New Fangled Virtually Entangling Operations Exercising Notions of Greater Command with Beta AI Controls ...... which be an energetic energising part thought vital and thoroughly enjoyed and deployed in COSMIC ProgramMING, is one of most enjoyable of ones. ....... so good in fact, that it is beautifully difficult to even imagine contemplating leaving for anything better anywhere else.

    Real Spooky Type Stuff it helps to know crazy is a constant bright friend and not phantom maniacal foe, to realise Future Presentations which survive and prosper surrounded by madness and mayhem.

    You don't get those from any nickel/dime store. They're hard to find and extremely expensive.

  5. ST Silver badge
    Devil

    if you write your program in Python ...

    [ I wouldn't, but I digress ...]

    [ ... ] if you write your program in Python and you write one in C, you end up with probably a faster Python program because it's trivial to write it [ ... ]

    I love the use of the word probably in this context.

    Also: I've always found that writing something quick-and-dirty in an interpreted scripting language always yields the best performance possible. [/s]

    Here's how likely it is that a program written in Python will be better performing than its C equivalent: about as likely as seeing twelve purple monkeys fly out of my ass singing a cappella Mozart's Queen of the Night aria.

    1. chuBb. Silver badge

      Re: if you write your program in Python ...

      A != B is the optimised for speed version of your comment

    2. werdsmith Silver badge

      Re: if you write your program in Python ...

      We'll have to arrange getting you fitted with a magic flute.

    3. Richard 12 Silver badge

      It's talking about total time spent

      If it's an application that will likely be run a small number of times, then the fastest one is the one that took the least amount of time to write and test.

      There's a heck of a lot of code that's only going to be run "in anger" fewer than ten times, or even only once.

      There are even more situations where the amount of time "running" is basically zero compared to the rest of the task, like build pipelines and data munging for analysis tools.

      In those the really important thing is being able to prove it correct, because the program itself is responsible for less than 0.5% of the runtime. You want to be really certain you're passing the right data formatted in the right way because a mistake there is going to cost you days of CPU time.

      1. tetsuoii

        Re: It's talking about total time spent

        Learning C is time well spent. Learning Python means investing a lot of time in a language that can never perform and doesn't scale. The perceived 'savings' quickly vaporize when you discover that the use case only exists because your data is poorly structured and your project poorly planned.

  6. gormful

    I wondered why the IronPython team just announced an alpha version of IronPython 3.7 after years of inactivity.

    But it makes sense now that I know Guido is working on Python at Microsoft...

  7. Yes Me Silver badge
    WTF?

    Mission: Impossible

    Backwards compatibility of features that we might not even know we have
    Python's an interpreted language and includes the eval() and exec() constructs. As a result you can never know what you have in the code base. I've written Python that dynamically constructs statements and sends them over TCP to another Python program that executes them. You can't do that in a compiled language, so it's pretty certain that there will always have to be a complete interpreter and a byte code version of all variables. Good luck in making that run several times faster on the same hardware.
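For what it's worth, the dynamic-execution property itself is trivial to demonstrate (whether shipping such source over a raw TCP socket is wise is a separate question):

```python
# Source code that does not exist until runtime -- e.g. just received over a socket.
source = "def square(x):\n    return x * x\n"

namespace = {}
exec(compile(source, "<received>", "exec"), namespace)
print(namespace["square"](7))   # 49
```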

    1. chuBb. Silver badge

      Re: Mission: Impossible

      Yes you can do that in a compiled language

    2. Anonymous Coward
      Anonymous Coward

      Re: Mission: Impossible

      >You can't do that in a compiled language

      Spoken truly like a man who would write an application that manually wrangles sockets to send plain text content over a network to be arbitrarily executed.

      Of course you can do that in a compiled language.

  8. sreynolds

    Screw it...

    Just take the money and start writing something to translate the Python to Rust and kill the language. Why do these high level languages hide what really happens on REAL hardware behind idealised machines that don't exist?

    1. werdsmith Silver badge

      Re: Screw it...

      They abstract the machine because it means that people using the language don't have to worry about it. A person doing science wants to focus on the science they are doing, not the computer science behind the language. The power of Python in these areas is that it is reasonably easy to get it to manipulate dataframes and do mathematics. So it is displacing MATLAB in Unis, but not displacing C in embedded.

      1. sreynolds

        Re: Screw it...

        A good man always knows his limitations. When people started using Python for tensors and other large arrays, they should have realised that a language that uses garbage collection was never going to be optimal. MATLAB and Python are what people use to prototype stuff before they move to generated code.

        1. werdsmith Silver badge

          Re: Screw it...

          Yes of course. They are also what people use to do stuff quickly, and to tweak and customise - most often for their own use, without it ever being formally turned into an application. It doesn't actually need to be optimal for 99% of purposes. When it does, there are other tools.

          1. tetsuoii

            Re: Screw it...

            Guess what, I do that in C and it takes no longer and can actually be used in real software after tweaking and customizing.

        2. Anonymous Coward
          Anonymous Coward

          Re: Screw it...

          >they should have realized that a language that uses garbage collection was never going to be optimal

          Which is a big part of why TensorFlow, PyTorch, Pandas, SciPy and all the other performance-critical bits in wide use are in fact written in C or Fortran.

          Which you'd know if you had even the most passing familiarity with the subject at hand.

    2. Dan 55 Silver badge

      Re: Screw it...

      Seems Rust is the answer to everything now. When did that happen?

  9. HammerOn1024

    Or...

    Just learn 'C' and Assembler and stuff python.

    Real-time guy here... interpretive languages are for script kiddies and a waste of my time. :-)

    1. thames

      Re: Or...

      I've done plenty of real time embedded work in interpreted BASIC running on an Intel 8052AH, an 8-bit chip with 32 KB of RAM clocked at around 10 MHz. Loads of other people have done the same. The market for this was big enough that several companies were making dedicated versions of this chip with the BASIC interpreter masked into ROM on the chip itself. It's a variant of the 8031, which was a top selling micro-controller CPU/SOC.

      All of the major industrial control vendors (Siemens, AB, Schneider, etc.) sold them as modules for their control systems. Numerous single board computer vendors, whose market is entirely focused on embedded, sold them as well.

      Loads of major industrial processes and equipment with more dedicated embedded control systems used this as part of their system, or as the complete thing.

      Someone who genuinely knew more than a smattering about embedded control and had years of experience in the industry would know all this.

      These days I would quite happily use a Raspberry Pi running a Python program in many types of real time embedded applications. People who have genuine experience in a wide range of embedded control applications would know that "real time" actually means "can meet the process deadlines".

      And while we're on the topic of interpreted languages in real time applications, I should mention that I've used Python in PC based embedded industrial systems doing real time interaction with robots and measuring equipment.

      I'll keep this short, but here's a list of other interpreted language systems that I've used or am familiar with which have seen widespread use in real time industrial control or test and measurement: Python, GWBasic, QBasic, HP Rocky Mountain Basic, UCSD Pascal, and a host of proprietary languages.

      And I should add that until very recently most or all of the PLCs used to control factories around the world used interpreters.

      I would suggest that you wind your neck in on this topic.

      1. Blue Pumpkin

        Re: Or...

        "The market for this was big enough that several companies were making dedicated versions of this chip with the Basic interpreter masked into ROM on the chip itself...."

        Hey that was me - for one of them at least !

        Cramming a Tiny BASIC interpreter coded into 4K of ROM on an 8051 in 1982. Took me a week to reorganise my assembly code to get down from 4099 bytes to 4095 in the end.... those were the days

        1. bazza Silver badge
          Pint

          Re: Or...

          One byte spare? Tsk, could have added a whole new feature...

          On a serious note, definitely hats off. It's folk like you that gave folk like me a cheap enough machine to get started with. Beer owed.

  10. Ozzard
    Devil

    First define your supported backward-compatible surface; the rest is "mere engineering"

    I think that backward compatibility is going to be an awful lot of fun to define.

    Imagine, for example, the race conditions that nobody has ever found in their multi-threaded code because the existing code has particular performance characteristics such that one thread always gets there ahead of the other / the code is slower than the hardware being controlled. Now consider a project that *only* varies timing, and makes no other change. You've already lost backwards compatibility, in that code that work{ed,s} in the old environment no longer works in the new one.
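A concrete instance of that hazard, sketched with a hypothetical counter: the unsynchronised code below usually "works" only because of how the current interpreter happens to switch threads, and nothing guarantees a faster interpreter will keep the same behaviour (the exact loss rate varies by CPython version and machine):

```python
import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1   # read-modify-write: not atomic across threads

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Often 400000 on today's CPython, but a build that merely changes when
# threads are switched can silently change the answer.
print(counter)
```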

    I confess I'm going to sit back, grab the popcorn, and watch the fun, continuing to avoid as far as I can the trio of Topsy-ish "just growed" P languages that were originally fucked by their lack of architecture and are now *utterly* fucked by their requirement for backward compatibility: PHP, Python, and Perl. Spawns of Santa, all of them, hence the icon.

  11. Paul Hovnanian Silver badge

    Faster?

    Stop making me type in all those leading spaces for one thing.

    1. W.S.Gosset Silver badge

      Re: Faster?

      If you look upwards and to the left of your space bar, you'll see a key labelled "Tab". Set AutoIndent=On.

      You're welcome.

      1. bazza Silver badge

        Re: Faster?

        Indent errors are a major source of bugs in Python code, and they're hard to spot because the missing or extra characters are invisible... Give me curly braces any day.
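To illustrate the class of bug being described, two hypothetical functions whose bodies differ only in the indentation of one `return`:

```python
def doubled_buggy(items):
    out = []
    for x in items:
        out.append(x * 2)
        return out      # one level too deep: returns after the first iteration

def doubled(items):
    out = []
    for x in items:
        out.append(x * 2)
    return out          # same characters, one level out: the intended loop

print(doubled_buggy([1, 2, 3]))   # [2]
print(doubled([1, 2, 3]))         # [2, 4, 6]
```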

  12. bazza Silver badge

    Need for Speed?

    If Python isn't fast enough, it's the wrong language for the application.


Biting the hand that feeds IT © 1998–2021