The @codon.jit decorator sounds like an incredibly useful compromise from the docs.
Definitely plan on trying this out as soon as possible!
Edit: Then I got down to the licensing information and suddenly lost all interest.
Python is among the most popular programming languages, yet it's generally not the first choice when speed is required. While it can be optimized for better performance, Python is prized for qualities other than speed, such as readability, a manageable learning curve, an expansive ecosystem, and utility in both …
> Then I got down to the licensing information and suddenly lost all interest.
Which presumably means you're only interested in commercial use.
Look on the bright side: you have four years to do all the dev work and get it *really* finished before that version of Codon switches to the Apache licence and you can ship. If any sales manager demands an earlier release you can just point Legal and Compliance at them: get them working on your side for a change.
Certainly not only interested in commercial use, or even "mostly" commercial use, but I'm also not interested in investing much time in something I now know I couldn't use at any potential employer; I'd be better off committing to getting comfortable in a more suitable language instead.
When older releases are reaching GPL stage I might well revisit.
I'm a pentester not a developer so my commercial code is not ever shipped code per se, the stuff I do is pretty much all one off hacky scripts to help me out (data manipulation for example) that never see the light of day again and I'd be ashamed to have any real developer or data engineer catch a glimpse of.
I don't usually need this performance, but every now and again my junk code thrown together in a few hours can take a few hours to run and getting that down (and speeding up the many debugging runs) without spending hours/days optimising would be nice.
Nothing I do along those lines would ever take 3-4 years without me being fired, let's put it that way.
Also some network stuff of course but doubt that will see many gains here so not much point there.
Those slow runs are not a deal-breaker - usually runs in the background while I get on with other things.
GPU feature is nice. Had a few instances where password crackers for proprietary software have come up, and I've never used python for those nor have I ever committed to GPU for them, but then again I normally just use whatever the language the reverse engineered code is and flip it around to knock those up too. Assessment windows are short and so spending more than a couple hours on these things is just impossible.
But ultimately nobody would commit to licensing for this kind of software for "that one python guy" when teammates all have different language preferences, and I'd probably only benefit from these performance improvements a couple times a year.
While that might not even count as "production code" under that license I'll prefer to just find a "free as in freedom" alternative.
"Which presumably means you're only interested in commercial use."
They're not even that clear. The license talks about "production use". Both terms can be vague, but the way I use them, production use includes all cases of commercial use and quite a bit more.
I run a few services just because I want to, making no profit (I pay for the servers, so I make a small loss). Is this production use? If I was using a piece of proprietary software, yes it is and just because I don't make any money doesn't mean they would let me have a license for free. I have also built software for charities before, which is more definitively production use and starts to near the line of commercial use (I didn't sell it, the charities aren't selling it or using it to sell things, but they do take in donations and may sell items elsewhere in their operations, and if the software does something like track that, it could be commercial use).
I am disappointed that they chose this license, but it is their choice to do so. It will prevent me from contributing code to them. Since I don't know what the license terms would be if I did want to use it in production, I'm also less likely to take it up unless I want a fast, compiled Python script just for myself. Maybe in 2025 we'll see what it has become.
If you - or anyone - is really interested in using Codon in production (or even just to see if they can provide clarification on what they deem "production" to mean, for instance with regards to the charity work) you can always drop them an email and ask.
The Seq project is licensed under plain old Apache (which is presumably why Codon is reverting to that and not to, say, MIT or GPL: stick with what they know) and the relicensing has been done with the fork to Codon and the creation of the Exaloop company, which still appears to be a small academic offshoot: i.e. not exactly a full-fledged shrinkwrap software vendor.
Following the MariaDB BSL's FAQ, Exaloop can always decide to grant you an Additional Use Grant for limited production use (e.g. charities). Or they can even sell you an appropriate commercial licence.
Or, if enough people - preferably representing companies - express an interest in Codon and - POLITELY - point out issues with the MariaDB BSL (like, what does "production" mean?) and maybe suggest a better alternative...
That is all true, and just because the license isn't OSI-approved doesn't mean I won't use it. However, I have not had great results before when going to a commercial company as an individual using it for noncommercial projects. This group may be different, but sending a bunch of messages to someone who may or may not be authorized to write a legal document confirming that I won't be breaking the license often doesn't work when I'm offering little or no money.
I distinctly remember an occasion where a company offered their proprietary library with a page that implied that you could easily buy a license to use it in one-off projects on commodity hardware, but when I sent a message asking the terms, it turned out that I had to pay €20k per year just for the SDK and any use would incur another charge to be determined later. They had intended it to be purchased by builders of embedded devices and I planned to have one project running a single instance. They weren't impressed.
Bit of a thread hijack, but Matt Parker did a video “Someone improved my code by 40,832,277,770%” where other coders took some Python he wrote to perform a task in 30 days, and sped it up, and sped it up, and sped it up........ and sped it up some more.
Interesting watch, but the takeaway was your code only needs to be as good as the results you are willing to accept. And he was just fine with 30 days.
https://youtu.be/c33AZBnRHks
If performance is a key requirement, there's no substitute for properly written code. I recall something similar in DSP, where the difference between an algorithm implemented in %TRENDYLANGUAGE (eg MatLab, or C#) and the same thing written in C using a proper compiler and libraries (eg icc and Intel's IPP/MKL libraries) was vast. And that was before a distributed architecture Grey Beard got involved and got their very best OpenMPI fingers working on the job.
There's currently a trend for model based synthesis, eg MatLab / Simulink built directly to FPGA. I've yet to see such an approach succeed in anything but trivial cases. I've seen a lot of projects that try this go wrong...
In my graphic computation project in uni (30 years ago....God, I'm old!), I used some embedded assembly in my Visual C++ code to load a polygon vector file. While most of my colleagues' code took several minutes to load the largest files, mine did it in a couple of seconds - it loaded the first file so quickly the teacher first thought my code just didn't work - he only realised it did when I told him to press the menu button to show the 3D object and all worked :)
It's nice when your program goes, "Yeah, and? Try harder next time!" :-)
I once revolutionised an entire department (this was back in the early 1990s, when even PCs were still a fairly new concept). A lot of the work relied on a "fast Fourier transform" routine, written in Turbo Pascal (those were the days). My "revolution" was to replace this with C code, but also to actually implement an FFT because the original Pascal in fact implemented a discrete Fourier transform (which is considerably slower!), and no one had realised.
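For anyone wondering why that swap mattered so much, here's a minimal Python sketch (nothing like the original Pascal/C, and the sizes are made up): a direct DFT does O(N^2) work where an FFT does O(N log N).

# Illustrative only: naive direct DFT vs NumPy's FFT on the same signal.
import cmath
import numpy as np

def naive_dft(x):
    # Direct discrete Fourier transform: O(N^2) complex multiplies.
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

signal = np.random.rand(1024)
slow = naive_dft(signal)        # ~a million complex multiplies
fast = np.fft.fft(signal)       # FFT: roughly N log N operations
print(np.allclose(slow, fast))  # True - same answer, wildly different cost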
I was once tasked with speeding up a program that calculated the optimal schedule for a surface-mount machine. The program ran on Apple IIs and was written in AppleSoft BASIC. It barely fit in memory so all comments had been removed, and were maintained separately in a ring binder. It was compiled with Microsoft's AppleSoft BASIC compiler which took about half an hour, but worked flawlessly and got runtime down to about an hour.
After some time spent working through the code, it turned out that it was spending most of its time bubble-sorting an array; replacing this sped up the whole program by a factor of 5.
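In modern terms (a toy Python reconstruction, not the original AppleSoft BASIC), the fix amounted to this:

# Toy illustration: the O(n^2) bubble sort vs the built-in O(n log n) sort.
import random

def bubble_sort(a):
    # Classic bubble sort: fine for tiny arrays, ruinous inside a scheduling loop.
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

placements = [random.random() for _ in range(5000)]
slow = bubble_sort(placements)   # millions of comparisons
fast = sorted(placements)        # Timsort does the same job in a blink
assert slow == fast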
The biggest difference is most likely that Cython will only use static typing if it's explicitly written, otherwise you still incur dynamic typing overhead, whereas Codon uses black magic to infer the types from a generic Python file and thus needs no extra work for massive speedups. The paper's pretty interesting, but reminds me that I'm a mere mortal, not a real computer scientist.
Oof, checking their website, the automagic just forces 64-bit ints and ASCII strings, that's going to give a ton of speedup but simply not work for many of the specialized applications where massive speedup would come in so handy.
On the other hand, baked in threading and no GIL (except when interfacing with CPython) is a real nice addition.
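For reference, the decorator route mentioned at the top of the thread looks roughly like this (a sketch based on my reading of the docs; the exact import and behaviour may differ, and only the Codon-supported subset of Python is allowed inside the decorated function):

import codon

@codon.jit
def count_primes(limit: int) -> int:
    # plain nested loops - exactly the sort of thing CPython crawls through
    total = 0
    for n in range(2, limit):
        is_prime = True
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                is_prime = False
                break
        if is_prime:
            total += 1
    return total

print(count_primes(100_000))  # compiled natively on first call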
Quite a lot of my variables come from accessing external web services, and as far as I can see, there is no way of telling from looking at the program what you are going to get.
If I then import it into a NumPy array, I tell it at that stage what type it is going to be, but even if it isn't that type it will try to convert it (eg if I tell it it's a float and it gets a string comprising the digits 0-9, a decimal point, and possibly a "-" at the start, then it will be able to convert it into a float).
The same I guess applies if you read them from a file or possibly even from a database query.
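Quick illustration of what I mean (made-up values, but this is how NumPy behaves):

import numpy as np

raw = ["3.14", "-0.5", "42"]         # e.g. strings back from a web service
arr = np.array(raw, dtype=float)     # happily parsed into float64
print(arr.dtype, arr.sum())          # float64 44.64

np.array(["not-a-number"], dtype=float)  # this one raises ValueError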
But really, if you want the speed of a compiled language, then you should probably write it in a compiled language, and a knock-off version of Python probably isn't the best option for a compiled language. People choose Python because of the libraries and general ecosystem that surrounds the language, not because of some magical properties of the language itself. If this only supports a subset of Python, is it still Python, or is it some other language that looks a bit like Python?
"People choose Python because of the libraries and general ecosystem that surrounds the language, not because of some magical properties of the language itself."
Disagree. Dynamic typing is a vital advantage in some cases, and one that no compiler can handle properly (Burroughs Algol thunks come to mind from 50 years ago, but that doesn't solve the whole problem). Then there's eval() and exec().
Also I wonder how well it does with monkey patching.
Anything relying on either is broken by design.
Well, yes and no. Which is to say I don't disagree :D
Many lifetimes ago I was writing systems that were designed to implement self-modifying code. (There's a lot of hype about "AI" systems at the mo, so I'll just say it was an "AI" system, and consider myself unimpeachable!) I was using Clipper: compiled speed with runtime Codeblocks, a Clipper feature allowing interpreted Clipper code at runtime (ie. post-compilation).
If you consider designing software that can modify itself as broken by design then, okay. I can't see anything but extreme niches where SMC would be needed.
[Edit: not my DV.]
Both exec() and eval() exist because they are necessary. But you are unlikely to ever use them in your own code; I certainly haven't in over 20 years. Monkey patching can be essential in tests and can occasionally be the right tool for the job.
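Something like this is the legitimate use I mean (a minimal sketch; my_module, fetch_exchange_rate and price_in_dollars are made-up names):

from unittest import mock
import my_module  # hypothetical module under test

def test_price_uses_stubbed_rate():
    # swap the network-bound function for a stub, just for this one test
    with mock.patch.object(my_module, "fetch_exchange_rate", return_value=1.25):
        assert my_module.price_in_dollars(8) == 10.0
    # the original function is restored automatically when the block exits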
The problem with all languages is the abuse of features for the wrong reason: I've seen people using comprehensions for flow control because it saves a line. It also makes things impossible to debug.
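The sort of thing I mean (contrived example):

results = {}

# comprehension run purely for its side effects - the built list is thrown
# away and there's nothing sensible to step through in a debugger
[results.update({k: k * k}) for k in range(10) if k % 2 == 0]

# the honest version: same effect, one line longer, debuggable
for k in range(10):
    if k % 2 == 0:
        results[k] = k * k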
Python's great advantage is that it helps millions of non-programmers get stuff done - generally quickly, safely and correctly.
Well, they say this about it:
"Codon is a Python-compatible language, and many Python programs will work with few if any modifications:"
So, no, not actually Python - but it looks more than just 'a bit' like it. Besides, anyone who has been around a few years is used to coming across variants of languages - how many variants of BASIC or PASCAL or FORTH have you seen over the years? It's not an issue if you are starting a new project, surely? If you are feeding existing code into it though, then that may well be painful.
This is better than a poke in the eye with a sharp stick. This is sort of a "have your cake and eat it too" situation.
I write in Python because "it thinks like I do" and, as you say, it's got great libraries and a general ecosystem because of the "batteries included" philosophy.
Normally I just accept that Python will not be fast but now if I discipline myself a little, I can have speed too, on the parts where speed is desired. I don't have to switch to Java.
Compared to Python, Java is blisteringly fast. There again, most languages are faster than Python - version 3.x of Python attempted to fix the terrible threaded performance, but the nature of the language makes it very hard to improve things in a single thread. The advantages of Python are allegedly the ease of writing something in it, although I'd argue that even if that's true the maintenance burden of a dynamically typed language and its poor support for OO make it totally inappropriate for large systems.
If you're already using NumPy, then you're already on the other side of Raymond Chen's proverbial airtight hatch -- you've already made significant adjustments to your code to fit them into NumPy's optimized C routines, and there's not really a lot Codon can do for you.
Interfaces that can return any random type are very rare and a nasty code smell, though. Most of the time it's just a [type], a subclass thereof, or None.
If it works and is fast and it works with the libraries you need, and it would take you twice or ten times as long in C, why go straight to C? (Admittedly C++ is starting to look much more Pythonic these days.)
I've seen a lot of code that started as Python but never ran at more than prototyping speed due to I/O latency. Instead of moving on to another language, absurd levels of resources were spent trying to hack in multithreading or multiprocessing. The result is code that's unreliable and unmaintainable, while still being too slow from spin loops and unexpected waits in the GIL.
Native compilation can speed up Python 100x, but it's still going to spend all its time waiting on serialized I/O.
Well, it can be pre-compiled as an alternative to JIT compilation. Depends on your usage scenario, I guess.
I hardly use it personally, but find it quite appealing, and am interested to see where it goes, in particular how much traction it gains in scientific computing, my field.
I currently use mostly Matlab, mainly because it is something of a de facto standard in my particular area (although that is now tilting towards Python/NumPy/SciPy). The programming paradigm in both cases is that all the computational gruntwork is devolved to uber-efficient low-level libraries like the BLAS, LAPACK, fftw, SLICOT, etc., you vectorise the hell out of your code (an art in itself), and if all else fails in efficiency terms, write C plugins. In other words, the language itself becomes little more than a convenient scripting wrapper. Julia offers, at least, an alternative paradigm.
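A trivial example of that paradigm in Python/NumPy terms (sizes plucked out of the air):

import numpy as np

x = np.random.rand(1_000_000)
w = np.random.rand(1_000_000)

# interpreted loop: every element pays the per-iteration overhead
acc = 0.0
for xi, wi in zip(x, w):
    acc += xi * wi

# vectorised: one call, the whole loop runs inside optimised native code (BLAS)
acc_fast = float(np.dot(x, w))

print(np.isclose(acc, acc_fast))  # same answer, orders of magnitude faster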
> Languages are irrelevant,
No, they're not. They're really not...
> its all about the libraries and how many has Julia got compared to say Java ?
... but that's a very fair point. Note that Julia is explicitly aimed at high-performance and scientific computing, so fair comparisons would be with Python, Matlab, C and, um, Fortran (Java not so much). On those grounds, Julia fares pretty well, although lagging a bit (but not that much) behind Python and Matlab - understandable, being a more recent language. It does appear to have a sizeable and very committed community, though, and is catching up.
> 5 different python installations on your computer
Just for a laugh, as I'm booted into Windows right now, did a quick check: there are only 53 python*.exe files on here at the moment.
Most are literally just python.exe, so gawd knows which version they are. Luckily none of them are on the normal PATH so confusion avoided (!)
My guess is that some of those are in virtual environments if you or someone else using this computer does Python development, which means they're either copies or symlinks with the effective path changed by the activation script setting environment variables. Others may be packaged versions of the runtime for applications written in Python. I doubt you have that many full installations of different Python versions. I don't disagree that it makes it a mess, but probably explains a bit about what's causing them.
Oh, I know exactly what's causing them, and it is nothing out of ordinary; I just liked the comparison against the idea that a Mac might only have five Pythons!
A few are the interpreters used by various Python IDEs - eg, Thonny, Jupyter and so on - that I've installed for the sake of having access to Python for one purpose or another. I specifically use WinPython as "the command line python" because it is explicitly intended to work without messing with PATH and the rest of the environment (as hash-bang lines aren't really Windows' cup of tea).
The rest are all installed as part of one application or another - and they are all "full" [1] Python installations in that they (the ones I could be bothered to test) happily run the Python repl and execute My First Python examples; each one (hopefully) only has the set of libraries it needs to do whatever its app wants, but again I've not bothered[2] to look that deeply. Although what they need often includes a GUI, and hence another copy of wxPython or similar...
No symlinks or multiple virtual environments here: this lot all came from separate and individual application installs.
[1] Please, nobody try to argue that it is only a "full install" if it also includes x number of your favourite libraries!
[2] they work, the drive space is cheap enough, and it isn't an interesting enough question to be worth figuring out which, if any, are never used by a given program. Given how much space I've knowingly wasted by installing the kitchen-sink-and-all copy of WinPython, "just in case I need it"...
In the old days, if a language didn't run fast enough or the spec didn't meet requirements, we didn't butcher it to make it do things the spec didn't allow; we simply learned a new language that did have what we needed. Fair play to the devs of Codon, it's an achievement I won't take away from them. However, I still stand by my statement: languages are spec'd to do certain things in proper ways, people learn the spec and adhere to the language, and the next thing you know it has changed to a greater or lesser extent.
Javascript and Python suffer from one major flaw: they're constantly trying to be all things to all men, and what you end up with is something that only barely manages to achieve anything. Both are now a mess of add-ons, botches, wrappers and patches. Node.js I utterly detest as the bastard child of something that was designed to run very well inside browsers; now it's just something that tries its hardest to consume all the inodes on your disk by requiring 78 bolt-on packages just to make an HTTPS call. Python is already close to ending up the same: it's already assuming its place as a perfect way to get malware through the dev chain via a 47-layer stack of dependencies, and now we're going to make it spit out binaries.
Call me a dinosaur if you like but you still can't beat a well skilled good C++ coder for writing fast, efficient code. Me I'll stick with Go and Rust thanks, I'll suffer Python to use Ansible but that's my limit with it.
> In the old days ... we didn't butcher it ...
Well, as already mentioned above, every BASIC was different, adding in new features to do what was wanted that month. Forth variants abounded and if you liked Pascal did you use UCSD (cross-platform, but slow - and still not "to spec" as it had stuff not in the standard) or TurboPascal for that speed boost?
Warren's Edinburgh Prolog or SWI Prolog or perhaps even TurboProlog? How about MicroProlog?
C++ - do you mean TurboC++ or VisualC/C++ or gcc? These have had (still have) their own oddities and useful(?) extras that aren't in the spec and change(d) over time.
And for someone who wants to follow the spec for a language, Rust is a bit of an odd choice: sure, you can write working code in it (as you can in any of the above) but can you point to the paragraphs in the spec that you are adhering to today? Or are you just being pragmatic and sticking to the exemplar implementation?
"if you liked Pascal did you use UCSD (cross-platform, but slow - and still not "to spec" as it had stuff not in the standard) or TurboPascal for that speed boost?"
I did, and still do, like Pascal: first Turbo, then Turbo for Windows, then Delphi. I lost touch with Pascal while doing other things, but have come back to the fold with Free Pascal and Lazarus, the licencing terms for the "community edition" of Delphi being eyewateringly expensive for my purposes.
I also did Turbo C++ a long time ago, and I'm seriously considering playing with Visual C++ and MFC to do some windows apps the old fashioned way.
But for quick and dirty tools that I would once have written in Turbo Pascal, I've long been doing them in Python. It's a friendly language, has a library for just about everything, and it's good enough for what I need, though I can see that it might not suit all tasks.
As someone else said, use the language best suited for the job - if I need something Python can't do well enough, I'm not above looking around for a language that can.
I lost touch with Pascal while doing other things, but have come back to the fold with Free Pascal and Lazarus...
That tracks true for me too, leaving the Delphi world around 2000, along with Windows. Which is relevant (not a Doze bash) as I wanted similar on Linux. I use Castle Engine and Lazarus, multiplatform-tastic!
And yeah, agreed that with languages go with fit for purpose.
......needed to download LOTS of other stuff:
- llvm-devel
- zlib-devel
- perl-core
- libstdc++
- libstdc++-static
.......and finally got the package to create the Makefile........
.......but the make failed near the end........not fixable as far as I could see........a couple of hours.........no useful result.
Someone else here can let me know if Codon actually does what it says on the tin!
Seems like you really need to be careful of the results, because the expression evaluator isn't really like Python's.
It seems this project hasn't learned from the mistakes of its predecessors and is in my opinion doomed to be a footnote in the language's history.
There have been many attempts to make Python faster that compromise on compatibility, and none of them have succeeded in really catching on. Why? You have to think about the ecosystem. I don't think most people use a language just because it's a cool language. You use it because it gives you access to the tools and frameworks you wish to use, unless you live in academia or you're a hobbyist who can afford to reinvent the wheel.
It doesn't even really support Python, and they should have been more forthcoming about this in the README. Some fundamental aspects of the language are simply unsupported or behave differently because they're incompatible with Codon's goals or hard to compile. If it's not a drop-in replacement, at what point does it make more sense to stop forcing a square peg through a round hole and instead invest the work in writing a simple C extension for performance?
Finally, the restrictive license definitely puts a cap on adoption. Of course a developer should be compensated for their work, but it certainly does not have a positive impact on adoption. We know that making your code more accessible helps.
I'm not trying to put down someone's hard work. I'm saying that the project strikes me as naive when it doesn't address the elephant in the room of why this project deserves to succeed where others have failed.
The dynamic nature of Python and Javascript means there are many examples where it's impossible for the compiler to narrow down and simplify some construct.
The simplification that type safety brings is not only for humans to reason about; it's about simplifying the dispatch of methods on an instance. In Python, for example, to dispatch a method the runtime has to search the available methods on an instance by name, whereas in Java it's an index lookup into a table. No prizes for guessing which one is faster.
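A simplified picture of what "searching by name" means in Python terms (the real lookup is in C with caches, but the shape is the same):

class Base:
    def area(self):
        return 0

class Square(Base):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

def lookup(obj, name):
    # roughly what obj.name does: walk the class hierarchy looking for the name
    for cls in type(obj).__mro__:
        if name in cls.__dict__:
            return cls.__dict__[name].__get__(obj, type(obj))
    raise AttributeError(name)

s = Square(3)
print(lookup(s, "area")())  # 9, same as s.area() - but found by string search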
> It doesn't even really support Python, and they should have been more forthcoming about this in the README
From the Codon github page (i.e. their README.md) : "Codon is a Python-compatible language, and many Python programs will work with few if any modifications ... While Codon supports nearly all of Python's syntax, it is not a drop-in replacement ... and a few of Python's dynamic features are disallowed". That seems clear enough - it isn't Python but it is Python-compatible (enough for their purpose).
After the quick README, the next stop is usually the FAQ and, yup, there is "What isn't Codon?" to fill in some background and even point you towards using Codon for the bits it can do and leaving the rest in plain old Python.
> You have to think about the ecosystem
Presumably, you mean all the sorts of tools that will be of interest to the BioInformatics trade? Even from the Seq days: "There are many great bioinformatics libraries on the market today, including Biopython for Python, SeqAn for C++ and BioJulia for Julia. In fact, Seq offers a lot of the same functionality found in these libraries. The advantages of having a domain-specific language and compiler, however, are the higher-level constructs and optimizations..."
Oh, you wanted to apply it elsewhere? Well, if the Python integration does what they indicate, you can make use of all of those libs and bring in Codon piecemeal on the bits that would benefit from the speedup. Just like you already do when linking from Python to a nice fast compiled C library, only without making the full leap to C (or FORTRAN or whatever your analytics are already written in).
> Finally, the restrictive license definitely puts a cap on adoption
As noted above, the licence allows for everything short of use in production (and if you are having to argue about whether your use-case counts as production, e.g. charities, you know you can always talk to the people!). So you can do all your learning and research to see if it *is* useful to you. After that - TALK TO THEM! You *do* know that licences are negotiable, don't you? If you are interesting enough to them, they can cut you a deal!
> the project strikes me as naive when it doesn't address the elephant of the room of why this project deserves to succeed where others have failed
These others (can you name them, just so we're on the same page) - did they have a well-defined market in a growing field, in this case, BioInformatics?
Python and cool in the same sentence? I do laugh.
This joke of a language should never have made it to popularity. I still remember when Ruby and Python were new. Python was billed as "easy to learn" and eventually won. A perfect example of how a totally inferior solution can become popular, and we the admins have been paying the price ever since. It is yet another PHP train wreck that should never have gotten this far. But idiots will lap up anything...
Javascript is the modern Perl. It's essentially write and forget; after all, most Javascript has no comments, and with only names it's impossible to tell exactly what any value is. Did I also mention that NodeJS is single threaded? In a day of multi-threaded CPUs, why would you want to use a language that can only use one? Sure, you can run multiple instances of your runtime, but that's dumb because instead of sharing cached memory you have to keep multiple copies, one for each CPU.
Friends don't let friends use Javascript on the server for any reason.