Hmm
So, pandering to the audience.
Not content to bait developers by declaring that Python is the fastest-growing major programming language, coding community site Stack Overflow has revealed the reason for its metastasis. Coming a day after Programmer Day, which falls on the 256th day of the year – except January 7: – the explanatory post by data scientist …
when you play with big quantities of data in science, the speed is usually limited by inefficient code, not by inherent properties of the language. When I crunch my 5 GB dataset, making a for loop a little faster won't make my code run in a reasonable time -- but moving to a sparse data representation or avoiding the loop altogether will. Python makes those things easy, that's why it is a game changer for science.
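To make the point about sparse representations concrete, here is a minimal sketch using only the standard library. The data and the helper names `to_sparse` and `sparse_dot` are made up for illustration; real scientific code would more likely reach for a dedicated sparse-matrix library.

```python
# Hypothetical data: two long, mostly-zero vectors of measurements.
def to_sparse(dense):
    """Keep only the non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(a, b):
    """Dot product of two sparse vectors; touches only stored entries."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dictionary
    return sum(v * b[i] for i, v in a.items() if i in b)


dense_a = [0.0] * 1_000_000
dense_b = [0.0] * 1_000_000
dense_a[10], dense_a[500_000] = 2.0, 3.0
dense_b[10], dense_b[999_999] = 4.0, 5.0

a = to_sparse(dense_a)
b = to_sparse(dense_b)

# Only index 10 is stored in both vectors, so the sum has a single term.
result = sparse_dot(a, b)
```

Once the data is in sparse form, the dot product visits two entries instead of a million, which is the kind of saving a faster for loop cannot match.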
Python makes those things easy, that's why it is a game changer for science.
When talking about science, it is best to avoid hyperbole and exaggeration. Is python a sometimes convenient tool? Yes. Is it helpful to have in some situations? Absolutely. Is it a "game changer"? Hell, no.
New algorithms for data processing and representation, or new models to analyse that data, or new theories to explain it, or new experimental techniques to measure it can be a critical new development, or a "game changer" if you will. A computer language which allows you to quickly slap together a bit of code neither your colleagues nor yourself will understand in two months' time? Not so much.
Absolutely. Is it a "game changer"? Hell, no.
You should probably talk to more scientists. Python has become popular among a huge range of scientists with no formal computing qualifications who are required to process large amounts of data. I've met several who would never have got their work done without Python. So, yes, for some it really is a game changer.
is it a "game changer"?
For me absolutely, for at least a dozen reasons, including resource management, the ability to use it interactively and the fact that you can easily interface to C (or other low level language) libraries. Although it's probably true that there's nothing you can do in Python that couldn't be done in C, and the C would ultimately run faster, the development time in Python is orders of magnitude faster, which in a scientific, especially research, context is far more important. Consider, for example, some real world data set which could be represented by nested dictionaries in which keys can either be real numbers or strings. This is trivial in Python and can be taught to science students without a programming background; doing it in C would probably be a 2nd year undergraduate programming exercise.
Yes, there are potential drawbacks to Python, but in my view they are mainly social (untrained people tend to try to run before they can walk), not technical.
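As a rough illustration of the nested-dictionary case, here is a small sketch. The sample names, temperatures and properties are invented; only the shape of the structure (string keys at one level, real-number keys at the next) follows the description above.

```python
# Hypothetical measurements: sample name -> temperature (K) -> properties.
data = {
    "sample_A": {
        273.15: {"pressure": 101.3, "phase": "solid"},
        373.15: {"pressure": 101.3, "phase": "gas"},
    },
    "sample_B": {
        298.0: {"pressure": 99.8, "phase": "liquid"},
    },
}

# Add a new reading; no types or sizes need declaring up front.
data["sample_B"][310.5] = {"pressure": 100.2, "phase": "liquid"}

# Walk the structure: temperatures at which each sample is liquid.
liquid_temps = {
    name: sorted(t for t, props in readings.items() if props["phase"] == "liquid")
    for name, readings in data.items()
}
```

One caveat worth teaching alongside it: a float key only matches on exact equality, so a temperature computed elsewhere may need rounding before it is used for a lookup.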
You could make that argument for Excel/Access/VBA. But I'd rather you didn't.
At some point it all ends up in front of an experienced programmer as a pile of novice code, a huge problem, a short deadline and requirements of "I can't quite get it working, will you take a look?"
It's got its merits but, like everything else in this industry for the past umpteen years, all the breathless hyperbole is a bit of a turn off.
@ BeakUpBottom (At some point ...)
That is an example of the social problem I mentioned. When I teach short courses to non-programmers (for very specific tasks), the first thing I say is along the lines of "you will be learning to use a programming language, but this is NOT going to turn you into programmers" (and repeat several times thereafter!)
You could make that argument for Excel/Access/VBA. But I'd rather you didn't.
I know no one who prefers VB(A) over Python and lots of people who've moved from VB(A) to Python and have embraced it wholeheartedly, especially as some of us work hard to make it possible to work with MS Office files without having to start Word or Excel.
While in Python you can't simply record a macro to get something done, it's a good example of nearly literate programming. Most new users are keen to write good code and respond well to suggested improvements and I've almost never come across unreadable newbie code. I know some people hate the whitespace but it makes a real difference in these environments.
VBA on the other hand has access to some fantastic API but as a programming language is akin to self-harm.
There are millions of EUCs in excel/VBA that never need to get in front of an "experienced programmer".
Python is filling a similar niche for people with more specialised number crunching requirements. In terms of hyperbole it's solving a smaller set of problems than the spreadsheet but it's solving them exceptionally well...
This is actually a super-happy story in that we've actually managed to grow some combination of language and tooling that people want to use!
@AC:
" When talking about science, it is best to avoid hyperbole and exaggeration. Is python a sometimes convenient tool? Yes. Is it helpful to have in some situations? Absolutely. Is it a "game changer"? Hell, no. "
That's every bit as strong a statement as the one you're seeking to refute.
In the early days of computer programming, most people were just scheduling batch jobs (hence "programming") using a scripting language.
The problem is, most shell scripting languages are rubbish. Most attempts at more powerful shell scripting languages (e.g. Tcl) were contorted, byzantine affairs. Javascript was clumsy to start off with, and when people tried to put it into the shell, it just felt weird.
What is often overlooked is that Python is a shell scripting language, and it manages to maintain a pretty high level of flexibility and power while still being more learner-friendly than most languages.
When people complain about its lack of speed, they're kind of missing the point, because in applications like data science, all the heavy lifting is done by libraries, which are generally compiled C code.
Python with Pandas is a bit like a massively updated version of calling grep from a bash script.
It has changed the game.
> What is often overlooked is that Python is a shell scripting language,
Python is a computer language. The most common implementation can be used as a 'shell scripting language', or as an application programming language, or as a statement evaluation tool. Other implementations can be used as an embedded language or can compile to various VMs and/or can use JIT compilation.
I used to think this was true as well, but it's not. If you are dealing with large quantities of numerical data (and 5GB is not a large quantity in this sense: our jobs create terabytes a day) then having something which implements various numerical array-bashing operations efficiently does actually matter. Hence NumPy.
So how exactly do you install pandas? Every time I want to try out python, I do not know which version to use, I just hope that random pip command du jour will work and nothing will break. Should I use Python version which came with Mac? Or the one I installed with Homebrew? Where do the packages reside?
Python's packaging remains a problem. However, in general you should avoid installing user libraries for a system language.
Personally, I create a separate virtual environment for every Python project and install the required libraries only there. However, when it comes to Pandas you can also install Anaconda (from the maintainers of Pandas) which comes with its own package manager for a set of well-maintained and pre-compiled libraries.
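For anyone unsure what "a separate virtual environment per project" looks like in practice, here is a minimal sketch using the standard library's `venv` module. The `.venv` directory name, the helper `make_project_env` and the temporary project folder are assumptions for the example; `with_pip=False` is used only to keep the sketch quick and offline, whereas a real project would normally want pip available.

```python
import tempfile
import venv
from pathlib import Path


def make_project_env(project_dir):
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False skips bootstrapping pip; an assumption made for this demo only.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Hypothetical project folder, created in a temporary location for the demo.
project = tempfile.mkdtemp(prefix="demo_project_")
env_dir = make_project_env(project)

# Every environment gets a pyvenv.cfg recording which interpreter it was built from.
has_config = (env_dir / "pyvenv.cfg").is_file()
```

The usual command-line equivalent is `python3 -m venv .venv` run inside the project folder. Libraries installed after activating that environment stay there and do not touch the system Python.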
"So how exactly do you install pandas?"
For engineers such as myself, this is one of the biggest headaches with Python. Compared to more engineering-focused ecosystems, package and dependency management are a pain in the arse.
However there's also a pretty simple answer that satisfies most use cases.
Step 1: Install anaconda
Step 2: Set up a virtual environment per project
Step 3: Distribute that virtualenv as a docker container
Job done.
"So how exactly do you install pandas?"
Typically one gets a zoo on board about 2 to 3 years ahead of schedule, and arranges government funding to build a specialist panda enclosure, and then one writes up an agreement with the Chinese Government and the panda breeding associations. From what I've been reading of late it costs between $85,000 USD and $1.1 million USD a month to host the pandas. Most of the money is supposed to go back to the breeding and protection of the species, but I've no proof of that.
<ref TorStar article "Pandas Installed at Toronto Zoo over objections from (Free speech advocacy group) >
"It's fun (for a programming language)
It's readable
It has lots of libraries
It's approachable for novice programmers"
.
alt.sysadmin.recovery always had a very useful motto: "All hardware sucks. All software sucks. They all suck the same."
As applicable here, all programming languages suck. 'Fun' is an orthogonal concept.
Libraries, people, documentation - that's the package that makes progress possible in any particular language. The language is a circumstance.
It is the masochistic kind of fun. BDSM. All that matters is your pain threshold - how painful do you want it to be before you enjoy it.
Python is for people with low pain threshold.
Javascript and C - same but for those who enjoy the quantity - more beatings, more fun. Just light ones every time.
Perl is for the really kinky ones - it does not hurt a lot, but hurts in some really weird places.
Java, C++ - for those of us who need to have a glimpse of the light eternal before they get a kick out of it. A good equivalent would be - BDSM fans of strangulation.
Microsoft offers free Jupyter notebooks in the Azure Cloud at notebooks.azure.com for those interested in investigating Python notebooks. There are also two very basic Python courses from Microsoft on edX suitable for the rank beginner that use the notebooks.
There are free Jupyter notebooks for the Julia language at juliabox.com for those interested in Julia, and there is a Coursera course on Julia that assumes you know other languages. The objective of Julia is to provide the ease of use of Python, R, and Matlab while running as fast as C or Fortran. See juliacomputing.com