Yes, I had a look at the paper, and just as I suspected, it uses the Computer Language Benchmarks Game (CLBG).
As a game it may be fun, but as a set of benchmarks it's worthless. Some compilers have been optimized to score well on those benchmarks, and some haven't. Change the problem set and you may get a completely different ranking.
The rules of the CLBG require that your entry follow the approach of the original problem exactly, without adapting it to how another language actually works. This means that languages which favour approaching problems in a different way are inherently disadvantaged.
Many benchmarks that you find on the Internet are actually samples which were written specifically to show off features of a specific compiler. If you've just built a cool new feature into your compiler, then you need a benchmark which shows it off at its best. That has little relevance to anything which isn't very similar to that benchmark, however.
These benchmarks represent a very specific problem domain, one that is probably best suited to C, or in some cases Fortran. If you have a problem like that, then write it in C or Fortran; that's what those languages are intended for. If you are writing a server application for a web site, or a line-of-business application, then write it in a language suited to that.
There was a project at Google some years ago to write their own JIT compiler for Python, called Unladen Swallow. They used the CLBG benchmarks as their development benchmarks and worked for months on the JIT compiler. When they were done, they proudly announced that, according to the CLBG benchmarks, their compiler was now 'x' times faster. The new JIT compiler was then rolled out to testing prior to deployment.
It was rapidly kicked back. The JIT-compiled version was not 'x' times faster; it was 'y' times slower. By optimizing for the CLBG they had ended up de-optimizing for real-world problems.
The big lesson learned from that was the need for more realistic benchmarks. These were created from large chunks of actual applications as well as synthetic benchmarks, and this is the approach used by more successful Python JITs such as PyPy.
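To make the distinction concrete, here is a minimal sketch of the two styles side by side, using only the standard library. The specific tasks are my own hypothetical examples, not anything from Google's or PyPy's actual suites: a tight numeric loop of the kind the CLBG rewards, next to a workload that round-trips data through the json module, which is closer to what a real application spends its time doing.

```python
import json
import timeit

# Synthetic microbenchmark: a tight numeric loop, the kind of
# kernel that CLBG-style problems tend to reward.
def micro():
    total = 0
    for i in range(100_000):
        total += i * i
    return total

# Workload-style benchmark: round-trip a nested document through
# the json module, closer to what a web application actually does.
DOC = {"users": [{"id": i, "name": f"user{i}", "tags": ["a", "b"]}
                 for i in range(500)]}

def workload():
    return json.loads(json.dumps(DOC))

if __name__ == "__main__":
    print("micro:   ", timeit.timeit(micro, number=100))
    print("workload:", timeit.timeit(workload, number=100))
```

A JIT that wins handily on the first can still lose on the second, which is exactly the trap Unladen Swallow fell into.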
Unfortunately for people trying to compare languages, realistic benchmarks are not readily portable across languages, and nobody is motivated enough to write ones which are.
The only realistic approach is to know several different programming languages and to know which one is best suited to which application. Even better, learn how to combine them, so that you can use C and Python together in situations where you need features from both, as in the sketch below.
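As a minimal sketch of that last point (assuming a Unix-like system where ctypes.util.find_library can locate the C math library; on Windows the same functions live in the C runtime DLL), Python's standard ctypes module can call straight into compiled C code:

```python
import ctypes
import ctypes.util

# Load the system C math library. find_library("m") is assumed
# to succeed here; it may return None on some platforms.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature of sqrt() so ctypes converts the
# argument and return value correctly instead of guessing.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```

For anything larger than a couple of functions you would normally reach for cffi or a proper C extension module instead, but the division of labour is the same: the performance-critical kernel lives in C, and the application logic stays in Python.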