haha web devs
I wonder if web devs will still try and convince the world PHP is a serious non mickey mouse language and they are real software developers.
Web developers are in a lather following the discovery of a bug in the PHP programming language that causes computers to freeze when they process certain numerical values with large numbers of decimal places. The error in the way floating-point and double-precision numbers are handled sends 32-bit systems running Linux, Windows …
I hadn't noticed any malice toward PHP until reading the comments for this article. Having since conducted a Web search for problems regarding PHP, it seems in all cases dislike for PHP stems from an inability or unwillingness to employ an effective programming methodology, as opposed to anything specifically regarding PHP.
@asdf, Thyratron, Greg: PHP is only as insecure as one's unwillingness to put some effort into the art of code design. In my experience, compiled languages lend themselves to far more buggy results than the likes of PHP. So what exactly is it that irks you lot about PHP? How is it insecure or "mickey mouse"? Get specific, for pity's sake. No one likes a hater!
> So what exactly is it that irks you lot about PHP? How is it insecure or "mickey mouse"? Get specific, for pity's sake.
As I probably didn't make clear enough in my "Testing times" comment, it's not the language itself that lacks rigour, but the people who use it. Yes, I'm sure there's a core of specialists who use effective methodologies, but most of the people I come across who claim to be PHP programmers are little more than 'script kiddies' copy-and-pasting fragments of other people's code without comprehension.
Whilst this might not matter much for simple websites, the trend is for more business processes to be moved online, and with it there is the danger they will be ineptly coded by this shower. Then it will matter.
(When you've finished downvoting me, pop over to Amazon and buy a programming book :-)
If this doesn't affect 64bit servers then I'd imagine the majority of anything half decently new and running non-Windows OS is likely to be unaffected...?
@asdf - like Python, C#, Java, Ruby, Perl, ASP, .NET stuff and pretty much everything else out there it's got bugs but then real programming languages don't have bugs, do they...err.
Yeah, that PHP must suck so bad, given that only three quarters of the Internet uses it, including some of the largest, most popular sites in the world with hundreds of millions of users. How about giving us some details of your complaints, assuming you actually have any...
What's next - MySQL isn't a "real database"?
What's so amazing about this story is that it has taken so long for this bug to emerge. I can understand how most PHP programmers wouldn't bother with proper testing, but for not a single one of them to find this until now is astounding. I don't think I'll be recruiting any PHP 'programmers' for a while.
Back when I used punched cards...
I develop in PHP pretty exclusively, but since I don't ever accept input from the user without screening it, and I can't remember the last time I ever used (float) to typecast it to a float, I'm always using ints when the data should be numeric.
So according to your theory, I should have hit the bug long before now - but seeing how I've never had to write anything processing exponentials, or variables that long, it's not something I ever encountered since I made sure I turned it into a format that was safe for use *before* I used it.
Incidentally, I just tested some of the code I work with and I can't trigger this on either PHP 5.2 or 5.3... is that because I'm a bad programmer and didn't test it initially, or because I'm a good programmer and made sure I wasn't going to get invalid/nonsensical input in the first place?
Surely you would hire a PHP developer if you need some PHP written? And of course, you are smart enough to recognise the difference between the language PHP and the C that its engine and other gubbinz are written in, aren't you? Although that would mean you wouldn't recruit any C developers, as surely they are tainted by association. And as C developers rule the world (sorry Java/Perl :-) ) that kinda narrows down the talent pool.
On a side note, I suspect this bug hasn't been noticed because anyone using PHP for serious maths-heavy work uses a combination of 64bit (where the bug doesn't seem to exist) and the BC Math or GNU Multiple Precision extensions, which I suspect don't touch the bug-causing code.
(I suspect I may have just fed the troll but oh well)
"What's so amazing about this story is that it has taken so long for this bug to emerge. I can understand how most PHP programmers wouldn't bother with proper testing,"
Testing is incredibly valuable, but there are diminishing returns as you generate a more and more comprehensive test suite. When the value of the returns is less than the cost of fixing a bug when it occurs 'in the field' as a project manager you have to say "enough is enough".
Given that this bug hasn't manifested itself as problematic in the real world (or there would have been much squealing well before now), and the code-base of PHP driven websites out there, then I think someone pretty much got that balance right.
I'm glad you're not my boss, on the one hand demanding a test suite so comprehensive that hard-to-hit, obscure and esoteric bugs are found before release, but on the other pushing us to ensure testing - always pushed to the right by development slips - doesn't delay a ship-date.
"Beware of bugs in the above code; I have only proved it correct, not tried it."
http://www-cs-faculty.stanford.edu/~knuth/faq.html
In something called "hiphop". Yeah... that's the ticket ... rappers, gangstas and faizbookas on da intarwebz.
"Facebook sees about a 50% reduction in CPU usage when serving equal amounts of Web traffic when compared to Apache and PHP. "
They lose 90% of the flexibility though. Why would you want to do that?
I confess that I too have accepted money for a short spell of developing in ASP.NET, but I wouldn't use the expression "As a professional ASP.NET developer" as a way of asserting my software credentials. It's a bit like "as a professional flat-pack furniture assembler".
I didn't know Facebook was written in Mickey Mouse. Is that Mickey Mouse 6.0 or Mickey Mouse.NET?
I could not extract a single bit of useful information from the article, except perhaps that PHP has a bug.
"Converting a fixed fraction decimal number into a floating point means turning an exact number into its best approximation"
What is this "fixed fraction decimal number" thing? There are many ways to express a number. But for god's sake, those are ways to represent the same number. They can be written differently, but they are the same number, regardless of notation used. What's next, a number in hexadecimal like 0x10 being a different number than, dare I say 16?
"In order to get the approximation as close as possible to the original number, a floating point conversion algorithm will perform several runs until the error between the original number and the floating point representation is smaller than some very small value."
Let's see, the computer knows the exact value of a number, and it wants to convert it to its closest floating point approximation. First question that comes to mind: if there is an exact representation of the value already in the computer (there should be, since it is somehow trying to find its best approximation), why does it need to use another, inferior, less precise representation? Answer: because that is not what is happening.
This is an example of how to turn a bug in the floating point parser into a confusing mess that no one can understand, and a way to perpetuate the perennial misunderstanding of data types and their representation. Which somehow is likely to be more abundant in the world of typeless languages and "everything is a string" programming, of which PHP is a prime example.
Have the tech standards at the Reg reached a new low or am I missing something here?
Fractions don't always convert exactly between bases. The simple value of 0.1 in decimal has an infinitely repeating value as a base 2 or base 16 number. Since floats are a fraction and an exponent, not all values translate exactly. Also note that in two's complement binary there are more negative values than positive values. There are lots of rules to keep all of this working, and that's probably where PHP is messing up.
2.2250738585072011e-308 decimal is represented as a 64 bit float by: 0000000000001111111111111111111111111111111111111111111111111111
0.1 decimal is:
0011111110111001100110011001100110011001100110011001100110011010
1 decimal is:
0011111111110000000000000000000000000000000000000000000000000000
1e100 decimal is:
0101010010110010010010011010110100100101100101001100001101111101
Several patterns are reserved and used to represent zero, +infinity, -infinity, and Not-A-Number.
Zero:
0000000000000000000000000000000000000000000000000000000000000000
+Inf:
0111111111110000000000000000000000000000000000000000000000000000
-Inf:
1111111111110000000000000000000000000000000000000000000000000000
NaN:
0111111111111000000000000000000000000000000000000000000000000000
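For anyone who wants to check the bit patterns above for themselves, here is a minimal sketch. It is in Python rather than PHP (my assumption for convenience, not the poster's choice), and float_bits is a hypothetical helper name. It only relies on the standard struct module to reinterpret a 64-bit double as an integer.

```python
import struct


def float_bits(value: float) -> str:
    """Return the 64 bits of an IEEE 754 double as text, sign bit first.

    Hypothetical helper for illustration only.
    """
    # Pack the value as a big-endian double, then reread the same 8 bytes
    # as an unsigned 64-bit integer so the raw bits can be formatted.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", value))
    return format(as_int, "064b")


samples = [
    ("2.2250738585072011e-308", 2.2250738585072011e-308),
    ("0.1", 0.1),
    ("1", 1.0),
    ("1e100", 1e100),
    ("Zero", 0.0),
    ("+Inf", float("inf")),
    ("-Inf", float("-inf")),
]

for label, value in samples:
    print(f"{label:>24}: {float_bits(value)}")
```

The first bit is the sign, the next 11 are the exponent and the remaining 52 are the fraction. NaN is left out of the list because more than one bit pattern counts as NaN, so the exact pattern you get can vary.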
Over 30 years since I did my Computer Science degree and Kevin McMurtrie's post finally clears up my understanding of why floats can't represent many numbers exactly. I seem to remember a numerical methods lecturer who claimed that you couldn't represent 0 precisely as well or is that my misremembering?
Kevin, I knew all those differences you care to explain so well. I just wanted to point out how poorly written the article is, which is something I'm sure you'll agree with. The difference between a number and how it is represented and stored, with the associated loss of precision (or not), is a very complex one, and not easy to deal with by any means.
I suppose I'm burned out of seeing so many self appointed programmers that firmly believe that a date is a string, a number is a string and everything is stored in the same way it is represented in ASCII. And PHP and typeless languages in general are notoriously handicapped at dealing with those things well. Or perhaps they just try too hard to make life simple for non technically minded people.
I wish that I was not as lazy as I am, I would have posted your examples or similar ones. Indeed, I wish that someone with some knowledge about the issue had written the original article.
This is a bad bug, but I believe PHP has always maintained that if you're dealing with large precision numbers you should always treat the number as a string, which, as the originator's blog pointed out, seems to work,
and then use the precision maths libraries to handle such numbers, as the internal maths ops are not built to handle precision numbers and have rounding errors. I wonder if anyone has tried it with the libs and whether it still crashes the system.
If it doesn't then I believe it's a case of bad programming as well.
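To illustrate the "keep it as a string and hand it to a precision library" idea, here is a rough sketch. It uses Python's decimal module as a stand-in for PHP's BC Math or GMP (an assumption on my part, the two are not the same API), and the precision setting is an arbitrary choice.

```python
from decimal import Decimal, getcontext

# Working precision in significant digits (arbitrary choice for this sketch).
getcontext().prec = 50

# The value exactly as it would arrive in a request: a string.
raw = "2.2250738585072011e-308"

# Parsed digit by digit, so every decimal digit is kept.
as_decimal = Decimal(raw)

# Converted to a 64-bit double, so rounded to the nearest representable value.
as_float = float(raw)

print(as_decimal)                         # 2.2250738585072011E-308
print(Decimal(as_float) == as_decimal)    # False: the double is only an approximation

# Decimal arithmetic on short decimal strings is exact; binary floats are not.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
print(0.1 + 0.2 == 0.3)                                    # False
```

The point is only that the string never passes through a binary float on its way in, which is what the posters above suggest doing in PHP.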
PHP is a perfectly good language for what it's used for, which is to say, web sites.
I'm not surprised a bug like this hasn't turned up. How many projects are there that require that kind of crazy precision, which are written in PHP? Anything even remotely enterprise-y or scientific is going to be using far more established and rigorous languages such as Java or C. Heck, maybe even Fortran since it was specifically designed for dealing with this sort of problem.
It's a fairly safe bet that the NYSE and LHC probably don't use PHP for their system software.
I am a PHP developer.
I don't use any of the MS .Net based stuff like c#, but it doesn't offend me that other people do.
Why the haters? It's only a programming language, and for the web it's quite good. Would you rather the whole web was still coded in Perl, or perhaps Java servlets with JSPs?
Maybe get rid of Apache and run the web on IIS.
Seriously, horses for courses.
Everything has faults; sadly the PHP haters seem to have the fault of not engaging their brain before slating something they probably have not used.
Oh and punch card programming is for pensioners!
Pedant alert:
Correctly
> Zero:
> 0000000000000000000000000000000000000000000000000000000000000000
should be +Zero as opposed to
-Zero:
1000000000000000000000000000000000000000000000000000000000000000
I'll get my coat - it's the one with a copy of IEEE754 in the pocket
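For fellow pedants, a small sketch of the +Zero / -Zero distinction. It is written in Python (my assumption, nothing to do with PHP's internals) and sign_bit is a hypothetical helper. The two zeros compare equal but differ in the top bit.

```python
import math
import struct


def sign_bit(value: float) -> int:
    """Return the top (sign) bit of a 64-bit double. Hypothetical helper."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", value))
    return as_int >> 63


pos_zero = 0.0
neg_zero = -0.0

print(pos_zero == neg_zero)                      # True: they compare as equal
print(sign_bit(pos_zero), sign_bit(neg_zero))    # 0 1: but the sign bits differ
print(math.copysign(1.0, neg_zero))              # -1.0: copysign exposes the sign
```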
PHP is disliked because it is a shit language. Simple as that.
It has no rigour: arguments to functions are inconsistently ordered. With str_replace, is it (in_this_string, replace_this, with_this) or (replace_this, with_this, in_this_string)?
then, of course, there's this http://imgur.com/7unV7
PHP is popular because it has batteries included and back when I started using it, when it was called Personal Home Page, the alternatives were CGI (mostly Perl) or ASP (vbscript, jscript).
I continue to use it through gritted teeth because once I've coded it up my replacement for ongoing maintenance will be much cheaper than me.
Just because you don't know how the language works doesn't mean it's shit. If you want an exact match use triple equals, anyway, in which case you'll find only -1 matches -1 and nothing else, including '-1'. I wasn't surprised by what converted to what, but you do need to know how things evaluate, as PHP is loosely typed after all.
String functions are string_funct(find_str, replace_str,subject)
Array functions normally take the array as the first argument. It's a bit annoying that this is the case.
Maybe you are one of those people who is confused by isset() and array_key_exists()
It's basic programming time: there's a difference between equivalence, and identity. The first means that two variables both transform to the same thing, the second means the two variables contain the same thing before transformation. And of course there is the one-way implication that identity(a,b) -> equality(a,b).
Some languages solve the difference between these two by mixing operators and functions, like using "==" and "equals()" (java), some languages solve it by not supporting equality out of the box (C/C++), some languages solve it by only using functions (functional programming languages, notably, like prolog and lisp) and some languages solve it using only operators, like PHP, where "==" is used for equality, and "===" for identity.
Why do you need it? Well, some languages only support one function return type, in which case checking returns is simple. However, PHP is not one of these. Loose typing means you can stick whatever you like in a variable to get the job done, so it's important to have a way to differentiate between a function returning, say, a number or a failure indicator like "false" or "null".
A conditional like if(functionvalue(...)!=false) {...} will be skipped when the function returns false, but also when it returns null or 0, because they are equivalent -- but not identical. For identity verification, to be able to see the difference between a legal function output and a failure (without an elaborate exception framework), if(functionvalue(...)!==false) {...} will make that distinction, being skipped only if the output is actually false. As long as the function returns anything that isn't "false", and that means even things that are equivalent to "false", the conditional will fire.
It's not the most elegant solution, but you don't use PHP because it's clean, you use it because it's quickly written and as dirty as C, minus the hell of memory allocation. Any dirt in the code left after you're done writing is dirt you put in there, and didn't bother cleaning up.
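A rough analogue of the equivalence-versus-identity trap, for anyone who wants to see it run. This is Python, not PHP, so treat it as a sketch only: Python's "is" tests object identity, whereas PHP's === compares value and type, and Python's None does not compare equal to False the way PHP's null does. find_position is a hypothetical function that mimics the "index or false" return style.

```python
def find_position(haystack: str, needle: str):
    """Return the index of needle, or False when it is absent.

    Hypothetical function mimicking an "index or false" return convention.
    """
    index = haystack.find(needle)
    return False if index == -1 else index


result = find_position("php", "p")   # legitimate answer: index 0

# Loose check: 0 == False is True in Python, so a valid index of 0
# is indistinguishable from the failure value.
print(result != False)        # False: the match at index 0 looks like a failure

# Strict check: only the actual False object counts as failure.
print(result is not False)    # True: the match at index 0 is recognised

print(find_position("php", "x") is not False)   # False: a genuine failure
```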
As you well know, so...
"Seriously? Just how many of the f*ckers do you need? = for assign and == for equivalence. Jesus."
It's actually:
= for assign
== for equivalence *of value* - two variables with the same value that might not necessarily be the same type
=== for equivalence *in value and type* - two variables with the same value and type
Is that too difficult for you? Would you rather do your programming in BASIC?
Although this is a stupid bug, the example you gave:
/store.php?cat=22250738585072011
would do nothing
I assume you mean
/store.php?cat=2.2250738585072011e-308
but didn't want to write that in case the script kiddies use that to take down the universe.
Well, don't fear.
Firstly, PHP request variables are passed as strings until converted by the script into other values,
so passing a parameter such as 2.2250738585072011e-308 is, by itself, going to do no harm.
Converting that string to a float can crash PHP. Which is a big fail for sure, but isn't an epic disaster.
Secondly, what half-assed shopping cart are you thinking about with your store.php example that has floating point product ID codes?!
Fail flag for combination of PHP fail plus example fail!
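To make the "it's only a string until you convert it" point concrete, here is a minimal sketch of screening an ID parameter before any numeric conversion. It is in Python rather than PHP (my assumption), and parse_category_id plus the nine-digit limit are hypothetical choices for illustration.

```python
import re

# Plain decimal digits only, with a bounded length (the limit is arbitrary).
_ID_PATTERN = re.compile(r"[0-9]{1,9}")


def parse_category_id(raw: str):
    """Return the ID as an int, or None if raw is not a plain integer.

    Hypothetical helper: the string is checked before any conversion,
    so it never reaches a string-to-float routine.
    """
    if _ID_PATTERN.fullmatch(raw) is None:
        return None
    return int(raw)


print(parse_category_id("22"))                        # 22
print(parse_category_id("2.2250738585072011e-308"))   # None
print(parse_category_id("22; DROP TABLE"))            # None
```

A product ID that fails the pattern is rejected outright, so the troublesome value is never handed to a float parser in the first place.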
The problem with PHP is that it's very easy to write bad code in, thanks in no small part to its genesis as a quick hack to make coding webpages easier. People can therefore write code quickly and easily, but if they don't understand how things work, the system is open to abuse and problems.
For instance, opening a file based on a POST request can be done in a fairly trivial way:
$fh = fopen ($_REQUEST['filename'], 'a');
However, fopen can process URLs as well as local files, meaning that someone could submit an invalid/hacked file. Equally, $_REQUEST transparently reads both POST and GET, making hacking the form easy - and if you don't validate the form input (especially data destined to be sent to a database), it may be possible for an attacker to inject commands or trash your system - in the example above, they could read the system's password file or overwrite your PHP script with something designed to harvest passwords or worse...
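One way to guard against that kind of thing is to confine any user-supplied filename to a single directory before opening it. A minimal sketch follows, in Python rather than PHP (my assumption). UPLOAD_DIR and safe_target are hypothetical names, and Python's open() does not fetch URLs, so only the path-traversal half of the problem is shown.

```python
from pathlib import Path

# Hypothetical directory that user-supplied filenames must stay inside.
UPLOAD_DIR = Path("/var/app/uploads")


def safe_target(filename: str) -> Path:
    """Resolve filename under UPLOAD_DIR, or raise ValueError if it escapes."""
    base = UPLOAD_DIR.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # filename replaces the base entirely and is caught by the check below.
    candidate = (base / filename).resolve()
    if base not in candidate.parents:
        raise ValueError(f"refusing path outside upload directory: {filename!r}")
    return candidate


print(safe_target("notes.txt"))

for hostile in ("../../etc/passwd", "/etc/passwd"):
    try:
        safe_target(hostile)
    except ValueError as error:
        print("rejected:", error)
```

The caller would only open the returned path, never the raw request value.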
It also doesn't help that the core function definitions are inconsistent (a legacy of bits being borrowed from C, Perl and anything else which was handy and could be duct-taped into play). For instance, when you want to use a string function, it could use any of the following four formats:
1) do_something (string, pattern)
2) doSomething (string, pattern)
3) do_something (pattern, string)
4) doSomething (pattern, string)
However, the actual language itself is as good (or bad) as most other loosely typed languages - Perl, JavaScript, Python, etc - PHP5's object model is good and solid and it does make a lot of stuff (e.g. processing XML, creating SOAP client/server systems, etc) relatively trivial.
I wasn't aware there was a language that wasn't easy to write bad code in!
All these language wars amaze me: in my experience good coders write good code in any language and bad coders write bad code in any language.
If you require the language to force you to avoid poor coding practices doesn't that say more about you than it does about the tool you are using?
Some languages^H^H^H libraries make it easier than others...
Having seemingly random function name "conventions" makes it a bit harder for the spider-sense to discriminate good from bad and guess the function name.
Also, it makes it even easier to not know about certain functions that would do the job in a single call rather than the developer crafting a new misshapen wheel. Needless obstacles to knowing better cannot be good for helping those who don't escape their pit of ignorance, can it?
This isn't to say that bad coders are absolved, just to say that they aren't entirely to blame.
Wow, was the author really so clueless about the subject matter that he had to borrow a random bloke from a random forum to explain to his readers that "[c]onverting a fixed fraction decimal number into a floating point means turning an exact number into its best approximation"?
I'm not having a go at this Pomax guy since he was presumably just writing a quick comment on a forum, but Dan. You are writing an article. No amount of [sic]s is going to polish such a turd. Please do your job next time. Thanks
If a web developer is using even some half assed validation on a site, this bug should never come to light in the first place so I can understand why it has taken so long. Even in an extreme situation, I can't ever imagine a user sending the web-server a 234-point decimal number and being justified...
Any "web" application that does need such extreme numbers would most likely be built on something other than PHP anyway, and I don't mean ASP either. Test ALL input, simples. The web was a dangerous place before this bug came to light.
But, hey...At least PHP is an open technology to discover such flaws and it can be remedied ASAP. Does this affect any other platforms I wonder?
So I cut my teeth on Z80 assembler before I learned C
In my first job I was taught C++ and Objective C (in the days of the Next foundation, well before Apple made Objective C cool).
Over the last 15 years I've used numerous languages on a range of projects, from military to transport safety critical and safety related.
So obviously I'm not a "real programmer" - that's why I use PHP as my tool of choice for most web projects.
(But don't hold your breath if you're using emergency telecommunications or travelling on a certain UK railway line - your physical being may be about to intersect with software I wrote, and after all, I'm not a proper programmer...)
The best description I know for floating-point representation came from my ex-senior-engineer.
Imagine a rubber band marked off with 100 intervals. Stretch this over 1m, and each interval measures exactly 1cm. So any whole number of centimetres up to 100cm can be measured exactly.
Now stretch it over 10cm. You can now measure up to 10cm with 0.1cm precision, and whole numbers of centimetres up to 10cm can still be measured exactly. So you've traded off your maximum value for more accurate measurement. But you can't measure 50cm with 0.1cm precision, because there aren't enough intervals on the rubber band; and you still can't measure with 0.01cm precision either.
Now stretch the band over 10m. You can now only measure 10cm increments. So if you place something at the 20cm mark and then move it along slightly, you can't measure exactly where it is - all you know is that it's between 10cm and 20cm, or between 20cm and 30cm. Now you're getting larger values in exchange for less accurate measurement.
And then there's the problem of matching this to real things. Suppose you're measuring blocks that are 5.9cm long. You can measure the first one accurately at 5.9cm, but the next one needs you to stretch the band over 1m and use 1cm accuracy, so two only measure in at 11 cm (because you always round down). And if you assume you can measure two and then halve the result to get the actual size, guess again - the result will be 5.5cm.
So anyone doing exact comparisons with floating-point values is such an insanely incompetent coder that they should be fired for gross negligence. It really is that bad - if you're stupid enough to do it then you shouldn't be in the job, end of story. That this is even an issue is a sad indictment on the state of the code being written by these muppets.
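A quick sketch of both halves of the rubber band story, in Python (my choice of language, not the poster's): the gap between neighbouring floats grows with magnitude, and exact equality fails where a tolerance-based comparison works. The tolerance value is an arbitrary assumption.

```python
import math

# Exact comparison fails even for very ordinary decimal values.
print(0.1 + 0.2 == 0.3)                               # False

# Comparing within a relative tolerance behaves as intended.
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))     # True

# The band stretched over a large range: near 1e16 the marks are 2 apart,
# so adding 1 lands back on the same mark.
print(1e16 + 1.0 == 1e16)                             # True

# Values that are sums of powers of two sit exactly on a mark.
print(0.5 + 0.25 == 0.75)                             # True
```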
Re the "representing zero accurately" issue, this was the case for some floating-point representations. To get a bit (literally!) more accuracy, they sacrificed exact representation of zero, so you could say "it's very very small and positive" or "it's very very small and negative", but not "it's exactly zero". I believe most floating-point code these days doesn't do this though.
The Floating Point issue is poorly explained everywhere, and I've spoken to CS graduates from good schools who never covered the issue in their coursework. A bigger recipe for disaster I've yet to see.
100% of the references I've come across in which it occurred to the writer to warn of a problem (e.g. Oracle documentation, C++ programming books and so forth) stick with "Floating point is an inexact data type". Sometimes they use "approximate" instead of inexact. The key point being that none of them went the extra two inches and said why.
Which is how certain financial houses in the 90s had their old Cobol suites replaced with gleaming new "C"-based ones, in which the baby-out-with-the-bathwater crowd (never bothering to ask why the Cobol stuff had all those "Computational" data types defined) wrote all the currency handlers as floating point ops.
Which brings us full circle to "C++ is obviously a bad language".
But we knew that anyway, right?
8o)
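For the "why" behind the currency handlers, a small sketch in Python (my assumption, standing in for both the Cobol and the C code, neither of which I have seen). It adds ten payments of 0.10 as binary floats, then as whole cents. to_cents is a hypothetical helper, and rounding half up is an arbitrary choice.

```python
from decimal import Decimal, ROUND_HALF_UP

# Ten payments of 0.10 added up as binary floats.
float_total = 0.0
for _ in range(10):
    float_total += 0.10

print(float_total)           # 0.9999999999999999
print(float_total == 1.0)    # False


def to_cents(amount: str) -> int:
    """Convert a decimal string such as "0.10" to whole cents.

    Hypothetical helper; rounding half up is an arbitrary choice here.
    """
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


# The same ten payments held as integer cents: fixed-point, so exact.
cents_total = sum(to_cents("0.10") for _ in range(10))
print(cents_total)           # 100
```

Fixed-point types of the kind Cobol offers keep every cent exact in the same way the integer total does.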
In 2002 I became the senior web dev for a well known UK dotcom, and found that the previous developer had implemented a PHP script called "file_browser.php" which was in the root of the publicly available web space. Not only did it let you navigate the file system for the dotcom web app, but the entire file system, or any file that was readable by the web server user anyway!
No login was needed, you just needed to locate the script.
Needless to say that deleting this little gem was the first thing I did in that job :)
Just to wade in here, I am not a PHP programmer but I understand their issues. PHP targets multiple architectures but they want the same precision, so the 64 bit floating point arithmetic needs to be retargeted for a 32 bit platform, or something like that.
Stuff like sin and e etc. have to be calculated using standard operations on a computer. Taylor expansions provide us with an iterative method for successively improving an approximation to ANY FUNCTION using operations we have defined. So presumably, while stuffing 64 bit float arithmetic into a 32 bit architecture, they had to do something along the lines of applying a Taylor expansion. The thing is, it's not very efficient.
I wrote a physics engine once in Java and found 79% of my processing time was spent in sin(), because of these iterative processes. I tried replacing it with a lookup table but this caused aliasing in the simulations, so I had to write my own sin with a controllable error term. http://www.java-gaming.org/index.php/topic,16296.msg130032.html#msg130032 . Bugs are easy: the Taylor expansion might not terminate if anything weird happens when trying to meet the desired error in the approximation. Unless, of course, you have defensive checks in every iteration of your Taylor expansion.
My main point (finally) is that when you are working at this level, you are already annoyed by how slow the Taylor expansion is. Presumably every floating point operation is having to go through an annoyingly slow step already, so they probably don't want to bog it down even further by coding defensive checks into the Taylor expansion. Probably to fix that bug properly they will have to slow down floating point arithmetic 10% FOR EVERY FRICKIN FLOAT operation. You thought PHP was slow already :/ If they are lucky they might be able to categorize the bug at a higher level and say if the float is above X then reduce the error term, then it's just one conditional overhead at the beginning of the operation, but these issues are head scratchers usually. Portability is the issue. Java has a strictfp keyword which is all about this kind of thing.
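To show what a Taylor expansion with a controllable error term and a defensive check might look like, here is a minimal sketch. It is in Python (my assumption, not the Java from the linked post and not PHP's actual code), and taylor_sin, tolerance and max_terms are all hypothetical names and defaults.

```python
import math


def taylor_sin(x: float, tolerance: float = 1e-12, max_terms: int = 50) -> float:
    """Approximate sin(x) by summing its Taylor series.

    Hypothetical sketch: stops once a term drops below tolerance, and
    gives up after max_terms so the loop can never spin forever.
    """
    # Range reduction: fold x into [-pi, pi] so the series converges quickly.
    x = math.remainder(x, 2.0 * math.pi)
    term = x      # first term of the series: x^1 / 1!
    total = x
    for n in range(1, max_terms):
        # Each term is the previous one times -x^2 / ((2n) * (2n + 1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
        if abs(term) < tolerance:
            return total
    # Defensive check: the error target was never met (for example NaN input).
    raise ArithmeticError("series did not converge within max_terms")


print(taylor_sin(1.0), math.sin(1.0))
```

The max_terms cap is the defensive check: it costs one loop bound rather than a test on every arithmetic operation, and a NaN input ends in an error instead of an endless loop.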
So there's this weird bug that can cause PHP to cack itself and attempt to refine the accuracy of a number which cannot be refined (sounds like maths hell), leading to PHP consuming 100% of the CPU's time and effectively bringing down the server.
It's pretty easy to point a finger at PHP.
But wait.
An application consuming 100% of the CPU? Surely this says a lot about the scheduling abilities of the server machine which, I'm guessing, is usually some Unix variant. Likewise the near-constant problems of *creating* files in PHP and wondering if you as a user have the right to play with those files. Or the fact that you pretty much point blank cannot chown a file to its rightful owner (Unix ought to permit non-root to chown a file it owns to the same as the owner of the folder the file is in). Need I even mention the requirement to give files weird access rights to get PHP to be able to modify the file? There are ways around this, to run a PHP as "you", but most hosts I've come across don't do this. And, as such, it seems to me that PHP exposes quite a number of fundamental design deficiencies (or perhaps inflexibilities) in the way Un*x works.