# The trouble with rounding floating point numbers

We all know of floating point numbers, so much so that we reach for them each time we write code that does math. But do we ever stop to think what goes on inside that floating point unit and whether we can really trust it? I hate to cast aspersions on its good name but when I hear stories of space craft crashing, inconsistent …

This topic is closed for new posts.
1. #### It's all a matter of scale

As a longtime Fortran programmer (and compiler developer), I was delighted to see this article, which sheds light on a subject that has ensnared many over the years. You are right that some try to "fix" the problem by fudging, and right again that decimal arithmetic (most commonly used in COBOL applications) is a better way. But there is a method available in almost any language that is proven effective - scaling.

The trick here is not to do the computations in dollars or pounds or euros, which entail decimal fractions, but in cents (or whatever is appropriate for your currency.) Scale the input values by 100 so that you are computing in cents. When you are done and want to display the result, divide by 100 and display the result rounded to the nearest .01. You'll never be off.
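A minimal sketch of the scaling idea in Python, assuming amounts arrive as decimal strings and that two decimal places (cents) is the right scale for the currency. The helper names `to_cents` and `format_cents` are made up for this illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def to_cents(amount: str) -> int:
    """Parse a decimal string such as "0.37" into an integer number of cents."""
    # Decimal parses the text exactly, so no binary rounding creeps in here.
    scaled = Decimal(amount) * 100
    return int(scaled.to_integral_value(rounding=ROUND_HALF_EVEN))

def format_cents(cents: int) -> str:
    """Render integer cents as a string with two decimal places."""
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), 100)
    return f"{sign}{whole}.{frac:02d}"

prices = ["0.10", "0.20", "0.37", "0.37"]
total_cents = sum(to_cents(p) for p in prices)  # plain integer addition

print(format_cents(total_cents))   # 1.04
print(0.10 + 0.20 == 0.30)         # False: the same sum in binary floating point
```

Formatting with `divmod` rather than dividing by 100 keeps the display step in integer arithmetic as well, so no float appears anywhere in the calculation.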

It is also important to keep in mind that a single-precision float (float in C, real in Fortran, etc.) is typically good to about 7 decimal digits, which means that as values get larger, you start to lose information. It is better to use the double precision datatype which is good to about 15 digits.

COBOL, Fortran and PL/I have built-in features for handling this scaling - in languages which don't, you'll have to do it yourself, but it's easy once you get the idea.

2. #### Real world rounding errors

It's not always a mathematical error - sometimes the problem can be a misunderstanding between two parties on how rounding is to be done. In engineering contracts, the tradition is to do all calculations to four decimal places, only rounding to two when a financial value is displayed. Sounds simple enough but there's still plenty of scope for error. For example, if something has a value of £1.2486 and you want to use 10 of them, you could round at the beginning to give £1.25 * 10 = £12.50 or round later to give £12.49. A penny doesn't sound like much, but the error grows with high quantities of a small value (e.g. 60000 of an item costing £0.0375 could give an error of £150 by the above method). It gets even worse if you have multiple cost lines and vary on whether the rounding is done at the total or on individual lines. Finally, how do you round a .5 value? Mathematically (and I believe in banking), the convention is to round to the nearest even digit, so 1.5 gets rounded to 2.0, 2.5 gets rounded to 2.0 and 3.5 gets rounded to 4.0. On average, this should give a smaller error. Interestingly, Microsoft isn't even consistent: Excel rounds upwards, ROUND(2.5,0)=3, but Access rounds as described above.
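The figures above can be reproduced with a short Python sketch using the standard `decimal` module. It assumes "round at the beginning" means rounding the unit price to two places with halves going up; the variable names are only for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

# Round the unit price first, or round the line total last.
unit_price = Decimal("1.2486")
round_first = unit_price.quantize(CENT, rounding=ROUND_HALF_UP) * 10
round_last = (unit_price * 10).quantize(CENT, rounding=ROUND_HALF_UP)
print(round_first, round_last)        # 12.50 12.49

# The high-quantity case: 60000 items at 0.0375 each.
small = Decimal("0.0375")
early = small.quantize(CENT, rounding=ROUND_HALF_UP) * 60000
late = (small * 60000).quantize(CENT, rounding=ROUND_HALF_UP)
print(early - late)                   # 150.00

# Half-even ("banker's") rounding versus half-up rounding of .5 values.
for text in ("1.5", "2.5", "3.5"):
    value = Decimal(text)
    even = value.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    up = value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    print(text, even, up)             # 1.5 2 2 / 2.5 2 3 / 3.5 4 4
```

Naming the rounding mode explicitly in the contract (and in the code) removes the ambiguity between the two parties.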

3. #### No rounding required

We don't do these kinds of calculations using IEEE math for this reason. Since the data is in the database, and CPU costs are the same for application or database servers, we just do the math using Oracle's number format, which is exact in storage.

Here is an example of the results from the topic text in PL/SQL:

SQL>SET NUMWIDTH 38 SERVEROUTPUT ON

SQL> DECLARE

2 num1 NUMBER;

3 num2 NUMBER;

4 BEGIN

5 num1 := 0.37;

6 num2 := 0.37;

7 DBMS_OUTPUT.put_line (num1 + num2);

8 num1 := 59.00;

9 num2 := 1.125;

10 DBMS_OUTPUT.put_line (num1 * num2);

11 END;

12 /

.74

66.375

We actually get the results you would expect. 0.37 is stored as:

SQL> select dump(0.37,10) from dual;

DUMP(0.37,10)

-------------------

Typ=2 Len=2: 192,38

That is, two bytes, shown here in base 10 as 192 and 38.

To decode the digits, take the second through last bytes (only one here) and subtract 1 from each:

38 - 1 = 37

To decode the scale, find the difference from 193, and if negative, divide by 100 that many times:

192 = 193 - 1 so 37 / 100 = 0.37

In case you are wondering, here are the other two values as they are stored:

SQL> select dump(1.125,10) from dual;

DUMP(1.125,10)

------------------------

Typ=2 Len=4: 193,2,13,51

193 (one whole digit), 2-1= 1, 13-1= 12, 51-1=50 -> 1.1250

SQL> select dump(59,10) from dual;

DUMP(59,10)

-------------------

Typ=2 Len=2: 193,60

Thus, Oracle can store "at least" 38 digits of precision using this method in its NUMBER format. No trickery with 1/2's here, exact numbers are actually stored, and in less space.
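The decoding rule walked through above can be sketched in Python. This is only an illustration of the steps as described in this post, for positive values; the function name `decode_positive_number` is made up here, and negative numbers and zero are deliberately not handled.

```python
from decimal import Decimal

def decode_positive_number(dump_bytes):
    """Decode the byte values shown by DUMP(n, 10) for a positive NUMBER."""
    exponent_byte, *digit_bytes = dump_bytes
    # The first byte gives the power of 100 of the leading digit pair.
    exponent = exponent_byte - 193
    value = Decimal(0)
    for position, byte in enumerate(digit_bytes):
        digit_pair = byte - 1  # each remaining byte is a base-100 digit plus 1
        value += Decimal(digit_pair) * Decimal(100) ** (exponent - position)
    return value

print(decode_positive_number([192, 38]))          # 0.37
print(decode_positive_number([193, 2, 13, 51]))   # 1.1250
print(decode_positive_number([193, 60]))          # 59
```

Because every stored byte is a pair of decimal digits, values such as 0.37 are held exactly, which is the point being made above.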

4. #### Fixed point fractions

What about using fixed point fractions known as the Q-notation?

For example, a Q7 number would be represented on 8 bits where the first bit is -2^0, the second one is 2^-1, the third 2^-2 and so on to 2^-7 in this example.

There, in Q7, -0.4375 is represented as

0b11001000

Operations with Q-notation are well-known and widely used in the DSP world. That includes how to reliably truncate or approximate.
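A small Python sketch of Q7 as described above: 8 bits, two's complement, value equal to the signed integer divided by 128. The helper names (`to_q7`, `from_q7`, `q7_mul`) and the choice to saturate on overflow are assumptions for illustration, not part of any particular DSP library.

```python
def _signed(bits: int) -> int:
    """Interpret an 8-bit pattern as a two's complement integer."""
    return bits - 256 if bits & 0x80 else bits

def to_q7(x: float) -> int:
    """Encode a value in [-1, 1) as an 8-bit Q7 pattern, saturating at the ends."""
    scaled = round(x * 128)
    scaled = max(-128, min(127, scaled))
    return scaled & 0xFF

def from_q7(bits: int) -> float:
    """Decode an 8-bit Q7 pattern back to a float."""
    return _signed(bits) / 128

def q7_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two Q7 values: integer product, then shift right by 7 (truncates)."""
    product = (_signed(a_bits) * _signed(b_bits)) >> 7
    product = max(-128, min(127, product))
    return product & 0xFF

encoded = to_q7(-0.4375)
print(format(encoded, "#010b"))                    # 0b11001000
print(from_q7(encoded))                            # -0.4375
print(from_q7(q7_mul(to_q7(0.5), to_q7(0.5))))     # 0.25
```

All arithmetic happens on integers, so where truncation occurs (the right shift) is explicit rather than hidden inside a floating point unit.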

5. #### Treacherous floating-point

I first discovered the perils of floating point when I was about 15. I was writing a program to fit polynomial curves (y=a+bx+cx^2+... up to order 5) to a set of a dozen experimental data points. We soon discovered that the fit was subtly different depending on what order the data was supplied to the program. By displaying various intermediate results it transpired that you got different answers depending on the order in which floating point numbers were added up! [Polynomial fit code requires adding up numbers which have been raised to powers, giving a good dynamic range(!)]
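The order dependence is easy to reproduce. Here is a minimal Python sketch with three values of very different magnitude, chosen for illustration rather than taken from the original curve-fitting program. An explicit loop is used because the built-in `sum()` in newer Python versions compensates for some of this error.

```python
import math

def naive_sum(values):
    """Add floats strictly left to right, as a simple loop in most languages would."""
    total = 0.0
    for v in values:
        total += v
    return total

big_first = naive_sum([1e16, 1.0, -1e16])   # the 1.0 is lost when added to 1e16
cancel_first = naive_sum([1e16, -1e16, 1.0])
accurate = math.fsum([1e16, 1.0, -1e16])    # correctly rounded sum from the stdlib

print(big_first, cancel_first, accurate)    # 0.0 1.0 1.0
```

At a magnitude of 1e16 adjacent doubles are 2 apart, so adding 1.0 changes nothing; summing powers of x, as a polynomial fit does, produces exactly this kind of dynamic range.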

At the time I then devised many simple sums which would give obviously wrong answers on many operating systems, languages, and pocket calculators. Unfortunately I can't recall the examples now.

Ever since (18+ years) I've always regarded floating-point as dangerous, and use integer/fixed-point arithmetic unless there's a very good reason not to. At least it forces you to think carefully about any rounding. And despite floating-point co-processors, integer still runs faster too.

6. #### Interval arithmetic

I know the next article will be about decimal arithmetic, but what people probably should start using is interval arithmetic, where the error is explicit in the difference between an upper and a lower bound of the value.

http://www.python.org/dev/peps/pep-0327/

8. #### Why don't we use BCD?

As I recall from my digital design class BCD was invented to take care of just this sort of problem. I can't believe this is even an issue!

9. #### A sign of the times?

At the risk of provoking the wrath of the legions of bright, young Java/Ruby/C#/[insert name of latest fad language] programmers in the audience, I'd like to point out that I learnt about the perils and pitfalls of the limited precision of floating-point numbers when I was a rookie FORTRAN programmer 25 years ago.

Back then, programming textbooks clearly spelled out the difference in precision between "real" (i.e. 32-bit) and "double" (64-bit) floating-point numbers, warned about the kind of rounding errors described by Dan Clarke, and offered sound advice on when one should use integers.

My favourite FORTRAN textbook (Munro's "FORTRAN 77") specifically warned against using any kind of floating-point numbers to represent currency amounts in financial applications.

Programmers had a greater appreciation of the hardware in those days, largely because there was such a variety. The Intel monoculture was far in the future, and each manufacturer had its own internal representation of floating-point, none of them compliant with IEEE 754. When you ported a program from one machine to another, you had to take into account, for example, that single-precision floating point arithmetic on an IBM VM/370 system was only good for about six decimal digits of precision.

Programmers seem to be less aware of such issues today, and more trusting that the CPU will always give them "the correct answer".

Then their Java enterprise application rounds a number in a way they hadn't expected, and their company's accounts are unexpectedly short a couple of hundred million dollars.

Maybe we ancient FORTRAN programmers can still teach them a trick or two ;-)

10. #### why does nobody use BCD arithmetic ?

The limited precision of the IEEE floating point format is a well-known problem.

If you need absolute precision, use BCD format. Every CPU in the world that is not half braindead can perform BCD calculations in hardware. And on the occasion that your RISC CPU cannot do it in hardware, there are software libraries to do it.

A number is stored in a compressed string, 2 digits per byte (one nibble per digit).

0000 to 1001 represent numbers 0 to 9 (direct bit translation)

1111 represents the E (exponent)

1110 represents the . (decimal point)

1101 represents the - (sign)

1101 0001 : 0xD1

1001 1110 : 0x9E

0000 0001 : 0x01

1111 1101 : 0xFD

1001 1001 : 0x99

1001 1001 : 0x99

1001 1001 : 0x99

would represent -19.01E-999999 with only 7 bytes of storage.

Try stuffing that one in IEEE format.
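The nibble encoding listed above can be sketched in a few lines of Python. This follows the table in this post rather than any standard packed-BCD layout, and the filler nibble used for odd-length input (`PAD`) is a hypothetical choice, since the post does not say how that case is handled.

```python
# Nibble codes from the post: digits map to themselves, plus three symbols.
NIBBLES = {str(d): d for d in range(10)}
NIBBLES.update({"E": 0xF, ".": 0xE, "-": 0xD})
CHARS = {code: char for char, code in NIBBLES.items()}
PAD = 0xC  # hypothetical filler nibble, not defined in the post

def pack(text: str) -> bytes:
    """Pack a numeric string into two symbols per byte."""
    nibbles = [NIBBLES[ch] for ch in text.upper()]
    if len(nibbles) % 2:
        nibbles.append(PAD)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def unpack(data: bytes) -> str:
    """Reverse pack(), skipping the filler nibble."""
    chars = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0xF):
            if nibble != PAD:
                chars.append(CHARS[nibble])
    return "".join(chars)

packed = pack("-19.01E-999999")
print(packed.hex(), len(packed))   # d19e01fd999999 7
print(unpack(packed))              # -19.01E-999999
```

This only covers storage and conversion; arithmetic on the packed form would still need a BCD library or hardware support.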

Conversion from text to BCD and back is a snap and can be done with maybe 30 lines of assembly code.

Such a simple well known system that is even implemented in hardware on most computers. Yet virtually no compilers support it.

11. #### It gets worse

Back in 1952, when I was learning arithmetic in an English school, I learned that for numbers exactly at N.5, if N was even, one rounded down, and if odd, one rounded up. That's not the way we do it here in the States, however; every N.5 is rounded up. You can imagine what that does to a long series of such numbers.

Round 'em up, head em out...

C.

12. #### Financial Institutions do not use Float or Double

Financial and insurance applications are not written using float in the first place. In fact the reference you make to risks has nothing to do with floating point numbers, but with the concept of a float which is a sum of money. These applications are written using decimal-based arithmetic, which is precise. In languages such as COBOL, RPG and PL/I, you specify the number of digits and the representation (Zoned Decimal = 1 digit per byte, or BCD = 2 digits per byte). Very specific rules are provided for overflow, underflow and roundoff and enforced by the language runtime. While I commend you for the idea of the article (most programmers know little about numeric programming), a bit more basic research would have made your examples more realistic.

13. #### Another real world example.....

I did some work on an allocated pension calculator that required a tweaker (ashamed to say so, but it worked). I had to calculate how old a person would be on their retirement date. If they retired on their birthday the old code didn't show them as being a year older - hence the tweaker. Here is the offending javascript:

DOB = new Date(y,m,d);

RET = new Date(ry,rm,rd);

DIF = (RET.getTime()) - (DOB.getTime())

AGE = DIF / (1000 * 60 * 60 * 24 * 365.25);

AGE_Attained = Math.floor(((Math.round(AGE*1000))/1000) +.001);
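One way to avoid the tweaker is to compare calendar fields instead of dividing elapsed milliseconds by an average year length. A minimal Python sketch, assuming "age attained" means the number of birthdays reached on or before the retirement date; the function name `age_on` and the sample dates are made up for illustration.

```python
from datetime import date

def age_on(date_of_birth: date, on_date: date) -> int:
    """Whole years of age on on_date, counting the birthday itself."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)

dob = date(1950, 3, 15)
retirement = date(2015, 3, 15)   # retiring exactly on a birthday

# The division approach lands just short of 65, which is why a fudge was needed.
approx_years = (retirement - dob).days / 365.25
print(int(approx_years))         # 64

print(age_on(dob, retirement))             # 65
print(age_on(dob, date(2015, 3, 14)))      # 64
```

The shortfall comes from 365.25 being only an average: the actual span contains 16 leap days, not 16.25.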

14. #### Numerical analysis, anyone?

There's a field called "numerical analysis," which allows one to derive rigorous error bounds for floating point computations in many algorithms. While it's true that floating point is inappropriate for dollars and cents calculations, floating point arithmetic is sufficient for many scientific (heat transfer, fluid dynamics, etc.) and financial calculations (e.g. pricing derivative securities, econometric modeling), while integer arithmetic fails because of scaling problems.

In short, I wouldn't do bookkeeping in floating point, but I wouldn't simulate a rocket with fixed point or integer arithmetic.

15. #### Use strings

Well, this proves what we have all known for some time.

Programming languages are, and have been, flawed beasties for as far back as we know.

So, as shocking as it may seem, use strings.

99% of data transfer and manipulation in applications other than scientific ones requires almost no typing.

We don't need to save space anymore. It's fine to store an int as a string, and in fact you probably should if no calculations are done on the int in its life, because otherwise you will spend more effort than you need to on type conversion to transfer the value from database to UI.

If you used strings, this "float" problem wouldn't exist either, because it is a rather simple matter to round a decimal number if it's a string.

So shall we forget CPU 1.0 and just do what needs to be done? Write a decent library that does it with strings and you're done.

I used to work as a test manager for a large market research software company. This problem was known to us more than 15 years ago, as it would manifest itself in statistical tables. The anal-retentives who produced the tables (as opposed to those of us who made and tested the software) wanted certain tables to add up to exactly 100.0%. Due to this rounding error, occasionally tables would add up to 99.9% or 100.1%. The customers would then complain and no amount of gentle computing education on our part would satisfy them. The solution that we came up with was to have the program automatically force the tables to add up to the nearest tenth or hundredth of a percent through randomly choosing one response and adding (or subtracting) one tenth or one hundredth of a percent. Even though this was arbitrary, once the tables added up to exactly 100.0% (or 100.00%), the customers were happy.

17. #### Basic arithmetic

A pretty straightforward method is to apply grade school arithmetic. If you need precision to the hundredths, add 0.005 and truncate the sum at the hundredths place. If you need precision to the thousandths, add 0.0005, and so on.

Since you were paying attention, you just asked, "but what if 0.000...5 is really 0.000...48779 or some such number that does not cause an overflow?" Simple: multiply the float by 10^n, where n is the number of decimal places you want to keep, and add 0.5. The integer part is your solution * 10^n.
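A Python sketch of the recipe above, next to a decimal-based version for comparison. The function names are made up for illustration. The float version works when the scaled value is exact, but it inherits the binary representation of its input, so a literal such as 1.005 (stored as slightly less than 1.005) still rounds down.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def round_half_up_float(x: float, places: int) -> float:
    """The recipe from the post: scale by 10^n, add 0.5, keep the integer part."""
    scale = 10 ** places
    return math.floor(x * scale + 0.5) / scale

def round_half_up_decimal(text: str, places: int) -> Decimal:
    """The same rounding rule applied to the decimal digits as written."""
    quantum = Decimal(1).scaleb(-places)   # e.g. 0.01 when places is 2
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up_float(0.125, 2))       # 0.13  (0.125 is exact in binary)
print(round_half_up_float(1.005, 2))       # 1.0   (1.005 is not)
print(round_half_up_decimal("1.005", 2))   # 1.01
```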

18. #### rounding / how computers do math

on diycalculator.com, on the More Cool Stuff-page there is a very good intro (Rounding Algorithms 101) on rounding and on numbers.

that's part of the diycalculator-project from a really cool book "how computers do math" from Maxfield & Brown.

very entertaining for novices as well (their site is also a treasurechest of info).

19. #### You missed a common bug

Unless I overlooked it, you missed a common bug that often isn't caught before software is released.

When casting from float to integer, C/C++ and most other languages will truncate. So 3.80 becomes 3

Everybody knows you need to add 0.5 before casting but still this is a very common bug I often see...
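Python's `int()` truncates toward zero in the same way as the cast described here, so the bug and the usual fix can be shown in a few lines. Note that adding 0.5 before truncating only works for non-negative values; the helper `round_half_up` below is a hypothetical name for a variant that uses `floor` instead.

```python
import math

def round_half_up(x: float) -> int:
    """Round to the nearest integer, with halves going toward +infinity."""
    return math.floor(x + 0.5)

print(int(3.80))              # 3   truncation, the bug
print(int(3.80 + 0.5))        # 4   the usual fix
print(int(-3.80 + 0.5))       # -3  the usual fix goes wrong for negatives
print(round_half_up(-3.80))   # -4
```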

20. #### Interval arithmetic

It should be noted that whilst all the above problems are due to floating point (FP) arithmetic, they can be alleviated by using interval arithmetic (IA).

IA defines an upper and lower bound which is representable upon the architecture in question; the "true" value (the value which represents the number if it had an infinite number of decimal places to represent it) lies between these. Thus, if we only had one decimal place on our architecture Pi (3.14159..) would be represented as [3.1, 3.2]. From this we can determine how accurate a calculation is by the width of the interval. The smaller the interval, the more accurate the result.

IA is especially useful for calculations involving many iterations, as should they be computed solely using FP arithmetic, only a single value is returned with little idea as to its accuracy.
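A toy sketch of the idea in Python (3.9 or later, for `math.nextafter`). It is not a real interval library: only addition is implemented, and instead of switching the hardware rounding mode it simply widens each bound by one representable step, which is enough to keep the true value enclosed. The names `Interval` and `enclose` are made up for this illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Push each bound outward by one float so rounding cannot lose the true sum.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __contains__(self, x: float) -> bool:
        return self.lo <= x <= self.hi

    def width(self) -> float:
        return self.hi - self.lo

def enclose(x: float) -> Interval:
    """Interval around a float literal that contains the decimal value it was typed as."""
    return Interval(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

total = Interval(0.0, 0.0)
point = 0.0
for _ in range(10):
    total = total + enclose(0.1)
    point += 0.1

print(point)            # a single float with no indication of its error
print(total)            # bounds that are guaranteed to bracket the true sum
print(1.0 in total)     # True
```

The width of `total` grows with every operation, which is the explicit accuracy information a single floating point result does not carry.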

For a gentle introduction to IA, see:

http://www.americanscientist.org/template/AssetDetail/assetid/28331

--Eoin.

21. #### Banks use integer arithmetic ...

... or they should.

The only FP routines in all the mass of code run by the bank I used to work for were in computing mortgage repayments.

22. #### I blame the Greeks.

I have always felt that someone, way back when, made a mistake when using their fingers to count the number of cows(camels/sheep/wives) they owned. They should have ignored their thumbs....

Base 8 is a far more sensible option. (1_2_3_4_5_6_7_10_11_..17_20_21..27 and so on.) You can divide most base multiples by halves all the way back down to 1. That is, 64 / 2 / 2 /... / 2 = 1. You have to deal with decimals eventually of course, but they always turn out to be fractions of the base. Of course we eventually run into the same problems of infinitely repeating decimals and logically indescribable fractions, but not with such common fractions as 1/3....

The Romans came close with base 12 (time, and the Gregorian calendar) but it never really stuck, and it's almost as bad as base 10.

So thank you Aristotle, or whoever it was who chose base 10. Unfortunately, I think the chances of the world converting now are rather slim... :)

23. #### Sensible people use rational numbers

There is a way to do arbitrary precision arithmetic without any fear of rounding error - represent your numbers as rational numbers, or fractions in everyday speak.

Assume that any input data is accurate to the best of your ability to measure (sampling theorists, please note).

First, represent integers as linked lists of integers at a bit length the hardware supports directly, say 32 bits.

Second, represent any non-integer as a fraction using two of those integers. This fraction is probably top-heavy/improper/whatever.

As all your arithmetic is now integer based, there is no possibility of rounding error. Multiplication isn't too bad, but addition has to go via common denominators and so on.

You can make arbitrarily accurate approximations to e, pi, root 2 and any other real number by summing a series until you get bored.

This system is used by Mathematica and other symbolic manipulation systems. They also leave pi as just 'pi', so don't have to lose precision there either.
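Python's standard `fractions` module works along these lines (a numerator and denominator held as arbitrary-size integers), so the idea can be tried without writing the linked lists by hand. A brief sketch; `approx_e` is a made-up helper that sums the usual series 1/0! + 1/1! + 1/2! + ... for a fixed number of terms.

```python
from fractions import Fraction

a = Fraction(37, 100)
print(a + a)                           # 37/50, exactly 0.74
print(Fraction(59) * Fraction(9, 8))   # 531/8, exactly 66.375
print(Fraction(1, 3) * 3 == 1)         # True

def approx_e(terms: int) -> Fraction:
    """Rational approximation to e from the first `terms` terms of its series."""
    total = Fraction(0)
    factorial = 1
    for n in range(terms):
        if n > 0:
            factorial *= n
        total += Fraction(1, factorial)
    return total

print(approx_e(5))                     # 65/24
print(float(approx_e(15)))             # conversion to float happens only for display
```

The cost is that numerators and denominators can grow without bound as a computation proceeds, which is the price of never rounding.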

24. #### Base 10 isn't perfect either

The problem above isn't restricted to base 2 representations. All those who've suggested using base 10 need only pause for a moment to consider how they'd represent 1/3 in BCD to see the error of their ways. Those of us born in Springfield who count in base 3 (owing to that being the number of digits on one of our hands) can smugly claim an exact representation: 0.1. But then we do have a bit of a problem with 1/2..

--Dave

25. #### Number bases

OK, Base 10 has problems too - just not as many recurring decimals as Base 2.

But, iirc - yes, I am that old - the Babylonians had it about right with base 60.

26. #### FORTRAN

Interesting comment from David Harper - my first programming course was in FORTRAN (Australian National University Research School Physics, iirc, on a Univac) and you really needed to learn good practice to be any good.

But when I was working in a govt economics bureau I remember horrible problems with FORTRAN integer arithmetic - including someone (with a Physics PhD) who processed occasional character records labelling road names as numeric data. Nothing is so good that someone can't misuse it and FORTRAN doesn't (IME) suffer fools gladly.

But interesting that it is still around (AIRI the first non-MS language for Windows NT was FORTRAN) and used for what it is designed for, it's pretty good. And, of course, people who use FORTRAN tend to be proper engineers, with all that implies. Probably why "ancient FORTRAN programmers can still teach... a trick or two ;-)".

27. #### Automatic assumption of accuracy?

It has been assumed here that all CPUs are equally good at calculating floating point decimals to the accuracy we'd expect. However, if you have a faulty CPU it will not be able to add up correctly anyway. The first Pentium bug, as readers will recall, was a floating point problem.

Since the CPU itself cannot be trusted, the best solution for checking the accuracy of the CPU that I have found is using the Mersenne Prime code. Not only does this work as a constant test but you can use it to find memory and CPU problems. It costs nothing in terms of processing as the code takes the place of the "idle" process. (Sadly there are people who think that CPUs do not run all the time, which means they do not understand the Von Neumann architecture).

Details at www.mersenne.org.

28. #### not quite....

It's not really true that CPUs 'run' all the time... processor load does not just reflect how much useful work a processor is doing, but in many ways how active the processor really is. So your idle process is actually letting your CPU take little breaks here and there. You may notice that the temperature of your processor actually changes depending on the load on the machine... your processor will consume more power at high loads in general.

29. #### Article from Computing Surveys

Of possible interest:

David Goldberg, "What every computer scientist should know about floating-point arithmetic", ACM Computing Surveys (CSUR), Volume 23, Issue 1 (March 1991), pp. 5-48.

30. #### What - there are hotter instructions?

The anonymous poster above is one who clearly doesn't understand how PCs work. The "system idle" process is simply a "no operation" instruction being run by the CPU. It has not stopped processing at all. It's simply processing a "do nothing" instruction over and over again (which is a total waste of electricity). Only the F00F bug (or similar) actually stops the processor.

It's an important part of testing that all the instructions produce approximately the same amount of heat. Were there really instructions that made the CPU hotter, then I could write a virus to set a computer on fire (some of the old Commodore PETs had a bug in their CPUs that allowed you to do something like this).

I've been running this mersenne prime code for over four years across numerous systems. I do not observe the CPUs are any hotter than when this code is not running. The only processor I have observed as being hotter was in fact a faulty one. The actual fault was only detected by using the software as it passed all other diagnostics. I only realised that it was running too hot when I saw how much cooler the replacement ran.

31. #### Yes, there _are_ hotter instructions

The "system idle" doesn't use a NOOP instruction, it uses the HLT instruction. That _does_ stop processing, at least until an interrupt is received.

If you were to measure the actual watts used by the process you would see a difference. The fact that you don't see a temperature difference just means that the cooling in your system is working.

Eric, you have stated that the processor uses the HLT instruction, not the NOOP instruction. That's an instruction that it still has to process.

So what does it do after that? The next instruction of course.

And after that? The next instruction. And so on.

Were it to actually stop, how would it start again? This is not a marathon runner that has to stop for drinks every now and then, it's a CPU! The only processors I've ever seen stopping have been those affected by the F00F bug or faulty ones. I haven't even mentioned the processing of interrupts, which of course would be impossible if the CPU had stopped for a tea break.

First you said that the processor would get hotter, then you say it doesn't get hotter because your cooling is working. Which is it to be?

I suggest studying Computer Science would be most beneficial.

33. #### No...

Actually, I studied electrical/computer engineering instead.

Modern CPUs do more (less) than just looping during a HLT instruction. For example, some actually stop the clock signal from being distributed through the processor. Some change the clock multiplier to slow the clock down. But it is definitely not 'looping' in its normal execution state.

Even ignoring a CPU's ability to shut itself down before real thermal damage is done, your virus won't work simply because there are 'hotter' instructions. I do not see how that would follow. A well designed CPU should tolerate continuous execution of any of its instructions.

What I am saying is that a CPU's temperature and its current draw (and thus power) are definitely related to how much time is spent doing real work vs halting. And btw, an interrupt is what usually brings a processor out of the halt state, so that's covered as well.

So, under your system, how do you account for the increased temperature of a CPU when it is working at 90% as opposed to 10%? What processors are you talking about? Is this what they are teaching in computer science??

- Anon. poster

34. #### Don't feed the trolls

Especially when they change their story from one post to another.

35. #### Rounding in the real world, some analysis

Having discovered this by chance, I think I have to clear things up a little...

We have to distinguish between calculations in commerce and finances on one side and numerical calculations on the other.

Financial calculations should indeed be done with decimal digits, using numbers with a defined number of digits to the right of the decimal point. Such numbers are available in languages like COBOL, RPG and PL/I, which are designed for programming commercial applications. Then the number of digits is always under the control of the programmer and rounding can happen as the law requires.

Since commercial applications were and are an important use of computers, all of the classical mainframes contain support (instructions etc.) for decimal arithmetic. The early microprocessors also contain instructions to use one byte as two BCD-encoded digits.

Yet the 'modern languages' like C, Pascal and Java do not have any type of decimal numbers. And since modern RISC microprocessors were designed to run C programs fast, these processors do not have any support for decimal arithmetic. (Statistics of programs written in C do not show any use of decimal arithmetic!)

There is only one exception: HP Precision Architecture supports decimal numbers, as these processors were designed to run commercial applications developed for the HP3000 family of computers. The Itanium, though designed to run large commercial databases, does not support decimals.

And this is not the end: When AMD designed the 64-bit extensions of the Intel x86 architecture they needed some more opcodes. So they had to remove some instructions from the original set. And what did they take? The decimal instructions! (with others). Just see http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf at page 85.

Decimal arithmetic does not become impossible without these instructions, as there are very clever tricks to do it with binary arithmetic plus a lot of logical instructions (bit fiddling). Yet one has to know about this, and if the compiler does not support it, subroutines have to be written and called. All this does not make things easy and fast....

Then we have to see that languages like COBOL are totally 'uncool' today in education, so nearly no student knows about them and about using decimal numbers.

Let's now look at numerical computations. The fact that these are affected by rounding errors is well known since the beginning of using computers.

Since the 1960s the Institute of Applied Mathematics at Karlsruhe University here in Germany, headed by Prof. Ulrich Kulisch, has done research and development to solve the problems of rounding errors. The main tools are interval arithmetic and the precise dot product. I was part of these efforts for many years, so I can speak as an insider. If anybody gets curious now: just search for "Ulrich Kulisch" in Google.

We developed various extensions to programming languages to make use of this extended arithmetic easy (no need for assembler programming!), and also a lot of algorithms for doing computations with guaranteed precise results were developed (e.g. solving of linear equations). All this is available for free (http://www.rz.uni-karlsruhe.de/~iam/html/language/xsc-sprachen.html).

Yet there is one main drawback: These calculations are slower than ordinary floating point. And soon we found that people prefer to compute wrong or at least questionable numbers fast over computing precise numbers slowly. So all these efforts are known to a few insiders only, but are not in wide use.

At present the problems of performance could be solved, as processors do now have transistors almost for free. Last year I discovered that modern processors by Intel and AMD contain nearly everything to implement interval arithmetic. With just a little clever control (a few thousand transistors, I guess) it could be implemented. For details see http://www.springerlink.com/content/e0kg1t22pmh75825/fulltext.pdf

Yet: why do we not have interval arithmetic in any processor? It seems to be a chicken-and-egg problem. Interval arithmetic is considered slow and expensive. So those who know about it do not like to use it. Therefore it is not requested by the market. And since there seems to be no market, there are no implementations.

It is very difficult to prove that something failed or crashed because of imprecise calculations. When we developed all those great tools for precise calculations we searched for some kind of "killer application", yet could not find one.

Somehow the real world seems to be fault tolerant against rounding errors in floating point computations.

I apologize for typing errors and bad English.

Reinhard Kirchner, Univ. of Kaiserslautern, Germany

kirchner@informatik.uni-kl.de

