﻿

Question:

In C#, why does 739.39 + 850 equal 1589.3899999999999 instead of 1589.39?

Answer:

C# uses the IEC 60559:1989 (IEEE 754) standard for floating point numbers. This is a widely accepted standard, used by many different languages including C, C++ and Java.

Outside of computing, a ‘shorthand’ notation called the scientific or exponential format is widely used to represent very large and very small numbers in a small space without loss of accuracy. For example, 0.0000000000000000000123 can be expressed as 1.23 x 10^-20 and 123000000000000000000 as 1.23 x 10^20. In techie speak, the number to the left of the multiplication sign is called the mantissa and the power of ten the exponent.

IEEE 754 is basically a computer version of this shorthand. The main difference is that, being a computer format, everything is based around binary instead of decimal. In addition to expressing the mantissa and exponent in binary, the number raised to the power of the exponent is 2 (binary) rather than 10 (decimal).

Another difference is that computers have a limited space allocated to store each representation of a floating point number. This introduces limits to the size and accuracy of the number being represented. In C#, there are three built-in representations we can use for floating point numbers:

| Type | Bits in representation (bytes) | Sign bits | Mantissa bits | Exponent bits | Range (approximate) | Accuracy (approximate) |
|---|---|---|---|---|---|---|
| float | 32 (4) | 1 | 23 | 8 | ±1.5 x 10^-45 to ±3.4 x 10^38 | 7 significant digits |
| double | 64 (8) | 1 | 52 | 11 | ±5.0 x 10^-324 to ±1.7 x 10^308 | 15-16 significant digits |
| decimal | 128 (16) | 1 | 96 | 31 | ±1.0 x 10^-28 to ±7.9 x 10^28 | 28-29 significant digits |
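As a quick sanity check on the sizes in the table, here is a minimal sketch that asks the compiler how big each type is. The class and method names (`FloatingPointSizes`, `FloatBits` and so on) are made up for this illustration, and the exact text printed for the largest values depends on the .NET version in use.

```csharp
using System;

public static class FloatingPointSizes
{
    // sizeof reports bytes; multiply by 8 to get the bit counts from the table.
    public static int FloatBits() => sizeof(float) * 8;
    public static int DoubleBits() => sizeof(double) * 8;
    public static int DecimalBits() => sizeof(decimal) * 8;

    public static void Main()
    {
        Console.WriteLine($"float:   {FloatBits()} bits, largest value {float.MaxValue}");
        Console.WriteLine($"double:  {DoubleBits()} bits, largest value {double.MaxValue}");
        Console.WriteLine($"decimal: {DecimalBits()} bits, largest value {decimal.MaxValue}");
    }
}
```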

From the above table, it appears that C# should have no trouble representing 739.39, 850 and 1589.39 using the double type. In fact, this is largely the case: 850 is stored exactly, while 739.39 and 1589.39 are each stored as the closest value a double can hold, which still displays as the number we typed. The trouble occurs when we attempt to add 739.39 and 850 together.
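To see the problem for yourself, a minimal sketch along these lines should do. `AdditionDemo` and `SumEqualsLiteral` are hypothetical names chosen for this example. The "G17" format string asks for 17 significant digits, which avoids relying on the default formatting (that differs between .NET versions).

```csharp
using System;

public static class AdditionDemo
{
    // Adds at run time and compares the result with the value we expected.
    public static bool SumEqualsLiteral(double a, double b, double expected)
    {
        return a + b == expected;
    }

    public static void Main()
    {
        double sum = 739.39 + 850;

        // Print both values with 17 significant digits so any difference is visible.
        Console.WriteLine(sum.ToString("G17"));
        Console.WriteLine(1589.39.ToString("G17"));

        // Reports whether the sum and the literal are the same double.
        Console.WriteLine(SumEqualsLiteral(739.39, 850, 1589.39));
    }
}
```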

Unfortunately, in order to show this it is first necessary to convert 739.39 and 850 to the IEEE 754 format, and in order to do that there are a couple of optimizations in IEEE 754 that I need to explain first.

Firstly, in the scientific format the mantissa always starts with a non-zero digit: for example, 500 is written as 5 x 10^2, never 0.5 x 10^3. Since we are expressing the mantissa in binary and the mantissa can’t start with a 0, it must always start with a 1. If the mantissa always starts with a 1, there is no need to waste a bit storing that digit, which is great because it means we can now squeeze n+1 bits worth of mantissa into n bits.

The second optimization concerns negative exponents, which are expensive to deal with. IEEE 754 therefore adds a constant value (known as the bias) to the exponent in order to ensure that the stored value is never negative. In the case of double, this constant is 1023.
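Both optimizations can be seen by pulling a double apart into its raw bits. The following is a minimal sketch, assuming only `BitConverter.DoubleToInt64Bits` from the base class library; the class `DoubleParts` and its methods are hypothetical helpers written for this illustration. Note that the stored mantissa has only 52 bits (the leading 1 is implied, not stored) and that the exponent field holds the actual exponent plus 1023.

```csharp
using System;

public static class DoubleParts
{
    const long MantissaMask = (1L << 52) - 1; // the lowest 52 bits

    // The 11-bit exponent field, i.e. the actual exponent plus 1023.
    public static int BiasedExponent(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return (int)((bits >> 52) & 0x7FF);
    }

    // The 52 stored mantissa bits, without the implied leading 1.
    public static long StoredMantissa(double value)
    {
        return BitConverter.DoubleToInt64Bits(value) & MantissaMask;
    }

    // Formats a non-negative value in binary, padded with leading zeros.
    public static string ToBinary(long value, int width)
    {
        return Convert.ToString(value, 2).PadLeft(width, '0');
    }

    public static void Main()
    {
        foreach (double value in new[] { 739.39, 850.0 })
        {
            int biased = BiasedExponent(value);
            Console.WriteLine($"{value}: exponent bits {ToBinary(biased, 11)} " +
                              $"(stored {biased}, actual {biased - 1023}), " +
                              $"mantissa bits {ToBinary(StoredMantissa(value), 52)}");
        }
    }
}
```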

So, if we apply all the above then our simple addition looks like this:

| Decimal | IEEE 754 format (mantissa x 2^(exponent-1023)) | Mantissa (52 bits) | Exponent (11 bits) |
|---|---|---|---|
| 739.39 | 1.44412109375 x 2^9 | 1.0111000110110001111010111000010100011110101110000101 | 10000001000 |
| 850 | 1.66015625 x 2^9 | 1.1010100100000000000000000000000000000000000000000000 | 10000001000 |
| + 1589.39 | 3.10427734375 x 2^9 → 1.552138671875 x 2^10 | 11.0001101010110001111010111000010100011110101110000101 | 10000001000 → 10000001001 |

The mantissa of the sum then goes through two further steps.

Since IEEE 754 allows only 1 digit to the left of the decimal place, we shift the number right one place and increase the exponent by 1.

→ 1.10001101010110001111010111000010100011110101110000101 (= 1.552138671875 decimal)

But we only have 52 bits in which to store the mantissa, so we truncate the number 52 digits after the decimal point (remember, we have that optimization that means we don’t have to waste a bit recording the 1 to the left of the decimal point).

→ 1.1000110101011000111101011100001010001111010111000010 (= 1.5521386718749998 decimal)
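The same steps can be replayed with plain integer arithmetic on the raw bits. This is only a sketch under stated assumptions: `ManualAddition` and its methods are hypothetical names, it handles only positive numbers that already share an exponent, and it drops the extra bit by simple truncation as described above. Real hardware normally rounds to the nearest representable value rather than truncating; for these two particular numbers both approaches give the same bits.

```csharp
using System;

public static class ManualAddition
{
    const long MantissaMask = (1L << 52) - 1; // the 52 stored mantissa bits
    const long ImpliedOne = 1L << 52;         // the leading 1 that is not stored

    public static int BiasedExponent(double value)
    {
        return (int)((BitConverter.DoubleToInt64Bits(value) >> 52) & 0x7FF);
    }

    // The full 53-bit mantissa: the implied 1 followed by the 52 stored bits.
    public static long Significand(double value)
    {
        return (BitConverter.DoubleToInt64Bits(value) & MantissaMask) | ImpliedOne;
    }

    // Adds two positive doubles that have the same exponent.
    public static double AddSameExponent(double a, double b)
    {
        if (BiasedExponent(a) != BiasedExponent(b))
            throw new ArgumentException("This sketch only handles equal exponents.");

        long sum = Significand(a) + Significand(b); // 54 bits, of the form 1x.xxxx
        long shifted = sum >> 1;                    // shift right one place; the lowest bit is lost
        long exponent = BiasedExponent(a) + 1;      // compensate for the shift

        // Reassemble: exponent field, then the 52 bits after the implied 1.
        long bits = (exponent << 52) | (shifted & MantissaMask);
        return BitConverter.Int64BitsToDouble(bits);
    }

    public static void Main()
    {
        double manual = AddSameExponent(739.39, 850);
        double builtIn = 739.39 + 850;

        Console.WriteLine(manual.ToString("G17"));
        // Compares the hand-built result with the built-in addition, bit for bit.
        Console.WriteLine(BitConverter.DoubleToInt64Bits(manual) ==
                          BitConverter.DoubleToInt64Bits(builtIn));
    }
}
```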

(Note: in a real addition, the first step is to make the exponents the same so that the mantissas can be added together. It is pure luck that the two numbers we are adding here already have the same exponent and that this step could be omitted).
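For completeness, here is a rough sketch of what that alignment step might look like. It assumes positive, normal numbers only, simply discards any bits shifted off the end, and ignores the rounding a real implementation performs; `ExponentAlignment` and `AddWithAlignment` are hypothetical names used for illustration.

```csharp
using System;

public static class ExponentAlignment
{
    const long MantissaMask = (1L << 52) - 1;
    const long ImpliedOne = 1L << 52;

    static int BiasedExponent(double value)
    {
        return (int)((BitConverter.DoubleToInt64Bits(value) >> 52) & 0x7FF);
    }

    static long Significand(double value)
    {
        return (BitConverter.DoubleToInt64Bits(value) & MantissaMask) | ImpliedOne;
    }

    public static double AddWithAlignment(double a, double b)
    {
        int expA = BiasedExponent(a), expB = BiasedExponent(b);
        long sigA = Significand(a), sigB = Significand(b);
        int exponent = Math.Max(expA, expB);

        // Shift the number with the smaller exponent right, one place per unit of
        // difference, so both mantissas are expressed against the same power of 2.
        // The shift is capped at 63 because C# only uses the low 6 bits of the count.
        sigA >>= Math.Min(exponent - expA, 63);
        sigB >>= Math.Min(exponent - expB, 63);

        long sum = sigA + sigB;

        // The mantissas were treated as integers scaled by 2^52, so undo that scaling
        // along with the exponent bias of 1023.
        return sum * Math.Pow(2.0, exponent - 1023 - 52);
    }

    public static void Main()
    {
        // 850 is 1.66015625 x 2^9 and 1700 is 1.66015625 x 2^10, so 850 must be shifted once.
        Console.WriteLine(AddWithAlignment(850, 1700));
    }
}
```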

The truncation of the mantissa to 52 bits changes the representation of the result of our addition from 1.552138671875 x 2^10 (= 1589.39) to 1.5521386718749998 x 2^10 (= 1589.3899999999999).
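One way to measure how small this difference is: for positive doubles, subtracting the raw bit patterns tells you how many representable values apart two numbers are. The sketch below assumes that trick and uses a hypothetical helper called `StepsBetween`. By my reckoning the result of the addition and the literal 1589.39 should be neighbours, a single step apart in the last bit of the mantissa.

```csharp
using System;

public static class LastPlaceDifference
{
    // For two positive doubles, the difference between their bit patterns is the
    // number of representable steps from the smaller value up to the larger one.
    public static long StepsBetween(double larger, double smaller)
    {
        return BitConverter.DoubleToInt64Bits(larger) -
               BitConverter.DoubleToInt64Bits(smaller);
    }

    public static void Main()
    {
        double sum = 739.39 + 850;
        Console.WriteLine(StepsBetween(1589.39, sum));
    }
}
```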