## Decimal significant figures of IEEE754 DP numbers

The number of decimal figures used to display a binary DP number in decimal floating-point format varies among the spreadsheet programs and programming languages I use. For example, the following table shows the default number of decimal figures used when displaying the square root of two in decimal notation. The abbreviations are explained at the end of this post.

| SFs | sqrt(2): square root of two | Programming language | Spreadsheet software |
|---|---|---|---|
| 30 | 1.41421356237309504880168872421b0 | MXM fpprec:30;sqrt(2b0); | |
| 18 | 1.41421356237309515 | F77RM for DOS as of 1992 | |
| 17 | 1.4142135623730951 | C, F77OW, F99/95G, JS, PY | GNM |
| 16 | 1.414213562373095 | MXM ev(sqrt(2),numer); | |
| 15 | 1.41421356237310 | | MSXL, LOC |
| 12 | 1.41421356237 | F95S | |
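As a rough cross-check of the DP entries, the following Python sketch formats the DP approximation to sqrt(2) with a chosen number of significant figures. The helper name `show` is my own, and it assumes a value between 1 and 10, so that fixed-point formatting with n - 1 decimal places yields exactly n significant figures.

```python
import math

x = math.sqrt(2)  # IEEE 754 DP approximation to sqrt(2)


def show(value: float, sig_figs: int) -> str:
    """Format a value in [1, 10) with sig_figs significant decimal figures."""
    return f"{value:.{sig_figs - 1}f}"


for n in (12, 15, 16, 17, 18):
    print(f"{n}: {show(x, n)}")

# Python's default output is the shortest string that reads back as x.
print("repr:", repr(x))
```

The 30-figure Maxima entry cannot be reproduced this way, because it is not derived from the DP value.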

At first glance, the values shown in the first and second lines contradict each other. The reason is that Maxima does not use the IEEE 754 DP number format here. Thus, the first line shows the mathematically correct value of sqrt(2) rounded to 30 decimal significant figures (SFs), whereas all the other lines show rounded values of the very decimal number that represents the DP approximation to sqrt(2). Both the decimal and the binary fraction corresponding to the exact value of sqrt(2) have infinitely many SFs, because it is an irrational number.

According to the IEEE 754 specification, the binary DP representation of sqrt(2) comprises at most 53 binary SFs. Thus, it is a rational number, and in fact all DP numbers correspond to rational decimal numbers. This is easily seen by writing the sum of at most 53 different powers of two (Po2) as the product of an integer factor and the Po2 corresponding to the 53rd (least significant) bit. If the exponent of that Po2 is non-negative, a product of two integers is obtained; if it is negative, the result is a quotient of two integers. Either form describes a rational decimal number. The remaining question is how many decimal SFs this number has.
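The rational nature of a DP number can be made visible with Python's standard library. This is only an illustrative sketch: `float.as_integer_ratio()` returns the integer numerator and the power-of-two denominator, and `Decimal(x)` converts the binary value to decimal exactly, without any rounding.

```python
import math
from decimal import Decimal

x = math.sqrt(2)  # DP approximation to sqrt(2)

# Every finite DP number is an integer divided by a power of two.
numerator, denominator = x.as_integer_ratio()
print(numerator, "/", denominator)

# Exact decimal expansion of the DP value (finite, because x is rational
# with a power-of-two denominator).
exact = Decimal(x)
print(exact)
print(len(exact.as_tuple().digits), "decimal SFs")
```

For sqrt(2), the denominator is 2**52 and the exact decimal expansion has 53 SFs, far more than any of the default displays listed above.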

If all the Po2 values of a DP number are non-negative, i.e. the hidden leading bit corresponds to a Po2 with exponent ≥ 52, we might expect the number of decimal SFs to scale with a factor of log10(2) ≈ 0.30103, because that factor gives the position of the leading decimal figure in decimal columnar notation. The trailing figure is always found at position 1, because its value cycles through the sequence 1 → cycle(2 → 4 → 8 → 6 →). If all the Po2 values are negative, i.e. the hidden leading bit corresponds to a Po2 with exponent < 0, the position of the leading decimal figure scales like -log10(2) and the position of the trailing decimal figure like -1, so that the number of SFs in this case can be expected to scale like -1 - (-0.30103) = -0.69897. The following figures show the empirically determined number of decimal SFs of DP numbers for two different numbers of binary SFs, namely 1 and 53.
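The expected scaling can be checked for pure powers of two with exact integer arithmetic. The function name `sig_figs_pow2` is hypothetical, and the identity 2**-k = 5**k / 10**k is my own shortcut for counting the figures of a negative power of two; it is a sketch, not the method used to produce the figures.

```python
def sig_figs_pow2(exponent: int) -> int:
    """Exact number of significant decimal figures of 2**exponent."""
    if exponent >= 0:
        digits = str(2 ** exponent)
    else:
        # 2**-k == 5**k / 10**k, so the significant figures of 2**-k
        # are exactly the digits of the integer 5**k.
        digits = str(5 ** (-exponent))
    # Trailing zeros are not significant (none occur here, kept for safety).
    return len(digits.rstrip("0"))


for e in (1000, -1000):
    n = sig_figs_pow2(e)
    print(e, n, n / abs(e))  # the ratio approximates the expected factor
```

The ratios come out close to 0.30103 and 0.69897, in line with the argument above.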

A simple linear regression yields the following relations between the leading Po2 (LPo2) of a DP number and the number of its decimal SFs:

pure power of two, only hidden bit set

SF_01Po2(LPo2≥0) = + 0.301023(31) ⋅ LPo2 + 0.504(18)

SF_01Po2(LPo2≤0) = - 0.698923(31) ⋅ LPo2 + 0.516(19)

full 53-bit resolution

SF_53Po2(LPo2≥52) = + 0.301034(33) ⋅ LPo2 + 0.496(20)

SF_53Po2(LPo2≤52) = - 0.698963(28) ⋅ LPo2 + 52.504(16)
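A fit of this kind can be sketched for the pure-power-of-two case as follows. The exponent ranges (0 to 1023 and -1022 to 0) and the helper names `sig_figs_pow2` and `linear_fit` are my assumptions, so the coefficients need not match the values quoted above digit for digit.

```python
def sig_figs_pow2(exponent: int) -> int:
    """Exact number of significant decimal figures of 2**exponent."""
    if exponent >= 0:
        return len(str(2 ** exponent))
    # 2**-k == 5**k / 10**k: count the digits of 5**k instead.
    return len(str(5 ** (-exponent)))


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


up = list(range(0, 1024))       # LPo2 >= 0 (assumed range)
down = list(range(-1022, 1))    # LPo2 <= 0 (assumed range)

slope_up, icpt_up = linear_fit(up, [sig_figs_pow2(e) for e in up])
slope_down, icpt_down = linear_fit(down, [sig_figs_pow2(e) for e in down])

print(f"LPo2 >= 0: SF = {slope_up:+.6f} * LPo2 + {icpt_up:.3f}")
print(f"LPo2 <= 0: SF = {slope_down:+.6f} * LPo2 + {icpt_down:.3f}")
```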

Of course, the fractional values resulting from these calculations have to be rounded to an integer, and two has to be added for safety (if I find the time, I will plot the confidence intervals of these regression lines…). The scaling factors found for 53-bit resolution agree closely with our expectations. The smallest normalised DP numbers (with 53 binary SFs) can be displayed exactly in the decimal system only if the really huge number of 767 (yes, seven hundred and sixty-seven) decimal SFs is used! If fixed-point notation is to be used, the number of columns increases even further, to 1022 + 52 + 2 + 1 = 1077: the sign in the first column, the leading zero in the second, the decimal separator in the third, followed by 1022 + 52 = 1074 decimal places.
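Both numbers can be verified with exact decimal conversion. In this sketch I take the smallest normalised DP number that uses all 53 bits to be 2**-1022 + 2**-1074, written as a hexadecimal float literal; that choice is my reading of the sentence above.

```python
from decimal import Decimal

# Hidden bit 2**-1022 plus the least significant mantissa bit 2**-1074.
x = float.fromhex("0x1.0000000000001p-1022")

exact = Decimal(x)  # exact decimal expansion, no rounding

# Number of decimal SFs needed to display x exactly.
sig_figs = len(exact.as_tuple().digits)

# Fixed-point notation: sign, leading zero, decimal separator, decimal places.
fixed = f"{exact:+f}"

print(sig_figs)     # decimal SFs
print(len(fixed))   # columns in fixed-point notation
```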

The graph giving the details of the relations near LPo2 = 0 shows that the minimum number of decimal SFs needed to display a 53-bit DP number exactly is 16. Thus, displaying at most 15 decimal figures, as many popular spreadsheet programs do, prevents any 53-bit DP number from being displayed exactly in the decimal system! On the other hand, because the deviation between an arbitrary decimal fraction and the value of its DP approximation starts, on average, at the 16th decimal SF, it does not make much sense to use more than 18 decimal SFs when displaying the final result of calculations performed with DP numbers.
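One practical way to see what is lost at 15 figures is a round-trip test: print the DP value with n significant figures, read the text back, and compare. The round-trip criterion is my addition and is weaker than exact display, but it shows the same cut-off for sqrt(2).

```python
import math

x = math.sqrt(2)

for n in (15, 16, 17):
    text = f"{x:.{n}g}"   # n significant figures (trailing zeros dropped)
    restored = float(text)
    print(n, text, restored == x)
```

For this particular value, only the 17-figure output reads back as the original DP number.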

However, if some intermediate results seem strange and are to be analysed further, extended-precision frameworks like xNumbers or ArPrec are mandatory in order to deal with the huge numbers of digits that are likely to occur. Alternatively, you may decide to sit down together with your children under the Christmas tree and write add-with-carry as well as subtract-with-borrow routines that can handle up to 1500 decimal figures. If you manage to obey the KISS principle during coding, your work will be highly portable to any programming language, but that’s another story.
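To give an idea of how little code the add-with-carry part needs, here is a minimal sketch that adds two non-negative integers stored as decimal digit strings. The function name `add_decimal_strings` and the string representation are assumptions of mine; signs, decimal separators and subtract-with-borrow are left out.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal digit strings."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter one with zeros

    carry = 0
    result = []
    # Work from the least significant figure to the most significant one.
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        total = int(digit_a) + int(digit_b) + carry
        result.append(str(total % 10))
        carry = total // 10
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))


# 1500 nines plus one: the carry ripples through every column.
print(len(add_decimal_strings("9" * 1500, "1")))
```

Only single-digit arithmetic is used, so the same loop carries over to any language without big-integer support.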

List of abbreviations with download links, if applicable.

CAS: computer algebra system

FOSS: free and open source software

MXM: Maxima, a FOSS CAS

C: name of a programming language

gcc: GNU C compiler

F: abbreviation for FORTRAN, the name of a programming language

F77OW: Open Watcom Fortran 77

F77RM: Ryan-McFarland Fortran 77 for DOS as of 1987, malware????legal????

F99/95G: the Fortran 90/95/99/… front end of the GNU Compiler Collection (GCC)

F95S: Salford® FTN95 compiler

F99I: Intel® Fortran compiler for Linux

JS: JavaScript (MS: JScript), name of a scripting language targeting web browsers

PY: Python, name of a programming language

GNM: Gnumeric spreadsheet software

LOC: LibreOffice Calc spreadsheet software

MSXL: Microsoft® Excel®
