# Do you really know how your spreadsheet works?

Many people all over the world use spreadsheet calculation software on their computers. Only few of them, though, seem to be aware what is really happening beneath the GUI. Many effects that seem strange at a first glance can be understood when considering how most recent spreadsheet programs process numbers. It is reasonable to assume that at least the programming language the spreadsheet software is built in uses a number format called Double Precision. This has two important consequences: firstly, such a software (and the computer it is running on as well) represents numbers using the binary numeral system ( wikipedia has a more advanced description), and secondly, only a limited amount of Double Precision (DP) numbers is available to represent any number that is entered into a cell, e.g. The distance between a DP number and its two direct neighbours nearly always equals 2^-52 times the leading power of 2 that occurs in that number. The situation is different for the few numbers exactly equalling a power of 2: because the leading power of 2 changes at these numbers, the distance to their next smaller DP neighbour is half the distance to their next larger DP neighbour. The DP numbers are most dense around zero and most sparse near their maximum, which is about 10^307.

Consequently, software internally using DP numbers can only exactly represent a decimal input if that equals a binary fraction and as well contains a minimum and maximum power of 2 that do not differ by more than 52. For example, 4, 1, 0.5, 0.25, 1.25, and 0.6125 are exactly representable in the binary as well as decimal numeral system. But what about numbers like sqrt(2), 0.9, 0.1 &hell;? They are rounded to a DP number using a standardised algorithm. Thus, there is a double dilemma with the doubles. Firstly, most of the numbers humans want to work with cannot be exactly represented by a DP number but only by what is called a string. Secondly, most of the usual DP numbers a computer can deal with would require at maximum 751 decimal digits to represent them exactly. Fortunately, in case of IEEE 754 DP numbers 17 decimal digits uniquely identify a DP number (18, if false rounding is to be avoided).

Usually, spreadsheet software displays only 15 digits. But then you cannot spot that the result of 0.5-0.4 is rounded to a different DP number than the direct input of 0.1, as demonstrated in a previous post.
The joint consequences of different spacing of DP neighbours as well as rounding to a DP can lead to considerable differences between the results of mathematically equivalent formulae used to calculate a result. The following graph shows the DP differences between the DP results of y1=1-x⋅x and y2=(1-x)⋅(1+x) for 30,000 different values of DP x between (0;1). The absolute value of the difference does not exceed 1.1⋅10^-16, but the smaller the results for y1 as well as y2 get as x approaches 1, the more different DP numbers exist to represent the difference between y1 and y2. The amount of available levels that have a non-zero value doubles at all the x values where y decreases below a power of 2, i.e., at about x≈0.707 (x>1/sqrt(2), y<1/2), x≈0.866 (x>sqrt(3)/2, y<1/4), x≈0.935 (x>sqrt(7/8), y<1/8) … But unfortunately, even for x close to 1, the absolute values of the differences still nearly fill the interval between +/-6⋅10^-17, though the value of y gets very small there. Thus, the accuracy of the results is questionable and moreover, the question arises which of both methods is more reliable. These aspects will be treated in a follow-up post.

Advertisements

# Numerus rerum omnium nodus

This Latin inscription appears on a wall of the Istituto Nazionale di Statistica in Rome. The four words are a short form of a clause Cicero wrote in his Scipio’s dream and they mean “a number is the knot of almost everything” (at least, according to this web page).

Imho, this part of Scipio’s dream has come true nowadays, because many people having no clue of what numbers are as well as of how to properly deal with them can be found using high-tech equipment which usually relies heavily on binary numbers, e.g. These people get frightened by every unexpected result appearing on the display of their computer/smartphone/… Thus, particularly the manufacturers of popular spreadsheet programs have decided to hide nearly all the ugly details of calculations from their customers. But now, the experts get frightened, too, because they have no means to check every single step of a very complex calculation in detail. But of course, where there is a will there is a way.

Consider the “strange” results that can be observed in the following example obtained using the spreadsheet software called Microsoft®™ Excel®™. The first factor of 2^55 is huge, but in the second factor the very simple decimal numbers 0.5, 0.4, and 0.1 are combined in a way that should yield exactly 0 as result: (0.1+(0.4-0.5)) = (0.1+(-0.1)) = (0.1-0.1) = 0, thus yielding an overall result of 2^55*0 = 0. But this does not happen; in fact, the result is found to be 1. Because some decimal figures of the intermediate results are hidden, even the experts can hardly predict this final result.

In order to demystify this effect I’ve written a short but tricky module called Double_2_Hex which provides the functionality of displaying the bitwise internal representation of a number in Microsoft®™ Excel®™ as a hexadecimal string. More precisely, it deals with the data typ ‘Double’ in VBA®™. You may right-click here in order to download the code and use it if you obey to the GPL V3 license. The first column of the table shows the formulae used to calculate the numbers given in the second column. Finally, in the third column the results obtained when applying the function Dbl2Hex to the numbers in second column are provided. Now it is no longer miraculous why the result of 2^55*(0.1+(0.4-0.5)) is 1 and not 0: the magnitude of the internal binary representation of 0.1 is larger than that of 0.5-0.4 by the tiny amount of 1/2^55 which approximately equals 3E-17. But unfortunately, Microsoft®™ Excel®™ shows 15 decimal digits of a number at maximum, so without the function Dbl2Hex there is little chance to understand what is going (wr)on(g) here.

There will be follow-up posts on some other facts concerning doubles as well as other spreadsheet programs.

# What is this???

I hope “this” will develop into a lively virtual conversation about some aspects of how to apply the laws of mathematics and physics to solve problems using computers. In order to address many people, I will try to use open source software in the examples wherever it is possible. I’m interested in the history of science and the ideas of people having pushed it as well.

Because most of my examples will be inspired by geodesy, geophysics or astronomy, resp., I decided to summarise these subjects by the name “Carolo” in the title of this blog. This is meant as a tribute to the great former scientist Johann Carl Friedrich Gauß. In fact, “Carolo” appears as Latin version of his second German Christian name on the title page of his “Disquisitiones arithmeticae”.

Many Europeans and probably wine connoisseurs all around the world will know that “Barolo” is a famous red wine produced in Italy. According to the region of Italy where most of my practical examples will originate from, I should have chosen “Montepulciano” instead, but that does not rhyme on “Carolo”.