Archive for September, 2011

Intermezzo: how to escape the DP limitations?

Posted in Computer, gnumeric, Microsoft Excel, mpmath, multi-precision calculations, Programming Languages, Python, Software, Software tools, Spreadsheet, VBA, xNumbers on September 14, 2011 by giorgiomcmlx

In general, it is at least a very complicated task to prove that an algorithm is working as intended over the whole range of possible input parameters. When dealing with mathematical problems, it seems to be a good idea to check whether results obtained using a higher level of accuracy agree with those obtained using the standard level. Fortunately, for all major programming languages there are multi-precision libraries as well as automated conversion utilities that transform the source code automatically. The situation seems different in case of spreadsheets, though imho they are used for relatively complicated calculations as well nowadays. Unfortunately, there is much mindless advice on the web like “check the accuracy using addition theorems” or so. However, in case you hopefully read the previous posts on DP numbers (or even more professional matter), you know that you cannot test the accuracy of a numerical expression like 1 – x^2 in DP by adding x^2 and checking whether the result equals 1 in DP, e.g. Thus, there definitely is need of multiple-precision calculations in spreadsheets.

I use three spreadsheet environments, two of which are fundamentally different: two combinations of Microsoft®™ (MS) Excel®™ (XL) under MS Windows®™ (Win) as well as gnumeric (GNM) under Ubuntu Linux (UBL). Fortunately, multi-precision extensions are available more or less easily relying on free and open source software (FOSS) only. In case of XL, the software of my choice is a Visual Basic®™ for Applications (VBA) XL add-in called xNumbers (XN) (old version). XL is shipped with an Integrated Development Environment (IDE) for VBA. In case of GNM, I have decided to use its built-in Python plug-in and write a function suite myself that utilises the multi-precision capabilities of the FOSS PY module mpmath. I have not decided on an IDE for big python projects so far; the proof of concept function library provided here is so simple that I could write it using an editor capable of syntax highlighting. I wanted to figure out whether it is feasible to provide PY based cell formulae to GNM, which offer exactly the same functionality as their XN pendants in XL and thus do not require any change whatsoever in the XL file when it is opened with GNM in UBL. Of course, this GNM extension is totally independent of XL and works with the standard GNM file format as well.

The GNM PY plug-in under UBL requires two files to be located in a (subdirectory of a) specific folder, i.e. “/usr/lib/gnumeric/1.10.1/plugins” in case of the current GNM version 1.10.1x, in order to fully specify a user-defined function (UDF). Unfortunately, this folder belongs to the user “root” in UBL. However, very surprisingly there is no user “root” in a fresh UBL installation.

!!! DANGER – DO NOT TRY THIS AT HOME !!!
If this warning failed to prevent you from reading on, be warned: “root” can be medicine and poison as well … At least, if you are not a subject matter expert in computer security and secure internet router configuration, at first do all the necessary downloads and installations, then plug off your network cable or switch off your mobile network equipment, and only after that try to become the “root” user of UBL.

A search on the web immediately yielded the cure for the UBL-“root” problem: type >>sudo passwd<< in a terminal window, then twice enter an identical and very secure password, and voila, the user “root” is automatically created. Now exit the terminal, re-start and then login as “other”–>”root”. Now you can add a subfolder to the “plugins” directory, which in my case is called “fn-drgst” (GNM will automatically check all subfolders in the “plugins” folder). In this folder, at least two files have to be created: the module file that contains the executable PY code, which in my case is “drgsts_pyfuncs.py”, and an xml file having the mandatory name “plugin.xml” that describes the PY file. All the details of the naming conventions are explained in the section writing your own python xxx on the GNM website. Any PY UDF is referred to three times: firstly in the actual function definition line in PY file, secondly in a line belonging to a named function group in the PY file describing the kinds and numbers of parameters that all functions take, and thirdly in a line in the XML file.

For example, the multi-precision function which performs the addition of two real numbers in xN is named xADD and is invoked by typing “=xADD(C5;C6;30)”, e.g., into an XL cell. In this case, the result in XL will be the sum of the two cells C5 and C6 with an accuracy of 30 decimal digits. Of course, it cannot be an XL number but in fact is a string. Typically, C5 and C6 will therefore contain strings like >>’1.23456789012345678901234567891<<, too. My PY code defines exactly such a function and reads:

def xADD(myStr1,myStr2,myStr3):
# sum of the numbers contained in myStr1 and myStr2,
# i.e. myStr1 + myStr2, with an accuracy of myStr3 decimal places
from mpmath import mp
dps_int = int(myStr3);mp.dps = dps_int;mp.pretty = True
return str(mp.mpf(myStr1) + mp.mpf(myStr2))

Fortunately, it turned out that this function might be called using cell references instead of number strings as well. In the function-group statement, the corresponding line reads:

drgstspyfuncs_functions = {

'xADD': ('sss', 'myStr1,myStr2,myStr3', xADD),

}

It tells to the system that the function “xADD” takes three strings as parameters (‘sss’), how these parameters are named (‘myStr1,myStr2,myStr3’), and provides the internal (xADD) as well as external (‘xADD’) names of the function. In the XML file, the line

<plugin id="Gnumeric_drgstspyfuncs">

introduces the function group defined in the PY file, then the line

<attribute name="module_name" value="drgsts_pyfuncs" />

states the name of the PY module file that contains the aforementioned function group, and finally the line

<function name="xADD" />

tells to the system the external name of any function that is to appear in the GNM function GUI. Unfortunately, a bug in the GNM PY plug-in prevents the formal description of a function from appearing in the GUI (this is why I have omitted such a formal description like a PY “docstring” in the code). The number and kind of parameters are stated correctly in the GNM GUI, though.

As a proof of concept I have created 30 PY functions which correspond to those 28 functions in xN that are available in two or even three different kinds of accuracy, i.e. DP, quadruple precision, and multiple precision, and the two constants pi and e as well. You may download the code of the two defining *.py as well as *.xml files found as text in a PDF file and a spreadsheet in XL 97-2003 file format demonstrating all the functions and providing their results obtained with all my different spreadsheet environments. All software written by myself is licensed under the GPL v3 that is found in both files, too.

Mind the gap when using IEEE754 DP numbers

Posted in binary, Computer, Mathematics, Microsoft Excel, Number processing, Number systems, Programming Languages, Software, Spreadsheet, Uncategorized, VBA on September 11, 2011 by giorgiomcmlx

The gap between two adjacent IEEE754 DP numbers is more or less proportional to the magnitude of these numbers. The overwhelming majority of DP numbers do not exactly equal a power of 2. In this case, the gaps between such a DP number and its direct neighbours on both the smaller as well as the bigger sides have a magnitude of about 1.11E-16 times the smallest power of 2 that is larger than the number. For example, consider the DP number 1.5 (hex 3F F8 00 00 00 00 00 00). The smallest power of 2 which is bigger than 1.5 is 2^1=2. Thus, the biggest DP number being smaller than 1.5 is about 1.5 – 2⋅1.11E-16 = 1,499999999999999778 (exactly hex 3F F7 FF FF FF FF FF FF), and the smallest DP number being bigger than 1.5 is about 1.5 + 2⋅1.11E-16 = 1,500000000000000222 (exactly hex 3F F8 00 00 00 00 00 01).

At all exact powers of 2 the gap between adjacent DP numbers doubles in size. This means the question “does x equal 1.0” will yield the answer “true” for any real x in the range 1.0-5.55E-17 <= x < 1.0+1.11E-16, if rounding to the nearest is used in order to convert an arbitrary real number x into an IEEE754 DP number. Please note that in this special case, the interval limits are located asymmetrically around 1.0! That happens for all exact powers of 2. The size of the gap between two adjacent DP numbers in the interval from 1 to 2, i.e. 2.22E-16, is called machine epsilon.

The following figures show the discretisation effect occurring when using DP numbers. In particular, the region around 1 is explored using cell formulae in the spreadsheet application Microsoft®™ Excel®™ 2002 as well as using the VBA 6®™ built-in programming language. The FOSS add-in xNumbers is used to show how the results look like if the calculations are carried out with considerably higher numerical accuracy, i.e., 30 instead of the 17 decimal places in case of the IEEE754 DP numbers. The vertical scaling had to be shifted downwards by 1 in the figures, because XL cannot deal with axis limits of 1+/-2E-15, for example. The discretisation effect occurs in the first step of all shown calculations, i.e., when the relatively small quantity x is added to 1, and is thus not influenced by the final shift process.

machine epsilon in MS Excel and VBA

There is a remarkable difference between the results obtained using XL directly on the one hand and those obtained by invoking the programming language VBA 6 from XL on the other hand. MS seems to have implemented some “politically correct mathematics” into XL here in order to maximise the interval for which the answer to “does x equal 1” yields true… The results of the XL cell formulae are clearly not IEEE754 compliant.

details of machine epsilon in MS Excel and VBA

DP numbers are most dense around 0. Neglecting the special concept of “denormalised numbers”, the absolute value of the smallest numbers different from 0 is 2^(-1022), which is about 2.225E-308. The absolute value of the biggest DP number is 2^1023+2^1022+…+2^972+2^971 (hex
7F EF FF FF FF FF FF FF), which nearly equals 1.7976931348623157E+308. The adjacent DP number on its lower side, i.e. 1.7976931348623155E+308, is smaller by an amount of 2^971, which is about 2.00E+292. In the biggest dyad of the DP number system, the gap of 2.00E+292 between two adjacent DP numbers is huge in absolute terms, but still small in relative terms: 2.00E+292/1,79769E+308 = 1.11E-16, just as it was to be expected.

I have “translated” a routine from FORTRAN into a VBA array function that gives you all relevant information about the numerical characteristics of your computer. You may download the VBA6 code and use it if you follow the GPL v3 (included).

The size of the DP gap is usually small even in absolute terms for all numbers humans deal with in their every day or business life. However, it should be kept in mind that the DP gap around a number is roughly proportional to its magnitude. For example, in the range between 2^52 and 2^53, which equals about 4.5036E+15 to 9.0072E+15, only the integers (gap is 2^0=1) are exactly representable by DP numbers. In the next bigger dyad from 9.0072E+15 to 1.80144E+16, only the even integers (gap is 2^1=2) are exactly representable by DP numbers, and so on. Of course, this is only a consequence of limiting the mantissa of IEEE DP numbers 53 bits or roughly 17 significant decimal digits.

Even in science and engineering, numbers are very rarely known with an accuracy of more than 10 significant decimal digits. In fact, even most scientific measurements typically yield an accuracy of less than 5 significant decimal digits. This is why usually virtually nobody is bothered by neither the discreteness of the set of DP numbers in the continuum of the real numbers nor their limited accuracy of about 17 decimal digits. Consequently, people are quite surprised if even very simple calculations fail due to an effect called catastrophic cancellation. This will be the subject of a follow-up post.