Archive for the Maxima Category

To SSE, or not to SSE, that is the question

Posted in binary, decimal, gnumeric, Maxima, Microsoft Excel, multi-precision calculations, Number processing, Number systems, Software tools, Spreadsheet, VBA, xNumbers on February 25, 2014 by giorgiomcmlx

Some time ago, I discovered that in a thirty year old floating-point (FP) library the value of the mathematical expression 1 – x2 was determined by calculating 2·(1-x)-(1-x)2 for 0.5 ≤ x ≤ 1. At a first glance, this method appeared rather peculiar to me, but then I realised that the Lemma of Sterbenz (see, e.g. page 45 of this slide collection on FP calculations) guarantees that 1-x is calculated in FP without any round-off error in the specified range. The multiplication by 2 does not introduce errors as well. Yet, there are still two sources of errors: firstly, in case (1-x) has more than 26 significant bits, (1-x)2 is not exactly representable in FP, and secondly, finally taking the difference might cause a round-off error as well.

In my tests though, this old formula outperformed all other versions on average, and so I shared it with developer friends. Surprisingly, some of them responded with a statement like "The results produced by this formula do not differ from those obtained using (1-x)·(1+x)." In the end, the reason for this paradox could be identified: all my tests were executed on SSE registers, calculations on which strictly adhere to the double-precision floating-point format, whereas the calculations not showing any improvement had harnessed the intermediate extended double-precision format of the x87 stack. Modern x87-compatible FPUs provide both of these two very different kinds of intermediate FP number storage.

The SSE have been invented in favour of fast parallel data processing, whereas the x87 double-extended stack provides both 11 additional mantissa bits and 4 additional exponent bits to intermediate calculations in order to try to maintain an overall accuracy of 53 bits and to reduce the probability of intermediate under-/over-flows as well. The decision whether a piece of software uses the x87 or the SSE instruction sets has to be made up at compile time, because the generated byte code is different. Contrary, both the rounding-mode and the exception-handling properties may be influenced at runtime by changing the so called floating-point control word (FPUCW). I could not find an article on Wikipedia, so please see e.g. this collection of or this introduction to the FPUCW world. I also have come across these imho nice summaries that are written from C-programmers points of view: The pitfalls of verifying floating-point computations as well as SciLab is not naive. Unfortunately, values of the FPUCW having an identical meaning differ across platforms. For example, a value of the MS-FPUCW of hex 8001F corresponds to a GNU-GCC-FPUCW value of hex 37F …  Fortunately, the ability to change the rounding mode during runtime enables programmers to harness interval arithmetic in order to compute reliable FP error estimates of their code for both the x87 and SSE instructions sets. The rounding mode can be chosen from the set UP, DOWN, CHOP, and TO NEAREST – TIES TO EVEN, the latter usually being the default setting. Only if the x87 instruction has been selected at compile time, the precision of the calculations may be changed during runtime as well.

The 11 additional significand bits of the x87 extDP format correspond to roughly 3 decimal figures, and thus are a valuable tool in order to minimise the effects of cancellation in intermediate calculations. The 4 additional exponent bits allow for an intermediate product of up to 16 huge (tiny) doubles without causing an over-(under-)flow. Yet, harnessing the extDP format bears one dirty little secret: when the result of an intermediate calculation is transferred from the x87 stack firstly into an extDP x87 register and finally into the DP main memory, the intermediate value is subject to a double-rounding procedure. In combination with the standard rounding mode "TO NEAREST – TIES TO EVEN" this can lead to a slightly wrong result in two situations: an intermediate calculation rounds up/down to an extDP value representing an exact DP tie that is again rounded up/down to a DP value. The following tables give an example:

decimal number    DP hex    comment
+18446744073709551616.0 43F0000000000000 exact DP Num1
+6143.0 40B7FF0000000000 exact DP Num2
+18446744073709557759.0 Num1 + Num2,
not representable in DP
+18446744073709555712.0 43F0000000000001 DP_lo: nearest DP
+18446744073709559808.0 43F0000000000002 DP_hi: next larger DP
-2047.0 error of DP_lo
+2049.0 error of DP_hi

When adding Num1 to Num2 on the x87 extDP stack, the hex representation of the final DP sum in memory will be 43F0000000000002 instead of the closest DP 43F0000000000001. This can be understood by analysing the binary representation of the numbers involved. In order to clearly show the rounding effects, some separators are used: a "V" indicates the border between any two DP half-bytes, each of which corresponds to a single character in the hex representation, an "X" shows where the DP bits end and the extDP bits begin, and finally a "Y" marks the end of the extDP bit sequence.

Num1 + Num2
same with separators
1st step: round (up) to extDP
2nd step: round DP tie (up) to even

In this example, the addition creates a bit sequence, the mantissa of which exceeds the 64 bits of the extDP format. Thus, in a first step, the result is rounded to 64 bits. These extDP bits happen to represent a so-called tie (a number exactly half-way between the two adjacent DP numbers) of the DP format. In this example, the tie-breaking rule TO EVEN finally causes another rounding up procedure, so that the value stored in memory does not represent the DP number closest to the mathematical result of the summation, but the next larger one. Yet, it is still one of the two DP numbers bracketing the true value.

Only two different bit sequences among the 2^13 possible ones lead to this kind of error:
leadingbitsIXOIIIIIIIIIIYItrailingbits ++
leadingbitsOXIOOOOOOOOOOYOtrailingbits —

so that the odds ratio of this error should be expected to be close to 1/2^12 = 1/4096 ≈ 2.44E-4 . Of course, if the bit sequence of an intermediate result does not exceed the 64 bits of the extDP format, there will be only a single rounding step yielding a correct result. When analysing the results of 5.75E5 renormalisation steps of double-double numbers in VBA7, the sum of which is determined to exceed the 64 bit extDP limit, and that were introduced here at the end of this post, I found 1480 errors and thus an odds ratio of about 2.57E-4. Due to the relatively large sample size, I guess the deviation from the theoretical value is caused by non-randomness of the bit sequences involved.

Double-rounding is unavoidable when using the extDP registers of the x87 FPU, but it only slightly increases the representation error of about 0.5 ULPs by at maximum 0.05&percent;. Moreover, this only happens with an odds ratio of about 1/4096. Thus, this disadvantage of rarely occurring false double rounding is usually more than counterbalanced by the gain of 11 bits of intermediate precision in every calculation. However, there is a special kind of multi-precision software that cannot be run on the x87 stack due to the double rounding, i.e. all the double-double as well as quad-double software that use chained DP variables in order to store multi-precision values (see, e.g. this implementation). The double rounding on the x87 stack can lead to violations of the basic assumption of this kind of software, i.e., that the leading DP value always represents the DP value closest to the multi-precision value stored in a chain of doubles. Therefore, arbitrary precision tools being determined to use the x87 stack, for example Xnumbers, need to harness completely different algorithms for internal number storage. The numerical trick using a double-double representation of π/2 introduced in the aforementioned post works reliably, because the firstly calculated intermediate difference does not involve more than 64 significant bits.

The FP instruction set actually used by third-party numerical software is often unknown and needs to be determined by using test cases. In order to avoid problems with the digits chopping feature of most spreadsheet software, examples need to be created that use exact DP values corresponding to less than 14 significant decimal digits. The previously given example can be reformulated as sum of +1.8446744E+19 ≡ 43EFFFFFFDDAD230 and +73709557759 ≡ 4231296E97FF0000. In order to check the DP result for correctness without any debugging capabilities like those provided by Xnumbers for MS XL, e.g., both DP numbers that were summed up before now need to be successively subtracted from the sum, the bigger one at first, and the smaller one afterwards. It is important to assure that all intermediate differences are rounded to DP, which can most easily be achieved by assigning them to different DP variables, or different cells in a spreadsheet. Contrary to the first addition, the two final subtractions do not introduce any FP error, because they are subject to the lemma of Sterbenz. A final result of -2047 corresponds to calculations that have not been executed on the x87, whereas a value of +2049 indicates the opposite. Here are my PY3- and CAS-Maxima scripts:

Python 3     CAS-Maxima
num1 = +1.8446744E+19 +1.8446744E+19;
num2 = +73709557759.0 +73709557759.0;
sum12 = (num1+num2) %o1+%o2;
sum12sub1 = (sum12-num1) %-%o1;
sum12sub1sub2=(sum12sub1-num2) %-%o2;

The following table summarises the results I found in some spreadsheet and scripting software running on MS WIN7x64prof. On Linux, all SW listed in the table that is available on that OS as well returns -2047.

spreadsheet SW   result        scripting SW   result
GNUmeric 1.12.9 +2049 PY2.7.3×64 -2047
GScalc 11.4.2 -2047 PY3.2.3×64 -2047
KSspread 9.1.0 +2049 R2.15.2×64 -2047
LOcalc -2047 R3.0.0×32 +2049
LT SYM 3.0.1 -2047 R3.0.0×64 -2047
MS XL14/VBA7 +2049 FreeMat 4.2 -2047
SM PM 2012 -2047 Maxima x32 +2049

As you can see from the table, the situation is completely messed up. It seems to me as if on the Win32 platform the x87 with its extDP format was used by default, whereas on the Win64 platform the SSE part of the FPU was involved. Of course, any software can override the default setting, and MS XL64 as well as VBA7 indeed do so. Generally speaking, specifying the FPUCW as well as the kind of FPU during compilation is a very good idea in case of numerical libraries that were developed and optimised for only one of the two scenarios. In case these two setting are omitted and the unknown defaults are used at runtime, small deviations are to be expected only in case the software was designed not to harness the extDP precision of the x87 stack. Otherwise, SSE lacking the additional 11 bits of precision will very likely cause things like polynomial approximations to special functions to fail at runtime. In particular, source code of libraries originally designed for the x87 must not be re-compiled for SSE without closely inspecting the results. Libraries that were built before the advent of the x87 can be a very valuable source of inspiration for changes in the code that are likely to become necessary.


Catastrophic cancellation in action

Posted in Computer, Fortran, Maxima, Microsoft Excel, mpmath, multi-precision calculations, Programming Languages, Python, Software, Software tools, Spreadsheet, VBA, xNumbers on July 20, 2012 by giorgiomcmlx


In order to obtain the reference values needed for checking results of numerical software that use data types like double precision (DP), I like to harness three multiple-precision (MP) tools, i.e., xNumbers, Maxima, and PY-mpmath. If all the MP results agree to the last decimal digit, I take them for granted because it seems very unlikely to me that the mentioned tools are mutual clones of each other. Moreover, even effects caused by routines of an OS can be identified because mpmath as well as Maxima are supported on Linux and Windows as well. When saving the MP results to CSV-formatted files, all the MP numbers need to be preceded by a guard character in order to prevent the spreadsheet software, which will later be used to carry out the comparison, from stealing most of the decimal digits. Unfortunately, even parsers of high-level programming languages fail astonishingly often when converting numerical strings into the values of DP variables. Therefore, I usually save not only DP results, but their bytes in hex notation as well. According to my experience, these hex values provide the only reliable kind of how to represent data meant for exchange between any two different computer environments. It is particularly useful in case of floating-point numbers being much smaller than one, because in decimal notation a really huge amount of decimal figures may be necessary in order to uniquely denote a specific DP value.

The tangent function near π/2

The tangent function has a so-called pole () at π/2, i.e., with x approaching π/2 from below, tan(x) tends to infinity, and with x approaching π/2 from above, tan(x) tends to minus infinity. This behaviour contradicts the maximum-smoothness paradigm frequently used when designing an algorithm for a built-in function. Near π/2, correctly approximating the tangent function can only be achieved by using a model function that has a pole at π/2, too. Though it is not uncommon for built-in functions to use several different polynomial representations over the range they are defined on, the usage of model functions having poles is surprisingly rare.
For example, many people have realised that the tangent functions of MS-Excel&register;™ 2002 and VBA6 as well lack the change of modelling paradigm in their model functions. Consequently, over some interval around the mathematically exact value of π/2, they both yield very in-accurate values for tan(x). In the next figure, the effect is demonstrated for the 49 DP values bracketing the exact value of π/2.

The maximum error is found outside the range of this figure; it occurs at pio2_DP, by which the DP value closest to π/2 is denoted here, and which is larger than π/2 by about 1.22E-17. In case of XL2002, the maximum relative error is found to be about -4.1E-4, in case of VBA6, it is 3.3E-5. Obviously, different model functions are being used for the tangent function in XL2002 and VBA6, resp. The relative errors of the tangent function in XL2002 are about 12.4 times bigger than those in VBA6 and of opposite sign! This means XL2002 can hardly be used to try out an algorithm in a spreadsheet before encoding it in VBA6… interesting…

Some people obviously know of the trigonometric identity tan(x)=1/tan(π/2-x), and have tried to use it in order to obtain a more reliable value of tan(x) in case x is very close to π/2. Applying this trigonometric identity naively to a value of x near π/2 in DP even leads to relative errors that are at least about 5 orders of magnitude bigger than those found in case of the flawed built-in tangent functions are. Thus, defining w_DP = pio2_DP – x_DP and then using 1/tan(w_DP) as a substitute for tan(x) is a very bad idea, as can be seen from the next figure.

From comparisons to results of MP software, it is known that around zero the tangent functions in XL2002 and VBA6 as well yield identical values that moreover are correct to DP precision. This means there must be some odd effect that renders the difference pio2_DP – x meaningless in case x is close to pio2_DP. This effect is easily identified: it is called catastrophic cancellation. π is an irrational number that definitely cannot be represented exactly by the sparse set that all the rational DP values form within the dense set of real values. Mathematicians would say something like “The set comprising all the DP values is of measure zero within the set of real numbers”, by which they mean that the probability any given real number can be exactly represented by a DP value is almost always zero.
Imho, this points out one of FORTRAN’s really big crimes: calling a data type “real” that in fact can only represent rational numbers; moreover, it can represent even only a tiny subset of all the rational numbers. Nevertheless, the set of DP numbers has been cleverly designed in order to cover a range of numerical values that should be sufficient for almost all numerical efforts of humans. Unfortunately, this claim does not hold true for the precision of the DP numbers.

The DP value nearest to π/2, pio2_DP, is found to be 3F F9 21 FB 54 44 2D 18 in big endian DP hex notation, which corresponds to about +1.570796326794896558 in decimal notation. Theoretically, exactly identical problems should occur around pi_DP, which is represented by 40 09 21 FB 54 44 2D 18 or about +3.141592653589793116, resp., because not only the error will double from 6.12E-17 to 1.22E-16, but the distance between adjacent DP number will do that too, as 1 < π/2 < 2 and 2 < π < 4. The next figure provides the position of π in the three floating-point formats SP, DP, and QP. All the differences are given in ULPs, an abbreviation that means “unit in the last place” and thus is a relative measure. The corresponding absolute measures are given in the figure. There is one important detail: in SP, which was the dominant floating point format until about 1985, the value nearest to π was larger than the mathematical value, whereas in DP as well as in QP it is smaller than π. This caused much trouble when code was ported from SP to DP about thirty years ago… and fortunately will cause much less trouble when code is ported from DP to QP in the near future.

It is now clear why the cancellation that occurs when π/2-x is calculated as pio2_DP-x_DP is to be called catastrophic: the closer this difference gets to zero, the more prominent the consequences of the in-accuracy in pio2_DP will be. Fortunately, the cure is easy enough: simply provide all the bits lost in cancellation in order to keep the result accurate to the full precision of 53 bits. This can be achieved by defining the DP variable v_DP, which is meant to represent the DP values of π/2-x, as follows: v_DP = [(pio2_DP – x_DP) + pio2_DPcorr]. Therein, pio2_DPcorr contains the trailing 53 bits that need to be added to pio2_DP in order to yield an approximation to π/2 with a precision of 106 bits. Any bits that are lost due to cancellation in the difference (pio2_DP – x_DP) will be filled in from pio2_DPcorr, thus maintaining a precision of 53 bits under all circumstances. Consequently, 1/tan(v) does not show any systematic errors and should be used to replace tan(x) in the region where it is flawed, as can be seen from the next figures.

According to my experience, the tangent function in XL2010 is still flawed around π/2, but the one in VBA7 is not. Keep in mind that mathematically unnecessary brackets can really matter in Excel, i.e., in case x_DP is very close to pio2_DP, entering something corresponding to “=(pio2_DP – x_DP)” in a cell will produce exact results, whereas “= pio2_DP – x_DP” will yield a large interval filled with zeros around x_DP = pio2_DP.

In general, it is hard to predict whether a software implements a tangent function that is precise around π/2 or not. In order to enable you to check for that, here’s to you the two DP values bracketing π2 together with the correct DP values of the tangent function at these values:

x_lo_DP : 3F F9 21 FB 54 44 2D 18 // +1.570796326794896558
x_hi_DP : 3F F9 21 FB 54 44 2D 19 // +1.570796326794896780
tan(x_lo_DP)_DP : 43 4D 02 96 7C 31 CD B5 // +16331239353195370.0
tan(x_hi_DP)_DP : C3 36 17 A1 54 94 76 7A // -6218431163823738.0

In case you would like to implement your own tangent function using the helper variable v as defined above, you need to subtract your x from the value of x_lo_DP given above; the DP that needs to be added to that result in order to maintain the full precision of 53 bits is:

pio2corr : 3C 91 A6 26 33 14 5C 07 // +6.123233995736766036E-17

The level of precision of such pairs of DP values is often referred to by the term “double-double” (DD). In this context, replacing a single line of code in DP precision by its equivalent in DD made the algorithm work properly. As long as the IEEE QP floating-point data type is not widely available in numerical software, using libraries written in DD may solve many problems that are caused by catastrophic cancellation in 53 bits. Depending on whether the programming language used to implement such a library makes use of the high-precision internal registers of modern CPUs or not, some care must be taken in order to make them work properly. Imho, VBA does not use them, but modern versions of FORTRAN, C, and Python etc. do. In this case, the line v = (pio2_DP – x_DP) + pio2_DP_corr would need to be modified in order to assure that the difference (pio2_DP – x_DP) is rounded down to DP precision before it is used again in the addition part. Thus, it has to be pulled out of the CPU and written to memory, from where it is to be re-read then.

Obviously, the procedure of partial enhancement of precision as described in this post has a variety of applications. One of the most important examples of where it is necessary to be implemented is the so-called argument reduction that is frequently applied in all subjects that deal with periodic processes.