## How to calculate 1-x·x in floating-point

Some routines that calculate elliptic integrals show poor overall accuracy in case the elliptic modulus gets close to one, even if they have no pole there. It turned out that this effect is not caused by a numerical instability of the implemented core algorithm, but by a numerically unstable transformation used for calculating the complementary elliptic modulus right at the beginning of these routines. Let x denote the elliptic modulus here, then the complementary modulus y is given by y = sqrt( 1 – x · x ), wherein sqrt denotes the mathematical square-root function. Because IEEE recommendations guarantee an accurate sqrt function, the reason for the accuracy problems had to be found in how the mathematical expression 1 – x·x had been implemented in floating-point arithmetic.

In a first attempt, I tried to use the mathematically equivalent form of y1_DP := 1 – x·x that results from applying the third binomial theorem, which reads y2_DP := ( 1_DP – x_DP ) · ( 1_DP + x_DP ). It turned out that this version is much more accurate than the naive implementation in the regime x_DP ≥ 1/sqrt(2). Quite recently, I happened to come across a different implementation in an old Fortran library that had been designed in the 1980ies and seemed rather strange to me at a first glance, i.e., y3_DP := ( 2_DP * ( 1_DP – x_DP ) ) – ( ( 1_DP – x_DP ) * ( 1_DP – x_DP ) ). This implementation turned out to yield the smallest errors of all three implementations mentioned so far in case abs(x_DP) ≥ 0.5 . Moreover, it yields accurate results for more than 85% of the (tested) values and limits the rel. abs. error for all (tested) values to a maximum of 1 ULP.

This implementation is most easily derived from y2_DP: because for any 0.5 ≤ x_DP ≤ 1 the subtraction 1 – x_DP is accurate, the first multiplicand of y2_DP is always accurate for x_DP ≥ 0.5; so the task is to express 1 + x_DP as a function of 1 – x_DP. Obviously, we have 1 + x = 2 – ( 1 – x ), which yields y4_DP := ( 1 – x_DP ) *( 2 – ( 1 – x_DP ) ). Finally, y3_DP is obtained by applying the distributive law of algebra, which does not hold for floating-point arithmetic and can thus lead to different results (you might want to take the burden to read through Goldberg’s paper).

The following two graphs show the abs. rel. errors obtained for y1_DP, y2_DP, and y3_DP accumulated over samples of 1025 pseudo-random numbers x_DP each. If the accumulated error is found to be larger than 1025, the maximum error for a single x_DP is definitely larger than 1 ULP. On the other hand, if the individual maximum error is known to be limited by 1 ULP, hundred times one minus the value of the accumulated error divided by 1025 gives the percentage of accurate values per sample.

A follow-up post will address the question of how the usage of different floating-point units, e.g. x87 and SSE, influences these results.