Particularly in case of embedded systems, memory usage and calculation speed really matter even nowadays. Thus, if it can be proven in advance that only some steps of a long algorithm need to be carried out on a higher level of accuracy than the rest of the code, a method relying on the so-called Veltkamp-Dekker split (VDS) algorithm for floating-point numbers may be used in these very steps.

#### The Veltkamp – Dekker split of floating-point numbers

The idea behind the VDS algorithm is to divide the mantissa figures of a floating-point number numb (nearly) equally into those of the two numbers numb_hi and numb_lo of the same kind without overlap. Let mfigs denote the mathematical maximum of figures of the mantissa (or more exact, the significand), and let base denote the base of the number system in use. Then, the steps of the VDS algorithm can be described in pseudo code as follows:

1. expo = mfigs - Floor( mfigs / 2 )

2. fact = base ^ expo + 1

3. help = FLPTmult( fact ; numb )

4. numb_hi = FLPTsubt( help ; FLPTsubt( help ; numb ) )

5. numb_lo = FLPTsubt( numb ; numb_hi ) )

In this code, "Floor" means rounding down to the nearest integer, "FLPTmult ( a ; b )" means calculating the product of a and b to exactly the same precision that both a and be have, and "FLPTsubt ( a ; b )" means subtracting b from a on exactly the level of precision that both a and be have. Mixed-precision calculations will be described later.

In case of IEEE754-SP(DP) numbers, base equals 2, and mfigs takes a value of 24(53) inclusive the hidden leading bit, thus yielding a value of expo of 12(27). The first of the two splitted numbers, i.e. numb_hi, carries the leading 12(26) mantissa bits of the original number numb, and the second one, i.e. numb_lo, carries its trailing 12(27) bits. In other words, any reasonable value of expo determines the minimum amount of trailing empty bits of numb_hi.

numb_hi always inherits its sign from numb. Because the sign of numb_lo may be negative, the exponent of numb_hi may be larger than that of numb is. Consequently, the number of significant bits is not preserved either.

The following examples use SP numbers in order to demonstrate important characteristics to the VDS algorithm. The decimal representation of these numbers is calculated utilising an arbitrary precision library in order to enable displaying all significant decimal figures.

numbSP := sum over n from -21 to 2 of 2^n

= +7.999999523162841796875

numbSP has 24 significant bits by construction. The VDS algorithm yields the following results when applied to this number (both SP hex and bin representations are in little endian order):

numbSP = 40 FF FF FF

= O // IOOOOOOI // IIIIIIII-IIIIIIII-IIIIIIII

= + 7.999999523162841796875

```
```numbSP_hi = 41 00 00 00

= O // IOOOOOIO // IOOOOOOO-OOOOOOOO-OOOOOOOO

= + 8.0

`numbSP_lo = B5 00 00 00 `

= I // OIIOIOIO // IOOOOOOO-OOOOOOOO-OOOOOOOO

= - 0.000000476837158203125

Contrary to the hex notation, the binary notation given here does not directly correspond to the byte values in memory but separates the sign bit from the exponent bits and shows the hidden leading mantissa bit as well. Double slashes were inserted between all the three parts of any SP number in order to increase readability. As can be easily seen, numbSP has 24 significant bits, whereas both numbSP_hi and numbSP_lo have only 1.

The number Pi now serves as an example in case the decimal is not exactly representable by a FP number:

numb := Pi ≈ +3.14159265358979323846…

numbSP = 40 49 0F DB

= O // IOOOOOOO // IIOOIOOI-OOOOIIII-IIOIIOII

= + 3.1415927410125732421875

```
```numbSP_hi = 40 49 10 00

= O // IOOOOOOO // IIOOIOOI-OOOIOOOO-OOOOOOOO

= + 3.1416015625

`numbSP_lo = B7 14 00 00 `

= I // OIIOIIIO // IOOIOIOO-OOOOOOOO-OOOOOOOO

= - 0.0000088214874267578125

From comparing the bit pattern of numbSP to that of numbSP_hi, it can be seen that using 12 as value of expo removes exactly the 12 trailing bits of numbSP. A closer look reveals that these bits are not simply cleared but that the value of numbSP_hi is the value of numbSP rounded to that smaller level of binary accuracy. Like in the previous example, the sum of the significant bits in both numbSP_hi and numbSP_lo is 18 and thus much lower than 24, which is the number of significant bits of numbSP.

The main purpose of such a VDS split is to enable calculations on a higher level of accuracy than provided by the data format actually used. Using any of the numerical data formats provided by a programming language, all elementary mathematical operations take two numbers as input and yield one number as output. The trick harnessed in order to achieve a higher level of accuracy is to apply the VDS method to any input number before it is processed. Thus, all the elementary mathematical operations now take four numbers as input and yield a pair of two numbers as output. This additive pair of numbers exactly represents the algebraic result of the elementary operation. The high part of these two output numbers is the closest approximation to the exact result that is possible with only one FP number. The low part gives the correction that needs to be added to the high part in order to get the exact result.

"Exact" here means correct to a higher level of floating-point accuracy; of course, it is still impossible to exactly represent Pi at any level of FP accuracy, for example. The routines for adding and multiplying two numbers that were previously subject to the VDS method will be described in a following post.

#### References

Dekker’s paper is written in English and can be accessed online. You may want to click on the Union Jack to switch the CMS output to the English language.

This archive is one of Germany’s scientific treasuries! You may access many of the original works of Carl Friedrich Gauß, Leonhard Euler, David Hilbert, and so on. Of course, many of the documents will be written in German or even in Latin… In case of hand-written medieval papers, even the notation of formulae often is quite different from that used nowadays…