Archive for June, 2014

Oops, XL did it again

Posted in Computer, Microsoft Excel, multi-precision calculations, Software, Software tools, Spreadsheet, xNumbers on June 22, 2014 by giorgiomcmlx

I really do love software that adheres to the "Law of Least Astonishment (LoLA)" (please see §4.1). Unfortunately, XL definitely does not belong to this group. Quite recently, I came across a violation of the LoLA by XL that is caused by the inconsistent manner in which functions evaluate numerical cells. MS have stated relatively clearly that the binary double-precision floating-point format (DP) is used internally in order to store numerical cell values. On the other hand, in the view component of XL's model-view-controller (MVC) architecture, i.e. in the cell as displayed, a decimal string comprising at most 15 digits is used to present the numerical value of a cell to humans. To my great astonishment, it has turned out that different functions belonging to the same category in the official list of functions of XL2013 may operate on these two very different representations of numerical values. Of course, this can have severe consequences, as a precision of 15 decimal digits roughly corresponds to a precision of only 50 bits, whereas the 53 bits of precision of the IEEE754 DP format roughly correspond to 16 decimal digits.
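
For reference, the conversion factor between the two scales is log2(10) ≈ 3.32: 15 decimal digits × 3.32 ≈ 49.8 bits, whereas 53 bits / 3.32 ≈ 15.95 decimal digits.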

I suppose virtually nobody regularly working with numbers will expect a cell formula like "= (D5 - FLOOR.PRECISE(D5;1))" to return a negative value in case the value of D5 is positive. In the official description of this function, MS state: "Returns a number that is rounded down to the nearest integer or to the nearest multiple of significance. Regardless of the sign of the number, the number is rounded down." Moreover, the function FLOOR.PRECISE is listed in the "math and trigonometry" section of the aforementioned list, so that IMHO people will expect this function to be as accurate as, for example, the functions SQRT and SIN are. Yet, look what happens if the value of D5 is close to but still smaller than a positive integer. I'll demonstrate this now for the integer 12 (twelve), which in DP big-endian byte order equals 40 28 00 00 00 00 00 00.

The hex representation of a numerical cell value as well as its equivalent exact decimal string were obtained by harnessing the commands "= Dbl2Hex(CellRef)" and "= DCStr(CellRef;100)", respectively, which require an appropriate version of the FOSS add-in xNumbers to be installed in XL.

The largest DP value smaller than 12 for which the XL2013x64 functions FLOOR, FLOOR.MATH, FLOOR.PRECISE, INT, ROUNDDOWN, and TRUNC return the mathematically correct result of 11 (eleven) has the hex notation 40 27 FF FF FF FF FF E3. For all the other 28 DP values, from 40 27 FF FF FF FF FF E4 up to 40 27 FF FF FF FF FF FF, the wrong value of 12 (twelve) is returned by all of these functions. Exactly for these 28 values, the results of the formula "= (CellRef - FLOOR.PRECISE(CellRef;1))" turn out to be negative: about -4.97E-14 is returned by XL for the input corresponding to 40 27 FF FF FF FF FF E4, and about -1.776E-15 for 40 27 FF FF FF FF FF FF. If the mathematically unnecessary outer "magic parentheses" are omitted from the formula, XL tries to hide the dirty little secrets of its floating-point core from you by artificially zeroing the results for all values from 40 27 FF FF FF FF FF F9 to 40 27 FF FF FF FF FF FF. IMHO, both of these undocumented effects clearly exclude XL from the set of numerical software that might be considered for any serious computation (VBA is different!).
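
To see both effects side by side, let D5 hold the DP value 40 27 FF FF FF FF FF FF, i.e. the value just one ULP below 12:
"= (D5 - FLOOR.PRECISE(D5;1))" returns about -1.776E-15, whereas
"= D5 - FLOOR.PRECISE(D5;1)" returns an artificially zeroed result of exactly 0.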

The observed results can most easily be explained by assuming that all these functions do not use the binary DP representation of the cell content in order to calculate their results, but instead operate on the numerical string comprising 15 decimal digits in the MVC. At least, we have:
40 27 FF FF FF FF FF E3 hex =
XL: 1.19999999999999E+01
xN: 1.199999999999994848565…E+01
40 27 FF FF FF FF FF E4 hex =
XL: 1.20000000000000E+01
xN: 1.19999999999999502620…E+01
From the given xN values it can be seen that XL2013x64 correctly rounds the binary DP values to strings comprising 15 decimal digits, and that the functions in question obviously use this decimal string representation in order to calculate their results. Exactly the same kind of inaccurate results are delivered by the functions that are meant to round up, i.e. CEILING, CEILING.MATH, CEILING.PRECISE (and even ISO.CEILING), and ROUNDUP. In this example, all the 28 results corresponding to input values from 40 28 00 00 00 00 00 01 to 40 28 00 00 00 00 00 1C are flawed. Moreover, the functions ROUND and MROUND are also subject to this kind of flaw if their arguments are close to values at which their results should change.

On the other hand, the functions EVEN, MOD, ODD, and QUOTIENT deliver results that accurately account for the least bit of their arguments. Thus, in order to define an accurate version of the mathematical floor function, you might want to use "= QUOTIENT(CellRef;1) + IF(MOD(CellRef;1)=0;0;(SIGN(CellRef)-1)/2)", whereas for an accurate version of the mathematical ceiling function, you might want to use "= QUOTIENT(CellRef;1) + 1 + IF(MOD(CellRef;1)=0;-1;(SIGN(CellRef)-1)/2)". Or you break free and look for a programming language that is numerically much more comprehensive than VBA, but that can still be connected to XL relatively easily. At least, that is what I shall do from now on.
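
In VBA, by contrast, the function Int already operates on the full binary DP value (that is what the "VBA is different!" remark above refers to). So, as a minimal sketch, a pair of accurate UDFs could read as follows; the names DblFloor and DblCeiling are mine:

Function DblFloor(ByVal x As Double) As Double
    DblFloor = Int(x)      ' VBA's Int rounds toward minus infinity on the binary DP value
End Function

Function DblCeiling(ByVal x As Double) As Double
    DblCeiling = -Int(-x)  ' ceiling(x) = -floor(-x)
End Function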

The decimal data sub-type in XL-VBA

Posted in Computer, Microsoft Excel, Programming Languages, Software, Spreadsheet, VBA on June 9, 2014 by giorgiomcmlx

In VBA, the most straightforward way to add some extra precision to calculations with doubles is to try to harness the sub-type decimal of the data type variant. According to the VBA language specification provided by MS quite recently, a variable of sub-type decimal comprises a numerator of 96 bits (or 12 octets) of precision that encodes an unsigned integer in binary format, and a denominator comprising 8 bits (or 1 octet) that encodes the power of ten by which the numerator has to be divided in order to yield the mathematical value represented by the variable. MS further state that there is also a sign bit, and that a decimal is "A rational number represented in a 14 byte data structure". I started to doubt this statement when reading this information on the decimal sub-type (in German). There, a decimal is said to be a data structure of 16 bytes, and moreover, a rather peculiar order of the 12 octets of the numerator is mentioned. Consequently, I started to check the details by myself.

For that, in my VBA7.1 I declared a call to the kernel function RtlMoveMemory that can be accessed via the system library kernel32.dll as follows:
Private Declare PtrSafe Sub MoveMem Lib "kernel32" Alias "RtlMoveMemory" (PointerToMemory As Any, PointerToBuffer As Any, ByVal AmountOfOctets As Long)

In order to call this function properly from within a VBA UDF, an array of Bytes of the desired length has to be declared. After the call, it contains an exact copy of the block of computer memory that starts at address PointerToMemory and comprises AmountOfOctets octets. For that, the VBA UDF called HexDump, which is of return type String, is defined like this:
Function HexDump$(ByVal PointerToBufferStart As LongLong, ByVal myOctets As Long)
In HexDump, the amount of octets to be returned is rounded up to the next equal or larger multiple of 16 and assigned to the variable NumOctets. Then an array of type Byte and of length NumOctets is declared:
ReDim BytArr(1 To NumOctets) As Byte
When calling MoveMem, the pointer to the start of BytArr in memory is obtained implicitly by simply submitting the reference to its first element:
Call MoveMem(BytArr(1), ByVal PointerToBufferStart, ByVal NumOctets)
The rest of HexDump is only concerned with creating, from all the elements of BytArr, a proper string in a form that is easily perceivable by humans.
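
Putting the pieces together, a minimal sketch of the complete UDF might read as follows; the loop body and the rounding expression are my reconstruction, not necessarily the original code:

Function HexDump$(ByVal PointerToBufferStart As LongLong, ByVal myOctets As Long)
    Dim NumOctets As Long, i As Long, s As String
    NumOctets = ((myOctets + 15) \ 16) * 16   ' round up to the next multiple of 16
    ReDim BytArr(1 To NumOctets) As Byte
    ' copy NumOctets octets of raw memory into the byte array
    Call MoveMem(BytArr(1), ByVal PointerToBufferStart, ByVal NumOctets)
    For i = 1 To NumOctets                    ' two hex digits per octet, ...
        s = s & Right$("0" & Hex$(BytArr(i)), 2)
        If i Mod 16 = 0 Then s = s & vbCrLf Else s = s & " "   ' ... 16 octets per line
    Next i
    HexDump$ = s
End Function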

I then created some decimals according to the following scheme:
Function some_decimals$()

Dim …, d18, d19, d20, d21, d22
d18 = CDec("1234567890123456789012345")
d19 = CDec("123456789012345678901234567")
d20 = CDec("12345678901234567890123456789")
d21 = CDec("1.2345678901234567890123456789")
d22 = CDec("-1.2345678901234567890123456789")
some_decimals = HexDump(VarPtr(d22), 528&)

Quite surprisingly, for me at least, the position of the variables in memory is found to be in reversed order of declaration, which means the pointer to the variable declared last has to be submitted to HexDump in order to get back a block of memory that contains all the declared variables. The relevant part of the string returned by HexDump, which encodes the values of all octets in hexadecimal notation, reads as follows:

0E 00 1C 80 32 1B E4 27 15 81 39 6E B1 C9 BE 46
00 00 00 00 00 00 00 00 0E 00 1C 00 32 1B E4 27
15 81 39 6E B1 C9 BE 46 00 00 00 00 00 00 00 00
0E 00 00 00 32 1B E4 27 15 81 39 6E B1 C9 BE 46
00 00 00 00 00 00 00 00 0E 00 00 00 FD 1E 66 00
87 4B 9F 2C A8 F2 58 F1 00 00 00 00 00 00 00 00
0E 00 00 00 6E 05 01 00 79 DF E2 3D 44 A6 36 0F

As I shall show in detail, the first 16 octets indeed encode the value of d22 as a decimal correctly. Yet, the effective amount of memory claimed by each of the consecutively declared decimals is neither 14 nor 16 octets, but equals 24 octets, at least on my Win7x64 XL15x64 PC. Consequently, in order to catch the entire block of 22 decimals, an amount of at least 21*24+16 = 520 octets is needed, which is then internally rounded up to 33*16 = 528 octets.
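
In case you want to verify the stride of 24 octets on your own machine, a line like the following, added to some_decimals, should print 24 (remember that d22, being declared last, sits at the lowest address):

Debug.Print VarPtr(d21) - VarPtr(d22)   ' 24 octets per decimal on my Win7x64 XL15x64 PC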

In the German resource mentioned above, it is claimed that the 16 octets of a decimal start with an identifier octet having a value of 14 dec = 0E hex, and that the following octet remains unused. The 3rd octet is said to encode the power of 10 of the denominator (here: 28 dec = 1C hex or 0 dec = 0 hex), and the 4th octet is claimed to encode a positive sign by a value of 0, and a negative one by 128 dec = 80 hex. As can easily be seen from the list given above, all of these statements hold true. Therefore, the trailing 12 octets are needed to encode the 96 bits of the numerator. In order to understand the values returned by HexDump, I harnessed xNumbers to convert "12345678901234567890123456789" into a binary string. This string is found to comprise 94 bits, so it is presented here on three lines, each of which gives 4 octets, starting with the most significant one. We have:

0010-0111 1110-0100 0001-1011 0011-0010
2-7 E-4 1-B 3-2
0100-0110 1011-1110 1100-1001 1011-0001
4-6 B-E C-9 B-1
0110-1110 0011-1001 1000-0001 0001-0101
6-E 3-9 8-1 1-5

so that the 94 bits in big endian byte order and in hexadecimal notation read:
27 E4 1B 32 46 BE C9 B1 6E 39 81 15
which in little endian byte order reads:
15 81 39 6E B1 C9 BE 46 32 1B E4 27
When this sequence is compared to that returned by HexDump, the peculiar byte order by which variable values of sub-type decimal store the numerator in computer memory becomes obvious:
32 1B E4 27 15 81 39 6E B1 C9 BE 46
The first block in memory comprises the 4 most significant consecutive octets (32 1B E4 27), and the second one the 8 least significant octets (15 81 39 6E B1 C9 BE 46), which is just the opposite of what is said in the mentioned German resource.
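
Given this layout, the individual fields of a decimal can be copied into native variables by re-using MoveMem. The following sketch is a helper of my own making (VBA7, x64) and reads the scale octet, the sign octet, and the two little-endian parts of the numerator:

Sub DecodeDecimal(ByRef v As Variant)
    Dim ScaleOct As Byte, SignOct As Byte
    Dim Hi32 As Long, Lo64 As LongLong
    Dim p As LongPtr
    p = VarPtr(v)                            ' address of the 16 net octets
    Call MoveMem(ScaleOct, ByVal p + 2, 1&)  ' 3rd octet: power of ten of the denominator
    Call MoveMem(SignOct, ByVal p + 3, 1&)   ' 4th octet: 00 hex = +, 80 hex = -
    Call MoveMem(Hi32, ByVal p + 4, 4&)      ' octets 5 to 8: most significant 32 bits
    Call MoveMem(Lo64, ByVal p + 8, 8&)      ' octets 9 to 16: least significant 64 bits
    Debug.Print ScaleOct; SignOct; Hex$(Hi32); " "; Hex$(Lo64)
End Sub

For d22, this prints 28, 128, 27E41B32, and 46BEC9B16E398115, in agreement with the dump given above.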

When comparing the 16 net octets of the values assigned to d20, d21, and d22, we have:

d22 = CDec("-1.2345678901234567890123456789")
d22 = 0E 00 1C 80 32 1B E4 27 15 81 39 6E B1 C9 BE 46
d21 = CDec("+1.2345678901234567890123456789")
d21 = 0E 00 1C 00 32 1B E4 27 15 81 39 6E B1 C9 BE 46
d20 = CDec("+12345678901234567890123456789")
d20 = 0E 00 00 00 32 1B E4 27 15 81 39 6E B1 C9 BE 46

This means the power of 10 of the denominator, which equals 28 dec = 1C hex for both d22 and d21 and 0 dec/hex for d20, and which is encoded in the 3rd octet, can also be interpreted as the number of positions, counted from the right, at which the decimal separator has to be inserted into the unsigned numerator. Finally, there is a sign bit in theory, but effectively, a decimal uses a whole sign octet.
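
Collected into a single layout, the 16 net octets of a decimal as found here read:

octet 1: identifier, always 14 dec = 0E hex
octet 2: unused
octet 3: power of ten of the denominator (0 to 28 dec)
octet 4: sign (00 hex = positive, 80 hex = negative)
octets 5 to 8: most significant 32 bits of the numerator, little-endian
octets 9 to 16: least significant 64 bits of the numerator, little-endian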

Summarising, variables of sub-type decimal encode their values very inefficiently. On the other hand, they allow for exact calculations with decimal fractions, which makes them the optimal data type in the area of finance. Due to the fact that only the basic algebraic operators support decimals, any function that is to be evaluated needs to be implemented from scratch using only the four basic operators +, -, ·, and /, or by calling other functions supporting decimals that have already been implemented. Not every double can be converted directly into a decimal, due either to the limited range of the latter of at maximum ±(2^96-1) = ±7.9228162514264337593543950335E28, or to their minimum resolution of 1E-28, or to a combination of both effects. Moreover, only doubles having no more than 29 significant decimal figures can be exactly cast into decimals. When scaling doubles in order to make them fit the limits of decimals, only factors equalling integral powers of 2 should be used, in order to leave the floating-point mantissa unchanged. This will be demonstrated in a forthcoming post.
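
As a small teaser of that exactness, the following two lines, typed into the immediate window, contrast doubles with decimals (the string arguments to CDec assume a locale that uses "." as the decimal separator):

Debug.Print 0.1 + 0.2 = 0.3                           ' False: binary doubles cannot represent 0.1 exactly
Debug.Print CDec("0.1") + CDec("0.2") = CDec("0.3")   ' True: decimals hold these fractions exactly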