Wednesday, October 17, 2012

Making a Double Out of Parts in VB.NET

Recently I picked up on a piece of code I had started several months ago and didn't have time to finish.  I was a bit stumped actually, though I've got it pretty well beat into shape now (maybe more on that later).  I was making a VB.NET type Fraction using BigIntegers as the base type.  As such, the class would theoretically be able to represent any rational number.  But what if I wanted to convert that result to a double?  So, I started into the code necessary to do the job.  I began with the easy part, the cases where the numerator and denominator can both be converted to doubles.  In that case, .NET already knows how to do the needed conversions.  But if the numerator or denominator is too large to be converted to a double, the result of division (mathematically) may still be within the range of a double.  For example,

4.445 × 10400 / 4.222 × 10399 = 10.528185693983894

An important finishing piece to solving this programming problem is the following function:



First, for a good summary of the double format, see Double-precision floating-point format (Wikipedia).

Code Walkthrough

The double type has 64 bits as does the Int64 type.  We begin with initializing our result variable to 0.  Depending on whether the number is indicated as positive or not, we set the sign bit.  The sign bit (bit 63, the most significant bit) needs to be set (1) to indicate a negative number.  For simplicity of use, I wanted to let the caller work in terms of the exponent (base 2) rather than with a biased result.  The 1023 "excess" or bias is one of those values you don't want to have to remember.  This function obviates the need to be overly conscious of that value.

After making our biased exponent (biasedExp), we check that it is in range.  We have exactly 11 bits to work with.  If our result is greater than 11 bits (0x7FF in hex, 11111111111 in binary), then we complain to the caller with an OverflowException.  If the result is negative, then it still does not fit into the 11 bits and we complain with an OverflowException in this case as well.  The mantissa is required to be no more than 52 bits.  That's 13 "nibbles" (4 bits, a single hex digit) or 6 1/2 bytes.  If the mantissa has any bits set in the upper 12 bits, we complain.  By calling the variable mantissa I have already declared my draconian intent with this function.  I will not coddle the caller by shifting things that are in the wrong place to the right place.  If the caller sends a mantissa which is too large, they may have done the shifting wrong or they may have incorrectly included the implicit 1 at the beginning of the "binary scientific notation."  Either way, the caller has made an error and we cannot know which one it is.  If we could know which error they made, it might make sense to correct it silently and move on.  However, a wrong assumption here could be somebody's program's undoing.  Therefore, we just complain and let the caller figure out what to do about it.

The biased exponent is now shifted into place—this, like the 1023 excess, falls into the category of "painful detail we want to encapsulate so we can forget about it."  (This is not ambiguous like the too-large-mantissa.)  In our final steps we "Or" the biased exponent and mantissa into place and use a call to BitConverter.Int64BitsToDouble() to ensure we have the proper format.

Room For Improvement

To make this function more robust, an exception class could be defined, either several different ones (one for each error) or a class with properties and enumerations to allow the caller to efficiently test for the particular condition that caused the "overflow".  I note also that I have used OverflowException where an underflow has occurred.  In program debugging, an accurate error message ("over" versus "under") can provide a clue to the person debugging the program.

No comments: