cpp_bin_float

namespace boost{ namespace multiprecision{

enum digit_base_type
{
   digit_base_2 = 2,
   digit_base_10 = 10
};

template <unsigned Digits, digit_base_type base = digit_base_10, class Allocator = void, class Exponent = int, ExponentMin = 0, ExponentMax = 0>
class cpp_bin_float;

typedef number<cpp_bin_float<50> > cpp_bin_float_50;
typedef number<cpp_bin_float<100> > cpp_bin_float_100;

typedef number<backends::cpp_bin_float<24, backends::digit_base_2, void, std::int16_t, -126, 127>, et_off>         cpp_bin_float_single;
typedef number<backends::cpp_bin_float<53, backends::digit_base_2, void, std::int16_t, -1022, 1023>, et_off>       cpp_bin_float_double;
typedef number<backends::cpp_bin_float<64, backends::digit_base_2, void, std::int16_t, -16382, 16383>, et_off>     cpp_bin_float_double_extended;
typedef number<backends::cpp_bin_float<113, backends::digit_base_2, void, std::int16_t, -16382, 16383>, et_off>    cpp_bin_float_quad;
typedef number<backends::cpp_bin_float<237, backends::digit_base_2, void, std::int32_t, -262142, 262143>, et_off>  cpp_bin_float_oct;

}} // namespaces

Class template cpp_bin_float fulfils all of the requirements for a Backend type. Its members and non-member functions are deliberately not documented: these are considered implementation details that are subject to change.

The class takes six template parameters:

Digits: The number of digits precision the type should support. This is normally expressed as base-10 digits, but that can be changed via the second template parameter.
base: An enumerated value (either digit_base_10 or digit_base_2) that indicates whether Digits is base-10 or base-2
Allocator: The allocator used: defaults to type void, meaning all storage is within the class, and no dynamic allocation is performed, but can be set to a standard library allocator if dynamic allocation makes more sense.
Exponent: A signed integer type to use as the type of the exponent - defaults to int.
ExponentMin: The smallest (most negative) permitted exponent, defaults to zero, meaning "define as small as possible given the limitations of the type and our internal requirements".
ExponentMax: The largest (most positive) permitted exponent, defaults to zero, meaning "define as large as possible given the limitations of the type and our internal requirements".

The type of number_category<cpp_bin_float<Args...> >::type is std::integral_constant<int, number_kind_floating_point>.

More information on this type can be found in the tutorial.

Implementation Notes

Internally, an N-bit cpp_bin_float is represented as an N-bit unsigned integer along with an exponent and a sign. The integer part is normalized so that its most significant bit is always 1. The decimal point is assumed to be directly after the most significant bit of the integer part. The special values zero, infinity and NaN all have the integer part set to zero, and the exponent to one of 3 special values above the maximum permitted exponent.

Multiplication is trivial: multiply the two N-bit integer mantissa's to obtain a 2N-bit number, then round and adjust the sign and exponent.

Addition and subtraction proceed similarly - if the exponents are such that there is overlap between the two values, then left shift the larger value to produce a number with between N and 2N bits, then perform integer addition or subtraction, round, and adjust the exponent.

Division proceeds as follows: first scale the numerator by some power of 2 so that integer division will produce either an N-bit or N+1 bit result plus a remainder. If we get an N bit result then the size of twice the remainder compared to the denominator gives us the rounding direction. Otherwise we have one extra bit in the result which we can use to determine rounding (in this case ties occur only if the remainder is zero and the extra bit is a 1).

Square root uses integer square root in a manner analogous to division.

Decimal string to binary conversion proceeds as follows: first parse the digits to produce an integer multiplied by a decimal exponent. Note that we stop parsing digits once we have parsed as many as can possibly effect the result - this stops the integer part growing too large when there are a very large number of input digits provided. At this stage if the decimal exponent is positive then the result is an integer and we can in principle simply multiply by 10^N to get an exact integer result. In practice however, that could produce some very large integers. We also need to be able to divide by 10^N in the event that the exponent is negative. Therefore calculation of the 10^N values plus the multiplication or division are performed using limited precision integer arithmetic, plus an exponent, and a track of the accumulated error. At the end of the calculation we will either be able to round unambiguously, or the error will be such that we can't tell which way to round. In the latter case we simply up the precision and try again until we have an unambiguously rounded result.

Binary to decimal conversion proceeds very similarly to the above, our aim is to calculate mantissa * 2^shift * 10^E where E is the decimal exponent and shift is calculated so that the result is an N bit integer assuming we want N digits printed in the result. As before we use limited precision arithmetic to calculate the result and up the precision as necessary until the result is unambiguously correctly rounded. In addition our initial calculation of the decimal exponent may be out by 1, so we have to correct that and loop as well in the that case.