The intent of the revision is to extend the standard where it has become necessary, to tighten up certain areas of the original standard which were left undefined, and to merge in IEEE 854 (the radix-independent floating-point standard).
Where a stricter definition would be incompatible with the performance of some existing implementations, it is placed in a new section, allowing two levels of implementation.
The standard has been under revision since 2000, with a target completion date of December 2005. Participation is open to people with a solid knowledge of floating-point arithmetic. Monthly meetings are held in the San Francisco Bay area. The mailing list reflects ongoing discussions.
The most obvious enhancements to the standard are the addition of 128-bit and decimal formats and some new operations; however, there have also been significant clarifications in terminology throughout. This summary highlights the major differences in each major section of the standard. Note that the revision is not yet an approved standard, so all these changes are, in effect, proposals.
The scope has been widened to include decimal formats and arithmetic.
Many of the definitions have been rewritten for clarification and consistency. A few terms have been renamed for clarity (for example, denormalized has been renamed to subnormal).
The specification levels of a floating-point format have been enumerated, to clarify the distinction between
The sets of representable entities are then explained in detail, showing that they can be treated with the significand being considered either as a fraction or an integer.
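The two views of the significand can be illustrated with a short Python sketch. It assumes a finite binary double (53-bit significand); the helper names `as_fraction_form` and `as_integer_form` are hypothetical and are not taken from the draft.

```python
import math

def as_fraction_form(x: float) -> tuple[float, int]:
    """Return (m, e) with x == m * 2**e and 1 <= |m| < 2 for nonzero x."""
    m, e = math.frexp(x)       # frexp yields 0.5 <= |m| < 1
    return m * 2.0, e - 1      # rescale to the 1 <= |m| < 2 convention

def as_integer_form(x: float, p: int = 53) -> tuple[int, int]:
    """Return (M, q) with x == M * 2**q and M an integer of at most p bits."""
    m, e = as_fraction_form(x)
    # Shifting the binary point p-1 places right turns the fraction into an integer.
    return int(m * 2 ** (p - 1)), e - (p - 1)

x = 0.15625                    # 5/32, exactly representable in binary
m, e = as_fraction_form(x)     # fractional view: 1.25 * 2**-3
M, q = as_integer_form(x)      # integer view: (5 * 2**50) * 2**-55
print(m, e)
print(M, q)
print(math.ldexp(m, e) == math.ldexp(float(M), q) == x)
```

Both pairs describe the same value; only the position of the radix point, and therefore the exponent, differs.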
A 'quad precision' (128-bit) format has been added to the basic binary formats.
Three new decimal formats are described, matching the lengths of the binary formats. These give decimal formats with 7-, 16-, and 34-digit significands, which may be normalized or unnormalized. For maximum range and precision, the formats merge part of the exponent and significand into a combination field, and compress the remainder of the significand using densely packed decimal encoding.
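The effect of the three significand lengths, and of unnormalized values, can be sketched with Python's `decimal` module. This is an illustration only: the module follows the related General Decimal Arithmetic specification rather than the draft itself, it does not model the combination field or the densely packed decimal encoding, and the format labels and the `one_third` helper are hypothetical.

```python
from decimal import Context, Decimal

# Significand lengths from the text; the format labels are illustrative only.
DIGITS = {"32-bit": 7, "64-bit": 16, "128-bit": 34}

def one_third(digits: int) -> Decimal:
    """Compute 1/3 rounded to the given number of significand digits."""
    return Context(prec=digits).divide(Decimal(1), Decimal(3))

for label, digits in DIGITS.items():
    print(label, one_third(digits))

# Unnormalized values: the same number held with different exponents.
a = Decimal("1.20")    # digits 1, 2, 0 with exponent -2
b = Decimal("1.2")     # digits 1, 2 with exponent -1
print(a == b)          # numerically equal
print(a.as_tuple())
print(b.as_tuple())
```

The last three lines show that two values can compare equal while keeping distinct significand and exponent pairs, which is what allowing unnormalized significands implies.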
The round-to-nearest, ties away from zero rounding mode has been added (required for decimal operations only).
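The difference between this mode and the familiar ties-to-even rule only shows up on exact halfway cases. A minimal sketch, assuming Python's `decimal` module, whose `ROUND_HALF_UP` constant rounds to nearest with ties going away from zero; the helper `round_to_integer` is hypothetical:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to_integer(value: str, mode: str) -> Decimal:
    """Round a decimal string to an integer using the given rounding mode."""
    return Decimal(value).quantize(Decimal("1"), rounding=mode)

# Halfway cases: ties-to-even picks the even neighbour,
# ties-away-from-zero picks the neighbour with the larger magnitude.
for v in ("0.5", "1.5", "2.5", "-2.5"):
    even = round_to_integer(v, ROUND_HALF_EVEN)
    away = round_to_integer(v, ROUND_HALF_UP)
    print(f"{v:>5}  ties-to-even: {even}  ties-away: {away}")
```

Values that are not exactly halfway between two candidates round identically under both modes.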
This section has numerous clarifications (notably in the area of comparisons), and several previously recommended operations (quiet copy, negate, abs, and copysign) are now required.
New operations include fused multiply-add (FMA), classification predicates (isnan(x), etc.), various min and max functions (which allow a total ordering), and two decimal-specific operations (samequantum and quantize).
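Two of these operations can be sketched in Python. The first part models a fused multiply-add as an exact product and sum followed by a single rounding, using exact rational arithmetic as a stand-in for hardware; `fma_reference` is a hypothetical helper that only accepts finite inputs. The second part uses the `quantize` and `same_quantum` methods of Python's `decimal` module, which are assumed here to behave like the draft's quantize and samequantum.

```python
from decimal import Decimal
from fractions import Fraction

def fma_reference(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Separate operations: the exact product 1 - 2**-60 rounds to 1.0,
# so the following addition gives 0.0.
unfused = a * b + c
# Fused: the exact product is kept until the single final rounding.
fused = fma_reference(a, b, c)
print(unfused, fused)

# quantize sets the exponent of a value to that of its second operand.
price = Decimal("2.17").quantize(Decimal("0.1"))
print(price)

# same_quantum tests whether two values have the same exponent.
print(Decimal("1.20").same_quantum(Decimal("3.45")))
print(Decimal("1.2").same_quantum(Decimal("1.20")))
```

The inputs are chosen so that the intermediate rounding in the unfused version discards exactly the information that the fused version preserves.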
The min and max operations are defined in such a way that they are commutative (except for the case of two NaNs as inputs). In particular:
min(+0,-0) = min(-0,+0) = -0
max(+0,-0) = max(-0,+0) = +0
In order to support operations such as windowing in which a NaN input should be quietly replaced with one of the end points, min and max are defined to select a number, x, in preference to a quiet NaN:
min(x,NaN) = min(NaN,x) = x
max(x,NaN) = max(NaN,x) = x
In the current draft, these functions are called minnum and maxnum to indicate their preference for a number over a NaN.
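The rules above can be modelled with a minimal Python sketch. It is an illustration of the behaviour described here, not the draft's normative definition: Python floats do not distinguish quiet from signaling NaNs, so every NaN is treated as quiet, and the `clamp` helper is a hypothetical example of windowing.

```python
import math

def minnum(x: float, y: float) -> float:
    """Smaller of x and y, preferring a number over a NaN and -0 over +0."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    if x == y:
        # Covers +0 versus -0, which compare equal: pick the negative zero.
        return x if math.copysign(1.0, x) < 0 else y
    return x if x < y else y

def maxnum(x: float, y: float) -> float:
    """Larger of x and y, preferring a number over a NaN and +0 over -0."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    if x == y:
        # Covers +0 versus -0, which compare equal: pick the positive zero.
        return x if math.copysign(1.0, x) > 0 else y
    return x if x > y else y

def clamp(value: float, low: float, high: float) -> float:
    """Window a value into [low, high]; a NaN input becomes an end point."""
    return minnum(maxnum(value, low), high)

nan = float("nan")
print(minnum(0.0, -0.0), minnum(-0.0, 0.0))   # both negative zero
print(maxnum(0.0, -0.0), maxnum(-0.0, 0.0))   # both positive zero
print(minnum(3.0, nan), maxnum(nan, 3.0))     # the number wins in both cases
print(clamp(nan, 0.0, 1.0))                   # NaN replaced by the lower end point
```

With two NaN inputs both functions return a NaN, which is the one case the text excludes from commutativity.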