IEEE 754r is an ongoing revision to the IEEE 754 floating point standard.

The intent of the revision is to extend the standard where it has become necessary, to tighten up certain areas of the original standard which were left undefined, and to merge in IEEE 854 (the radix-independent floating-point standard).

Where stricter definitions are performance-incompatible with some existing implementations, they are placed in a new section, allowing two levels of implementation.

1 Revision process

The standard has been under revision since 2000, with a target completion date of December 2005. Participation is open to people with a solid knowledge of floating-point arithmetic. Monthly meetings are held in the San Francisco Bay area. The mailing list reflects ongoing discussions.

2 Summary of the revision

The most obvious enhancements to the standard are the addition of 128-bit and decimal formats and some new operations; however, there have also been significant clarifications in terminology throughout. This summary highlights the major differences in each major section of the standard. Note that the revision is not yet an approved standard, so all of these changes are, in effect, proposals.

2.1 Scope

The scope has been widened to include decimal formats and arithmetic.

2.2 Definitions

Many of the definitions have been rewritten for clarification and consistency. A few terms have been renamed for clarity (for example, denormalized has been renamed to subnormal).

2.3 Formats

The specification levels of a floating-point format have been enumerated, to clarify the distinction between

  1. the theoretical real numbers (a number line)
  2. the entities which can be represented in the format (a finite set of numbers, together with −0, infinities, and NaN)
  3. the particular representations of the entities: sign-exponent-significand, etc.
  4. the bit-pattern (encoding) used.

The sets of representable entities are then explained in detail, showing that they can be treated with the significand being considered either as a fraction or as an integer.
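The two ways of viewing the significand can be illustrated with a short Python sketch. It assumes the existing 32-bit binary layout (1 sign bit, 8 exponent bits with bias 127, 23 trailing significand bits) and handles normal numbers only; the function names `decode_binary32` and `significand_views` are hypothetical and are not taken from the draft.

```python
import struct


def decode_binary32(x):
    """Return the (sign, biased exponent, trailing significand) bit fields of x."""
    # Pack x as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def significand_views(x):
    """Describe a normal binary32 value with a fractional and an integer significand."""
    sign, biased_exp, trailing = decode_binary32(x)
    if not 0 < biased_exp < 255:
        raise ValueError("this sketch handles normal numbers only")
    e = biased_exp - 127          # exponent when the significand m lies in [1, 2)
    m = 1 + trailing / 2**23      # fraction view: value = (-1)**sign * m * 2**e
    c = (1 << 23) | trailing      # integer view: significand as a 24-bit integer
    q = e - 23                    # exponent for the integer view: value = c * 2**q
    return (sign, m, e), (sign, c, q)


fraction_view, integer_view = significand_views(0.15625)
```

Both tuples describe the same entity: `m * 2**e` and `c * 2**q` evaluate to the same number, only the position of the radix point in the significand differs.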

The 'quad precision' (128-bit) format has been added to the basic binary formats.

Three new decimal formats are described, matching the lengths of the binary formats. These give decimal formats with 7-, 16-, and 34-digit significands, which may be normalized or unnormalized. For maximum range and precision, the formats merge part of the exponent and significand into a combination field, and compress the remainder of the significand using densely packed decimal encoding.
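The effect of the three significand lengths, and of allowing unnormalized values, can be sketched with Python's `decimal` module. This is an illustration only: the module models decimal arithmetic at a chosen precision but does not use the combination field or densely packed decimal encoding described above, the exponent ranges of the formats are not modeled, and the mapping from bit length to digit count in `DIGITS` is simply the 7/16/34 figures quoted in the text.

```python
from decimal import Context, Decimal

# Significand length in digits for each format length in bits (from the text above).
DIGITS = {32: 7, 64: 16, 128: 34}

# 1/3 rounded to each significand length.
thirds = {
    bits: Context(prec=digits).divide(Decimal(1), Decimal(3))
    for bits, digits in DIGITS.items()
}

# Unnormalized values: the same number with different significand/exponent pairs.
a = Decimal("1.5")    # digits (1, 5),    exponent -1
b = Decimal("1.50")   # digits (1, 5, 0), exponent -2
same_value = a == b
same_representation = a.as_tuple() == b.as_tuple()
```

Here `a` and `b` compare equal as numbers, yet they keep distinct representations, which is what lets decimal arithmetic preserve trailing zeros.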

2.4 Rounding

The round-to-nearest, ties away from zero rounding mode has been added (required for decimal operations only).
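As a rough illustration of how the new mode differs from round-to-nearest, ties to even, the sketch below rounds a few exact ties to an integer using Python's `decimal` module. It assumes that the module's `ROUND_HALF_UP` constant, which rounds to nearest with ties going away from zero, is a fair stand-in for the mode named in the draft; the helper `round_to_integer` is hypothetical.

```python
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal


def round_to_integer(text, mode):
    """Round the decimal literal in `text` to an integer using the given mode."""
    return Decimal(text).quantize(Decimal("1"), rounding=mode)


ties = ["0.5", "1.5", "2.5", "-2.5"]
ties_to_even = [str(round_to_integer(t, ROUND_HALF_EVEN)) for t in ties]
ties_away = [str(round_to_integer(t, ROUND_HALF_UP)) for t in ties]
```

The two modes agree on every input that is not an exact tie; they differ only on halfway cases such as 0.5 and 2.5.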

2.5 Operations

This section has numerous clarifications (notably in the area of comparisons), and several previously recommended operations (quiet copy, negate, abs, and copysign) are now required.

New operations include fused multiply-add (FMA), classification predicates (isnan(x), etc.), various min and max functions (which allow a total ordering), and two decimal-specific operations (samequantum and quantize).
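Several of these operations have close analogues in Python's `decimal` module, which makes for a convenient sketch. The example assumes that `Context.fma`, `Decimal.quantize`, `Decimal.same_quantum`, and `Decimal.is_nan` behave like the draft's FMA, quantize, samequantum, and isnan; that correspondence is an assumption of this illustration, not something stated in the draft. A 3-digit precision is chosen only to make the rounding visible.

```python
from decimal import Context, Decimal

ctx = Context(prec=3)  # tiny precision so the intermediate rounding shows up
a, b, c = Decimal("1.23"), Decimal("4.56"), Decimal("-5.60")

# Separate multiply and add: the exact product 5.6088 is rounded to 5.61 first.
two_step = ctx.add(ctx.multiply(a, b), c)
# Fused multiply-add: a*b + c is computed exactly and rounded only once.
fused = ctx.fma(a, b, c)

# quantize sets the exponent of a value to match that of a pattern.
price = Decimal("2.5")
cents = price.quantize(Decimal("0.01"))

# same_quantum compares exponents, not values.
cents_match = cents.same_quantum(Decimal("9.99"))
price_match = price.same_quantum(cents)

# A classification predicate.
nan_detected = Decimal("NaN").is_nan()
```

The separately rounded result is 0.01 while the fused result is 0.0088, which shows why a single rounding matters when the product and the addend nearly cancel.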

2.5.1 min and max

The min and max operations are defined in such a way that they are commutative (except for the case of two NaNs as inputs). In particular:

In order to support operations such as windowing in which a NaN input should be quietly replaced with one of the end points, min and max are defined to select a number, x, in preference to a quiet NaN:

  min(x, NaN) = min(NaN, x) = x
  max(x, NaN) = max(NaN, x) = x

In the current draft, these functions are called minnum and maxnum to indicate their preference for a number over a NaN.
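A minimal Python sketch of this number-over-NaN preference, and of the windowing use case it supports, might look as follows. It works on ordinary Python floats, treats every NaN as quiet, and does not attempt to define a result for signed zeros; the helper `clamp_to_window` is hypothetical and only illustrates the motivation given above.

```python
import math


def minnum(x, y):
    """Smaller of x and y, preferring a number over a NaN."""
    if math.isnan(x):
        return y  # if y is also NaN, the result is NaN
    if math.isnan(y):
        return x
    return min(x, y)


def maxnum(x, y):
    """Larger of x and y, preferring a number over a NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)


def clamp_to_window(x, low, high):
    """Clamp x into [low, high]; a NaN input is replaced by an end point."""
    return minnum(maxnum(x, low), high)


inside = clamp_to_window(0.25, 0.0, 1.0)
above = clamp_to_window(2.5, 0.0, 1.0)
from_nan = clamp_to_window(math.nan, 0.0, 1.0)
```

With this ordering of the calls, a NaN input is replaced by the lower end point; composing the calls the other way round would select the upper end point instead.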


