
Floating point


Computer arithmetic

A floating-point number is a digital representation for a number in a certain subset of the rational numbers, and is often used to approximate an arbitrary real number on a computer. In particular, it represents an integer or fixed-point number (the significand or, informally, the mantissa) multiplied by a base (usually 2 in computers) to some integer power (the exponent). When the base is 2, it is the binary analog of scientific notation (in base 10).

A floating-point calculation is an arithmetic calculation done with floating-point numbers and often involves some approximation or rounding because the result of an operation may not be exactly representable.
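As a small illustration of such rounding, the following Python sketch assumes that Python's float is an IEEE 754 double (true on common platforms, but an assumption here). It shows that decimal fractions such as 0.1 are already rounded when stored, so a sum built from them need not equal the expected decimal value.

```python
from fractions import Fraction

# 0.1 and 0.2 are not exactly representable in base 2, so each literal is
# rounded to the nearest double before the addition takes place.
total = 0.1 + 0.2

# Exact rational value actually stored for the literal 0.1.
stored_tenth = Fraction(0.1)

print(total == 0.3)                    # False on IEEE 754 doubles
print(stored_tenth == Fraction(1, 10)) # False: the stored value is only close to 1/10
print(0.5 + 0.25 == 0.75)              # True: all three values are exact in base 2
```

Operands that are sums of small powers of two, as in the last line, are represented exactly, which is why that comparison succeeds.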

A floating-point number a can be represented by two numbers m and e, such that a = m × b^e. In any such system we pick a base b (called the base of numeration, also the radix) and a precision p (how many digits to store). The significand m (informally, the mantissa) is a p-digit number of the form ±d.ddd...ddd (each digit being an integer between 0 and b−1 inclusive). If the leading digit of m is non-zero then the number is said to be normalized. Some descriptions use a separate sign bit (s, which represents −1 or +1) and require m to be positive. e is called the exponent.
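The decomposition a = m × b^e can be sketched in Python for b = 2. The helper name decompose is hypothetical and introduced only for illustration. It relies on the standard math.frexp, which returns a significand in the range [0.5, 1), and rescales it to the normalized d.ddd... form with a non-zero leading digit.

```python
import math

def decompose(a):
    """Return (m, e) with a == m * 2**e and 1 <= |m| < 2.

    Hypothetical helper for illustration; assumes a is finite and non-zero.
    """
    m, e = math.frexp(a)       # frexp yields 0.5 <= |m| < 1
    return m * 2.0, e - 1      # shift one binary digit to get 1 <= |m| < 2

m, e = decompose(6.5)
print(m, e)                    # 1.625 2, since 6.5 == 1.625 * 2**2
print(math.ldexp(m, e))        # 6.5, reassembled from m and e
```

The sign is carried by m here; a description with a separate sign bit would instead store abs(m) together with s.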

This scheme allows a large range of magnitudes to be represented within a given size of field, which is not possible in a fixed-point notation.

As an example, a floating-point number with four decimal digits (b = 10, p = 4) and an exponent range of ±4 could be used to represent 43210, 4.321, or 0.0004321, but would not have enough precision to represent 432.123 and 43212.3 (which would have to be rounded to 432.1 and 43210). Of course, in practice, the number of digits is usually larger than four.
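The four-digit example can be reproduced with Python's decimal module, as a rough sketch. The context below fixes b = 10 and p = 4. The round-half-even mode is an assumption, and the ±4 exponent range is not modelled, because the default exponent limits are far wider.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# b = 10 is implied by Decimal; prec=4 plays the role of p = 4.
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

results = {}
for text in ["43210", "4.321", "0.0004321", "432.123", "43212.3"]:
    exact = Decimal(text)
    stored = ctx.plus(exact)   # unary plus rounds to the context precision
    results[text] = stored
    status = "exact" if stored == exact else "rounded"
    print(text, "->", stored, status)
```

The first three inputs survive unchanged in value, while 432.123 and 43212.3 are rounded to values equal to 432.1 and 43210.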

In addition, floating-point representations often include the special values +∞, −∞ (positive and negative infinity), and NaN ('Not a Number'). Infinities are used when results are too large to be represented, and NaNs indicate an invalid operation or undefined result.
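A short Python sketch of these special values follows, again assuming IEEE 754 doubles. The variable names are illustrative only.

```python
import math

big = 1e308
overflow = big * 10            # too large for a double, so the result is +inf
nan = overflow - overflow      # inf - inf is an invalid operation and gives NaN

print(overflow, -overflow)     # inf -inf
print(math.isnan(nan))         # True
print(nan == nan)              # False: NaN compares unequal even to itself
```

Because NaN never compares equal to anything, a test such as math.isnan is needed to detect it.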

1 Usage in computing

While in the examples above the numbers are represented in the decimal system (that is the base of numeration, b = 10), computers usually do so in the binary number system, which means that b = 2. In computers, floating-point numbers are sized by the number of bits used to store them. This size is usually 32 bits or 64 bits, often called "single-precision" and "double-precision". A few machines offer larger sizes; Intel FPUs such as the Intel 8087 (and its descendants integrated into the x86 architecture) offer 80-bit floating-point numbers for intermediate results, and several systems offer 128-bit floating-point, generally implemented in software.
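The effect of the storage size can be observed with Python's struct module, whose "f" and "d" format codes pack values as 32-bit and 64-bit IEEE 754 numbers. The helper to_single is a hypothetical name used only for this sketch. It rounds a Python float, assumed here to be a 64-bit double, to single precision and back.

```python
import struct

def to_single(x):
    """Round a 64-bit Python float to the nearest 32-bit single (illustrative helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

single_bits = struct.calcsize("<f") * 8
double_bits = struct.calcsize("<d") * 8

print(single_bits, double_bits)   # 32 64
print(to_single(0.1) == 0.1)      # False: the single keeps fewer significand bits
print(to_single(0.5) == 0.5)      # True: 0.5 is exact in both sizes
```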


