Floating-Point Formats
The IEEE 754 standard describes four floating-point number formats:
- Single-Precision
- Double-Precision
- Single, Extended-Precision
- Double, Extended-Precision
Single-Precision
Single-precision floating-point numbers are 32 bits wide. The first bit (bit 31, the MSB) is the sign bit, the next 8 bits (bits 30-23) are the exponent, and the remaining 23 bits (bits 22-0) are the significand. Note that even though only 23 bits are stored for the significand, the precision is actually 24 bits. This trick is possible because the format is a normalized binary (base-2) floating-point system: the leading significand bit of a normalized number is always 1, so it does not need to be stored. The exponent is biased by 127 so that negative exponents can be expressed.
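The field layout described above can be checked with a short sketch. This is only an illustration, not part of the standard: the function names `decode_single` and `value_of_normal_single` are made up for this example, and the code relies on Python's `struct` module to round a value to single precision and expose its bit pattern. Special cases (zero, subnormals, infinities, NaN) are deliberately ignored.

```python
import struct

def decode_single(x):
    """Split x, rounded to IEEE 754 single precision, into its three fields."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 31
    biased_exp = (bits >> 23) & 0xFF   # bits 30-23
    fraction = bits & 0x7FFFFF         # bits 22-0 (the 23 stored significand bits)
    return sign, biased_exp, fraction

def value_of_normal_single(sign, biased_exp, fraction):
    """Rebuild the value of a normalized number (biased exponent 1 to 254 only)."""
    # The implicit leading 1 supplies the 24th bit of precision.
    significand = 1 + fraction / 2**23
    # Subtract the bias of 127 to recover the true exponent.
    return (-1) ** sign * significand * 2.0 ** (biased_exp - 127)

sign, biased_exp, fraction = decode_single(-6.5)
print(sign, biased_exp, hex(fraction))
print(value_of_normal_single(sign, biased_exp, fraction))
```

For example, -6.5 is -1.625 × 2^2, so the sign bit is 1, the stored exponent is 2 + 127 = 129, and the stored fraction is 0.625 × 2^23.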
Double-Precision
Double-precision numbers are 64 bits wide. The MSB (bit 63) is the sign bit, the next 11 bits (bits 62-52) are the exponent, and the remaining bits (bits 51-0) are the significand. Again, the precision is actually 53 bits (not 52) because of the same normalization trick.
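A similar sketch works for the 64-bit layout. The helper name `decode_double` is hypothetical, and the code assumes that Python's `float` is an IEEE 754 double, which holds on common platforms. The exponent bias of 1023 is not stated in the text above; it is the standard value for this format.

```python
import struct

def decode_double(x):
    """Split a double-precision value into sign, biased exponent, and fraction."""
    # Pack as a big-endian 64-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # bit 63
    biased_exp = (bits >> 52) & 0x7FF   # bits 62-52
    fraction = bits & ((1 << 52) - 1)   # bits 51-0 (the 52 stored significand bits)
    return sign, biased_exp, fraction

print(decode_double(1.0))    # exponent field holds the bias itself, since 1.0 = 1.0 * 2**0
print(decode_double(-2.0))

# With 53 bits of precision, every integer up to 2**53 is exactly representable,
# but 2**53 + 1 needs 54 bits and is rounded back to 2**53.
print(float(2**53 + 1) == float(2**53))
```

The last line is one way to observe the 53-bit precision directly: the comparison is true only because the extra low-order bit cannot be stored.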
Extended-Precision
Review
Format | Width | Precision | Exponent | Significand |
---|---|---|---|---|
Single | 32 bits | 24 bits (23 stored) | bits 30-23 | bits 22-0 |
Double | 64 bits | 53 bits (52 stored) | bits 62-52 | bits 51-0 |