Lesson 7	Floating-point numbers
Objective	Explain what a floating-point number is.

Representation of Floating-Point Numbers in Computers: The IEEE Standard 754

In modern computing, the representation of real numbers, especially floating-point numbers, demands both precision and efficiency. To address this, the Institute of Electrical and Electronics Engineers (IEEE) introduced the IEEE Standard 754 for Floating-Point Arithmetic. This standard has been widely adopted by the computing industry and serves as the benchmark for floating-point computation in computer hardware, languages, and operating systems.

Overview of IEEE Standard 754: The IEEE Standard 754 provides a comprehensive methodology for representing and computing floating-point numbers. It defines:
- Formats for representing floating-point numbers.
- Rounding rules and operations.
- Exception handling, such as overflow, underflow, and invalid operations that produce NaN (Not a Number).

Representation of Numbers: The standard primarily defines two basic formats:
- Single Precision (32 bits): Comprising 1 bit for sign, 8 bits for the exponent, and 23 bits for the fraction.
- Double Precision (64 bits): Comprising 1 bit for sign, 11 bits for the exponent, and 52 bits for the fraction.
There are also extended formats, but single and double precision are the most commonly used.

Application to Given Numbers:
- 1/3: This is a rational number, but its binary expansion (0.010101...) repeats forever, so it cannot be represented exactly in a finite number of bits. IEEE 754 stores the nearest representable value.
- PI: An irrational number, it also cannot be represented exactly. In practice, a rounded version of its binary form is used.
- -1.23 x 10^35: This number is represented using a sign bit of 1 for its negative value, an exponent adjusted by a bias value, and a fraction derived from the number's mantissa.
- -2.6 x 10^-28: Similarly, this number uses a sign bit of 1 for its negative value, a biased exponent (here encoding a negative power of 2), and a fraction.
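
As a rough illustration of the single precision layout described above, the following Python sketch splits a number into its three fields. The helper name `fields32` is made up for this example; it relies on the standard `struct` module to round a Python float (which is double precision) to single precision.

```python
import math
import struct

def fields32(x):
    """Round x to single precision and return (sign, stored exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, still biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

for x in (1 / 3, math.pi, -1.23e35, -2.6e-28):
    sign, exponent, fraction = fields32(x)
    print(f"{x!r:>22}  {sign}  {exponent:08b}  {fraction:023b}")
```

Each printed row shows the sign bit, the 8-bit stored exponent, and the 23-bit fraction. The two negative numbers have a sign bit of 1.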

Most contemporary computers and computing systems use IEEE Standard 754 for representing and manipulating floating-point numbers. Its widespread adoption gives consistent, predictable results across platforms.

To represent real numbers such as 1/3, PI, -1.23 * 10^35, and -2.6 * 10^-28, most computers use IEEE Standard 754 floating-point numbers. Using this representation, a real number is expressed as a sign applied to the product of a binary number greater than or equal to 1 and less than 2 (called the mantissa) and 2 raised to an integer exponent.
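
The mantissa-times-power-of-2 form can be sketched in Python. The helper `normalize` is a hypothetical name for this example. It uses the standard `math.frexp`, which returns a fraction in the range 0.5 to 1, so the sketch rescales it to the range 1 to 2 used in this lesson. It assumes a nonzero, finite input.

```python
import math

def normalize(x):
    """Return (sign, mantissa, exponent) with 1 <= mantissa < 2 for nonzero finite x."""
    fraction, exponent = math.frexp(abs(x))   # fraction is in [0.5, 1)
    sign = -1 if x < 0 else 1
    return sign, fraction * 2, exponent - 1   # rescale so the mantissa is in [1, 2)

for x in (1 / 3, math.pi, -1.23e35, -2.6e-28):
    sign, mantissa, exponent = normalize(x)
    print(f"{x!r} = {sign:+d} * {mantissa!r} * 2**{exponent}")
```

For 1/3 this gives a mantissa of about 1.3333 and an exponent of -2, since 1/3 = (4/3) * 2^-2.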
Floating-point number: Used to represent a real number on a computer.

In practice, it is very unlikely that you will ever need to look at the binary form for the floating-point representation of a real number, so we will just take a quick look at one example to give you the general idea. Single precision floating-point representation uses 32 bits: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. Here's the 32-bit floating-point representation of the real number 1/3.

Sign bit	Exponent	Mantissa
`0`	`01111101`	`01010101010101010101010`

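The sign bit is 0, indicating that this is a positive number. The exponent field is the binary representation of the decimal number 125. To obtain the actual exponent we subtract the bias, 127, which gives -2. Storing the exponent with a bias lets the 8-bit field cover both negative and positive exponents without a separate sign: stored values 1 through 254 encode actual exponents from -126 to 127, while 0 and 255 are reserved for zero and subnormal numbers and for infinity and NaN. Finally, the mantissa field represents the binary number

1.01010101010101010101010

Note that the leading 1 of the mantissa is implied rather than stored, which provides an additional digit of precision. The bits shown here truncate the repeating pattern of 1/3; hardware using the default IEEE 754 round-to-nearest mode stores a final bit of 1 instead of 0, a difference that affects only the eighth significant decimal digit.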

The decimal value of the mantissa is:

2^0 + 2^-2 + 2^-4 + 2^-6 + ... + 2^-22
= 1 + 1/4 + 1/16 + 1/64 + ... + 1/4194304

This is approximately 1.3333333 and thus, 1/3 is represented, using 32-bit floating-point representation, as approximately 1.3333333 * 2^-2 or 1.3333333 * 1/4.
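
The decoding steps above can be checked with a short Python sketch. The function `decode32` is a hypothetical helper written for this lesson's example, and it handles only normal numbers (it ignores the reserved exponent values 0 and 255). It uses exact fractions so that no rounding is introduced by the check itself.

```python
from fractions import Fraction

def decode32(sign_bits, exponent_bits, fraction_bits):
    """Decode the three fields of a normal single precision number, given as bit strings."""
    sign = -1 if sign_bits == "1" else 1
    exponent = int(exponent_bits, 2) - 127          # remove the bias of 127
    # The implied leading 1 plus the stored fraction bits scaled by 2^-23.
    mantissa = 1 + Fraction(int(fraction_bits, 2), 2 ** len(fraction_bits))
    return sign * mantissa * Fraction(2) ** exponent

value = decode32("0", "01111101", "01010101010101010101010")
print(value)          # exact value as a fraction
print(float(value))   # decimal approximation, close to 1/3
```

The result is an exact fraction slightly below 1/3, which agrees with 1/3 to about 7 significant decimal digits.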
What's most important to remember about floating-point representation is that it allows you to represent a tremendous range of real numbers, but with limited precision. Single precision (32-bit) floating-point numbers are accurate to about 7 decimal digits, and double precision (64-bit) floating-point numbers are accurate to about 16 decimal digits.
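
To see the limited precision in practice, here is a minimal Python sketch. Python's own `float` is double precision; the helper `to_single` (a name chosen for this example) uses the standard `struct` module to round a value to single precision and back.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

third = 1 / 3
print(f"single: {to_single(third):.20f}")   # digits diverge from 0.333... after about 7 places
print(f"double: {third:.20f}")              # digits diverge after about 16 places

# 2**24 + 1 needs 25 significant bits, one more than single precision holds.
print(to_single(16777217.0))                # prints 16777216.0
```

The range is large in both formats, but neighboring representable values get farther apart as the magnitude grows, which is why 16777217 cannot be stored exactly in single precision.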
We have covered how a computer stores numbers. Next we will consider how text is stored.

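Read on its own as an unsigned binary integer, the 23-bit mantissa field 01010101010101010101010 represents the following decimal number:

2^21 + 2^19 + 2^17 + ... + 2^3 + 2^1 = 2796202

Dividing by 2^23 = 8388608 gives the fractional part of the mantissa, 2796202 / 8388608, or approximately 0.3333333, which is the part after the binary point in 1.01010101010101010101010.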