Fundamentals of Data Representation: Rounding Errors
From the Specification: Fundamentals of Data Representation - Rounding Errors
Know and be able to explain why both fixed point and floating point representation of decimal numbers may be inaccurate.
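For a real number to be represented exactly by the binary number system, it must be capable of being represented by a binary fraction in the given number of bits. Some values can never be represented exactly, however many bits are available. For example, decimal 0.1 (0.1₁₀) has a recurring binary expansion, 0.000110011001..., so any stored version of it is only an approximation.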
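The effect can be seen with a short Python sketch. Python is not used in the original text, and `to_binary_fraction` is a hypothetical helper written for this illustration. It assumes a simple fixed point format with a chosen number of binary places and truncates rather than rounds.

```python
from fractions import Fraction
from typing import Tuple


def to_binary_fraction(value: Fraction, bits: int) -> Tuple[str, Fraction]:
    """Approximate 0 <= value < 1 using `bits` binary places (truncating)."""
    digits = []
    stored = Fraction(0)
    remainder = value
    for place in range(1, bits + 1):
        weight = Fraction(1, 2 ** place)  # 1/2, 1/4, 1/8, ...
        if remainder >= weight:
            digits.append("1")
            remainder -= weight
            stored += weight
        else:
            digits.append("0")
    return "0." + "".join(digits), stored


pattern, stored = to_binary_fraction(Fraction(1, 10), 8)
print(pattern, float(stored))  # 0.00011001 0.09765625
```

With 8 binary places the stored value is 0.09765625 rather than 0.1. Adding more places shrinks the error but never removes it, whereas a value such as 0.5 is stored exactly.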
Arithmetic in a processor is normally performed on a fixed number of bits, for example adding one 8-bit number to another. This will often cause no problems at all:

     00110011 (51)
    +00001010 (10)
     --------
     00111101 (61)

But what happens if we add the following numbers together:

     01110011 (115)
    +01001010 (74)
     --------
     10111101 (189)

This may appear to have gone OK, but we have a problem. If we are dealing with two's complement numbers, the most significant bit is the sign bit, so the answer from adding two positive numbers together is negative!

     01110011 (115)
    +01001010 (74)
     --------
     10111101 (-67!)

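The two readings of the same bit pattern can be checked with a small Python sketch. The helper name `to_signed` is an assumption made for this example, not something defined in the text.

```python
def to_signed(pattern: int, bits: int = 8) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement value."""
    mask = (1 << bits) - 1
    pattern &= mask
    if pattern & (1 << (bits - 1)):   # sign bit set, so the value is negative
        return pattern - (1 << bits)
    return pattern


# Add the two 8-bit values and keep only the low 8 bits of the sum.
total = (0b01110011 + 0b01001010) & 0xFF
print(format(total, "08b"), total, to_signed(total))  # 10111101 189 -67
```

The bits are identical in both cases. Only the interpretation changes: 189 when read as unsigned, -67 when read as two's complement.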
Overflow
Let's take a look at another example of the same problem, known as overflow, this time using 4-bit two's complement numbers:

       1010 (-6)
      +1010 (-6)
       ----
    (1)0100 (+4!)

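As you can see in the sum above, we have added two negative numbers together and the result is a positive number. The true answer, -12, lies outside the range a 4-bit two's complement number can hold (-8 to +7), and the carry out of the most significant bit, shown in brackets, is discarded.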
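One common way to spot this kind of overflow is the sign rule: if both operands have the same sign and the result has the opposite sign, the answer cannot be right. The following Python sketch applies that rule to 4-bit values. The function `add_4bit` is a hypothetical name used only for illustration.

```python
def add_4bit(a: int, b: int):
    """Add two 4-bit patterns.

    Returns (4-bit result, carry out, signed overflow).
    """
    raw = (a & 0xF) + (b & 0xF)
    result = raw & 0xF          # the fifth bit does not fit and is dropped
    carry_out = raw > 0xF       # True when that fifth bit was 1
    sign_a = (a >> 3) & 1
    sign_b = (b >> 3) & 1
    sign_result = (result >> 3) & 1
    # Same-signed operands producing a result of the other sign means overflow.
    overflow = sign_a == sign_b and sign_result != sign_a
    return result, carry_out, overflow


result, carry_out, overflow = add_4bit(0b1010, 0b1010)
print(format(result, "04b"), carry_out, overflow)  # 0100 True True
```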
To detect the situations described above, the processor uses the status register: a set of single-bit flags that are updated after an arithmetic or logical operation.
The most common flags
Flag | Name | Description |
---|---|---|
Z | Zero flag | Indicates that the result of an arithmetic or logical operation (or, sometimes, a load) was zero. |
C | Carry flag | Records a carry out of the most significant bit. This enables numbers larger than a single word (in the examples above, 4 or 8 bits) to be added or subtracted by carrying a binary digit from a less significant word to the least significant bit of a more significant word as needed. |
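S / N | Sign flag / Negative flag | Indicates that the result of an operation was negative. On some processors S and N are distinct flags: one indicates whether the result was negative, whereas the other indicates whether a subtraction or an addition has taken place. In the examples below, N is set when the result is negative. |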
O | Overflow flag | Indicates that the signed result of an operation is outside the range that fits in the register width using two's complement representation. |
P | Parity flag | Indicates whether the number of set bits of the last result is odd or even. |

Status register working

For the sum that we met earlier, we will look at how the status register can be used to detect the incorrect answer:

     01110011 (115)
    +01001010 (74)
     --------
     10111101 (-67)

Status register: Z = False | C = False | N = True | O = True | P = Even

The flags show that the result is negative. Since the original sum used only positive values, we know we have an error, and the overflow flag confirms it. Looking at the other sum:

       1010 (-6)
      +1010 (-6)
       ----
    (1)0100 (+4)

Status register: Z = False | C = True | N = False | O = True | P = Odd

The flags show that the result is positive even though the original sum used two negative numbers. We can also see that overflow occurred.

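The flag values quoted for these two sums can be reproduced with a short Python sketch. This is an illustration under stated assumptions rather than a model of any particular processor: `add_with_flags` is a hypothetical function, N is taken to mean "result is negative", and P reports whether the count of set bits in the result is odd or even, as in the table above.

```python
def add_with_flags(a: int, b: int, bits: int) -> dict:
    """Add two `bits`-wide patterns and report the result with its flags."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    raw = (a & mask) + (b & mask)
    result = raw & mask                 # keep only the bits that fit
    ones = bin(result).count("1")       # number of set bits in the result
    return {
        "result": format(result, f"0{bits}b"),
        "Z": result == 0,
        "C": raw > mask,                # carry out of the top bit
        "N": bool(result & sign_bit),   # sign bit of the result
        # Overflow: the result's sign differs from the sign of both operands.
        "O": bool((a ^ result) & (b ^ result) & sign_bit),
        "P": "Even" if ones % 2 == 0 else "Odd",
    }


print(add_with_flags(0b01110011, 0b01001010, 8))
print(add_with_flags(0b1010, 0b1010, 4))
```

The first call reports Z = False, C = False, N = True, O = True, P = Even, and the second reports Z = False, C = True, N = False, O = True, P = Odd, matching the status registers shown above.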

Exercise: Status register

What is the problem with the result of the following 4-bit sum?

     1011 (-5)
    +1011 (-5)
     ----

Answer: The result overflows, giving an incorrect answer:

       1011 (-5)
      +1011 (-5)
       ----
    (1)0110 (+6)

In the context of calculations, what is overflow?

Answer: When the result of a calculation is too large to fit into a set number of bits.

What do we need the status register for?

Answer: The status register holds flags that keep track of the results of sums. This helps us to see when there is an error in a result and correct it accordingly.

Name three flags in a status register.

Answer: Any three of Overflow, Carry, Negative, Zero and Parity.

Show the status register for the following sum:

       1001 (-7)
      +1001 (-7)
       ----
    (1)0010 (+2)

Answer: Status register: Z = False | C = True | N = False | O = True | P = Odd