CS274: Computer Architecture - Computer Arithmetic: Floating Point
Activity Goals
The goals of this activity are:
- To describe the components of the values in the IEEE 754 floating point standard
- To differentiate between single and double precision floating point values
- To convert floating point values into the IEEE 754 floating point standard format
- To describe the benefits of normalization in the IEEE 754 standard
- To explain how and why floating point precision is finite and subject to loss (approximation)
Supplemental Reading
Feel free to visit these resources for supplemental background reading material.
- IEEE 754 Floating Point Standard
- Floating Point Arithmetic Examples
- What Every Computer Scientist Should Know about Floating Point
The Activity
Directions
Consider the activity models and answer the questions provided. First reflect on these questions on your own briefly, before discussing and comparing your thoughts with your group. Appoint one member of your group to discuss your findings with the class, and the rest of the group should help that member prepare their response. Answer each question individually from the activity, and compare with your group to prepare for our whole-class discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook. Report out on areas of disagreement or items for which you and your group identified alternative approaches. Write down and report out questions you encountered along the way for group discussion.
Model 1: Scientific Notation: a Review
Questions
- What are the components of every value written in scientific notation?
- How might you "normalize" this value by writing it with only a single digit in the one's place?
- For a binary value, every value except for 0 must have what value in the one's place when normalized?
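As a rough illustration of binary normalization, the sketch below rewrites a value in the form 1.xxx * 2^yyy. The helper name `normalize` is hypothetical and not part of the activity; it relies on Python's standard `math.frexp`, which returns a significand between 0.5 and 1, so the result is shifted by one place.

```python
import math

def normalize(x):
    """Return (significand, exponent) with x == significand * 2**exponent
    and 1 <= |significand| < 2 (hypothetical helper for illustration)."""
    if x == 0:
        return 0.0, 0  # zero has no normalized form
    m, e = math.frexp(x)   # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1    # shift one place so a single 1 sits in the one's place

print(normalize(0.375))  # (1.5, -2), i.e. 1.1 binary * 2^-2
print(normalize(20.0))   # (1.25, 4), i.e. 1.01 binary * 2^4
```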
Model 2: Single Precision Floating Point Standard
Questions
- Write 0.5 in binary by writing it in the form 1.xxx * 2^yyy. What are the exponent and the mantissa?
- Complete the conversion to floating point by adding this exponent to 127. This is called a "bias" term, and you should end up with a positive exponent, even though your original exponent was negative. Why do you think all exponents are converted to positive values in this way?
- Look up the double precision standard and list the differences between it and the single precision standard.
- Does double precision offer increased range, increased precision, or both?
- What is the approximate range of a single and a double precision floating point value?
- Using only integer MIPS instructions, write an instruction to compare two MIPS floating point values. Hint - you only need one line of code! What does this tell you about the floating point standard? Another hint - this has something to do with the normalization of the exponent by converting all exponents to positive values.
- Why isn't the initial 1 in the 1.xxx field encoded in the bits of an IEEE floating point number? What is the benefit of this?
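One way to explore the effect of the biased exponent is to look at the raw 32-bit patterns as unsigned integers. The sketch below is in Python rather than MIPS and uses the standard `struct` module; the helper name `float_bits` is an assumption made for illustration, and the comparison is only shown for non-negative values.

```python
import struct

def float_bits(x):
    """Return the 32-bit single precision pattern of x as an unsigned integer
    (hypothetical helper for illustration)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [0.0, 0.1, 0.5, 0.75, 1.0, 3.0, 20.0]
patterns = [float_bits(v) for v in values]

# For these non-negative values, the bit patterns are in the same order as the values
assert patterns == sorted(patterns)

for v, p in zip(values, patterns):
    print(f"{v:>6} -> 0x{p:08X}")
```

Because the exponent field sits above the mantissa and is stored as a biased, unsigned quantity, a larger non-negative float also has a larger integer bit pattern.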
Embedded Code Environment
You can try out some code examples in this embedded development environment! To share this with someone else, first have one member of your group make a small change to the file, then click "Open in Repl.it". Log into your Repl.it account (or create one if needed), and click the "Share" button at the top right. Note that some embedded Repl.it projects have multiple source files; you can see those by clicking the file icon on the left navigation bar of the embedded code frame. Share the link that opens up with your group members. Remember only to do this for partner/group activities!
Model 3: Representing Floating Point Values
0.0: 0 00000000 00000000000000000000000
1.0 (1.0 x 2^0): 0 01111111 00000000000000000000000
0.5 (0.1 binary = 1.0 x 2^-1): 0 01111110 00000000000000000000000
0.75 (0.11 binary = 1.1 x 2^-1): 0 01111110 10000000000000000000000
3.0 (11 binary = 1.1*2^1): 0 10000000 10000000000000000000000
-0.375 (-0.011 binary = -1.1*2^-2): 1 01111101 10000000000000000000000
1 10000011 01000000000000000000000 = - 1.01 * 2^4 = -20.0
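The encodings listed above can be checked with a short script. This is a minimal sketch using Python's standard `struct` module; the function name `fields` is hypothetical, and it simply splits the 32-bit pattern into the sign, exponent, and mantissa fields.

```python
import struct

def fields(x):
    """Format x as 'sign exponent mantissa' using its single precision bits
    (hypothetical helper for illustration)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = f"{bits:032b}"                     # 32 binary digits, zero padded
    return f"{s[0]} {s[1:9]} {s[9:]}"      # 1 sign bit, 8 exponent bits, 23 mantissa bits

for v in (0.0, 1.0, 0.5, 0.75, 3.0, -0.375, -20.0):
    print(f"{v:>7}: {fields(v)}")
```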
Questions
- Represent 1.25 as a single precision floating point value.
- What floating point value is represented by the binary field 0 01111110 00000000000000000000000?
- An exponent of 255 with a zero mantissa is considered infinity (which can be positive or negative based on the sign bit), and NaN is represented by an exponent of 255 with a non-zero mantissa. What floating point value would be represented by the binary field 0 00000000 00000000000000000000000; that is, 0 exponent and 0 mantissa? Note that this is considered a special case and, in reality, it is hard coded to 0.
- What is the distance between two floating point numbers? Is it always the same? When might you expect the gap to be larger, or smaller (which field would this depend upon)?
- Represent 0.1 as a single precision floating point value. To calculate a mantissa for any decimal value, repeatedly multiply the fractional portion by 2; if the result is greater than or equal to 1, append a 1 to the mantissa, and otherwise append a 0. Take the fractional portion of that result and repeat to fill the rest of the mantissa field. Normalize this mantissa together with any whole number portion of the float, and use the resulting shift to generate your exponent.
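The repeated doubling procedure described in the last question can be sketched as follows. The function name `fraction_bits` and its parameters are assumptions made for illustration; it only produces the binary digits after the point, so normalizing and computing the biased exponent are still left to do by hand.

```python
def fraction_bits(frac, n):
    """Return the first n binary digits after the point for 0 <= frac < 1,
    found by repeated doubling (hypothetical helper for illustration)."""
    bits = []
    for _ in range(n):
        frac *= 2
        if frac >= 1:          # a 1 carried into the one's place
            bits.append("1")
            frac -= 1          # keep only the fractional portion
        else:
            bits.append("0")
    return "".join(bits)

print(fraction_bits(0.75, 4))   # 1100
print(fraction_bits(0.1, 12))   # 000110011001
```

The digits for 0.1 keep repeating, which is one way to see that 0.1 cannot be stored exactly in a finite mantissa.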
Model 4: Addition of Floating Point Values and Loss of Precision
1.000 * 2^-1 + -1.11 * 2^-2
1.000 * 2^-1 + -0.111 * 2^-1
0.001 * 2^-1
1.000 * 2^-4
0.5 + -0.4375 = 0.0625
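The align, add, and renormalize steps shown above can be traced in code. This is a minimal sketch under simplifying assumptions: it uses exact `Fraction` arithmetic from Python's standard library, so it does not model rounding to a 23-bit mantissa, and the function name `add_aligned` is hypothetical.

```python
from fractions import Fraction

def add_aligned(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a * 2**exp_a and sig_b * 2**exp_b by aligning both to the larger
    exponent, then renormalizing (hypothetical helper; no rounding is modeled)."""
    exp = max(exp_a, exp_b)
    # Denormalize: shift each significand right by its exponent difference
    a = Fraction(sig_a) / 2 ** (exp - exp_a)
    b = Fraction(sig_b) / 2 ** (exp - exp_b)
    total = a + b
    # Renormalize so that 1 <= |total| < 2 (a zero result is left as is)
    while total != 0 and abs(total) < 1:
        total *= 2
        exp -= 1
    while abs(total) >= 2:
        total /= 2
        exp += 1
    return total, exp

# 1.000 binary * 2^-1  +  -1.11 binary * 2^-2  (1.11 binary is 7/4)
sig, exp = add_aligned(Fraction(1), -1, Fraction(-7, 4), -2)
print(sig, exp)               # 1 -4
print(float(sig) * 2 ** exp)  # 0.0625
```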
Questions
- To add floating point values, denormalize the value with the smaller exponent by shifting its mantissa so that both values share the same exponent. Then add or subtract the mantissas, and re-normalize the result. Generate two floating point values, convert them to IEEE 754 binary, and add them. Check your answer by converting the values back to decimal.
- What is the result of (-1.9*10^25 + 1)? Why?
- What is the result of -1.9*10^25 + (1.9*10^25 + 1)? How about (-1.9*10^25 + 1.9*10^25) + 1? Are they the same or different, and why?
- Is floating point arithmetic associative? That is, do you get the same results by adding floating point values when you move the parentheses?
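The two groupings from the questions above can be tried directly. Note the assumption in this sketch: Python floats are double precision rather than single precision, but 1.9*10^25 is large enough that adding 1 is lost in either format. The variable names are chosen only for illustration.

```python
big = 1.9e25

left = -big + (big + 1)    # the 1 is absorbed into big before the cancellation
right = (-big + big) + 1   # the large values cancel first, so the 1 survives

print(left)              # 0.0
print(right)             # 1.0
print(-big + 1 == -big)  # True: adding 1 does not change the stored value
```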