View on GitHub

CS274

Computer Architecture

CS274: Computer Architecture - The Memory Hierarchy: Cache Design

Activity Goals

The goals of this activity are:

To design a memory architecture that does not starve the CPU despite the memory hierarchy
To exploit cache design using the principles of locality

Supplemental Reading

Feel free to visit these resources for supplemental background reading material.

The Activity

Directions

Consider the activity models and answer the questions provided. First reflect on these questions on your own briefly, before discussing and comparing your thoughts with your group. Appoint one member of your group to discuss your findings with the class, and the rest of the group should help that member prepare their response. Answer each question individually from the activity, and compare with your group to prepare for our whole-class discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook. Report out on areas of disagreement or items for which you and your group identified alternative approaches. Write down and report out questions you encountered along the way for group discussion.

Model 1: The Principle of Locality

Questions

There are two principles of locality: Temporal Locality (referring to a variable or instruction repeatedly in a short period of time) and Spatial Locality (referring to nearby memory addresses following a particular access). Identify all the examples of each in the code example above.
What would happen to our locality if we switched the two for loop lines? This is called switching from a row-major order to a column-major order.
What would happen to our locality if N and M became very large?

Model 2: The Memory Hierarchy

The Memory Mountain from CS:APP by Bryant and O'Hallaron

Questions

What do you think processor registers are made of? Why might computer memory and disk be made of a different material?
Since cache (and, for that matter, processor registers) is so much smaller than primary memory, how can we make efficient use of the space given the principle of locality? For example, should we randomly store data in cache, or take a more strategic approach?
What do you think stride refers to?

Model 3: Exploiting Temporal Locality with the Direct Mapped Cache

Direct Mapped Cache Design with Address Bits

Questions

Which address bits are used to determine which block a word maps to?
Given a cache with 1024 entries, how many bits would determine the block number (a.k.a. row, or index)?
What is the purpose of the tag bits? Why is it necessary to store them?
Why do you think the last two bits of the address are ignored?

Model 4: Direct Mapped Cache Example

Index	Valid	Tag	Data
000
001
010
011
100
101
110
111

Address	Binary	Cache Block	Hit/Miss?
22	10110	110
26	11010	010
22	10110	110
26	11010	010
16	10000	000
3	00011	011
16	10000	000
18	10010	010

Questions

Fill in the table above, assuming an 8-word direct mapped cache.
How many hits and misses were there?
Why were these cache replacements particularly unfortunate?
How could we increase the duration of temporal locality in this cache? That is, what could we do to enable these words to remain in cache longer?

Model 5: Write Policies to Avoid Stale Data

Questions

What is the benefit and drawback of writing data to both cache and to main memory (and every layer in between) whenever a write occurs? This is known as a "write-through" strategy.
What would need to change if a write-back strategy was used instead, in which data was only written to the top layer of cache. What would we need to do to ensure this data is not lost, and when would we need to copy it to the next lower layer?

Model 6: Adding Spatial Locality Multiword Blocks

Questions

What is different about this design?
How does this approach enable spatial locality?
To what block and index does address 67 map?
What other words would be loaded into this block?

Model 7: Mitigating Cache Conflicts with Associative Caches

Questions

Draw a 1 MB cache that is 4-way set associative, with 8 words per block.
Map address 37 to this cache, and indicate which other addresses would be loaded.
Show the tag and index for address 37.
Does a fully associative cache need a block offset, an index, and/or a tag? Why or why not?

Model 8: Cache Replacement Strategies

Draw a direct mapped, two-way set associative, and fully associative cache with 4 words total.

Questions

Map these addresses and count the number of hits and misses: 0, 8, 0, 6, 8.
How does your hit rate change for 8 or 16 word caches?
When you must replace a block, how can you decide which set to replace? What is your rationale for doing so?

Model 9: Reducing the Miss Penalty with Multi-Level Caches

Suppose you have a base CPI of 1 (assuming L1 cache hits) and a 5 GHz CPU clock (0.2 ns per CPU cycle), 100 ns main memory access time, a 2% miss rate at L1, a 0.5% miss rate at L2.

Questions

What is your CPI with and without your secondary cache?

Submission

I encourage you to submit your answers to the questions (and ask your own questions!) using the Class Activity Questions discussion board. You may also respond to questions or comments made by others, or ask follow-up questions there. Answer any reflective prompt questions in the Reflective Journal section of your OneNote Classroom personal section. You can find the link to the class notebook on the syllabus.