

## **Memory Challenge**

- Make memory appear as fast as processor
- Ideal memory:
  - Fast
  - Cheap (inexpensive)
  - Large (capacity)

But can only choose two!

### **Review: What were Memory Elements ?**

#### Memories are large blocks

A significant portion of a modern circuit is memory.

#### Memories are practical tools for system design

- Programmability, reconfigurability all require memory
- Allows you to store data and work on stored data
  - Not all algorithms are designed to process data as it comes, some require data to be stored.
  - Data type determines required storage

### How Can We Store Data

#### Flip-Flops (or Latches)

- Very fast, parallel access
- Expensive (one bit costs 20+ transistors)
- Static RAM
  - Relatively fast, only one data word at a time
  - Less expensive (one bit costs 6 transistors)
- Dynamic RAM
  - Slower, reading destroys content (refresh), one data word at a time, needs special process
  - Cheaper (one bit is only a transistor)
- Other storage technology (hard disk, flash)
  - Much slower, access takes a long time, non-volatile
  - Per bit cost is lower (no transistors directly involved)

# Locality

#### Exploit locality to make memory accesses fast

#### Temporal Locality:

- Locality in time
- If data used recently, likely to use it again soon
- How to exploit: keep recently accessed data in higher levels of memory hierarchy

#### Spatial Locality:

- Locality in space
- If data used recently, likely to use nearby data soon
- How to exploit: when access data, bring nearby data into higher levels of memory hierarchy too

### Taking Advantage of Locality

- Memory hierarchy
- Store everything on disk
- Copy recently accessed (and nearby) items from disk to smaller DRAM memory
  - Main memory
- Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
  - Cache memory attached to CPU





### Cache Terminology

- Capacity (C):
  - the number of data bytes a cache stores
- Block size (b):
  - bytes of data brought into cache at once
- Number of blocks (B = C/b):
  - number of blocks in cache: B = C/b
- Degree of associativity (N):
  - number of blocks in a set
- Number of sets (S = B/N):
  - each memory address maps to exactly one cache set





# Hits and Misses

- On cache hit, CPU proceeds normally
- On cache miss
  - Stall the CPU pipeline
  - Fetch block from next level of hierarchy
  - Instruction cache miss
    - Restart instruction fetch
  - Data cache miss
    - Complete data access

## **Miss Types**

•Compulsory: first time data accessed

- •Capacity: cache too small to hold all data of interest
- Conflict: data of interest maps to same location in

cache

### Mapping function

- There are fewer cache blocks than main memory blocks, an algorithm is needed for mapping main memory blocks into cache blocks.
- Need to determine which main memory block currently occupies a cache block.
- The choice of the mapping function dictates how the cache is organized.
- •Three techniques for mapping function:
  - Direct mapped
  - N-way set associative
  - Fully associative









|       |       | -      |      |          | Examp       |
|-------|-------|--------|------|----------|-------------|
| Word  | laddr | Binary | addr | Hit/miss | Cache block |
| 2     | 2     | 10 1   |      | Miss     | 110         |
| 000   | N     | -      |      |          |             |
| Index |       | Tag    | Dat  |          |             |
|       |       |        |      |          |             |
| 001   | N     |        |      |          |             |
| 010   | N     |        |      |          |             |
| 011   | N     |        |      |          |             |
| 100   | N     |        |      |          |             |
| 101   | N     |        |      |          |             |
| 110   | Y     | 10     | Me   | m[10110] |             |
| 111   | N     |        |      |          |             |

# Direct Mapped Cache Example

| Word addr | Binary addr | Hit/miss | Cache block |
|-----------|-------------|----------|-------------|
| 26        | 11 010      | Miss     | 010         |

| Index | V | Tag | Data       |
|-------|---|-----|------------|
| 000   | Ν |     |            |
| 001   | Ν |     |            |
| 010   | Y | 11  | Mem[11010] |
| 011   | Ν |     |            |
| 100   | Ν |     |            |
| 101   | Ν |     |            |
| 110   | Y | 10  | Mem[10110] |
| 111   | N |     |            |

| ect I | Ma    | pped   |        | ache     | Examp       |
|-------|-------|--------|--------|----------|-------------|
|       |       |        |        |          |             |
| Word  | laddr | Binary | v addr | Hit/miss | Cache block |
| 2     | 2     | 10 1   | 110    | Hit      | 110         |
| 2     | 6     | 11 0   | )10    | Hit      | 010         |
|       |       |        |        |          |             |
| Index | V     | Tag    | Dat    | ta       |             |
| 000   | Ν     |        |        |          |             |
| 001   | Ν     |        |        |          |             |
| 010   | Υ     | 11     | Ме     | m[11010] |             |
| 011   | Ν     |        |        |          |             |
| 100   | Ν     |        |        |          |             |
| 101   | Ν     |        |        |          |             |
| 110   | Y     | 10     | Me     | m[10110] |             |
| 111   | Ν     |        |        |          |             |

# **Direct Mapped Cache Example**

| Word  | addr | Binary addr |     | Hit/miss   | Cache block |  |  |
|-------|------|-------------|-----|------------|-------------|--|--|
| 16    | 6    | 10 000      |     | Miss 000   |             |  |  |
| 3     |      | 00 011      |     | Miss       | 011         |  |  |
| 16    | 6    | 10 000      |     | Hit 000    |             |  |  |
|       |      | i           |     |            |             |  |  |
| Index | V    | Tag         | Dat | a          |             |  |  |
| 000   | Υ    | 10          | Ме  | Mem[10000] |             |  |  |
| 001   | Ν    |             |     |            |             |  |  |
| 010   | Y    | 11          | Me  | m[11010]   |             |  |  |
| 011   | Y    | 00          | Ме  | m[00011]   |             |  |  |
| 100   | Ν    |             |     |            |             |  |  |
| 101   | N    |             |     |            |             |  |  |
| 110   | Y    | 10          | Me  | m[10110]   |             |  |  |
| 111   | N    |             |     |            |             |  |  |

| Word  | addr   | Binary ad | dr Hit/miss | Cache block |
|-------|--------|-----------|-------------|-------------|
| 18    | 3      | 10 010    | Miss        | 010         |
| 000   | Y      | 10        | Mem[10000]  |             |
| Index | V      | Tag       | Data        |             |
|       |        | 10        | Mem[10000]  |             |
| 001   | Ν      |           |             |             |
| 010   | Y      | 10        | Mem[10010]  |             |
| 011   | Y      | 00        | Mem[00011]  |             |
| 100   | N      |           |             |             |
| 100   |        |           |             |             |
| 100   | Ν      |           |             |             |
| -     | N<br>Y | 10        | Mem[10110]  |             |

### **Associative Caches**

#### Fully associative

- Allow a given block to go in any cache entry
- Requires all entries to be searched at once
- Comparator per entry (expensive)

#### n-way set associative

- Each set contains n entries
- Block number determines which set
  - Block number) modulo (#Sets in cache)
- Search all entries in a given set at once
- n comparators (less expensive)





### Associativity Example

Compare 4-block caches

- Direct mapped, 2-way set associative, fully associative
- Block access sequence: 0, 8, 0, 6, 8

#### Direct mapped

| Block   | Cache | Hit/miss | hiss Cache content after access |   |        |   |  |  |
|---------|-------|----------|---------------------------------|---|--------|---|--|--|
| address | index |          | 0                               | 1 | 2      | 3 |  |  |
| 0       | 0     | miss     | Mem[0]                          |   |        |   |  |  |
| 8       | 0     | miss     | Mem[8]                          |   |        |   |  |  |
| 0       | 0     | miss     | Mem[0]                          |   |        |   |  |  |
| 6       | 2     | miss     | Mem[0]                          |   | Mem[6] |   |  |  |
| 8       | 0     | miss     | Mem[8]                          |   | Mem[6] |   |  |  |
| Ů       | •     | 11100    | [0]                             |   | mom[o] |   |  |  |

### Associativity Example

#### 2-way set associative

| Block   | Cache | Hit/miss | Ca     | ess    |    |      |
|---------|-------|----------|--------|--------|----|------|
| address | index |          | Se     | et O   | Se | et 1 |
| 0       | 0     | miss     | Mem[0] |        |    |      |
| 8       | 0     | miss     | Mem[0] | Mem[8] |    |      |
| 0       | 0     | hit      | Mem[0] | Mem[8] |    |      |
| 6       | 0     | miss     | Mem[0] | Mem[6] |    |      |
| 8       | 0     | miss     | Mem[8] | Mem[6] |    |      |

#### Fully associative

| Block<br>address | Hit/miss | Cache content after access |        |        |  |  |  |  |
|------------------|----------|----------------------------|--------|--------|--|--|--|--|
| 0                | miss     | Mem[0]                     |        |        |  |  |  |  |
| 8                | miss     | Mem[0]                     | Mem[8] |        |  |  |  |  |
| 0                | hit      | Mem[0]                     | Mem[8] |        |  |  |  |  |
| 6                | miss     | Mem[0]                     | Mem[8] | Mem[6] |  |  |  |  |
| 8                | hit      | Mem[0]                     | Mem[8] | Mem[6] |  |  |  |  |

### **Replacement Policy**

- Direct mapped
  - No choice
- Associative
  - Any invalid block first
  - If all are valid, consult the replacement policy
    - Random
    - FIFO first in first out
    - LRU Least recently used
    - LFU Least frequently used

## Write-Through

- On data-write hit, could just update the block in cache
  - But then cache and memory would be inconsistent
- Write through: also update memory
- But makes writes take longer
- Solution: write buffer
  - Holds data waiting to be written to memory
  - CPU continues immediately
    - Only stalls on write if write buffer is already full

29

