COMP 3400 Lecture 8: Main Memory
This is reasonable: even if only one process is currently running, a quick context switch to a "ready" process cannot occur unless at least a portion of that process already resides in memory. And with multiple processes present, the OS cannot guarantee any of them a "reserved" location in memory.
Given those assumptions, address references contained in the binary executable code will in general not match the target physical addresses when the process runs. The address references in the program are thus logical addresses, not physical addresses. In order for the program to run correctly, logical addresses must be mapped, or bound, to physical addresses at some point. Logical addresses are also known as virtual addresses. The de-coupling of virtual and physical addresses permits a process to have a larger address space than is physically present. This in turn leads to the requirement that a process can execute while only partially loaded into memory.
Historically, address binding has been available at any of three stages:
- Compile time: if the process's memory location is known in advance, absolute code can be generated.
- Load time: if the location is not known until the program is loaded, the compiler generates relocatable code and binding is delayed until load time.
- Execution time: if the process can be moved in memory during execution, binding is delayed until run time; this requires hardware support.
Computers and OSs in the modern era use execution time binding almost exclusively. Exceptions include certain OS kernel code which resides in fixed low memory locations (e.g. interrupt vectors and handlers).
Execution time binding requires special hardware consisting at a minimum of a relocation (base) register and limit register, located in the memory-management unit (MMU). Recall these were introduced in lecture 1.
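The base-and-limit scheme can be sketched in a few lines. This is a minimal model of what the MMU does in hardware on every memory reference; the register values are illustrative, not drawn from any real machine.

```python
# Sketch of execution-time binding with a relocation (base) register and a
# limit register, as performed by the MMU on every reference.
RELOCATION_REGISTER = 0x40000   # physical start of the process (illustrative)
LIMIT_REGISTER = 0x10000        # size of the process's logical address space

def translate(logical_address):
    """Map a logical address to a physical address, trapping on overflow."""
    if logical_address >= LIMIT_REGISTER:
        # A real MMU raises a trap to the OS, which typically kills the process.
        raise MemoryError("trap: logical address beyond limit register")
    return RELOCATION_REGISTER + logical_address
```

For example, `translate(0x1234)` yields physical address `0x41234`, while `translate(0x10000)` traps. Relocating the process is just a matter of copying its memory and reloading the relocation register.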
Partitioning involves loading the entire process space into memory. Physical memory is thus partitioned into the various processes, and each process is stored in a contiguous chunk of memory.
This makes address limit checking and relocation very simple! It is not an efficient use of space, however, because different processes have different sizes.
Early partitioning techniques used fixed partitions: a fixed number of partitions, each of fixed size. Each partition held exactly one process, and the OS made an effort to fit processes to partitions.
This soon gave way to variable partitions in which the partition was only as large as the process, and could be allocated in any available contiguous hole of memory large enough to contain it. The OS must maintain a list of holes. The trick then is matching processes to holes, and several strategies emerged:
- First fit: allocate the first hole that is big enough.
- Best fit: allocate the smallest hole that is big enough.
- Worst fit: allocate the largest hole.
All these approaches result in external fragmentation: holes between processes that are too small to be usable. Fragments can be periodically removed through compaction (analogous to disk defragmentation), but this requires OS overhead.
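The hole-matching strategies are easy to state as code. This is a sketch only: holes are represented as `(start, size)` pairs, and the function names mirror the strategy names rather than any real allocator's API.

```python
# Hole-matching strategies for variable partitioning.
# A hole is a (start_address, size) pair; 'size' is the request size.

def first_fit(holes, size):
    """Return the first hole large enough, or None."""
    for hole in holes:
        if hole[1] >= size:
            return hole
    return None

def best_fit(holes, size):
    """Return the smallest hole that is large enough, or None."""
    candidates = [h for h in holes if h[1] >= size]
    return min(candidates, key=lambda h: h[1], default=None)

def worst_fit(holes, size):
    """Return the largest hole, provided it is large enough, or None."""
    biggest = max(holes, key=lambda h: h[1], default=None)
    return biggest if biggest is not None and biggest[1] >= size else None
```

Given holes `[(0, 50), (100, 20), (200, 120)]` and a 20-byte request, first fit picks `(0, 50)`, best fit picks `(100, 20)`, and worst fit picks `(200, 120)`. Note that after allocation the chosen hole must be shrunk or removed from the list, which this sketch omits.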
The fragmentation problems of variable partitioning were caused by the requirement for contiguous memory allocation. Paging allows the physical process space to be non-contiguous.
Here we consider only the mechanism, which is loading pages into frames and translating logical addresses to physical addresses. The policy, which concerns which frames to load, when to load them, where to load them and when to replace them, is covered in the virtual memory lecture.
Paging eliminates external fragmentation but introduces internal fragmentation. This is simply the unused portion of the last page of a process, on average one half of a page in size. Internal fragmentation can be reduced only by reducing the page size, which has performance costs of its own (smaller pages mean more pages per process and thus a longer page table; see below).
Another cost of paging: since any frame can be allocated to any page, the OS has to keep track of which frames are free (for future allocations) and which frames are allocated to which process (for protection -- a process must be prevented from accessing a frame allocated to a different process). It does this using a frame table with one entry per physical frame.
The hardware required to implement paged addressing includes:
- a page table, with one entry per page of the logical address space, mapping page numbers to frame numbers;
- a page-table base register (PTBR) pointing to the page table in memory;
- for performance, a translation cache (the TLB, discussed below).
Each process has its own page table. The page table (or at least a pointer to it) is part of the Process Control Block and must be saved/restored upon context switch.
The page table can be implemented using registers only if the table is very small. The benefits of a large page table (e.g. 1 million entries) are great enough that the sacrifice is made to store it in main memory instead. In this case, the base address of the page table itself is stored in a register (the PTBR). Retrieving a page table entry during translation thus requires an additional memory access.
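The translation itself is simple bit manipulation. The sketch below assumes 4 KB pages (a 12-bit displacement) and a toy page table held as a dictionary; in hardware the table lookup is the extra memory access just described.

```python
PAGE_SIZE = 4096     # 4 KB pages -> 12-bit displacement (assumed for this sketch)
OFFSET_BITS = 12

# Toy per-process page table: index = page number, value = frame number.
page_table = {0: 7, 1: 3, 2: 11}

def translate(logical_address):
    """Split a logical address and rebuild the physical address."""
    page = logical_address >> OFFSET_BITS        # high-order bits: page number
    offset = logical_address & (PAGE_SIZE - 1)   # low-order bits: displacement
    frame = page_table[page]                     # the extra memory access
    return (frame << OFFSET_BITS) | offset
```

For example, logical address `2 * 4096 + 5` (page 2, offset 5) maps through frame 11 to physical address `11 * 4096 + 5`. The displacement passes through unchanged; only the page number is replaced by a frame number.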
Protection bits can also be added to each page table entry. Two examples are:
- a valid-invalid bit, indicating whether the page is part of the process's logical address space at all;
- a read-only bit, causing a trap if the process attempts to write to a read-only page.
If multiple processes are running the same application, it is advantageous to keep only one shared copy of the application's binary code in memory. The page tables for those processes will all contain entries pointing to the same set of shared frames occupied by the application.
With the page table in main memory, every logical memory reference now costs two physical accesses. How can we avoid doubling the effective memory access time? The answer is a small associative cache of recently-accessed page table entries called the Translation Look-aside Buffer (TLB).
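The TLB's role can be sketched as a lookup performed before the page table. The dictionary here is a stand-in for a real TLB, which is a small associative hardware cache searched in parallel; the variable names are illustrative.

```python
# Sketch of translation with a TLB consulted before the in-memory page table.
tlb = {}                            # stand-in for the associative hardware cache
page_table = {0: 7, 1: 3, 2: 11}    # toy in-memory page table

def lookup_frame(page):
    """Return the frame for a page, using the TLB when possible."""
    if page in tlb:
        return tlb[page]            # TLB hit: no memory access for the table
    frame = page_table[page]        # TLB miss: extra access to the page table
    tlb[page] = frame               # cache the translation for future references
    return frame
```

Because of locality of reference, most lookups hit the TLB after a short warm-up, so the extra page-table access is paid only on misses. A real TLB also has a fixed capacity and must evict entries, and must be flushed or tagged on a context switch since each process has its own page table; this sketch omits both.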
Page table size is a huge concern. Consider a 32-bit virtual address with a reasonable 4 KB page size; the displacement field uses the low-order 12 bits, leaving 20 for the page number. The page table thus has 2^20 (over 1 million) entries, with each entry requiring 4 bytes or more -- 4 MB of RAM per process!
Wait, it gets worse... the page table must be stored in contiguous locations to allow the page number to be used as an index!
Solution? Page the page table! Instead of having one page table with 2^20 entries, you could have 2^10 page tables each with 2^10 entries, i.e. 1024 page tables of 1024 entries each. The contiguous storage requirement then drops from 4 MB to 4 KB (each piece could be stored in one frame).
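Under this scheme the 20-bit page number is itself split in two: the high 10 bits index the outer page table, and the next 10 bits index the selected page of the page table. A sketch of the split, assuming the 32-bit/4 KB layout above:

```python
# Two-level paging address split for a 32-bit address with 4 KB pages:
# | 10-bit outer index | 10-bit inner index | 12-bit displacement |

def split(logical_address):
    """Decompose a 32-bit logical address for two-level paging."""
    offset = logical_address & 0xFFF          # low 12 bits: displacement
    inner = (logical_address >> 12) & 0x3FF   # next 10 bits: entry within a
                                              # page of the page table
    outer = logical_address >> 22             # high 10 bits: outer-table entry
    return outer, inner, offset
```

For instance, an address built as `(5 << 22) | (9 << 12) | 0x34` splits into outer index 5, inner index 9, and displacement `0x34`: outer entry 5 locates the page-table page, whose entry 9 holds the frame number.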
Advantage? Page table can be split up and stored in non-contiguous frames, facilitating memory management.
Disadvantage? Now two memory accesses are required to find the frame number, one to access the outer page table and a second to access the page table page. This can be overcome using a TLB, since a TLB hit would eliminate both accesses.
Can this be extended to 3-levels? 4? Sure -- the Motorola 68030, used in Macintoshes for years, implemented 4-level paging.
Paging is fine, but it results in a memory organization that bears no resemblance to the programmer's view of a process: functions, classes, modules, data, and so forth.
Organizing memory by segmentation means thinking of the logical address space as a collection of segments.
Advantage of segmentation: it facilitates sharing and protection. For read-only contents, such as a program module, several processes can share one copy of a memory-resident segment. Access to and manipulation of a segment can be controlled through protection bits stored in its single segment table entry.
Disadvantage of segmentation: external fragmentation of memory, since allocation is based on variable segment sizes.
The advantages of segmentation are considerable, but how can we control the fragmentation problem? You guessed it: page the segments! We relax the requirement that a segment be stored in contiguous memory. A generic solution defines a segment table in which each entry points to the page table for that segment. You should be able to figure it out from there.
We will not go into the details of this solution. However you should be aware that this is not just of theoretical interest; the Intel Pentium processor implements the technique of paged segments.