COMP 3400 Lecture 8: Main Memory


The overall goal

Every fetch/execute cycle requires instructions/data to be transferred to/from main memory. The goal of memory management is to satisfy these requests as efficiently as possible in a multiprogramming environment. Our initial focus is on mechanism over policy: the CPU scheduler decides which process will run next; the memory manager ensures the process is loaded in main memory.

Mapping logical to physical addresses

We start with three assumptions:
  1. a process' instructions are in binary executable format
  2. the memory unit will at any given time contain instructions and data from more than one process.
  3. a process can be loaded into any available memory location

This is reasonable: even if only one process is currently running, a quick context switch to a "ready" process cannot occur unless at least a portion of that process is already contained in memory. And if there are multiple processes, the OS will not be able to guarantee them a "reserved" location in memory.

Given those assumptions, address references contained in the binary executable code will in general not match the target physical addresses when the process runs. The address references in the program are thus logical addresses, not physical addresses. In order for the program to run correctly, logical addresses must be mapped, or bound, to physical addresses at some point. Logical addresses are also known as virtual addresses. Decoupling virtual and physical addresses permits a process to have a larger address space than is physically present. This in turn leads to the requirement that a process be able to execute while only partially loaded into memory.

Historically, address binding has been available at any of three steps: compile time, load time, and execution time.

Computers and OSs in the modern era use execution time binding almost exclusively. Exceptions include certain OS kernel code which resides in fixed low memory locations (e.g. interrupt vectors and handlers).

Execution time binding requires special hardware consisting at a minimum of a relocation (base) register and limit register, located in the memory-management unit (MMU). Recall these were introduced in lecture 1.
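
The base/limit mechanism can be sketched in a few lines of Python. This is a toy model, not real MMU hardware: the class name `MMU`, the exception `AddressingError` (standing in for the hardware trap to the OS), and the register values are all hypothetical. The point is the order of operations on every memory reference: check against the limit, then add the base.

```python
class AddressingError(Exception):
    """Stands in for the hardware trap raised on a limit violation."""


class MMU:
    """Toy model of a relocation (base) register plus a limit register."""

    def __init__(self, base: int, limit: int) -> None:
        self.base = base    # physical address where the process was loaded
        self.limit = limit  # size of the process's logical address space

    def translate(self, logical: int) -> int:
        # Check the logical address against the limit register first...
        if not 0 <= logical < self.limit:
            raise AddressingError(f"logical address {logical} outside [0, {self.limit})")
        # ...then relocate it by adding the base register.
        return self.base + logical


mmu = MMU(base=14000, limit=3000)
print(mmu.translate(346))  # 14346
```

Moving the process elsewhere in memory only requires loading a new value into the base register; the program's logical addresses never change.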

Relocation with a base and limit register is simple and fast, but it only works if memory for the entire process is allocated in one contiguous piece. This makes poor use of memory, so modern operating systems permit process memory to be split up. Memory allocation mechanisms are discussed next.

Partitioned Allocation

Partitioning involves loading the entire process space into memory. Physical memory is thus partitioned into the various processes, and each process is stored in a contiguous chunk of memory.

This makes address limit checking and relocation very simple! However, it does not use space efficiently, because processes differ in size.

Early partitioning techniques used fixed partitions: a fixed number of partitions, each of a fixed size. Each partition held exactly one process, and the OS made an effort to fit processes to partitions.

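This soon gave way to variable partitions, in which each partition is only as large as its process and can be allocated in any available contiguous hole of memory large enough to contain it. The OS must maintain a list of holes. The trick then is matching processes to holes, and several strategies emerged: first fit (allocate the first hole that is large enough), best fit (allocate the smallest hole that is large enough), and worst fit (allocate the largest hole).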

Best fit uses space efficiently but creates small holes that are hard to fill and can add up to lots of wasted space. Worst fit creates large holes that are more likely to be used by later processes. First fit allocates faster because there is little decision logic. Note these are all policy decisions.
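
The three placement strategies can be sketched in Python over a list of holes. The `(start, size)` hole representation, the function names, and the sample hole sizes are assumptions for illustration; a real allocator would also merge adjacent holes when a process exits.

```python
from typing import Callable, List, Optional, Tuple

Hole = Tuple[int, int]  # (start address, size) of one free hole
Strategy = Callable[[List[Hole], int], Optional[int]]


def first_fit(holes: List[Hole], request: int) -> Optional[int]:
    """Index of the first hole large enough, scanning from the start of the list."""
    for i, (_, size) in enumerate(holes):
        if size >= request:
            return i
    return None


def best_fit(holes: List[Hole], request: int) -> Optional[int]:
    """Index of the smallest hole that is large enough."""
    fits = [i for i, (_, size) in enumerate(holes) if size >= request]
    return min(fits, key=lambda i: holes[i][1], default=None)


def worst_fit(holes: List[Hole], request: int) -> Optional[int]:
    """Index of the largest hole that is large enough."""
    fits = [i for i, (_, size) in enumerate(holes) if size >= request]
    return max(fits, key=lambda i: holes[i][1], default=None)


def allocate(holes: List[Hole], request: int, strategy: Strategy) -> Optional[int]:
    """Carve `request` units out of the chosen hole; return its start or None."""
    i = strategy(holes, request)
    if i is None:
        return None  # no hole is large enough
    start, size = holes[i]
    if size == request:
        del holes[i]  # the hole is used up exactly
    else:
        holes[i] = (start + request, size - request)  # leftover remains a smaller hole
    return start


holes = [(0, 100), (200, 500), (800, 200), (1200, 300)]
for strategy in (first_fit, best_fit, worst_fit):
    print(strategy.__name__, allocate(list(holes), 212, strategy))
# first_fit 200, best_fit 1200, worst_fit 200
```

Note how best fit leaves an 88-unit sliver at address 1412, exactly the kind of small hole the paragraph above warns about, while worst fit leaves a 288-unit hole that a later process may still use.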

All these approaches result in external fragmentation: holes between processes that are too small to be usable. Fragments can be periodically removed through compaction (analogous to disk defragmenting), but this requires OS overhead.
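
Compaction can be sketched as sliding every allocated block toward address 0 so that all the free space coalesces into one hole at the top. This is a simplified model under stated assumptions: the `(pid, start, size)` block layout and the `compact` function are hypothetical, and it only computes new positions rather than copying memory contents.

```python
from typing import List, Tuple

Block = Tuple[str, int, int]  # (process id, start address, size)


def compact(blocks: List[Block], memory_size: int) -> Tuple[List[Block], Tuple[int, int]]:
    """Slide every allocated block toward address 0, keeping their order.

    Returns the relocated blocks and the single hole (start, size) left at the top.
    """
    relocated: List[Block] = []
    next_free = 0
    for pid, _old_start, size in sorted(blocks, key=lambda b: b[1]):
        # A real OS would copy the block's contents here (the overhead);
        # with execution-time binding the process then just gets a new base register value.
        relocated.append((pid, next_free, size))
        next_free += size
    return relocated, (next_free, memory_size - next_free)


blocks = [("A", 0, 300), ("B", 500, 200), ("C", 900, 100)]
# Before: three 200-unit holes (300-499, 700-899, 1000-1199); none fits a 400-unit request.
print(compact(blocks, 1200))  # afterwards, one 600-unit hole starting at 600
```

The 600 free units existed all along; compaction just makes them usable as a single hole.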


Paged Allocation

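The fragmentation problems of variable partitioning were caused by the requirement for contiguous memory allocation. Paging allows the physical process space to be non-contiguous: the logical address space is divided into fixed-size blocks called pages, physical memory is divided into blocks of the same size called frames, and any page can be loaded into any frame.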

Here we consider only the mechanism, which is loading pages into frames and translating logical addresses to physical addresses. The policy, which concerns which frames to load, when to load them, where to load them and when to replace them, is covered in the virtual memory lecture.

Paging eliminates external fragmentation but introduces internal fragmentation. This is simply the unused portion of the last page of a process, on average half a page in size. Internal fragmentation can be reduced only by reducing the page size, which has performance costs of its own (each process then has more pages and thus a longer page table, see below).
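
The trade-off is easy to see with a little arithmetic. This sketch assumes a hypothetical process of 72,766 bytes; the function name is illustrative.

```python
from typing import Tuple


def internal_fragmentation(process_size: int, page_size: int) -> Tuple[int, int]:
    """Return (pages needed, bytes wasted in the last page)."""
    pages = -(-process_size // page_size)  # ceiling division without floats
    return pages, pages * page_size - process_size


print(internal_fragmentation(72_766, 2048))  # (36, 962)
print(internal_fragmentation(72_766, 512))   # (143, 450)
```

Shrinking pages from 2048 to 512 bytes cuts the waste from 962 to 450 bytes, but the page table grows from 36 to 143 entries.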

Another cost of paging: since any frame can be allocated to any page, the OS has to keep track of which frames are available for future allocations, plus which frames are allocated to which process (for protection: preventing a process from accessing a frame allocated to a different process). It keeps track using a frame table with one entry per physical frame.
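
A frame table can be sketched as a list indexed by frame number. This is a minimal model under assumptions: each entry holds either `None` (free) or the owning process id, frames are handed out by a simple first-free scan, and shared frames are ignored. The class and method names are hypothetical.

```python
from typing import List, Optional


class FrameTable:
    """One entry per physical frame: None if free, else the id of the owning process."""

    def __init__(self, num_frames: int) -> None:
        self.owner: List[Optional[int]] = [None] * num_frames

    def allocate(self, pid: int) -> int:
        """Give the first free frame to process `pid` and return its number."""
        for frame, owner in enumerate(self.owner):
            if owner is None:
                self.owner[frame] = pid
                return frame
        raise RuntimeError("no free frames")

    def release_all(self, pid: int) -> None:
        """Return every frame owned by `pid` to the free pool (e.g. when it exits)."""
        for frame, owner in enumerate(self.owner):
            if owner == pid:
                self.owner[frame] = None

    def may_access(self, pid: int, frame: int) -> bool:
        """Protection check: a process may touch only frames it owns."""
        return self.owner[frame] == pid


ft = FrameTable(4)
a, b = ft.allocate(pid=1), ft.allocate(pid=2)
print(a, b, ft.may_access(2, a))  # 0 1 False
```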

Hardware support for paged address translation

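The hardware required to implement paged addressing includes a logical address register divided into page number and page displacement fields, a page table, and a physical address register divided into frame number and frame displacement fields.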

The translation process is simple (the steps are numbered for reference only; they are not actually performed one after another):
  1. logical address is loaded into logical address register
  2. page displacement is copied to frame displacement
  3. page number is used to index into page table
  4. page table entry is copied to frame number
  5. physical address used to access memory
An example would be good here.
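
As a worked example, here is the translation sketched in Python, assuming 4 KB pages (so the low-order 12 bits are the displacement) and a hypothetical four-entry page table. Logical address 0x1ABC splits into page 1 and displacement 0xABC; page 1 maps to frame 2, so the physical address is 0x2ABC.

```python
from typing import List

OFFSET_BITS = 12                 # assumed 4 KB pages: low-order 12 bits are the displacement
PAGE_SIZE = 1 << OFFSET_BITS


def translate(logical: int, page_table: List[int]) -> int:
    page_number = logical >> OFFSET_BITS        # high-order bits select the page
    displacement = logical & (PAGE_SIZE - 1)    # copied unchanged to the frame displacement
    frame_number = page_table[page_number]      # index into the page table
    return (frame_number << OFFSET_BITS) | displacement  # frame number + displacement


page_table = [5, 2, 7, 0]   # hypothetical: page 0 -> frame 5, page 1 -> frame 2, ...
print(hex(translate(0x1ABC, page_table)))  # 0x2abc
print(hex(translate(0x2010, page_table)))  # 0x7010
```

Because the page size is a power of 2, splitting the address is just a shift and a mask, which is why the hardware can do it with no arithmetic at all.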

Implementing the page table

Each process has its own page table. The page table (or at least a pointer to it) is part of the Process Control Block and must be saved/restored upon context switch.

The page table can be implemented using registers only if the table is very small. Large page tables (e.g. 1 million entries) are far too big for registers, but their benefits are great enough to justify storing them in main memory. In this case, the base address of the page table itself is stored in a register (the page-table base register). Getting the page table entry thus requires an additional memory access to accomplish steps 3 and 4 above:

  1. add the page number, multiplied by the size of a page table entry, to the page table base address
  2. fetch contents of the resulting address, which is the desired frame number
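
The extra memory access can be made visible with a toy memory that counts reads. Everything here is a hypothetical model: `MainMemory`, the 4-byte entry size, the 4 KB page size, and the base address 0x100 are assumptions chosen for illustration.

```python
from typing import Dict, List

OFFSET_BITS = 12  # assumed 4 KB pages
ENTRY_SIZE = 4    # assumed 4-byte page table entries


class MainMemory:
    """Toy main memory: a dict from address to stored value that counts accesses."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}
        self.accesses = 0

    def read(self, address: int) -> int:
        self.accesses += 1
        return self.cells[address]


def load_page_table(mem: MainMemory, ptbr: int, frames: List[int]) -> None:
    """Store a page table contiguously, starting at the address held in the base register."""
    for page, frame in enumerate(frames):
        mem.cells[ptbr + page * ENTRY_SIZE] = frame


def read_logical(mem: MainMemory, ptbr: int, logical: int) -> int:
    page = logical >> OFFSET_BITS
    offset = logical & ((1 << OFFSET_BITS) - 1)
    frame = mem.read(ptbr + page * ENTRY_SIZE)        # access 1: fetch the page table entry
    return mem.read((frame << OFFSET_BITS) | offset)  # access 2: fetch the data itself


mem = MainMemory()
load_page_table(mem, ptbr=0x100, frames=[5, 2])
mem.cells[0x2ABC] = 42                                 # data stored in frame 2
print(read_logical(mem, 0x100, 0x1ABC), mem.accesses)  # 42 2
```

One logical read costs two physical reads, which is exactly the problem the next section addresses.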

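Protection bits can also be added to each page table entry. Two common examples are a valid/invalid bit, which indicates whether the page is part of the process's logical address space, and a read-only/read-write bit, which indicates whether the page may be written.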
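
A page table entry with protection bits might be modeled as below. This sketch assumes two bits, a valid bit and a writable bit; the field names, `check_access`, and `ProtectionFault` (standing in for the trap to the OS) are hypothetical.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Stands in for the trap to the OS on an illegal access."""


@dataclass
class PageTableEntry:
    frame: int
    valid: bool = True       # page is part of the process's logical address space
    writable: bool = False   # read-only unless marked read-write


def check_access(entry: PageTableEntry, write: bool) -> int:
    """Return the frame number if the access is allowed, else trap."""
    if not entry.valid:
        raise ProtectionFault("access to invalid page")
    if write and not entry.writable:
        raise ProtectionFault("write to read-only page")
    return entry.frame


code_page = PageTableEntry(frame=3)                 # e.g. program code: read-only
data_page = PageTableEntry(frame=8, writable=True)  # e.g. data: read-write
print(check_access(code_page, write=False), check_access(data_page, write=True))  # 3 8
```

The check happens alongside translation, so protection adds no extra memory access.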

If multiple processes are running the same application, it is advantageous to keep only one shared copy of the application's binary code in memory. The page tables for those processes will all contain entries pointing to the same set of shared frames occupied by the application.
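
Sharing falls out of the page table structure directly. In this sketch the frame numbers are hypothetical: frames 3 and 4 hold the single copy of the application's code, and each process adds its own private data frames. In practice the shared code pages would also be marked read-only so one process cannot modify code another is running.

```python
from typing import List

SHARED_CODE_FRAMES = [3, 4]  # hypothetical frames holding the one copy of the program's code


def build_page_table(private_data_frames: List[int]) -> List[int]:
    """Code pages map to the shared frames; data pages map to frames private to this process."""
    return SHARED_CODE_FRAMES + private_data_frames


p1 = build_page_table([10, 11])
p2 = build_page_table([12])
frames_in_use = set(p1) | set(p2)
print(p1, p2, len(frames_in_use))  # [3, 4, 10, 11] [3, 4, 12] 5
```

Two processes use 5 frames instead of the 7 they would need with separate copies of the code.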

Speeding up the translation process

As described just above, the logical-to-physical address translation itself requires a memory access to get the page table entry. Thus each memory request requires two memory accesses. Can this process be made faster? You know the answer.

The answer is a small associative cache of recently-accessed page table entries called the Translation Look-aside Buffer (TLB).

Address translation now becomes:
  1. logical address is loaded into logical address register
  2. page displacement is copied to frame displacement of physical address register
  3. page number is used to search TLB
  4. If TLB hit, copy frame number into physical address
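  5. If TLB miss, use the page number to index into the page table, copy the page table entry to the frame number, and load the entry into the TLB (replacing an existing entry if the TLB is full)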
  6. physical address used to access memory
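
The hit/miss logic above can be sketched with an ordered dictionary. Real TLBs are hardware that searches all entries in parallel; here the tiny capacity and the least-recently-used (LRU) replacement policy are assumptions chosen for illustration, and `TLB` is a hypothetical class.

```python
from collections import OrderedDict
from typing import List


class TLB:
    """Tiny associative cache of page -> frame entries with LRU replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()  # ordered oldest -> newest use
        self.hits = 0
        self.misses = 0

    def lookup(self, page: int, page_table: List[int]) -> int:
        if page in self.entries:                # TLB hit: no page table access needed
            self.hits += 1
            self.entries.move_to_end(page)      # mark as most recently used
            return self.entries[page]
        self.misses += 1                        # TLB miss: consult the page table in memory
        frame = page_table[page]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        self.entries[page] = frame
        return frame


tlb = TLB(capacity=2)
page_table = [5, 2, 7, 0]
frames = [tlb.lookup(p, page_table) for p in [0, 1, 0, 2, 1]]
print(frames, tlb.hits, tlb.misses)  # [5, 2, 5, 7, 2] 1 4
```

With only two entries this reference string mostly misses; real programs show much more locality, which is why even a small TLB has a high hit ratio in practice.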

Multi-level Paging

Page table size is a huge concern. Consider a 32 bit virtual address with a reasonable 4KB page size; the displacement field uses the low order 12 bits, leaving 20 for the page number. The page table thus has 2^20 (over 1 million) entries, with each entry requiring 4 bytes or more -- 4 MB of RAM per process!
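
The same arithmetic as a small Python function, using the figures from the paragraph above (32-bit addresses, 4 KB pages, 4-byte entries); the function name is illustrative.

```python
from typing import Tuple


def page_table_size(address_bits: int, page_size: int, entry_bytes: int) -> Tuple[int, int]:
    """Return (number of entries, bytes) for a single-level page table."""
    offset_bits = page_size.bit_length() - 1   # page size is a power of 2: 4096 -> 12
    entries = 2 ** (address_bits - offset_bits)
    return entries, entries * entry_bytes


print(page_table_size(32, 4096, 4))  # (1048576, 4194304): 2^20 entries, 4 MB
```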

Wait, it gets worse...the page table must be stored in contiguous locations to allow page number to be used as index!

Solution? Page the page table! Instead of having one page table with 2^20 entries, you could have, say, 2^10 page tables each with 2^10 entries, e.g. 1024 page tables of 1024 entries each. The contiguous storage requirement then drops from 4MB to 4KB (small enough to be stored in one frame).

Advantage? The page table can be split up and stored in non-contiguous frames, facilitating memory management.

Disadvantage? Now two memory accesses are required to find the frame number, one to access the outer page table and a second to access the page table page. This can be overcome using a TLB, since a TLB hit would eliminate both accesses.
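
The 10/10/12 split can be sketched as a two-level walk. In this model, pages of the page table that have not been allocated are represented as `None` (an assumption), which shows the advantage: only the pieces of the page table actually in use need frames. The function names and table contents are hypothetical.

```python
from typing import List, Optional, Tuple

OFFSET_BITS = 12  # 4 KB pages
INNER_BITS = 10   # 1024 entries in each page of the page table
# The remaining 10 high-order bits index the outer page table.


def split(logical: int) -> Tuple[int, int, int]:
    """Split a 32-bit logical address into (outer index, inner index, displacement)."""
    displacement = logical & ((1 << OFFSET_BITS) - 1)
    inner = (logical >> OFFSET_BITS) & ((1 << INNER_BITS) - 1)
    outer = logical >> (OFFSET_BITS + INNER_BITS)
    return outer, inner, displacement


def translate_two_level(logical: int, outer_table: List[Optional[List[int]]]) -> int:
    outer, inner, displacement = split(logical)
    inner_table = outer_table[outer]      # memory access 1: outer page table
    if inner_table is None:
        raise LookupError("no page of the page table allocated for this region")
    frame = inner_table[inner]            # memory access 2: page of the page table
    return (frame << OFFSET_BITS) | displacement


# Hypothetical tables: only one of the 1024 page-table pages actually exists.
outer_table: List[Optional[List[int]]] = [None] * 1024
inner_page = [0] * 1024
inner_page[3] = 77                 # page (outer 1, inner 3) lives in frame 77
outer_table[1] = inner_page
print(split(0x4030AB), hex(translate_two_level(0x4030AB, outer_table)))  # (1, 3, 171) 0x4d0ab
```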

Can this be extended to 3-levels? 4? Sure -- the Motorola 68030, used in Macintoshes for years, implemented 4-level paging.


Alternative to paging: Segmentation

Paging is fine but results in memory organization that bears no resemblance to process structures. Examples of process structures are functions, classes, modules, data, and so forth.

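Organizing memory by segmentation means thinking of the logical address space as a collection of segments. A logical address then consists of a segment number and an offset within that segment, and each process has a segment table whose entries hold the base address and limit (length) of each segment.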
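
Assuming each logical address is a (segment number, offset) pair and each segment table entry holds a base and a limit, translation is essentially the base/limit check applied per segment. The table values and the names `translate_segmented` and `SegmentationFault` are hypothetical.

```python
from typing import List, Tuple


class SegmentationFault(Exception):
    """Stands in for the trap raised when an offset exceeds the segment limit."""


# Hypothetical segment table: entry s holds (base, limit) for segment s.
segment_table: List[Tuple[int, int]] = [(1400, 1000), (6300, 400), (4300, 400), (3200, 1100)]


def translate_segmented(segment: int, offset: int, table: List[Tuple[int, int]]) -> int:
    base, limit = table[segment]
    if not 0 <= offset < limit:   # the offset must fall inside the segment
        raise SegmentationFault(f"offset {offset} beyond limit {limit} of segment {segment}")
    return base + offset


print(translate_segmented(2, 53, segment_table))   # 4353
print(translate_segmented(3, 852, segment_table))  # 4052
```

A reference to offset 1222 of segment 0 would trap, since that segment is only 1000 units long.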

Advantage of segmentation: it facilitates sharing and protection. For read-only contents, such as a program module, several processes can share a single memory-resident copy of the segment. Access to a segment can be controlled through protection bits stored in its segment table entry.

Disadvantage of segmentation: external fragmentation of memory, since allocation is based on variable segment sizes.


The ultimate: Paged Segmentation

The advantages of segmentation are considerable, but how can we control the fragmentation problem? You guessed it: page the segments! We relax the requirement that a segment be stored in contiguous memory. A generic solution involves defining a segment table where each entry points to the page table for that segment. You should be able to figure it out from there.

We will not go into the details of this solution. However you should be aware that this is not just of theoretical interest; the Intel Pentium processor implements the technique of paged segments.
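
For the generic scheme only (this is not a model of the Pentium's actual mechanism), a minimal sketch: each segment table entry holds a pointer to that segment's page table plus the segment length, and the offset within the segment is then split into page number and displacement. The 4 KB page size, the table contents, and all names are assumptions for illustration.

```python
from typing import List, Tuple

OFFSET_BITS = 12  # assumed 4 KB pages within each segment


class SegmentationFault(Exception):
    """Stands in for the trap on an out-of-range segment offset."""


# Hypothetical segment table entry: (page table for the segment, segment length).
SegmentEntry = Tuple[List[int], int]


def translate_paged_segment(segment: int, offset: int, table: List[SegmentEntry]) -> int:
    page_table, limit = table[segment]          # segment entry points to its page table
    if not 0 <= offset < limit:
        raise SegmentationFault(f"offset {offset} beyond limit {limit}")
    page = offset >> OFFSET_BITS                 # page within the segment
    displacement = offset & ((1 << OFFSET_BITS) - 1)
    return (page_table[page] << OFFSET_BITS) | displacement


segment_table: List[SegmentEntry] = [([7, 3], 6000), ([12], 100)]
print(translate_paged_segment(0, 5000, segment_table))  # 13192: page 1 -> frame 3, displacement 904
print(translate_paged_segment(1, 50, segment_table))    # 49202: page 0 -> frame 12
```

Segment 0 spans two pages stored in non-contiguous frames 7 and 3, so the segment keeps its logical identity without needing a contiguous hole.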



Peter Sanderson (PSanderson@otterbein.edu)