C SC 340 Lecture 9: Virtual Memory
What is virtual memory?
Recall that the previous lecture concerned organizing memory and
translating logical memory addresses to physical addresses. The following
line of reasoning motivates virtual memory.
1. The OS should support multiprogramming so the CPU will not be idle while a
process is waiting on I/O or some other event.
2. An instruction can be executed in a fetch/decode/execute cycle only if it
is stored in (main) memory.
3. As a result of 1 and 2, it is necessary to keep more than one process in memory
at a time.
4. However, it is not possible to store more than one process beginning
at memory location 0 (or some other fixed address) at the same time.
5. As a result of 4, it is not possible for a compiler to generate absolute
addresses, because it does not know at what address the process will be loaded.
6. As a result of 5, the compiler generates logical addresses, which assume a starting
address of 0.
7. Address translation hardware is then used to translate each logical address
into a physical address at runtime.
8. There is no good reason to prevent the logical address space from exceeding the physical address space.
9. There is no good reason to require that an entire process be in memory while it is
running - vast parts of it are unused at any given time, or possibly not used at all!
10. If we allow both 8 and 9, then processes can be larger than the total
physical memory! Programs and processes thus occupy virtual memory.
Point #9 is the final key to virtual memory. By allowing a partially-loaded process
to run, the OS:
- loads processes faster, since only a portion needs to be loaded
- allows the programmer to compose programs (including data) larger than physical memory
- allows more processes to be memory-resident simultaneously, and therefore a higher
degree of multiprogramming
Our study of virtual memory is limited to systems that use paged memory allocation.
Assume the
translation hardware consists of registers for the virtual and physical addresses plus a basic
page table. A TLB and the various page table organizations are compatible with this
but are not relevant to the topics at hand.
So, what are the "topics at hand"?
The major topics we cover here are
- how to recognize that a new page needs to be read into memory
- how much time it takes to perform a memory access when a new page needs to be read into memory
- which page to replace if a new page is needed but no frames are available
- how many frames to allocate to a process
- what thrashing is and how to avoid it
Demand Paging
Demand paging refers to the practice of not loading a page into memory
until it is needed by the running process, i.e. the page is loaded on demand. If
no pages are initially loaded for a new process, it is called pure demand
paging.
The mechanism for determining that a page needs to be loaded is relatively simple:
- Each page table entry has a valid-invalid bit.
- Valid means the page is
loaded into memory and the frame number is correct.
- Invalid means either
the page number is beyond the process address space (which is a program error),
or the page is not currently loaded into a frame.
- The address translation hardware traps to the OS if the page table entry's bit is set to invalid.
- OS trap handler determines which condition has occurred:
- if the page was outside process address space, terminate process
- if the page was not loaded, then page it in
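The trap handler's decision logic above can be sketched in Python. This is a minimal sketch with names of my own invention; a real handler runs in the kernel and also deals with finding or freeing frames:

```python
VALID, INVALID = True, False

class Process:
    def __init__(self, num_pages):
        self.address_space_pages = num_pages
        # page table entry: (frame number or None, valid-invalid bit)
        self.page_table = {p: (None, INVALID) for p in range(num_pages)}
        self.terminated = False

def handle_invalid_page(page, proc, allocate_frame):
    """Trap handler: decide between 'program error' and 'page it in'."""
    if page >= proc.address_space_pages:
        proc.terminated = True           # reference outside address space
    else:
        frame = allocate_frame()         # find (or free up) a physical frame
        # ...the page contents would be read in from swap here...
        proc.page_table[page] = (frame, VALID)
```

With a trivial allocator (say, one that hands out free frame numbers in order), a reference to a valid-but-unloaded page marks its entry valid, while an out-of-range reference terminates the process.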
When a process requests memory located in a page not yet loaded, a page fault
occurs. The OS must allocate a frame to the demanded page. This is potentially
very complex (to do well) and involves OS policy.
Before getting into that, here are some other issues that are usually pretty easy to
solve:
- Where do pages reside when not loaded in a frame? On disk. Normally,
however, the OS reserves special swap space on the disk which is
managed separately from files, to achieve the fastest possible response
- How long does it take to service a page demand? This requires a
disk read and thus depends on current queuing, seek, and rotational delays.
Textbook calculations conclude an average of 8 milliseconds.
- How frequently do page demands occur? At 8 milliseconds to service
a demand, hopefully very rarely! Effective memory access time doubles
if only 1 demand occurs every 40,000 memory accesses! This is affected both
by process behavior and OS policy.
- What happens to the running process while the page demand is being serviced?
The OS will not allow the CPU to sit idle for 8 milliseconds. The CPU scheduler
will select a different process to run, and a context switch will be performed.
Note that page demands may occur in the middle of an instruction
execution (e.g. reading an operand in from memory), yet the process cannot
later resume from the middle. It has to restart the current instruction when
the process resumes. For most machine instruction sets, especially RISC, this is not a problem.
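The effective access time arithmetic behind the "doubles at 1 demand per 40,000 accesses" claim can be checked directly. The 200 ns base memory access time is an assumed, illustrative figure:

```python
MEMORY_NS = 200          # assumed ordinary memory access time (illustrative)
FAULT_NS = 8_000_000     # 8 ms page-fault service time, in nanoseconds

def effective_access_ns(p):
    """Expected access time when a fraction p of accesses page-fault."""
    return (1 - p) * MEMORY_NS + p * FAULT_NS
```

At p = 1/40,000 this evaluates to just under 400 ns - double the 200 ns base.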
Page Replacement
When a page demand occurs, the OS (among other things):
1. consults its frame table to find an available frame,
2. allocates the frame to the page,
3. reads the page contents from disk into the frame, and
4. updates its frame table and the process' page table.
But what happens if there are no available frames? The process has
an immediate demand for the page, and it should be serviced promptly. It is
the OS's responsibility to replace a currently allocated page. But which to choose?
This is the page replacement problem.
In this situation, additional steps are needed between 1 and 2.
1a. if no frames are available, select a victim page for replacement, and
1b. if necessary, save the victim to swap storage (disk)
Step 1b requires yet another disk operation, which slows down the paging
process even more! Fortunately, it can often be avoided. How? Add a dirty bit
to the frame table or page table entry. The dirty bit is 0 when the page is loaded and is set to 1 when
any location in that page is modified. If it is still 0 at replacement time, the page need
not be written to disk. A similar bit can be used to mark read-only pages.
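The write-back decision can be sketched as follows; the dictionary-based frame-table layout is invented purely for illustration:

```python
def evict(frame_table, victim, write_to_swap):
    """Free a victim frame, writing it to swap only if its dirty bit is set."""
    entry = frame_table[victim]
    if entry["dirty"]:
        write_to_swap(victim)    # the extra disk write, needed only when modified
        entry["dirty"] = False
    entry["page"] = None         # frame is now free
```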
The page to be replaced can be selected either from among those in
the same process, called local replacement, or from among those in
all processes, called global replacement. The replacement techniques
described here apply equally well to both.
An optimal page replacement technique
The ideal technique is to replace the page that will not be used again for
the longest time period! Remind you of SJF scheduling? Same implementation problem, but
can be used as a benchmark.
As with SJF, we'll try to approximate the ideal by using past behavior to predict
future behavior.
First In First Out (FIFO) replacement
This technique replaces the oldest page, i.e. the page which has been in memory
the longest. The data structure is easy to maintain, just a queue of page-frame
allocations. Insert at the tail and replace at the head.
Advantage? Easy to implement. Disadvantage? The oldest pages are often the
most heavily used. Swapping one out results in it needing to be paged back in very soon.
FIFO does not take process memory access behavior into account, so is not a good
approximation to the optimal.
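A FIFO fault counter takes only a few lines; this sketch (interface of my own choosing) can be used to experiment with reference strings:

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults for a reference string under FIFO replacement."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.remove(queue.popleft())   # replace at the head (oldest)
            frames.add(page)
            queue.append(page)                   # insert at the tail (newest)
    return faults
```

On the classic reference string 1,2,3,4,1,2,5,1,2,3,4,5 this gives 9 faults with 3 frames but 10 with 4 - FIFO can actually get worse with more frames (Belady's anomaly).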
Least Recently Used (LRU) replacement
Uses past behavior to predict the future in this way: replace the page that
has not been used for the longest period of time. This is the "back in time"
equivalent to the optimal.
Implementing LRU is a bear because it requires significant overhead. One approach
is to store a clock value in the frame table entry upon every reference, then at replacement
time search the frame table for the oldest entry. Another approach is to maintain a stack
of all page numbers; when a page is referenced, pull it from wherever it is
on the stack and put it on top, so at replacement time the one on the bottom is
the least recently used. Both approaches require something to be updated at every memory
reference.
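The stack approach maps naturally onto an ordered dictionary in Python; a sketch, with names of my own choosing:

```python
from collections import OrderedDict

def lru_faults(refs, num_frames):
    """Count page faults under true LRU; the OrderedDict plays the 'stack'."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)          # referenced: move to top of stack
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.popitem(last=False)    # bottom of stack = least recently used
            frames[page] = True
    return faults
```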
Approximating LRU through aging
Pure LRU requires too much overhead. Here is an approach that approximates it
at much lower cost. It combines a periodic clock interrupt, a reference bit, and
a daemon OS process. Here's how it works:
- define a history byte for each page, initialized to all 0's when page loaded
- define a reference bit for each page, initialized to 0 when page loaded and set to 1
when page referenced
- an OS daemon awakes periodically, perhaps every 100 ms, and does this for every allocated page:
- shifts the history one bit right (this is the aging)
- shifts the reference bit value into high order history bit
- resets the reference bit to 0
- at page replacement time, choose a page with the smallest history value.
The history byte in this case represents page usage over the past 800 ms, and the
bit positions represent age. If a page is
never used, its history value remains 0 because all the reference bit values shifted into
it were 0. If frequently used, its value may be as high as 255 (all 1's). The low-order
bit is the furthest peek into the past.
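The daemon's pass and the replacement choice can be sketched as follows; the plain-dictionary page records are an invented representation:

```python
def age_tick(pages):
    """One daemon pass: shift each history byte one bit right, shift the
    reference bit into the high-order history bit, clear the reference bit."""
    for p in pages:
        p["history"] = (p["history"] >> 1) | (p["ref"] << 7)
        p["ref"] = 0

def pick_victim(pages):
    # at replacement time, choose a page with the smallest history value
    return min(pages, key=lambda p: p["history"])
```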
Approximating LRU through second chance
Here is a second LRU approximation approach that combines FIFO replacement
and a reference bit. Here's how it works:
- The tail of the FIFO list is linked to the head, forming a circular list (like the marks on a clock face).
- As in FIFO, a newly-loaded page is added at the tail.
- Define a hand that points to the oldest page in the clock (list).
- Define a reference bit for each page, initialized to 0 when the page is loaded and set to 1
when the page is referenced.
- At page replacement time, check reference bit of page that hand is pointing to
- if 0, replace that page
- if 1, reset it to 0, move hand to the next page, and repeat
This last step represents the "second chance": if the oldest page has been recently
used, it is given a second chance.
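A sketch of the hand's scan; the list-of-dicts representation is invented for illustration:

```python
def second_chance_victim(frames, hand):
    """frames: circular list of {'page', 'ref'} entries in FIFO order; hand
    indexes the oldest page. Returns (victim index, new hand position)."""
    while True:
        entry = frames[hand]
        if entry["ref"] == 0:
            return hand, (hand + 1) % len(frames)   # replace this page
        entry["ref"] = 0                            # second chance: clear bit
        hand = (hand + 1) % len(frames)             # advance to next oldest
```

Note the loop always terminates: even if every reference bit is 1, the first sweep clears them all, so the second sweep stops at the original hand position.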
Determining how many frames to allocate
The OS can allocate the minimum number of frames. Consider the minimum
number of frames needed by a single instruction. You may say "one" but in fact
several may be required. Here are some possibilities:
- The instruction itself may cross a page boundary. This is more of a problem on machines
with a CISC (complex) instruction set, because specialized instructions tend to require
more operands; an instruction may require 6 or 8 bytes or more. RISC instructions
are small and fixed-length, such as 4 bytes, and do not cross page boundaries.
- The instruction requires operand to be fetched from memory, and the operand is
stored on a different page than the instruction.
- The operand is an indirect one. I.e., the operand contains the
address of a value to be fetched. The instruction, operand, and indirect address
can all be on different pages.
- The instruction set may allow multiple levels of indirection. The above example
had one level. For each additional level, an additional page may be required.
The OS can allocate an equal share of available frames to every new
process. This is simple to implement but not very effective.
The OS can allocate a number of frames proportional to the process
size, priority or both. Larger and/or higher priority processes are allocated
more frames than smaller and/or lower priority ones.
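Proportional allocation by size is a one-line formula; in this sketch the flooring and the 1-frame minimum are rounding choices of my own:

```python
def proportional_allocation(sizes, total_frames):
    """Allocate frames to processes in proportion to their sizes."""
    total = sum(sizes)
    return [max(1, size * total_frames // total) for size in sizes]
```

For example, two processes of 10 and 127 pages sharing 62 frames receive 4 and 57 frames respectively.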
The OS can dynamically regulate the number of frames allocated to
a process. Some methods are described below in the discussion of thrashing.
Thrashing
When demand paging occurs with high frequency, the OS spends an inordinate amount of time
simply swapping pages between memory and disk. Because little productive work is accomplished despite
the flurry of activity, the situation is called thrashing.
What causes this to happen? It is usually a combination of page replacement
and CPU scheduling policy coupled with high process workload. Assume global
page replacement:
- a "greedy" process "steals" pages from other processes,
- the affected processes subsequently generate more page faults which in turn take
pages from yet other processes.
- Recall that processes waiting for I/O are blocked, so the processes involved
in paging cannot run.
- as a result, the CPU's "ready" queue diminishes and CPU utilization
decreases.
- the OS notices this and increases the degree of multiprogramming
- the number of processes that can reside in memory simultaneously!
- this opens the floodgates to new processes
- the new processes generate a large number of page faults to get started
- those page faults take frames from other processes, which cause them
to generate page faults
- the CPU scheduler kicks up the degree of multiprogramming again! See where this is headed?
What can be done to keep this situation from occurring? Here are several techniques.
- use local replacement: this prevents one process from stealing frames
from another. Thrashing is localized to one process. This affects system performance
somewhat but not that badly.
- allocate as many frames as the process needs: This cannot be predicted
exactly. But if we assume that process execution exhibits temporal locality,
the OS can calculate a working set model of the process and ensure it
has enough frames for its working set (details below)
- regulate a process' page fault rate: If number of frames is too
few then many page faults will occur - if the page fault rate is above an
upper threshold then allocate extra frames to the process. If number of frames
is too large, then few page faults will occur - if the page fault rate is
below a lower threshold then take some frames away from the process.
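The page-fault-rate regulation above amounts to a simple control loop; a sketch, with threshold values that are purely illustrative:

```python
def adjust_allocation(frames, fault_rate, upper=0.05, lower=0.01):
    """Grow the allocation when faulting too often, shrink it when faulting
    rarely; otherwise leave it alone. Thresholds are made-up illustrations."""
    if fault_rate > upper:
        return frames + 1            # above upper threshold: add a frame
    if fault_rate < lower:
        return max(1, frames - 1)    # below lower threshold: reclaim a frame
    return frames
```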
The working set model of a process was mentioned above as one way
to prevent thrashing.
- This is based on the locality principle, which says a process works
with a small set of pages (a locality) for a while, then rapidly transitions
to another small set.
- many page faults occur during transition to new locality
- very few page faults occur once pages of a locality are in memory
- The working set of a process is the set of all pages referenced
in a time window extending back Δ references from the present.
- The cardinality of this set estimates the size of the process localities
and therefore the number of frames it should be allocated.
- Value of Δ is critical. If too large, it can encompass more than one
locality. If too small, it can encompass only part of a locality.
- The OS adjusts the number of allocated frames to equal the size of the working
set.
- Requires some overhead because the working set changes dynamically as execution
proceeds
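Computing a working set from a reference string is direct; a sketch, using 0-indexed reference times and spelling Δ as `delta`:

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the last `delta` references,
    ending at (and including) time t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])
```

The cardinality `len(working_set(refs, t, delta))` is the number of frames the OS would allocate to the process at time t.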
Last updated:
Peter Sanderson (PSanderson@otterbein.edu)