COMP 3400 Lecture 9: Virtual Memory
What is virtual memory?
Recall that the previous lecture concerned organizing memory and
translating logical memory addresses to physical addresses. The following
line of reasoning motivates virtual memory.
1. The OS should support multiprogramming so the CPU will not be idle while a
process is waiting on I/O or some other event.
2. An instruction can be executed in a fetch/decode/execute cycle only if it
is stored in (main) memory.
3. As a result of 1 and 2, it is necessary to keep more than one process in memory
at a time.
4. However, it is not possible to store more than one process beginning
at memory location 0 (or some other fixed address) at the same time.
5. As a result of 4, it is not possible for a compiler to generate absolute
addresses because it does not know what address the process will be loaded into.
6. As a result of 5, the compiler generates logical addresses that assume a starting
address of 0.
7. As a result of 6, address translation hardware is needed to translate each logical address
into a physical address at runtime.
8. There is no good reason to prevent the logical address space from exceeding the physical address space.
9. There is no good reason to require that an entire process be in memory while it is
running: vast parts of it are unused at any given time, or possibly never used at all!
10. If we allow both 8 and 9, then processes can be larger than the total
physical memory! Programs and processes thus occupy virtual memory.
Point #9 is the final key to virtual memory. By allowing a partially-loaded process
to run, the OS:
- loads processes faster, since only a portion needs to be loaded
- allows the programmer to compose programs (including data) larger than physical memory
- allows more processes to be memory-resident simultaneously, and therefore a higher
degree of multiprogramming
Our study of virtual memory is limited to systems that use paged memory allocation.
Assume the
translation hardware consists of registers for the virtual and physical address plus a basic
page table. The TLB and all variations of the page table are allowed but are not
relevant to the topics at hand.
The major topics we cover here are:
- how to recognize that a new page needs to be read into memory
- how much time it takes to perform a memory access when a new page needs to be read into memory
- which page to replace if a new page is needed but no frames are available
- how many frames to allocate to a process
- what thrashing is and how to avoid it
Demand Paging
Demand paging refers to the practice of not loading a page into memory
until it is needed by the running process, i.e., the page is loaded on demand. If
no pages are initially loaded for a new process, it is called pure demand
paging.
The mechanism for determining that a page needs to be loaded is relatively simple:
- Each page table entry has a valid-invalid bit.
  - Valid means the page is loaded into memory and the frame number is correct.
  - Invalid means either the page number is beyond the process address space (which is a program error),
    or the page is not currently loaded into a frame.
- The address translation hardware traps to the OS if the page table entry's bit is set to invalid.
- The OS trap handler determines which condition has occurred:
  - if the page was outside the process address space, terminate the process
  - if the page was not loaded, then page it in
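The check-and-trap sequence above can be sketched in a few lines. This is a simplified software model, not real hardware: `PageTableEntry`, `ProcessTerminated`, `translate`, and the `page_in` callback are hypothetical names, and the page table is assumed to have entries (marked invalid) even for page numbers beyond the process address space.

```python
class PageTableEntry:
    """One page table entry: a frame number plus the valid-invalid bit."""
    def __init__(self):
        self.frame = None    # meaningful only when valid is True
        self.valid = False   # valid-invalid bit; invalid until paged in


class ProcessTerminated(Exception):
    """Raised when a reference falls outside the process address space."""


def translate(page_table, process_pages, page, offset, page_size, page_in):
    """Translate a logical (page, offset) pair into a physical address.

    process_pages: number of pages in the process address space.
    page_in: hypothetical callback that reads the page from disk into a
    free frame and returns that frame number.
    """
    entry = page_table[page]
    if not entry.valid:
        # The hardware traps to the OS; the handler decides which case applies.
        if page >= process_pages:
            raise ProcessTerminated(f"page {page} is outside the address space")
        entry.frame = page_in(page)   # page fault: page it in
        entry.valid = True            # the entry is now valid
    return entry.frame * page_size + offset
```

Only the first reference to a page goes through `page_in`; later references find the valid bit set and translate directly.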
When a process requests memory located in a page not yet loaded, a page fault
occurs. The OS must allocate a frame to the demanded page. Doing this well is potentially
very complex and involves OS policy.
Before getting into that, here are some other issues that are usually pretty easy to
solve:
- Where do pages reside when not loaded in a frame? On disk. Normally,
however, the OS reserves special swap space on the disk, which is
managed separately from files to achieve the fastest possible response.
- How long does it take to service a page demand? This requires a
disk read and thus depends on current queuing, seek, and rotational delays.
Textbook calculations conclude an average of 8 milliseconds.
- How frequently do page demands occur? At 8 milliseconds to service
a demand, hopefully very rarely! Assuming a 200-nanosecond memory access time,
effective memory access time doubles if only 1 demand occurs every 40,000
memory accesses! This is affected both by process behavior and OS policy.
- What happens to the running process while the page demand is being serviced?
The OS will not allow the CPU to sit idle for 8 milliseconds. The CPU scheduler
will select a different process to run, and a context switch will be performed.
Note that page demands may occur in the middle of executing an instruction
(e.g., while reading an operand from memory), yet the process cannot
later resume from the middle. It has to restart the current instruction when
the process resumes. For most machine instruction sets, especially RISC, this is not a problem.
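The effective-access-time claim in the list above comes from weighting the two possible costs of a memory access by the page-fault probability p: EAT = (1 - p) × (memory access time) + p × (page-fault service time). The sketch below assumes a 200 ns memory access time and the 8 ms service time from the text; `effective_access_time` is a hypothetical helper name.

```python
def effective_access_time(p, mem_ns=200, fault_ns=8_000_000):
    """Effective memory access time in nanoseconds.

    p: probability that a memory access causes a page fault.
    mem_ns: assumed memory access time (200 ns).
    fault_ns: page-fault service time (8 ms, as in the text).
    """
    return (1 - p) * mem_ns + p * fault_ns


print(effective_access_time(0))           # 200 ns when no page faults occur
print(effective_access_time(1 / 40_000))  # roughly 400 ns: about double
```

Setting EAT = 400 and solving gives p = 200 / 7,999,800, or about 1 fault per 40,000 accesses.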
Page Replacement
When a page demand occurs, the OS (among other things):
1. consults its frame table to find an available frame,
2. allocates the frame to the page,
3. reads the page contents from disk into the frame, and
4. updates its frame table and the process' page table.
But what happens if there are no available frames? The process has
an immediate demand for the page, and it should be serviced promptly. It is
the OS's responsibility to replace a currently allocated page. But which one should it choose?
This is the page replacement problem.
In this situation, additional steps are needed between steps 1 and 2:
1a. if no frames are available, select a victim page for replacement, and
1b. if necessary, save the victim to swap storage (disk)
Step 1b requires yet another disk operation, which slows down the paging
process even more! Fortunately, it can often be avoided. How? Add a modified bit
(aka dirty bit) to the frame table or page table entry. The modified bit is set to 0 when the page is loaded and set to 1 when
any location in that page is modified. If it is still 0 at replacement time, the page need
not be written to disk. Similarly, if the protection bit indicates the page is read-only, then it need not be written to disk.
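The page-demand steps (1, 1a, 1b, then 2 through 4) and the dirty-bit shortcut can be combined into one small simulation. This is a sketch under simplifying assumptions: a single process, disk I/O reduced to counters, no read-only protection bit, and a hypothetical `choose_victim` hook standing in for whatever replacement policy the OS uses. `Pager` and its fields are illustrative names.

```python
class Pager:
    """Hypothetical sketch of servicing page demands with a modified bit."""

    def __init__(self, num_frames, choose_victim):
        self.frames = [None] * num_frames   # frame table: page held by each frame
        self.dirty = [False] * num_frames   # modified (dirty) bit per frame
        self.page_table = {}                # resident page -> frame number
        self.choose_victim = choose_victim  # replacement policy hook
        self.disk_reads = 0
        self.disk_writes = 0

    def access(self, page, write=False):
        if page in self.page_table:
            frame = self.page_table[page]
        else:
            frame = self._service_fault(page)  # page fault
        if write:
            self.dirty[frame] = True           # any store marks the page modified
        return frame

    def _service_fault(self, page):
        if None in self.frames:                # step 1: find a free frame
            frame = self.frames.index(None)
        else:
            frame = self.choose_victim(self)   # step 1a: select a victim
            if self.dirty[frame]:              # step 1b: save it only if modified
                self.disk_writes += 1
            del self.page_table[self.frames[frame]]
        self.frames[frame] = page              # step 2: allocate the frame
        self.dirty[frame] = False
        self.disk_reads += 1                   # step 3: read the page from disk
        self.page_table[page] = frame          # step 4: update the page table
        return frame
```

With a trivial policy such as `Pager(2, lambda pager: 0)`, evicting a page that was only read costs one disk read, while evicting a page that was written costs a read plus a write.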
The page to be replaced can be selected either from among those in
the same process, called local replacement, or from among those in
all processes, called global replacement. The replacement techniques
described here apply equally well to both.
An optimal page replacement technique
The ideal technique is to replace the page that will not be used again for
the longest time period! Remind you of SJF scheduling? It has the same implementation problem,
since it requires knowing the future, but it can be used as a benchmark.
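Because the optimal technique needs the future, it can only be run offline over a recorded reference string, which is exactly how it serves as a benchmark. A minimal sketch follows; `opt_faults` is a hypothetical name, and the reference string is a list of page numbers.

```python
def opt_faults(refs, num_frames):
    """Count page faults under optimal replacement for a reference string."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue                          # hit: nothing to do
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)               # a free frame is available
            continue
        future = refs[i + 1:]

        def next_use(p):
            # Distance to the next reference; "never again" counts as farthest.
            return future.index(p) if p in future else len(future)

        victim = max(frames, key=next_use)    # used again farthest in the future
        frames[frames.index(victim)] = page
    return faults
```

On the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with three frames, this sketch counts 9 faults.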
As with SJF, we'll try to approximate the ideal by using past behavior to predict
future behavior.
First In First Out (FIFO) replacement
This technique replaces the oldest page, i.e., the page that has been in memory
the longest. The data structure is easy to maintain: just a queue of page-frame
allocations. Insert at the tail and replace at the head.
Advantage? Easy to implement. Disadvantage? The oldest pages are often the
most heavily used, so swapping one out results in it needing to be paged back in very soon.
FIFO does not take process memory access behavior into account, so it is not a good
approximation to the optimal.
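The queue described above maps directly onto a double-ended queue. This sketch (hypothetical name `fifo_faults`) counts faults for a reference string; a set mirrors the queue so membership checks are fast.

```python
from collections import deque


def fifo_faults(refs, num_frames):
    """Count page faults under FIFO replacement for a reference string."""
    queue = deque()    # page-frame allocations, oldest at the head (left)
    resident = set()   # same pages as the queue, for fast lookup
    faults = 0
    for page in refs:
        if page in resident:
            continue                            # a hit does not change queue order
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())   # replace at the head
        queue.append(page)                      # insert at the tail
        resident.add(page)
    return faults
```

On the same 20-reference string used for the optimal benchmark (7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1) with three frames, this sketch counts 15 faults, compared with 9 for the optimal.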
Least Recently Used (LRU) replacement
Uses past behavior to predict the future in this way: replace the page that
has not been used for the longest period of time. This is the "back in time"
equivalent to the optimal.
Implementing LRU is a bear because it requires significant overhead. One approach
is to store the clock value in the page's frame table entry every time the page is referenced,
then at replacement time search the frame table for the entry with the oldest value. Another
approach is to maintain a stack of all page numbers; when a page is referenced, pull it from
wherever it is on the stack and put it on top, then at replacement time the one on the bottom is
the least recently used. Both approaches require something to be updated at every memory
reference.
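The stack approach can be sketched in software with Python's `OrderedDict`, whose `move_to_end` pulls a page to the top and whose `popitem(last=False)` removes the bottom. `lru_faults` is a hypothetical name; real hardware could not afford this bookkeeping on every reference, which is the point of the paragraph above.

```python
from collections import OrderedDict


def lru_faults(refs, num_frames):
    """Count page faults under exact LRU using the stack approach."""
    stack = OrderedDict()   # front = bottom (least recent), end = top (most recent)
    faults = 0
    for page in refs:
        if page in stack:
            stack.move_to_end(page)     # referenced: pull it to the top
            continue
        faults += 1
        if len(stack) == num_frames:
            stack.popitem(last=False)   # evict the bottom: least recently used
        stack[page] = None              # newly loaded page goes on top
    return faults
```

On the 20-reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with three frames, this sketch counts 12 faults: better than FIFO's 15, worse than the optimal's 9.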
Approximating LRU through aging
Pure LRU requires too much overhead. Here is an approach that approximates it
at much lower cost. It combines the concepts of the clock, a reference bit, and
an OS process running in the background. Here's how it works:
- define a history byte for each page, initialized to all 0's when the page is loaded
- define a reference bit for each page, initialized to 0 when the page is loaded and set to 1
when the page is referenced
- an OS background process awakes periodically, perhaps every 100 ms, and does this for every allocated page:
  - shifts the history byte's value one bit right (this is the aging)
  - shifts the reference bit value into the high-order history bit
  - resets the reference bit to 0
- at page replacement time, choose a page with the smallest history value.
The history byte in this example represents page usage over the past 800 ms (8 bits × 100 ms),
and the bit positions represent age. If a page is
never used, its value remains 0 because all the reference bit values shifted into
it were 0. If it is frequently used, its value may be as high as 255 (all 1's). The low-order
bit is the furthest peek into the past.
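The aging steps above can be sketched as follows. `AgingTable`, `reference`, `tick`, and `victim` are hypothetical names; in a real system the hardware sets the reference bit and `tick` would be run by the background process every interval (100 ms in the example).

```python
HISTORY_BITS = 8  # one history byte per page


class AgingTable:
    """Hypothetical model of LRU approximation through aging."""

    def __init__(self, pages):
        self.history = {p: 0 for p in pages}   # history byte, all 0's when loaded
        self.ref_bit = {p: 0 for p in pages}   # reference bit, 0 when loaded

    def reference(self, page):
        self.ref_bit[page] = 1                 # set on any reference to the page

    def tick(self):
        """One periodic pass of the background process over every page."""
        for p in self.history:
            aged = self.history[p] >> 1                            # age: shift right
            self.history[p] = aged | (self.ref_bit[p] << (HISTORY_BITS - 1))
            self.ref_bit[p] = 0                                    # reset the bit

    def victim(self):
        """Page with the smallest history value."""
        return min(self.history, key=self.history.get)


t = AgingTable(["A", "B", "C"])
t.reference("A"); t.reference("B"); t.tick()   # A = 128, B = 128, C = 0
t.reference("A"); t.tick()                     # A = 192, B = 64,  C = 0
print(t.victim())                              # C
```

A page referenced in every one of eight consecutive intervals reaches 255, matching the "all 1's" case described above.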
There are other algorithms for approximating LRU.
Determining how many frames to allocate
The OS can allocate the minimum number of frames. Consider the minimum
number of frames needed by a single instruction. You may say "one" but in fact
several may be required. Here are some possibilities:
- The instruction itself may cross a page boundary. This is more of a problem on machines
having a CISC (complex) instruction set, because specialized instructions tend to require
more operands; an instruction may require 6 or 8 bytes or more. RISC instructions
are typically small, fixed-length (such as 4 bytes), and aligned, so they do not cross page boundaries.
- The instruction requires operand to be fetched from memory, and the operand is
stored on a different page than the instruction.
- The operand is an indirect one, i.e., the operand contains the
address of a value to be fetched. The instruction, operand, and indirect address
can all be on different pages.
- The instruction set may allow multiple levels of indirection. The above example
had one level. For each additional level, an additional page may be required.
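The minimum frame count for one instruction is simply the number of distinct pages its memory accesses touch. The sketch below assumes a 4096-byte page size; `pages_touched` is a hypothetical helper that takes (address, length) pairs for the instruction bytes, operands, and indirect addresses.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def pages_touched(accesses, page_size=PAGE_SIZE):
    """Number of distinct pages touched by one instruction.

    accesses: (address, length) pairs for every byte range the instruction
    reads or writes.
    """
    pages = set()
    for addr, length in accesses:
        first = addr // page_size
        last = (addr + length - 1) // page_size   # a range may span two pages
        pages.update(range(first, last + 1))
    return len(pages)


# A 6-byte instruction at 4094 spans pages 0 and 1; its operand holds an
# indirect address (page 2) that points to a value on page 4.
print(pages_touched([(4094, 6), (8192, 4), (20000, 4)]))   # 4
```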
The OS can allocate an equal share of available frames to every new
process. This is simple to implement but not very effective.
The OS can allocate a number of frames proportional to the process
size, priority or both. Larger and/or higher priority processes are allocated
more frames than smaller and/or lower priority ones.
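Proportional allocation gives process i roughly (s_i / S) × m frames, where s_i is its size (or a weight that also reflects priority), S is the sum of all sizes, and m is the number of available frames. This sketch (hypothetical name `proportional_allocation`) rounds down and reports the leftover frames, which an OS might keep in a free pool; that handling of the remainder is an assumption.

```python
def proportional_allocation(weights, total_frames):
    """Split total_frames among processes in proportion to their weights.

    weights: process sizes in pages, or any size/priority weight.
    Returns (per-process allocations, leftover frames).
    """
    total = sum(weights)
    alloc = [w * total_frames // total for w in weights]   # floor of (w / total) * m
    leftover = total_frames - sum(alloc)
    return alloc, leftover


print(proportional_allocation([10, 127], 62))   # ([4, 57], 1)
```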
The OS can dynamically regulate the number of frames allocated to
a process. Some methods are described below in the discussion of thrashing.
Thrashing
When page faults occur with high frequency, the OS spends an inordinate amount of time
simply swapping pages between memory and disk. Because little productive work is accomplished despite
the flurry of activity, the situation is called thrashing.
What causes this to happen? It is usually a combination of page replacement
and CPU scheduling policy coupled with high process workload. Assume global
page replacement:
- a "greedy" process "steals" pages from other processes,
- the affected processes subsequently generate more page faults which in turn take
pages from yet other processes.
- processes waiting on page faults are blocked (waiting for "input").
- as a result, the CPU's "ready" queue diminishes and CPU utilization decreases.
- the OS notices this and increases the degree of multiprogramming, i.e.,
the number of processes that can reside in memory simultaneously!
- this opens the floodgates to new processes
- the new processes generate a large number of page faults to get started
- those page faults take frames from other processes, which cause them
to generate page faults
- the CPU scheduler kicks up the degree of multiprogramming again! See where this is headed?
What can be done to keep this situation from occurring? Here are several techniques.
- use local replacement: this prevents one process from stealing frames
from another. Thrashing is localized to one process. This affects system performance
somewhat but not that badly.
- allocate as many frames as a process needs: this cannot be predicted
exactly. But if we assume that process execution follows temporal locality,
the OS can calculate a working set model of the process and ensure it
has enough frames for its working set (details below)
- regulate a process' page fault rate: if the number of frames is too
small, many page faults will occur, so if the page fault rate is above an
upper threshold, allocate extra frames to the process. If the number of frames
is too large, few page faults will occur, so if the page fault rate is
below a lower threshold, take some frames away from the process.
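The page-fault-rate regulation above amounts to a periodic threshold check. In this sketch the rate is measured, as an assumption, in faults per memory reference over the last measurement period, and the `lower` and `upper` thresholds and the one-frame step are hypothetical values; `adjust_frames` is an illustrative name.

```python
def adjust_frames(frames, faults, references, lower=0.01, upper=0.05):
    """One page-fault-frequency check; returns the process's new frame count.

    faults, references: counts observed since the last check.
    lower, upper: hypothetical thresholds on faults per reference.
    """
    rate = faults / references
    if rate > upper:
        return frames + 1      # too many faults: allocate an extra frame
    if rate < lower and frames > 1:
        return frames - 1      # very few faults: take a frame away
    return frames              # rate is acceptable: leave it alone


print(adjust_frames(10, 100, 1000))   # 11
print(adjust_frames(10, 1, 1000))     # 9
```

A real OS would also need a plan for when no free frames exist to grant, such as suspending a process.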
The working set model of a process was mentioned above as one way
to prevent thrashing.
- This is based on the locality principle, which says a process works
with a small set of pages (a locality) for a while, then rapidly transitions
to another small set.
- many page faults occur during transition to new locality
- very few page faults occur once pages of a locality are in memory
- The working set of a process is the set of all pages referenced
in a time window extending back Δ references from the present.
- The cardinality of this set estimates the size of the process localities
and therefore the number of frames it should be allocated.
- The value of Δ is critical. If it is too large, it can encompass more than one
locality. If it is too small, it can encompass only part of a locality.
- The OS adjusts the number of allocated frames to equal the size of the working
set.
- This requires some overhead because the working set changes dynamically as execution
proceeds.
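The working set at time t is just the set of distinct pages among the last Δ references ending at t, and its size is the estimated frame requirement. A minimal sketch over a recorded reference string (hypothetical name `working_set`, times counted from 0):

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the last `delta` references ending at time t."""
    start = max(0, t - delta + 1)   # window never extends before the first reference
    return set(refs[start:t + 1])


refs = [1, 2, 1, 3, 2, 1, 4, 4, 4, 4, 5, 4]
print(working_set(refs, 5, 4))    # {1, 2, 3}: 3 frames needed
print(working_set(refs, 11, 4))   # {4, 5}: the process moved to a new locality
```

Recomputing this window as execution proceeds is the overhead mentioned in the last bullet.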
Last updated:
Peter Sanderson (PSanderson@otterbein.edu)