CSC 150 Chapter 5: Computer Organization and Architecture
primary information resource:
An Invitation to Computer Science (Java), Third Edition,
G. Michael Schneider and Judith L. Gersting,
Course Technology, 2007.
Overview
Computer organization: study of the functional units which comprise
the physical computer and their relationships.
This is a higher level of abstraction: computer is collection of functional
units which are themselves collection of circuits which are collection
of gates which are built of transistors.
Software analogy: OO software application is collection of classes which
are themselves collections of methods which are collections of statements
which are collections of machine instructions.
Advances in computer organization are marked by generations:
-
First generation computers constructed from vacuum tubes (roughly, the
1950s)
-
Second generation computers constructed from transistors (roughly, the
1960s)
-
Third generation computers constructed from integrated circuits (starting
in mid 1960s).
-
Fourth generation saw the first microcomputers (starting in mid 1970s)
-
It gets pretty blurry from there.
Nearly all modern digital computers are organized according to a
design known as the Von Neumann architecture. Pronounce it "NOI-man"
(just like Freud is pronounced "froid")
-
John Von Neumann was a brilliant 20th century mathematician (and more).
-
Computers with Von Neumann architecture were first built just after WW
II.
-
Key was stored program concept (memory holds both data and programs)
-
Recall from C SC 100: stored program concept originated with Jacquard loom
in late 1700s. Instructions on punched cards, but no data (it was
not a computing machine).
-
Recall from C SC 100: Charles Babbage's Analytic Engine incorporated it
too. Instructions were on punched cards, but data storage was internal
to the Engine.
-
Von Neumann took Babbage's idea one step further, albeit 100 years later:
place data and instructions in same store.
Basic Von Neumann architecture
-
Memory to hold data and instructions
-
Processor consisting of control unit to fetch, decode, and execute instructions
one at a time, and arithmetic-logic unit (ALU) to perform arithmetic and
logic (e.g. comparison) operations.
-
Input/Output (I/O) units to communicate with outside world.
Organization of Memory
Holds data and instructions, a.k.a. RAM.
Logically consists of a linear sequence of addressable units known
generically as cells but universally implemented as bytes (8 bit chunks).
Addresses are numbered sequentially starting at 0 (0,1,2,3,….).
Maximum # addresses, and therefore bytes, is determined by # bits available
to hold an address. Nearly all machines have 32-bit addresses, which
means there are 2^32 possible addresses. How many is that?
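As a quick check, that count can be computed directly. A Python sketch (the 4 GB figure assumes one byte per address):

```python
# Number of distinct addresses with 32 address bits.
num_addresses = 2 ** 32
print(num_addresses)             # 4294967296, i.e. about 4.3 billion
print(num_addresses // 2 ** 30)  # 4, i.e. 4 gigabytes at one byte per address
```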
Memory capacity usually specified in kilobytes, megabytes, gigabytes,
terabytes.
-
Although a kilogram is 1000 grams, a kilobyte is not 1000 bytes!
Similar for the others.
-
Memory units have capacity which is power of 2 or multiple thereof.
-
Since 2^10 is 1024, which is pretty close to 1000, the kilo- prefix came
into use.
-
Similarly, 2^20 is 1,048,576, which is pretty close to 1 million; hence the mega- prefix.
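The binary prefixes can be compared with their decimal look-alikes in a few lines (a Python sketch; the constant names are just illustrative):

```python
# Powers of 2 used for memory capacities vs. the metric powers of 10.
KILO = 2 ** 10  # 1024, not 1000
MEGA = 2 ** 20  # 1,048,576, not 1,000,000
GIGA = 2 ** 30  # 1,073,741,824, not 1,000,000,000
print(KILO, MEGA, GIGA)
```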
The two operations associated with memory are:
Fetch: retrieve the contents of a memory cell given its address
Store: modify the contents of a memory cell given its address and the
new contents
Memory unit has several basic components (see textbook Figure "Overall
RAM Organization"):
-
Memory cells organized in a 2D grid
-
Memory Address Register (MAR) to hold address of cell whose contents are
to be read or modified.
-
Memory Data Register (MDR) to hold value which has been read or which is
to be stored.
-
Fetch/Store controller, to control direction of flow between MDR and memory
cells (from cells to MDR for fetch, from MDR to cells for store).
-
Decoder circuits to transform address stored in MAR into the two activated
control lines that uniquely access the corresponding cell.
Additional notes about those components:
-
Use of 2-dimensional address grid greatly reduces total number of control
lines. For N-bit addresses, 1-dim requires 2^N control lines, 2-dim
requires only 2 * sqrt(2^N) = 2^(N/2+1) control lines (example: for N=16, this
is 512 versus 65536). Also vastly simplifies decoder design,
although it requires two decoders (identical in structure).
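The savings can be verified with a short computation (a Python sketch; assumes N is even so the grid is a perfect square):

```python
import math

def control_lines_1d(n_bits):
    # One control line per cell: 2^N lines.
    return 2 ** n_bits

def control_lines_2d(n_bits):
    # Square grid: one line per row plus one per column = 2 * sqrt(2^N).
    return 2 * math.isqrt(2 ** n_bits)

print(control_lines_1d(16))  # 65536
print(control_lines_2d(16))  # 512
```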
-
The width, in bits, of the MAR is determined by the maximum allowable memory
addressing capacity, not the actual capacity for a given computer.
With 2^32 possible addresses, the MAR is 32 bits wide. The width of the MDR is
determined by the number of bytes which can be read/written in one fetch/store
operation. It is not related to the width of the MAR, although it
is frequently the same, 32 bits (4 bytes).
-
A major factor in determining MDR width: want to minimize number of memory
fetches while at same time minimizing amount of wasted fetched data (fetched
but not used). These are contradictory goals. 32 bits allows
for efficient fetching of machine instructions.
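The interplay of the MAR, MDR, and the fetch/store operations can be sketched as a toy class (Python; the class and method names are illustrative, not from the textbook):

```python
# Toy memory unit: the MAR selects the cell, the MDR carries the data.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0  # Memory Address Register
        self.mdr = 0  # Memory Data Register

    def fetch(self, address):
        self.mar = address
        self.mdr = self.cells[self.mar]  # cells -> MDR
        return self.mdr

    def store(self, address, value):
        self.mar = address
        self.mdr = value
        self.cells[self.mar] = self.mdr  # MDR -> cells

ram = Memory(64)
ram.store(5, 42)
print(ram.fetch(5))  # 42
```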
Cache Memory
Look at a motherboard, what can you say about the location of memory
cards in relation to location of processor (CPU)? They are physically
separated.
As processors became faster, an instruction could be executed faster
than the next instruction could be fetched – memory became a bottleneck.
-
Possible solution: place all memory on same chip as processor. Drawbacks?
-
Possible solution: place a portion of memory on same chip as processor.
Drawbacks?
-
Possible solution: place a small but separate memory unit on same chip
as processor. Cache.
The last one is the practical solution. Cache is pronounced “cash”.
You are probably familiar with this concept as applied by web browsers.
Describe what browser caching is.
If you read CPU specifications, you’ve probably also seen this (e.g.
L1 or level 1 cache, L2 or level 2 cache).
Memory access from cache is several times faster than access from memory
unit.
How this modifies fetch/store: When memory fetch is requested, first
look in cache. If there (called a "cache hit" or "hit")
then we're done. If not there, read from memory unit as usual but place extra
copy into cache. For memory store, store into cache as well.
We’ll skip the gory details (Such as, what if we read from memory unit
into cache but cache is already full? Or, what could happen when data written
to cache but not to memory unit?)
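The look-in-cache-first procedure can be sketched in a few lines (Python; a plain dict stands in for the cache, ignoring the capacity issues just mentioned):

```python
# On a hit, return the cached value; on a miss, go to the memory unit
# and keep an extra copy in the cache for next time.
def cached_fetch(address, cache, memory):
    if address in cache:      # cache hit: done
        return cache[address]
    value = memory[address]   # cache miss: read from memory unit
    cache[address] = value    # place extra copy into cache
    return value

memory = {100: 7}
cache = {}
print(cached_fetch(100, cache, memory))  # miss: reads memory unit, fills cache
print(cached_fetch(100, cache, memory))  # hit: served from cache
```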
Cache works because of temporal locality principle. This means
when a given memory location is used it is likely to be used again very
soon (temporal refers to time). Example: updating loop counter.
To a lesser extent, spatial locality applies too. This means if
memory cell k is used, then cell k+1 will likely be used very soon.
Examples are sequential instructions, or array elements processed in a
loop.
The higher the cache hit rate, the better. This is the percentage
of total memory access requests satisfied in the cache w/o going to the memory
unit.
Example: suppose access time from memory unit is 60 ns (nanoseconds) and access
time from cache is 10 ns. If hit rate is 50%, then the average access
time is 10 ns * .5 + (10+60) ns * .5 = 40 ns. If hit rate is 90%, then
the average access time is 10 ns * .9 + (10+60) ns * .1 = 16 ns!
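The averages above can be checked with a one-line formula (a Python sketch; a miss is charged the failed cache lookup plus the memory access, as in the worked numbers):

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    # hit: cache lookup only; miss: cache lookup plus memory unit access
    return hit_rate * cache_ns + (1 - hit_rate) * (cache_ns + memory_ns)

print(round(average_access_time(0.5, 10, 60), 1))  # 40.0
print(round(average_access_time(0.9, 10, 60), 1))  # 16.0
```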
Organization of I/O units
-
Input and Output devices are generally referred to as peripherals.
-
Their operation is in general orders of magnitude slower than the electronic
memory and processing units.
-
Coordination between memory/processor and I/O units is difficult because
of operation speed differences. Suppose the CPU had to sit idle waiting
for your every keystroke?
-
I/O controllers handle the interface between memory/processor and I/O devices.
-
Buffers are used to hold data being read/written, and interrupt signals
are used to alert memory/processor units that the I/O operation is complete.
-
Most I/O units are either used for human-computer interaction (keyboard,
monitor, speakers) or for mass data storage (disk, tape).
-
Mass storage devices classified as either Direct Access or Sequential Access.
-
Nearly all direct access devices are rotating disks of some sort (hard
disk, floppy disk, CD)
-
Nearly all sequential access devices are linear tape devices (reeled tape,
cassette)
Disk as example of Direct Access.
-
Each recordable surface is flat, donut-shaped and magnetic.
-
Each storage cell must be uniquely addressable.
-
Data are stored in fixed length cells called sectors.
-
Sectors are stored side-by-side to form a circle called a track.
-
Tracks are stored concentrically on the disk.
-
The disk rotates rapidly.
-
Read/Write head is mounted on arm which can move back and forth between
outside edge of disk and its center.
-
To read/write a sector, arm must be positioned over the desired track and
disk must be rotated to start of that sector.
-
Access time varies (falls in the 10s of milliseconds range) and has 3 components:
-
Seek delay – time for arm to arrive at desired track
-
Rotational delay – time for disk to rotate to start of desired sector.
-
Transfer time – time to read/write the sector.
-
The above general description applies to hard disks, floppies and CDs.
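The three components simply add, so one access can be estimated as follows (Python; the millisecond figures are made up, merely falling in the tens-of-milliseconds range mentioned above):

```python
# Total time for one disk access, in milliseconds.
def disk_access_ms(seek, rotational, transfer):
    return seek + rotational + transfer

# Hypothetical values: 9 ms seek, 4.2 ms rotational, 0.1 ms transfer.
print(round(disk_access_ms(9.0, 4.2, 0.1), 1))  # 13.3
```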
Sequential Access storage
-
Storage cells are arranged linearly on ribbon of tape.
-
This eliminates seek delay since read/write head is fixed.
-
The “rotational” delay can be a killer since in the worst case every storage
cell on the tape has to pass under the read/write head.
-
Used normally only for operations such as backup/restore, which require
a large number of cells to be read/written in sequence. Eliminates
“rotational” delay.
Organization of ALU
Arithmetic/Logic unit, surprisingly, performs arithmetic (+,-,*,/) and
logical (=,<,>, bitwise AND/OR/NOT) operations.
The ALU is located on the processor chip because it works so closely with
the control unit and needs to be very fast (and because it does the processing!)
Operations are performed on two (or in the case of NOT, one) input values
with one resulting output value.
Where are the input and output values stored? In registers. A
register is a small fixed length circuit to hold a single value.
It is not economical for the ALU to reach out to the memory unit to
find its inputs and store its output, so prior to the operation the input
values are fetched from memory and copied into two registers within the
ALU. The result is placed into a third register, and only later stored
to memory.
For added speed and flexibility, most ALUs have 16 to 32 registers (a.k.a.
general purpose registers).
To carry out an operation, the ALU needs to know which registers hold
its operands and which register to put the result into. Control lines
handle this.
For simplicity and speed, the ALU actually performs all its operations
in parallel on the same inputs! All the outputs then go into a multiplexor
circuit. Selector lines also go into the multiplexor, to tell it
which operation was the desired one. The multiplexor produces a single
output: the result of the selected operation.
For instance, suppose we want to add two values. The ALU performs
addition, equality, AND, and OR. The input values are loaded into
registers then all four operations are performed simultaneously.
The four results all go into the multiplexor. The selector lines
are set to allow only the results of the addition to be output from the
multiplexor.
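That all-operations-then-multiplexor design can be simulated directly (a Python sketch; the four operations match the example above, and a dict lookup stands in for the multiplexor's selector lines):

```python
# Every operation runs on the same pair of inputs; the "multiplexor"
# (a dict lookup here) passes through only the selected result.
OPS = {
    "ADD": lambda a, b: a + b,
    "EQ":  lambda a, b: int(a == b),
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
}

def alu(a, b, selector):
    results = {name: op(a, b) for name, op in OPS.items()}  # all computed
    return results[selector]  # selector lines choose the one desired output

print(alu(6, 3, "ADD"))  # 9
print(alu(6, 3, "AND"))  # 2  (110 AND 011 = 010)
```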
Organization of Control Unit
Function of control unit is to fetch instruction from memory, decode
it, and execute it.
Such instructions are called machine instructions.
Format of instruction: operation code followed by 0 or more operand
values.
Operation codes
-
machine implements, in hardware, a fixed set of different operations
-
Examples are: LOAD data from memory, STORE data to memory, ADD two values,
JUMP to instruction at specified location, COMPARE value to 0, etc.
-
Operation code is a fixed-length binary value. This limits the number
of possible operations.
-
Collection of operation codes defines instruction set.
-
Instruction set design philosophy of small number of general operations:
RISC
-
Instruction set design philosophy of large number of specific operations:
CISC
-
RISC requires execution of more operations to perform a given function,
CISC requires more complex circuitry for decoding and execution.
-
Instruction set is thus tied inextricably to hardware; this limits the
portability of compiled programs.
-
Each operation has associated circuitry to carry it out. To execute
an instruction, the instruction decoder senses the operation code and activates
the control line for the appropriate circuit.
Operand values (address fields)
-
Each operand represents an address or (less frequently) a value.
-
Operand field must be wide enough to contain any memory address (same as
MAR).
-
Depending on instruction format, there may be from 0 to 3 operands.
-
Different instructions in same set may require different numbers of operands.
For instance ADD may require 3 (addresses for the two values to be added
plus address to store result) and JUMP requires only 1 (address of instruction
to be fetched next).
Registers required by Control Unit
-
Instruction Register (IR) holds the fetched copy of the instruction to
be executed.
-
Program Counter (PC) holds the address of the next instruction to be fetched
and executed. Unless set to a specific value by the current instruction
(e.g. JUMP), the address is simply incremented to fetch the next sequential
instruction.
Simplified fetch/decode/execute cycle (no cache)
1. Address in Program Counter is copied into the Memory Address Register.
2. Instruction is fetched and placed in Memory Data Register.
3. Instruction in Memory Data Register is copied into the Instruction
Register.
4. Address in Program Counter is incremented.
5. The operation code from the Instruction Register goes into the instruction
decoder.
6. The instruction decoder activates the circuit for that operation
code.
7. The circuit executes the instruction.
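The seven steps can be animated with a toy machine (Python; the one-address instruction format and the single accumulator are simplifying assumptions for the sketch, not the textbook's exact instruction set):

```python
# Each instruction is (opcode, operand); pc plays the Program Counter,
# ir the Instruction Register, and acc is a single accumulator register.
def run(program, memory):
    pc, acc = 0, 0
    while pc < len(program):
        ir = program[pc]      # steps 1-3: fetch into Instruction Register
        pc += 1               # step 4: increment Program Counter
        opcode, operand = ir  # steps 5-6: decode, activate circuit
        if opcode == "LOAD":  # step 7: the circuit executes the instruction
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            break
    return memory

mem = {0: 2, 1: 3, 2: 0}
run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)], mem)
print(mem[2])  # 5
```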
Instruction Set used in textbook and lab manual.
See textbook figure "Instruction Set for Our Von Neumann Machine".
Four classes of instructions: data transfer, arithmetic, comparison,
branching.
END OF MATERIAL FOR EXAM 1.
Peter Sanderson (PSanderson@otterbein.edu)