CSC 150 Chapter 5: Computer Organization and Architecture
primary information resource:
An Invitation to Computer Science (Java), Third Edition,
G. Michael Schneider and Judith L. Gersting,
Course Technology, 2007.
Overview
Computer organization: study of the functional units which comprise
the physical computer and their relationships.
This is a higher level of abstraction: computer is collection of functional
units which are themselves collection of circuits which are collection
of gates which are built of transistors.
Software analogy: OO software application is collection of classes which
are themselves collections of methods which are collections of statements
which are collections of machine instructions.
Advances in computer organization are marked by generations:
-
First generation computers constructed from vacuum tubes (roughly, the
1950s)
-
Second generation computers constructed from transistors (roughly, the
1960s)
-
Third generation computers constructed from integrated circuits (starting
in mid 1960s).
-
Fourth generation saw the first microcomputers (starting in mid 1970s)
-
It gets pretty blurry from there.
Nearly all modern digital computers are organized according to a
design known as the Von Neumann architecture. Pronounce it "NOI-man"
(just like Freud is pronounced "froid")
-
John Von Neumann was a brilliant 20th century mathematician (and more).
-
Computers with Von Neumann architecture were first built just after WW
II.
-
Key was stored program concept (memory holds both data and programs)
-
Recall from C SC 100: stored program concept originated with Jacquard loom
in late 1700s. Instructions on punched cards, but no data (it was
not a computing machine).
-
Recall from C SC 100: Charles Babbage's Analytic Engine incorporated it
too. Instructions were on punched cards, but data storage was internal
to the Engine.
-
Von Neumann took Babbage's idea one step further, albeit 100 years later:
place data and instructions in same store.
Basic Von Neumann architecture
-
Memory to hold data and instructions
-
Processor consisting of control unit to fetch, decode, and execute instructions
one at a time, and arithmetic-logic unit (ALU) to perform arithmetic and
logic (e.g. comparison) operations.
-
Input/Output (I/O) units to communicate with outside world.
Organization of Memory
Holds data and instructions, a.k.a. RAM.
Logically consists of a linear sequence of addressable units known
generically as cells but universally implemented as bytes (8 bit chunks).
Addresses are numbered sequentially starting at 0 (0,1,2,3,….).
Maximum # addresses, and therefore bytes, is determined by # bits available
to hold an address. Nearly all machines have 32-bit addresses, which
means there are 2^32 possible addresses. How many is that?
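As a quick check, that count can be computed directly. A Python sketch (the 4 GB figure assumes one byte per address):

```python
# Number of distinct addresses with 32 address bits.
num_addresses = 2 ** 32
print(num_addresses)             # 4294967296, i.e. about 4.3 billion
print(num_addresses // 2 ** 30)  # 4, i.e. 4 gigabytes at one byte per address
```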
Memory capacity usually specified in kilobytes, megabytes, gigabytes,
terabytes.
-
Although a kilogram is 1000 grams, a kilobyte is not 1000 bytes!
Similar for the others.
-
Memory units have capacity which is power of 2 or multiple thereof.
-
Since 2^10 is 1024, which is pretty close to 1000, the kilo- prefix came
into use.
-
Similarly, 2^20 is 1,048,576, which is pretty close to 1 million; hence the mega- prefix.
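The binary prefixes can be compared with their decimal look-alikes in a few lines (a Python sketch; the constant names are just illustrative):

```python
# Powers of 2 used for memory capacities vs. the metric powers of 10.
KILO = 2 ** 10  # 1024, not 1000
MEGA = 2 ** 20  # 1,048,576, not 1,000,000
GIGA = 2 ** 30  # 1,073,741,824, not 1,000,000,000
print(KILO, MEGA, GIGA)
```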
The two operations associated with memory are:
Fetch: retrieve the contents of a memory cell given its address
Store: modify the contents of a memory cell given its address and the
new contents
Memory unit has several basic components (see textbook Figure "Overall
RAM Organization"):
-
Memory cells organized in a 2D grid
-
Memory Address Register (MAR) to hold address of cell whose contents are
to be read or modified.
-
Memory Data Register (MDR) to hold value which has been read or which is
to be stored.
-
Fetch/Store controller, to control direction of flow between MDR and memory
cells (from cells to MDR for fetch, from MDR to cells for store).
-
Decoder circuits to transform address stored in MAR into the two activated
control lines that uniquely access the corresponding cell.
Additional notes about those components:
-
Use of 2-dimensional address grid greatly reduces total number of control
lines. For N-bit addresses, 1-dim requires 2^N control lines, 2-dim
requires only 2 * sqrt(2^N) = 2^(N/2+1) control lines (example: for N=16, this
is 512 versus 65536). Also vastly simplifies decoder design,
although it requires two decoders (identical in structure).
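The savings can be verified with a short computation (a Python sketch; assumes N is even so the grid is a perfect square):

```python
import math

def control_lines_1d(n_bits):
    # One control line per cell: 2^N lines.
    return 2 ** n_bits

def control_lines_2d(n_bits):
    # Square grid: one line per row plus one per column = 2 * sqrt(2^N).
    return 2 * math.isqrt(2 ** n_bits)

print(control_lines_1d(16))  # 65536
print(control_lines_2d(16))  # 512
```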
-
The width, in bits, of the MAR is determined by the maximum allowable memory
addressing capacity, not the actual capacity for a given computer.
With 2^32 possible addresses, the MAR is 32 bits wide. The width of the MDR is
determined by the number of bytes which can be read/written in one fetch/store
operation. It is not related to the width of the MAR, although it
is frequently the same, 32 bits (4 bytes).
-
A major factor in determining MDR width: want to minimize number of memory
fetches while at same time minimizing amount of wasted fetched data (fetched
but not used). These are contradictory goals. 32 bits allows
for efficient fetching of machine instructions.
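The interplay of the MAR, MDR, and the fetch/store operations can be sketched as a toy class (Python; the class and method names are illustrative, not from the textbook):

```python
# Toy memory unit: the MAR selects the cell, the MDR carries the data.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0  # Memory Address Register
        self.mdr = 0  # Memory Data Register

    def fetch(self, address):
        self.mar = address
        self.mdr = self.cells[self.mar]  # cells -> MDR
        return self.mdr

    def store(self, address, value):
        self.mar = address
        self.mdr = value
        self.cells[self.mar] = self.mdr  # MDR -> cells

ram = Memory(64)
ram.store(5, 42)
print(ram.fetch(5))  # 42
```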
Cache Memory
Look at a motherboard, what can you say about the location of memory
cards in relation to location of processor (CPU)? They are physically
separated.
As processors became faster, an instruction could be executed faster
than the next instruction could be fetched – memory became a bottleneck.
-
Possible solution: place all memory on same chip as processor. Drawbacks?
-
Possible solution: place a portion of memory on same chip as processor.
Drawbacks?
-
Possible solution: place a small but separate memory unit on same chip
as processor. Cache.
The last one is the practical solution. Cache is pronounced “cash”.
You are probably familiar with this concept as applied by web browsers.
Describe what browser caching is.
If you read CPU specifications, you’ve probably also seen this (e.g.
L1 or level 1 cache, L2 or level 2 cache).
Memory access from cache is several times faster than access from memory
unit.
How this modifies fetch/store: When memory fetch is requested, first
look in cache. If there (called a "cache hit" or "hit")
then we're done. If not there, read from memory unit as usual but place extra
copy into cache. For memory store, store into cache as well.
We’ll skip the gory details (Such as, what if we read from memory unit
into cache but cache is already full? Or, what could happen when data written
to cache but not to memory unit?)
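The look-in-cache-first procedure can be sketched in a few lines (Python; a plain dict stands in for the cache, ignoring the capacity issues just mentioned):

```python
# On a hit, return the cached value; on a miss, go to the memory unit
# and keep an extra copy in the cache for next time.
def cached_fetch(address, cache, memory):
    if address in cache:      # cache hit: done
        return cache[address]
    value = memory[address]   # cache miss: read from memory unit
    cache[address] = value    # place extra copy into cache
    return value

memory = {100: 7}
cache = {}
print(cached_fetch(100, cache, memory))  # miss: reads memory unit, fills cache
print(cached_fetch(100, cache, memory))  # hit: served from cache
```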
Cache works because of temporal locality principle. This means
when a given memory location is used it is likely to be used again very
soon (temporal refers to time). Example: updating loop counter.
To a lesser extent, spatial locality applies too. This means if
memory cell k is used, then cell k+1 will likely be used very soon.
Examples are sequential instructions, or array elements processed in a
loop.
The higher the cache hit rate, the better. This is the percentage
of total memory access requests satisfied in the cache w/o going to the memory
unit.
Example: suppose access time from memory unit is 60 ns (nanoseconds) and access
time from cache is 10 ns. If hit rate is 50%, then the average access
time is 10 ns * .5 + (10+60) ns * .5 = 40 ns. If hit rate is 90%, then
the average access time is 10 ns * .9 + (10+60) ns * .1 = 16 ns!
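The averages above can be checked with a one-line formula (a Python sketch; a miss is charged the failed cache lookup plus the memory access, as in the worked numbers):

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    # hit: cache lookup only; miss: cache lookup plus memory unit access
    return hit_rate * cache_ns + (1 - hit_rate) * (cache_ns + memory_ns)

print(round(average_access_time(0.5, 10, 60), 1))  # 40.0
print(round(average_access_time(0.9, 10, 60), 1))  # 16.0
```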
Organization of I/O units
-
Input and Output devices are generally referred to as peripherals.
-
Their operation is in general orders of magnitude slower than the electronic
memory and processing units.
-
Coordination between memory/processor and I/O units is difficult because
of operation speed differences. Suppose the CPU had to sit idle waiting
for your every keystroke?
-
I/O controllers handle the interface between memory/processor and I/O devices.
-
Buffers are used to hold data being read/written, and interrupt signals
are used to alert memory/processor units that the I/O operation is complete.
-
Most I/O units are either used for human-computer interaction (keyboard,
monitor, speakers) or for mass data storage (disk, tape).
-
Mass storage devices classified as either Direct Access or Sequential Access.
-
Nearly all direct access devices are rotating disks of some sort (hard
disk, floppy disk, CD)
-
Nearly all sequential access devices are linear tape devices (reeled tape,
cassette)
Disk as example of Direct Access.
-
Each recordable surface is flat, donut-shaped and magnetic.
-
Each storage cell must be uniquely addressable.
-
Data are stored in fixed length cells called sectors.
-
Sectors are stored side-by-side to form a circle called a track.
-
Tracks are stored concentrically on the disk.
-
The disk rotates rapidly.
-
Read/Write head is mounted on arm which can move back and forth between
outside edge of disk and its center.
-
To read/write a sector, arm must be positioned over the desired track and
disk must be rotated to start of that sector.
-
Access time varies (falls in the 10s of milliseconds range) and has 3 components:
-
Seek delay – time for arm to arrive at desired track
-
Rotational delay – time for disk to rotate to start of desired sector.
-
Transfer time – time to read/write the sector.
-
The above general description applies to hard disks, floppies and CDs.
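The three components simply add, so one access can be estimated as follows (Python; the millisecond figures are made up, merely falling in the tens-of-milliseconds range mentioned above):

```python
# Total time for one disk access, in milliseconds.
def disk_access_ms(seek, rotational, transfer):
    return seek + rotational + transfer

# Hypothetical values: 9 ms seek, 4.2 ms rotational, 0.1 ms transfer.
print(round(disk_access_ms(9.0, 4.2, 0.1), 1))  # 13.3
```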
Sequential Access storage
-
Storage cells are arranged linearly on ribbon of tape.
-
This eliminates seek delay since read/write head is fixed.
-
The “rotational” delay can be a killer since in the worst case every storage
cell on the tape has to pass under the read/write head.
-
Used normally only for operations such as backup/restore, which require
a large number of cells to be read/written in sequence. Eliminates
“rotational” delay.
Organization of ALU
Arithmetic/Logic unit, surprisingly, performs arithmetic (+,-,*,/) and
logical (=,<,>, bitwise AND/OR/NOT) operations.
The ALU is located on the processor chip because it works so closely with
the control unit and needs to be very fast (and because it does the processing!)
Operations are performed on two (or in the case of NOT, one) input values
with one resulting output value.
Where are the input and output values stored? In registers. A
register is a small fixed length circuit to hold a single value.
It is not economical for the ALU to reach out to the memory unit to
find its inputs and store its output, so prior to the operation the input
values are fetched from memory and copied into two registers within the
ALU. The result is placed into a third register, and only later stored
to memory.
For added speed and flexibility, most ALUs have 16 to 32 registers (a.k.a.
general purpose registers).
To carry out an operation, the ALU needs to know which registers hold
its operands and which register to put the result into. Control lines
handle this.
For simplicity and speed, the ALU actually performs all its operations
in parallel on the same inputs! All the outputs then go into a multiplexor
circuit. Selector lines also go into the multiplexor, to tell it
which operation was the desired one. The multiplexor produces a single
output: the result of the selected operation.
For instance, suppose we want to add two values. The ALU performs
addition, equality, AND, and OR. The input values are loaded into
registers then all four operations are performed simultaneously.
The four results all go into the multiplexor. The selector lines
are set to allow only the results of the addition to be output from the
multiplexor.
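That all-operations-then-multiplexor design can be simulated directly (a Python sketch; the four operations match the example above, and a dict lookup stands in for the multiplexor's selector lines):

```python
# Every operation runs on the same pair of inputs; the "multiplexor"
# (a dict lookup here) passes through only the selected result.
OPS = {
    "ADD": lambda a, b: a + b,
    "EQ":  lambda a, b: int(a == b),
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
}

def alu(a, b, selector):
    results = {name: op(a, b) for name, op in OPS.items()}  # all computed
    return results[selector]  # selector lines choose the one desired output

print(alu(6, 3, "ADD"))  # 9
print(alu(6, 3, "AND"))  # 2  (110 AND 011 = 010)
```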
Organization of Control Unit
Function of control unit is to fetch instruction from memory, decode
it, and execute it.
Such instructions are called machine instructions.
Format of instruction: operation code followed by 0 or more operand
values.
Operation codes
-
machine implements, in hardware, a fixed set of different operations
-
Examples are: LOAD data from memory, STORE data to memory, ADD two values,
JUMP to instruction at specified location, COMPARE value to 0, etc.
-
Operation code is a fixed-length binary value. This limits the number
of possible operations.
-
Collection of operation codes defines instruction set.
-
Instruction set design philosophy of small number of general operations:
RISC
-
Instruction set design philosophy of large number of specific operations:
CISC
-
RISC requires execution of more operations to perform a given function,
CISC requires more complex circuitry for decoding and execution.
-
Instruction set is thus tied inextricably to hardware; this limits the
portability of compiled programs.
-
Each operation has associated circuitry to carry it out. To execute
an instruction, the instruction decoder senses the operation code and activates
the control line for the appropriate circuit.
Operand values (address fields)
-
Each operand represents an address or (less frequently) a value.
-
Operand field must be wide enough to contain any memory address (same as
MAR).
-
Depending on instruction format, there may be from 0 to 3 operands.
-
Different instructions in same set may require different numbers of operands.
For instance ADD may require 3 (addresses for the two values to be added
plus address to store result) and JUMP requires only 1 (address of instruction
to be fetched next).
Registers required by Control Unit
-
Instruction Register (IR) holds the fetched copy of the instruction to
be executed.
-
Program Counter (PC) holds the address of the next instruction to be fetched
and executed. Unless set to a specific value by the current instruction
(e.g. JUMP), the address is simply incremented to fetch the next sequential
instruction.
Simplified fetch/decode/execute cycle (no cache)
1. Address in Program Counter is copied into the Memory Address Register.
2. Instruction is fetched and placed in Memory Data Register.
3. Instruction in Memory Data Register is copied into the Instruction
Register.
4. Address in Program Counter is incremented.
5. The operation code from the Instruction Register goes into the instruction
decoder.
6. The instruction decoder activates the circuit for that operation
code.
7. The circuit executes the instruction.
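The seven steps can be animated with a toy machine (Python; the one-address instruction format and the single accumulator are simplifying assumptions for the sketch, not the textbook's exact instruction set):

```python
# Each instruction is (opcode, operand); pc plays the Program Counter,
# ir the Instruction Register, and acc is a single accumulator register.
def run(program, memory):
    pc, acc = 0, 0
    while pc < len(program):
        ir = program[pc]      # steps 1-3: fetch into Instruction Register
        pc += 1               # step 4: increment Program Counter
        opcode, operand = ir  # steps 5-6: decode, activate circuit
        if opcode == "LOAD":  # step 7: the circuit executes the instruction
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            break
    return memory

mem = {0: 2, 1: 3, 2: 0}
run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)], mem)
print(mem[2])  # 5
```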
Instruction Set used in textbook and lab manual.
See textbook figure "Instruction Set for Our Von Neumann Machine".
Four classes of instructions: data transfer, arithmetic, comparison,
branching.
END OF MATERIAL FOR EXAM 1.
Peter Sanderson (PSanderson@otterbein.edu)