CSC 150 Chapter 6: Assembly Language and Assemblers
primary information resource:
An Invitation to Computer Science (Java), Third Edition,
G. Michael Schneider and Judith L. Gersting,
Course Technology, 2007.
Programming in binary or machine code is really no fun!
Consider this program, written in the text/lab machine language.
What does this program do? First instruction is at address 0.
Address Contents
00000000 0000000000001100
00000001 0011000000001101
00000010 0001000000001110
00000011 0111000000010000
00000100 1001000000000111
00000101 0001000000001111
00000110 1000000000001010
00000111 0000000000010000
00001000 0101000000001110
00001001 0001000000001111
00001010 1110000000001111
00001011 1111000000000000
00001100 0000000000000110
00001101 0000000000000111
00001110 0000000000000000
00001111 0000000000000000
00010000 0000000000000000
Recall from the text that each of the 16 operations has a corresponding
name. This is called an op code mnemonic.
The noun “mnemonic” means “devices and techniques to improve memory”
(http://www.answers.com/topic/mnemonic)
When you apply the mnemonics (see Figure 5.19 or 6.5 in the text) to the above code, the result is:
00000000 LOAD 000000001100
00000001 ADD 000000001101
00000010 STORE 000000001110
00000011 COMPARE 000000010000
00000100 JUMPGT 000000000111
00000101 STORE 000000001111
00000110 JUMP 000000001010
00000111 LOAD 000000010000
00001000 SUBTRACT 000000001110
00001001 STORE 000000001111
00001010 OUT 000000001111
00001011 HALT 000000000000
00001100 0000000000000110
00001101 0000000000000111
00001110 0000000000000000
00001111 0000000000000000
00010000 0000000000000000
Why didn’t I assign the LOAD mnemonic to the last 5 “instructions”? Because it turns out they are not instructions, but data! You can’t easily see this by looking at the machine code, but could probably guess it based on the operands of the first few instructions.
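To make the translation from binary to mnemonics concrete, here is a minimal Java sketch that splits a 16-bit word into its 4-bit op code and 12-bit address field. The class name WordDecoder and the method decode are made up for illustration. The op codes that appear in the program above (LOAD, STORE, ADD, SUBTRACT, COMPARE, JUMP, JUMPGT, OUT, HALT) are taken from the listing; the remaining table entries are my recollection of the figure in the text and should be checked against it.

```java
// Hypothetical helper, not part of the lab software.
class WordDecoder {
    // Mnemonics indexed by op code value 0..15.
    // Entries not used in the running example are assumed from the text's figure.
    static final String[] MNEMONICS = {
        "LOAD", "STORE", "CLEAR", "ADD", "INCREMENT", "SUBTRACT", "DECREMENT", "COMPARE",
        "JUMP", "JUMPGT", "JUMPEQ", "JUMPLT", "JUMPNEQ", "IN", "OUT", "HALT"
    };

    // Takes a string of 16 binary digits and returns "MNEMONIC address".
    static String decode(String word) {
        int value = Integer.parseInt(word, 2); // binary digits to int
        int opCode = value >>> 12;             // top 4 bits
        int address = value & 0x0FFF;          // low 12 bits
        return MNEMONICS[opCode] + " " + address;
    }

    public static void main(String[] args) {
        System.out.println(decode("0011000000001101")); // prints "ADD 13"
    }
}
```

Note that decode cannot tell an instruction from a data word: given the word at address 12 it would happily report "LOAD 6". That is exactly the ambiguity described above.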
Through op code mnemonics, we’ve made a huge leap in understandability. The tradeoff is that now our program can no longer be directly executed on the machine. It has to be translated into machine code.
The software that performs this translation is called an assembler.
Translating op code mnemonics is one duty of an assembler.
If you like, we can write the numbers in decimal and get rid of leading zeroes. The result is:
0 LOAD 12
1 ADD 13
2 STORE 14
3 COMPARE 16
4 JUMPGT 7
5 STORE 15
6 JUMP 10
7 LOAD 16
8 SUBTRACT 14
9 STORE 15
10 OUT 15
11 HALT
12 6
13 7
14 0
15 0
16 0
You will probably find this a little more readable than the binary version.
This is a good time to explain the logic of statements 3 and 4.
Statement 3 compares the contents of address 16 with contents of register R, then sets the corresponding condition code.
Statement 4 checks the CCR. If it contains 100 (GT), execution control jumps to address 7 (i.e. 7 is written into the PC). Otherwise, control continues to the next statement.
This is the basis for conditional control (e.g. if-then-else). Notice that control jumps when the condition is true and drops through when it is false. This is the opposite of what we are used to.
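The COMPARE/JUMPGT logic can be traced with a small simulator. This is only a sketch under simplifying assumptions: the class TinyMachine and its methods are hypothetical, only the op codes used in the running example are modelled, only the GT condition bit is tracked, and data values are kept as ordinary Java ints rather than 16-bit two's complement words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulator covering just the op codes in the running example.
class TinyMachine {
    // Packs a 4-bit op code and a 12-bit address into one instruction word.
    static int ins(int op, int addr) { return (op << 12) | addr; }

    // The running example, with the two data values at addresses 12 and 13.
    static int[] absSumProgram(int a, int b) {
        return new int[] {
            ins(0, 12), ins(3, 13), ins(1, 14), ins(7, 16), ins(9, 7),
            ins(1, 15), ins(8, 10), ins(0, 16), ins(5, 14), ins(1, 15),
            ins(14, 15), ins(15, 0),
            a, b, 0, 0, 0 // addresses 12..16 hold data
        };
    }

    // Executes from address 0 until HALT; returns every value written by OUT.
    static List<Integer> run(int[] memory) {
        List<Integer> output = new ArrayList<>();
        int pc = 0, r = 0;
        boolean gt = false; // the GT bit of the condition code register
        while (true) {
            int word = memory[pc++];
            int op = word >>> 12, addr = word & 0x0FFF;
            switch (op) {
                case 0: r = memory[addr]; break;          // LOAD
                case 1: memory[addr] = r; break;          // STORE
                case 3: r += memory[addr]; break;         // ADD
                case 5: r -= memory[addr]; break;         // SUBTRACT
                case 7: gt = memory[addr] > r; break;     // COMPARE: contents of addr vs. R
                case 8: pc = addr; break;                 // JUMP
                case 9: if (gt) pc = addr; break;         // JUMPGT: jump only if GT is set
                case 14: output.add(memory[addr]); break; // OUT
                case 15: return output;                   // HALT
                default: throw new IllegalStateException("op code not modelled: " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(absSumProgram(6, 7)));   // prints [13]
        System.out.println(run(absSumProgram(-20, 7))); // prints [13]
    }
}
```

With 6 and 7 the sum is positive, so JUMPGT drops through to statement 5. With -20 and 7 the sum is -13, the contents of address 16 (zero) are greater than R, and control jumps to statement 7, where the sum is negated.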
Translating numeric values is another duty of the assembler.
Suppose I wanted to add a third value, 4, to the initial sum?
I’ll store the value 4 at the end, and insert a second ADD statement
between the ADD and STORE. Like so:
0 LOAD 12
1 ADD 13
ADD ?? <-- insert
2 STORE 14
3 COMPARE 16
4 JUMPGT 7
5 STORE 15
6 JUMP 10
7 LOAD 16
8 SUBTRACT 14
9 STORE 15
10 OUT 15
11 HALT
12 6
13 7
14 0
15 0
16 0
4 <-- insert
Now we’ve opened a can of worms!
Does the term maintainability mean anything to you?
The solution to this mess is to allow use of symbolic labels to refer to memory addresses.
There are two aspects to labels: definition and use.
Here I've defined some labels both for data addresses (similar to Java variables) and also labels for some instruction addresses to serve as the target for JUMP instructions.
LOAD A
ADD B
STORE C
COMPARE ZERO
JUMPGT NEGATE
STORE D
JUMP OUTPUT
NEGATE: LOAD ZERO
SUBTRACT C
STORE D
OUTPUT: OUT D
HALT
A: 6
B: 7
C: 0
D: 0
ZERO: 0
Now with labels, what would change if I wanted to add a third value to the initial sum? Nice.
Handling labels makes one more job for the assembler.
We’re close but not quite there! The assembler needs a little housekeeping information from you. This is provided using pseudo operations, assembly language operations that do not generate instructions or data.
.BEGIN
LOAD A
ADD B
STORE C
COMPARE ZERO
JUMPGT NEGATE
STORE D
JUMP OUTPUT
NEGATE: LOAD ZERO
SUBTRACT C
STORE D
OUTPUT: OUT D
HALT
A: .DATA 6
B: .DATA 7
C: .DATA 0
D: .DATA 0
ZERO: .DATA 0
.END
Here’s one you can “take to the bank”! You can copy this code and paste it into the Assembler that came with the lab manual.
NOTE: Due to a quirk in the lab software, you will get an error message on the “.END” statement – change it to lower case and it will assemble and execute.
Handling pseudo ops is yet another duty of the assembler.
Appreciate high level languages yet? Here it is in Java...
int a = 6;
int b = 7;
int c, d;
c = a + b;
if (c >= 0)
    d = c;
else
    d = -c;
System.out.print(d);
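Through this running example, we encountered several duties of the assembler: translating op code mnemonics, translating numeric values, handling labels, and handling pseudo ops.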
The assembler must also do a couple other things:
Once successfully assembled and stored as an object file, the object program can be executed. A small utility called the loader is responsible for loading it from the file into memory and initializing the Program Counter (PC) to the address of the first instruction.
Translating op code mnemonics to binary
Data structure: op code table, each row has op code mnemonic and corresponding binary code.
Logic: when a statement is read, look for a match between its
operator and an op code table entry. If match is found, set op code
to the corresponding binary code. If not, generate error message,
set op code to all 1s (halt), and keep going.
Should we use a sequential or a binary search?
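As one possible answer to the search question, here is a sketch of the op code table searched with a binary search. It assumes the table is kept in alphabetical order by mnemonic, that mnemonics are matched without regard to case, and that the op code values not seen in the running example match the figure in the text. The class name OpCodeTable and the method translate are invented for this example. With only 16 rows, a sequential search would perform about as well in practice.

```java
import java.util.Arrays;

// Hypothetical op code table; rows are sorted by mnemonic so binary search applies.
class OpCodeTable {
    static final String[] MNEMONICS = {
        "ADD", "CLEAR", "COMPARE", "DECREMENT", "HALT", "IN", "INCREMENT", "JUMP",
        "JUMPEQ", "JUMPGT", "JUMPLT", "JUMPNEQ", "LOAD", "OUT", "STORE", "SUBTRACT"
    };
    // CODES[i] is the 4-bit op code for MNEMONICS[i].
    static final int[] CODES = { 3, 2, 7, 6, 15, 13, 4, 8, 10, 9, 11, 12, 0, 14, 1, 5 };
    static final int HALT = 15; // binary 1111, used when no match is found

    // Returns the op code for an operator, or HALT (after reporting an error) if unknown.
    static int translate(String operator) {
        int row = Arrays.binarySearch(MNEMONICS, operator.toUpperCase());
        if (row < 0) {
            System.err.println("Error: unknown op code mnemonic " + operator);
            return HALT; // keep going, as described above
        }
        return CODES[row];
    }

    public static void main(String[] args) {
        System.out.println(translate("SUBTRACT")); // prints 5
        System.out.println(translate("MULTIPLY")); // reports an error, prints 15
    }
}
```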
Translating numeric values to binary
Data structure: none
Logic: decimal-to-binary translation algorithm.
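A minimal sketch of the decimal-to-binary step, using repeated division by 2 and padding to a fixed field width. The class BinaryField and the method toBinary are hypothetical names, and the sketch handles only non-negative values that fit in the field; negative data values would need a signed representation, which is not covered here.

```java
// Hypothetical helper for producing fixed-width binary fields.
class BinaryField {
    // Converts a non-negative value to a binary string exactly 'width' digits long.
    static String toBinary(int value, int width) {
        if (value < 0 || value >= (1 << width)) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " bits");
        }
        char[] bits = new char[width];
        // Fill from the rightmost (least significant) position:
        // each remainder of division by 2 is the next bit.
        for (int i = width - 1; i >= 0; i--) {
            bits[i] = (char) ('0' + value % 2);
            value /= 2;
        }
        return new String(bits);
    }

    public static void main(String[] args) {
        System.out.println(toBinary(12, 12)); // prints 000000001100
        System.out.println(toBinary(6, 16));  // prints 0000000000000110
    }
}
```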
Handling labels
Data structure: symbol table, each row contains label and corresponding binary address.
Logic: two cases to consider: label definition and label use.
Label definition. When label definition is detected, store it plus current location counter (address of instruction currently being assembled) value in the symbol table. If already there, generate error message and keep going.
Label use. When label use is read (symbolic operand), look for a match in the symbol table. If match is found, set operand (address field) to corresponding address. If not, generate error message, set operand to all 0s, and keep going.
PROBLEM: What if label use precedes its definition? Notice how many times this happens in our running example. We clearly need to allow this.
TWO REASONABLE SOLUTIONS:
.begin
load A
LOOP: compare B
jumpgt DONE
increment B
out B
jump LOOP
DONE: halt
A: .data 9
B: .data 0
.end
Forward References: A, B, DONE
Backward References: LOOP
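One common way to cope with forward references is to read the program twice: the first pass only records label definitions, and the second pass resolves label uses. The Java sketch below is an assumption about how that could look, not a description of the lab assembler. The names TwoPassLabels, buildSymbolTable and resolve are hypothetical, and the sketch assumes one statement per line with a label definition written as NAME: at the start of the line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-pass label handling for the small example above.
class TwoPassLabels {
    static final String[] EXAMPLE = {
        ".begin", "load A", "LOOP: compare B", "jumpgt DONE", "increment B",
        "out B", "jump LOOP", "DONE: halt", "A: .data 9", "B: .data 0", ".end"
    };

    // Pass 1: store each label with the location counter value where it is defined.
    static Map<String, Integer> buildSymbolTable(String[] lines) {
        Map<String, Integer> symbols = new LinkedHashMap<>();
        int locationCounter = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] tokens = line.trim().split("\\s+");
            // .begin and .end occupy no memory, so they do not advance the counter.
            if (tokens[0].equalsIgnoreCase(".begin") || tokens[0].equalsIgnoreCase(".end")) continue;
            if (tokens[0].endsWith(":")) {
                String label = tokens[0].substring(0, tokens[0].length() - 1);
                if (symbols.containsKey(label)) {
                    System.err.println("Error: label defined twice: " + label);
                } else {
                    symbols.put(label, locationCounter);
                }
            }
            locationCounter++; // every instruction or .data line takes one word
        }
        return symbols;
    }

    // Pass 2 helper: look up a label use; unknown labels yield address 0 and an error.
    static int resolve(Map<String, Integer> symbols, String operand) {
        Integer address = symbols.get(operand);
        if (address == null) {
            System.err.println("Error: undefined label " + operand);
            return 0;
        }
        return address;
    }

    public static void main(String[] args) {
        System.out.println(buildSymbolTable(EXAMPLE)); // prints {LOOP=1, DONE=6, A=7, B=8}
    }
}
```

Because the whole symbol table exists before any operand is translated, it no longer matters whether a label is used before or after its definition.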
Handling pseudo ops
Data Structures: none
Logic: When .BEGIN is read, activate assembler logic.
When .END is read, the assembler is finished with the pass.
When .DATA is read, the assembler reads and translates the operand
value into binary and adds it to generated machine code.
If an unrecognized pseudo-op is read, generate error message and
keep going.
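The pseudo-op logic above amounts to a small dispatch on the operator. Here is a rough Java sketch; the class PseudoOps, its fields, and the handle method are invented for illustration, pseudo-ops are assumed to start with a period and to be case-insensitive, and .DATA operands are assumed to be decimal integers (kept here as ints rather than binary strings).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pseudo-op handler; holds the small amount of state the logic needs.
class PseudoOps {
    boolean active = false;   // set when .BEGIN is read
    boolean finished = false; // set when .END is read
    final List<Integer> words = new ArrayList<>(); // data words generated by .DATA

    // Returns true if the operator was treated as a pseudo-op (recognized or not).
    boolean handle(String operator, String operand) {
        if (!operator.startsWith(".")) return false; // ordinary instruction, not our job
        switch (operator.toUpperCase()) {
            case ".BEGIN": active = true; break;
            case ".END": finished = true; break;
            case ".DATA": words.add(Integer.parseInt(operand)); break;
            default: System.err.println("Error: unknown pseudo-op " + operator); // keep going
        }
        return true;
    }

    public static void main(String[] args) {
        PseudoOps ops = new PseudoOps();
        ops.handle(".BEGIN", null);
        ops.handle(".DATA", "6");
        ops.handle(".end", null);
        System.out.println(ops.words); // prints [6]
    }
}
```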
Another example program, this time in assembly. It is an example from my CSC 120 lecture notes on “for” loops. The Java code is:
int result = 0;
for ( int counter=1; counter<=7; counter++ ) {
    result = result + counter;
}
.BEGIN
CLEAR RESULT
LOAD FIRST
STORE COUNTER
LOOP: LOAD FINAL
COMPARE COUNTER
JUMPGT FINISH
LOAD RESULT
ADD COUNTER
STORE RESULT
INCREMENT COUNTER
JUMP LOOP
FINISH: HALT
FIRST: .DATA 1
FINAL: .DATA 7
COUNTER: .DATA 0
RESULT: .DATA 0
.END
See the correspondence between assembly code and Java?
.BEGIN
CLEAR RESULT -- int result = 0;
LOAD FIRST
STORE COUNTER -- int counter = 1;
LOOP: LOAD FINAL
COMPARE COUNTER
JUMPGT FINISH -- counter <= 7;
LOAD RESULT
ADD COUNTER -- result = result +
STORE RESULT -- counter;
INCREMENT COUNTER -- counter++
JUMP LOOP
FINISH: HALT
FIRST: .DATA 1
FINAL: .DATA 7
COUNTER: .DATA 0
RESULT: .DATA 0
.END
What changes are required to run the counter down instead of up? Does this affect the result?
Another Example: Largest Fibonacci number smaller than 20
int first = 1; // first value in sequence
int second = 1; // second value in sequence
int result = 2; // third value in sequence
int limit = 20;
while (result < limit) {
    first = second;
    second = result;
    result = first + second;
}
result = second; // "undo" last calculation
.BEGIN
LOOP: LOAD LIMIT
COMPARE RESULT
JUMPGT DONE -- done if RESULT bigger
JUMPEQ DONE -- done if RESULT equal
LOAD SECOND
STORE FIRST -- first = second;
LOAD RESULT
STORE SECOND -- second = result;
LOAD FIRST
ADD SECOND
STORE RESULT -- result=first+second;
JUMP LOOP
DONE: LOAD SECOND
STORE RESULT -- result = second;
HALT
FIRST: .DATA 1
SECOND: .DATA 1
RESULT: .DATA 2
LIMIT: .DATA 20
.END
The assembler is classified as system software.
System software provides its clients a virtual machine interface.
It exists between a client and the actual machine, and provides clients a simplified and usually hardware-independent view of the computer.
The assembly language format, for example, need not necessarily directly reflect the machine architecture. An assembler could allow its programmers to write using a different instruction set and different instruction formats than the underlying hardware.
Example: assembler provides a two-address virtual machine, but
generates machine code for a one-address physical machine. Programmer
could write the instruction ADD X,Y and the
assembler translates it to the three-instruction sequence:
LOAD X
ADD Y
STORE X
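The expansion just described is a simple text rewrite, which the following Java sketch illustrates. The class TwoAddressExpander and the method expandAdd are hypothetical, and the sketch handles only the ADD X,Y form from the example; a real assembler would need a rule like this for each two-address instruction it offers.

```java
// Hypothetical expansion of a two-address ADD into one-address instructions.
class TwoAddressExpander {
    // Turns "ADD X,Y" into the sequence LOAD X / ADD Y / STORE X.
    static String[] expandAdd(String statement) {
        String[] parts = statement.trim().split("\\s+", 2); // operator, then operand list
        if (parts.length != 2 || !parts[0].equalsIgnoreCase("ADD")) {
            throw new IllegalArgumentException("expected ADD X,Y but got: " + statement);
        }
        String[] operands = parts[1].split(",");
        if (operands.length != 2) {
            throw new IllegalArgumentException("expected two operands in: " + statement);
        }
        String x = operands[0].trim();
        String y = operands[1].trim();
        return new String[] { "LOAD " + x, "ADD " + y, "STORE " + x };
    }

    public static void main(String[] args) {
        for (String line : expandAdd("ADD X,Y")) {
            System.out.println(line); // prints LOAD X, ADD Y, STORE X on separate lines
        }
    }
}
```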
We will study two classes of system software: operating systems and compilers.
Compilers, which we study later in the term, fall into the same general
class as assemblers; both are language translators.