Assembly Language and Assemblers

CSC 150 Chapter 6: Assembly Language and Assemblers
primary information resource: An Invitation to Computer Science (Java), Third Edition, G. Michael Schneider and Judith L. Gersting, Course Technology, 2007.


What assemblers do and why we use them

Programming in binary or machine code is really no fun!

Consider this program, written in the text/lab machine language.

What does this program do? First instruction is at address 0.

Address Contents
00000000 0000000000001100
00000001 0011000000001101
00000010 0001000000001110
00000011 0111000000010000
00000100 1001000000000111
00000101 0001000000001111
00000110 1000000000001010
00000111 0000000000010000
00001000 0101000000001110
00001001 0001000000001111
00001010 1110000000001111
00001011 1111000000000000
00001100 0000000000000110
00001101 0000000000000111
00001110 0000000000000000
00001111 0000000000000000
00010000 0000000000000000

Recall from the text that each of the 16 operations has a corresponding name. This is called an op code mnemonic.
The noun “mnemonic” means “devices and techniques to improve memory” (http://www.answers.com/topic/mnemonic)

When you apply the mnemonics (see Figure 5.19 or 6.5 in the text) to the above code, the result is:

00000000 LOAD     000000001100
00000001 ADD      000000001101
00000010 STORE    000000001110
00000011 COMPARE 000000010000
00000100 JUMPGT   000000000111
00000101 STORE    000000001111
00000110 JUMP     000000001010
00000111 LOAD     000000010000
00001000 SUBTRACT 000000001110
00001001 STORE    000000001111
00001010 OUT      000000001111
00001011 HALT     000000000000
00001100       0000000000000110
00001101       0000000000000111
00001110       0000000000000000
00001111       0000000000000000
00010000       0000000000000000

Why didn’t I assign the LOAD mnemonic to the last 5 “instructions”? Because it turns out they are not instructions, but data! You can’t easily see this by looking at the machine code, but could probably guess it based on the operands of the first few instructions.
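Applying the mnemonics by hand amounts to splitting each 16-bit word into a 4-bit op code and a 12-bit address. Here is a minimal Java sketch of that step. The class name WordDecoder and its methods are hypothetical, and the table holds only the nine op codes that appear in the program above, not the full set of 16.

```java
import java.util.Map;

// Hypothetical helper: splits a 16-bit word into op code and address fields.
class WordDecoder {
    // Only the op codes that appear in the example program above.
    static final Map<Integer, String> MNEMONICS = Map.of(
        0b0000, "LOAD",
        0b0001, "STORE",
        0b0011, "ADD",
        0b0101, "SUBTRACT",
        0b0111, "COMPARE",
        0b1000, "JUMP",
        0b1001, "JUMPGT",
        0b1110, "OUT",
        0b1111, "HALT");

    static int opCode(int word)  { return (word >> 12) & 0xF; }  // high 4 bits
    static int address(int word) { return word & 0xFFF; }        // low 12 bits

    // Turns a 16-character bit string into "MNEMONIC address" (address in decimal).
    static String decode(String bits) {
        int word = Integer.parseInt(bits, 2);
        String name = MNEMONICS.getOrDefault(opCode(word), "???");
        return name + " " + address(word);
    }

    public static void main(String[] args) {
        System.out.println(decode("0011000000001101")); // ADD 13
        System.out.println(decode("1001000000000111")); // JUMPGT 7
    }
}
```

Note that decode("0000000000000110") returns "LOAD 6": the data word at address 12 decodes like an instruction, which is exactly why data cannot be told apart from instructions in machine code.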

Through op code mnemonics, we’ve made a huge leap in understandability. The tradeoff is that now our program can no longer be directly executed on the machine. It has to be translated into machine code.

The software that performs this translation is called an assembler.

Translating op code mnemonics is one duty of an assembler

Translate op code mnemonics into binary op codes
Produce error message when invalid mnemonic is used

If you like, we can write the numbers in decimal and get rid of leading zeroes. The result is:

0 LOAD     12
1 ADD      13
2 STORE    14
3 COMPARE 16
4 JUMPGT    7
5 STORE    15
6 JUMP     10
7 LOAD     16
8 SUBTRACT 14
9 STORE    15
10 OUT      15
11 HALT
12           6
13           7
14           0
15           0
16           0

You will probably find this a little more readable than the binary version.

This is a good time to explain the logic of statements 3 and 4.

Statement 3 compares the contents of address 16 with contents of register R, then sets the corresponding condition code.

There are three possibilities: GT (greater than), EQ (equal), LT (less than)
Exactly one of the three will be set to 1 and the other two set to 0.
GT is set if the contents of address 16 are greater than the contents of R, EQ is set if they are equal, and LT is set if they are less.
Our machine has a condition code register (CCR) in which the low-order (2⁰) bit represents LT, the 2¹ bit represents EQ, and the 2² bit represents GT.

Statement 4 checks the CCR. If it contains 100 (GT), execution control jumps to address 7 (i.e. 7 is written into the PC). Otherwise, control continues to the next statement.

This is the basis for conditional control (e.g. if-then-else). Notice that control jumps when the condition is true and drops through when it is false. This is the opposite of what we are used to.
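The behavior of statements 3 and 4 can be modeled in a few lines of Java. This is only a sketch of the description above: the class ConditionCodes and its method names are hypothetical, and the "next statement" is modeled simply as the current address plus one.

```java
// Hypothetical model of COMPARE and JUMPGT; names are illustrative only.
class ConditionCodes {
    // Bit positions in the condition code register, as described above.
    static final int LT = 0b001, EQ = 0b010, GT = 0b100;

    // COMPARE X: compares the contents of address X with register R.
    // Exactly one of the three bits is set in the returned CCR value.
    static int compare(int contentsOfX, int r) {
        if (contentsOfX > r) return GT;
        if (contentsOfX == r) return EQ;
        return LT;
    }

    // JUMPGT target: returns the address of the next statement to execute.
    static int jumpGt(int ccr, int target, int currentAddress) {
        return (ccr == GT) ? target : currentAddress + 1;
    }

    public static void main(String[] args) {
        int r = 13;                             // R holds 6 + 7 after statements 0-2
        int ccr = compare(0, r);                // statement 3: address 16 holds 0
        System.out.println(jumpGt(ccr, 7, 4));  // 5: no jump, since 0 is not greater than 13
    }
}
```

With a negative sum in R, say -5, compare(0, -5) yields GT and jumpGt returns 7, so control reaches the statements that negate the sum.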

Translating numeric values is another duty of the assembler.

Translate decimal values into binary
Handle storage of negative values (signed magnitude or two’s complement)
Produce error message if value is outside acceptable range (values here limited to 12 bits for memory addresses, 16 bits for data values)
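A rough Java sketch of these duties follows. It assumes two's complement for negative data values (signed magnitude would need different code), and the class name NumberEncoder and its methods are hypothetical.

```java
// Hypothetical sketch of numeric translation, assuming two's complement data.
class NumberEncoder {
    // Encodes a decimal data value as a 16-bit two's complement bit string.
    static String encodeData(int value) {
        if (value < -32768 || value > 32767) {
            throw new IllegalArgumentException("data value out of 16-bit range: " + value);
        }
        // Masking keeps the low 16 bits, which is the two's complement pattern.
        String bits = Integer.toBinaryString(value & 0xFFFF);
        return "0".repeat(16 - bits.length()) + bits;
    }

    // Encodes a memory address as a 12-bit unsigned bit string.
    static String encodeAddress(int address) {
        if (address < 0 || address > 4095) {
            throw new IllegalArgumentException("address out of 12-bit range: " + address);
        }
        String bits = Integer.toBinaryString(address);
        return "0".repeat(12 - bits.length()) + bits;
    }

    public static void main(String[] args) {
        System.out.println(encodeAddress(12)); // 000000001100
        System.out.println(encodeData(6));     // 0000000000000110
        System.out.println(encodeData(-1));    // 1111111111111111
    }
}
```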

Suppose I wanted to add a third value, 4, to the initial sum?
I’ll store the value 4 at the end, and insert a second ADD statement between the ADD and STORE. Like so:

0 LOAD     12
1 ADD      13
   ADD      ?? <-- insert
2 STORE    14
3 COMPARE 16
4 JUMPGT    7
5 STORE    15
6 JUMP     10
7 LOAD     16
8 SUBTRACT 14
9 STORE    15
10 OUT      15
11 HALT
12           6
13           7
14           0
15           0
16           0
             4 <-- insert

Now we’ve opened a can of worms!

The second ADD needs to go into location 2, so everything beyond that needs to be bumped up by one location.
Once that’s done, all operands that were higher than 1 (all of them, in this case) are now invalid! They all need to be bumped up by 1 also. What a pain.
How many changes resulted? The addition of one instruction and one data item, plus modification of 11 out of the other 12 instructions!
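To see where the count of 11 comes from, here is a small Java sketch of the renumbering. The class InsertCost and its methods are hypothetical; the operand list is copied from the decimal listing above.

```java
// Hypothetical illustration of the renumbering problem described above.
class InsertCost {
    // Returns the operands after one new word is inserted at address insertAt:
    // every operand at or beyond that address moves up by one.
    static int[] renumber(int[] operands, int insertAt) {
        int[] result = new int[operands.length];
        for (int i = 0; i < operands.length; i++) {
            result[i] = operands[i] >= insertAt ? operands[i] + 1 : operands[i];
        }
        return result;
    }

    // Counts how many operands differ between the two versions.
    static int countChanged(int[] before, int[] after) {
        int changed = 0;
        for (int i = 0; i < before.length; i++) {
            if (before[i] != after[i]) changed++;
        }
        return changed;
    }

    public static void main(String[] args) {
        // Operands of the 11 instructions that have one (HALT has none).
        int[] operands = {12, 13, 14, 16, 7, 15, 10, 16, 14, 15, 15};
        int[] shifted = renumber(operands, 2);
        System.out.println(countChanged(operands, shifted)); // 11
    }
}
```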

Does the term maintainability mean anything to you?

The solution to this mess is to allow use of symbolic labels to refer to memory addresses.

There are two aspects to labels: definition and use.

Defining a label associates it with a specific address. Do this by placing the label in the spot where we’ve been showing the addresses: to the left of the op code. A label has a colon (:) at the end to separate it from the op code mnemonic.
Using a label tells the assembler to replace the label with its associated address. Where do we use addresses? In the operand (address) field.

Here I've defined some labels both for data addresses (similar to Java variables) and also labels for some instruction addresses to serve as the target for JUMP instructions.

        LOAD     A
        ADD      B
        STORE    C
        COMPARE ZERO
        JUMPGT   NEGATE
        STORE    D
        JUMP     OUTPUT
NEGATE: LOAD     ZERO
        SUBTRACT C
        STORE    D
OUTPUT: OUT      D
        HALT
A:               6
B:               7
C:               0
D:               0
ZERO:            0

Now with labels, what would change if I wanted to add a third value to the initial sum? Nice.

Handling labels makes one more job for the assembler.

Find label definitions
Associate each label with its corresponding address
Translate all uses of the label to its address
Produce error message for multiply-defined label or use of undefined label

We’re close but not quite there! The assembler needs a little housekeeping information from you. This is provided using pseudo operations: assembly language operations that direct the assembler but do not themselves translate into machine instructions.

It needs you to mark the beginning and end of your program so it will know when to stop assembling. This is done using the pseudo operations .BEGIN and .END.
It also needs you to identify locations that contain data instead of instructions. This is really for your own safety. Since in machine code data are indistinguishable from instructions, you could mistakenly place a data item within the instruction flow and the machine would interpret it as an instruction! That was not done in this case, but we’ll see an example below. Mark a data item using the pseudo operation .DATA.

        .BEGIN
        LOAD     A
        ADD      B
        STORE    C
        COMPARE ZERO
        JUMPGT   NEGATE
        STORE    D
        JUMP     OUTPUT
NEGATE: LOAD     ZERO
        SUBTRACT C
        STORE    D
OUTPUT: OUT      D
        HALT
A:      .DATA    6
B:      .DATA    7
C:      .DATA    0
D:      .DATA    0
ZERO:   .DATA    0
        .END

Here’s one you can “take to the bank”! You can copy this code and paste it into the Assembler that came with the lab manual.

NOTE: Due to a quirk in the lab software, you will get an error message on the “.END” statement – change it to lower case and it will assemble and execute.

Handling pseudo ops is yet another duty of the assembler.

detect presence of .BEGIN and .END and start/stop translation process accordingly
detect .DATA to mark locations holding data not instructions
produce error message for invalid pseudo op.
produce error message if data item falls into middle of instruction flow.

Appreciate high level languages yet? Here it is in Java...

int a=6;
int b=7;
int c,d;
c = a + b;
if (c >= 0)
    d = c;
else
    d = -c;
System.out.print(d);

Summary of Assembler Duties

Through this running example, we encountered several duties of the assembler:

Translating op code mnemonics to binary
Translating numeric values to binary
Handling labels
Handling pseudo ops

The assembler must also do a couple other things:

as each machine language instruction or data item is generated, increment a location counter to keep track of the address where the current instruction/data is to be stored in memory.
as each machine language instruction or data item is generated, store it to a file called an object file.

Once successfully assembled and stored as an object file, the object program can be executed. A small utility called the loader is responsible for loading it from the file into memory and initializing the Program Counter (PC) to the address of the first instruction.
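The location counter, object file, and loader can be sketched together in Java. This is a simplified illustration under stated assumptions: an in-memory list stands in for the object file, the program is loaded starting at address 0, and the class ObjectProgram and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an in-memory list stands in for the object file.
class ObjectProgram {
    final List<Integer> words = new ArrayList<>();
    int locationCounter = 0;   // address where the next word will be stored

    // Called once for each generated instruction or data item.
    void emit(int word) {
        words.add(word & 0xFFFF);  // keep 16 bits
        locationCounter++;
    }

    // Loader: copies the object program into memory starting at address 0
    // and returns the initial value for the Program Counter.
    int loadInto(int[] memory) {
        for (int i = 0; i < words.size(); i++) {
            memory[i] = words.get(i);
        }
        return 0;  // address of the first instruction
    }
}
```

After emitting the 17 words of the running example, locationCounter would be 17, the address where the next word would go.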

Data structures and logic required by the assembler

Translating op code mnemonics to binary

Data structure: op code table, each row has op code mnemonic and corresponding binary code.

Logic: when a statement is read, look for a match between its operator and an op code table entry. If match is found, set op code to the corresponding binary code. If not, generate error message, set op code to all 1s (halt), and keep going.
Should we use a sequential or a binary search?
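As one possible answer to try out, here is a Java sketch of the op code table kept in sorted order and searched with a binary search. The class OpCodeTable is hypothetical, and the table lists only the nine mnemonics used in the running example; a real table would hold all 16.

```java
import java.util.Arrays;

// Hypothetical op code table, limited to the mnemonics in the running example.
class OpCodeTable {
    // Kept in alphabetical order so that binary search works.
    static final String[] MNEMONICS = {
        "ADD", "COMPARE", "HALT", "JUMP", "JUMPGT", "LOAD", "OUT", "STORE", "SUBTRACT"};
    // CODES[i] is the binary op code for MNEMONICS[i].
    static final int[] CODES = {
        0b0011, 0b0111, 0b1111, 0b1000, 0b1001, 0b0000, 0b1110, 0b0001, 0b0101};
    static final int HALT = 0b1111;

    // Returns the binary op code. On an invalid mnemonic, reports the error
    // and returns all 1s (halt) so that assembly can keep going.
    static int lookup(String mnemonic) {
        int i = Arrays.binarySearch(MNEMONICS, mnemonic.toUpperCase());
        if (i < 0) {
            System.err.println("error: invalid op code mnemonic " + mnemonic);
            return HALT;
        }
        return CODES[i];
    }

    public static void main(String[] args) {
        System.out.println(lookup("LOAD"));   // 0
        System.out.println(lookup("JUMPGT")); // 9
    }
}
```

With so few entries a sequential search would be nearly as fast; the sorted-table version mainly shows what the binary search option requires.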

Translating numeric values to binary

Data structure: none

Logic: decimal-to-binary translation algorithm.
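One common form of the decimal-to-binary algorithm is repeated division by 2, collecting the remainders from the low-order bit upward. The Java sketch below handles nonnegative values only; the class DecimalToBinary and its method are hypothetical names.

```java
// Hypothetical sketch of decimal-to-binary translation by repeated division.
class DecimalToBinary {
    // Converts a nonnegative value to a bit string of the given width.
    // Each remainder of division by 2 is the next bit, low-order bit first.
    static String toBinary(int value, int width) {
        if (width < 1 || width > 30 || value < 0 || value >= (1 << width)) {
            throw new IllegalArgumentException("value does not fit in " + width + " bits");
        }
        char[] bits = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            bits[i] = (char) ('0' + value % 2);  // remainder is the current bit
            value /= 2;                          // quotient feeds the next step
        }
        return new String(bits);
    }

    public static void main(String[] args) {
        System.out.println(toBinary(13, 12)); // 000000001101
        System.out.println(toBinary(7, 16));  // 0000000000000111
    }
}
```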

Handling labels

Data structure: symbol table, each row contains label and corresponding binary address.

Logic: two cases to consider: label definition and label use.

Label definition. When label definition is detected, store it plus current location counter (address of instruction currently being assembled) value in the symbol table. If already there, generate error message and keep going.

Label use. When label use is read (symbolic operand), look for a match in the symbol table. If match is found, set operand (address field) to corresponding address. If not, generate error message, set operand to all 0s, and keep going.

PROBLEM: What if label use precedes its definition? Notice how many times this happens in our running example. We clearly need to allow this.

TWO REASONABLE SOLUTIONS:

Assembler makes two “passes” through the assembly language program. On the first pass, it looks only for label definitions and builds the symbol table. On the second pass, it does everything else. Details and flowcharts in textbook.
When label use is detected and no match is found in the symbol table, store the label and the location counter in a separate table (forward reference table). When assembler reaches end of program, then for each entry in forward reference table, try to match the label to a symbol table entry. If match is found, patch the label’s address into the assembled machine code at the saved location counter address. If not, generate error message and keep going.

        .begin
        load      A
LOOP:   compare   B
        jumpgt    DONE
        increment B
        out       B
        jump      LOOP
DONE:   halt
A:      .data     9
B:      .data     0
        .end

Forward references: A, B, DONE
Backward references: LOOP
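The first solution (two passes) can be sketched in Java for the small program above. This is a simplified illustration, not the textbook's flowchart: the class TwoPassLabels is hypothetical, every statement between .begin and .end is assumed to occupy exactly one word, and comments are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-pass label handling; assumes one word per statement.
class TwoPassLabels {
    // Pass 1: record each label definition with the current location counter.
    static Map<String, Integer> buildSymbolTable(String[] lines) {
        Map<String, Integer> table = new LinkedHashMap<>();
        int locationCounter = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.equalsIgnoreCase(".begin")
                    || line.equalsIgnoreCase(".end")) {
                continue;  // these lines occupy no memory
            }
            int colon = line.indexOf(':');
            if (colon >= 0) {
                String label = line.substring(0, colon).trim();
                if (table.containsKey(label)) {
                    System.err.println("error: label defined twice: " + label);
                } else {
                    table.put(label, locationCounter);
                }
            }
            locationCounter++;
        }
        return table;
    }

    // Pass 2 (label part only): replace a symbolic operand with its address.
    // An undefined label is reported and translated as 0 so assembly continues.
    static int resolve(Map<String, Integer> table, String label) {
        Integer address = table.get(label);
        if (address == null) {
            System.err.println("error: undefined label " + label);
            return 0;
        }
        return address;
    }

    public static void main(String[] args) {
        String[] program = {
            ".begin", "load A", "LOOP: compare B", "jumpgt DONE", "increment B",
            "out B", "jump LOOP", "DONE: halt", "A: .data 9", "B: .data 0", ".end"};
        System.out.println(buildSymbolTable(program)); // {LOOP=1, DONE=6, A=7, B=8}
    }
}
```

Because the table is complete before pass 2 begins, forward references such as A, B, and DONE resolve the same way as the backward reference LOOP.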

Handling pseudo ops

Data Structures: none

Logic: When .BEGIN is read, activate assembler logic.
When .END is read, the assembler is finished with the pass.
When .DATA is read, the assembler reads and translates the operand value into binary and adds it to generated machine code.
If an invalid pseudo op is read, generate error message and keep going.
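A minimal Java sketch of this logic is shown below. It assumes that any operation starting with a period is meant as a pseudo op; the class PseudoOps, the Kind enum, and the method names are all hypothetical.

```java
// Hypothetical classification of pseudo ops; assumes they start with a period.
class PseudoOps {
    enum Kind { BEGIN, END, DATA, INVALID, NOT_PSEUDO }

    // Decides how the assembler should treat the operation field of a statement.
    static Kind classify(String operation) {
        if (!operation.startsWith(".")) {
            return Kind.NOT_PSEUDO;          // an ordinary op code mnemonic
        }
        switch (operation.toUpperCase()) {
            case ".BEGIN": return Kind.BEGIN;   // activate assembler logic
            case ".END":   return Kind.END;     // finished with the pass
            case ".DATA":  return Kind.DATA;    // operand is a data value
            default:       return Kind.INVALID; // report error and keep going
        }
    }

    // For .DATA: translates the decimal operand into a 16-bit word.
    static int dataWord(String operand) {
        return Integer.parseInt(operand.trim()) & 0xFFFF;
    }
}
```

For example, classify(".data") yields DATA and dataWord("7") yields 7, while classify(".START") yields INVALID.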

More Example Assembly Language Programs

Another example program, this time in assembly. It is an example from my CSC 120 lecture notes on “for” loops. The Java code is:

int result = 0;
for ( int counter=1; counter<=7; counter++ ) {
    result = result + counter;
}

         .BEGIN
         CLEAR     RESULT
         LOAD      FIRST
         STORE     COUNTER
LOOP:    LOAD      FINAL
         COMPARE   COUNTER
         JUMPGT    FINISH
         LOAD      RESULT
         ADD       COUNTER
         STORE     RESULT
         INCREMENT COUNTER
         JUMP      LOOP
FINISH:  HALT
FIRST:   .DATA     1
FINAL:   .DATA     7
COUNTER: .DATA     0
RESULT:  .DATA     0
         .END

See the correspondence between assembly code and Java?

         .BEGIN
         CLEAR     RESULT   -- int result = 0;

         LOAD      FIRST
         STORE     COUNTER  -- int counter = 1;

LOOP:    LOAD      FINAL
         COMPARE   COUNTER
         JUMPGT    FINISH   -- counter <= 7;

         LOAD      RESULT
         ADD       COUNTER  -- result = result +
         STORE     RESULT   --          counter;

         INCREMENT COUNTER  -- counter++

         JUMP      LOOP
FINISH:  HALT
FIRST:   .DATA     1
FINAL:   .DATA     7
COUNTER: .DATA     0
RESULT:  .DATA     0
         .END

What changes are required to run the counter down instead of up? Does this affect the result?

Another Example: Largest Fibonacci number smaller than 20

int first = 1;   // first value in sequence
int second = 1; // second value in sequence
int result = 2; // third value in sequence
int limit = 20;
while (result < limit) {
    first = second;
    second = result;
    result = first + second;
}
result = second; // “undo” last calculation

        .BEGIN
LOOP:   LOAD      LIMIT
        COMPARE   RESULT
        JUMPGT    DONE   -- done if RESULT bigger
        JUMPEQ    DONE   -- done if RESULT equal

        LOAD      SECOND
        STORE     FIRST  -- first = second;

        LOAD      RESULT
        STORE     SECOND -- second = result;

        LOAD      FIRST
        ADD       SECOND
        STORE     RESULT -- result=first+second;

        JUMP      LOOP
DONE:   LOAD      SECOND
        STORE     RESULT -- result = second;
        HALT
FIRST:  .DATA     1
SECOND: .DATA     1
RESULT: .DATA     2
LIMIT:  .DATA     20
        .END

System Software

The assembler is classified as system software.

System software provides its clients a virtual machine interface.

It exists between a client and the actual machine, and provides clients a simplified and usually hardware-independent view of the computer.

The assembly language format, for example, need not necessarily directly reflect the machine architecture. An assembler could allow its programmers to write using a different instruction set and different instruction formats than the underlying hardware.

Example: assembler provides a two-address virtual machine, but generates machine code for a one-address physical machine. Programmer could write the instruction ADD X,Y and the assembler translates it to the three-instruction sequence:
   LOAD X
   ADD   Y
   STORE X
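The translation in this example is mechanical enough to sketch in Java. The class TwoAddressAdd and its method are hypothetical, and the two-address syntax "ADD X,Y" is taken only from the example above, not from any real assembler.

```java
import java.util.List;

// Hypothetical expansion of a two-address ADD into one-address instructions.
class TwoAddressAdd {
    // Expands "ADD X,Y" into the three-instruction sequence shown above.
    static List<String> expand(String statement) {
        String[] parts = statement.trim().split("\\s+", 2);
        if (parts.length != 2 || !parts[0].equalsIgnoreCase("ADD")) {
            throw new IllegalArgumentException("expected ADD X,Y but got: " + statement);
        }
        String[] operands = parts[1].split(",");
        if (operands.length != 2) {
            throw new IllegalArgumentException("expected two operands: " + statement);
        }
        String x = operands[0].trim();
        String y = operands[1].trim();
        // X is both a source and the destination of the sum.
        return List.of("LOAD " + x, "ADD " + y, "STORE " + x);
    }

    public static void main(String[] args) {
        System.out.println(expand("ADD X,Y")); // [LOAD X, ADD Y, STORE X]
    }
}
```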

We will study two classes of system software: operating systems and compilers.

Compilers, which we study later in the term, fall into the same general class as assemblers; both are language translators.

Peter Sanderson (PSanderson@otterbein.edu)