CSC 150 Chapter 6: Assembly Language and Assemblers
primary information resource: An Invitation to Computer Science (Java), Third Edition, G. Michael Schneider and Judith L. Gersting, Course Technology, 2007.


What assemblers do and why we use them

Programming in binary or machine code is really no fun!

Consider this program, written in the text/lab machine language.

What does this program do?  First instruction is at address 0.

Address         Contents
00000000  0000000000001100
00000001  0011000000001101
00000010  0001000000001110
00000011  0111000000010000
00000100  1001000000000111
00000101  0001000000001111
00000110  1000000000001010
00000111  0000000000010000
00001000  0101000000001110
00001001  0001000000001111
00001010  1110000000001111
00001011  1111000000000000
00001100  0000000000000110
00001101  0000000000000111
00001110  0000000000000000
00001111  0000000000000000
00010000  0000000000000000
 


Recall from the text that each of the 16 operations has a corresponding name.  This is called an op code mnemonic.
The noun “mnemonic” means “devices and techniques to improve memory” (http://www.answers.com/topic/mnemonic)

When you apply the mnemonics (see Figure 5.19 or 6.5 in the text) to the above code, the result is:

00000000  LOAD     000000001100
00000001  ADD      000000001101
00000010  STORE    000000001110
00000011  COMPARE  000000010000
00000100  JUMPGT   000000000111
00000101  STORE    000000001111
00000110  JUMP     000000001010
00000111  LOAD     000000010000
00001000  SUBTRACT 000000001110
00001001  STORE    000000001111
00001010  OUT      000000001111
00001011  HALT     000000000000
00001100       0000000000000110
00001101       0000000000000111
00001110       0000000000000000
00001111       0000000000000000
00010000       0000000000000000

Why didn’t I assign the LOAD mnemonic to the last 5 “instructions”?  Because it turns out they are not instructions, but data!  You can’t easily see this by looking at the machine code, but could probably guess it based on the operands of the first few instructions.
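The mnemonic substitution above can be sketched in a few lines of Java. This is only an illustration: the names WordDecoder and decode are made up for this example, and the op codes for mnemonics that do not appear in the program above are assumed to follow Figure 5.19, so check them against the text. Notice that the sketch, like the machine, cannot tell data from instructions: the data word 0000000000000110 decodes as LOAD 6.

```java
// Illustrative sketch only: WordDecoder and decode are hypothetical names.
class WordDecoder {
    // Mnemonics indexed by 4-bit op code. Codes not used in the program
    // above are assumed from Figure 5.19; verify them against the text.
    static final String[] MNEMONICS = {
        "LOAD", "STORE", "CLEAR", "ADD", "INCREMENT", "SUBTRACT", "DECREMENT", "COMPARE",
        "JUMP", "JUMPGT", "JUMPEQ", "JUMPLT", "JUMPNEQ", "IN", "OUT", "HALT"
    };

    // Split a 16-bit word into a 4-bit op code and a 12-bit address field.
    static String decode(String word) {
        if (!word.matches("[01]{16}")) {
            throw new IllegalArgumentException("expected 16 binary digits: " + word);
        }
        int opCode = Integer.parseInt(word.substring(0, 4), 2);
        int address = Integer.parseInt(word.substring(4), 2);
        String mnemonic = MNEMONICS[opCode];
        // HALT takes no operand, so its address field is not shown.
        return mnemonic.equals("HALT") ? mnemonic : mnemonic + " " + address;
    }

    public static void main(String[] args) {
        System.out.println(decode("0011000000001101")); // an instruction word
        System.out.println(decode("0000000000000110")); // a data word, decoded anyway
    }
}
```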

Through op code mnemonics, we’ve made a huge leap in understandability.  The tradeoff is that now our program can no longer be directly executed on the machine.  It has to be translated into machine code.

The software that performs this translation is called an assembler.

Translating op code mnemonics is one duty of an assembler
  • Translate op code mnemonics into binary op codes
  • Produce error message when invalid mnemonic is used

If you like, we can write the numbers in decimal and get rid of leading zeroes.  The result is:

 0 LOAD     12
 1 ADD      13
 2 STORE    14
 3 COMPARE  16
 4 JUMPGT    7
 5 STORE    15
 6 JUMP     10
 7 LOAD     16
 8 SUBTRACT 14
 9 STORE    15
10 OUT      15
11 HALT
12           6
13           7
14           0
15           0
16           0

You will probably find this a little more readable than the binary version.

This is a good time to explain the logic of statements 3 and 4.

Statement 3 compares the contents of address 16 with the contents of register R, then sets the corresponding condition code in the condition code register (CCR).

Statement 4 checks the CCR.  If it contains 100 (GT), execution control jumps to address 7 (i.e. 7 is written into the PC).  Otherwise, control continues to the next statement.

This is the basis for conditional control (e.g. if-then-else).  Notice that control jumps when the tested condition is true and drops through when it is false.  This is the opposite of what we are used to in Java, where a true condition drops into the following statement and a false one skips past it.
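To watch statements 3 and 4 in action, here is a rough Java sketch of a simulator that runs the 17-word program from the top of these notes. It is not the lab software: TinyMachine, run and example are invented names, only the op codes this program uses are handled, memory cells are plain Java ints rather than true 16-bit words, and COMPARE is assumed to set GT when the contents of the address are greater than R (which is what the jump to address 7 relies on).

```java
// Illustrative sketch only: TinyMachine, run and example are hypothetical names.
class TinyMachine {
    // Runs a program held as 16-bit words (4-bit op code, 12-bit address)
    // and returns the last value written by OUT.
    static int run(int[] memory) {
        int pc = 0;          // program counter
        int r = 0;           // register R
        boolean gt = false;  // GT condition code
        int lastOut = 0;
        while (true) {
            int word = memory[pc++];   // fetch, then advance the PC
            int opCode = word >> 12;
            int x = word & 0xFFF;
            switch (opCode) {
                case 0:  r = memory[x]; break;        // LOAD
                case 1:  memory[x] = r; break;        // STORE
                case 3:  r = r + memory[x]; break;    // ADD
                case 5:  r = r - memory[x]; break;    // SUBTRACT
                case 7:  gt = memory[x] > r; break;   // COMPARE (assumed: GT when CON(X) > R)
                case 8:  pc = x; break;               // JUMP
                case 9:  if (gt) pc = x; break;       // JUMPGT: jump only when GT is set
                case 14: lastOut = memory[x]; System.out.println(lastOut); break; // OUT
                case 15: return lastOut;              // HALT
                default: throw new IllegalStateException("unhandled op code " + opCode);
            }
        }
    }

    // Builds the example program; a and b replace the data values at 12 and 13.
    static int[] example(int a, int b) {
        String[] words = {
            "0000000000001100", "0011000000001101", "0001000000001110", "0111000000010000",
            "1001000000000111", "0001000000001111", "1000000000001010", "0000000000010000",
            "0101000000001110", "0001000000001111", "1110000000001111", "1111000000000000"
        };
        int[] memory = new int[17];
        for (int i = 0; i < words.length; i++) {
            memory[i] = Integer.parseInt(words[i], 2);
        }
        memory[12] = a;
        memory[13] = b;
        return memory;
    }

    public static void main(String[] args) {
        run(example(6, 7));    // sum is positive: drops through at statement 4
        run(example(-20, 7));  // sum is negative: jumps to address 7
    }
}
```

With the original data (6 and 7) the sum is positive, GT is not set, and control drops through statement 4.  With a negative sum, GT is set and control jumps to address 7, where the sum is subtracted from zero.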

Translating numeric values is another duty of the assembler.
  • Translate decimal values into binary
  • Handle storage of negative values (signed magnitude or two’s complement)
  • Produce error message if value is outside acceptable range (values here limited to 12 bits for memory addresses, 16 bits for data values)
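
A minimal Java sketch of these numeric duties follows. It assumes two's complement for negative data values (signed magnitude would work differently), and ValueEncoder, encodeData and encodeAddress are names invented for the illustration.

```java
// Illustrative sketch only: ValueEncoder and its methods are hypothetical names.
class ValueEncoder {
    // Encodes a decimal data value as a 16-bit two's complement string.
    static String encodeData(int value) {
        if (value < -32768 || value > 32767) {
            throw new IllegalArgumentException("data value out of 16-bit range: " + value);
        }
        // Masking keeps the low 16 bits, which is the two's complement pattern.
        String bits = Integer.toBinaryString(value & 0xFFFF);
        return String.format("%16s", bits).replace(' ', '0');
    }

    // Encodes a memory address as a 12-bit unsigned string.
    static String encodeAddress(int address) {
        if (address < 0 || address > 4095) {
            throw new IllegalArgumentException("address out of 12-bit range: " + address);
        }
        String bits = Integer.toBinaryString(address);
        return String.format("%12s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(encodeData(6));
        System.out.println(encodeData(-1));
        System.out.println(encodeAddress(13));
    }
}
```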

Suppose I wanted to add a third value, 4, to the initial sum?
I’ll store the value 4 at the end, and insert a second ADD statement between the ADD and STORE.   Like so:

 0 LOAD     12
 1 ADD      13
   ADD      ?? <-- insert
 2 STORE    14
 3 COMPARE  16
 4 JUMPGT    7
 5 STORE    15
 6 JUMP     10
 7 LOAD     16
 8 SUBTRACT 14
 9 STORE    15
10 OUT      15
11 HALT
12           6
13           7
14           0
15           0
16           0
             4 <-- insert

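Now we’ve opened a can of worms!  The inserted ADD pushes every later statement down one address, so every address operand in the program has to be renumbered by hand.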


Does the term maintainability mean anything to you?

The solution to this mess is to allow use of symbolic labels to refer to memory addresses.

There are two aspects to labels: definition and use.

Here I've defined some labels both for data addresses (similar to Java variables) and also labels for some instruction addresses to serve as the target for JUMP instructions.

        LOAD     A
        ADD      B
        STORE    C
        COMPARE  ZERO
        JUMPGT   NEGATE
        STORE    D
        JUMP     OUTPUT
NEGATE: LOAD     ZERO
        SUBTRACT C
        STORE    D
OUTPUT: OUT      D
        HALT
A:               6
B:               7
C:               0
D:               0
ZERO:            0

Now with labels, what would change if I wanted to add a third value to the initial sum?  Nice.

Handling labels makes one more job for the assembler.
  • find label definitions,
  • associate each label with its corresponding address,
  • translate all uses of the label to its address
  • produce error message for multiply-defined label or use of undefined label.
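
The first three label duties can be sketched in Java as a scan that records where each label is defined. This is an assumption-laden sketch, not the lab assembler: SymbolTableBuilder and build are hypothetical names, each array element is taken to be one statement, the first statement is at address 0, and a label is anything before a colon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: SymbolTableBuilder and build are hypothetical names.
class SymbolTableBuilder {
    // Records the address of every label definition.
    static Map<String, Integer> build(String[] lines) {
        Map<String, Integer> table = new LinkedHashMap<>();
        int locationCounter = 0;   // address of the statement being examined
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue;          // blank lines occupy no memory
            }
            int colon = text.indexOf(':');
            if (colon > 0) {
                String label = text.substring(0, colon);
                if (table.containsKey(label)) {
                    System.err.println("error: label " + label + " defined more than once");
                } else {
                    table.put(label, locationCounter);
                }
            }
            locationCounter++;     // every statement takes one memory location
        }
        return table;
    }
}
```

Applied to the labeled program above, build associates NEGATE with 7, OUTPUT with 10, and A, B, C, D, ZERO with 12 through 16, the same addresses used in the decimal version.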


We’re close but not quite there!  The assembler needs a little housekeeping information from you.  This is provided using pseudo operations, assembly language operations that give directions to the assembler instead of being translated into machine instructions.

  1. It needs you to mark the beginning and end of your program so it will know when to stop assembling.  This is done using the pseudo operations .BEGIN and .END.
  2. It also needs you to identify locations that contain data instead of instructions.  This is really for your own safety.  Since in machine code data are indistinguishable from instructions, you could mistakenly place a data item within the instruction flow and the machine would interpret it as an instruction!  That was not done in this case, but we’ll see an example below.  Mark a data item using the pseudo operation .DATA.


        .BEGIN
        LOAD     A
        ADD      B
        STORE    C
        COMPARE  ZERO
        JUMPGT   NEGATE
        STORE    D
        JUMP     OUTPUT
NEGATE: LOAD     ZERO
        SUBTRACT C
        STORE    D
OUTPUT: OUT      D
        HALT
A:      .DATA    6
B:      .DATA    7
C:      .DATA    0
D:      .DATA    0
ZERO:   .DATA    0
        .END
 

Here’s one you can “take to the bank”!  You can copy this code and paste it into the Assembler that came with the lab manual.

NOTE: Due to a quirk in the lab software, you will get an error message on the “.END” statement – change it to lower case and it will assemble and execute.

Handling pseudo ops is yet another duty of the assembler.
  • detect presence of .BEGIN and .END and start/stop translation process accordingly
  • detect .DATA to mark locations holding data not instructions
  • produce error message for invalid pseudo op.
  • produce error message if data item falls into middle of instruction flow.

Appreciate high level languages yet?  Here it is in Java...

int a=6;
int b=7;
int c,d;
c = a + b;
if (c >= 0)
    d = c;
else
    d = -c;
System.out.print(d);

Summary of Assembler Duties

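Through this running example, we encountered several duties of the assembler:
  • translate op code mnemonics into binary op codes
  • translate numeric values into binary
  • handle labels (definitions and uses)
  • handle pseudo operations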

The assembler must also do a couple other things:

Once successfully assembled and stored as an object file, the object program can be executed.  A small utility called the loader is responsible for loading it from the file into memory and initializing the Program Counter (PC) to the address of the first instruction.

Data structures and logic required by the assembler

Translating op code mnemonics to binary

Data structure:  op code table, each row has op code mnemonic and corresponding binary code.

Logic: when a statement is read, look for a match between its operator and an op code table entry.  If match is found, set op code to the corresponding binary code.  If not, generate error message, set op code to all 1s (halt), and keep going.
Should we use a sequential or a binary search?
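
Here is one way the op code table and its lookup logic might look in Java. OpCodeTable and lookup are hypothetical names, and the binary codes for mnemonics not seen in the first example are assumed from Figure 5.19. The sketch uses a sequential search only because it is the shortest to write; it is not meant to settle the question above. Matching ignores case because the notes use both LOAD and load.

```java
// Illustrative sketch only: OpCodeTable and lookup are hypothetical names.
class OpCodeTable {
    // Each row holds an op code mnemonic and its binary code.
    // Codes not shown earlier in these notes are assumed from Figure 5.19.
    static final String[][] ROWS = {
        {"LOAD", "0000"}, {"STORE", "0001"}, {"CLEAR", "0010"}, {"ADD", "0011"},
        {"INCREMENT", "0100"}, {"SUBTRACT", "0101"}, {"DECREMENT", "0110"}, {"COMPARE", "0111"},
        {"JUMP", "1000"}, {"JUMPGT", "1001"}, {"JUMPEQ", "1010"}, {"JUMPLT", "1011"},
        {"JUMPNEQ", "1100"}, {"IN", "1101"}, {"OUT", "1110"}, {"HALT", "1111"}
    };

    // Returns the binary op code for a mnemonic. On an invalid mnemonic it
    // reports an error, substitutes all 1s (halt), and lets assembly continue.
    static String lookup(String mnemonic) {
        for (String[] row : ROWS) {
            if (row[0].equalsIgnoreCase(mnemonic)) {
                return row[1];
            }
        }
        System.err.println("error: invalid op code mnemonic " + mnemonic);
        return "1111";
    }

    public static void main(String[] args) {
        System.out.println(lookup("ADD"));
        System.out.println(lookup("MULTIPLY")); // not in the table
    }
}
```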
 

Translating numeric values to binary

Data structure: none

Logic: decimal-to-binary translation algorithm.
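
The usual decimal-to-binary algorithm is repeated division by 2, keeping the remainders from right to left. A sketch for non-negative values follows; BinaryConverter and toBinary are hypothetical names, and the width parameter is an added convenience so the same method can produce 12-bit address fields and 16-bit data words.

```java
// Illustrative sketch only: BinaryConverter and toBinary are hypothetical names.
class BinaryConverter {
    // Converts a non-negative decimal value to a binary string of the given
    // width by repeated division by 2.
    static String toBinary(int value, int width) {
        if (width < 1 || width > 30 || value < 0 || value >= (1 << width)) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " bits");
        }
        char[] bits = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            bits[i] = (char) ('0' + value % 2);  // remainder is the next bit, right to left
            value /= 2;                          // quotient carries on to the next step
        }
        return new String(bits);
    }

    public static void main(String[] args) {
        System.out.println(toBinary(12, 12)); // operand of the first LOAD
        System.out.println(toBinary(7, 16));  // data value 7
    }
}
```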
 

Handling labels

Data structure: symbol table, each row contains label and corresponding binary address.

Logic:  two cases to consider: label definition and label use.

Label definition. When a label definition is detected, store the label and the current value of the location counter (the address of the statement currently being assembled) in the symbol table.  If the label is already there, generate an error message and keep going.

Label use. When label use is read (symbolic operand), look for a match in the symbol table.  If match is found, set operand (address field) to corresponding address.  If not, generate error message, set operand to all 0s, and keep going.

PROBLEM:  What if label use precedes its definition?  Notice how many times this happens in our running example.  We clearly need to allow this.

TWO REASONABLE SOLUTIONS:

  1. Assembler makes two “passes” through the assembly language program.  On the first pass, it looks only for label definitions and builds the symbol table.  On the second pass, it does everything else.  Details and flowcharts in textbook.
  2. When label use is detected and no match is found in the symbol table, store the label and the location counter in a separate table (forward reference table).  When assembler reaches end of program, then for each entry in forward reference table, try to match the label to a symbol table entry.  If match is found, patch the label’s address into the assembled machine code at the saved location counter address.  If not, generate error message and keep going.
      .begin
      load      A
LOOP: compare   B
      jumpgt    DONE
      increment B
      out       B
      jump      LOOP
DONE: halt
A:    .data     9
B:    .data     0
      .end
Forward references: A, B, DONE
Backward references: LOOP
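
The second solution (a forward reference table that is patched at the end) might be sketched in Java as follows. ForwardPatcher and resolve are hypothetical names. The sketch assumes the statements between .begin and .end are passed in one per array element, computes only the address field of each statement, and leaves out the multiply-defined-label check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: ForwardPatcher and resolve are hypothetical names.
class ForwardPatcher {
    // Returns the address field for every location
    // (-1 where the statement has no symbolic operand).
    static int[] resolve(String[] statements) {
        Map<String, Integer> symbols = new HashMap<>();
        // Forward reference table: a label plus the location that uses it.
        List<String> pendingLabels = new ArrayList<>();
        List<Integer> pendingLocations = new ArrayList<>();
        int[] addressField = new int[statements.length];
        for (int loc = 0; loc < statements.length; loc++) {
            String text = statements[loc].trim();
            int colon = text.indexOf(':');
            if (colon > 0) {                          // label definition
                symbols.put(text.substring(0, colon), loc);
                text = text.substring(colon + 1).trim();
            }
            String[] parts = text.split("\\s+");
            addressField[loc] = -1;
            boolean symbolic = parts.length == 2 && !parts[0].startsWith(".");
            if (!symbolic) {
                continue;                             // halt or .data: nothing to resolve
            }
            Integer address = symbols.get(parts[1]);
            if (address != null) {
                addressField[loc] = address;          // backward reference: already known
            } else {
                pendingLabels.add(parts[1]);          // forward reference: patch later
                pendingLocations.add(loc);
            }
        }
        // End of program: patch each saved location, or report the label as undefined.
        for (int i = 0; i < pendingLabels.size(); i++) {
            Integer address = symbols.get(pendingLabels.get(i));
            if (address != null) {
                addressField[pendingLocations.get(i)] = address;
            } else {
                System.err.println("error: undefined label " + pendingLabels.get(i));
            }
        }
        return addressField;
    }
}
```

For the nine statements of the example above, only the operand of jump LOOP is resolved on the spot; the five uses of A, B and DONE go into the forward reference table and are patched after the last statement has been read.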

Handling pseudo ops

Data Structures:  none

Logic:  When .BEGIN is read, activate assembler logic.
When .END is read, the assembler is finished with the pass.
When .DATA is read, the assembler reads and translates the operand value into binary and adds it to generated machine code.
If an invalid pseudo op is read, generate an error message and keep going.
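
A rough Java sketch of this pseudo op logic is shown below. PseudoOpScanner and dataWords are hypothetical names. To keep it short, the sketch skips instruction statements entirely and returns only the 16-bit words generated by .DATA; it assumes every .DATA has a decimal operand, uses two's complement for negative values, and does no range checking. Pseudo ops are matched without regard to case.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: PseudoOpScanner and dataWords are hypothetical names.
class PseudoOpScanner {
    // Returns the 16-bit words generated by .DATA statements found
    // between .BEGIN and .END.
    static List<String> dataWords(String[] lines) {
        List<String> words = new ArrayList<>();
        boolean active = false;                 // becomes true once .BEGIN is read
        for (String line : lines) {
            String text = line.trim();
            int colon = text.indexOf(':');
            if (colon >= 0) {
                text = text.substring(colon + 1).trim();   // drop the label
            }
            String[] parts = text.split("\\s+");
            String op = parts[0].toUpperCase();
            if (op.equals(".BEGIN")) {
                active = true;
                continue;
            }
            if (!active || !op.startsWith(".")) {
                continue;                       // not assembling yet, or an instruction
            }
            if (op.equals(".END")) {
                break;                          // finished with this pass
            }
            if (op.equals(".DATA")) {
                int value = Integer.parseInt(parts[1]);    // assumes an operand is present
                String bits = Integer.toBinaryString(value & 0xFFFF);
                words.add(String.format("%16s", bits).replace(' ', '0'));
            } else {
                System.err.println("error: invalid pseudo op " + parts[0]);
            }
        }
        return words;
    }
}
```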

More Example Assembly Language Programs

Another example program, this time in assembly.  It is an example from my CSC 120 lecture notes on “for” loops.  The Java code is:

int result = 0;
for ( int counter=1; counter<=7; counter++ ) {
    result = result + counter;
}

        .BEGIN
         CLEAR     RESULT
         LOAD      FIRST
         STORE     COUNTER
LOOP:    LOAD      FINAL
         COMPARE   COUNTER
         JUMPGT    FINISH
         LOAD      RESULT
         ADD       COUNTER
         STORE     RESULT
         INCREMENT COUNTER
         JUMP      LOOP
FINISH:  HALT
FIRST:   .DATA     1
FINAL:   .DATA     7
COUNTER: .DATA     0
RESULT:  .DATA     0
         .END

See the correspondence between assembly code and Java?

        .BEGIN
         CLEAR     RESULT  -- int result = 0;

         LOAD      FIRST
         STORE     COUNTER -- int counter = 1;

LOOP:    LOAD      FINAL
         COMPARE   COUNTER
         JUMPGT    FINISH  -- counter <= 7;

         LOAD      RESULT
         ADD       COUNTER -- result = result +
         STORE     RESULT  -- counter;

         INCREMENT COUNTER -- counter++

         JUMP      LOOP
FINISH:  HALT
FIRST:   .DATA     1
FINAL:   .DATA     7
COUNTER: .DATA     0
RESULT:  .DATA     0
         .END

What changes are required to run the counter down instead of up?  Does this affect the result?


Another Example:  Largest Fibonacci number smaller than 20
 

int first = 1;   // first value in sequence
int second = 1;  // second value in sequence
int result = 2;  // third value in sequence
int limit = 20;
while (result < limit)  {
    first = second;
    second = result;
    result = first + second;
}
result = second; // “undo” last calculation
 

        .BEGIN
LOOP:   LOAD      LIMIT
        COMPARE   RESULT
        JUMPGT    DONE   -- done if RESULT bigger
        JUMPEQ    DONE   -- done if RESULT equal

        LOAD      SECOND
        STORE     FIRST  -- first = second;

        LOAD      RESULT
        STORE     SECOND -- second = result;

        LOAD      FIRST
        ADD       SECOND
        STORE     RESULT -- result=first+second;

        JUMP      LOOP
DONE:   LOAD      SECOND
        STORE     RESULT -- result = second;
        HALT
FIRST:  .DATA     1
SECOND: .DATA     1
RESULT: .DATA     2
LIMIT:  .DATA     20
        .END

System Software

The assembler is classified as system software.

System software provides its clients a virtual machine interface.

It exists between a client and the actual machine, and provides clients a simplified and usually hardware-independent view of the computer.

The assembly language format, for example, need not necessarily directly reflect the machine architecture.  An assembler could allow its programmers to write using a different instruction set and different instruction formats than the underlying hardware.

Example: the assembler provides a two-address virtual machine, but generates machine code for a one-address physical machine.  A programmer could write the instruction ADD X,Y and the assembler would translate it to the three-instruction sequence:
   LOAD  X
   ADD   Y
   STORE X
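
The translation in this example could be sketched in Java as a simple text expansion. TwoAddressExpander and expand are hypothetical names, and only the two-address ADD from the example is handled.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: TwoAddressExpander and expand are hypothetical names.
class TwoAddressExpander {
    // Expands a two-address statement such as "ADD X,Y" into the equivalent
    // one-address instruction sequence.
    static List<String> expand(String statement) {
        String[] parts = statement.trim().split("\\s+", 2);
        if (parts.length != 2 || !parts[0].equalsIgnoreCase("ADD")) {
            throw new IllegalArgumentException("only two-address ADD is handled: " + statement);
        }
        String[] operands = parts[1].split("\\s*,\\s*");
        if (operands.length != 2) {
            throw new IllegalArgumentException("expected two operands: " + statement);
        }
        // X = X + Y becomes: load X into R, add Y, store R back into X.
        return Arrays.asList("LOAD " + operands[0], "ADD " + operands[1], "STORE " + operands[0]);
    }

    public static void main(String[] args) {
        for (String instruction : expand("ADD X,Y")) {
            System.out.println(instruction);
        }
    }
}
```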

We will study two classes of system software: operating systems and compilers.

Compilers, which we study later in the term, fall into the same general class as assemblers; both are language translators.


Peter Sanderson (PSanderson@otterbein.edu)