Sweet16-GP Assembler

by SysOps
This entry is part 6 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

In my first post I showed you the instruction set of the Sweet16-GP. Within the instruction set presented we saw examples of the Sweet16-GP assembly language. However, we didn’t talk about the assembly language at that time. It’s now time to discuss the Sweet16-GP’s assembly language.

Statements: Statements in the Sweet16’s assembly language can be borken down into four basic parts; Label, Operation, Operand, and Comment. of all these parts only the Operation is required in all statements. Most statements however also require an operand. The operand can be a register, literal value or label reference. The label at the beginning of the statement is the label declaration.

Comments: The assembly language used with the Sweet16-GP allows comments. Comments begin with a semicolon and end at the end of the line. 

Labels: Labels give an human readable name to the line. This name can then be used to reference the line at any point in the program. Labels must begin with a letter or underscore character and may contain alphanumeric and underscore character. Labels declarations must end with a colon. Label declarations occur when the label is located at the beginning of a line. Labels are referenced when they are not the first item in the line. If a label declaration occurs on a line by itself, it is associated with the following line.

Numbers: The assembler can accept numerical values in the following bases: 2, 8, 10, 16. These correspond to binary, octal, decimal, and hexadecimal. Binary values must begin with “0b”; a zero followed by a lower case “B”. Octal values must begin with “0c”, and hexadecimal values must begin with “0x”. Decimal numbers are written normally. Note that all values must be integers. Real numbers are not supported. 

Here are some examples of numerical values:

  • 0b11101011 – binary
  • 0c47 – octal
  • 0xfecd – hexadecimal
  • 65535 – decimal
  • 0fae – not allowed
  • 34.92 – not allowed

Style: Assembly language code typically follows a simple coding style. Below is an example of a typical piece of Sweet16-GP assembly code:

;
; Test for BRN
; On Entry: 
; On Exit: ACC = 0x8000, R1 = 0x0010, R2 = 0x01, PC = 0x13, N and V flags Set
;
start:  SET     R0,     0x7FF0      ; Load ACC with 0xFFF0
        SET     R2,     0x0001      ; Load R2 with 0x1F00
        SET     R1,     0x0000      ; Load R1 with 0x00
        SET     R3,     0x0000
calc:   ADD     R2                  ; ADD R2 to ACC and place results in ACC
        INC     R1                  ; Count how many times we Add R2 to ACC (R0)
        BRV     end                 ; Branch to HALT instruction
        BRA     calc                ; Data block FB
end:    HALT                        ; STOP Execution

As you can see, there is a comment block at the top that documents the code. It is suggested that each block of code be documented this way. Always include the conditions that are expected on entry into the code, and the exit state. List any modified registers, stack pointers, or memory locations that are left in an alter state by the program. 

Declare your line labels at the beginning of the line. Make then human readable. If you need long labels place them above the line it should be associated with. Always end a label declaration with a colon. Here are some examples of label declarations:

  • start:
  • end:
  • _loop_01:
  • _cond_true:
  • _cond_false:
  • loop_iter:

Instructions should all be placed in a single column after the label declarations as show above. White space should be used to place the instruction operands in a single column after the instructions. Finally, comments should all line up into a single column. There is nothing in the assembler to enforce this columnization.  However, you’ll find your code much easier to read and groc if you follow these guidelines.

Notes: Many assemblers allow for the operands to be mathematical expressions. They also, typically offer special features like a characters that reference the program counter at the point in the program at which the special character is used. The Sweet16-GP assembler does not include such features. Perhaps a later version will add these feature. However, for this project I wanted to maintain simplicity. Lastly, the assembler also does not support macros or predefined routines. 

Assemblers are unique in the parser/translator world in that their source (assembly) language has a nearly one-to-one relationship with its output language (machine code). This fact makes it simple to create basic assemblers using crude pattern matching parsers. Often regular expressions are employed to recognize the various language constructs.

The only difficulty encounter is usually label references. Since labels can be referenced before they are declared. This, however, can be dealt with by breaking the parser into two parts and allowing each part to separately scan the source code.

Our assembler will use a simple two-pass process in which the first pass (scan of the source code) creates a simple intermediate representation. The second pass will take the results of the first pass and convert it to actual machine code. We will also use a table of opcodes and addressing modes to help us build our intermediate code.

The first thing we will need is a simple framework with a line of code for testing. Let’s get that setup. Our assembler will expect our assembly code in an interrable form as it will work line-by-line. So we’ll need to pass our code as a list of lines of assembly.

"""Sweet16-GP Assembler"""

if __name__ == '__main__':
    text = """
        ;This is a comment
        start:  SET  ACC, 0xFFDE  ; This is also a comment
        end:    HALT  ; end of program
    """
    lines = text.split('\n')

Next. to make parsing easier we will write a simple routine to remove comments from our assembly source code.

def strip_lines(self, lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        yield line

As you can see we simply loop over the lines in our text and if we find the semicolon character that begins a comment, we remove everything from the semicolon to the end of the line. Next, we call strip() on the line to remove all leading and trailing whitespace, then we yield the line to the caller.

If you’re not familiar with the yield keyword in python 3 I suggest you lookup python 3 generators for a detailed explanation. In a nutshell, however, yield basically stops the loop on each iteration and returns the currently processed line to the caller. When the caller returns back to the strip_lines() method, it will re-enter the method with the state that existed the last time yield was called. In other words, the method remembers where it left off in the loop and starts from there on subsequent calls. You can learn more about the yield keyword here.

It’s time to test this. Change the code to read:

def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        yield line


if __name__ == '__main__':
    text = """
        ;This is a comment
        start:  SET  ACC, 0xFFDE  ; This is also a comment
        end:    HALT  ; end of program
    """
    lines = text.split('\n')

    # Remove comments and blank lines
    for line in strip_lines(lines):
        print(line)

If you run this code now you should get something like the following output:

start:  SET  ACC, 0xFFDE
end:    HALT

Process finished with exit code 0

Not bad for the few minutes it took us to write that. We can now take in assembly code and clean it of blank lines and comments. 

Now we have one more step of preprocessing before we can move on to assembling our code. Recall that some registers can be referred to by name. For example, R0 is also known as the accumulator and is named ACC. This naming is great for reading and writing code but we can make assembly easier if we replace these friendly names with the canonical register name Rx where x is the numerical index into the register file. The best place to do this is within our strip_lines() method. However, to keep the code modular and easy to understand we’ll write a subroutine for this and make a call to it in strip_lines().

def replace_register_names(self, line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line

As you can see we simply take a line of text and replace any register names found in the line with the R-value. Now let’s place the call to replace_register_names() in strip_lines().

def strip_lines(self, lines):
    """ Takes a sequence of lines and strips comments and blank lines.
        Also, calls replace_register_names on stripped lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line

Our new code now looks like this:

"""Sweet16-GP Assembler"""

def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line

def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = """
        ;This is a comment
        start:  SET  ACC, 0xFFDE  ; This is also a comment
        end:    HALT  ; end of program
    """
    lines = text.split('\n')

If you run the program again, you will see that the register name ACC has been replaced by R0. Exactly what we want. Your output should look like the following:

start:  SET  R0, 0xFFDE
 end:    HALT

 Process finished with exit code 0

That’s it for today, next time we are going to write a method to take our cleaned code and parse it line by line. 

Series Navigation<< Sweet16 CPU EmulatorSweet16-GP Assembler – Part 2 >>