Sweet16-GP Assembler – Part 3

by SysOps
This entry is part 8 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left off with the first pass of our assembler creating an intermediate  representation for our assembly code. Today we are going to begin the second pass. So let’s get on with it!

Our first pass completes with a list of code that contains all the information we need to assemble the code. So why did we use a two pass design instead of a single pass? The answer is that in the single pass we have to do a lot more work to handle forward references to label values. By using a two pass design we can forego the complexities and simply handle any foreward references in the second pass after we have already calculated each instructions address in the first pass.

So what work do we need to accomplish in the second pass? Well, we need to keep track of the instructions location in memory. We will do this by maintaining a symbol in a synbol table called “pc” for program counter. We will use  will need to replace any symbols with thier values stored in the symbol table. Then we will need to encode the instruction in hex values. So let’s see how to accomplish this:

# Second pass -- Assembly
for lineno, pc, value, register, mode, mnemonic, icode in objcode:
    print(f'Pass 2 : {lineno, pc, value, register, mode, mnemonic, icode}')
    # Evaluate the string
    try:
        symbols[pc] = pc
        
        if value is not '':
            realvalue = eval(value, symbols)
            
        if isinstance(realvalue, str):
            realvalue = ord(realvalue) & 0xff
            
        if not isinstance(realvalue, int):
            raise TypeError("Integer expected in {0}".format(value))
    except Exception as e:
        print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
        realvalue = 0

Referring to the code above, we can see that first we loop over objcode extracting the values we assembled in the first pass.  With each pass we set the new value of the pc counter. Then we check the value field. Any value we stored here will be a string. So we use eval() to convert the value to an integer. If the value happens to be a  symbol we can convert it to it’s value by passing the symbol table along with the value to eval(). However, if the value was a string representation of a numerical value, and not a symbol in the symbol table, eval will simply return our string value. So we check if the value returned by eval is a string and if so, we make a call to ord to convert it to an integer value. We also “AND”  the value with 0xFF to ensure it remains a byte sized value.  Now we should have an integer value between 0 and 255 in “realvalue”. In the last if statement we raises an TypeError if this is not the case. Finally, we wrap this all in an try/except block and print an error message to stderr is any issues arise and set the “realvalue” variable back to zero. 

Our next step is to take the values and data we extracted from objcode and assemble all of it into the actual values to represent the instruction. For this we will need to use the Callable library to call the functions we stored in the opcode table. So our first order of business here will be to import the Callable module from the collections library. 

from collections import Callable

Now that we have the Callable module we can encode our actual instruction code:

# Encode the instruction. If register is present,
# it must be added to the base opcode value stored
# in icode[0]
opcode = int(icode[0], 16)
if register != '':
    opcode += int(register, 16)
encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
encode[0] = opcode

print(lineno, pc, encode)
execode.append((lineno, pc, encode))
return execode

Here we first extract the instruction opcode and convert it to hexadecimal. Then we check if the register field is empty. If not, we have a register value that must be added to the instruction opcode. The next step is to encode the extra data values. We do this by looping over the values in “icode” and if they are Callable, i.e. stored functions, we call the functions on “realvalue” variable and store the values in the list variable “encode”. We update the value of the opcode in encode and then print the results for a simple sanity check. Finally, we append the list of values in encode to the instruction list execode passing in the line number and pc value as well. Once we have iterrated over all the instructions in our program, we return the execode list to the caller.

We now have a list of encoded instruction with a little extra info attached. If you run the code above the final results in encode will look something like this:

[(2, 0, [8, ‘222’, ‘255’]), (3, 3, [0])]

What we see here is a list of tuples where the fisrt entry in each tuple is the line number in the source file the instruction was found on. The second value is program counter value at the start of the instructions, and so where to store the first byte of the instruction.  The last value in the tuple is a list of values representing the values to be placed in memory. Here, our first instruction has an opcode of 8 and is followed by the values 222 and 255. Recall the first byte is the lower order byte. If we look up the opcode 8 we will see it is the SET instruction. Since the this instruction requires a register index be added to it, and the register index did not change the opcode. We can deduce the register is register 0 or ACC. The value to be loaded into register 0 is 255 << 8 + 222 =  65502 or 0xFFDE. So the complete instruction is SET R0, 0xFFDE. The second instruction is even easier to decode. The opcode 0x00 is the HALT instruction and takes no arguments. 

OK, let’s see the code as we have it now:

"""Sweet16-GP Assembler"""
from collections import Callable

# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    # TODO: Get DATA implemented

    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)

    # Second pass -- Assembly
    execode = []
    for lineno, pc, value, register, mode, mnemonic, icode in objcode:
        # Evaluate the string
        try:
            symbols[pc] = pc

            if value is not '':
                realvalue = eval(value, symbols)

            if isinstance(realvalue, str):
                realvalue = ord(realvalue) & 0xff

            if not isinstance(realvalue, int):
                raise TypeError("Integer expected in {0}".format(value))
        except Exception as e:
            print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
            realvalue = 0

        # Encode the instruction. If register is present,
        # it must be added to the base opcode value stored
        # in icode[0]
        opcode = int(icode[0], 16)
        if register != '':
            opcode += int(register, 16)
        encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
        encode[0] = opcode

        execode.append((lineno, pc, encode))
    return execode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    # Assemble code
    print(assemble(lines))

Recall back in our emulator we added a routine to load a hex-file:

"""     
Load a program assembled into a hex file and store in memory.     File format is: each line contains 18 bytes. First two bytes are the beginning address of the data in the current line. The following 16 bytes are the memory values. All values are in hexadecimal. If the 0x character pair is used, it will be stripped. All values are space delimited. The file must end with a new line.     
"""

Since this is the format our emulator expects, we are going to write a function to write out our data into this format:

def write_rom_code(execode):
    last_pc = 0
    rom = []

    for lineno, pc, ecode in execode:
        if last_pc > pc:
            pass
        elif last_pc < pc:
            while last_pc < pc:
                rom.append('0x00')
                last_pc += 1
        else:
            for byte in ecode:
                byte = '0x%.2x' % int(byte)
                rom.append(byte)
                last_pc += 1

    while last_pc % 16 != 0:
        rom.append('0x00')
        last_pc += 1

    return rom

As you can see, the process is pretty straight forward. The code write the instruction code from the execode parameter and if this does not end an a 16 byte boundery, it write 0x00 from the end of the code to the next 16 byte boundary. This is accomplished in while loop near the end of the function. Also note that we convert the integer values to hexadecimal value. This is typical and comes in handy when transfering programs across a serial of JTAG channel to load it into hardware. 

Add this function just above the final if statement (if __name__ == ‘__main__’:) and the change that if statement to read as follows:

if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    # Assemble
    code = assemble(lines)
    # Convert ot Hexfile format
    print(write_rom_code(code))

Now if we run the program we get a list of ROM values that always end on a 16 byte boundary. Note that this raw ROM format does not include the leading two byte address values:

['0x08', '0xde', '0xff', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00']

Now that we have a raw ROM format, we need to a function to convert this to our hex file format. Since we will be writing to a file we will need to import the sys module.

def rom2hexfile(rom):
col = 0
row = 0

max_cols = 16
max_rows = int(len(rom) / max_cols)
idx = 0


# Generate output filename
# If none given use input filename's
# path and base name with a '.hex'
# extension appended.
if len(sys.argv) > 2:
# we should have an outfile name
filename = sys.argv[2]
else:
file = sys.argv[1]
file = file.split('.')
head, tail = os.path.split(sys.argv[1])
parts = tail.split('.')
filename = os.path.join(head, parts[0] + '.hex')

with open(filename, 'w') as ofh:
for b in rom:
if col == 0:
ofh.write('{:02x} {:02x} '.format((idx & 0xff), (idx & 0xff00) >> 8))
ofh.write('{:02x} '.format(int(b, 16)))
else:
ofh.write('{:02x} '.format(int(b, 16)))
col += 1
if col >= max_cols:
row += 1
col = 0
ofh.write('\n')
if row >= max_rows:
ofh.write('\n')
break;
idx += 1
ofh.close()

The code above writes the hex file to a file with the same name as the asm file we load. Well, actually, we haven’t loaded an asm file yet. We’ve been passing our assembly code as a string to our assemble function. The above code depends on a main() function we’ve yet to write and filename values passed as command-line arguments. So let’s see our main function:

def main():

    if len(sys.argv) >= 2:
        filename = sys.argv[1]

    # Remove comments and blank lines
    lines = strip_lines(open(filename))

    if 1:
        rom = write_rom_code(assemble(lines))
        print(rom)

Now that we are accpting assembly language files from the command line we will be needing a simple assembly language file. Below is a simple assembly program to add two number. Create a file named add.asm (all our assembly language files will have a file extension of “.asm”) and save it as add.asm in the same folder as the assembler.

;
; Test ADD Rn
; On Exit:
; Save in a file named add.asm
;
start:      SET R1, 0x00fe      ; initialize pointers.
            SET R0, 0x01        ; load ACC with 1
            ADD R1              ; ACC gets 0cfe + 0x01 = 0xff
end:        HALT                ; STOP Execution

Now we need to change our “if name equals main” statement to call our main() function: 

if __name__ == '__main__':
    import sys, os
    main()

Now if you run this code from the command line, you should get the following output:

python3 assembler_03.py add.asm
 ['0x09', '0xfe', '0x00', '0x08', '0x01', '0x00', '0x51', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00']

In addition you should have a new file named add.hex in the same folder as your add.asm file. The add.hex file should have the following contents:

00 00 09 fe 00 08 01 00 51 00 00 00 00 00 00 00 00 00 

As you can see we added the two address bytes to the rom format. This hex file should now be loadable into the emulator. 

Now let’s see it all put together for completeness:

"""Sweet16-GP Assembler"""
from collections import Callable

# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    # TODO: Get DATA implemented

    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)

    # Second pass -- Assembly
    execode = []
    for lineno, pc, value, register, mode, mnemonic, icode in objcode:
        # Evaluate the string
        try:
            symbols[pc] = pc

            if value is not '':
                realvalue = eval(value, symbols)

            if isinstance(realvalue, str):
                realvalue = ord(realvalue) & 0xff

            if not isinstance(realvalue, int):
                raise TypeError("Integer expected in {0}".format(value))
        except Exception as e:
            print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
            realvalue = 0

        # Encode the instruction. If register is present,
        # it must be added to the base opcode value stored
        # in icode[0]
        opcode = int(icode[0], 16)
        if register != '':
            opcode += int(register, 16)
        encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
        encode[0] = opcode

        execode.append((lineno, pc, encode))
    return execode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line

def write_rom_code(execode):
    last_pc = 0
    rom = []

    for lineno, pc, ecode in execode:
        if last_pc > pc:
            pass
        elif last_pc < pc:
            while last_pc < pc:
                rom.append('0x00')
                last_pc += 1
        else:
            for byte in ecode:
                byte = '0x%.2x' % int(byte)
                rom.append(byte)
                last_pc += 1

    while last_pc % 16 != 0:
        rom.append('0x00')
        last_pc += 1

    return rom

def rom2hexfile(rom):
    col = 0
    row = 0

    max_cols = 16
    max_rows = int(len(rom) / max_cols)
    idx = 0
    filename = 'a.hex'

    # Generate output filename
    # If none given use input filename's
    # path and base name with a '.hex'
    # extension appended.
    if len(sys.argv) > 2:
        # we should have an outfile name
        filename = sys.argv[2]
    else:
        file = sys.argv[1]
        file = file.split('.')
        head, tail = os.path.split(sys.argv[1])
        parts = tail.split('.')
        filename = os.path.join(head, parts[0] + '.hex')

    with open(filename, 'w') as ofh:
        for b in rom:
            if col == 0:
                ofh.write('{:02x} {:02x} '.format((idx & 0xff), (idx & 0xff00) >> 8))
                ofh.write('{:02x} '.format(int(b, 16)))
            else:
                ofh.write('{:02x} '.format(int(b, 16)))
            col += 1
            if col >= max_cols:
                row += 1
                col = 0
                ofh.write('\n')
            if row >= max_rows:
                ofh.write('\n')
                break;
            idx += 1
        ofh.close()


def main():

    if len(sys.argv) >= 2:
        filename = sys.argv[1]

    # Remove comments and blank lines
    lines = strip_lines(open(filename))

    if 1:
        rom = write_rom_code(assemble(lines))
        rom2hexfile(rom)
        print(rom)


if __name__ == '__main__':
    import sys, os
    main()

OK, that it for today. We now have a working assembler. However, there are a few features I would like to add. So next time we will add a few simple assembler directives and complete the assembler. Then we will develop a few simple assembly language programs for the Sweet16gp. 

Until next time, Keep coding!

 

Series Navigation<< Sweet16-GP Assembler – Part 2