Header Image - Randall Morgan

Category Archives

13 Articles

Sweet16-GP Assembler – Part 3

by SysOps 0 Comments
This entry is part 8 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left off with the first pass of our assembler creating an intermediate  representation for our assembly code. Today we are going to begin the second pass. So let’s get on with it!

Our first pass completes with a list of code that contains all the information we need to assemble the code. So why did we use a two pass design instead of a single pass? The answer is that in the single pass we have to do a lot more work to handle forward references to label values. By using a two pass design we can forego the complexities and simply handle any foreward references in the second pass after we have already calculated each instructions address in the first pass.

So what work do we need to accomplish in the second pass? Well, we need to keep track of the instructions location in memory. We will do this by maintaining a symbol in a synbol table called “pc” for program counter. We will use  will need to replace any symbols with thier values stored in the symbol table. Then we will need to encode the instruction in hex values. So let’s see how to accomplish this:

# Second pass -- Assembly
for lineno, pc, value, register, mode, mnemonic, icode in objcode:
    print(f'Pass 2 : {lineno, pc, value, register, mode, mnemonic, icode}')
    # Evaluate the string
    try:
        symbols[pc] = pc
        
        if value is not '':
            realvalue = eval(value, symbols)
            
        if isinstance(realvalue, str):
            realvalue = ord(realvalue) & 0xff
            
        if not isinstance(realvalue, int):
            raise TypeError("Integer expected in {0}".format(value))
    except Exception as e:
        print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
        realvalue = 0

Referring to the code above, we can see that first we loop over objcode extracting the values we assembled in the first pass.  With each pass we set the new value of the pc counter. Then we check the value field. Any value we stored here will be a string. So we use eval() to convert the value to an integer. If the value happens to be a  symbol we can convert it to it’s value by passing the symbol table along with the value to eval(). However, if the value was a string representation of a numerical value, and not a symbol in the symbol table, eval will simply return our string value. So we check if the value returned by eval is a string and if so, we make a call to ord to convert it to an integer value. We also “AND”  the value with 0xFF to ensure it remains a byte sized value.  Now we should have an integer value between 0 and 255 in “realvalue”. In the last if statement we raises an TypeError if this is not the case. Finally, we wrap this all in an try/except block and print an error message to stderr is any issues arise and set the “realvalue” variable back to zero. 

Our next step is to take the values and data we extracted from objcode and assemble all of it into the actual values to represent the instruction. For this we will need to use the Callable library to call the functions we stored in the opcode table. So our first order of business here will be to import the Callable module from the collections library. 

from collections import Callable

Now that we have the Callable module we can encode our actual instruction code:

# Encode the instruction. If register is present,
# it must be added to the base opcode value stored
# in icode[0]
opcode = int(icode[0], 16)
if register != '':
    opcode += int(register, 16)
encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
encode[0] = opcode

print(lineno, pc, encode)
execode.append((lineno, pc, encode))
return execode

Here we first extract the instruction opcode and convert it to hexadecimal. Then we check if the register field is empty. If not, we have a register value that must be added to the instruction opcode. The next step is to encode the extra data values. We do this by looping over the values in “icode” and if they are Callable, i.e. stored functions, we call the functions on “realvalue” variable and store the values in the list variable “encode”. We update the value of the opcode in encode and then print the results for a simple sanity check. Finally, we append the list of values in encode to the instruction list execode passing in the line number and pc value as well. Once we have iterrated over all the instructions in our program, we return the execode list to the caller.

We now have a list of encoded instruction with a little extra info attached. If you run the code above the final results in encode will look something like this:

[(2, 0, [8, ‘222’, ‘255’]), (3, 3, [0])]

What we see here is a list of tuples where the fisrt entry in each tuple is the line number in the source file the instruction was found on. The second value is program counter value at the start of the instructions, and so where to store the first byte of the instruction.  The last value in the tuple is a list of values representing the values to be placed in memory. Here, our first instruction has an opcode of 8 and is followed by the values 222 and 255. Recall the first byte is the lower order byte. If we look up the opcode 8 we will see it is the SET instruction. Since the this instruction requires a register index be added to it, and the register index did not change the opcode. We can deduce the register is register 0 or ACC. The value to be loaded into register 0 is 255 << 8 + 222 =  65502 or 0xFFDE. So the complete instruction is SET R0, 0xFFDE. The second instruction is even easier to decode. The opcode 0x00 is the HALT instruction and takes no arguments. 

OK, let’s see the code as we have it now:

"""Sweet16-GP Assembler"""
from collections import Callable

# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    # TODO: Get DATA implemented

    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)

    # Second pass -- Assembly
    execode = []
    for lineno, pc, value, register, mode, mnemonic, icode in objcode:
        # Evaluate the string
        try:
            symbols[pc] = pc

            if value is not '':
                realvalue = eval(value, symbols)

            if isinstance(realvalue, str):
                realvalue = ord(realvalue) & 0xff

            if not isinstance(realvalue, int):
                raise TypeError("Integer expected in {0}".format(value))
        except Exception as e:
            print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
            realvalue = 0

        # Encode the instruction. If register is present,
        # it must be added to the base opcode value stored
        # in icode[0]
        opcode = int(icode[0], 16)
        if register != '':
            opcode += int(register, 16)
        encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
        encode[0] = opcode

        execode.append((lineno, pc, encode))
    return execode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    # Assemble code
    print(assemble(lines))

Recall back in our emulator we added a routine to load a hex-file:

"""     
Load a program assembled into a hex file and store in memory.     File format is: each line contains 18 bytes. First two bytes are the beginning address of the data in the current line. The following 16 bytes are the memory values. All values are in hexadecimal. If the 0x character pair is used, it will be stripped. All values are space delimited. The file must end with a new line.     
"""

Since this is the format our emulator expects, we are going to write a function to write out our data into this format:

def write_rom_code(execode):
    last_pc = 0
    rom = []

    for lineno, pc, ecode in execode:
        if last_pc > pc:
            pass
        elif last_pc < pc:
            while last_pc < pc:
                rom.append('0x00')
                last_pc += 1
        else:
            for byte in ecode:
                byte = '0x%.2x' % int(byte)
                rom.append(byte)
                last_pc += 1

    while last_pc % 16 != 0:
        rom.append('0x00')
        last_pc += 1

    return rom

As you can see, the process is pretty straight forward. The code write the instruction code from the execode parameter and if this does not end an a 16 byte boundery, it write 0x00 from the end of the code to the next 16 byte boundary. This is accomplished in while loop near the end of the function. Also note that we convert the integer values to hexadecimal value. This is typical and comes in handy when transfering programs across a serial of JTAG channel to load it into hardware. 

Add this function just above the final if statement (if __name__ == ‘__main__’:) and the change that if statement to read as follows:

if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    # Assemble
    code = assemble(lines)
    # Convert ot Hexfile format
    print(write_rom_code(code))

Now if we run the program we get a list of ROM values that always end on a 16 byte boundary. Note that this raw ROM format does not include the leading two byte address values:

['0x08', '0xde', '0xff', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00']

Now that we have a raw ROM format, we need to a function to convert this to our hex file format. Since we will be writing to a file we will need to import the sys module.

def rom2hexfile(rom):
col = 0
row = 0

max_cols = 16
max_rows = int(len(rom) / max_cols)
idx = 0


# Generate output filename
# If none given use input filename's
# path and base name with a '.hex'
# extension appended.
if len(sys.argv) > 2:
# we should have an outfile name
filename = sys.argv[2]
else:
file = sys.argv[1]
file = file.split('.')
head, tail = os.path.split(sys.argv[1])
parts = tail.split('.')
filename = os.path.join(head, parts[0] + '.hex')

with open(filename, 'w') as ofh:
for b in rom:
if col == 0:
ofh.write('{:02x} {:02x} '.format((idx & 0xff), (idx & 0xff00) >> 8))
ofh.write('{:02x} '.format(int(b, 16)))
else:
ofh.write('{:02x} '.format(int(b, 16)))
col += 1
if col >= max_cols:
row += 1
col = 0
ofh.write('\n')
if row >= max_rows:
ofh.write('\n')
break;
idx += 1
ofh.close()

The code above writes the hex file to a file with the same name as the asm file we load. Well, actually, we haven’t loaded an asm file yet. We’ve been passing our assembly code as a string to our assemble function. The above code depends on a main() function we’ve yet to write and filename values passed as command-line arguments. So let’s see our main function:

def main():

    if len(sys.argv) >= 2:
        filename = sys.argv[1]

    # Remove comments and blank lines
    lines = strip_lines(open(filename))

    if 1:
        rom = write_rom_code(assemble(lines))
        print(rom)

Now that we are accpting assembly language files from the command line we will be needing a simple assembly language file. Below is a simple assembly program to add two number. Create a file named add.asm (all our assembly language files will have a file extension of “.asm”) and save it as add.asm in the same folder as the assembler.

;
; Test ADD Rn
; On Exit:
; Save in a file named add.asm
;
start:      SET R1, 0x00fe      ; initialize pointers.
            SET R0, 0x01        ; load ACC with 1
            ADD R1              ; ACC gets 0cfe + 0x01 = 0xff
end:        HALT                ; STOP Execution

Now we need to change our “if name equals main” statement to call our main() function: 

if __name__ == '__main__':
    import sys, os
    main()

Now if you run this code from the command line, you should get the following output:

python3 assembler_03.py add.asm
 ['0x09', '0xfe', '0x00', '0x08', '0x01', '0x00', '0x51', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00']

In addition you should have a new file named add.hex in the same folder as your add.asm file. The add.hex file should have the following contents:

00 00 09 fe 00 08 01 00 51 00 00 00 00 00 00 00 00 00 

As you can see we added the two address bytes to the rom format. This hex file should now be loadable into the emulator. 

Now let’s see it all put together for completeness:

"""Sweet16-GP Assembler"""
from collections import Callable

# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    # TODO: Get DATA implemented

    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)

    # Second pass -- Assembly
    execode = []
    for lineno, pc, value, register, mode, mnemonic, icode in objcode:
        # Evaluate the string
        try:
            symbols[pc] = pc

            if value is not '':
                realvalue = eval(value, symbols)

            if isinstance(realvalue, str):
                realvalue = ord(realvalue) & 0xff

            if not isinstance(realvalue, int):
                raise TypeError("Integer expected in {0}".format(value))
        except Exception as e:
            print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
            realvalue = 0

        # Encode the instruction. If register is present,
        # it must be added to the base opcode value stored
        # in icode[0]
        opcode = int(icode[0], 16)
        if register != '':
            opcode += int(register, 16)
        encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
        encode[0] = opcode

        execode.append((lineno, pc, encode))
    return execode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line

def write_rom_code(execode):
    last_pc = 0
    rom = []

    for lineno, pc, ecode in execode:
        if last_pc > pc:
            pass
        elif last_pc < pc:
            while last_pc < pc:
                rom.append('0x00')
                last_pc += 1
        else:
            for byte in ecode:
                byte = '0x%.2x' % int(byte)
                rom.append(byte)
                last_pc += 1

    while last_pc % 16 != 0:
        rom.append('0x00')
        last_pc += 1

    return rom

def rom2hexfile(rom):
    col = 0
    row = 0

    max_cols = 16
    max_rows = int(len(rom) / max_cols)
    idx = 0
    filename = 'a.hex'

    # Generate output filename
    # If none given use input filename's
    # path and base name with a '.hex'
    # extension appended.
    if len(sys.argv) > 2:
        # we should have an outfile name
        filename = sys.argv[2]
    else:
        file = sys.argv[1]
        file = file.split('.')
        head, tail = os.path.split(sys.argv[1])
        parts = tail.split('.')
        filename = os.path.join(head, parts[0] + '.hex')

    with open(filename, 'w') as ofh:
        for b in rom:
            if col == 0:
                ofh.write('{:02x} {:02x} '.format((idx & 0xff), (idx & 0xff00) >> 8))
                ofh.write('{:02x} '.format(int(b, 16)))
            else:
                ofh.write('{:02x} '.format(int(b, 16)))
            col += 1
            if col >= max_cols:
                row += 1
                col = 0
                ofh.write('\n')
            if row >= max_rows:
                ofh.write('\n')
                break;
            idx += 1
        ofh.close()


def main():

    if len(sys.argv) >= 2:
        filename = sys.argv[1]

    # Remove comments and blank lines
    lines = strip_lines(open(filename))

    if 1:
        rom = write_rom_code(assemble(lines))
        rom2hexfile(rom)
        print(rom)


if __name__ == '__main__':
    import sys, os
    main()

OK, that it for today. We now have a working assembler. However, there are a few features I would like to add. So next time we will add a few simple assembler directives and complete the assembler. Then we will develop a few simple assembly language programs for the Sweet16gp. 

Until next time, Keep coding!

 

Sweet16-GP Assembler – Part 2

by SysOps 0 Comments
This entry is part 7 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left our assembler with only two functions. The strip_lines() function which removes comments and blank lines and the rename_registers() function which replaces the register’s friendly names with the cononical names i.e. Rx where x is a number between 0 and 7 (the register’s index value into the register file).

With these to pre-processing functions out of the way we are ready to start processing our assembly code.  We will use a two-pass method of assembly. The first pass will break each line into its constituent parts and then locate the instruction’s addressing mode, stores it in a field called mode, and calculate the instruction’s actual opcode value and length. In addition it will store a labels in a symbol table to record their value for later use in the second pass or the assembler.

Our main assembler method will take our clean lines, one at a time, and parse them as described above. We will need a table of instructions that includes details about each instruct including it’s addressing mode(s), opcode(s) and length. You may be wondering why the “(s)” above. This is because some of our instructions have multiple addressing modes and therefore multipple opcodes and perhaps even multiple instruction lengths. We will use our table to disambiguate the instruction format into the proper values. 

We will implement our opcode_table as a dict. Each entry will be  indexed (key) by the instruction mnemonic.  The value for each key in the table will be a dict indexed by the addressing mode. Each addressing mode will then have a list of values containing the instruction’s opcode. If the opcode requires a one or two by value, they will be provided as VALUE_L and VALUE_H in the list. This description sounds much more complex than it really is. Refere to the code for our table below and re-read the above paragraph. It should all become clear.

opcode_table = {
    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

As you can see, our instructions can fall into one of four addressing modes. These include implicit (when the instruction itself indicates the addressing mode),  immediate (when the value needed by the instruction directly follows it), indirect (when the register specified contains the value needed to complete the instructions, and register (when the instruction operates directly on the register specified). To tell the truth, these could be further broken down but that would only complicate things.

OK, now we have our opcode_table we are going to start writing our parser. Our parser is a little bit unorthodox. Since our input language is so constrained we can forego all the formal theory and much of the discipline or parser development. 

I do believe however, that any developer worth their weight should write at least a simple parser at some point. Understanding even a very simple parser expands your understanding of how your tools work whether you’re a web developer, AI master, ETL programmer, or Embedded Systems developer. If you call yourself a Software Engineer or Programmer and you have written at least a simple parser. compiler, or interpreter, then you owe it to yourself to do so! To get you started I recomend DR. Jack Crenshaw’s “Let’s Build A Compiler” and Ruslans’s “Let’s Build A Simple Interpreter” blog serries. If you’re a seasoned developer checkout the book “Language Implementation Patterns”. For those mathematically oriented, checkout the Dragon Book

So what I am going to do is simply plow forward. We will tackle one issue at a time and add functions to handle each issue as it arises. Note that this technique wont work if you’re trying to write a high level language compiler or interpreter. There you will need all the formal methods and a have good design before writing code. What we are going for here is a quick and simple boot-strap assembler. We can write better tools later. For now, we just need something to get us up and running.

OK, the first thing we are going to need is some set of methods to break up the line of source code we feed to our assembler into it’s various fields. We will create an entry point function called assemble which will do our heavy lifting by calling other functions.

We will need to produce some data about the current source line. This will include any label, register, addressing mode, and value. Additionally, we may also want to keep track of the original instruction mnemonic and any intermediate code we produce. So we will need a supporting function to handle this. We will also need some storage for any lables or symbols we encounter during parsing.

The code below is our entry point. It will be responsible for handling our two apasses over our code and  will call the functions that do the heavy lifting.

def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later expansion
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode

Note the call to parse_lines() we will write this function next. But, let’s step back and think about what we need it to do…

Given a line of code like:

start:    SET   R1, 0xFA

We can see we need to break up this line into it’s smaller parts. Luckily, this is pretty easy to do. Also, since each line will follow a similar pattern for the fields, we can easily deduce what each field contains. 

# Exception used for errors
class AssemblyError(Exception):
    pass

def parse_lines(lines, symbols):
    """ Determine line number, mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))

Refering to the code above, we use the enumerate function to keep track of our line numbers. One drawback to doing this approach is that our comment and blank lines wont be counted. Something we could fix by tracking line numbers in the strip_lines() function. But for now, we will do it this way.

Enumerate returns to us an integer value for the line number (lineno) and the line of source code. Next, we use rpartition(“:”) on on the source line to break it into three parts. These parts are return in a tuple containing the part before the seperator (colon in our case, which ends a label declaration), the seperator, and the part after the seperator. If any part is not found (for example, if there is no colon in the source line) then that part returns an empty string value. So if our line does not contains a colon, we will get an empty string for the label and for the seperator followed by the initial line of code which we save in the statement variable.

Once we have gotten any label, we need to take our statement and further parse it into its various parts. If it turns out that our statement is empty, we need to return a default tuple of ‘None’ values otherwise, we make a call to parse_opcode() passing in our statement and our symbol table so that any symbols found in our source line can be handled. If this all goes horribly wrong, we throw an exception and print the error message. We defined a simple exception class above for this purpose.

Our next task is to implement a function to parse the line into opcodes. We called this function pasre_opcode() above. The implementation is shown below:

def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3  
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)

There is a lot going on here so let’s unpack it. First, we use the split() method to split the statement into two portions. The first parameter to split() gives the character to split on and the second parameter is used to limit the number of splits. What we are doing here is seperating the instruction mnemonic from the rest of the statement. Since some instructions (i.e.: HALT) only have the mnemonic we need to check the number of fields we get back. If it’s only one, then we have an instruction like HALT or RTS. 

However, if we have more than one field returned from the split we need to further decompose the line. We now have a mnemonic in fields[0] and the remainder of the line in fields[1]. We now know we may have something like Rx, @Rx, or Rx, 0xFA following the instruction mnemonic. So the next move is to tray and seperate the remaining portion of the statement on the comma seperator. If the comma is found, one of two cases will be returned i.e.: (register, value) or @register, value). These are stored in the extra variable. We then add the extra fields to the fields variable and assign mnemonic the value in fields[0] which contains the instruction mnemonic.

Next, using the value of ‘nofields’ which indicate what we should expect next, we sort out the remaining fields and store their values into their perspective variables. In each case, we call parse functions which perform a specific sub-task.

Our first sub-task is to locate any register in the fields. At this point we know if a register exists it should be in fields[1]. So we pass this value to the parse_register() function shown below:

def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register

The first thing we do is convert the register value to upper case. Then we check if the string contains ‘R’ and return the character following the ‘R’ which should be the register index value (a single digit between 0 and 7 inclusively). We then return that value to the caller.

It’s possible that we didn’t have a register. In this case, there may be a value in the fields[1] position. So we check if the call to parse_register() returned a value of an empty string. If the latter was returned, we try and parse a value using the function parse_value() passing in fields[1] as a parameter. This situation occurs with the branch instructions where the mnemonic is directly followed by the offset value for the jump.

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val

As you can see above the parse_value() function must handle many types of values. It first strips off any prefix to the value and tests if this matches any of the supported numerical base indicators. If a match is found the value is converted to an integer and then cast into a string and returned to the caller.

Next, we call parse_address_mode() passing in all the fields as we may need more than one to identify the addressing mode. The parse_address_mode() function can only be understood in the context of the opcode_table. So refer to the table as you gork this code:

def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        if fields[0] in directives:
            return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))

The parse_address_mode() function uses the data contained in the fields to sort out the addressing mode of the instruction and return it to the caller, parse_opcode() which then stores this value in the mode variable. 

Next we use the mnemonic value to get the instruction data from the opcode_table and then use the mode variable to get the proper opcode value and instruction format.

If all has gone well, we return a tuple containing the value, register, mode, mnemonic, variables and the object code taken from the opcode_table.

At this poiont our first pass is complete. We have all the data necessary to assemble our instruction code with the possible exception of any forward declared symbol. 

We have one last issue before we can try this out. Our opcode_table has method names in some of the fields. We need to implemnt these helper methods:

All of these methods are pretty strainght forward. They simply take the value from the source code and separe it out into high bytes and low bytes in the case of VALUE_H, VALUE_L, LABEL_H, and LABEL_L. The OFFSET function calculates the offset from the current position.

# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)


def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)
 
  
def LABEL_L(pc, value):
    print(f'LABEL_L PC: {pc}, value: {value}')
    return (value) & 0xff


def LABEL_H(pc, value):
    print(f'LABEL_H PC: {pc}, value: {value}')
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    print(f'OFFSET: {value}')
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))

PK, I think we’re ready to try this! Below is the final code. Note I made some adjustments to the code from the last installment so we could test what we’ve done so far. 

"""Sweet16-GP Assembler"""


# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    # print(f'VALUE_L value: {value}')
    #val = value_to_int(value) & 0xff
    #print(f'VALUE_L val: {val}')
    #return str(val & 0xff)
    pass


def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)
    return objcode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    print(assemble(lines))

If you run the above code you should get an output like the following:

 

sweet16-articles-code/articles/part-07/assembler_02.py
 [
  (2, 0, '65502', '0', 'direct', 'SET', ['0x08', , ]), 
  (3, 3, '', '', 'implicit', 'HALT', ['0x00']) 
 ]

As you can see we have the intermediate representation for two instructions. All the info we will need to assemble these instructions into actual machine code. Reading the tuples the first value (2) is the line number from the source. The second value (0) is the location counter at the start of the instruction. The next value (65502) is the value provided in the instruction, followed by the register index (0), the addressing mode (direct), and the mnemonic. The final fields holds the opcode information from the opcode_table.  Try this out on several instructions and see that the results make sense. Write some unit test for the various functions and for the assemble as it stands. 

This has been a long post. Next time we will tackle the second pass of our assembler. Until then, keep coding!

 

Sweet16-GP Assembler

by SysOps 0 Comments
This entry is part 6 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

In my first post I showed you the instruction set of the Sweet16-GP. Within the instruction set presented we saw examples of the Sweet16-GP assembly language. However, we didn’t talk about the assembly language at that time. It’s now time to discuss the Sweet16-GP’s assembly language.

Statements: Statements in the Sweet16’s assembly language can be borken down into four basic parts; Label, Operation, Operand, and Comment. of all these parts only the Operation is required in all statements. Most statements however also require an operand. The operand can be a register, literal value or label reference. The label at the beginning of the statement is the label declaration.

Comments: The assembly language used with the Sweet16-GP allows comments. Comments begin with a semicolon and end at the end of the line. 

Labels: Labels give an human readable name to the line. This name can then be used to reference the line at any point in the program. Labels must begin with a letter or underscore character and may contain alphanumeric and underscore character. Labels declarations must end with a colon. Label declarations occur when the label is located at the beginning of a line. Labels are referenced when they are not the first item in the line. If a label declaration occurs on a line by itself, it is associated with the following line.

Numbers: The assembler can accept numerical values in the following bases: 2, 8, 10, 16. These correspond to binary, octal, decimal, and hexadecimal. Binary values must begin with “0b”; a zero followed by a lower case “B”. Octal values must begin with “0c”, and hexadecimal values must begin with “0x”. Decimal numbers are written normally. Note that all values must be integers. Real numbers are not supported. 

Here are some examples of numerical values:

  • 0b11101011 – binary
  • 0c47 – octal
  • 0xfecd – hexadecimal
  • 65535 – decimal
  • 0fae – not allowed
  • 34.92 – not allowed

Style: Assembly language code typically follows a simple coding style. Below is an example of a typical piece of Sweet16-GP assembly code:

;
; Test for BRN
; On Entry: 
; On Exit: ACC = 0x8000, R1 = 0x0010, R2 = 0x01, PC = 0x13, N and V flags Set
;
start:  SET     R0,     0x7FF0      ; Load ACC with 0xFFF0
        SET     R2,     0x0001      ; Load R2 with 0x1F00
        SET     R1,     0x0000      ; Load R1 with 0x00
        SET     R3,     0x0000
calc:   ADD     R2                  ; ADD R2 to ACC and place results in ACC
        INC     R1                  ; Count how many times we Add R2 to ACC (R0)
        BRV     end                 ; Branch to HALT instruction
        BRA     calc                ; Data block FB
end:    HALT                        ; STOP Execution

As you can see, there is a comment block at the top that documents the code. It is suggested that each block of code be documented this way. Always include the conditions that are expected on entry into the code, and the exit state. List any modified registers, stack pointers, or memory locations that are left in an alter state by the program. 

Declare your line labels at the beginning of the line. Make then human readable. If you need long labels place them above the line it should be associated with. Always end a label declaration with a colon. Here are some examples of label declarations:

  • start:
  • end:
  • _loop_01:
  • _cond_true:
  • _cond_false:
  • loop_iter:

Instructions should all be placed in a single column after the label declarations as show above. White space should be used to place the instruction operands in a single column after the instructions. Finally, comments should all line up into a single column. There is nothing in the assembler to enforce this columnization.  However, you’ll find your code much easier to read and groc if you follow these guidelines.

Notes: Many assemblers allow for the operands to be mathematical expressions. They also, typically offer special features like a characters that reference the program counter at the point in the program at which the special character is used. The Sweet16-GP assembler does not include such features. Perhaps a later version will add these feature. However, for this project I wanted to maintain simplicity. Lastly, the assembler also does not support macros or predefined routines. 

Assemblers are unique in the parser/translator world in that their source (assembly) language has a nearly one-to-one relationship with its output language (machine code). This fact makes it simple to create basic assemblers using crude pattern matching parsers. Often regular expressions are employed to recognize the various language constructs.

The only difficulty encounter is usually label references. Since labels can be referenced before they are declared. This, however, can be dealt with by breaking the parser into two parts and allowing each part to separately scan the source code.

Our assembler will use a simple two-pass process in which the first pass (scan of the source code) creates a simple intermediate representation. The second pass will take the results of the first pass and convert it to actual machine code. We will also use a table of opcodes and addressing modes to help us build our intermediate code.

The first thing we will need is a simple framework with a line of code for testing. Let’s get that setup. Our assembler will expect our assembly code in an interrable form as it will work line-by-line. So we’ll need to pass our code as a list of lines of assembly.

"""Sweet16-GP Assembler"""

if __name__ == '__main__':
    text = """
        ;This is a comment
        start:  SET  ACC, 0xFFDE  ; This is also a comment
        end:    HALT  ; end of program
    """
    lines = text.split('\n')

Next. to make parsing easier we will write a simple routine to remove comments from our assembly source code.

def strip_lines(self, lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        yield line

As you can see we simply loop over the lines in our text and if we find the semicolon character that begins a comment, we remove everything from the semicolon to the end of the line. Next, we call strip() on the line to remove all leading and trailing whitespace, then we yield the line to the caller.

If you’re not familiar with the yield keyword in python 3 I suggest you lookup python 3 generators for a detailed explanation. In a nutshell, however, yield basically stops the loop on each iteration and returns the currently processed line to the caller. When the caller returns back to the strip_lines() method, it will re-enter the method with the state that existed the last time yield was called. In other words, the method remembers where it left off in the loop and starts from there on subsequent calls. You can learn more about the yield keyword here.

It’s time to test this. Change the code to read:

def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        yield line


if __name__ == '__main__':
    text = """
        ;This is a comment
        start:  SET  ACC, 0xFFDE  ; This is also a comment
        end:    HALT  ; end of program
    """
    lines = text.split('\n')

    # Remove comments and blank lines
    for line in strip_lines(lines):
        print(line)

If you run this code now you should get something like the following output:

start:  SET  ACC, 0xFFDE
end:    HALT

Process finished with exit code 0

Not bad for the few minutes it took us to write that. We can now take in assembly code and clean it of blank lines and comments. 

Now we have one more step of preprocessing before we can move on to assembling our code. Recall that some registers can be referred to by name. For example, R0 is also known as the accumulator and is named ACC. This naming is great for reading and writing code but we can make assembly easier if we replace these friendly names with the canonical register name Rx where x is the numerical index into the register file. The best place to do this is within our strip_lines() method. However, to keep the code modular and easy to understand we’ll write a subroutine for this and make a call to it in strip_lines().

def replace_register_names(self, line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line

As you can see we simply take a line of text and replace any register names found in the line with the R-value. Now let’s place the call to replace_register_names() in strip_lines().

def strip_lines(self, lines):
    """ Takes a sequence of lines and strips comments and blank lines.
        Also, calls replace_register_names on stripped lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line

Our new code now looks like this:

"""Sweet16-GP Assembler"""

def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line

def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = """
        ;This is a comment
        start:  SET  ACC, 0xFFDE  ; This is also a comment
        end:    HALT  ; end of program
    """
    lines = text.split('\n')

If you run the program again, you will see that the register name ACC has been replaced by R0. Exactly what we want. Your output should look like the following:

start:  SET  R0, 0xFFDE
 end:    HALT

 Process finished with exit code 0

That’s it for today, next time we are going to write a method to take our cleaned code and parse it line by line. 

Sweet16 CPU Emulator

Part 4

I hope you played around with the emulator and tried to implement some of the CPU instructions yourself. As promised, in this post we will continue our work on the emulator. First, let me show you my implementation of decode_register_type_instr(). 

def decode_register_instr(self, opcode: int, reg_id: int):
    if opcode == 0x08:
        self.exec_set(opcode, reg_id)
    elif opcode == 0x10:
        self.exec_ld(opcode, reg_id)
    elif opcode == 0x18:
        self.exec_st(opcode, reg_id)
    elif opcode == 0x20:
        self.exec_ld_ea(opcode, reg_id)
    elif opcode == 0x28:
        self.exec_st_ea(opcode, reg_id)
    elif opcode == 0x30:
        self.exec_ldd_ea(opcode, reg_id)
    elif opcode == 0x38:
        self.exec_std_ea(opcode, reg_id)
    elif opcode == 0x40:
        self.exec_pop_ea(opcode, reg_id)
    elif opcode == 0x48:
        self.exec_stp_ea(opcode, reg_id)
    elif opcode == 0x50:
        self.exec_add(opcode, reg_id)
    elif opcode == 0x58:
        self.exec_sub(opcode, reg_id)
    elif opcode == 0x60:
        self.exec_mul(opcode, reg_id)
    elif opcode == 0x68:
        self.exec_div(opcode, reg_id)
    elif opcode == 0x70:
        self.exec_and(opcode, reg_id)
    elif opcode == 0x78:
        self.exec_or(opcode, reg_id)
    elif opcode == 0x80:
        self.exec_xor(opcode, reg_id)
    elif opcode == 0x88:
        self.exec_not(opcode, reg_id)
    elif opcode == 0x90:
        self.exec_shl(opcode, reg_id)
    elif opcode == 0x98:
        self.exec_shr(opcode, reg_id)
    elif opcode == 0xA0:
        self.exec_rol(opcode, reg_id)
    elif opcode == 0xA8:
        self.exec_ror(opcode, reg_id)
    elif opcode == 0xE0:
        self.exec_popd_ea(opcode, reg_id)
    elif opcode == 0xE8:
        self.exec_cpr(opcode, reg_id)
    elif opcode == 0xF0:
        self.exec_inc(opcode, reg_id)
    elif opcode == 0xF8:
        self.exec_dec(opcode, reg_id)
    else:
        print(f'Unknown OpCode code {opcode}')
        self.halt_flag = True

As you can see, this is nothing but a long set of “if”, “elif” statements checking for a particular opcode and then dispatching to the exec_xxx method. Finally, if we reach the “else” clause then we have an error. So we print a message and halt. Pretty simple right. The harder part is in the exec_xxx methods. But, only slightly harder. We spent some time building up a set of housekeeping methods that do all the heavy lifting for us. So we only need to call a subset of them passing the correct values to implement any of the exec_xxx methods. 

I’ll present the code for each of the register type instructions below with a short description of what it does. We’ve already seen the code for SET and LD so I’ll skip those.

ST is our next instruction that needs implemented. The description says:

 

The contents of ACC are stored in Rn and the branch conditions are set to reflect the new value of Rn. The carry flag is cleared, and the contents of ACC are left undisturbed.

So this just loads the value in ACC (R0) into Rn. Then it sets the flags to reflect the new value of Rn. Let’s see how we might do this:

def exec_st(self, opcode: int, reg_id: int):
    val = self.get_register(self.ACC)
    self.set_register(reg_id, val)
    self.set_value_flags(val)
    self.clear_flags('C')

Moving on, next we need to handle LD @Rn. Any instruction that uses the @ symbol is using indirect addressing. This means the register Rn is storing an address that we will either write new content into or read the content from. For example, let’s say that R3 contains the value 0x0A (10 in decimal) and we want to get the value located at address 0x0A. We could use LD @R3. R3 contains the memory address we want to get the value from. This shouldn’t be too difficult either:

def exec_ld_ea(self, opcode: int, reg_id: int):
    val = self.peek_byte(self.get_register(reg_id))
    self.inc_register(reg_id)
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

I’ve appended _ea to all instructions that use indirect or effective addressing modes. 

Our housekeeping methods make implementation quick and simple. Now on to ST @Rn. 

As you might have guessed, ST @Rn simply stores the low byte value in the ACC into the address pointed to by Rn and increments Rn. Here’s the official description.

The low order byte of ACC is stored into memory location whose address is resides in Rn. Branch conditions are set to reflect the two byte contents of ACC (R0). The carry flag is cleared and Rn is incremented by 1.

Here’s the code:

def exec_st_ea(self, opcode: int, reg_id: int):
    val = self.get_register_lsb(self.ACC)
    self.poke_byte(self.get_register(reg_id), val)
    self.inc_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

Most of the register instructions come in two forms, byte and word and are distinguished by the trailing “D” for the word sized version. For example LD is load a byte and LDD is load a word. Respectively, ST is store a byte and STD is store a word. 

Our nest set of instructions are just word versions of their byte sized versions we’ve already dealt with. Their main difference is the need to fetch two bytes and combine them together to form the correct word value. So we’ll just present the code and leave it to you to compare the two versions.

def exec_ldd_ea(self, opcode: int, reg_id: int):
    val = self.peek_word(self.get_register(reg_id))
    self.inc_register(reg_id)
    self.inc_register(reg_id)
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_std_ea(self, opcode: int, reg_id: int):
    val = self.get_register(self.ACC)
    self.poke_word(self.get_register(reg_id), val)
    self.inc_register(reg_id)
    self.inc_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

Our next instruction is the POP @Rn instruction. It’s description is:

Rn is decremented by 1. Then the low order byte of ACC (R0) is loaded from the memory locations whose address resides in (the decremented) Rn. The high order byte of ACC is cleared and the status bits are set to reflect the final value of ACC whose content will always be positive. The carry flag is cleared. Because Rn is decremented before prior to loading the ACC, single byte stacks can be implemented using ST @Rn and POP @Rn.

And here’s the code to implement it:

def exec_pop_ea(self, opcode: int, reg_id: int):
    self.dec_register(reg_id)
    val = self.peek_byte(self.get_register(reg_id))
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

STP is a bit unique in that it has a special purpose. It is used for moving memory. It’s a store instruction with indirect addressing followed by a decrement of the pointer value. Here’s the description:

Rn is decremented by 1 and the low order byte of ACC (R0) is stored in to the memory location that resides in Rn. Status bits are set to reflect the final 16 bit value of ACC. The contents of ACC are left undisturbed. STP @Rn and POP @Rn can be used together to move blocks of memory beginning with the greatest address and working downward. 

And the code:

def exec_stp_ea(self, opcode: int, reg_id: int):
    lo = self.get_register_lsb(self.ACC)
    self.dec_register(reg_id)
    self.poke_byte(self.get_register(reg_id), lo)
    self.set_value_flags(lo)
    self.clear_flags('C')

The rest of the register type instructions are pretty easy to figure out. So in the interest of saving time and space I’ll just show the code below. It’s important, however, that you read them and see how they are implemented. Truly, if you read just the first couple you’ll understand them all. So here is the code for the remaining register type instructions:

def exec_add(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 + v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    self.set_carry(v1, v2, isum)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_sub(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 - v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    self.set_borrow(v1, v2)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_mul(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 * v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_div(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 // v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_and(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 & v2) & 0xffff
    self.set_register(self.ACC, val)
    self.set_value_flags(val)

def exec_or(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 | v2) & 0xffff
    self.set_register(self.ACC, val)
    self.set_value_flags(val)

def exec_xor(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 ^ v2) & 0xffff
    self.set_register(self.ACC, val)
    self.set_value_flags(val)

def exec_not(self, opcode: int, reg_id: int):
    val = ~self.get_register(reg_id)
    self.set_register(reg_id, val)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_shl(self, opcode: int, reg_id: int):
    val = self.get_register(reg_id) << 1
    self.set_value_flags(val)
    if ((val & 0x10000) >> 16) == 1:
        self.set_flags('C')
    else:
        self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_shr(self, opcode: int, reg_id: int):
    v1 = self.get_register(reg_id)
    val = v1 >> 1
    self.set_value_flags(val)
    if (v1 & 0x1) == 1:
        self.set_flags('C')
    else:
        self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_rol(self, opcode: int, reg_id: int):
    val = self.get_register(reg_id) << 1
    if ((val & 0x10000) >> 16) == 1:
        val |= 0x1
    self.set_value_flags(val)
    self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_ror(self, opcode: int, reg_id: int):
    v1 = self.get_register(reg_id)
    val = v1 >> 1
    if (val & 0x1) == 1:
        val |= 0b1000_0000_0000_0000
    val = val & 0xffff
    self.set_value_flags(val)
    self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_popd_ea(self, opcode: int, reg_id: int):
    self.dec_register(reg_id)
    hi = self.peek_byte(self.get_register(reg_id))
    self.dec_register(reg_id)
    lo = self.peek_byte(self.get_register(reg_id))
    val = (hi << 8) + lo
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_cpr(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 & 0xFFFF) - (v2 & 0xFFFF)
    self.set_value_flags(val)
    self.set_borrow(v1, v2)
    val = val & 0xFFFF
    self.set_register(self.COMP, val)

def exec_inc(self, opcode: int, reg_id: int):
    self.inc_register(reg_id)
    val = self.get_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_dec(self, opcode: int, reg_id: int):
    self.dec_register(reg_id)
    val = self.get_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

Ok, include this code into your skeleton and try some of the instructions out. Remember to write tests for each of them. Both succeeding and failing tests. 

I’ll end this post here today. Next time we’ll implement the remaining 6 or 7 instruction (which are all non register types) and gain a working emulator. 

For completeness, here’s the complete sweet16gp.py code for this post:

"""
 Sweet16GP an implementation of an 8/16 bit CPU Emulator
 based on Steve Wozniak's SWEET16 virtual machine.
"""

from time import sleep, time
import status_bits


class Sweet16GP:
    MEM = []
    REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
    ACC = 0
    RETSTACK = 4
    COMP = 5
    STATUS = 6
    PC = 7
    cur_instr = 0
    halt_flag = False

    def __init__(self):
        self.cold_boot()

    # booting routines
    def cold_boot(self, mem_size=256):
        self.init_ram(mem_size)
        self.warm_boot()

    def warm_boot(self):
        self.init_regfile()

    # Memory handling routines
    def init_ram(self, mem_size=265):
        self.MEM = []
        for addr in range(mem_size):
            self.MEM.append(0x00)

    def poke_byte(self, addr: int, val: int):
        self.MEM[addr] = val & 0xff

    def peek_byte(self, addr: int) -> int:
        return self.MEM[addr] & 0xff

    def poke_word(self, addr: int, val: int):
        self.poke_byte(addr, (val & 0xff))
        self.poke_byte(addr+1, (val & 0xff00) >> 8)

    def peek_word(self, addr: int) -> int:
        lo = self.peek_byte(addr)
        hi = self.peek_byte(addr+1)
        val = ((hi << 8) + lo) & 0xffff
        return (hi << 8) + lo

    # Register handling routines
    def init_regfile(self):
        for id in range(len(self.REGFILE)):
            self.REGFILE[id] = 0x00

    def get_register(self, id: int) -> int:
        return self.REGFILE[id] & 0xffff

    def get_register_lsb(self, id: int) -> int:
        return self.REGFILE[id] & 0xff

    def get_register_msb(self, id: int) -> int:
        return (self.REGFILE[id] & 0xff00) >> 8

    def set_register(self, id: int, val: int):
        self.REGFILE[id] = val & 0xffff

    def set_register_lsb(self, id: int, val: int):
        hi = self.REGFILE[id] & 0xff00
        self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

    def set_register_msb(self, id: int, val: int):
        lo = self.REGFILE[id] & 0xff
        self.REGFILE[id] = ((val & 0xff) << 8) | lo

    def inc_register(self, _id: int):
        val = self.get_register(_id)
        val = (val + 1) & 0xffff
        self.set_register(_id, val)

    def dec_register(self, id: int):
        val = self.get_register(id)
        val = (val - 1) & 0xffff
        self.set_register(id, val)

    # Memory handling routines
    def init_ram(self, mem_size=265):
        self.MEM = []
        for addr in range(mem_size):
            self.MEM.append(0x00)

    def poke_byte(self, addr: int, val: int):
        self.MEM[addr] = val & 0xff

    def peek_byte(self, addr: int) -> int:
        return self.MEM[addr] & 0xff

    def poke_word(self, addr: int, val: int):
        self.poke_byte(addr, (val & 0xff))
        self.poke_byte(addr + 1, (val & 0xff00) >> 8)

    def peek_word(self, addr: int) -> int:
        lo = self.peek_byte(addr)
        hi = self.peek_byte(addr + 1)
        val = ((hi << 8) + lo) & 0xffff
        return (hi << 8) + lo

    # Bit twiddling routines
    def bit_set(self, val: int, bit: int) -> int:
        return val | (1 << bit)

    def bit_clear(self, val: int, bit: int) -> int:
        return val & ~(1 << bit)

    def bit_toggle(self, val: int, bit: int) -> int:
        return val ^ (1 << bit)

    def bit_test(self, val: int, bit: int) -> int:
        return (val & (1 << bit)) >> bit

    # STATUS Flags routines
    def set_flags(self, flags: str):
        flags = flags.upper()
        if flags.__contains__('C'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.C)
        if flags.__contains__('Z'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
        if flags.__contains__('V'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.V)
        if flags.__contains__('N'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.N)

    def clear_flags(self, flags: str):
        flags = flags.upper()
        if flags.__contains__('C'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
        if flags.__contains__('Z'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
        if flags.__contains__('V'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
        if flags.__contains__('N'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

    def test_flag(self, flag: str) -> int:
        flags = flag.upper()
        if flags.__contains__('C'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
        if flags.__contains__('Z'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
        if flags.__contains__('V'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
        if flags.__contains__('N'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

    def set_value_flags(self, val: int):
        # Note we can't set the overflow
        # flag here as we need both input
        # and result values to computer
        # overflow.
        if val == 0:
            self.set_flags('Z')
        else:
            self.clear_flags('Z')

        if val > 0b0111_1111_1111_1111:
            self.set_flags('N')
        else:
            self.clear_flags('N')

        if (val & 0x10000) >> 17:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    def set_overflow_flags(self, v1: int, v2: int, result: int):
        if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
            # Both values have the sign bit set
            # So the result should have the sign
            # bit unset. If not, we have overflow
            if not (result & (1 << 15)) > 0:
                self.set_flags('V')
            else:
                self.clear_flags('V')
        elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
            # Both values have the sign bits unset
            # So, the result should have the sign
            # bit unset.
            if ((result & (1 << 15)) >> 15) == 1:
                self.set_flags('V')
            else:
                self.clear_flags('V')

    def set_carry(self, v1, v2, result):
        # value must be unmasked or it won't include the 17th bit.
        if ((result & (1 << 16)) >> 16) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    def set_borrow(self, v1, v2):
        if v1 < v2:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    # Instruction Decoding
    def decode_inst(self, instr: int):
        if instr > 7:
            opcode = instr & 0b11111000
            reg_id = instr & 0b00000111
            self.decode_register_instr(opcode, reg_id)
        else:
            self.decode_non_register_instr(instr)

    def decode_register_instr(self, opcode: int, reg_id: int):
        if opcode == 0x08:
            self.exec_set(opcode, reg_id)
        elif opcode == 0x10:
            self.exec_ld(opcode, reg_id)
        elif opcode == 0x18:
            self.exec_st(opcode, reg_id)
        elif opcode == 0x20:
            self.exec_ld_ea(opcode, reg_id)
        elif opcode == 0x28:
            self.exec_st_ea(opcode, reg_id)
        elif opcode == 0x30:
            self.exec_ldd_ea(opcode, reg_id)
        elif opcode == 0x38:
            self.exec_std_ea(opcode, reg_id)
        elif opcode == 0x40:
            self.exec_pop_ea(opcode, reg_id)
        elif opcode == 0x48:
            self.exec_stp_ea(opcode, reg_id)
        elif opcode == 0x50:
            self.exec_add(opcode, reg_id)
        elif opcode == 0x58:
            self.exec_sub(opcode, reg_id)
        elif opcode == 0x60:
            self.exec_mul(opcode, reg_id)
        elif opcode == 0x68:
            self.exec_div(opcode, reg_id)
        elif opcode == 0x70:
            self.exec_and(opcode, reg_id)
        elif opcode == 0x78:
            self.exec_or(opcode, reg_id)
        elif opcode == 0x80:
            self.exec_xor(opcode, reg_id)
        elif opcode == 0x88:
            self.exec_not(opcode, reg_id)
        elif opcode == 0x90:
            self.exec_shl(opcode, reg_id)
        elif opcode == 0x98:
            self.exec_shr(opcode, reg_id)
        elif opcode == 0xA0:
            self.exec_rol(opcode, reg_id)
        elif opcode == 0xA8:
            self.exec_ror(opcode, reg_id)
        elif opcode == 0xE0:
            self.exec_popd_ea(opcode, reg_id)
        elif opcode == 0xE8:
            self.exec_cpr(opcode, reg_id)
        elif opcode == 0xF0:
            self.exec_inc(opcode, reg_id)
        elif opcode == 0xF8:
            self.exec_dec(opcode, reg_id)
        else:
            print(f'Unknown OpCode code {opcode}')
            self.halt_flag = True

    def exec_set(self, opcode: int, reg_id: int):
        self.inc_register(self.PC)
        lo = self.peek_byte(self.get_register(self.PC))
        self.inc_register(self.PC)
        hi = self.peek_byte(self.get_register(self.PC))
        val = (hi << 8) + lo
        self.set_register(reg_id, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_ld(self, opcode: int, reg_id: int):
        val = self.get_register(reg_id)
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_st(self, opcode: int, reg_id: int):
        val = self.get_register(self.ACC)
        self.set_register(reg_id, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_ld_ea(self, opcode: int, reg_id: int):
        val = self.peek_byte(self.get_register(reg_id))
        self.inc_register(reg_id)
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_st_ea(self, opcode: int, reg_id: int):
        val = self.get_register_lsb(self.ACC)
        self.poke_byte(self.get_register(reg_id), val)
        self.inc_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_ldd_ea(self, opcode: int, reg_id: int):
        val = self.peek_word(self.get_register(reg_id))
        self.inc_register(reg_id)
        self.inc_register(reg_id)
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_std_ea(self, opcode: int, reg_id: int):
        val = self.get_register(self.ACC)
        self.poke_word(self.get_register(reg_id), val)
        self.inc_register(reg_id)
        self.inc_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_pop_ea(self, opcode: int, reg_id: int):
        self.dec_register(reg_id)
        val = self.peek_byte(self.get_register(reg_id))
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_stp_ea(self, opcode: int, reg_id: int):
        lo = self.get_register_lsb(self.ACC)
        self.dec_register(reg_id)
        self.poke_byte(self.get_register(reg_id), lo)
        self.set_value_flags(lo)
        self.clear_flags('C')

    def exec_add(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 + v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        self.set_carry(v1, v2, isum)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_sub(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 - v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        self.set_borrow(v1, v2)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_mul(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 * v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_div(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 // v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_and(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 & v2) & 0xffff
        self.set_register(self.ACC, val)
        self.set_value_flags(val)

    def exec_or(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 | v2) & 0xffff
        self.set_register(self.ACC, val)
        self.set_value_flags(val)

    def exec_xor(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 ^ v2) & 0xffff
        self.set_register(self.ACC, val)
        self.set_value_flags(val)

    def exec_not(self, opcode: int, reg_id: int):
        val = ~self.get_register(reg_id)
        self.set_register(reg_id, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_shl(self, opcode: int, reg_id: int):
        val = self.get_register(reg_id) << 1
        self.set_value_flags(val)
        if ((val & 0x10000) >> 16) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_shr(self, opcode: int, reg_id: int):
        v1 = self.get_register(reg_id)
        val = v1 >> 1
        self.set_value_flags(val)
        if (v1 & 0x1) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_rol(self, opcode: int, reg_id: int):
        val = self.get_register(reg_id) << 1
        if ((val & 0x10000) >> 16) == 1:
            val |= 0x1
        self.set_value_flags(val)
        self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_ror(self, opcode: int, reg_id: int):
        v1 = self.get_register(reg_id)
        val = v1 >> 1
        if (val & 0x1) == 1:
            val |= 0b1000_0000_0000_0000
        val = val & 0xffff
        self.set_value_flags(val)
        self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_popd_ea(self, opcode: int, reg_id: int):
        self.dec_register(reg_id)
        hi = self.peek_byte(self.get_register(reg_id))
        self.dec_register(reg_id)
        lo = self.peek_byte(self.get_register(reg_id))
        val = (hi << 8) + lo
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_cpr(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 & 0xFFFF) - (v2 & 0xFFFF)
        self.set_value_flags(val)
        self.set_borrow(v1, v2)
        val = val & 0xFFFF
        self.set_register(self.COMP, val)

    def exec_inc(self, opcode: int, reg_id: int):
        self.inc_register(reg_id)
        val = self.get_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_dec(self, opcode: int, reg_id: int):
        self.dec_register(reg_id)
        val = self.get_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def decode_non_register_instr(self, opcode: int):
        if opcode == 0x00:
            self.exec_halt()

    def exec_halt(self):
        self.halt_flag = True

    # Misc Methods
    def dump(self):
        print('\n\nSweet16-GP CPU')
        print('---------------------------------------------------------')
        self.dump_status()
        self.dump_registers()
        self.dump_memory()

    def dump_status(self):
        result = ''
        status = self.REGFILE[self.STATUS]
        # print('Status 0b{:016b}'.format(status))
        if self.bit_test(status, status_bits.C):
            result += 'C, '
        if self.bit_test(status, status_bits.Z):
            result += 'Z, '
        if self.bit_test(status, status_bits.V):
            result += 'V, '
        if self.bit_test(status, status_bits.N):
            result += 'N, '
        result = result[:-2]
        print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
        print('---------------------------------------------------------')
        print(str(result) + ' \n')

    def dump_registers(self):
        idx = 0
        col = 0
        max_cols = 4
        print('Registers')
        print('---------------------------------------------------------')
        for r in self.REGFILE:
            if idx == 0:
                print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 4:
                print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 5:
                print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 6:
                print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 7:
                print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            else:
                # General purpose register
                print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            idx += 1
            col += 1
            if col >= max_cols:
                col = 0
                print()
        print()

    def dump_memory(self, max_bytes=256):
        col = 0
        row = 0

        max_cols = 16
        max_rows = int(max_bytes / max_cols)
        idx = 0
        print('\nAddr   Data                                  MEMORY DUMP')
        print('------------------------------------------------------------------------------------------------------')
        for b in self.MEM:
            if col == 0:
                print('0x{:04x}'.format(idx), end=':  ')
                print('0x{:02x}'.format(b), end=', ')
            else:
                print('0x{:02x}'.format(b), end=', ')
            col += 1
            if col >= max_cols:
                row += 1
                col = 0
                print()
            if row >= max_rows:
                print()
                break;
            idx += 1
        print()

    # CPU control routines
    def run(self):
        while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
            self.cur_instr = self.peek_byte(self.get_register(self.PC))
            print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
            self.decode_inst(self.cur_instr)
            sleep(1)
            self.inc_register(self.PC)
            

def main():
    cpu = Sweet16GP()
    cpu.set_register(0, 0xffae)
    cpu.poke_byte(0, 0x08)
    cpu.poke_byte(1, 0x05)
    cpu.poke_byte(2, 0xff)
    cpu.run()
    cpu.dump()

if __name__ == '__main__':
    # This method is for simple testing
    # and demonstration purposes.
    main()

Sweet16 CPU Emulator

Part 2

Last time I presented the Sweet16-GP and told you I would present an emulated version of the processor. That’s what I intend to do, so let’s get started.

First, what is a software CPU emulator. A software emulator is software that runs on a computer (called the host) and enables that computer to behave like another computer (called the guest). An emulator typically enables the host system to run software or use peripheral devices designed for the guest system. In our case, our emulator will enable our system to run software written for the Sweet16-GP on any computer capable of running python 3.x. In the near future I may rewrite this emulator in PHP and host it so that students it in my workshops can use it online. But for now, I’ll present the python 3 version. 

So why would we want an emulator? Why not just build the hardware? Having an emulator will give us a reference design for the hardware. It will allow us to work out issues that might arise in our instruction set and our cpu design concept. Additionally, it will allow software to be developed while the hardware is being developed. This means that we can have some working software to test the hardware on it’s first boot up.

So what do we need to create an emulator? Well, we need the same things a real system will need. First, we need a CPU. The CPU consists of the REGISTER FILE, ALU, and CONTROL LOGIC. In addition we need a place to store our code and data (memory). Memory is simple. We can use an array or list. The ALU consists mainly of logic routines, the register file is again just a list. Then we need a few helper functions.  Writing an emulator sounds like it would be really hard. However, once you see the code you realize it’s not difficult at all.

Let’s start by defining a CPU class for the Sweet16-GP.

class Sweet16GP:

def __init__(self):
pass

The first thing our CPU will do is boot. The initial boot up that occurs when power is first applied is called a cold boot. To preform the cold boot we’ll simply write a method. We don’t have this method yet so let’s add it and call it from the __init__() method. 

class Sweet16GP:

def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
pass

Ok, that’s pretty simple. Now our CPU doesn’t do anything yet and it’s still lacking important parts. First, we need some RAM and our Register File. So let’s give our little CPU 16 bytes of RAM and create our Register File to store our registers R0 – R7. We will implement both the register file and RAM as a simple python list. We’ll also include some constants to allow us to refer to the registers by name (ACC, RETSTACK, COMP, STATUS, and PC).

class Sweet16GP:
MEM = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
ACC = 0
RETSTACK = 4
COMP = 5
STATUS = 6
PC = 7

def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
pass

Now it’s starting to take shape. Recall from part 1 that our accumulator is register 0, the subroutine return stack pointer is register 4, the compare results register is register 5, the status register is register 6 and the program counter is register 7. We define constants within the class for these values so we can refer to these registers by name when it is desirable to do so.

So now we have 16 bytes of ram we can place instructions in, and our 8 registers. Now we need a way to fetch the instructions we place in memory. For this we’ll need a flag variable to indicate if the CPU is in a halt state and a variable to store the current instruction.

class Sweet16GP:
MEM = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
ACC = 0
RETSTACK = 4
COMP = 5
STATUS = 6
PC = 7
cur_instr = 0
halt_flag = False

def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
pass


The halt_flag will be used to halt processor execution and cur_instr will be used to store our current instruction.The PC will be updated to point to the next instruction/data location.

Our next task is to write some code for fetching instructions from memory. We’ll call this the run method as it’s job will be to walk through memory grabbing the instructions pointed to be the PC. 

# Run methods
def run(self):
while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
self.cur_instr = self.peek_byte(self.get_register(self.PC))
print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
self.decode_inst(self.cur_instr)
sleep(1)
self.inc_register(self.PC)

Now this method is making use of methods we don’t have but will soon need. I tend to code what I want and then implement it. Looking at the run method we can see we’ll need a few new methods. But before we talk about them let’s walk through what the run method is doing here.

Let’s see what we have so far:

First we enter a while loop and test if the halt flag is set and if the PC points to a valid memory location. Next, I included a print statement to display the value of the program counter and the current instruction. Then we make a call to decode_inst() passing this method the current instruction. decode_inst will be used to decode our instruction. Remember that our hardware processor includes an instruction decoder. We’ll be implementing all the parts of our hardware processor as method in our CPU class.

class Sweet16GP:
    MEM = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
    ACC = 0
    RETSTACK = 4
    COMP = 5
    STATUS = 6
    PC = 7
    cur_instr = 0
    halt_flag = False

    def __init__(self):
        self.cold_boot()

    # booting routines
    def cold_boot(self, mem_size=256):
        pass

    # Register routines
    def self.get_register(self, reg):
        pass

    def inc_register(self, reg):
        pass

    # Memory access routines
    def peek_byte(self, addr):
        pass

    # Instruction decoding routines
    def decode_inst(self, instr):
        pass

    # CPU control routines
    def run(self):
        while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
            self.cur_instr = self.peek_byte(self.get_register(self.PC))
            print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
            self.decode_inst(self.cur_instr)
            sleep(1)
            self.inc_register(self.PC)

OK, at this point we can start to fill in a couple of these methods. But before we do there is an important issue we need to discuss. Real hardware has a finite number of bits used to store it’s contents. Python however, can store almost any value in our registers and memory. This isn’t good as it doesn’t represent a real CPU. We need a way to restrict the values stored in our memory and registers to only those values that can be stored in our hardware version of the CPU.

To accomplish this we will use python’s bitwise logical operators. If you’re not familiar with them, you may want to head on over to the python website and read up them: &, |, ~, <<, and >>. First, & is the bitwise logical AND operator. The pipe | character is the bitwise OR operator, and the tilde is the bitwise NOT operator. The double less-than is a bitwise shift left, and the double greater than is a bitwise shift right operator. If you don’t understand these completely and their typical uses, then read up on them.

We can use these operators to manipulate the values we store in our memory and register lists. If we & (and) the values we place in memory or registers with the values 0xff or 0xffff we can limit the values stored to 8 and 16 bits respectively. This is just what we need to restrict our values to a range we can store on our hardware. Let’s implement the get_register() method. This method simply needs to take in an integer value and store it in the proper location in our register file.

def get_register(self, id: int) -> int:
    return self.REGFILE[id] & 0xffff

Simple, isn’t it!

Now we’ve left a couple of things undone. Remember our cold_boot() method? Well it doesn’t do anything right now. In an real processor this would reset things like clocks and counters and put everything into a known state so we have a regular starting point. In our emulator we need it to do the same task. However, most systems also have what is called a warm boot. A warm boot occurs when you reset the system using the reset signal. This signal causes the CPU registers to be set into a known initial state. Just as the cold boot does. In real hardware there may be some differences. In our software version we will take a little bit of leeway here do something the real hardware won’t do. We will clear the ram (set it to all zeros values) in the cold boot and leave it untouched during a warm boot. This will help us in keeping track of what values in ram we’ve changed and make debugging easier. However, it’s important to document this deviation from true emulation of the real hardware which will not zero memory for us. So let’s write the code to initialize the memory to all zero values.

# Memory handling routines
def init_ram(self, mem_size=265):
    self.MEM = []
    for addr in range(mem_size):
        self.MEM.append(0x00)

We also need to fill in our peek_byte() method and while we’re at it, we should just complete our memory manipulation methods. We know we’re going to need to be able to look (peek) at memory addresses and see what value they contain. We will also need to be able to store (poke) values into an address location, and we should be able to do this with both 8 and 16 bit values. So that leaves us needing the following methods: peek_byte(), poke_byte(), peek_word(), and poke_word(). So our full memory manipulation repertoire now looks as follows: 

# Memory handling routines
def init_ram(self, mem_size=265):
    self.MEM = []
    for addr in range(mem_size):
        self.MEM.append(0x00)

def poke_byte(self, addr: int, val: int):
    self.MEM[addr] = val & 0xff

def peek_byte(self, addr: int) -> int:
    return self.MEM[addr] & 0xff

def poke_word(self, addr: int, val: int):
    self.poke_byte(addr, (val & 0xff))
    self.poke_byte(addr+1, (val & 0xff00) >> 8)

def peek_word(self, addr: int) -> int:
    lo = self.peek_byte(addr)
    hi = self.peek_byte(addr+1)
    val = ((hi << 8) + lo) & 0xffff
    return (hi << 8) + lo

For our boot process we will have __init__() call cold_boot() which will call our init_ram() method to clear memory. But we still need to implement a method to clear the register files. We will have cold_boot() call our warm_boot() method which will in turn call init_regfile() which will zero out all our registers. We could do this all in one method but it’s important to keep each step separate as when the hardware is built some aspects of this process may change, and we’ll want the structure in place to support those changes. I realize to some of you with a lot of programming experience you may be saying “Wait! We shouldn’t be trying to anticipate the future. We should just write the least amount of code to handle the current situation.” And in most cases this is true. However, experience has taught me that we want to call out each process our hardware may go through or down the road we may end up refactoring a lot of code. Also, it’s important from a teaching perspective to have each of these processes in place so those just learning can see each process represented. So let’s see what our boot code looks like now.

def __init__(self):
    self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
    self.init_ram(mem_size)
    self.warm_boot()

def warm_boot(self):
    self.init_regfile()

# Register handling routines
def init_regfile(self):
    for id in range(len(self.REGFILE)):
        self.REGFILE[id] = 0x00

Ok, while we’re at it, we are going to need a handful of register manipulation methods similar to those for memory manipulation. Recall our registers are 16 bits wide but some instructions work on byte values. We will need to be able to read and write our registers in both word and byte modes. In byte mode we need to be able to read/write both the high and low order bytes in the registers.  We also need to be able to increment and decrement registers. Finally, don’t forget we need to restrict any value going into a register to it’s byte or word size. Below is our complete set of register methods:

# Register handling routines
def init_regfile(self):
    for id in range(len(self.REGFILE)):
        self.REGFILE[id] = 0x00

def get_register(self, id: int) -> int:
    return self.REGFILE[id] & 0xffff

def get_register_lsb(self, id: int) -> int:
    return self.REGFILE[id] & 0xff

def get_register_msb(self, id: int) -> int:
    return (self.REGFILE[id] & 0xff00) >> 8

def set_register(self, id: int, val: int):
    self.REGFILE[id] = val & 0xffff

def set_register_lsb(self, id: int, val: int):
    hi = self.REGFILE[id] & 0xff00
    self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

def set_register_msb(self, id: int, val: int):
    lo = self.REGFILE[id] & 0xff
    self.REGFILE[id] = ((val & 0xff) << 8) | lo

def inc_register(self, id: int):
    val = self.get_register(id)
    val = (val + 1) & 0xffff
    self.set_register(id, val)

def dec_register(self, id: int):
    val = self.get_register(id)
    val = (val - 1) & 0xffff
    self.set_register(id, val)

OK, all through this process I’ve been keeping something from you. I’ve presented code as if I have been magically writing each method and simply know instinctively that it works! Well, most of these methods did indeed work the first time I wrote them, but not all of them. What I have been leaving out is the testing that was done during development of each method. I am a big fan of Test Driven Development (TDD, however it isn’t for every project).  While developing the emulator, I wrote tests even before writing my methods. This allows you to slow down and think about what you need and how you’re going to implement it. 

I use “unittest” to write unit tests for each method. I test the methods with tests that should succeed and tests that should fail. I try to work out edge cases where there may be issues. Like what happens if I multiply two 16 bit values and place the results into a register without limiting the value to an 8 or 16 bit range as needed? Unit tests are very important in developing code that can be proven to work. However, remember your code is proven only as far as you test it. If you don’t test for a particular input, you can’t prove your code won’t fail given that input. Being human, we will rarely test for all possible values of input. In many cases this would be impractical and in some cases impossible. But we should do our best, within reason, to test each method in every class and every function in our code. I bring this up here as we are going to get quite a bit of code built up here and it’s best to know that your methods work properly before adding more complexity to the project. Below I show a couple test methods. However, I’d like you to write your own tests to test each method we have so far and each method I present in the future. Writing tests is a good habit you should get into doing for all your code.

def test_warm_boot(self):
    from sweet16gp import Sweet16GP
    cpu = Sweet16GP()

    cpu.REGFILE[cpu.ACC] = 0xffff
    cpu.REGFILE[cpu.RETSTACK] = 0xffff
    cpu.REGFILE[cpu.COMP] = 0xffff
    cpu.REGFILE[cpu.STATUS] = 0xffff
    cpu.REGFILE[cpu.PC] = 0xffff

    cpu.warm_boot()
    self.assertEqual(0, cpu.REGFILE[cpu.ACC], 'Failed to clear ACC register (R0)')
    self.assertEqual(0, cpu.REGFILE[cpu.RETSTACK], 'Failed to clear RETSTACK register (R4)')
    self.assertEqual(0, cpu.REGFILE[cpu.COMP], 'Failed to clear COMP register (R5)')
    self.assertEqual(0, cpu.REGFILE[cpu.STATUS], 'Failed to clear STATUS register (R6)')
    self.assertEqual(0, cpu.REGFILE[cpu.PC], 'Failed to clear PC register (R7)')

def test_init_regfile(self):
    from sweet16gp import Sweet16GP
    cpu = Sweet16GP()
    self.assertEqual(8, len(cpu.REGFILE), 'Failed to initialize the register file')

def test_get_register(self):
    from sweet16gp import Sweet16GP
    cpu = Sweet16GP()
    cpu.REGFILE[cpu.ACC] = 0xfafc
    self.assertEqual(0xfafc, cpu.get_register(cpu.ACC), 'Failed to get register ACC (R0)')

The above code is only a short snippet from my tests, which contain multiple tests for each method written. Testing your code this way will make development easier and it can also help people get up to speed with your code by providing some sample calls.  However, you shouldn’t take the tests as absolute proof the code is without error. It’s only proven as far as it is tested! 

Also, note that when possible I try to limit the use of additional methods to setup the pre-test conditions. For example, I directly set the values of registers in the test_warm_boot() method and then call warm_boot() which should reset them all to zero. Then I test with assert methods whether the registers were indeed reset to zero. If I had used the set_register() method to place the pre-test values into the registers and a test failed, I wouldn’t know if it was the set_register() method or the warm_boot()/init_register() methods that failed. However, once I have a good set of tests in place to prove that set_register() works as expected, I can then use it in later tests. This also brings up the point that if you are new to a code base, you should be careful about how you use test code to interpret how to use the code base. In production code you wouldn’t want to directly set values in the registers, as I did in the warm_boot() method above. You would instead want to call the set_register() method to do that for you and get the added bonus of being guaranteed that the value will be limited to the acceptable range of values for that call.

 

Ok, enough of the sidebar on testing. Let’s get back to implementation. If we look over the instructions in part 1 we can see some instructions will require bit manipulation, so we’re going to need a set of method for performing a set of common bit twiddling tasks. For example, given a value we will need methods to set, clear, toggle, and test a particular bit in the value. Let’s see what these look like:

# Bit twiddling routines
def bit_set(self, val: int, bit: int) -> int:
    return val | (1 << bit)

def bit_clear(self, val: int, bit: int) -> int:
    return val & ~(1 << bit)

def bit_toggle(self, val: int, bit: int) -> int:
    return val ^ (1 << bit)

def bit_test(self, val: int, bit: int) -> int:
    return (val & (1 << bit)) >> bit

Go ahead and write some tests for the above routines. Make sure you know what they do and how to use them. These routines mirror the processes that occur in real hardware so it is important to have a good grasp of bit manipulations and how to use them.


These routines lead me to a closely related subject. That of status bits. Recall from part 1 that the CPU contains a STATUS register. This register contains bits that act as flags for certain predefined conditions. These flags almost always represent the state or value of the accumulator (ACC, R0) register. However, they may also represent the results of a CPR instruction. The CPR (Compare) instruction places it’s results in the COMP register (R5) and sets the flags in the Status register to represent the results. The flags bits consist of the N flag which is set if the result of an operation is negative, the V flag which is set if the result of an operation overflows the range of values that can be stored in the machine, the C flag which is set when a carry or borrow occurs in an arithmetic operation, and the Z or zero flag which indicates the previous operation resulted in a zero value.

These flags are used to control program flow via branch instructions. The BEQ instructions is usually preceded with a CPR instruction which compares two values by subtracting one from the other. If the result is zero, the two values were equal and the branch occurs. The overflow and carry flags are used in mathematical operations to test if a result is valid. The N flag can be used to indicate if one value is less or greater than another. 

We need to implement methods to set and test these bits in the STATUS register. First, we need to define in which bit of the STATUS register each flags resides. To do this I created a new file called status_bits.py. It simply holds the definition of each flag’s bit position.

C: int = 0
Z: int = 1
V: int = 6
N: int = 7

As you can see, the two flags reside at each end of the low order byte of the STATUS register. Once we import this file into our main CPU file we will have access to these values. We’ll need these in our STATUS flag routines which follow:

# STATUS Flags routines
def set_flags(self, flags: str):
    flags = flags.upper()
    if flags.__contains__('C'):
        self.REGFILE[self.STATUS] |= (1 << status_bits.C)
    if flags.__contains__('Z'):
        self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
    if flags.__contains__('V'):
        self.REGFILE[self.STATUS] |= (1 << status_bits.V)
    if flags.__contains__('N'):
        self.REGFILE[self.STATUS] |= (1 << status_bits.N)

def clear_flags(self, flags: str):
    flags = flags.upper()
    if flags.__contains__('C'):
        self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
    if flags.__contains__('Z'):
        self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
    if flags.__contains__('V'):
        self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
    if flags.__contains__('N'):
        self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

def test_flag(self, flag: str) -> int:
    flags = flag.upper()
    if flags.__contains__('C'):
        return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
    if flags.__contains__('Z'):
        return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
    if flags.__contains__('V'):
        return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
    if flags.__contains__('N'):
        return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

def set_value_flags(self, val: int):
    # Note we can't set the overflow
    # flag here as we need both input
    # and result values to computer
    # overflow.
    if val == 0:
        self.set_flags('Z')
    else:
        self.clear_flags('Z')

    if val > 0b0111_1111_1111_1111:
        self.set_flags('N')
    else:
        self.clear_flags('N')

    if (val & 0x10000) >> 17:
        self.set_flags('C')
    else:
        self.clear_flags('C')

def set_overflow_flags(self, v1: int, v2: int, result: int):
    if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
        # Both values have the sign bit set
        # So the result should have the sign
        # bit unset. If not, we have overflow
        if not (result & (1 << 15)) > 0:
            self.set_flags('V')
        else:
            self.clear_flags('V')
    elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
        # Both values have the sign bits unset
        # So, the result should have the sign
        # bit unset.
        if ((result & (1 << 15)) >> 15) == 1:
            self.set_flags('V')
        else:
            self.clear_flags('V')

def set_carry(self, v1, v2, result):
    # value must be unmasked or it won't include the 17th bit.
    if ((result & (1 << 16)) >> 16) == 1:
        self.set_flags('C')
    else:
        self.clear_flags('C')

def set_borrow(self, v1, v2):
    if v1 < v2:
        self.set_flags('C')
    else:
        self.clear_flags('C')

Again, write some tests for these routines. Read up on overflow condition detection and binary arithmetic and be sure you understand what is going on here.  Note that for overflow we can’t just do a test and set, as the overflow condition occurs in two’s complement arithmetic and requires that we know the state of the values previous to the operation performed in order to properly calculate the overflow.

Also, the carry flag can only be set using the raw (unmasked) result of an operation, as it depends on the 17th bit of the result being tested. If we use a mask value to limit our results to 16 bits then there will be no carry bit to test, so this method must be called before masking the results to 16 bits.

Now it would be nice to have some methods to allow us to peek into the processor and see its state. The debugger in your IDE is good for this, but having a nice dump of the registers, memory and flags would allow us to visually check the results of an operation without needing the debugger.  Let’s implement a few routines for this just to make our lives easier.

# Misc Methods
def dump(self):
    print('\n\nSweet16-GP CPU')
    print('---------------------------------------------------------')
    self.dump_status()
    self.dump_registers()
    self.dump_memory()

def dump_status(self):
    result = ''
    status = self.REGFILE[self.STATUS]
    #print('Status 0b{:016b}'.format(status))
    if self.bit_test(status,status_bits.C):
        result += 'C, '
    if self.bit_test(status, status_bits.Z):
        result += 'Z, '
    if self.bit_test(status, status_bits.V):
        result += 'V, '
    if self.bit_test(status, status_bits.N):
        result += 'N, '
    result = result[:-2]
    print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
    print('---------------------------------------------------------')
    print(str(result) +' \n')

def dump_registers(self):
    idx = 0
    col = 0
    max_cols = 4
    print('Registers')
    print('---------------------------------------------------------')
    for r in self.REGFILE:
        if idx == 0:
            print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
        elif idx == 4:
            print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
        elif idx == 5:
            print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)),end=',  ')
        elif idx == 6:
            print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
        elif idx == 7:
            print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
        else:
            # General purpose register
            print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
        idx += 1
        col += 1
        if col >= max_cols:
            col = 0
            print()
    print()

def dump_memory(self, max_bytes=256):
    col = 0
    row = 0

    max_cols = 16
    max_rows = int(max_bytes/max_cols)
    idx = 0
    print('\nAddr   Data                                  MEMORY DUMP')
    print('------------------------------------------------------------------------------------------------------')
    for b in self.MEM:
        if col == 0:
            print('0x{:04x}'.format(idx), end=':  ')
            print('0x{:02x}'.format(b), end=', ')
        else:
            print('0x{:02x}'.format(b), end=', ')
        col += 1
        if col >= max_cols:
            row += 1
            col = 0
            print()
        if row >= max_rows:
            print()
            break;
        idx += 1
    print()

That’s quite a bit of code for a post. So let’s see it all together.

"""
 Sweet16GP an implementation of an 8/16 bit CPU Emulator
 based on Steve Wozniak's SWEET16 virtual machine.
"""

from time import sleep, time
import status_bits


class Sweet16GP:
    MEM = []
    REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
    ACC = 0
    RETSTACK = 4
    COMP = 5
    STATUS = 6
    PC = 7
    cur_instr = 0
    halt_flag = False

    def __init__(self):
        self.cold_boot()

    # booting routines
    def cold_boot(self, mem_size=256):
        self.init_ram(mem_size)
        self.warm_boot()

    def warm_boot(self):
        self.init_regfile()

    # Memory handling routines
    def init_ram(self, mem_size=265):
        self.MEM = []
        for addr in range(mem_size):
            self.MEM.append(0x00)

    def poke_byte(self, addr: int, val: int):
        self.MEM[addr] = val & 0xff

    def peek_byte(self, addr: int) -> int:
        return self.MEM[addr] & 0xff

    def poke_word(self, addr: int, val: int):
        self.poke_byte(addr, (val & 0xff))
        self.poke_byte(addr+1, (val & 0xff00) >> 8)

    def peek_word(self, addr: int) -> int:
        lo = self.peek_byte(addr)
        hi = self.peek_byte(addr+1)
        val = ((hi << 8) + lo) & 0xffff
        return (hi << 8) + lo

    # Register handling routines
    def init_regfile(self):
        for id in range(len(self.REGFILE)):
            self.REGFILE[id] = 0x00

    def get_register(self, id: int) -> int:
        return self.REGFILE[id] & 0xffff

    def get_register_lsb(self, id: int) -> int:
        return self.REGFILE[id] & 0xff

    def get_register_msb(self, id: int) -> int:
        return (self.REGFILE[id] & 0xff00) >> 8

    def set_register(self, id: int, val: int):
        self.REGFILE[id] = val & 0xffff

    def set_register_lsb(self, id: int, val: int):
        hi = self.REGFILE[id] & 0xff00
        self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

    def set_register_msb(self, id: int, val: int):
        lo = self.REGFILE[id] & 0xff
        self.REGFILE[id] = ((val & 0xff) << 8) | lo

    def inc_register(self, id: int):
        val = self.get_register(id)
        val = (val + 1) & 0xffff
        self.set_register(id, val)

    def dec_register(self, id: int):
        val = self.get_register(id)
        val = (val - 1) & 0xffff
        self.set_register(id, val)

    # Memory handling routines
    def init_ram(self, mem_size=265):
        self.MEM = []
        for addr in range(mem_size):
            self.MEM.append(0x00)

    def poke_byte(self, addr: int, val: int):
        self.MEM[addr] = val & 0xff

    def peek_byte(self, addr: int) -> int:
        return self.MEM[addr] & 0xff

    def poke_word(self, addr: int, val: int):
        self.poke_byte(addr, (val & 0xff))
        self.poke_byte(addr + 1, (val & 0xff00) >> 8)

    def peek_word(self, addr: int) -> int:
        lo = self.peek_byte(addr)
        hi = self.peek_byte(addr + 1)
        val = ((hi << 8) + lo) & 0xffff
        return (hi << 8) + lo

    # Bit twiddling routines
    def bit_set(self, val: int, bit: int) -> int:
        return val | (1 << bit)

    def bit_clear(self, val: int, bit: int) -> int:
        return val & ~(1 << bit)

    def bit_toggle(self, val: int, bit: int) -> int:
        return val ^ (1 << bit)

    def bit_test(self, val: int, bit: int) -> int:
        return (val & (1 << bit)) >> bit

    # STATUS Flags routines
    def set_flags(self, flags: str):
        flags = flags.upper()
        if flags.__contains__('C'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.C)
        if flags.__contains__('Z'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
        if flags.__contains__('V'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.V)
        if flags.__contains__('N'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.N)

    def clear_flags(self, flags: str):
        flags = flags.upper()
        if flags.__contains__('C'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
        if flags.__contains__('Z'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
        if flags.__contains__('V'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
        if flags.__contains__('N'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

    def test_flag(self, flag: str) -> int:
        flags = flag.upper()
        if flags.__contains__('C'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
        if flags.__contains__('Z'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
        if flags.__contains__('V'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
        if flags.__contains__('N'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

    def set_value_flags(self, val: int):
        # Note we can't set the overflow
        # flag here as we need both input
        # and result values to computer
        # overflow.
        if val == 0:
            self.set_flags('Z')
        else:
            self.clear_flags('Z')

        if val > 0b0111_1111_1111_1111:
            self.set_flags('N')
        else:
            self.clear_flags('N')

        if (val & 0x10000) >> 17:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    def set_overflow_flags(self, v1: int, v2: int, result: int):
        if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
            # Both values have the sign bit set
            # So the result should have the sign
            # bit unset. If not, we have overflow
            if not (result & (1 << 15)) > 0:
                self.set_flags('V')
            else:
                self.clear_flags('V')
        elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
            # Both values have the sign bits unset
            # So, the result should have the sign
            # bit unset.
            if ((result & (1 << 15)) >> 15) == 1:
                self.set_flags('V')
            else:
                self.clear_flags('V')

    def set_carry(self, v1, v2, result):
        # value must be unmasked or it won't include the 17th bit.
        if ((result & (1 << 16)) >> 16) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    def set_borrow(self, v1, v2):
        if v1 < v2:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    # Instruction decoding routines
    def decode_inst(self, instr):
        pass

    # Misc Methods
    def dump(self):
        print('\n\nSweet16-GP CPU')
        print('---------------------------------------------------------')
        self.dump_status()
        self.dump_registers()
        self.dump_memory()

    def dump_status(self):
        result = ''
        status = self.REGFILE[self.STATUS]
        # print('Status 0b{:016b}'.format(status))
        if self.bit_test(status, status_bits.C):
            result += 'C, '
        if self.bit_test(status, status_bits.Z):
            result += 'Z, '
        if self.bit_test(status, status_bits.V):
            result += 'V, '
        if self.bit_test(status, status_bits.N):
            result += 'N, '
        result = result[:-2]
        print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
        print('---------------------------------------------------------')
        print(str(result) + ' \n')

    def dump_registers(self):
        idx = 0
        col = 0
        max_cols = 4
        print('Registers')
        print('---------------------------------------------------------')
        for r in self.REGFILE:
            if idx == 0:
                print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 4:
                print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 5:
                print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 6:
                print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 7:
                print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            else:
                # General purpose register
                print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            idx += 1
            col += 1
            if col >= max_cols:
                col = 0
                print()
        print()

    def dump_memory(self, max_bytes=256):
        col = 0
        row = 0

        max_cols = 16
        max_rows = int(max_bytes / max_cols)
        idx = 0
        print('\nAddr   Data                                  MEMORY DUMP')
        print('------------------------------------------------------------------------------------------------------')
        for b in self.MEM:
            if col == 0:
                print('0x{:04x}'.format(idx), end=':  ')
                print('0x{:02x}'.format(b), end=', ')
            else:
                print('0x{:02x}'.format(b), end=', ')
            col += 1
            if col >= max_cols:
                row += 1
                col = 0
                print()
            if row >= max_rows:
                print()
                break;
            idx += 1
        print()

    # CPU control routines
    def run(self):
        while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
            self.cur_instr = self.peek_byte(self.get_register(self.PC))
            print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
            self.decode_inst(self.cur_instr)
            sleep(1)
            self.inc_register(self.PC)
            

if __name__ == '__main__':
    cpu = Sweet16GP()
    cpu.set_register(0, 0xffae)
    cpu.poke_byte(0, 0xbc)
    cpu.poke_byte(1, 0xd7)
    cpu.poke_word(4, 0xabcd)
    cpu.dump()

Now test this code. You only need to write some code to set values in the registers and memory, then call the dump methods and visually inspect the results to ensure they are as expected. This will give us a nice visual for the results of a program run.

Ok, that’s a lot for today. We now have a large set of housekeeping methods. Next time we will revisit the run() method and learn how to decode our instructions .

Laser Communications With Arduino Developed MPA Workshop Series

by SysOps 0 Comments

I’ve been giving a series of workshops for several months now. These workshops have been focused on understanding the basics of the underlying computer hardware. But much of that material can get rather technical and so we needed some hardware projects that didn’t involve building logic gates from transistors or designing CPU hardware in Logisim or Digital. We needed a fun project to break up the more technical material we’ve been covering. So, I discussed some ideas with the host and we came up with a simple Laser Communications System using the Arduino Nano. The project is simple enough to complete in  an evening. Exciting enough to gain interest. After all, what geek wouldn’t want to play with a laser? So I spend a bit of time developing this simple project and thought I’d share it here.

Before we begin, we’ll need a few supplies.  Tools, components and a computing device capable of running the Arduino IDE. Any laptop, table, or desktop machine should do. Some larger cellphones may even work. Check and see if you can install the Arduino IDE on your phone first, if you are intending to use your phone.

Since this is a communications system, you’ll need two systems so you have a transmitter and receiver. Each system actually functions as both so two way communications is possible. But you still need something to communicate with. The parts list is for a single unit. So, if you are building this by yourself you’ll need to build two. So let’s see what we’ll need:

  • A Solder-less Breadboard

An Arduino board with a Usb connector and headers. We used the Nano

  • Assorted jumper wires.
  • A 2n2222P NPN Transistor. T0-92 case preferred.
  • Two 39K or 47K through-hole, axial-lead 1/4 watt resistors.
  • A DTOL 10 X Mini Laser Dot Diode Module Head WL Red 650nm, 5v, 5mW. Can be found on Amazon and eBay. Part number: DTOL-SHUS1517
  • A Small Signal Photo Transistor in T-1 5mm Case. Must be sensitive to lights at 650nm. I used some clear 400nm to 1000nm Photo transistors from Amazon. They worked well.
  • A 220 or 330 ohm 1/4 watt through-hole, axial-lead resistor.

So, first let’s look at the schematic for the Laser. Here we can see a simple NPN2N2222P transistor being used in switching mode. The base of the transistor is connected to the Arduino board’s Digital Pin 2 (D2) through a 39K ohm resistor. The resistor limits the current to the base and ensures proper operation of the transistor. The current applied to the base of the transistor is multiplied by the transistor and the multiple is passed through the collector – emitter circuit. A typical 2N2222P should have a gain (multiplying effect) or around 200 or more. So what ever current we place on the base will be multiplied by ~200 and that amount of current will pass through the collector – emitter circuit. This all actually over simplified but it gives a basic understanding of the circuit. By using the transistor we can control much more current than the Arduino board’s output pins are capa

Laser Control Circuit

ble of supplying. If we were to connect the Laser to the Arduino directly we could cause damage to the Arduino. Since it’s possible that the transistor could pass more current than our Laser diode is capable of using, we could also cause damage to the laser if we did not limit the current flow through the laser with R2 a 330 Ohm resistor. The value of this resistor may be as low as 100 ohms or as high as 400 ohms. The lower the rating the more current will flow and the brighter the laser. However, the risk of damaging the laser or the surrounding circuit also increases as this number is lowered. I recommend you stick with values between 180 ohms and 330 ohms.

 

Let’s see how this sub-circuit looks on the breadboard. Note that we don’t yet have R4 connected to pin D2 of the Nano. We need the Nano for power at this time but we don’t want to connect our control circuit until we know it works as expected.

Wire up the board as shown. I recommend you use the same color wires if possible. Always use RED for positive power and BLACK for 0-Volts or ground. Make sure you use the 5-Volt pin on the Arduino board and not the 3.3-volt pin.

Laser Control Circuit

When placing the Laser on the board the long leg (Anode) should be placed so it connects to R2 the 330 ohm resistor. The short leg (on the side with the flat spot on the rim) is called the Cathode. The cathode should connect to the collector of the transistor.

When placing the transistor the pins out are as shown for most 2N2222 transistors. If your’s has a part number that begins with a “P” then the collector and emitter may be switched. But most will be exactly as shown. This should put the flat side of the transistor facing you if you wire the board as I did.

Transistor Pin Out

Once you have the transistor placed and the board wired as show, double check everything. If there is someone around who can double check your work for you, that is always helpful. Be particularly cautious of the connections to the Arduino board.

Now, first we will be a manual test of our sub-circuit to ensure it works as expected. Apply a jumper wire from the open (non connected) end of R4 to ground. Now plug the Arduino board into a USB port on your computer. The Laser should be off at this point.

Next, move the jumper wire to make a connection from the open end of R4 to the positive rail (5-volts). Now the Laser should turn on. If this works as expected you’re ready to write some test code. If not, then you need to double check everything. Make sure you have +5 volts to the positive (RED rail of the breadboard and 0-volts on the blue or black rail). Also, check that your Laser is placed with the long led connected to R2 and the other end of R2 is connected to +5 volts. Next, make sure your Transistor is placed properly and try turning it around if the part number begins with a “P”.  Have someone else check the circuit for you. A new set of eyes often sees more than the tired set that’s been looking at the same thing for hours.

Above showing R4 jumper to GND. Laser should be off. Arduino must be powered.

Above showing R4 jumper to Vcc +5 Volts. Laser should be on. Arduino must be powered.

 

Test The Laser Driver

OK, now that we have the laser working manually, it’s time to write some code. If you don’t already have Arduino IDE installed, go on over to www.arduino.cc go to the Software menu and select download. Follow their instructions for installing the IDE and getting your Arduino board up and running with the Blink example provided in the IDE. Once you have that done, or if you already have your IDE installed, move on to the next step.

Our first step will be to connect the open end of R4 to the D2 pin on the Arduino board. Next, open a new sketch in the Arduino IDE and enter the following code exactly as shown below:

Listing #1


/*
   LASER Blink
   BY: Randall Morgan

   Turns the LASER on for one second, then off for one second, repeatedly.

   We use D2 to drive the base of the switching transistor.
   D3 will be used later to read the Phototransistor.
   
   This example code is in the public domain.
   www.randallmorgan.me
   www.eastern-montana-tech.com

*/

int LASER_OUT = 2; // Laser control pin
int DELAY_MS = 1000; // 2000 ms (2 Seconds) cycle period

// the setup function runs once when you press reset or power the board
void setup() {
   // initialize digital pin 2 as an output.
   pinMode(LASER_OUT, OUTPUT);
}

// the loop function runs over and over again forever
void loop() {
   // Loop turning LASER on and off
   digitalWrite(LASER_OUT, HIGH); // turn the LED on (HIGH is the voltage level)
   delay(DELAY_MS); // wait for a second
   digitalWrite(LASER_OUT, LOW); // turn the LED off by making the voltage LOW
   delay(DELAY_MS); // wait for a second
}

Load and run this code by pressing the right facing triangle in the IDE. You should see the Laser Diode turning on and off. This should continue until you kill the power to the board.

 

Receiving Light Signals

OK, so now we have a way to send light pulses. What we need now is a way to receive light pulses. To do this we will use the Photo-transistor.  This is a transistor who’s base is exposed as a light sensitive element under the plastic package. Actually, all transistors are sensitive to light but a Photo-transistor is manufactured to take advantage of this phenomenon where other types of transistors are designed to minimize this phenomenon. Photo-transistors aren’t the only light sensitive devices we could use. Photo resistors are slower but work very well for sensing ambient light. Photo diodes work almost like Photo-transistors and can switch very fast for high speed communications.

In our project we chose the photo transistor as they are easy to work with and readily available. There is a three legged photo-transistor. The third leg is an exposed base. Basically, you can use the exposed base to adjust the sensitivity of the device. That’s a real nice feature but it complicates the design. So we chose not to use this type of photo transistor. If you purchased one of these it may work fine with the base left floating (unconnected). Sp give it a try before going out and purchasing the two legged type.

Complete Schematic

Looking at the schematic we can see that the photo transistor has it’s emitter tied to ground. The collector is connected to Vcc +5 volts through a 39K ohm resistor. This arrangement is known as a PULL-UP configuration. When the photo transistor is in the dark (or at least not receiving light on it’s base element), it is turned off and no current can flow through the transistor. This allows R1 to pull the input pin D3 on the Arduino to 5 volts. This creates a logic hi on the input. When the transistor receives light it turns on and pulls the collector down to near ground. This applies a logic lo or zero on pin D3. In this way we can sense the incoming light pulses.   Please ignore the dashed line. Fritzing kept replacing it after I removed or rerouted it. Very frustrating…

Photo Transistor Circuit Added

 

Now lets write some code to test this circuit. If you’re wondering why I’m not doing a manual test first, it’s because the circuit is very simple and about the only two things that may go wrong are that perhaps you put the photo transistor in backwards or you connect it to the wrong pin on the Arduino. If you have a multi-meter you can manually test the circuit by measuring the voltage on the collector of the photo-transistor. Cut a short piece of soda straw and place it over the photo-transistor. Then place the Laser Diode in the other end. SO that the laser shines on the end of the photo transistor. The voltage should drop on the collector to somewhere around 0.1-0.2 volts of less. If it does, you’re good to go. If it doesn’t, try turning the transistor around.

We’re going to write a little test code before we go all out and start sending messages to Mars and beyond. We use a little program to flash the laser on and off, one second in each state. During the middle of the state we’ll read the photo transistor on the digital input pin 3.

Now we need a way to communicate back to user just what the value read from the input is. We could setup an led and turn it on if the input is one or, we could just use the serial port to send the value back to the serial monitor in the Arduino IDE. I think this second approach is best and requires no additional hardware. So let’s see what this looks like:



/*
  LightReader
  By: Randall Morgan

  This test circuit simply toggles the laser connected
  to digital pin D3 on and off at a rate of 1/2 Hertz.
  After toggling the laser's state it then reads the
  input from the photo transistor on digital pin 2.

  To run this program you'll need an Arduino board
  with a laser diode connected via an npn switching
  transistor to digital pin 3.

  In addition you'll need a photo transistor connected
  to a resistor in a pull-down arrangment and the
  junction of the pull-up resistor and the photo
  transistor collector connected to digital pin 2 on
  the Arduino board.

  See the article at: http://www.randallmorgan.me/blog/
  Search for Arduino Laser Communications

  This code is placed in the public domain
*/

int LASER_OUT = 5; // Laser control pin
int PHOTOTRANS_IN = 3; // Photo transitor sense pin
int DELAY_MS = 1000; // 2000 ms (2 Seconds) cycle period
int val = 0; // input value
char buf[100];

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode(LASER_OUT, OUTPUT);
  pinMode(PHOTOTRANS_IN, INPUT);

  // Set up serial port
  Serial.begin(9600);
  // Wait for hardware serial to connect
  while(!Serial) {
    // wiating for connection....
  }
  Serial.print("Connected\n");
}

// the loop function runs over and over again forever
void loop() {
  // Turn on laser and read photo transistor
  digitalWrite(LASER_OUT, HIGH); // turn the LED on (HIGH is the voltage level)
  delay(DELAY_MS/2); // wait for a second
  val = digitalRead(PHOTOTRANS_IN); // read the photo transistor
  sprintf(buf, " Val: %d \n", val);
  Serial.print(buf);
  delay(DELAY_MS/2);

  // turn off laser and read photo transistor
  digitalWrite(LASER_OUT, LOW); // turn the LED off by making the voltage LOW
  delay(DELAY_MS/2);
  val = digitalRead(PHOTOTRANS_IN);
  sprintf(buf, " Val: %d \n", val);
  Serial.print(buf);
  delay(DELAY_MS/2);
}


Reviewing the code we can see two interesting additions. First, we call Serial.begin() in the setup() function and pass it the baud rate we want to use. Since the default baud rate for the serial monitor in the IDE is 9600, we use that. Next we wait for the serial library to get a connection to the host computer. The while loop performs this wait for us and exits when Serial returns true. This occurs when the serial library has a connection to the host.

In the function loop() we added calls to SerialWrite() and pass in a buffer containing a string of characters to send to the host. We cheated a little and initialized a character array of 100 characters. Ideally, you’d want to safe guard against buffer overflow here. If you don’t know what that is, do a little googling and read up on it.

Before we can send the value read from the digital input pin to the host we must convert it to a string representation. What we read is actually a number not a character. So, we pass the buffer and the value to the C function sprintf() for formatting and conversion. You can find good information on many C and C++ functions and libraries that can be used with Arduino googling the “Standard C/C++ libraries”.

OK, so now we can control our laser and sense when a laser is on or off when pointed at out photo transistor. The only piece we are lacking now is the code to allow us to communicate with another unit. To accomplish this, we only need to setup a software serial port. Software serial ports work similar to hardware serial ports except they are just routines running to read and send bits on the I/O pins.

The serial library requires the use of interrupts. An interrupt is a signal to the processor to stop what it is doing and service another routine. Since the SoftwareSerial routine needs to be called regualrly so it doesn’t miss a bit, it requires inputs that are interrupt capable. Both D2 and D3 are interrupt capable pins but they are not the only ones on the Arduino. So if you used other pins, you’ll need to move your control signals to pins with interrupt support. The SoftwareSerial library is shipped with Arduino IDE so there is no need to download and install another library.

So, let’s talk about what we need to accomplish with our code. First, we need to send data on the laser. We can do this by setting the laser control pin as the software serial libraries transmit pin. In order to receive data we simple need to set the photo transistor’s sense pin as the software serial libraries receive pin. Then we simple use the hardware serial port to send anything received on the software serial port to the host. Anything we receive from the host on the hardware port can be sent on by the software serial port. So let’s see how we can implement this:


/*
   LightReader
   By: Randall Morgan

   This test circuit simply toggles the laser connected
   to digital pin D3 on and off at a rate of 1/2 Hertz. 
   After toggling the laser's state it then reads the 
   input from the photo transistor on digital pin 2.

   To run this program you'll need an Arduino board
   with a laser  diode connected via an npn switching 
   transistor to digital pin 3.

   In addition you'll need a photo transistor connected 
   to a resistor in a pull-down arrangment and the 
   junction of the pull-up resistor and the photo 
   transistor collector connected to digital pin 2 on
   the Arduino board.

   See the article at: http://www.randallmorgan.me/blog/
   Search for Arduino Laser Communications 

   This code is placed in the public domain
*/

// Tell the IDE we want to use 
// the SoftwareSerial library
#include 

int LASER_OUT = 2;      // Laser control pin
int PHOTOTRANS_IN = 3;  // Photo transitor sense pin
int DELAY_MS = 1000;    // 2000 ms (2 Seconds) cycle period
int val = 0;            // input value
char buf[100];


SoftwareSerial laserSerial(PHOTOTRANS_IN, LASER_OUT);

// the setup function runs once when you press reset or power the board
void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode(LASER_OUT, OUTPUT);
  pinMode(PHOTOTRANS_IN, INPUT);  

  // Set up serial port
  Serial.begin(9600);
  // Wait for hardware serial to connect
  while(!Serial) {
    // wiating for connection....
  }

  Serial.print("Connected\n");

  // Setup Software Serial
  laserSerial.begin(57600);

  // Now we start sending data to 
  // allow the users to line up the
  // lasers and photo transistors.
  // Once the user hits a key, we 
  // stop and begin transceiving 
  // data on the serial ports.
  while(!Serial.available()){ 
    laserSerial.write("Light is beautiful!...");  
  }
  
}


// the loop function runs over and over again forever
void loop() {
  // If the photo transistor has received 
  // and data, send it to the host
  if(laserSerial.available()) {
    Serial.write(laserSerial.read());
  }

  if(Serial.available()) {
    laserSerial.write(Serial.read());
  }
}


Now enter the code above into the Arduino IDE and upload it to your Arduino board. The system should begin sending the phrase “Light is beautiful!…” immediately. Use this phrase data to line up your laser with the second unit’s photo transistor. Once the alignment is complete, hit a key on the keyboard and the transmission should stop. The system is now in transceiver mode. Anything you type into the serial monitor will be sent to the second unit. Anything the second unit sends will be sent to the serial monitor for display.

Congratulations, you now have completed the project. But this is only the beginning. There are many improvements to both the code and the circuit that can be made. Take the circuit presented here and make changes to it. Share those changes with others and let’s see what you come up with.

Until next time, Happy Hacking!

Designing The MiniMips – Part 4 Computer Architecture and Data Paths

by SysOps 0 Comments

Ok, last time we completed the basic design of our ALU. I hope you took the time to build it in Logisim and play with it. It should be trivial to comfirm it works by doing simple operations on paper or a programmer’s calculator and the ALU and comparing the results. Once you have confirmed that your ALU works correctly,  it is time to save it as a sub-circuit in Logisim. We’ll take a short step back to discuss the basic operation of a CPU and support systems and then we’ll answer to question of where our ALU input values and control signals come from and where the output goes.

Computer Architecture

Computers all follow a similar architecture. At a minimum they have Input, Processing, Storage, and Output. These components can take various forms. For example, input could come from a keyboard as with the typical personal computer, or it may come from some type of sensor such as the oxigen sensor in your car. Output could be to a display screen such as the LED monitor or a relay used to control a well pump. Memory can take the form of registers, magnetic disks, solid state disks, integrated circuits, or even paper punch cards. Processing can be either analog, digital, or even quantum devices.  However, the typical computer most people think of is the PC or typical desktop or laptop machine.

Computers systems can be categorized into many categories. Here we will place them into two broad categories based on the system architecture. Systems using seperate stores for instructions and data are said to be of a Hardvard Architecture. These machine read cpu instructions from ROM (Read Only Memory) and execute them. All data is stored in the system RAM (Random Access Memory). A typical microcontroller such as the Atmel AtMega324 is an example of this architecture. Other machines store cpuinstructions and data in the same memory. This allows for programs to become dynamic, being altered or created by the running program. This type of achitecture is known as Von Nueman Architecture. The Von Nueman Architecture is a powerful because program instructions are treated just like data. It does have it’s down sides however. One such short coming is the issue of security. Since any running program can access the program instructions, it can alter not only it’s own code but the code of other programs. So a rouge program could for instance alter the program for calculating an aircraft’s altitude. This could result in personal injury or death to the passengers. Because of this short-coming a lot of work has gone into developing ways to limit programs from having complete access to all programs stored in memory. Modern Von Nueman machnes contain a large amount of circuitry to manage program access to memory so such things wont occur.

Here we will develop a system using the Hardvardarchitecture and use a seperate instruction and data memories. This will allow us to reduce a few complexities and focus on other issues. In the future I may write another article on developing a Von Nueman machine.

CPU Architecture and Data Paths

Ok, let’s talk a bit about how a digital computer system works.  The first thing that occurs in most systems when power is applied is that the major components are set to a predetermined state. This reset also occurs when a “Reset” button is pushed. If you’ve ever had to take a paper click and hold in a tiny button on your router, then you’ve used a reset button. After the system has been reset, the cpu begins to step through it’s cycles. The first cycle is the ‘Fetch’ cycle. In this cycle the CPU fetches the next instruction from the program memory. Next, the instruction decoder decodes the instruction into the appropriate control signals for the various cpu components. The next step is to ‘Execute’ the instruction. The last step is to ‘Store’ the results in memory if required by the instruction. Most cpu instructions follow this pattern. However, this cycle can be shortened or have wait states added to it for complex instructions. In simple processesors some of these instructions are combined. Usually the Fetch and decode cycles are made to occur in the same cycle. In a single cycle cpu are these steps occur in one clock cycle. This is often accomplished by multiplying the clock input to generate the needed signals. However, in some architectures such as the one we will build, these steps can occur in a single cycle without the need for clock multiplication.

Below is the logisim circuit for the SimpleComputer.  This is the basic machine we will be building.  I’ve included some 7-Segment Hex digit displays to allow use to view the the CPU operation. I also included four ports on the ALU that allow us to connect some of those displays directly to the registers in the registerfile. While the circuit looks pretty basic here, it is only because I have used logisim’s ability to can a subcircuit as a component. So in this view all the major components are actually subcircuits that vary in complexity. They are all made out of logic gates of logisim standard components. If you have not used the Subcircuit feature in logisim I recommend you break out the tutorials and learn to use them. OK, let’s describe the circuit from a 10,000 foot level. We’ll being seeing each componet in great detail as we move forward.

The machine must first be reset using the reset button in the lower left of the circuit. Then we can single step through programs by clicking the clock input. We can run programs from the simulation menu in logisim. Once reset, the PC (Program Counter) will be set to 0x00 hexidecimal (hex). This will cause the first program instruction to be placed on the data output of the ROM (Read Only Memory). The ROM instruction code is broken into various parts that control certain aspects of the CPU. For eample, bits 0-3 of the instruction contain the ALU OP code. These four bits are used directly to select the chosen ALU operation. For shift operations bits 4-7 select the shift distance. For example if bits 4 – 7 contain the value 0010b in binary, then the value on the ALU’s A input will be shifted by two bit positions. Many bits in the instruction perform multiple duties. For eaxample, although bits 0-3 and 4-7 are used for the ALUOP code and the SHIFTAMT they also do double duty as the lower order byte of an immediate address in an instruction that contains an address value.

Once the instruction code is broken into it’s constituent parts, the various signals are reouted to other parts of the circuit as needed. A portion of the instruction code defines the type of instruction being executed. This allows the proper signals to be generated to control the parts of the circuit that are not directly controled by the bits in the instruction. Next the if the instruction is a register type instruction, the two source registers and the destination register are selected. The values in those registers and posibly an immediate value are feed into the ALU A and B inputs. The ALU then produces the out for all the operations we provided hardware for. This is done in parallele so producing all output at once isn’t an issue. The ALUOP code selects the desired function results and places them on the ALU output. Depending on the instruction the ALU ouput could be routed back to the destination register selected in the RegFile or it may be stored in RAM or used as a RAM address. It could also be used to set condition flags the determine if a branch will be taken. Lastly, it can be used as the input to the PC though that portion of the circuit is not shown here.

Look over the circuit and trace through the signal paths. See if you can figure out how the various parts work to control the circuit. Next time we’ll talk about the Instruction Set and continue on about the architecture. Until then, I hope you all have a happy new year!

 

 

Newsletter Powered By : XYZScripts.com