Header Image - Randall Morgan

Tag Archives

4 Articles

Sweet16-GP Assembler – Part 2

by SysOps 0 Comments
This entry is part 7 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left our assembler with only two functions. The strip_lines() function which removes comments and blank lines and the rename_registers() function which replaces the register’s friendly names with the cononical names i.e. Rx where x is a number between 0 and 7 (the register’s index value into the register file).

With these to pre-processing functions out of the way we are ready to start processing our assembly code.  We will use a two-pass method of assembly. The first pass will break each line into its constituent parts and then locate the instruction’s addressing mode, stores it in a field called mode, and calculate the instruction’s actual opcode value and length. In addition it will store a labels in a symbol table to record their value for later use in the second pass or the assembler.

Our main assembler method will take our clean lines, one at a time, and parse them as described above. We will need a table of instructions that includes details about each instruct including it’s addressing mode(s), opcode(s) and length. You may be wondering why the “(s)” above. This is because some of our instructions have multiple addressing modes and therefore multipple opcodes and perhaps even multiple instruction lengths. We will use our table to disambiguate the instruction format into the proper values. 

We will implement our opcode_table as a dict. Each entry will be  indexed (key) by the instruction mnemonic.  The value for each key in the table will be a dict indexed by the addressing mode. Each addressing mode will then have a list of values containing the instruction’s opcode. If the opcode requires a one or two by value, they will be provided as VALUE_L and VALUE_H in the list. This description sounds much more complex than it really is. Refere to the code for our table below and re-read the above paragraph. It should all become clear.

opcode_table = {
    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

As you can see, our instructions can fall into one of four addressing modes. These include implicit (when the instruction itself indicates the addressing mode),  immediate (when the value needed by the instruction directly follows it), indirect (when the register specified contains the value needed to complete the instructions, and register (when the instruction operates directly on the register specified). To tell the truth, these could be further broken down but that would only complicate things.

OK, now we have our opcode_table we are going to start writing our parser. Our parser is a little bit unorthodox. Since our input language is so constrained we can forego all the formal theory and much of the discipline or parser development. 

I do believe however, that any developer worth their weight should write at least a simple parser at some point. Understanding even a very simple parser expands your understanding of how your tools work whether you’re a web developer, AI master, ETL programmer, or Embedded Systems developer. If you call yourself a Software Engineer or Programmer and you have written at least a simple parser. compiler, or interpreter, then you owe it to yourself to do so! To get you started I recomend DR. Jack Crenshaw’s “Let’s Build A Compiler” and Ruslans’s “Let’s Build A Simple Interpreter” blog serries. If you’re a seasoned developer checkout the book “Language Implementation Patterns”. For those mathematically oriented, checkout the Dragon Book

So what I am going to do is simply plow forward. We will tackle one issue at a time and add functions to handle each issue as it arises. Note that this technique wont work if you’re trying to write a high level language compiler or interpreter. There you will need all the formal methods and a have good design before writing code. What we are going for here is a quick and simple boot-strap assembler. We can write better tools later. For now, we just need something to get us up and running.

OK, the first thing we are going to need is some set of methods to break up the line of source code we feed to our assembler into it’s various fields. We will create an entry point function called assemble which will do our heavy lifting by calling other functions.

We will need to produce some data about the current source line. This will include any label, register, addressing mode, and value. Additionally, we may also want to keep track of the original instruction mnemonic and any intermediate code we produce. So we will need a supporting function to handle this. We will also need some storage for any lables or symbols we encounter during parsing.

The code below is our entry point. It will be responsible for handling our two apasses over our code and  will call the functions that do the heavy lifting.

def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later expansion
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode

Note the call to parse_lines() we will write this function next. But, let’s step back and think about what we need it to do…

Given a line of code like:

start:    SET   R1, 0xFA

We can see we need to break up this line into it’s smaller parts. Luckily, this is pretty easy to do. Also, since each line will follow a similar pattern for the fields, we can easily deduce what each field contains. 

# Exception used for errors
class AssemblyError(Exception):
    pass

def parse_lines(lines, symbols):
    """ Determine line number, mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))

Refering to the code above, we use the enumerate function to keep track of our line numbers. One drawback to doing this approach is that our comment and blank lines wont be counted. Something we could fix by tracking line numbers in the strip_lines() function. But for now, we will do it this way.

Enumerate returns to us an integer value for the line number (lineno) and the line of source code. Next, we use rpartition(“:”) on on the source line to break it into three parts. These parts are return in a tuple containing the part before the seperator (colon in our case, which ends a label declaration), the seperator, and the part after the seperator. If any part is not found (for example, if there is no colon in the source line) then that part returns an empty string value. So if our line does not contains a colon, we will get an empty string for the label and for the seperator followed by the initial line of code which we save in the statement variable.

Once we have gotten any label, we need to take our statement and further parse it into its various parts. If it turns out that our statement is empty, we need to return a default tuple of ‘None’ values otherwise, we make a call to parse_opcode() passing in our statement and our symbol table so that any symbols found in our source line can be handled. If this all goes horribly wrong, we throw an exception and print the error message. We defined a simple exception class above for this purpose.

Our next task is to implement a function to parse the line into opcodes. We called this function pasre_opcode() above. The implementation is shown below:

def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3  
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)

There is a lot going on here so let’s unpack it. First, we use the split() method to split the statement into two portions. The first parameter to split() gives the character to split on and the second parameter is used to limit the number of splits. What we are doing here is seperating the instruction mnemonic from the rest of the statement. Since some instructions (i.e.: HALT) only have the mnemonic we need to check the number of fields we get back. If it’s only one, then we have an instruction like HALT or RTS. 

However, if we have more than one field returned from the split we need to further decompose the line. We now have a mnemonic in fields[0] and the remainder of the line in fields[1]. We now know we may have something like Rx, @Rx, or Rx, 0xFA following the instruction mnemonic. So the next move is to tray and seperate the remaining portion of the statement on the comma seperator. If the comma is found, one of two cases will be returned i.e.: (register, value) or @register, value). These are stored in the extra variable. We then add the extra fields to the fields variable and assign mnemonic the value in fields[0] which contains the instruction mnemonic.

Next, using the value of ‘nofields’ which indicate what we should expect next, we sort out the remaining fields and store their values into their perspective variables. In each case, we call parse functions which perform a specific sub-task.

Our first sub-task is to locate any register in the fields. At this point we know if a register exists it should be in fields[1]. So we pass this value to the parse_register() function shown below:

def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register

The first thing we do is convert the register value to upper case. Then we check if the string contains ‘R’ and return the character following the ‘R’ which should be the register index value (a single digit between 0 and 7 inclusively). We then return that value to the caller.

It’s possible that we didn’t have a register. In this case, there may be a value in the fields[1] position. So we check if the call to parse_register() returned a value of an empty string. If the latter was returned, we try and parse a value using the function parse_value() passing in fields[1] as a parameter. This situation occurs with the branch instructions where the mnemonic is directly followed by the offset value for the jump.

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val

As you can see above the parse_value() function must handle many types of values. It first strips off any prefix to the value and tests if this matches any of the supported numerical base indicators. If a match is found the value is converted to an integer and then cast into a string and returned to the caller.

Next, we call parse_address_mode() passing in all the fields as we may need more than one to identify the addressing mode. The parse_address_mode() function can only be understood in the context of the opcode_table. So refer to the table as you gork this code:

def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        if fields[0] in directives:
            return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))

The parse_address_mode() function uses the data contained in the fields to sort out the addressing mode of the instruction and return it to the caller, parse_opcode() which then stores this value in the mode variable. 

Next we use the mnemonic value to get the instruction data from the opcode_table and then use the mode variable to get the proper opcode value and instruction format.

If all has gone well, we return a tuple containing the value, register, mode, mnemonic, variables and the object code taken from the opcode_table.

At this poiont our first pass is complete. We have all the data necessary to assemble our instruction code with the possible exception of any forward declared symbol. 

We have one last issue before we can try this out. Our opcode_table has method names in some of the fields. We need to implemnt these helper methods:

All of these methods are pretty strainght forward. They simply take the value from the source code and separe it out into high bytes and low bytes in the case of VALUE_H, VALUE_L, LABEL_H, and LABEL_L. The OFFSET function calculates the offset from the current position.

# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)


def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)
 
  
def LABEL_L(pc, value):
    print(f'LABEL_L PC: {pc}, value: {value}')
    return (value) & 0xff


def LABEL_H(pc, value):
    print(f'LABEL_H PC: {pc}, value: {value}')
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    print(f'OFFSET: {value}')
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))

PK, I think we’re ready to try this! Below is the final code. Note I made some adjustments to the code from the last installment so we could test what we’ve done so far. 

"""Sweet16-GP Assembler"""


# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    # print(f'VALUE_L value: {value}')
    #val = value_to_int(value) & 0xff
    #print(f'VALUE_L val: {val}')
    #return str(val & 0xff)
    pass


def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)
    return objcode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    print(assemble(lines))

If you run the above code you should get an output like the following:

 

sweet16-articles-code/articles/part-07/assembler_02.py
 [
  (2, 0, '65502', '0', 'direct', 'SET', ['0x08', , ]), 
  (3, 3, '', '', 'implicit', 'HALT', ['0x00']) 
 ]

As you can see we have the intermediate representation for two instructions. All the info we will need to assemble these instructions into actual machine code. Reading the tuples the first value (2) is the line number from the source. The second value (0) is the location counter at the start of the instruction. The next value (65502) is the value provided in the instruction, followed by the register index (0), the addressing mode (direct), and the mnemonic. The final fields holds the opcode information from the opcode_table.  Try this out on several instructions and see that the results make sense. Write some unit test for the various functions and for the assemble as it stands. 

This has been a long post. Next time we will tackle the second pass of our assembler. Until then, keep coding!

 

Sweet16 CPU Emulator

Part 4

I hope you played around with the emulator and tried to implement some of the CPU instructions yourself. As promised, in this post we will continue our work on the emulator. First, let me show you my implementation of decode_register_type_instr(). 

def decode_register_instr(self, opcode: int, reg_id: int):
    if opcode == 0x08:
        self.exec_set(opcode, reg_id)
    elif opcode == 0x10:
        self.exec_ld(opcode, reg_id)
    elif opcode == 0x18:
        self.exec_st(opcode, reg_id)
    elif opcode == 0x20:
        self.exec_ld_ea(opcode, reg_id)
    elif opcode == 0x28:
        self.exec_st_ea(opcode, reg_id)
    elif opcode == 0x30:
        self.exec_ldd_ea(opcode, reg_id)
    elif opcode == 0x38:
        self.exec_std_ea(opcode, reg_id)
    elif opcode == 0x40:
        self.exec_pop_ea(opcode, reg_id)
    elif opcode == 0x48:
        self.exec_stp_ea(opcode, reg_id)
    elif opcode == 0x50:
        self.exec_add(opcode, reg_id)
    elif opcode == 0x58:
        self.exec_sub(opcode, reg_id)
    elif opcode == 0x60:
        self.exec_mul(opcode, reg_id)
    elif opcode == 0x68:
        self.exec_div(opcode, reg_id)
    elif opcode == 0x70:
        self.exec_and(opcode, reg_id)
    elif opcode == 0x78:
        self.exec_or(opcode, reg_id)
    elif opcode == 0x80:
        self.exec_xor(opcode, reg_id)
    elif opcode == 0x88:
        self.exec_not(opcode, reg_id)
    elif opcode == 0x90:
        self.exec_shl(opcode, reg_id)
    elif opcode == 0x98:
        self.exec_shr(opcode, reg_id)
    elif opcode == 0xA0:
        self.exec_rol(opcode, reg_id)
    elif opcode == 0xA8:
        self.exec_ror(opcode, reg_id)
    elif opcode == 0xE0:
        self.exec_popd_ea(opcode, reg_id)
    elif opcode == 0xE8:
        self.exec_cpr(opcode, reg_id)
    elif opcode == 0xF0:
        self.exec_inc(opcode, reg_id)
    elif opcode == 0xF8:
        self.exec_dec(opcode, reg_id)
    else:
        print(f'Unknown OpCode code {opcode}')
        self.halt_flag = True

As you can see, this is nothing but a long set of “if”, “elif” statements checking for a particular opcode and then dispatching to the exec_xxx method. Finally, if we reach the “else” clause then we have an error. So we print a message and halt. Pretty simple right. The harder part is in the exec_xxx methods. But, only slightly harder. We spent some time building up a set of housekeeping methods that do all the heavy lifting for us. So we only need to call a subset of them passing the correct values to implement any of the exec_xxx methods. 

I’ll present the code for each of the register type instructions below with a short description of what it does. We’ve already seen the code for SET and LD so I’ll skip those.

ST is our next instruction that needs implemented. The description says:

 

The contents of ACC are stored in Rn and the branch conditions are set to reflect the new value of Rn. The carry flag is cleared, and the contents of ACC are left undisturbed.

So this just loads the value in ACC (R0) into Rn. Then it sets the flags to reflect the new value of Rn. Let’s see how we might do this:

def exec_st(self, opcode: int, reg_id: int):
    val = self.get_register(self.ACC)
    self.set_register(reg_id, val)
    self.set_value_flags(val)
    self.clear_flags('C')

Moving on, next we need to handle LD @Rn. Any instruction that uses the @ symbol is using indirect addressing. This means the register Rn is storing an address that we will either write new content into or read the content from. For example, let’s say that R3 contains the value 0x0A (10 in decimal) and we want to get the value located at address 0x0A. We could use LD @R3. R3 contains the memory address we want to get the value from. This shouldn’t be too difficult either:

def exec_ld_ea(self, opcode: int, reg_id: int):
    val = self.peek_byte(self.get_register(reg_id))
    self.inc_register(reg_id)
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

I’ve appended _ea to all instructions that use indirect or effective addressing modes. 

Our housekeeping methods make implementation quick and simple. Now on to ST @Rn. 

As you might have guessed, ST @Rn simply stores the low byte value in the ACC into the address pointed to by Rn and increments Rn. Here’s the official description.

The low order byte of ACC is stored into memory location whose address is resides in Rn. Branch conditions are set to reflect the two byte contents of ACC (R0). The carry flag is cleared and Rn is incremented by 1.

Here’s the code:

def exec_st_ea(self, opcode: int, reg_id: int):
    val = self.get_register_lsb(self.ACC)
    self.poke_byte(self.get_register(reg_id), val)
    self.inc_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

Most of the register instructions come in two forms, byte and word and are distinguished by the trailing “D” for the word sized version. For example LD is load a byte and LDD is load a word. Respectively, ST is store a byte and STD is store a word. 

Our nest set of instructions are just word versions of their byte sized versions we’ve already dealt with. Their main difference is the need to fetch two bytes and combine them together to form the correct word value. So we’ll just present the code and leave it to you to compare the two versions.

def exec_ldd_ea(self, opcode: int, reg_id: int):
    val = self.peek_word(self.get_register(reg_id))
    self.inc_register(reg_id)
    self.inc_register(reg_id)
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_std_ea(self, opcode: int, reg_id: int):
    val = self.get_register(self.ACC)
    self.poke_word(self.get_register(reg_id), val)
    self.inc_register(reg_id)
    self.inc_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

Our next instruction is the POP @Rn instruction. It’s description is:

Rn is decremented by 1. Then the low order byte of ACC (R0) is loaded from the memory locations whose address resides in (the decremented) Rn. The high order byte of ACC is cleared and the status bits are set to reflect the final value of ACC whose content will always be positive. The carry flag is cleared. Because Rn is decremented before prior to loading the ACC, single byte stacks can be implemented using ST @Rn and POP @Rn.

And here’s the code to implement it:

def exec_pop_ea(self, opcode: int, reg_id: int):
    self.dec_register(reg_id)
    val = self.peek_byte(self.get_register(reg_id))
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

STP is a bit unique in that it has a special purpose. It is used for moving memory. It’s a store instruction with indirect addressing followed by a decrement of the pointer value. Here’s the description:

Rn is decremented by 1 and the low order byte of ACC (R0) is stored in to the memory location that resides in Rn. Status bits are set to reflect the final 16 bit value of ACC. The contents of ACC are left undisturbed. STP @Rn and POP @Rn can be used together to move blocks of memory beginning with the greatest address and working downward. 

And the code:

def exec_stp_ea(self, opcode: int, reg_id: int):
    lo = self.get_register_lsb(self.ACC)
    self.dec_register(reg_id)
    self.poke_byte(self.get_register(reg_id), lo)
    self.set_value_flags(lo)
    self.clear_flags('C')

The rest of the register type instructions are pretty easy to figure out. So in the interest of saving time and space I’ll just show the code below. It’s important, however, that you read them and see how they are implemented. Truly, if you read just the first couple you’ll understand them all. So here is the code for the remaining register type instructions:

def exec_add(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 + v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    self.set_carry(v1, v2, isum)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_sub(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 - v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    self.set_borrow(v1, v2)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_mul(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 * v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_div(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    isum = v1 // v2
    self.set_value_flags(isum)
    self.set_overflow_flags(v1, v2, isum)
    val = isum & 0xffff
    self.set_register(self.ACC, val)

def exec_and(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 & v2) & 0xffff
    self.set_register(self.ACC, val)
    self.set_value_flags(val)

def exec_or(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 | v2) & 0xffff
    self.set_register(self.ACC, val)
    self.set_value_flags(val)

def exec_xor(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 ^ v2) & 0xffff
    self.set_register(self.ACC, val)
    self.set_value_flags(val)

def exec_not(self, opcode: int, reg_id: int):
    val = ~self.get_register(reg_id)
    self.set_register(reg_id, val)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_shl(self, opcode: int, reg_id: int):
    val = self.get_register(reg_id) << 1
    self.set_value_flags(val)
    if ((val & 0x10000) >> 16) == 1:
        self.set_flags('C')
    else:
        self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_shr(self, opcode: int, reg_id: int):
    v1 = self.get_register(reg_id)
    val = v1 >> 1
    self.set_value_flags(val)
    if (v1 & 0x1) == 1:
        self.set_flags('C')
    else:
        self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_rol(self, opcode: int, reg_id: int):
    val = self.get_register(reg_id) << 1
    if ((val & 0x10000) >> 16) == 1:
        val |= 0x1
    self.set_value_flags(val)
    self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_ror(self, opcode: int, reg_id: int):
    v1 = self.get_register(reg_id)
    val = v1 >> 1
    if (val & 0x1) == 1:
        val |= 0b1000_0000_0000_0000
    val = val & 0xffff
    self.set_value_flags(val)
    self.clear_flags('C')
    self.set_register(reg_id, val)

def exec_popd_ea(self, opcode: int, reg_id: int):
    self.dec_register(reg_id)
    hi = self.peek_byte(self.get_register(reg_id))
    self.dec_register(reg_id)
    lo = self.peek_byte(self.get_register(reg_id))
    val = (hi << 8) + lo
    self.set_register(self.ACC, val)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_cpr(self, opcode: int, reg_id: int):
    v1 = self.get_register(self.ACC)
    v2 = self.get_register(reg_id)
    val = (v1 & 0xFFFF) - (v2 & 0xFFFF)
    self.set_value_flags(val)
    self.set_borrow(v1, v2)
    val = val & 0xFFFF
    self.set_register(self.COMP, val)

def exec_inc(self, opcode: int, reg_id: int):
    self.inc_register(reg_id)
    val = self.get_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

def exec_dec(self, opcode: int, reg_id: int):
    self.dec_register(reg_id)
    val = self.get_register(reg_id)
    self.set_value_flags(val)
    self.clear_flags('C')

Ok, include this code into your skeleton and try some of the instructions out. Remember to write tests for each of them. Both succeeding and failing tests. 

I’ll end this post here today. Next time we’ll implement the remaining 6 or 7 instruction (which are all non register types) and gain a working emulator. 

For completeness, here’s the complete sweet16gp.py code for this post:

"""
 Sweet16GP an implementation of an 8/16 bit CPU Emulator
 based on Steve Wozniak's SWEET16 virtual machine.
"""

from time import sleep, time
import status_bits


class Sweet16GP:
    MEM = []
    REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
    ACC = 0
    RETSTACK = 4
    COMP = 5
    STATUS = 6
    PC = 7
    cur_instr = 0
    halt_flag = False

    def __init__(self):
        self.cold_boot()

    # booting routines
    def cold_boot(self, mem_size=256):
        self.init_ram(mem_size)
        self.warm_boot()

    def warm_boot(self):
        self.init_regfile()

    # Memory handling routines
    def init_ram(self, mem_size=265):
        self.MEM = []
        for addr in range(mem_size):
            self.MEM.append(0x00)

    def poke_byte(self, addr: int, val: int):
        self.MEM[addr] = val & 0xff

    def peek_byte(self, addr: int) -> int:
        return self.MEM[addr] & 0xff

    def poke_word(self, addr: int, val: int):
        self.poke_byte(addr, (val & 0xff))
        self.poke_byte(addr+1, (val & 0xff00) >> 8)

    def peek_word(self, addr: int) -> int:
        lo = self.peek_byte(addr)
        hi = self.peek_byte(addr+1)
        val = ((hi << 8) + lo) & 0xffff
        return (hi << 8) + lo

    # Register handling routines
    def init_regfile(self):
        for id in range(len(self.REGFILE)):
            self.REGFILE[id] = 0x00

    def get_register(self, id: int) -> int:
        return self.REGFILE[id] & 0xffff

    def get_register_lsb(self, id: int) -> int:
        return self.REGFILE[id] & 0xff

    def get_register_msb(self, id: int) -> int:
        return (self.REGFILE[id] & 0xff00) >> 8

    def set_register(self, id: int, val: int):
        self.REGFILE[id] = val & 0xffff

    def set_register_lsb(self, id: int, val: int):
        hi = self.REGFILE[id] & 0xff00
        self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

    def set_register_msb(self, id: int, val: int):
        lo = self.REGFILE[id] & 0xff
        self.REGFILE[id] = ((val & 0xff) << 8) | lo

    def inc_register(self, _id: int):
        val = self.get_register(_id)
        val = (val + 1) & 0xffff
        self.set_register(_id, val)

    def dec_register(self, id: int):
        val = self.get_register(id)
        val = (val - 1) & 0xffff
        self.set_register(id, val)

    # Memory handling routines
    def init_ram(self, mem_size=265):
        self.MEM = []
        for addr in range(mem_size):
            self.MEM.append(0x00)

    def poke_byte(self, addr: int, val: int):
        self.MEM[addr] = val & 0xff

    def peek_byte(self, addr: int) -> int:
        return self.MEM[addr] & 0xff

    def poke_word(self, addr: int, val: int):
        self.poke_byte(addr, (val & 0xff))
        self.poke_byte(addr + 1, (val & 0xff00) >> 8)

    def peek_word(self, addr: int) -> int:
        lo = self.peek_byte(addr)
        hi = self.peek_byte(addr + 1)
        val = ((hi << 8) + lo) & 0xffff
        return (hi << 8) + lo

    # Bit twiddling routines
    def bit_set(self, val: int, bit: int) -> int:
        return val | (1 << bit)

    def bit_clear(self, val: int, bit: int) -> int:
        return val & ~(1 << bit)

    def bit_toggle(self, val: int, bit: int) -> int:
        return val ^ (1 << bit)

    def bit_test(self, val: int, bit: int) -> int:
        return (val & (1 << bit)) >> bit

    # STATUS Flags routines
    def set_flags(self, flags: str):
        flags = flags.upper()
        if flags.__contains__('C'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.C)
        if flags.__contains__('Z'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
        if flags.__contains__('V'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.V)
        if flags.__contains__('N'):
            self.REGFILE[self.STATUS] |= (1 << status_bits.N)

    def clear_flags(self, flags: str):
        flags = flags.upper()
        if flags.__contains__('C'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
        if flags.__contains__('Z'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
        if flags.__contains__('V'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
        if flags.__contains__('N'):
            self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

    def test_flag(self, flag: str) -> int:
        flags = flag.upper()
        if flags.__contains__('C'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
        if flags.__contains__('Z'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
        if flags.__contains__('V'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
        if flags.__contains__('N'):
            return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

    def set_value_flags(self, val: int):
        # Note we can't set the overflow
        # flag here as we need both input
        # and result values to computer
        # overflow.
        if val == 0:
            self.set_flags('Z')
        else:
            self.clear_flags('Z')

        if val > 0b0111_1111_1111_1111:
            self.set_flags('N')
        else:
            self.clear_flags('N')

        if (val & 0x10000) >> 17:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    def set_overflow_flags(self, v1: int, v2: int, result: int):
        if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
            # Both values have the sign bit set
            # So the result should have the sign
            # bit unset. If not, we have overflow
            if not (result & (1 << 15)) > 0:
                self.set_flags('V')
            else:
                self.clear_flags('V')
        elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
            # Both values have the sign bits unset
            # So, the result should have the sign
            # bit unset.
            if ((result & (1 << 15)) >> 15) == 1:
                self.set_flags('V')
            else:
                self.clear_flags('V')

    def set_carry(self, v1, v2, result):
        # value must be unmasked or it won't include the 17th bit.
        if ((result & (1 << 16)) >> 16) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    def set_borrow(self, v1, v2):
        if v1 < v2:
            self.set_flags('C')
        else:
            self.clear_flags('C')

    # Instruction Decoding
    def decode_inst(self, instr: int):
        if instr > 7:
            opcode = instr & 0b11111000
            reg_id = instr & 0b00000111
            self.decode_register_instr(opcode, reg_id)
        else:
            self.decode_non_register_instr(instr)

    def decode_register_instr(self, opcode: int, reg_id: int):
        if opcode == 0x08:
            self.exec_set(opcode, reg_id)
        elif opcode == 0x10:
            self.exec_ld(opcode, reg_id)
        elif opcode == 0x18:
            self.exec_st(opcode, reg_id)
        elif opcode == 0x20:
            self.exec_ld_ea(opcode, reg_id)
        elif opcode == 0x28:
            self.exec_st_ea(opcode, reg_id)
        elif opcode == 0x30:
            self.exec_ldd_ea(opcode, reg_id)
        elif opcode == 0x38:
            self.exec_std_ea(opcode, reg_id)
        elif opcode == 0x40:
            self.exec_pop_ea(opcode, reg_id)
        elif opcode == 0x48:
            self.exec_stp_ea(opcode, reg_id)
        elif opcode == 0x50:
            self.exec_add(opcode, reg_id)
        elif opcode == 0x58:
            self.exec_sub(opcode, reg_id)
        elif opcode == 0x60:
            self.exec_mul(opcode, reg_id)
        elif opcode == 0x68:
            self.exec_div(opcode, reg_id)
        elif opcode == 0x70:
            self.exec_and(opcode, reg_id)
        elif opcode == 0x78:
            self.exec_or(opcode, reg_id)
        elif opcode == 0x80:
            self.exec_xor(opcode, reg_id)
        elif opcode == 0x88:
            self.exec_not(opcode, reg_id)
        elif opcode == 0x90:
            self.exec_shl(opcode, reg_id)
        elif opcode == 0x98:
            self.exec_shr(opcode, reg_id)
        elif opcode == 0xA0:
            self.exec_rol(opcode, reg_id)
        elif opcode == 0xA8:
            self.exec_ror(opcode, reg_id)
        elif opcode == 0xE0:
            self.exec_popd_ea(opcode, reg_id)
        elif opcode == 0xE8:
            self.exec_cpr(opcode, reg_id)
        elif opcode == 0xF0:
            self.exec_inc(opcode, reg_id)
        elif opcode == 0xF8:
            self.exec_dec(opcode, reg_id)
        else:
            print(f'Unknown OpCode code {opcode}')
            self.halt_flag = True

    def exec_set(self, opcode: int, reg_id: int):
        self.inc_register(self.PC)
        lo = self.peek_byte(self.get_register(self.PC))
        self.inc_register(self.PC)
        hi = self.peek_byte(self.get_register(self.PC))
        val = (hi << 8) + lo
        self.set_register(reg_id, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_ld(self, opcode: int, reg_id: int):
        val = self.get_register(reg_id)
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_st(self, opcode: int, reg_id: int):
        val = self.get_register(self.ACC)
        self.set_register(reg_id, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_ld_ea(self, opcode: int, reg_id: int):
        val = self.peek_byte(self.get_register(reg_id))
        self.inc_register(reg_id)
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_st_ea(self, opcode: int, reg_id: int):
        val = self.get_register_lsb(self.ACC)
        self.poke_byte(self.get_register(reg_id), val)
        self.inc_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_ldd_ea(self, opcode: int, reg_id: int):
        val = self.peek_word(self.get_register(reg_id))
        self.inc_register(reg_id)
        self.inc_register(reg_id)
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_std_ea(self, opcode: int, reg_id: int):
        val = self.get_register(self.ACC)
        self.poke_word(self.get_register(reg_id), val)
        self.inc_register(reg_id)
        self.inc_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_pop_ea(self, opcode: int, reg_id: int):
        self.dec_register(reg_id)
        val = self.peek_byte(self.get_register(reg_id))
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_stp_ea(self, opcode: int, reg_id: int):
        lo = self.get_register_lsb(self.ACC)
        self.dec_register(reg_id)
        self.poke_byte(self.get_register(reg_id), lo)
        self.set_value_flags(lo)
        self.clear_flags('C')

    def exec_add(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 + v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        self.set_carry(v1, v2, isum)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_sub(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 - v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        self.set_borrow(v1, v2)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_mul(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 * v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_div(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        isum = v1 // v2
        self.set_value_flags(isum)
        self.set_overflow_flags(v1, v2, isum)
        val = isum & 0xffff
        self.set_register(self.ACC, val)

    def exec_and(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 & v2) & 0xffff
        self.set_register(self.ACC, val)
        self.set_value_flags(val)

    def exec_or(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 | v2) & 0xffff
        self.set_register(self.ACC, val)
        self.set_value_flags(val)

    def exec_xor(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 ^ v2) & 0xffff
        self.set_register(self.ACC, val)
        self.set_value_flags(val)

    def exec_not(self, opcode: int, reg_id: int):
        val = ~self.get_register(reg_id)
        self.set_register(reg_id, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_shl(self, opcode: int, reg_id: int):
        val = self.get_register(reg_id) << 1
        self.set_value_flags(val)
        if ((val & 0x10000) >> 16) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_shr(self, opcode: int, reg_id: int):
        v1 = self.get_register(reg_id)
        val = v1 >> 1
        self.set_value_flags(val)
        if (v1 & 0x1) == 1:
            self.set_flags('C')
        else:
            self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_rol(self, opcode: int, reg_id: int):
        val = self.get_register(reg_id) << 1
        if ((val & 0x10000) >> 16) == 1:
            val |= 0x1
        self.set_value_flags(val)
        self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_ror(self, opcode: int, reg_id: int):
        v1 = self.get_register(reg_id)
        val = v1 >> 1
        if (val & 0x1) == 1:
            val |= 0b1000_0000_0000_0000
        val = val & 0xffff
        self.set_value_flags(val)
        self.clear_flags('C')
        self.set_register(reg_id, val)

    def exec_popd_ea(self, opcode: int, reg_id: int):
        self.dec_register(reg_id)
        hi = self.peek_byte(self.get_register(reg_id))
        self.dec_register(reg_id)
        lo = self.peek_byte(self.get_register(reg_id))
        val = (hi << 8) + lo
        self.set_register(self.ACC, val)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_cpr(self, opcode: int, reg_id: int):
        v1 = self.get_register(self.ACC)
        v2 = self.get_register(reg_id)
        val = (v1 & 0xFFFF) - (v2 & 0xFFFF)
        self.set_value_flags(val)
        self.set_borrow(v1, v2)
        val = val & 0xFFFF
        self.set_register(self.COMP, val)

    def exec_inc(self, opcode: int, reg_id: int):
        self.inc_register(reg_id)
        val = self.get_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def exec_dec(self, opcode: int, reg_id: int):
        self.dec_register(reg_id)
        val = self.get_register(reg_id)
        self.set_value_flags(val)
        self.clear_flags('C')

    def decode_non_register_instr(self, opcode: int):
        if opcode == 0x00:
            self.exec_halt()

    def exec_halt(self):
        self.halt_flag = True

    # Misc Methods
    def dump(self):
        print('\n\nSweet16-GP CPU')
        print('---------------------------------------------------------')
        self.dump_status()
        self.dump_registers()
        self.dump_memory()

    def dump_status(self):
        result = ''
        status = self.REGFILE[self.STATUS]
        # print('Status 0b{:016b}'.format(status))
        if self.bit_test(status, status_bits.C):
            result += 'C, '
        if self.bit_test(status, status_bits.Z):
            result += 'Z, '
        if self.bit_test(status, status_bits.V):
            result += 'V, '
        if self.bit_test(status, status_bits.N):
            result += 'N, '
        result = result[:-2]
        print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
        print('---------------------------------------------------------')
        print(str(result) + ' \n')

    def dump_registers(self):
        idx = 0
        col = 0
        max_cols = 4
        print('Registers')
        print('---------------------------------------------------------')
        for r in self.REGFILE:
            if idx == 0:
                print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 4:
                print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 5:
                print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 6:
                print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            elif idx == 7:
                print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            else:
                # General purpose register
                print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
            idx += 1
            col += 1
            if col >= max_cols:
                col = 0
                print()
        print()

    def dump_memory(self, max_bytes=256):
        col = 0
        row = 0

        max_cols = 16
        max_rows = int(max_bytes / max_cols)
        idx = 0
        print('\nAddr   Data                                  MEMORY DUMP')
        print('------------------------------------------------------------------------------------------------------')
        for b in self.MEM:
            if col == 0:
                print('0x{:04x}'.format(idx), end=':  ')
                print('0x{:02x}'.format(b), end=', ')
            else:
                print('0x{:02x}'.format(b), end=', ')
            col += 1
            if col >= max_cols:
                row += 1
                col = 0
                print()
            if row >= max_rows:
                print()
                break;
            idx += 1
        print()

    # CPU control routines
    def run(self):
        while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
            self.cur_instr = self.peek_byte(self.get_register(self.PC))
            print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
            self.decode_inst(self.cur_instr)
            sleep(1)
            self.inc_register(self.PC)
            

def main():
    cpu = Sweet16GP()
    cpu.set_register(0, 0xffae)
    cpu.poke_byte(0, 0x08)
    cpu.poke_byte(1, 0x05)
    cpu.poke_byte(2, 0xff)
    cpu.run()
    cpu.dump()

if __name__ == '__main__':
    # This method is for simple testing
    # and demonstration purposes.
    main()
Newsletter Powered By : XYZScripts.com