Header Image - Randall Morgan

Tag Archives

6 Articles

Sweet16-GP Assembler – Part 2

by SysOps 0 Comments
This entry is part 7 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left our assembler with only two functions. The strip_lines() function which removes comments and blank lines and the rename_registers() function which replaces the register’s friendly names with the cononical names i.e. Rx where x is a number between 0 and 7 (the register’s index value into the register file).

With these to pre-processing functions out of the way we are ready to start processing our assembly code.  We will use a two-pass method of assembly. The first pass will break each line into its constituent parts and then locate the instruction’s addressing mode, stores it in a field called mode, and calculate the instruction’s actual opcode value and length. In addition it will store a labels in a symbol table to record their value for later use in the second pass or the assembler.

Our main assembler method will take our clean lines, one at a time, and parse them as described above. We will need a table of instructions that includes details about each instruct including it’s addressing mode(s), opcode(s) and length. You may be wondering why the “(s)” above. This is because some of our instructions have multiple addressing modes and therefore multipple opcodes and perhaps even multiple instruction lengths. We will use our table to disambiguate the instruction format into the proper values. 

We will implement our opcode_table as a dict. Each entry will be  indexed (key) by the instruction mnemonic.  The value for each key in the table will be a dict indexed by the addressing mode. Each addressing mode will then have a list of values containing the instruction’s opcode. If the opcode requires a one or two by value, they will be provided as VALUE_L and VALUE_H in the list. This description sounds much more complex than it really is. Refere to the code for our table below and re-read the above paragraph. It should all become clear.

opcode_table = {
    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

As you can see, our instructions can fall into one of four addressing modes. These include implicit (when the instruction itself indicates the addressing mode),  immediate (when the value needed by the instruction directly follows it), indirect (when the register specified contains the value needed to complete the instructions, and register (when the instruction operates directly on the register specified). To tell the truth, these could be further broken down but that would only complicate things.

OK, now we have our opcode_table we are going to start writing our parser. Our parser is a little bit unorthodox. Since our input language is so constrained we can forego all the formal theory and much of the discipline or parser development. 

I do believe however, that any developer worth their weight should write at least a simple parser at some point. Understanding even a very simple parser expands your understanding of how your tools work whether you’re a web developer, AI master, ETL programmer, or Embedded Systems developer. If you call yourself a Software Engineer or Programmer and you have written at least a simple parser. compiler, or interpreter, then you owe it to yourself to do so! To get you started I recomend DR. Jack Crenshaw’s “Let’s Build A Compiler” and Ruslans’s “Let’s Build A Simple Interpreter” blog serries. If you’re a seasoned developer checkout the book “Language Implementation Patterns”. For those mathematically oriented, checkout the Dragon Book

So what I am going to do is simply plow forward. We will tackle one issue at a time and add functions to handle each issue as it arises. Note that this technique wont work if you’re trying to write a high level language compiler or interpreter. There you will need all the formal methods and a have good design before writing code. What we are going for here is a quick and simple boot-strap assembler. We can write better tools later. For now, we just need something to get us up and running.

OK, the first thing we are going to need is some set of methods to break up the line of source code we feed to our assembler into it’s various fields. We will create an entry point function called assemble which will do our heavy lifting by calling other functions.

We will need to produce some data about the current source line. This will include any label, register, addressing mode, and value. Additionally, we may also want to keep track of the original instruction mnemonic and any intermediate code we produce. So we will need a supporting function to handle this. We will also need some storage for any lables or symbols we encounter during parsing.

The code below is our entry point. It will be responsible for handling our two apasses over our code and  will call the functions that do the heavy lifting.

def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later expansion
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode

Note the call to parse_lines() we will write this function next. But, let’s step back and think about what we need it to do…

Given a line of code like:

start:    SET   R1, 0xFA

We can see we need to break up this line into it’s smaller parts. Luckily, this is pretty easy to do. Also, since each line will follow a similar pattern for the fields, we can easily deduce what each field contains. 

# Exception used for errors
class AssemblyError(Exception):
    pass

def parse_lines(lines, symbols):
    """ Determine line number, mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))

Refering to the code above, we use the enumerate function to keep track of our line numbers. One drawback to doing this approach is that our comment and blank lines wont be counted. Something we could fix by tracking line numbers in the strip_lines() function. But for now, we will do it this way.

Enumerate returns to us an integer value for the line number (lineno) and the line of source code. Next, we use rpartition(“:”) on on the source line to break it into three parts. These parts are return in a tuple containing the part before the seperator (colon in our case, which ends a label declaration), the seperator, and the part after the seperator. If any part is not found (for example, if there is no colon in the source line) then that part returns an empty string value. So if our line does not contains a colon, we will get an empty string for the label and for the seperator followed by the initial line of code which we save in the statement variable.

Once we have gotten any label, we need to take our statement and further parse it into its various parts. If it turns out that our statement is empty, we need to return a default tuple of ‘None’ values otherwise, we make a call to parse_opcode() passing in our statement and our symbol table so that any symbols found in our source line can be handled. If this all goes horribly wrong, we throw an exception and print the error message. We defined a simple exception class above for this purpose.

Our next task is to implement a function to parse the line into opcodes. We called this function pasre_opcode() above. The implementation is shown below:

def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3  
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)

There is a lot going on here so let’s unpack it. First, we use the split() method to split the statement into two portions. The first parameter to split() gives the character to split on and the second parameter is used to limit the number of splits. What we are doing here is seperating the instruction mnemonic from the rest of the statement. Since some instructions (i.e.: HALT) only have the mnemonic we need to check the number of fields we get back. If it’s only one, then we have an instruction like HALT or RTS. 

However, if we have more than one field returned from the split we need to further decompose the line. We now have a mnemonic in fields[0] and the remainder of the line in fields[1]. We now know we may have something like Rx, @Rx, or Rx, 0xFA following the instruction mnemonic. So the next move is to tray and seperate the remaining portion of the statement on the comma seperator. If the comma is found, one of two cases will be returned i.e.: (register, value) or @register, value). These are stored in the extra variable. We then add the extra fields to the fields variable and assign mnemonic the value in fields[0] which contains the instruction mnemonic.

Next, using the value of ‘nofields’ which indicate what we should expect next, we sort out the remaining fields and store their values into their perspective variables. In each case, we call parse functions which perform a specific sub-task.

Our first sub-task is to locate any register in the fields. At this point we know if a register exists it should be in fields[1]. So we pass this value to the parse_register() function shown below:

def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register

The first thing we do is convert the register value to upper case. Then we check if the string contains ‘R’ and return the character following the ‘R’ which should be the register index value (a single digit between 0 and 7 inclusively). We then return that value to the caller.

It’s possible that we didn’t have a register. In this case, there may be a value in the fields[1] position. So we check if the call to parse_register() returned a value of an empty string. If the latter was returned, we try and parse a value using the function parse_value() passing in fields[1] as a parameter. This situation occurs with the branch instructions where the mnemonic is directly followed by the offset value for the jump.

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val

As you can see above the parse_value() function must handle many types of values. It first strips off any prefix to the value and tests if this matches any of the supported numerical base indicators. If a match is found the value is converted to an integer and then cast into a string and returned to the caller.

Next, we call parse_address_mode() passing in all the fields as we may need more than one to identify the addressing mode. The parse_address_mode() function can only be understood in the context of the opcode_table. So refer to the table as you gork this code:

def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        if fields[0] in directives:
            return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))

The parse_address_mode() function uses the data contained in the fields to sort out the addressing mode of the instruction and return it to the caller, parse_opcode() which then stores this value in the mode variable. 

Next we use the mnemonic value to get the instruction data from the opcode_table and then use the mode variable to get the proper opcode value and instruction format.

If all has gone well, we return a tuple containing the value, register, mode, mnemonic, variables and the object code taken from the opcode_table.

At this poiont our first pass is complete. We have all the data necessary to assemble our instruction code with the possible exception of any forward declared symbol. 

We have one last issue before we can try this out. Our opcode_table has method names in some of the fields. We need to implemnt these helper methods:

All of these methods are pretty strainght forward. They simply take the value from the source code and separe it out into high bytes and low bytes in the case of VALUE_H, VALUE_L, LABEL_H, and LABEL_L. The OFFSET function calculates the offset from the current position.

# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    val = value_to_int(value) & 0xff
    return str(val & 0xff)


def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)
 
  
def LABEL_L(pc, value):
    print(f'LABEL_L PC: {pc}, value: {value}')
    return (value) & 0xff


def LABEL_H(pc, value):
    print(f'LABEL_H PC: {pc}, value: {value}')
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    print(f'OFFSET: {value}')
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))

PK, I think we’re ready to try this! Below is the final code. Note I made some adjustments to the code from the last installment so we could test what we’ve done so far. 

"""Sweet16-GP Assembler"""


# Exception used for errors
class AssemblyError(Exception):
    pass


# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
    # print(f'VALUE_L value: {value}')
    #val = value_to_int(value) & 0xff
    #print(f'VALUE_L val: {val}')
    #return str(val & 0xff)
    pass


def VALUE_H(pc: int, value: int) -> int:
    val = (value_to_int(value) & 0xff00) >> 8
    return str(val & 0xff)


def LABEL_L(pc, value):
    return (value) & 0xff


def LABEL_H(pc, value):
    return ((value) & 0xff00) >> 8


def OFFSET(pc, value):
    return ((value - pc)) & 0xff


def value_to_int(value):
    # convert value to int from various bases
    if isinstance(value, str):
        if value[1:3].lower() == '0x':
            return int(value, 16)
        elif value[1:3].lower == '0c':
            return int(value, 8)
        elif value[1:3].lower() == '0b':
            return int(value, 2)
        else:
            print('Invalid value.')
    elif isinstance(value, int):
        return value
    # else:
    #     raise ValueError("Expected numerical value, got: {0}".format(value))


def register_to_int(reg):
    if isinstance(reg, str):
        if reg.startswith('@R'):
            return int(reg[2:], 16)
        elif reg.startswith('R') and reg.endswith(','):
            return int(reg[1:-1], 16)
        elif reg.startswith('R'):
            return int(reg[1:])
        else:
            raise ValueError("Cannot parse the register: '{0}'".format(reg))


opcode_table = {
    'HALT': {
        'implicit': ['0x00'],
    },
    'BRA': {
        'immediate': ['0x01', VALUE_L],
        'offset': ['0x01', OFFSET],
    },
    'BRC': {
        'immediate': ['0x02', VALUE_L],
        'offset': ['0x02', OFFSET],
    },
    'BRZ': {
        'immediate': ['0x03', VALUE_L],
        'offset': ['0x03', OFFSET],
    },
    'BRN': {
        'immediate': ['0x04', VALUE_L],
        'offset': ['0x04', OFFSET],
    },
    'BRV': {
        'immediate': ['0x05', VALUE_L],
        'offset': ['0x05', OFFSET],
    },
    'BSR': {
        'immediate': ['0x06', VALUE_L, VALUE_H],
        'offset': ['0x06', LABEL_L, LABEL_H],
    },
    'RTS': {
        'implicit': ['0x07'],
    },

    'SET': {
        'direct': ['0x08', VALUE_L, VALUE_H],
    },
    'LD': {
        'register': ['0x10'],
        'indirect': ['0x20'],
    },
    'ST': {
        'register': ['0x18'],
        'indirect': ['0x28'],
    },
    'LDD': {
        'indirect': ['0x30'],
    },
    'STD': {
        'indirect': ['0x38'],
    },
    'POP': {
        'indirect': ['0x40'],
    },
    'STP': {
        'indirect': ['0x48'],
    },
    'ADD': {
        'register': ['0x50'],
    },
    'SUB': {
        'register': ['0x58'],
    },
    'MUL': {
        'register': ['0x60'],
    },
    'DIV': {
        'register': ['0x68'],
    },
    'AND': {
        'register': ['0x70'],
    },
    'OR': {
        'register': ['0x78'],
    },
    'XOR': {
        'register': ['0x80'],
    },
    'NOT': {
        'register': ['0x88'],
    },
    'SHL': {
        'register': ['0x90'],
    },
    'SHR': {
        'register': ['0x98'],
    },
    'ROL': {
        'register': ['0xA0'],
    },
    'ROR': {
        'register': ['0xA8'],
    },
    'POPD': {
        'indirect': ['0xE0'],
    },
    'CPR': {
        'register': ['0xE8'],
    },
    'INC': {
        'register': ['0xF0'],
    },
    'DEC': {
        'register': ['0xF8'],
    },
}

def parse_value(val: str) -> str:
    """ Try to parse a value, which can be an integer
        in base 2, 8, 10, 16 or a register.

        Returns: A string representation of the integer
        value in base 10, or empty string if the value
        cannot be converted to an integer.
        On Error: return original value.
    """
    # convert value to int from various bases
    if isinstance(val, str):
        # Get any possible prefix
        val = val.strip(' ')
        base = val[0:2]
        if val.isnumeric():
            return str(int(val, 10))
        if base == '0x':
            return str(int(val, 16))
        elif base == '0c':
            return str(int(val, 8))
        elif base == '0b':
            return str(int(val, 2))
        elif val.__contains__('R'):
            return ''
        else:
            return val


def parse_address_mode(fields: list) -> str:
    """ Example inputs:
                ['HALT']                    : implicit mode
                ['BRA', '0x1F']             : immediate mode
                [LD, R2]                    : register
                [STD, @R3]                  : indirect
                ['SET', 'R0,', '0xFFFE']    : direct
                ['BYTE' '0x1f']             : immediate
                ['WORD' '0x3BFC']           : immediate
                ['STRING' 'This is a test'] : immediate
                ['BRC' 'end']               : offset
    """
    numfields = len(fields)
    if numfields == 1:
        return 'implicit'
    elif numfields == 2:
        # if fields[0] in directives:
        #     return 'immediate'
        if fields[1].__contains__('@R'):
            return 'indirect'
        elif fields[1].__contains__('R') and fields[1].__contains__(','):
            return 'direct'
        elif fields[1].__contains__('R'):
            return 'register'
        elif parse_value(fields[1]).isnumeric():
            return 'immediate'
        elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
            return 'offset'
    elif numfields == 3 and parse_value(fields[2]).isnumeric():
        return 'direct'
    elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
        return 'direct'
    else:
        raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))


def parse_register(field: str) -> str:
    """ Examples:
            R2
            @R3
            R2, 0x1F
    """
    register = field.upper()
    if register.__contains__('R'):
        pos = field.find('R')
        register = field[pos + 1:pos + 2]
    return register


def parse_opcode(line: str, symbols) -> tuple:
    """ Break the line into its constituent parts.
        Locate any labels and store their definition.
        Then locate the instruction's addressing mode,
        and stores it in mode. Calculate the instruction's
        actual opcode value.

        Returns: tuple(value, register, mode, objcode) where
        value is a string to be evaluated in the second pass.
        register is the register value or empty string if no
        register exists. "mode" is the addressing mode and
        objcode is a dict containing the base opcode value,
        and a list of functions needed to process the
        instruction
    """
    fields = line.split(None, 1)
    nofields = len(fields)
    if nofields > 1:
        extra = fields[1].split(',')
        if len(extra) > 1:
            fields[1] = extra[0]
            fields.extend(extra[1:])
    nofields = len(fields)
    mnemonic = fields[0]
    register = ''
    value = ''

    """ Examples:
            HALT
            BRA 0x1F
            SET R0, 0xFFFE
            LD R2
            STD @R3
    """
    # Get register and value
    if nofields > 1:
        register = parse_register(fields[1])

    if register == '' and nofields > 1:
        value = parse_value(fields[1])

    if nofields > 2:
        register = parse_register(fields[1])
        value = parse_value(fields[2])

    # Get address mode
    mode = parse_address_mode(fields)

    # Get all addressing modes for this mnemonic
    opcodemodes = opcode_table.get(mnemonic)
    if not opcodemodes:
        raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

    # Get the address mode used in the instruction
    objcode = opcodemodes.get(mode)
    if not objcode:
        raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

    return value, register, mode, mnemonic, list(objcode)


def parse_lines(lines, symbols):
    """ Determine mnemonic, register, value, and address mode"""
    for lineno, line in enumerate(lines, 1):
        # Handle labels
        label, *colon, statement = line.rpartition(":")
        try:
            # parse the line into ir (intermediate representation)
            data = lineno, label, parse_opcode(statement, symbols) if statement \
                else (None, None, None, None, None)
            yield data
        except AssemblyError as e:
            print("{0:4d} : Error : {1}".format(lineno, e))


def assemble(lines, lc=0):
    """Breaks the line into its constituent parts.
        it then locates the instruction's addressing mode,
        stores it in mode, and calculates the instruction's
        actual opcode value and length. """
    objcode = []
    symbols = {}

    # Pass 1 : Parse instructions and create intermediate code
    for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
        # Try to evaluate numeric labels and set the lc (location counter)
        if label:
            try:
                lc = int(eval(label, symbols))
            except (ValueError, NameError):
                symbols[label] = lc

        # Store the resulting object code for later
        # expansion and adjust the location counter lc.
        if icode:
            objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
            lc += len(icode)
    return objcode


def replace_register_names(line):
    """ Replace register names with 'R' + register index.
        Example: ACC becomes R0. STATUS becomes R6."""
    line = line.replace('ACC', 'R0')
    line = line.replace('RETSTACK', 'R4')
    line = line.replace('COMP', 'R5')
    line = line.replace('STATUS', 'R6')
    line = line.replace('PC', 'R7')
    return line


def strip_lines(lines):
    """ Takes a sequence of lines and strips comments and blank lines."""
    for line in lines:
        comment_index = line.find(";")
        if comment_index >= 0:
            line = line[:comment_index]
        line = line.strip()
        line = replace_register_names(line)
        yield line


if __name__ == '__main__':
    text = ';This is a comment\n' \
           'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
           'end:    HALT  ; end of program'
    lines = text.split('\n')

    # Remove comments and blank lines
    lines = strip_lines(lines)
    print(assemble(lines))

If you run the above code you should get an output like the following:

 

sweet16-articles-code/articles/part-07/assembler_02.py
 [
  (2, 0, '65502', '0', 'direct', 'SET', ['0x08', , ]), 
  (3, 3, '', '', 'implicit', 'HALT', ['0x00']) 
 ]

As you can see we have the intermediate representation for two instructions. All the info we will need to assemble these instructions into actual machine code. Reading the tuples the first value (2) is the line number from the source. The second value (0) is the location counter at the start of the instruction. The next value (65502) is the value provided in the instruction, followed by the register index (0), the addressing mode (direct), and the mnemonic. The final fields holds the opcode information from the opcode_table.  Try this out on several instructions and see that the results make sense. Write some unit test for the various functions and for the assemble as it stands. 

This has been a long post. Next time we will tackle the second pass of our assembler. Until then, keep coding!

 

Designing The MiniMips – Part 4 Computer Architecture and Data Paths

by SysOps 0 Comments

Ok, last time we completed the basic design of our ALU. I hope you took the time to build it in Logisim and play with it. It should be trivial to comfirm it works by doing simple operations on paper or a programmer’s calculator and the ALU and comparing the results. Once you have confirmed that your ALU works correctly,  it is time to save it as a sub-circuit in Logisim. We’ll take a short step back to discuss the basic operation of a CPU and support systems and then we’ll answer to question of where our ALU input values and control signals come from and where the output goes.

Computer Architecture

Computers all follow a similar architecture. At a minimum they have Input, Processing, Storage, and Output. These components can take various forms. For example, input could come from a keyboard as with the typical personal computer, or it may come from some type of sensor such as the oxigen sensor in your car. Output could be to a display screen such as the LED monitor or a relay used to control a well pump. Memory can take the form of registers, magnetic disks, solid state disks, integrated circuits, or even paper punch cards. Processing can be either analog, digital, or even quantum devices.  However, the typical computer most people think of is the PC or typical desktop or laptop machine.

Computers systems can be categorized into many categories. Here we will place them into two broad categories based on the system architecture. Systems using seperate stores for instructions and data are said to be of a Hardvard Architecture. These machine read cpu instructions from ROM (Read Only Memory) and execute them. All data is stored in the system RAM (Random Access Memory). A typical microcontroller such as the Atmel AtMega324 is an example of this architecture. Other machines store cpuinstructions and data in the same memory. This allows for programs to become dynamic, being altered or created by the running program. This type of achitecture is known as Von Nueman Architecture. The Von Nueman Architecture is a powerful because program instructions are treated just like data. It does have it’s down sides however. One such short coming is the issue of security. Since any running program can access the program instructions, it can alter not only it’s own code but the code of other programs. So a rouge program could for instance alter the program for calculating an aircraft’s altitude. This could result in personal injury or death to the passengers. Because of this short-coming a lot of work has gone into developing ways to limit programs from having complete access to all programs stored in memory. Modern Von Nueman machnes contain a large amount of circuitry to manage program access to memory so such things wont occur.

Here we will develop a system using the Hardvardarchitecture and use a seperate instruction and data memories. This will allow us to reduce a few complexities and focus on other issues. In the future I may write another article on developing a Von Nueman machine.

CPU Architecture and Data Paths

Ok, let’s talk a bit about how a digital computer system works.  The first thing that occurs in most systems when power is applied is that the major components are set to a predetermined state. This reset also occurs when a “Reset” button is pushed. If you’ve ever had to take a paper click and hold in a tiny button on your router, then you’ve used a reset button. After the system has been reset, the cpu begins to step through it’s cycles. The first cycle is the ‘Fetch’ cycle. In this cycle the CPU fetches the next instruction from the program memory. Next, the instruction decoder decodes the instruction into the appropriate control signals for the various cpu components. The next step is to ‘Execute’ the instruction. The last step is to ‘Store’ the results in memory if required by the instruction. Most cpu instructions follow this pattern. However, this cycle can be shortened or have wait states added to it for complex instructions. In simple processesors some of these instructions are combined. Usually the Fetch and decode cycles are made to occur in the same cycle. In a single cycle cpu are these steps occur in one clock cycle. This is often accomplished by multiplying the clock input to generate the needed signals. However, in some architectures such as the one we will build, these steps can occur in a single cycle without the need for clock multiplication.

Below is the logisim circuit for the SimpleComputer.  This is the basic machine we will be building.  I’ve included some 7-Segment Hex digit displays to allow use to view the the CPU operation. I also included four ports on the ALU that allow us to connect some of those displays directly to the registers in the registerfile. While the circuit looks pretty basic here, it is only because I have used logisim’s ability to can a subcircuit as a component. So in this view all the major components are actually subcircuits that vary in complexity. They are all made out of logic gates of logisim standard components. If you have not used the Subcircuit feature in logisim I recommend you break out the tutorials and learn to use them. OK, let’s describe the circuit from a 10,000 foot level. We’ll being seeing each componet in great detail as we move forward.

The machine must first be reset using the reset button in the lower left of the circuit. Then we can single step through programs by clicking the clock input. We can run programs from the simulation menu in logisim. Once reset, the PC (Program Counter) will be set to 0x00 hexidecimal (hex). This will cause the first program instruction to be placed on the data output of the ROM (Read Only Memory). The ROM instruction code is broken into various parts that control certain aspects of the CPU. For eample, bits 0-3 of the instruction contain the ALU OP code. These four bits are used directly to select the chosen ALU operation. For shift operations bits 4-7 select the shift distance. For example if bits 4 – 7 contain the value 0010b in binary, then the value on the ALU’s A input will be shifted by two bit positions. Many bits in the instruction perform multiple duties. For eaxample, although bits 0-3 and 4-7 are used for the ALUOP code and the SHIFTAMT they also do double duty as the lower order byte of an immediate address in an instruction that contains an address value.

Once the instruction code is broken into it’s constituent parts, the various signals are reouted to other parts of the circuit as needed. A portion of the instruction code defines the type of instruction being executed. This allows the proper signals to be generated to control the parts of the circuit that are not directly controled by the bits in the instruction. Next the if the instruction is a register type instruction, the two source registers and the destination register are selected. The values in those registers and posibly an immediate value are feed into the ALU A and B inputs. The ALU then produces the out for all the operations we provided hardware for. This is done in parallele so producing all output at once isn’t an issue. The ALUOP code selects the desired function results and places them on the ALU output. Depending on the instruction the ALU ouput could be routed back to the destination register selected in the RegFile or it may be stored in RAM or used as a RAM address. It could also be used to set condition flags the determine if a branch will be taken. Lastly, it can be used as the input to the PC though that portion of the circuit is not shown here.

Look over the circuit and trace through the signal paths. See if you can figure out how the various parts work to control the circuit. Next time we’ll talk about the Instruction Set and continue on about the architecture. Until then, I hope you all have a happy new year!

 

 

Designing The MiniMips – Part 2 Subtraction and Negative Numbers

by SysOps 0 Comments

Last time we discussed binary addition and how we could use a few logic gates to do arithmetic. I showed you a few simple circuits and left you with a simple assignment to use the full adder circuit to build a multi-bit full adder. I told you I’d show you a solution to the exercise. So here it is:

As you can see the full adders are simply cascaded by connecting the carry-out of the previous stage to the carry-in on the next stage. This cascading can be continued until the propagation delay of the carry signal become intolerable. If that happens there are techniques for improving the circuit. Checkout the Carry Lookahead Adder circuit.

Binary subtraction

We’ve seen how binary adders can be built from simple logic gates. Now let’s look at binary subtraction. The binary subtraction rules are very simple:

0-0 = 0 1-0=1 1-1 = 1 10-1 = 1

Binary subtraction is similar to decimal subtraction with the difference that when 1 is subtracted from 0, it is necessary to borrow 1 from the next higher order bit. That bit is reduced by 1 and the remainder is 1. The one thing that confuses most people when doing a binary borrow is that when you borrow in decimal you add it as 10 to the previous column. In binary we are using powers of 2. So when we borrow we must add the borrow as two bits of half the original value:

1  1  1
1  1
1 0 0 0
0 1 0 1
0 1 1

In the above, starting from the left most bit. First we can’t subtract 1 from 0 so we try to borrow 1 from the next column to the right (the tow’s column). However, it too is 0. We can’t borrow 1 from 0 so we move on to the third column left (the fours column). Next, in the fourth column left we find a 1 in the eights column. Now this one represents an 8 in decimal. So when we move it to the fours column. We have to split that 8 into two 4s. So we place two 1’s above the fours column. Next we borrow one of them and move it to the two’s column as two 2s. Again, we borrow a two from the 2s column and move it as two ones and place them in the 1s column. Now we can subtract the 1 in the right most the two ones we placed there. This leaves a one which we bring down.  In the two’s column we can now subtract 0 from the remaining 1 and get 1. We then move left to the fours column and subtract 1 from the remaining 1 we carried over and get 0. The eights column is now 0 and being a leading 0 we don’t really need to bring it down as it adds no value to our answer. Now to check our answer we can see that our original problem was: 8-5=3. And indeed 11 in binary is 3 in decimal.

Now let’s look at the logic needed to do a single bit binary subtraction. The truth table for subtraction is:

A B Diff Barrow
0 0 0 0
1 0 1 0
1 1 0 0
0 1 1 1

 

Looking at the table we can see that the difference is computed using an XOR gate. The borrow out however is a slightly more complex. Borrow out is only 1 when A is 0 and B is 1. This looks like and AND gate with an inverter on B.  Let’s try it out in logisim. Build the following circuit in logisim and see if it matches the truth table above:

Now this circuit is called a half subtractor. Just as the half adder is missing a carry in, the half subtractor is missing a borrow in. However, a similar cascading trick like that used for the half adder can be used to build a full subtractor. Let’s try this out. Build the circuit below in logisim and give it a try:

Negative Numbers

Now this kind of subtractor is great if you are only working with positive numbers. But if we ever need to work with a negative value as an operand or a resulting value, we are out of luck. So how do we handle negative numbers? Well, the most typical way is to reserve the left-most bit for use as a sign bit. The sign bit is set to 1 to indicate a negative value and 0 to indicate a positive value.  So if we have 8 bits we can represent -127 through +127.  There is an issue however, To demonstrate it let’s reduce this to a bit bit signed number. So we can represent -7 to +7 with the three bits know as the magnitude and the fourth bit as the sign bit.

 

Value Sign 4 2 1
0 0 0 0 0
1 0 0 0 1
2 0 0 1 0
3 0 0 1 1
4 0 1 0 0
5 0 1 0 1
6 0 1 1 0
7 0 1 1 1
-0 1 0 0 0
-1 1 0 0 1
-2 1 0 1 0
-3 1 0 1 1
-4 1 1 0 0
-5 1 1 0 1
-6 1 1 1 0
-7 1 1 1 1

Notice anything odd about this arrangement? Look closely, there are two zero values. Zero and negative zero. Who ever heard of negative zero? Sign magnitude presents it’s own issues. A program or hardware would have to check for two values of zero if this was implemented in our microprocessor. So what can we do about this? How about flipping the bits and using one’s complements. This too has it’s own issues. The issues with negative numbers is solved by something called two’s-complement. I found a Computerphile video that I think gives one of the best explanations available. Watch the video to learn more about binary math and negative numbers:

 

Now I am sure you have a good grasp on the problems and solutions to binary mathematics. I’ll leave you with this: How can we modify our multi-bit adder for use in both addition and subtraction? I’ll show you my solution next time. Until then, play around with your own ideas in logisim and try to build an adder/subtractor circuit using 2’s-complement.

 

Desgning the MiniMips A Simple 16 bit Microprocessor

by SysOps 0 Comments

A while back I designed and built my own 16-bit microprocessor. I wanted to be able to use my processor in my own projects. My requirements were very simple. It had to be easy to understand, have an instruction set that is simple to use, as this makes it easy to generate a simple compiler for the instruction set. My desired instructions were pretty simple. I wanted something akin to the MIPS instruction set. Instructions should take one or two source operands, and a destination register or memory address. In the case of comparative instructions, it should simply discard the results. The processor should include a hardware stack. It should contain at least 16 general purpose registers and at least four I/O ports. I would prefer instructions to take a single cycle. But If I include a multiply and divide instruction this isn’t practical.  My last requirement was that I should be able to use free and open-source tools. I did a basic design in Logisim and then moved the design to VHDL and place it on an FPGA.

The project started off great but hit a few bumps in the road along the way. I eventually worked them out and got everything working. But I never really documented the project. So that’s what I’ll attempt to do here. I’ll step back and walk through the process I took; errors and all.

Designing a microprocessor isn’t a task for the faint of heart, however, it isn’t a task that should be impossible either.

This serries of blog posts will be more a record of how I proceeded with this project. I’ll include my mistakes, faux pas, and hopefully, the solutions I found, so readers may amuse themselves at my expense… If you follow along and find these posts interesting, have helpful hints or information, or if you build the MiniMips let me know in the contact form. I’d love to hear from you.

So how do we get started? Well, I’ll assume like myself, you have a grasp of digital logic basics. So let’s start by writing some requirements for the ALU. The ALU is the core of the processor. It’s the part that provides the Arithmetic and Logic functions. The ALU has at least three inputs: Operand 1, Operand 2, and the operation code. It should produce at least two outputs; the result of the operation, and a set of status indicators or flags.The ALU will perform the following operations:

  • Binary Addition
  • Binary Two’s Complement Subtraction
  • Binary Bitwise Inversion
  • Logical And of two binary values
  • Logical Or of two binary values
  • Logical XOR of two binary values
  • Logic Shift Left
  • Logical Shift Right
  • Logical Rotate Left
  • Logical Rotate Right

So if I count correctly that’s ten ALU operations we need circuitry for. Let’s start with the arithmetic first, Add and Subtract:

Let’s first review the laws of binary arithmetic:

Addition 0 + 0 = 0 0 + 1 = 1 1 + 0 = 1 1 + 1 = 0
Subtraction 0 – 0 = 0 1 – 0 = 1 1 -1 = 0 10 – 1 = 1

Now let’s look at a couple simple logic operations:

AND Operation OR Operation XOR Operation
1 AND 1 = 1   1 OR 1 = 0  1 XOR 1 = 0
0 AND 1 = 0   1 OR 0 = 1  1 XOR 0 = 1
1 AND 0 = 0  0 OR 1 = 1  0 XOR 1 = 1
0 AND 0 = 0  0 OR 0 = 0 0 XOR 0 = 0

You might notice that adding to binary digits produces the same results as XORing two binary digits. So to add to binary digits together we only need to XOR them. However, when we try to do arithmetic operations on multi-digit numbers we have the issue of carrys and barrows. Let’s just look at addition for the moment. When we add two two-bit binary numbers and have a one in the same column in both numbers we must carry a one to the next column. Just as we do in decimal arithmetic.

Carries 1 1
1 0 1 1
+ 0 0 0 1
Results 1 1 0 0

 

If you look closely you will notice that the carry happens anytime there are two ones in the same column. This is basically the AND logic operation.

To follow along download Logisim or Logisim-Evolution and create the circuit show below. (I reccomend you open the Logisim help and locate the tutorial and complete that first if you have little or no experience with logisim.)

Above you see a simple XOR gate. Play with it and test it to see if the output is high only when A+B = 1 and low when A+B <> 1. These condition only hold true if only A * -B | B * -A is true.Remember if A and B are both 1 or high, then the addition results in the output being low and a 1 carried over to the second bit condition. However, here we have no second bit to handle the carry out. So let’s see how to add the carry out to our simple 1-bit adder circuit.

Above you can see the addition of a simple AND gate. The AND gate only generates a 1 on it’s output when both inputs are high. So in this circuit it makes the perfect carry out generator. This type of adder is known as a Half Adder because it only does half the job…

Ok, that’s fine and good you may say however, what about adding multi-bit numbers? How can we do that?

Adding multi-bit numbers requires more than a carry out signal… We also need a carry-in signal. The carry-in allows the carry-out from the previous column to be added into the result of A + B. So our new circuit needs to include the carry-in as part of the sum. To do this, we just need to add the carry-in to the result of A + B. We can do this with another Adder:

So here’s two Half-Adders cascaded together to allow for a Carry-in signal from the prevous column. There is an issue however. We now have two carry-outs. The original and the carry out from the second half adder used to add the carry-in bit. As it turns out, you can’t have both carry-outs active at the same time. To see why simply follow the logic;

  • If A=1, B=1 Cin=1: Sum 1 = 0 and Carry-out 1 = 1, Sum-out 2 = 1 and Carry-out 2 = 0
  • If A=1, B=1, Cin=0: Sum 1 = 0 and Carry-out 1 = 1, Sum-out 2 = 0 and Carry-out 2 = 0

Check the rest of the conditions by building a truth table. You’ll see that only one carry-out bit can be active at a time. So we simply need to detect when one or the other is active. That’s a perfect job for an OR gate:

OK, build these circuits in Logisim and play with them. Work out the a truth table for them all and prove to yourself they work as expected.

With the full adder you can create N-bit Adders by simply feeding the Carry-out of the previous column to the Carry-in of the next column. As an exercise use our full adder circuit to create an 8-Bit Adder.  Next time I’ll show you my 8-Bit Adder and then we’ll look at subtraction.

 

 

 

 

 

 

 

Newsletter Powered By : XYZScripts.com