# Blog

25 Articles

## Sweet16-GP Assembler – Part 3

0 Comments
This entry is part 8 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left off with the first pass of our assembler creating an intermediate  representation for our assembly code. Today we are going to begin the second pass. So let’s get on with it!

Our first pass completes with a list of code that contains all the information we need to assemble the code. So why did we use a two pass design instead of a single pass? The answer is that in the single pass we have to do a lot more work to handle forward references to label values. By using a two pass design we can forego the complexities and simply handle any foreward references in the second pass after we have already calculated each instructions address in the first pass.

So what work do we need to accomplish in the second pass? Well, we need to keep track of the instructions location in memory. We will do this by maintaining a symbol in a synbol table called “pc” for program counter. We will use  will need to replace any symbols with thier values stored in the symbol table. Then we will need to encode the instruction in hex values. So let’s see how to accomplish this:

```# Second pass -- Assembly
for lineno, pc, value, register, mode, mnemonic, icode in objcode:
print(f'Pass 2 : {lineno, pc, value, register, mode, mnemonic, icode}')
# Evaluate the string
try:
symbols[pc] = pc

if value is not '':
realvalue = eval(value, symbols)

if isinstance(realvalue, str):
realvalue = ord(realvalue) & 0xff

if not isinstance(realvalue, int):
raise TypeError("Integer expected in {0}".format(value))
except Exception as e:
print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
realvalue = 0```

Referring to the code above, we can see that first we loop over objcode extracting the values we assembled in the first pass.  With each pass we set the new value of the pc counter. Then we check the value field. Any value we stored here will be a string. So we use eval() to convert the value to an integer. If the value happens to be a  symbol we can convert it to it’s value by passing the symbol table along with the value to eval(). However, if the value was a string representation of a numerical value, and not a symbol in the symbol table, eval will simply return our string value. So we check if the value returned by eval is a string and if so, we make a call to ord to convert it to an integer value. We also “AND”  the value with 0xFF to ensure it remains a byte sized value.  Now we should have an integer value between 0 and 255 in “realvalue”. In the last if statement we raises an TypeError if this is not the case. Finally, we wrap this all in an try/except block and print an error message to stderr is any issues arise and set the “realvalue” variable back to zero.

Our next step is to take the values and data we extracted from objcode and assemble all of it into the actual values to represent the instruction. For this we will need to use the Callable library to call the functions we stored in the opcode table. So our first order of business here will be to import the Callable module from the collections library.

`from collections import Callable`

Now that we have the Callable module we can encode our actual instruction code:

```# Encode the instruction. If register is present,
# it must be added to the base opcode value stored
# in icode[0]
opcode = int(icode[0], 16)
if register != '':
opcode += int(register, 16)
encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
encode[0] = opcode

print(lineno, pc, encode)
execode.append((lineno, pc, encode))
return execode```

Here we first extract the instruction opcode and convert it to hexadecimal. Then we check if the register field is empty. If not, we have a register value that must be added to the instruction opcode. The next step is to encode the extra data values. We do this by looping over the values in “icode” and if they are Callable, i.e. stored functions, we call the functions on “realvalue” variable and store the values in the list variable “encode”. We update the value of the opcode in encode and then print the results for a simple sanity check. Finally, we append the list of values in encode to the instruction list execode passing in the line number and pc value as well. Once we have iterrated over all the instructions in our program, we return the execode list to the caller.

We now have a list of encoded instruction with a little extra info attached. If you run the code above the final results in encode will look something like this:

[(2, 0, [8, ‘222’, ‘255’]), (3, 3, [0])]

What we see here is a list of tuples where the fisrt entry in each tuple is the line number in the source file the instruction was found on. The second value is program counter value at the start of the instructions, and so where to store the first byte of the instruction.  The last value in the tuple is a list of values representing the values to be placed in memory. Here, our first instruction has an opcode of 8 and is followed by the values 222 and 255. Recall the first byte is the lower order byte. If we look up the opcode 8 we will see it is the SET instruction. Since the this instruction requires a register index be added to it, and the register index did not change the opcode. We can deduce the register is register 0 or ACC. The value to be loaded into register 0 is 255 << 8 + 222 =  65502 or 0xFFDE. So the complete instruction is SET R0, 0xFFDE. The second instruction is even easier to decode. The opcode 0x00 is the HALT instruction and takes no arguments.

OK, let’s see the code as we have it now:

```"""Sweet16-GP Assembler"""
from collections import Callable

# Exception used for errors
class AssemblyError(Exception):
pass

# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
val = value_to_int(value) & 0xff
return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
val = (value_to_int(value) & 0xff00) >> 8
return str(val & 0xff)

def LABEL_L(pc, value):
return (value) & 0xff

def LABEL_H(pc, value):
return ((value) & 0xff00) >> 8

def OFFSET(pc, value):
return ((value - pc)) & 0xff

def value_to_int(value):
# convert value to int from various bases
if isinstance(value, str):
if value[1:3].lower() == '0x':
return int(value, 16)
elif value[1:3].lower == '0c':
return int(value, 8)
elif value[1:3].lower() == '0b':
return int(value, 2)
else:
print('Invalid value.')
elif isinstance(value, int):
return value
# else:
#     raise ValueError("Expected numerical value, got: {0}".format(value))

def register_to_int(reg):
if isinstance(reg, str):
if reg.startswith('@R'):
return int(reg[2:], 16)
elif reg.startswith('R') and reg.endswith(','):
return int(reg[1:-1], 16)
elif reg.startswith('R'):
return int(reg[1:])
else:
raise ValueError("Cannot parse the register: '{0}'".format(reg))

opcode_table = {
# TODO: Get DATA implemented

'HALT': {
'implicit': ['0x00'],
},
'BRA': {
'immediate': ['0x01', VALUE_L],
'offset': ['0x01', OFFSET],
},
'BRC': {
'immediate': ['0x02', VALUE_L],
'offset': ['0x02', OFFSET],
},
'BRZ': {
'immediate': ['0x03', VALUE_L],
'offset': ['0x03', OFFSET],
},
'BRN': {
'immediate': ['0x04', VALUE_L],
'offset': ['0x04', OFFSET],
},
'BRV': {
'immediate': ['0x05', VALUE_L],
'offset': ['0x05', OFFSET],
},
'BSR': {
'immediate': ['0x06', VALUE_L, VALUE_H],
'offset': ['0x06', LABEL_L, LABEL_H],
},
'RTS': {
'implicit': ['0x07'],
},

'SET': {
'direct': ['0x08', VALUE_L, VALUE_H],
},
'LD': {
'register': ['0x10'],
'indirect': ['0x20'],
},
'ST': {
'register': ['0x18'],
'indirect': ['0x28'],
},
'LDD': {
'indirect': ['0x30'],
},
'STD': {
'indirect': ['0x38'],
},
'POP': {
'indirect': ['0x40'],
},
'STP': {
'indirect': ['0x48'],
},
'ADD': {
'register': ['0x50'],
},
'SUB': {
'register': ['0x58'],
},
'MUL': {
'register': ['0x60'],
},
'DIV': {
'register': ['0x68'],
},
'AND': {
'register': ['0x70'],
},
'OR': {
'register': ['0x78'],
},
'XOR': {
'register': ['0x80'],
},
'NOT': {
'register': ['0x88'],
},
'SHL': {
'register': ['0x90'],
},
'SHR': {
'register': ['0x98'],
},
'ROL': {
'register': ['0xA0'],
},
'ROR': {
'register': ['0xA8'],
},
'POPD': {
'indirect': ['0xE0'],
},
'CPR': {
'register': ['0xE8'],
},
'INC': {
'register': ['0xF0'],
},
'DEC': {
'register': ['0xF8'],
},
}

def parse_value(val: str) -> str:
""" Try to parse a value, which can be an integer
in base 2, 8, 10, 16 or a register.

Returns: A string representation of the integer
value in base 10, or empty string if the value
cannot be converted to an integer.
On Error: return original value.
"""
# convert value to int from various bases
if isinstance(val, str):
# Get any possible prefix
val = val.strip(' ')
base = val[0:2]
if val.isnumeric():
return str(int(val, 10))
if base == '0x':
return str(int(val, 16))
elif base == '0c':
return str(int(val, 8))
elif base == '0b':
return str(int(val, 2))
elif val.__contains__('R'):
return ''
else:
return val

def parse_address_mode(fields: list) -> str:
""" Example inputs:
['HALT']                    : implicit mode
['BRA', '0x1F']             : immediate mode
[LD, R2]                    : register
[STD, @R3]                  : indirect
['SET', 'R0,', '0xFFFE']    : direct
['BYTE' '0x1f']             : immediate
['WORD' '0x3BFC']           : immediate
['STRING' 'This is a test'] : immediate
['BRC' 'end']               : offset
"""
numfields = len(fields)
if numfields == 1:
return 'implicit'
elif numfields == 2:
# if fields[0] in directives:
#     return 'immediate'
if fields[1].__contains__('@R'):
return 'indirect'
elif fields[1].__contains__('R') and fields[1].__contains__(','):
return 'direct'
elif fields[1].__contains__('R'):
return 'register'
elif parse_value(fields[1]).isnumeric():
return 'immediate'
elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
return 'offset'
elif numfields == 3 and parse_value(fields[2]).isnumeric():
return 'direct'
elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
return 'direct'
else:
raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))

def parse_register(field: str) -> str:
""" Examples:
R2
@R3
R2, 0x1F
"""
register = field.upper()
if register.__contains__('R'):
pos = field.find('R')
register = field[pos + 1:pos + 2]
return register

def parse_opcode(line: str, symbols) -> tuple:
""" Break the line into its constituent parts.
Locate any labels and store their definition.
Then locate the instruction's addressing mode,
and stores it in mode. Calculate the instruction's
actual opcode value.

Returns: tuple(value, register, mode, objcode) where
value is a string to be evaluated in the second pass.
register is the register value or empty string if no
register exists. "mode" is the addressing mode and
objcode is a dict containing the base opcode value,
and a list of functions needed to process the
instruction
"""
fields = line.split(None, 1)
nofields = len(fields)
if nofields > 1:
extra = fields[1].split(',')
if len(extra) > 1:
fields[1] = extra[0]
fields.extend(extra[1:])
nofields = len(fields)
mnemonic = fields[0]
register = ''
value = ''

""" Examples:
HALT
BRA 0x1F
SET R0, 0xFFFE
LD R2
STD @R3
"""
# Get register and value
if nofields > 1:
register = parse_register(fields[1])

if register == '' and nofields > 1:
value = parse_value(fields[1])

if nofields > 2:
register = parse_register(fields[1])
value = parse_value(fields[2])

# Get address mode
mode = parse_address_mode(fields)

# Get all addressing modes for this mnemonic
opcodemodes = opcode_table.get(mnemonic)
if not opcodemodes:
raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

# Get the address mode used in the instruction
objcode = opcodemodes.get(mode)
if not objcode:
raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

return value, register, mode, mnemonic, list(objcode)

def parse_lines(lines, symbols):
""" Determine mnemonic, register, value, and address mode"""
for lineno, line in enumerate(lines, 1):
# Handle labels
label, *colon, statement = line.rpartition(":")
try:
# parse the line into ir (intermediate representation)
data = lineno, label, parse_opcode(statement, symbols) if statement \
else (None, None, None, None, None)
yield data
except AssemblyError as e:
print("{0:4d} : Error : {1}".format(lineno, e))

def assemble(lines, lc=0):
"""Breaks the line into its constituent parts.
it then locates the instruction's addressing mode,
stores it in mode, and calculates the instruction's
actual opcode value and length. """
objcode = []
symbols = {}

# Pass 1 : Parse instructions and create intermediate code
for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
# Try to evaluate numeric labels and set the lc (location counter)
if label:
try:
lc = int(eval(label, symbols))
except (ValueError, NameError):
symbols[label] = lc

# Store the resulting object code for later
# expansion and adjust the location counter lc.
if icode:
objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
lc += len(icode)

# Second pass -- Assembly
execode = []
for lineno, pc, value, register, mode, mnemonic, icode in objcode:
# Evaluate the string
try:
symbols[pc] = pc

if value is not '':
realvalue = eval(value, symbols)

if isinstance(realvalue, str):
realvalue = ord(realvalue) & 0xff

if not isinstance(realvalue, int):
raise TypeError("Integer expected in {0}".format(value))
except Exception as e:
print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
realvalue = 0

# Encode the instruction. If register is present,
# it must be added to the base opcode value stored
# in icode[0]
opcode = int(icode[0], 16)
if register != '':
opcode += int(register, 16)
encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
encode[0] = opcode

execode.append((lineno, pc, encode))
return execode

def replace_register_names(line):
""" Replace register names with 'R' + register index.
Example: ACC becomes R0. STATUS becomes R6."""
line = line.replace('ACC', 'R0')
line = line.replace('RETSTACK', 'R4')
line = line.replace('COMP', 'R5')
line = line.replace('STATUS', 'R6')
line = line.replace('PC', 'R7')
return line

def strip_lines(lines):
""" Takes a sequence of lines and strips comments and blank lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
line = replace_register_names(line)
yield line

if __name__ == '__main__':
text = ';This is a comment\n' \
'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
'end:    HALT  ; end of program'
lines = text.split('\n')

# Remove comments and blank lines
lines = strip_lines(lines)
# Assemble code
print(assemble(lines))```

Recall back in our emulator we added a routine to load a hex-file:

```"""
Load a program assembled into a hex file and store in memory.     File format is: each line contains 18 bytes. First two bytes are the beginning address of the data in the current line. The following 16 bytes are the memory values. All values are in hexadecimal. If the 0x character pair is used, it will be stripped. All values are space delimited. The file must end with a new line.
"""```

Since this is the format our emulator expects, we are going to write a function to write out our data into this format:

```def write_rom_code(execode):
last_pc = 0
rom = []

for lineno, pc, ecode in execode:
if last_pc > pc:
pass
elif last_pc < pc:
while last_pc < pc:
rom.append('0x00')
last_pc += 1
else:
for byte in ecode:
byte = '0x%.2x' % int(byte)
rom.append(byte)
last_pc += 1

while last_pc % 16 != 0:
rom.append('0x00')
last_pc += 1

return rom```

As you can see, the process is pretty straight forward. The code write the instruction code from the execode parameter and if this does not end an a 16 byte boundery, it write 0x00 from the end of the code to the next 16 byte boundary. This is accomplished in while loop near the end of the function. Also note that we convert the integer values to hexadecimal value. This is typical and comes in handy when transfering programs across a serial of JTAG channel to load it into hardware.

Add this function just above the final if statement (if __name__ == ‘__main__’:) and the change that if statement to read as follows:

```if __name__ == '__main__':
text = ';This is a comment\n' \
'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
'end:    HALT  ; end of program'
lines = text.split('\n')

# Remove comments and blank lines
lines = strip_lines(lines)
# Assemble
code = assemble(lines)
# Convert ot Hexfile format
print(write_rom_code(code))```

Now if we run the program we get a list of ROM values that always end on a 16 byte boundary. Note that this raw ROM format does not include the leading two byte address values:

`['0x08', '0xde', '0xff', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00']`

Now that we have a raw ROM format, we need to a function to convert this to our hex file format. Since we will be writing to a file we will need to import the sys module.

`def rom2hexfile(rom):    col = 0    row = 0    max_cols = 16    max_rows = int(len(rom) / max_cols)    idx = 0     # Generate output filename    # If none given use input filename's    # path and base name with a '.hex'    # extension appended.    if len(sys.argv) > 2:        # we should have an outfile name        filename = sys.argv[2]    else:        file = sys.argv[1]        file = file.split('.')        head, tail = os.path.split(sys.argv[1])        parts = tail.split('.')        filename = os.path.join(head, parts[0] + '.hex')    with open(filename, 'w') as ofh:        for b in rom:            if col == 0:                ofh.write('{:02x} {:02x} '.format((idx & 0xff), (idx & 0xff00) >> 8))                ofh.write('{:02x} '.format(int(b, 16)))            else:                ofh.write('{:02x} '.format(int(b, 16)))            col += 1            if col >= max_cols:                row += 1                col = 0                ofh.write('\n')            if row >= max_rows:                ofh.write('\n')                break;            idx += 1        ofh.close()`

The code above writes the hex file to a file with the same name as the asm file we load. Well, actually, we haven’t loaded an asm file yet. We’ve been passing our assembly code as a string to our assemble function. The above code depends on a main() function we’ve yet to write and filename values passed as command-line arguments. So let’s see our main function:

```def main():

if len(sys.argv) >= 2:
filename = sys.argv[1]

# Remove comments and blank lines
lines = strip_lines(open(filename))

if 1:
rom = write_rom_code(assemble(lines))
print(rom)```

Now that we are accpting assembly language files from the command line we will be needing a simple assembly language file. Below is a simple assembly program to add two number. Create a file named add.asm (all our assembly language files will have a file extension of “.asm”) and save it as add.asm in the same folder as the assembler.

```;
; Test ADD Rn
; On Exit:
; Save in a file named add.asm
;
start:      SET R1, 0x00fe      ; initialize pointers.
SET R0, 0x01        ; load ACC with 1
ADD R1              ; ACC gets 0cfe + 0x01 = 0xff
end:        HALT                ; STOP Execution```

Now we need to change our “if name equals main” statement to call our main() function:

```if __name__ == '__main__':
import sys, os
main()```

Now if you run this code from the command line, you should get the following output:

```python3 assembler_03.py add.asm
['0x09', '0xfe', '0x00', '0x08', '0x01', '0x00', '0x51', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00', '0x00']```

In addition you should have a new file named add.hex in the same folder as your add.asm file. The add.hex file should have the following contents:

`00 00 09 fe 00 08 01 00 51 00 00 00 00 00 00 00 00 00 `

As you can see we added the two address bytes to the rom format. This hex file should now be loadable into the emulator.

Now let’s see it all put together for completeness:

```"""Sweet16-GP Assembler"""
from collections import Callable

# Exception used for errors
class AssemblyError(Exception):
pass

# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
val = value_to_int(value) & 0xff
return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
val = (value_to_int(value) & 0xff00) >> 8
return str(val & 0xff)

def LABEL_L(pc, value):
return (value) & 0xff

def LABEL_H(pc, value):
return ((value) & 0xff00) >> 8

def OFFSET(pc, value):
return ((value - pc)) & 0xff

def value_to_int(value):
# convert value to int from various bases
if isinstance(value, str):
if value[1:3].lower() == '0x':
return int(value, 16)
elif value[1:3].lower == '0c':
return int(value, 8)
elif value[1:3].lower() == '0b':
return int(value, 2)
else:
print('Invalid value.')
elif isinstance(value, int):
return value
# else:
#     raise ValueError("Expected numerical value, got: {0}".format(value))

def register_to_int(reg):
if isinstance(reg, str):
if reg.startswith('@R'):
return int(reg[2:], 16)
elif reg.startswith('R') and reg.endswith(','):
return int(reg[1:-1], 16)
elif reg.startswith('R'):
return int(reg[1:])
else:
raise ValueError("Cannot parse the register: '{0}'".format(reg))

opcode_table = {
# TODO: Get DATA implemented

'HALT': {
'implicit': ['0x00'],
},
'BRA': {
'immediate': ['0x01', VALUE_L],
'offset': ['0x01', OFFSET],
},
'BRC': {
'immediate': ['0x02', VALUE_L],
'offset': ['0x02', OFFSET],
},
'BRZ': {
'immediate': ['0x03', VALUE_L],
'offset': ['0x03', OFFSET],
},
'BRN': {
'immediate': ['0x04', VALUE_L],
'offset': ['0x04', OFFSET],
},
'BRV': {
'immediate': ['0x05', VALUE_L],
'offset': ['0x05', OFFSET],
},
'BSR': {
'immediate': ['0x06', VALUE_L, VALUE_H],
'offset': ['0x06', LABEL_L, LABEL_H],
},
'RTS': {
'implicit': ['0x07'],
},

'SET': {
'direct': ['0x08', VALUE_L, VALUE_H],
},
'LD': {
'register': ['0x10'],
'indirect': ['0x20'],
},
'ST': {
'register': ['0x18'],
'indirect': ['0x28'],
},
'LDD': {
'indirect': ['0x30'],
},
'STD': {
'indirect': ['0x38'],
},
'POP': {
'indirect': ['0x40'],
},
'STP': {
'indirect': ['0x48'],
},
'ADD': {
'register': ['0x50'],
},
'SUB': {
'register': ['0x58'],
},
'MUL': {
'register': ['0x60'],
},
'DIV': {
'register': ['0x68'],
},
'AND': {
'register': ['0x70'],
},
'OR': {
'register': ['0x78'],
},
'XOR': {
'register': ['0x80'],
},
'NOT': {
'register': ['0x88'],
},
'SHL': {
'register': ['0x90'],
},
'SHR': {
'register': ['0x98'],
},
'ROL': {
'register': ['0xA0'],
},
'ROR': {
'register': ['0xA8'],
},
'POPD': {
'indirect': ['0xE0'],
},
'CPR': {
'register': ['0xE8'],
},
'INC': {
'register': ['0xF0'],
},
'DEC': {
'register': ['0xF8'],
},
}

def parse_value(val: str) -> str:
""" Try to parse a value, which can be an integer
in base 2, 8, 10, 16 or a register.

Returns: A string representation of the integer
value in base 10, or empty string if the value
cannot be converted to an integer.
On Error: return original value.
"""
# convert value to int from various bases
if isinstance(val, str):
# Get any possible prefix
val = val.strip(' ')
base = val[0:2]
if val.isnumeric():
return str(int(val, 10))
if base == '0x':
return str(int(val, 16))
elif base == '0c':
return str(int(val, 8))
elif base == '0b':
return str(int(val, 2))
elif val.__contains__('R'):
return ''
else:
return val

def parse_address_mode(fields: list) -> str:
""" Example inputs:
['HALT']                    : implicit mode
['BRA', '0x1F']             : immediate mode
[LD, R2]                    : register
[STD, @R3]                  : indirect
['SET', 'R0,', '0xFFFE']    : direct
['BYTE' '0x1f']             : immediate
['WORD' '0x3BFC']           : immediate
['STRING' 'This is a test'] : immediate
['BRC' 'end']               : offset
"""
numfields = len(fields)
if numfields == 1:
return 'implicit'
elif numfields == 2:
# if fields[0] in directives:
#     return 'immediate'
if fields[1].__contains__('@R'):
return 'indirect'
elif fields[1].__contains__('R') and fields[1].__contains__(','):
return 'direct'
elif fields[1].__contains__('R'):
return 'register'
elif parse_value(fields[1]).isnumeric():
return 'immediate'
elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
return 'offset'
elif numfields == 3 and parse_value(fields[2]).isnumeric():
return 'direct'
elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
return 'direct'
else:
raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))

def parse_register(field: str) -> str:
""" Examples:
R2
@R3
R2, 0x1F
"""
register = field.upper()
if register.__contains__('R'):
pos = field.find('R')
register = field[pos + 1:pos + 2]
return register

def parse_opcode(line: str, symbols) -> tuple:
""" Break the line into its constituent parts.
Locate any labels and store their definition.
Then locate the instruction's addressing mode,
and stores it in mode. Calculate the instruction's
actual opcode value.

Returns: tuple(value, register, mode, objcode) where
value is a string to be evaluated in the second pass.
register is the register value or empty string if no
register exists. "mode" is the addressing mode and
objcode is a dict containing the base opcode value,
and a list of functions needed to process the
instruction
"""
fields = line.split(None, 1)
nofields = len(fields)
if nofields > 1:
extra = fields[1].split(',')
if len(extra) > 1:
fields[1] = extra[0]
fields.extend(extra[1:])
nofields = len(fields)
mnemonic = fields[0]
register = ''
value = ''

""" Examples:
HALT
BRA 0x1F
SET R0, 0xFFFE
LD R2
STD @R3
"""
# Get register and value
if nofields > 1:
register = parse_register(fields[1])

if register == '' and nofields > 1:
value = parse_value(fields[1])

if nofields > 2:
register = parse_register(fields[1])
value = parse_value(fields[2])

# Get address mode
mode = parse_address_mode(fields)

# Get all addressing modes for this mnemonic
opcodemodes = opcode_table.get(mnemonic)
if not opcodemodes:
raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

# Get the address mode used in the instruction
objcode = opcodemodes.get(mode)
if not objcode:
raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

return value, register, mode, mnemonic, list(objcode)

def parse_lines(lines, symbols):
""" Determine mnemonic, register, value, and address mode"""
for lineno, line in enumerate(lines, 1):
# Handle labels
label, *colon, statement = line.rpartition(":")
try:
# parse the line into ir (intermediate representation)
data = lineno, label, parse_opcode(statement, symbols) if statement \
else (None, None, None, None, None)
yield data
except AssemblyError as e:
print("{0:4d} : Error : {1}".format(lineno, e))

def assemble(lines, lc=0):
"""Breaks the line into its constituent parts.
it then locates the instruction's addressing mode,
stores it in mode, and calculates the instruction's
actual opcode value and length. """
objcode = []
symbols = {}

# Pass 1 : Parse instructions and create intermediate code
for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
# Try to evaluate numeric labels and set the lc (location counter)
if label:
try:
lc = int(eval(label, symbols))
except (ValueError, NameError):
symbols[label] = lc

# Store the resulting object code for later
# expansion and adjust the location counter lc.
if icode:
objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
lc += len(icode)

# Second pass -- Assembly
execode = []
for lineno, pc, value, register, mode, mnemonic, icode in objcode:
# Evaluate the string
try:
symbols[pc] = pc

if value is not '':
realvalue = eval(value, symbols)

if isinstance(realvalue, str):
realvalue = ord(realvalue) & 0xff

if not isinstance(realvalue, int):
raise TypeError("Integer expected in {0}".format(value))
except Exception as e:
print("{0:4d} : Error : {1}".format(lineno, e), file=sys.stderr)
realvalue = 0

# Encode the instruction. If register is present,
# it must be added to the base opcode value stored
# in icode[0]
opcode = int(icode[0], 16)
if register != '':
opcode += int(register, 16)
encode = [op(pc, realvalue) if isinstance(op, Callable) else op for op in icode]
encode[0] = opcode

execode.append((lineno, pc, encode))
return execode

def replace_register_names(line):
""" Replace register names with 'R' + register index.
Example: ACC becomes R0. STATUS becomes R6."""
line = line.replace('ACC', 'R0')
line = line.replace('RETSTACK', 'R4')
line = line.replace('COMP', 'R5')
line = line.replace('STATUS', 'R6')
line = line.replace('PC', 'R7')
return line

def strip_lines(lines):
""" Takes a sequence of lines and strips comments and blank lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
line = replace_register_names(line)
yield line

def write_rom_code(execode):
last_pc = 0
rom = []

for lineno, pc, ecode in execode:
if last_pc > pc:
pass
elif last_pc < pc:
while last_pc < pc:
rom.append('0x00')
last_pc += 1
else:
for byte in ecode:
byte = '0x%.2x' % int(byte)
rom.append(byte)
last_pc += 1

while last_pc % 16 != 0:
rom.append('0x00')
last_pc += 1

return rom

def rom2hexfile(rom):
col = 0
row = 0

max_cols = 16
max_rows = int(len(rom) / max_cols)
idx = 0
filename = 'a.hex'

# Generate output filename
# If none given use input filename's
# path and base name with a '.hex'
# extension appended.
if len(sys.argv) > 2:
# we should have an outfile name
filename = sys.argv[2]
else:
file = sys.argv[1]
file = file.split('.')
head, tail = os.path.split(sys.argv[1])
parts = tail.split('.')
filename = os.path.join(head, parts[0] + '.hex')

with open(filename, 'w') as ofh:
for b in rom:
if col == 0:
ofh.write('{:02x} {:02x} '.format((idx & 0xff), (idx & 0xff00) >> 8))
ofh.write('{:02x} '.format(int(b, 16)))
else:
ofh.write('{:02x} '.format(int(b, 16)))
col += 1
if col >= max_cols:
row += 1
col = 0
ofh.write('\n')
if row >= max_rows:
ofh.write('\n')
break;
idx += 1
ofh.close()

def main():

if len(sys.argv) >= 2:
filename = sys.argv[1]

# Remove comments and blank lines
lines = strip_lines(open(filename))

if 1:
rom = write_rom_code(assemble(lines))
rom2hexfile(rom)
print(rom)

if __name__ == '__main__':
import sys, os
main()```

OK, that it for today. We now have a working assembler. However, there are a few features I would like to add. So next time we will add a few simple assembler directives and complete the assembler. Then we will develop a few simple assembly language programs for the Sweet16gp.

Until next time, Keep coding!

## Sweet16-GP Assembler – Part 2

0 Comments
This entry is part 7 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Last time we left our assembler with only two functions. The strip_lines() function which removes comments and blank lines and the rename_registers() function which replaces the register’s friendly names with the cononical names i.e. Rx where x is a number between 0 and 7 (the register’s index value into the register file).

With these to pre-processing functions out of the way we are ready to start processing our assembly code.  We will use a two-pass method of assembly. The first pass will break each line into its constituent parts and then locate the instruction’s addressing mode, stores it in a field called mode, and calculate the instruction’s actual opcode value and length. In addition it will store a labels in a symbol table to record their value for later use in the second pass or the assembler.

Our main assembler method will take our clean lines, one at a time, and parse them as described above. We will need a table of instructions that includes details about each instruct including it’s addressing mode(s), opcode(s) and length. You may be wondering why the “(s)” above. This is because some of our instructions have multiple addressing modes and therefore multipple opcodes and perhaps even multiple instruction lengths. We will use our table to disambiguate the instruction format into the proper values.

We will implement our opcode_table as a dict. Each entry will be  indexed (key) by the instruction mnemonic.  The value for each key in the table will be a dict indexed by the addressing mode. Each addressing mode will then have a list of values containing the instruction’s opcode. If the opcode requires a one or two by value, they will be provided as VALUE_L and VALUE_H in the list. This description sounds much more complex than it really is. Refere to the code for our table below and re-read the above paragraph. It should all become clear.

```opcode_table = {
'HALT': {
'implicit': ['0x00'],
},
'BRA': {
'immediate': ['0x01', VALUE_L],
'offset': ['0x01', OFFSET],
},
'BRC': {
'immediate': ['0x02', VALUE_L],
'offset': ['0x02', OFFSET],
},
'BRZ': {
'immediate': ['0x03', VALUE_L],
'offset': ['0x03', OFFSET],
},
'BRN': {
'immediate': ['0x04', VALUE_L],
'offset': ['0x04', OFFSET],
},
'BRV': {
'immediate': ['0x05', VALUE_L],
'offset': ['0x05', OFFSET],
},
'BSR': {
'immediate': ['0x06', VALUE_L, VALUE_H],
'offset': ['0x06', LABEL_L, LABEL_H],
},
'RTS': {
'implicit': ['0x07'],
},

'SET': {
'direct': ['0x08', VALUE_L, VALUE_H],
},
'LD': {
'register': ['0x10'],
'indirect': ['0x20'],
},
'ST': {
'register': ['0x18'],
'indirect': ['0x28'],
},
'LDD': {
'indirect': ['0x30'],
},
'STD': {
'indirect': ['0x38'],
},
'POP': {
'indirect': ['0x40'],
},
'STP': {
'indirect': ['0x48'],
},
'ADD': {
'register': ['0x50'],
},
'SUB': {
'register': ['0x58'],
},
'MUL': {
'register': ['0x60'],
},
'DIV': {
'register': ['0x68'],
},
'AND': {
'register': ['0x70'],
},
'OR': {
'register': ['0x78'],
},
'XOR': {
'register': ['0x80'],
},
'NOT': {
'register': ['0x88'],
},
'SHL': {
'register': ['0x90'],
},
'SHR': {
'register': ['0x98'],
},
'ROL': {
'register': ['0xA0'],
},
'ROR': {
'register': ['0xA8'],
},
'POPD': {
'indirect': ['0xE0'],
},
'CPR': {
'register': ['0xE8'],
},
'INC': {
'register': ['0xF0'],
},
'DEC': {
'register': ['0xF8'],
},
}
```

As you can see, our instructions can fall into one of four addressing modes. These include implicit (when the instruction itself indicates the addressing mode),  immediate (when the value needed by the instruction directly follows it), indirect (when the register specified contains the value needed to complete the instructions, and register (when the instruction operates directly on the register specified). To tell the truth, these could be further broken down but that would only complicate things.

OK, now we have our opcode_table we are going to start writing our parser. Our parser is a little bit unorthodox. Since our input language is so constrained we can forego all the formal theory and much of the discipline or parser development.

I do believe however, that any developer worth their weight should write at least a simple parser at some point. Understanding even a very simple parser expands your understanding of how your tools work whether you’re a web developer, AI master, ETL programmer, or Embedded Systems developer. If you call yourself a Software Engineer or Programmer and you have written at least a simple parser. compiler, or interpreter, then you owe it to yourself to do so! To get you started I recomend DR. Jack Crenshaw’s “Let’s Build A Compiler” and Ruslans’s “Let’s Build A Simple Interpreter” blog serries. If you’re a seasoned developer checkout the book “Language Implementation Patterns”. For those mathematically oriented, checkout the Dragon Book

So what I am going to do is simply plow forward. We will tackle one issue at a time and add functions to handle each issue as it arises. Note that this technique wont work if you’re trying to write a high level language compiler or interpreter. There you will need all the formal methods and a have good design before writing code. What we are going for here is a quick and simple boot-strap assembler. We can write better tools later. For now, we just need something to get us up and running.

OK, the first thing we are going to need is some set of methods to break up the line of source code we feed to our assembler into it’s various fields. We will create an entry point function called assemble which will do our heavy lifting by calling other functions.

We will need to produce some data about the current source line. This will include any label, register, addressing mode, and value. Additionally, we may also want to keep track of the original instruction mnemonic and any intermediate code we produce. So we will need a supporting function to handle this. We will also need some storage for any lables or symbols we encounter during parsing.

The code below is our entry point. It will be responsible for handling our two apasses over our code and  will call the functions that do the heavy lifting.

```def assemble(lines, lc=0):
"""Breaks the line into its constituent parts.
it then locates the instruction's addressing mode,
stores it in mode, and calculates the instruction's
actual opcode value and length. """
objcode = []
symbols = {}

# Pass 1 : Parse instructions and create intermediate code
for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
# Try to evaluate numeric labels and set the lc (location counter)
if label:
try:
lc = int(eval(label, symbols))
except (ValueError, NameError):
symbols[label] = lc

# Store the resulting object code for later expansion
if icode:
objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
lc += len(icode
```

Note the call to parse_lines() we will write this function next. But, let’s step back and think about what we need it to do…

Given a line of code like:

`start:    SET   R1, 0xFA`

We can see we need to break up this line into it’s smaller parts. Luckily, this is pretty easy to do. Also, since each line will follow a similar pattern for the fields, we can easily deduce what each field contains.

```# Exception used for errors
class AssemblyError(Exception):
pass

def parse_lines(lines, symbols):
""" Determine line number, mnemonic, register, value, and address mode"""
for lineno, line in enumerate(lines, 1):
# Handle labels
label, *colon, statement = line.rpartition(":")
try:
# parse the line into ir (intermediate representation)
data = lineno, label, parse_opcode(statement, symbols) if statement else (None, None, None, None, None)
yield data
except AssemblyError as e:
print("{0:4d} : Error : {1}".format(lineno, e))```

Refering to the code above, we use the enumerate function to keep track of our line numbers. One drawback to doing this approach is that our comment and blank lines wont be counted. Something we could fix by tracking line numbers in the strip_lines() function. But for now, we will do it this way.

Enumerate returns to us an integer value for the line number (lineno) and the line of source code. Next, we use rpartition(“:”) on on the source line to break it into three parts. These parts are return in a tuple containing the part before the seperator (colon in our case, which ends a label declaration), the seperator, and the part after the seperator. If any part is not found (for example, if there is no colon in the source line) then that part returns an empty string value. So if our line does not contains a colon, we will get an empty string for the label and for the seperator followed by the initial line of code which we save in the statement variable.

Once we have gotten any label, we need to take our statement and further parse it into its various parts. If it turns out that our statement is empty, we need to return a default tuple of ‘None’ values otherwise, we make a call to parse_opcode() passing in our statement and our symbol table so that any symbols found in our source line can be handled. If this all goes horribly wrong, we throw an exception and print the error message. We defined a simple exception class above for this purpose.

Our next task is to implement a function to parse the line into opcodes. We called this function pasre_opcode() above. The implementation is shown below:

```def parse_opcode(line: str, symbols) -> tuple:
""" Break the line into its constituent parts.
Locate any labels and store their definition.
Then locate the instruction's addressing mode,
and stores it in mode. Calculate the instruction's
actual opcode value.

Returns: tuple(value, register, mode, objcode) where
value is a string to be evaluated in the second pass.
register is the register value or empty string if no
register exists. "mode" is the addressing mode and
objcode is a dict containing the base opcode value,
and a list of functions needed to process the
instruction
"""
fields = line.split(None, 1)
nofields = len(fields)
if nofields > 1:
extra = fields[1].split(',')
if len(extra) > 1:
fields[1] = extra[0]
fields.extend(extra[1:])
nofields = len(fields)
mnemonic = fields[0]
register = ''
value = ''

""" Examples:
HALT
BRA 0x1F
SET R0, 0xFFFE
LD R2
STD @R3
"""
# Get register and value
if nofields > 1:
register = parse_register(fields[1])

if register == '' and nofields > 1:
value = parse_value(fields[1])

if nofields > 2:
register = parse_register(fields[1])
value = parse_value(fields[2])

# Get address mode
mode = parse_address_mode(fields)

# Get all addressing modes for this mnemonic
opcodemodes = opcode_table.get(mnemonic)
if not opcodemodes:
raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

# Get the address mode used in the instruction
objcode = opcodemodes.get(mode)
if not objcode:
raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

return value, register, mode, mnemonic, list(objcode)
```

There is a lot going on here so let’s unpack it. First, we use the split() method to split the statement into two portions. The first parameter to split() gives the character to split on and the second parameter is used to limit the number of splits. What we are doing here is seperating the instruction mnemonic from the rest of the statement. Since some instructions (i.e.: HALT) only have the mnemonic we need to check the number of fields we get back. If it’s only one, then we have an instruction like HALT or RTS.

However, if we have more than one field returned from the split we need to further decompose the line. We now have a mnemonic in fields[0] and the remainder of the line in fields[1]. We now know we may have something like Rx, @Rx, or Rx, 0xFA following the instruction mnemonic. So the next move is to tray and seperate the remaining portion of the statement on the comma seperator. If the comma is found, one of two cases will be returned i.e.: (register, value) or @register, value). These are stored in the extra variable. We then add the extra fields to the fields variable and assign mnemonic the value in fields[0] which contains the instruction mnemonic.

Next, using the value of ‘nofields’ which indicate what we should expect next, we sort out the remaining fields and store their values into their perspective variables. In each case, we call parse functions which perform a specific sub-task.

Our first sub-task is to locate any register in the fields. At this point we know if a register exists it should be in fields[1]. So we pass this value to the parse_register() function shown below:

```def parse_register(field: str) -> str:
""" Examples:
R2
@R3
R2, 0x1F
"""
register = field.upper()
if register.__contains__('R'):
pos = field.find('R')
register = field[pos + 1:pos + 2]
return register
```

The first thing we do is convert the register value to upper case. Then we check if the string contains ‘R’ and return the character following the ‘R’ which should be the register index value (a single digit between 0 and 7 inclusively). We then return that value to the caller.

It’s possible that we didn’t have a register. In this case, there may be a value in the fields[1] position. So we check if the call to parse_register() returned a value of an empty string. If the latter was returned, we try and parse a value using the function parse_value() passing in fields[1] as a parameter. This situation occurs with the branch instructions where the mnemonic is directly followed by the offset value for the jump.

```def parse_value(val: str) -> str:
""" Try to parse a value, which can be an integer
in base 2, 8, 10, 16 or a register.

Returns: A string representation of the integer
value in base 10, or empty string if the value
cannot be converted to an integer.
On Error: return original value.
"""
# convert value to int from various bases
if isinstance(val, str):
# Get any possible prefix
val = val.strip(' ')
base = val[0:2]
if val.isnumeric():
return str(int(val, 10))
if base == '0x':
return str(int(val, 16))
elif base == '0c':
return str(int(val, 8))
elif base == '0b':
return str(int(val, 2))
elif val.__contains__('R'):
return ''
else:
return val
```

As you can see above the parse_value() function must handle many types of values. It first strips off any prefix to the value and tests if this matches any of the supported numerical base indicators. If a match is found the value is converted to an integer and then cast into a string and returned to the caller.

Next, we call parse_address_mode() passing in all the fields as we may need more than one to identify the addressing mode. The parse_address_mode() function can only be understood in the context of the opcode_table. So refer to the table as you gork this code:

```def parse_address_mode(fields: list) -> str:
""" Example inputs:
['HALT']                    : implicit mode
['BRA', '0x1F']             : immediate mode
[LD, R2]                    : register
[STD, @R3]                  : indirect
['SET', 'R0,', '0xFFFE']    : direct
['BRC' 'end']               : offset
"""
numfields = len(fields)
if numfields == 1:
return 'implicit'
elif numfields == 2:
if fields[0] in directives:
return 'immediate'
if fields[1].__contains__('@R'):
return 'indirect'
elif fields[1].__contains__('R') and fields[1].__contains__(','):
return 'direct'
elif fields[1].__contains__('R'):
return 'register'
elif parse_value(fields[1]).isnumeric():
return 'immediate'
elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
return 'offset'
elif numfields == 3 and parse_value(fields[2]).isnumeric():
return 'direct'
elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
return 'direct'
else:
raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))
```

The parse_address_mode() function uses the data contained in the fields to sort out the addressing mode of the instruction and return it to the caller, parse_opcode() which then stores this value in the mode variable.

Next we use the mnemonic value to get the instruction data from the opcode_table and then use the mode variable to get the proper opcode value and instruction format.

If all has gone well, we return a tuple containing the value, register, mode, mnemonic, variables and the object code taken from the opcode_table.

At this poiont our first pass is complete. We have all the data necessary to assemble our instruction code with the possible exception of any forward declared symbol.

We have one last issue before we can try this out. Our opcode_table has method names in some of the fields. We need to implemnt these helper methods:

All of these methods are pretty strainght forward. They simply take the value from the source code and separe it out into high bytes and low bytes in the case of VALUE_H, VALUE_L, LABEL_H, and LABEL_L. The OFFSET function calculates the offset from the current position.

```# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
val = value_to_int(value) & 0xff
return str(val & 0xff)

def VALUE_H(pc: int, value: int) -> int:
val = (value_to_int(value) & 0xff00) >> 8
return str(val & 0xff)

def LABEL_L(pc, value):
print(f'LABEL_L PC: {pc}, value: {value}')
return (value) & 0xff

def LABEL_H(pc, value):
print(f'LABEL_H PC: {pc}, value: {value}')
return ((value) & 0xff00) >> 8

def OFFSET(pc, value):
print(f'OFFSET: {value}')
return ((value - pc)) & 0xff

def value_to_int(value):
# convert value to int from various bases
if isinstance(value, str):
if value[1:3].lower() == '0x':
return int(value, 16)
elif value[1:3].lower == '0c':
return int(value, 8)
elif value[1:3].lower() == '0b':
return int(value, 2)
else:
print('Invalid value.')
elif isinstance(value, int):
return value
# else:
#     raise ValueError("Expected numerical value, got: {0}".format(value))

def register_to_int(reg):
if isinstance(reg, str):
if reg.startswith('@R'):
return int(reg[2:], 16)
elif reg.startswith('R') and reg.endswith(','):
return int(reg[1:-1], 16)
elif reg.startswith('R'):
return int(reg[1:])
else:
raise ValueError("Cannot parse the register: '{0}'".format(reg))
```

PK, I think we’re ready to try this! Below is the final code. Note I made some adjustments to the code from the last installment so we could test what we’ve done so far.

```"""Sweet16-GP Assembler"""

# Exception used for errors
class AssemblyError(Exception):
pass

# Functions used in the creation of object code (used in the table below)
def VALUE_L(pc: int, value):
# print(f'VALUE_L value: {value}')
#val = value_to_int(value) & 0xff
#print(f'VALUE_L val: {val}')
#return str(val & 0xff)
pass

def VALUE_H(pc: int, value: int) -> int:
val = (value_to_int(value) & 0xff00) >> 8
return str(val & 0xff)

def LABEL_L(pc, value):
return (value) & 0xff

def LABEL_H(pc, value):
return ((value) & 0xff00) >> 8

def OFFSET(pc, value):
return ((value - pc)) & 0xff

def value_to_int(value):
# convert value to int from various bases
if isinstance(value, str):
if value[1:3].lower() == '0x':
return int(value, 16)
elif value[1:3].lower == '0c':
return int(value, 8)
elif value[1:3].lower() == '0b':
return int(value, 2)
else:
print('Invalid value.')
elif isinstance(value, int):
return value
# else:
#     raise ValueError("Expected numerical value, got: {0}".format(value))

def register_to_int(reg):
if isinstance(reg, str):
if reg.startswith('@R'):
return int(reg[2:], 16)
elif reg.startswith('R') and reg.endswith(','):
return int(reg[1:-1], 16)
elif reg.startswith('R'):
return int(reg[1:])
else:
raise ValueError("Cannot parse the register: '{0}'".format(reg))

opcode_table = {
'HALT': {
'implicit': ['0x00'],
},
'BRA': {
'immediate': ['0x01', VALUE_L],
'offset': ['0x01', OFFSET],
},
'BRC': {
'immediate': ['0x02', VALUE_L],
'offset': ['0x02', OFFSET],
},
'BRZ': {
'immediate': ['0x03', VALUE_L],
'offset': ['0x03', OFFSET],
},
'BRN': {
'immediate': ['0x04', VALUE_L],
'offset': ['0x04', OFFSET],
},
'BRV': {
'immediate': ['0x05', VALUE_L],
'offset': ['0x05', OFFSET],
},
'BSR': {
'immediate': ['0x06', VALUE_L, VALUE_H],
'offset': ['0x06', LABEL_L, LABEL_H],
},
'RTS': {
'implicit': ['0x07'],
},

'SET': {
'direct': ['0x08', VALUE_L, VALUE_H],
},
'LD': {
'register': ['0x10'],
'indirect': ['0x20'],
},
'ST': {
'register': ['0x18'],
'indirect': ['0x28'],
},
'LDD': {
'indirect': ['0x30'],
},
'STD': {
'indirect': ['0x38'],
},
'POP': {
'indirect': ['0x40'],
},
'STP': {
'indirect': ['0x48'],
},
'ADD': {
'register': ['0x50'],
},
'SUB': {
'register': ['0x58'],
},
'MUL': {
'register': ['0x60'],
},
'DIV': {
'register': ['0x68'],
},
'AND': {
'register': ['0x70'],
},
'OR': {
'register': ['0x78'],
},
'XOR': {
'register': ['0x80'],
},
'NOT': {
'register': ['0x88'],
},
'SHL': {
'register': ['0x90'],
},
'SHR': {
'register': ['0x98'],
},
'ROL': {
'register': ['0xA0'],
},
'ROR': {
'register': ['0xA8'],
},
'POPD': {
'indirect': ['0xE0'],
},
'CPR': {
'register': ['0xE8'],
},
'INC': {
'register': ['0xF0'],
},
'DEC': {
'register': ['0xF8'],
},
}

def parse_value(val: str) -> str:
""" Try to parse a value, which can be an integer
in base 2, 8, 10, 16 or a register.

Returns: A string representation of the integer
value in base 10, or empty string if the value
cannot be converted to an integer.
On Error: return original value.
"""
# convert value to int from various bases
if isinstance(val, str):
# Get any possible prefix
val = val.strip(' ')
base = val[0:2]
if val.isnumeric():
return str(int(val, 10))
if base == '0x':
return str(int(val, 16))
elif base == '0c':
return str(int(val, 8))
elif base == '0b':
return str(int(val, 2))
elif val.__contains__('R'):
return ''
else:
return val

def parse_address_mode(fields: list) -> str:
""" Example inputs:
['HALT']                    : implicit mode
['BRA', '0x1F']             : immediate mode
[LD, R2]                    : register
[STD, @R3]                  : indirect
['SET', 'R0,', '0xFFFE']    : direct
['BYTE' '0x1f']             : immediate
['WORD' '0x3BFC']           : immediate
['STRING' 'This is a test'] : immediate
['BRC' 'end']               : offset
"""
numfields = len(fields)
if numfields == 1:
return 'implicit'
elif numfields == 2:
# if fields[0] in directives:
#     return 'immediate'
if fields[1].__contains__('@R'):
return 'indirect'
elif fields[1].__contains__('R') and fields[1].__contains__(','):
return 'direct'
elif fields[1].__contains__('R'):
return 'register'
elif parse_value(fields[1]).isnumeric():
return 'immediate'
elif parse_value(fields[1]).isalnum() or parse_value(fields[1]).isalpha():
return 'offset'
elif numfields == 3 and parse_value(fields[2]).isnumeric():
return 'direct'
elif numfields == 3 and '(' in fields[2] and ')' in fields[2]:
return 'direct'
else:
raise ValueError('Expected numeric value got: {expected}'.format(expected=fields[2]))

def parse_register(field: str) -> str:
""" Examples:
R2
@R3
R2, 0x1F
"""
register = field.upper()
if register.__contains__('R'):
pos = field.find('R')
register = field[pos + 1:pos + 2]
return register

def parse_opcode(line: str, symbols) -> tuple:
""" Break the line into its constituent parts.
Locate any labels and store their definition.
Then locate the instruction's addressing mode,
and stores it in mode. Calculate the instruction's
actual opcode value.

Returns: tuple(value, register, mode, objcode) where
value is a string to be evaluated in the second pass.
register is the register value or empty string if no
register exists. "mode" is the addressing mode and
objcode is a dict containing the base opcode value,
and a list of functions needed to process the
instruction
"""
fields = line.split(None, 1)
nofields = len(fields)
if nofields > 1:
extra = fields[1].split(',')
if len(extra) > 1:
fields[1] = extra[0]
fields.extend(extra[1:])
nofields = len(fields)
mnemonic = fields[0]
register = ''
value = ''

""" Examples:
HALT
BRA 0x1F
SET R0, 0xFFFE
LD R2
STD @R3
"""
# Get register and value
if nofields > 1:
register = parse_register(fields[1])

if register == '' and nofields > 1:
value = parse_value(fields[1])

if nofields > 2:
register = parse_register(fields[1])
value = parse_value(fields[2])

# Get address mode
mode = parse_address_mode(fields)

# Get all addressing modes for this mnemonic
opcodemodes = opcode_table.get(mnemonic)
if not opcodemodes:
raise AssemblyError("Unknown opcode '{}'".format(mnemonic))

# Get the address mode used in the instruction
objcode = opcodemodes.get(mode)
if not objcode:
raise AssemblyError("Invalid addressing mode '{0}' for {1}".format(mode, mnemonic))

return value, register, mode, mnemonic, list(objcode)

def parse_lines(lines, symbols):
""" Determine mnemonic, register, value, and address mode"""
for lineno, line in enumerate(lines, 1):
# Handle labels
label, *colon, statement = line.rpartition(":")
try:
# parse the line into ir (intermediate representation)
data = lineno, label, parse_opcode(statement, symbols) if statement \
else (None, None, None, None, None)
yield data
except AssemblyError as e:
print("{0:4d} : Error : {1}".format(lineno, e))

def assemble(lines, lc=0):
"""Breaks the line into its constituent parts.
it then locates the instruction's addressing mode,
stores it in mode, and calculates the instruction's
actual opcode value and length. """
objcode = []
symbols = {}

# Pass 1 : Parse instructions and create intermediate code
for lineno, label, (value, register, mode, mnemonic, icode) in parse_lines(lines, symbols):
# Try to evaluate numeric labels and set the lc (location counter)
if label:
try:
lc = int(eval(label, symbols))
except (ValueError, NameError):
symbols[label] = lc

# Store the resulting object code for later
# expansion and adjust the location counter lc.
if icode:
objcode.append((lineno, lc, value, register, mode, mnemonic, icode))
lc += len(icode)
return objcode

def replace_register_names(line):
""" Replace register names with 'R' + register index.
Example: ACC becomes R0. STATUS becomes R6."""
line = line.replace('ACC', 'R0')
line = line.replace('RETSTACK', 'R4')
line = line.replace('COMP', 'R5')
line = line.replace('STATUS', 'R6')
line = line.replace('PC', 'R7')
return line

def strip_lines(lines):
""" Takes a sequence of lines and strips comments and blank lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
line = replace_register_names(line)
yield line

if __name__ == '__main__':
text = ';This is a comment\n' \
'start:  SET  ACC, 0xFFDE  ; This is also a comment\n' \
'end:    HALT  ; end of program'
lines = text.split('\n')

# Remove comments and blank lines
lines = strip_lines(lines)
print(assemble(lines))
```

If you run the above code you should get an output like the following:

```sweet16-articles-code/articles/part-07/assembler_02.py
[
(2, 0, '65502', '0', 'direct', 'SET', ['0x08', , ]),
(3, 3, '', '', 'implicit', 'HALT', ['0x00'])
]```

As you can see we have the intermediate representation for two instructions. All the info we will need to assemble these instructions into actual machine code. Reading the tuples the first value (2) is the line number from the source. The second value (0) is the location counter at the start of the instruction. The next value (65502) is the value provided in the instruction, followed by the register index (0), the addressing mode (direct), and the mnemonic. The final fields holds the opcode information from the opcode_table.  Try this out on several instructions and see that the results make sense. Write some unit test for the various functions and for the assemble as it stands.

This has been a long post. Next time we will tackle the second pass of our assembler. Until then, keep coding!

## Sweet16-GP Assembler

0 Comments
This entry is part 6 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

In my first post I showed you the instruction set of the Sweet16-GP. Within the instruction set presented we saw examples of the Sweet16-GP assembly language. However, we didn’t talk about the assembly language at that time. It’s now time to discuss the Sweet16-GP’s assembly language.

Statements: Statements in the Sweet16’s assembly language can be borken down into four basic parts; Label, Operation, Operand, and Comment. of all these parts only the Operation is required in all statements. Most statements however also require an operand. The operand can be a register, literal value or label reference. The label at the beginning of the statement is the label declaration.

Comments: The assembly language used with the Sweet16-GP allows comments. Comments begin with a semicolon and end at the end of the line.

Labels: Labels give an human readable name to the line. This name can then be used to reference the line at any point in the program. Labels must begin with a letter or underscore character and may contain alphanumeric and underscore character. Labels declarations must end with a colon. Label declarations occur when the label is located at the beginning of a line. Labels are referenced when they are not the first item in the line. If a label declaration occurs on a line by itself, it is associated with the following line.

Numbers: The assembler can accept numerical values in the following bases: 2, 8, 10, 16. These correspond to binary, octal, decimal, and hexadecimal. Binary values must begin with “0b”; a zero followed by a lower case “B”. Octal values must begin with “0c”, and hexadecimal values must begin with “0x”. Decimal numbers are written normally. Note that all values must be integers. Real numbers are not supported.

Here are some examples of numerical values:

• 0b11101011 – binary
• 0c47 – octal
• 0xfecd – hexadecimal
• 65535 – decimal
• 0fae – not allowed
• 34.92 – not allowed

Style: Assembly language code typically follows a simple coding style. Below is an example of a typical piece of Sweet16-GP assembly code:

```;
; Test for BRN
; On Entry:
; On Exit: ACC = 0x8000, R1 = 0x0010, R2 = 0x01, PC = 0x13, N and V flags Set
;
start:  SET     R0,     0x7FF0      ; Load ACC with 0xFFF0
SET     R2,     0x0001      ; Load R2 with 0x1F00
SET     R1,     0x0000      ; Load R1 with 0x00
SET     R3,     0x0000
calc:   ADD     R2                  ; ADD R2 to ACC and place results in ACC
INC     R1                  ; Count how many times we Add R2 to ACC (R0)
BRV     end                 ; Branch to HALT instruction
BRA     calc                ; Data block FB
end:    HALT                        ; STOP Execution```

As you can see, there is a comment block at the top that documents the code. It is suggested that each block of code be documented this way. Always include the conditions that are expected on entry into the code, and the exit state. List any modified registers, stack pointers, or memory locations that are left in an alter state by the program.

Declare your line labels at the beginning of the line. Make then human readable. If you need long labels place them above the line it should be associated with. Always end a label declaration with a colon. Here are some examples of label declarations:

• start:
• end:
• _loop_01:
• _cond_true:
• _cond_false:
• loop_iter:

Instructions should all be placed in a single column after the label declarations as show above. White space should be used to place the instruction operands in a single column after the instructions. Finally, comments should all line up into a single column. There is nothing in the assembler to enforce this columnization.  However, you’ll find your code much easier to read and groc if you follow these guidelines.

Notes: Many assemblers allow for the operands to be mathematical expressions. They also, typically offer special features like a characters that reference the program counter at the point in the program at which the special character is used. The Sweet16-GP assembler does not include such features. Perhaps a later version will add these feature. However, for this project I wanted to maintain simplicity. Lastly, the assembler also does not support macros or predefined routines.

Assemblers are unique in the parser/translator world in that their source (assembly) language has a nearly one-to-one relationship with its output language (machine code). This fact makes it simple to create basic assemblers using crude pattern matching parsers. Often regular expressions are employed to recognize the various language constructs.

The only difficulty encounter is usually label references. Since labels can be referenced before they are declared. This, however, can be dealt with by breaking the parser into two parts and allowing each part to separately scan the source code.

Our assembler will use a simple two-pass process in which the first pass (scan of the source code) creates a simple intermediate representation. The second pass will take the results of the first pass and convert it to actual machine code. We will also use a table of opcodes and addressing modes to help us build our intermediate code.

The first thing we will need is a simple framework with a line of code for testing. Let’s get that setup. Our assembler will expect our assembly code in an interrable form as it will work line-by-line. So we’ll need to pass our code as a list of lines of assembly.

```"""Sweet16-GP Assembler"""

if __name__ == '__main__':
text = """
;This is a comment
start:  SET  ACC, 0xFFDE  ; This is also a comment
end:    HALT  ; end of program
"""
lines = text.split('\n')
```

Next. to make parsing easier we will write a simple routine to remove comments from our assembly source code.

```def strip_lines(self, lines):
""" Takes a sequence of lines and strips comments and blank lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
yield line```

As you can see we simply loop over the lines in our text and if we find the semicolon character that begins a comment, we remove everything from the semicolon to the end of the line. Next, we call strip() on the line to remove all leading and trailing whitespace, then we yield the line to the caller.

If you’re not familiar with the yield keyword in python 3 I suggest you lookup python 3 generators for a detailed explanation. In a nutshell, however, yield basically stops the loop on each iteration and returns the currently processed line to the caller. When the caller returns back to the strip_lines() method, it will re-enter the method with the state that existed the last time yield was called. In other words, the method remembers where it left off in the loop and starts from there on subsequent calls. You can learn more about the yield keyword here.

It’s time to test this. Change the code to read:

```def strip_lines(lines):
""" Takes a sequence of lines and strips comments and blank lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
yield line

if __name__ == '__main__':
text = """
;This is a comment
start:  SET  ACC, 0xFFDE  ; This is also a comment
end:    HALT  ; end of program
"""
lines = text.split('\n')

# Remove comments and blank lines
for line in strip_lines(lines):
print(line)```

If you run this code now you should get something like the following output:

```start:  SET  ACC, 0xFFDE
end:    HALT

Process finished with exit code 0```

Not bad for the few minutes it took us to write that. We can now take in assembly code and clean it of blank lines and comments.

Now we have one more step of preprocessing before we can move on to assembling our code. Recall that some registers can be referred to by name. For example, R0 is also known as the accumulator and is named ACC. This naming is great for reading and writing code but we can make assembly easier if we replace these friendly names with the canonical register name Rx where x is the numerical index into the register file. The best place to do this is within our strip_lines() method. However, to keep the code modular and easy to understand we’ll write a subroutine for this and make a call to it in strip_lines().

```def replace_register_names(self, line):
""" Replace register names with 'R' + register index.
Example: ACC becomes R0. STATUS becomes R6."""
line = line.replace('ACC', 'R0')
line = line.replace('RETSTACK', 'R4')
line = line.replace('COMP', 'R5')
line = line.replace('STATUS', 'R6')
line = line.replace('PC', 'R7')
return line```

As you can see we simply take a line of text and replace any register names found in the line with the R-value. Now let’s place the call to replace_register_names() in strip_lines().

```def strip_lines(self, lines):
""" Takes a sequence of lines and strips comments and blank lines.
Also, calls replace_register_names on stripped lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
line = replace_register_names(line)
yield line```

Our new code now looks like this:

```"""Sweet16-GP Assembler"""

def replace_register_names(line):
""" Replace register names with 'R' + register index.
Example: ACC becomes R0. STATUS becomes R6."""
line = line.replace('ACC', 'R0')
line = line.replace('RETSTACK', 'R4')
line = line.replace('COMP', 'R5')
line = line.replace('STATUS', 'R6')
line = line.replace('PC', 'R7')
return line

def strip_lines(lines):
""" Takes a sequence of lines and strips comments and blank lines."""
for line in lines:
comment_index = line.find(";")
if comment_index >= 0:
line = line[:comment_index]
line = line.strip()
line = replace_register_names(line)
yield line

if __name__ == '__main__':
text = """
;This is a comment
start:  SET  ACC, 0xFFDE  ; This is also a comment
end:    HALT  ; end of program
"""
lines = text.split('\n')```

If you run the program again, you will see that the register name ACC has been replaced by R0. Exactly what we want. Your output should look like the following:

```start:  SET  R0, 0xFFDE
end:    HALT

Process finished with exit code 0```

That’s it for today, next time we are going to write a method to take our cleaned code and parse it line by line.

## Sweet16 CPU Emulator

0 Comments

Part 4

I hope you played around with the emulator and tried to implement some of the CPU instructions yourself. As promised, in this post we will continue our work on the emulator. First, let me show you my implementation of decode_register_type_instr().

```def decode_register_instr(self, opcode: int, reg_id: int):
if opcode == 0x08:
self.exec_set(opcode, reg_id)
elif opcode == 0x10:
self.exec_ld(opcode, reg_id)
elif opcode == 0x18:
self.exec_st(opcode, reg_id)
elif opcode == 0x20:
self.exec_ld_ea(opcode, reg_id)
elif opcode == 0x28:
self.exec_st_ea(opcode, reg_id)
elif opcode == 0x30:
self.exec_ldd_ea(opcode, reg_id)
elif opcode == 0x38:
self.exec_std_ea(opcode, reg_id)
elif opcode == 0x40:
self.exec_pop_ea(opcode, reg_id)
elif opcode == 0x48:
self.exec_stp_ea(opcode, reg_id)
elif opcode == 0x50:
self.exec_add(opcode, reg_id)
elif opcode == 0x58:
self.exec_sub(opcode, reg_id)
elif opcode == 0x60:
self.exec_mul(opcode, reg_id)
elif opcode == 0x68:
self.exec_div(opcode, reg_id)
elif opcode == 0x70:
self.exec_and(opcode, reg_id)
elif opcode == 0x78:
self.exec_or(opcode, reg_id)
elif opcode == 0x80:
self.exec_xor(opcode, reg_id)
elif opcode == 0x88:
self.exec_not(opcode, reg_id)
elif opcode == 0x90:
self.exec_shl(opcode, reg_id)
elif opcode == 0x98:
self.exec_shr(opcode, reg_id)
elif opcode == 0xA0:
self.exec_rol(opcode, reg_id)
elif opcode == 0xA8:
self.exec_ror(opcode, reg_id)
elif opcode == 0xE0:
self.exec_popd_ea(opcode, reg_id)
elif opcode == 0xE8:
self.exec_cpr(opcode, reg_id)
elif opcode == 0xF0:
self.exec_inc(opcode, reg_id)
elif opcode == 0xF8:
self.exec_dec(opcode, reg_id)
else:
print(f'Unknown OpCode code {opcode}')
self.halt_flag = True```

As you can see, this is nothing but a long set of “if”, “elif” statements checking for a particular opcode and then dispatching to the exec_xxx method. Finally, if we reach the “else” clause then we have an error. So we print a message and halt. Pretty simple right. The harder part is in the exec_xxx methods. But, only slightly harder. We spent some time building up a set of housekeeping methods that do all the heavy lifting for us. So we only need to call a subset of them passing the correct values to implement any of the exec_xxx methods.

I’ll present the code for each of the register type instructions below with a short description of what it does. We’ve already seen the code for SET and LD so I’ll skip those.

ST is our next instruction that needs implemented. The description says:

The contents of ACC are stored in Rn and the branch conditions are set to reflect the new value of Rn. The carry flag is cleared, and the contents of ACC are left undisturbed.

So this just loads the value in ACC (R0) into Rn. Then it sets the flags to reflect the new value of Rn. Let’s see how we might do this:

```def exec_st(self, opcode: int, reg_id: int):
val = self.get_register(self.ACC)
self.set_register(reg_id, val)
self.set_value_flags(val)
self.clear_flags('C')```

Moving on, next we need to handle LD @Rn. Any instruction that uses the @ symbol is using indirect addressing. This means the register Rn is storing an address that we will either write new content into or read the content from. For example, let’s say that R3 contains the value 0x0A (10 in decimal) and we want to get the value located at address 0x0A. We could use LD @R3. R3 contains the memory address we want to get the value from. This shouldn’t be too difficult either:

```def exec_ld_ea(self, opcode: int, reg_id: int):
val = self.peek_byte(self.get_register(reg_id))
self.inc_register(reg_id)
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')```

I’ve appended _ea to all instructions that use indirect or effective addressing modes.

Our housekeeping methods make implementation quick and simple. Now on to ST @Rn.

As you might have guessed, ST @Rn simply stores the low byte value in the ACC into the address pointed to by Rn and increments Rn. Here’s the official description.

`The low order byte of ACC is stored into memory location whose address is resides in Rn. Branch conditions are set to reflect the two byte contents of ACC (R0). The carry flag is cleared and Rn is incremented by 1.`

Here’s the code:

```def exec_st_ea(self, opcode: int, reg_id: int):
val = self.get_register_lsb(self.ACC)
self.poke_byte(self.get_register(reg_id), val)
self.inc_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')```

Most of the register instructions come in two forms, byte and word and are distinguished by the trailing “D” for the word sized version. For example LD is load a byte and LDD is load a word. Respectively, ST is store a byte and STD is store a word.

Our nest set of instructions are just word versions of their byte sized versions we’ve already dealt with. Their main difference is the need to fetch two bytes and combine them together to form the correct word value. So we’ll just present the code and leave it to you to compare the two versions.

```def exec_ldd_ea(self, opcode: int, reg_id: int):
val = self.peek_word(self.get_register(reg_id))
self.inc_register(reg_id)
self.inc_register(reg_id)
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_std_ea(self, opcode: int, reg_id: int):
val = self.get_register(self.ACC)
self.poke_word(self.get_register(reg_id), val)
self.inc_register(reg_id)
self.inc_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')```

Our next instruction is the POP @Rn instruction. It’s description is:

Rn is decremented by 1. Then the low order byte of ACC (R0) is loaded from the memory locations whose address resides in (the decremented) Rn. The high order byte of ACC is cleared and the status bits are set to reflect the final value of ACC whose content will always be positive. The carry flag is cleared. Because Rn is decremented before prior to loading the ACC, single byte stacks can be implemented using ST @Rn and POP @Rn.

And here’s the code to implement it:

```def exec_pop_ea(self, opcode: int, reg_id: int):
self.dec_register(reg_id)
val = self.peek_byte(self.get_register(reg_id))
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')```

STP is a bit unique in that it has a special purpose. It is used for moving memory. It’s a store instruction with indirect addressing followed by a decrement of the pointer value. Here’s the description:

Rn is decremented by 1 and the low order byte of ACC (R0) is stored in to the memory location that resides in Rn. Status bits are set to reflect the final 16 bit value of ACC. The contents of ACC are left undisturbed. STP @Rn and POP @Rn can be used together to move blocks of memory beginning with the greatest address and working downward.

And the code:

```def exec_stp_ea(self, opcode: int, reg_id: int):
lo = self.get_register_lsb(self.ACC)
self.dec_register(reg_id)
self.poke_byte(self.get_register(reg_id), lo)
self.set_value_flags(lo)
self.clear_flags('C')```

The rest of the register type instructions are pretty easy to figure out. So in the interest of saving time and space I’ll just show the code below. It’s important, however, that you read them and see how they are implemented. Truly, if you read just the first couple you’ll understand them all. So here is the code for the remaining register type instructions:

```def exec_add(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 + v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
self.set_carry(v1, v2, isum)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_sub(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 - v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
self.set_borrow(v1, v2)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_mul(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 * v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_div(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 // v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_and(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 & v2) & 0xffff
self.set_register(self.ACC, val)
self.set_value_flags(val)

def exec_or(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 | v2) & 0xffff
self.set_register(self.ACC, val)
self.set_value_flags(val)

def exec_xor(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 ^ v2) & 0xffff
self.set_register(self.ACC, val)
self.set_value_flags(val)

def exec_not(self, opcode: int, reg_id: int):
val = ~self.get_register(reg_id)
self.set_register(reg_id, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_shl(self, opcode: int, reg_id: int):
val = self.get_register(reg_id) << 1
self.set_value_flags(val)
if ((val & 0x10000) >> 16) == 1:
self.set_flags('C')
else:
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_shr(self, opcode: int, reg_id: int):
v1 = self.get_register(reg_id)
val = v1 >> 1
self.set_value_flags(val)
if (v1 & 0x1) == 1:
self.set_flags('C')
else:
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_rol(self, opcode: int, reg_id: int):
val = self.get_register(reg_id) << 1
if ((val & 0x10000) >> 16) == 1:
val |= 0x1
self.set_value_flags(val)
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_ror(self, opcode: int, reg_id: int):
v1 = self.get_register(reg_id)
val = v1 >> 1
if (val & 0x1) == 1:
val |= 0b1000_0000_0000_0000
val = val & 0xffff
self.set_value_flags(val)
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_popd_ea(self, opcode: int, reg_id: int):
self.dec_register(reg_id)
hi = self.peek_byte(self.get_register(reg_id))
self.dec_register(reg_id)
lo = self.peek_byte(self.get_register(reg_id))
val = (hi << 8) + lo
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_cpr(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 & 0xFFFF) - (v2 & 0xFFFF)
self.set_value_flags(val)
self.set_borrow(v1, v2)
val = val & 0xFFFF
self.set_register(self.COMP, val)

def exec_inc(self, opcode: int, reg_id: int):
self.inc_register(reg_id)
val = self.get_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')

def exec_dec(self, opcode: int, reg_id: int):
self.dec_register(reg_id)
val = self.get_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')```

Ok, include this code into your skeleton and try some of the instructions out. Remember to write tests for each of them. Both succeeding and failing tests.

I’ll end this post here today. Next time we’ll implement the remaining 6 or 7 instruction (which are all non register types) and gain a working emulator.

For completeness, here’s the complete sweet16gp.py code for this post:

```"""
Sweet16GP an implementation of an 8/16 bit CPU Emulator
based on Steve Wozniak's SWEET16 virtual machine.
"""

from time import sleep, time
import status_bits

class Sweet16GP:
MEM = []
REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
ACC = 0
RETSTACK = 4
COMP = 5
STATUS = 6
PC = 7
cur_instr = 0
halt_flag = False

def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
self.init_ram(mem_size)
self.warm_boot()

def warm_boot(self):
self.init_regfile()

# Memory handling routines
def init_ram(self, mem_size=265):
self.MEM = []
for addr in range(mem_size):
self.MEM.append(0x00)

def poke_byte(self, addr: int, val: int):
self.MEM[addr] = val & 0xff

def peek_byte(self, addr: int) -> int:
return self.MEM[addr] & 0xff

def poke_word(self, addr: int, val: int):
self.poke_byte(addr, (val & 0xff))
self.poke_byte(addr+1, (val & 0xff00) >> 8)

def peek_word(self, addr: int) -> int:
lo = self.peek_byte(addr)
hi = self.peek_byte(addr+1)
val = ((hi << 8) + lo) & 0xffff
return (hi << 8) + lo

# Register handling routines
def init_regfile(self):
for id in range(len(self.REGFILE)):
self.REGFILE[id] = 0x00

def get_register(self, id: int) -> int:
return self.REGFILE[id] & 0xffff

def get_register_lsb(self, id: int) -> int:
return self.REGFILE[id] & 0xff

def get_register_msb(self, id: int) -> int:
return (self.REGFILE[id] & 0xff00) >> 8

def set_register(self, id: int, val: int):
self.REGFILE[id] = val & 0xffff

def set_register_lsb(self, id: int, val: int):
hi = self.REGFILE[id] & 0xff00
self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

def set_register_msb(self, id: int, val: int):
lo = self.REGFILE[id] & 0xff
self.REGFILE[id] = ((val & 0xff) << 8) | lo

def inc_register(self, _id: int):
val = self.get_register(_id)
val = (val + 1) & 0xffff
self.set_register(_id, val)

def dec_register(self, id: int):
val = self.get_register(id)
val = (val - 1) & 0xffff
self.set_register(id, val)

# Memory handling routines
def init_ram(self, mem_size=265):
self.MEM = []
for addr in range(mem_size):
self.MEM.append(0x00)

def poke_byte(self, addr: int, val: int):
self.MEM[addr] = val & 0xff

def peek_byte(self, addr: int) -> int:
return self.MEM[addr] & 0xff

def poke_word(self, addr: int, val: int):
self.poke_byte(addr, (val & 0xff))
self.poke_byte(addr + 1, (val & 0xff00) >> 8)

def peek_word(self, addr: int) -> int:
lo = self.peek_byte(addr)
hi = self.peek_byte(addr + 1)
val = ((hi << 8) + lo) & 0xffff
return (hi << 8) + lo

# Bit twiddling routines
def bit_set(self, val: int, bit: int) -> int:
return val | (1 << bit)

def bit_clear(self, val: int, bit: int) -> int:
return val & ~(1 << bit)

def bit_toggle(self, val: int, bit: int) -> int:
return val ^ (1 << bit)

def bit_test(self, val: int, bit: int) -> int:
return (val & (1 << bit)) >> bit

# STATUS Flags routines
def set_flags(self, flags: str):
flags = flags.upper()
if flags.__contains__('C'):
self.REGFILE[self.STATUS] |= (1 << status_bits.C)
if flags.__contains__('Z'):
self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
if flags.__contains__('V'):
self.REGFILE[self.STATUS] |= (1 << status_bits.V)
if flags.__contains__('N'):
self.REGFILE[self.STATUS] |= (1 << status_bits.N)

def clear_flags(self, flags: str):
flags = flags.upper()
if flags.__contains__('C'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
if flags.__contains__('Z'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
if flags.__contains__('V'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
if flags.__contains__('N'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

def test_flag(self, flag: str) -> int:
flags = flag.upper()
if flags.__contains__('C'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
if flags.__contains__('Z'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
if flags.__contains__('V'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
if flags.__contains__('N'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

def set_value_flags(self, val: int):
# Note we can't set the overflow
# flag here as we need both input
# and result values to computer
# overflow.
if val == 0:
self.set_flags('Z')
else:
self.clear_flags('Z')

if val > 0b0111_1111_1111_1111:
self.set_flags('N')
else:
self.clear_flags('N')

if (val & 0x10000) >> 17:
self.set_flags('C')
else:
self.clear_flags('C')

def set_overflow_flags(self, v1: int, v2: int, result: int):
if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
# Both values have the sign bit set
# So the result should have the sign
# bit unset. If not, we have overflow
if not (result & (1 << 15)) > 0:
self.set_flags('V')
else:
self.clear_flags('V')
elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
# Both values have the sign bits unset
# So, the result should have the sign
# bit unset.
if ((result & (1 << 15)) >> 15) == 1:
self.set_flags('V')
else:
self.clear_flags('V')

def set_carry(self, v1, v2, result):
# value must be unmasked or it won't include the 17th bit.
if ((result & (1 << 16)) >> 16) == 1:
self.set_flags('C')
else:
self.clear_flags('C')

def set_borrow(self, v1, v2):
if v1 < v2:
self.set_flags('C')
else:
self.clear_flags('C')

# Instruction Decoding
def decode_inst(self, instr: int):
if instr > 7:
opcode = instr & 0b11111000
reg_id = instr & 0b00000111
self.decode_register_instr(opcode, reg_id)
else:
self.decode_non_register_instr(instr)

def decode_register_instr(self, opcode: int, reg_id: int):
if opcode == 0x08:
self.exec_set(opcode, reg_id)
elif opcode == 0x10:
self.exec_ld(opcode, reg_id)
elif opcode == 0x18:
self.exec_st(opcode, reg_id)
elif opcode == 0x20:
self.exec_ld_ea(opcode, reg_id)
elif opcode == 0x28:
self.exec_st_ea(opcode, reg_id)
elif opcode == 0x30:
self.exec_ldd_ea(opcode, reg_id)
elif opcode == 0x38:
self.exec_std_ea(opcode, reg_id)
elif opcode == 0x40:
self.exec_pop_ea(opcode, reg_id)
elif opcode == 0x48:
self.exec_stp_ea(opcode, reg_id)
elif opcode == 0x50:
self.exec_add(opcode, reg_id)
elif opcode == 0x58:
self.exec_sub(opcode, reg_id)
elif opcode == 0x60:
self.exec_mul(opcode, reg_id)
elif opcode == 0x68:
self.exec_div(opcode, reg_id)
elif opcode == 0x70:
self.exec_and(opcode, reg_id)
elif opcode == 0x78:
self.exec_or(opcode, reg_id)
elif opcode == 0x80:
self.exec_xor(opcode, reg_id)
elif opcode == 0x88:
self.exec_not(opcode, reg_id)
elif opcode == 0x90:
self.exec_shl(opcode, reg_id)
elif opcode == 0x98:
self.exec_shr(opcode, reg_id)
elif opcode == 0xA0:
self.exec_rol(opcode, reg_id)
elif opcode == 0xA8:
self.exec_ror(opcode, reg_id)
elif opcode == 0xE0:
self.exec_popd_ea(opcode, reg_id)
elif opcode == 0xE8:
self.exec_cpr(opcode, reg_id)
elif opcode == 0xF0:
self.exec_inc(opcode, reg_id)
elif opcode == 0xF8:
self.exec_dec(opcode, reg_id)
else:
print(f'Unknown OpCode code {opcode}')
self.halt_flag = True

def exec_set(self, opcode: int, reg_id: int):
self.inc_register(self.PC)
lo = self.peek_byte(self.get_register(self.PC))
self.inc_register(self.PC)
hi = self.peek_byte(self.get_register(self.PC))
val = (hi << 8) + lo
self.set_register(reg_id, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_ld(self, opcode: int, reg_id: int):
val = self.get_register(reg_id)
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_st(self, opcode: int, reg_id: int):
val = self.get_register(self.ACC)
self.set_register(reg_id, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_ld_ea(self, opcode: int, reg_id: int):
val = self.peek_byte(self.get_register(reg_id))
self.inc_register(reg_id)
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_st_ea(self, opcode: int, reg_id: int):
val = self.get_register_lsb(self.ACC)
self.poke_byte(self.get_register(reg_id), val)
self.inc_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')

def exec_ldd_ea(self, opcode: int, reg_id: int):
val = self.peek_word(self.get_register(reg_id))
self.inc_register(reg_id)
self.inc_register(reg_id)
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_std_ea(self, opcode: int, reg_id: int):
val = self.get_register(self.ACC)
self.poke_word(self.get_register(reg_id), val)
self.inc_register(reg_id)
self.inc_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')

def exec_pop_ea(self, opcode: int, reg_id: int):
self.dec_register(reg_id)
val = self.peek_byte(self.get_register(reg_id))
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_stp_ea(self, opcode: int, reg_id: int):
lo = self.get_register_lsb(self.ACC)
self.dec_register(reg_id)
self.poke_byte(self.get_register(reg_id), lo)
self.set_value_flags(lo)
self.clear_flags('C')

def exec_add(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 + v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
self.set_carry(v1, v2, isum)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_sub(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 - v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
self.set_borrow(v1, v2)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_mul(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 * v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_div(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
isum = v1 // v2
self.set_value_flags(isum)
self.set_overflow_flags(v1, v2, isum)
val = isum & 0xffff
self.set_register(self.ACC, val)

def exec_and(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 & v2) & 0xffff
self.set_register(self.ACC, val)
self.set_value_flags(val)

def exec_or(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 | v2) & 0xffff
self.set_register(self.ACC, val)
self.set_value_flags(val)

def exec_xor(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 ^ v2) & 0xffff
self.set_register(self.ACC, val)
self.set_value_flags(val)

def exec_not(self, opcode: int, reg_id: int):
val = ~self.get_register(reg_id)
self.set_register(reg_id, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_shl(self, opcode: int, reg_id: int):
val = self.get_register(reg_id) << 1
self.set_value_flags(val)
if ((val & 0x10000) >> 16) == 1:
self.set_flags('C')
else:
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_shr(self, opcode: int, reg_id: int):
v1 = self.get_register(reg_id)
val = v1 >> 1
self.set_value_flags(val)
if (v1 & 0x1) == 1:
self.set_flags('C')
else:
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_rol(self, opcode: int, reg_id: int):
val = self.get_register(reg_id) << 1
if ((val & 0x10000) >> 16) == 1:
val |= 0x1
self.set_value_flags(val)
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_ror(self, opcode: int, reg_id: int):
v1 = self.get_register(reg_id)
val = v1 >> 1
if (val & 0x1) == 1:
val |= 0b1000_0000_0000_0000
val = val & 0xffff
self.set_value_flags(val)
self.clear_flags('C')
self.set_register(reg_id, val)

def exec_popd_ea(self, opcode: int, reg_id: int):
self.dec_register(reg_id)
hi = self.peek_byte(self.get_register(reg_id))
self.dec_register(reg_id)
lo = self.peek_byte(self.get_register(reg_id))
val = (hi << 8) + lo
self.set_register(self.ACC, val)
self.set_value_flags(val)
self.clear_flags('C')

def exec_cpr(self, opcode: int, reg_id: int):
v1 = self.get_register(self.ACC)
v2 = self.get_register(reg_id)
val = (v1 & 0xFFFF) - (v2 & 0xFFFF)
self.set_value_flags(val)
self.set_borrow(v1, v2)
val = val & 0xFFFF
self.set_register(self.COMP, val)

def exec_inc(self, opcode: int, reg_id: int):
self.inc_register(reg_id)
val = self.get_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')

def exec_dec(self, opcode: int, reg_id: int):
self.dec_register(reg_id)
val = self.get_register(reg_id)
self.set_value_flags(val)
self.clear_flags('C')

def decode_non_register_instr(self, opcode: int):
if opcode == 0x00:
self.exec_halt()

def exec_halt(self):
self.halt_flag = True

# Misc Methods
def dump(self):
print('\n\nSweet16-GP CPU')
print('---------------------------------------------------------')
self.dump_status()
self.dump_registers()
self.dump_memory()

def dump_status(self):
result = ''
status = self.REGFILE[self.STATUS]
# print('Status 0b{:016b}'.format(status))
if self.bit_test(status, status_bits.C):
result += 'C, '
if self.bit_test(status, status_bits.Z):
result += 'Z, '
if self.bit_test(status, status_bits.V):
result += 'V, '
if self.bit_test(status, status_bits.N):
result += 'N, '
result = result[:-2]
print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
print('---------------------------------------------------------')
print(str(result) + ' \n')

def dump_registers(self):
idx = 0
col = 0
max_cols = 4
print('Registers')
print('---------------------------------------------------------')
for r in self.REGFILE:
if idx == 0:
print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 4:
print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 5:
print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 6:
print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 7:
print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
else:
# General purpose register
print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
idx += 1
col += 1
if col >= max_cols:
col = 0
print()
print()

def dump_memory(self, max_bytes=256):
col = 0
row = 0

max_cols = 16
max_rows = int(max_bytes / max_cols)
idx = 0
print('\nAddr   Data                                  MEMORY DUMP')
print('------------------------------------------------------------------------------------------------------')
for b in self.MEM:
if col == 0:
print('0x{:04x}'.format(idx), end=':  ')
print('0x{:02x}'.format(b), end=', ')
else:
print('0x{:02x}'.format(b), end=', ')
col += 1
if col >= max_cols:
row += 1
col = 0
print()
if row >= max_rows:
print()
break;
idx += 1
print()

# CPU control routines
def run(self):
while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
self.cur_instr = self.peek_byte(self.get_register(self.PC))
print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
self.decode_inst(self.cur_instr)
sleep(1)
self.inc_register(self.PC)

def main():
cpu = Sweet16GP()
cpu.set_register(0, 0xffae)
cpu.poke_byte(0, 0x08)
cpu.poke_byte(1, 0x05)
cpu.poke_byte(2, 0xff)
cpu.run()
cpu.dump()

if __name__ == '__main__':
# This method is for simple testing
# and demonstration purposes.
main()```

## Sweet16 CPU Emulator

0 Comments

Part 2

Last time I presented the Sweet16-GP and told you I would present an emulated version of the processor. That’s what I intend to do, so let’s get started.

First, what is a software CPU emulator. A software emulator is software that runs on a computer (called the host) and enables that computer to behave like another computer (called the guest). An emulator typically enables the host system to run software or use peripheral devices designed for the guest system. In our case, our emulator will enable our system to run software written for the Sweet16-GP on any computer capable of running python 3.x. In the near future I may rewrite this emulator in PHP and host it so that students it in my workshops can use it online. But for now, I’ll present the python 3 version.

So why would we want an emulator? Why not just build the hardware? Having an emulator will give us a reference design for the hardware. It will allow us to work out issues that might arise in our instruction set and our cpu design concept. Additionally, it will allow software to be developed while the hardware is being developed. This means that we can have some working software to test the hardware on it’s first boot up.

So what do we need to create an emulator? Well, we need the same things a real system will need. First, we need a CPU. The CPU consists of the REGISTER FILE, ALU, and CONTROL LOGIC. In addition we need a place to store our code and data (memory). Memory is simple. We can use an array or list. The ALU consists mainly of logic routines, the register file is again just a list. Then we need a few helper functions.  Writing an emulator sounds like it would be really hard. However, once you see the code you realize it’s not difficult at all.

Let’s start by defining a CPU class for the Sweet16-GP.

``class Sweet16GP:``    def __init__(self):``        pass``

The first thing our CPU will do is boot. The initial boot up that occurs when power is first applied is called a cold boot. To preform the cold boot we’ll simply write a method. We don’t have this method yet so let’s add it and call it from the __init__() method.

`class Sweet16GP:        def __init__(self):        self.cold_boot()    # booting routines    def cold_boot(self, mem_size=256):        pass`

Ok, that’s pretty simple. Now our CPU doesn’t do anything yet and it’s still lacking important parts. First, we need some RAM and our Register File. So let’s give our little CPU 16 bytes of RAM and create our Register File to store our registers R0 – R7. We will implement both the register file and RAM as a simple python list. We’ll also include some constants to allow us to refer to the registers by name (ACC, RETSTACK, COMP, STATUS, and PC).

`class Sweet16GP:    MEM = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]    REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]    ACC = 0    RETSTACK = 4    COMP = 5    STATUS = 6    PC = 7        def __init__(self):        self.cold_boot()    # booting routines    def cold_boot(self, mem_size=256):        pass`

Now it’s starting to take shape. Recall from part 1 that our accumulator is register 0, the subroutine return stack pointer is register 4, the compare results register is register 5, the status register is register 6 and the program counter is register 7. We define constants within the class for these values so we can refer to these registers by name when it is desirable to do so.

So now we have 16 bytes of ram we can place instructions in, and our 8 registers. Now we need a way to fetch the instructions we place in memory. For this we’ll need a flag variable to indicate if the CPU is in a halt state and a variable to store the current instruction.

`class Sweet16GP:    MEM = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]    REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]    ACC = 0    RETSTACK = 4    COMP = 5    STATUS = 6    PC = 7    cur_instr = 0    halt_flag = False    def __init__(self):        self.cold_boot()        # booting routines    def cold_boot(self, mem_size=256):        pass`

The halt_flag will be used to halt processor execution and cur_instr will be used to store our current instruction.The PC will be updated to point to the next instruction/data location.

Our next task is to write some code for fetching instructions from memory. We’ll call this the run method as it’s job will be to walk through memory grabbing the instructions pointed to be the PC.

`# Run methodsdef run(self):    while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):        self.cur_instr = self.peek_byte(self.get_register(self.PC))        print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')        self.decode_inst(self.cur_instr)        sleep(1)        self.inc_register(self.PC)`

Now this method is making use of methods we don’t have but will soon need. I tend to code what I want and then implement it. Looking at the run method we can see we’ll need a few new methods. But before we talk about them let’s walk through what the run method is doing here.

Let’s see what we have so far:

First we enter a while loop and test if the halt flag is set and if the PC points to a valid memory location. Next, I included a print statement to display the value of the program counter and the current instruction. Then we make a call to decode_inst() passing this method the current instruction. decode_inst will be used to decode our instruction. Remember that our hardware processor includes an instruction decoder. We’ll be implementing all the parts of our hardware processor as method in our CPU class.

```class Sweet16GP:
MEM = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
ACC = 0
RETSTACK = 4
COMP = 5
STATUS = 6
PC = 7
cur_instr = 0
halt_flag = False

def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
pass

# Register routines
def self.get_register(self, reg):
pass

def inc_register(self, reg):
pass

# Memory access routines
def peek_byte(self, addr):
pass

# Instruction decoding routines
def decode_inst(self, instr):
pass

# CPU control routines
def run(self):
while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
self.cur_instr = self.peek_byte(self.get_register(self.PC))
print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
self.decode_inst(self.cur_instr)
sleep(1)
self.inc_register(self.PC)```

OK, at this point we can start to fill in a couple of these methods. But before we do there is an important issue we need to discuss. Real hardware has a finite number of bits used to store it’s contents. Python however, can store almost any value in our registers and memory. This isn’t good as it doesn’t represent a real CPU. We need a way to restrict the values stored in our memory and registers to only those values that can be stored in our hardware version of the CPU.

To accomplish this we will use python’s bitwise logical operators. If you’re not familiar with them, you may want to head on over to the python website and read up them: &, |, ~, <<, and >>. First, & is the bitwise logical AND operator. The pipe | character is the bitwise OR operator, and the tilde is the bitwise NOT operator. The double less-than is a bitwise shift left, and the double greater than is a bitwise shift right operator. If you don’t understand these completely and their typical uses, then read up on them.

We can use these operators to manipulate the values we store in our memory and register lists. If we & (and) the values we place in memory or registers with the values 0xff or 0xffff we can limit the values stored to 8 and 16 bits respectively. This is just what we need to restrict our values to a range we can store on our hardware. Let’s implement the get_register() method. This method simply needs to take in an integer value and store it in the proper location in our register file.

```def get_register(self, id: int) -> int:
return self.REGFILE[id] & 0xffff```

Simple, isn’t it!

Now we’ve left a couple of things undone. Remember our cold_boot() method? Well it doesn’t do anything right now. In an real processor this would reset things like clocks and counters and put everything into a known state so we have a regular starting point. In our emulator we need it to do the same task. However, most systems also have what is called a warm boot. A warm boot occurs when you reset the system using the reset signal. This signal causes the CPU registers to be set into a known initial state. Just as the cold boot does. In real hardware there may be some differences. In our software version we will take a little bit of leeway here do something the real hardware won’t do. We will clear the ram (set it to all zeros values) in the cold boot and leave it untouched during a warm boot. This will help us in keeping track of what values in ram we’ve changed and make debugging easier. However, it’s important to document this deviation from true emulation of the real hardware which will not zero memory for us. So let’s write the code to initialize the memory to all zero values.

```# Memory handling routines
def init_ram(self, mem_size=265):
self.MEM = []
for addr in range(mem_size):
self.MEM.append(0x00)```

We also need to fill in our peek_byte() method and while we’re at it, we should just complete our memory manipulation methods. We know we’re going to need to be able to look (peek) at memory addresses and see what value they contain. We will also need to be able to store (poke) values into an address location, and we should be able to do this with both 8 and 16 bit values. So that leaves us needing the following methods: peek_byte(), poke_byte(), peek_word(), and poke_word(). So our full memory manipulation repertoire now looks as follows:

```# Memory handling routines
def init_ram(self, mem_size=265):
self.MEM = []
for addr in range(mem_size):
self.MEM.append(0x00)

def poke_byte(self, addr: int, val: int):
self.MEM[addr] = val & 0xff

def peek_byte(self, addr: int) -> int:
return self.MEM[addr] & 0xff

def poke_word(self, addr: int, val: int):
self.poke_byte(addr, (val & 0xff))
self.poke_byte(addr+1, (val & 0xff00) >> 8)

def peek_word(self, addr: int) -> int:
lo = self.peek_byte(addr)
hi = self.peek_byte(addr+1)
val = ((hi << 8) + lo) & 0xffff
return (hi << 8) + lo
```

For our boot process we will have __init__() call cold_boot() which will call our init_ram() method to clear memory. But we still need to implement a method to clear the register files. We will have cold_boot() call our warm_boot() method which will in turn call init_regfile() which will zero out all our registers. We could do this all in one method but it’s important to keep each step separate as when the hardware is built some aspects of this process may change, and we’ll want the structure in place to support those changes. I realize to some of you with a lot of programming experience you may be saying “Wait! We shouldn’t be trying to anticipate the future. We should just write the least amount of code to handle the current situation.” And in most cases this is true. However, experience has taught me that we want to call out each process our hardware may go through or down the road we may end up refactoring a lot of code. Also, it’s important from a teaching perspective to have each of these processes in place so those just learning can see each process represented. So let’s see what our boot code looks like now.

```def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
self.init_ram(mem_size)
self.warm_boot()

def warm_boot(self):
self.init_regfile()

# Register handling routines
def init_regfile(self):
for id in range(len(self.REGFILE)):
self.REGFILE[id] = 0x00```

Ok, while we’re at it, we are going to need a handful of register manipulation methods similar to those for memory manipulation. Recall our registers are 16 bits wide but some instructions work on byte values. We will need to be able to read and write our registers in both word and byte modes. In byte mode we need to be able to read/write both the high and low order bytes in the registers.  We also need to be able to increment and decrement registers. Finally, don’t forget we need to restrict any value going into a register to it’s byte or word size. Below is our complete set of register methods:

```# Register handling routines
def init_regfile(self):
for id in range(len(self.REGFILE)):
self.REGFILE[id] = 0x00

def get_register(self, id: int) -> int:
return self.REGFILE[id] & 0xffff

def get_register_lsb(self, id: int) -> int:
return self.REGFILE[id] & 0xff

def get_register_msb(self, id: int) -> int:
return (self.REGFILE[id] & 0xff00) >> 8

def set_register(self, id: int, val: int):
self.REGFILE[id] = val & 0xffff

def set_register_lsb(self, id: int, val: int):
hi = self.REGFILE[id] & 0xff00
self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

def set_register_msb(self, id: int, val: int):
lo = self.REGFILE[id] & 0xff
self.REGFILE[id] = ((val & 0xff) << 8) | lo

def inc_register(self, id: int):
val = self.get_register(id)
val = (val + 1) & 0xffff
self.set_register(id, val)

def dec_register(self, id: int):
val = self.get_register(id)
val = (val - 1) & 0xffff
self.set_register(id, val)```

OK, all through this process I’ve been keeping something from you. I’ve presented code as if I have been magically writing each method and simply know instinctively that it works! Well, most of these methods did indeed work the first time I wrote them, but not all of them. What I have been leaving out is the testing that was done during development of each method. I am a big fan of Test Driven Development (TDD, however it isn’t for every project).  While developing the emulator, I wrote tests even before writing my methods. This allows you to slow down and think about what you need and how you’re going to implement it.

I use “unittest” to write unit tests for each method. I test the methods with tests that should succeed and tests that should fail. I try to work out edge cases where there may be issues. Like what happens if I multiply two 16 bit values and place the results into a register without limiting the value to an 8 or 16 bit range as needed? Unit tests are very important in developing code that can be proven to work. However, remember your code is proven only as far as you test it. If you don’t test for a particular input, you can’t prove your code won’t fail given that input. Being human, we will rarely test for all possible values of input. In many cases this would be impractical and in some cases impossible. But we should do our best, within reason, to test each method in every class and every function in our code. I bring this up here as we are going to get quite a bit of code built up here and it’s best to know that your methods work properly before adding more complexity to the project. Below I show a couple test methods. However, I’d like you to write your own tests to test each method we have so far and each method I present in the future. Writing tests is a good habit you should get into doing for all your code.

```def test_warm_boot(self):
from sweet16gp import Sweet16GP
cpu = Sweet16GP()

cpu.REGFILE[cpu.ACC] = 0xffff
cpu.REGFILE[cpu.RETSTACK] = 0xffff
cpu.REGFILE[cpu.COMP] = 0xffff
cpu.REGFILE[cpu.STATUS] = 0xffff
cpu.REGFILE[cpu.PC] = 0xffff

cpu.warm_boot()
self.assertEqual(0, cpu.REGFILE[cpu.ACC], 'Failed to clear ACC register (R0)')
self.assertEqual(0, cpu.REGFILE[cpu.RETSTACK], 'Failed to clear RETSTACK register (R4)')
self.assertEqual(0, cpu.REGFILE[cpu.COMP], 'Failed to clear COMP register (R5)')
self.assertEqual(0, cpu.REGFILE[cpu.STATUS], 'Failed to clear STATUS register (R6)')
self.assertEqual(0, cpu.REGFILE[cpu.PC], 'Failed to clear PC register (R7)')

def test_init_regfile(self):
from sweet16gp import Sweet16GP
cpu = Sweet16GP()
self.assertEqual(8, len(cpu.REGFILE), 'Failed to initialize the register file')

def test_get_register(self):
from sweet16gp import Sweet16GP
cpu = Sweet16GP()
cpu.REGFILE[cpu.ACC] = 0xfafc
self.assertEqual(0xfafc, cpu.get_register(cpu.ACC), 'Failed to get register ACC (R0)')
```

The above code is only a short snippet from my tests, which contain multiple tests for each method written. Testing your code this way will make development easier and it can also help people get up to speed with your code by providing some sample calls.  However, you shouldn’t take the tests as absolute proof the code is without error. It’s only proven as far as it is tested!

Also, note that when possible I try to limit the use of additional methods to setup the pre-test conditions. For example, I directly set the values of registers in the test_warm_boot() method and then call warm_boot() which should reset them all to zero. Then I test with assert methods whether the registers were indeed reset to zero. If I had used the set_register() method to place the pre-test values into the registers and a test failed, I wouldn’t know if it was the set_register() method or the warm_boot()/init_register() methods that failed. However, once I have a good set of tests in place to prove that set_register() works as expected, I can then use it in later tests. This also brings up the point that if you are new to a code base, you should be careful about how you use test code to interpret how to use the code base. In production code you wouldn’t want to directly set values in the registers, as I did in the warm_boot() method above. You would instead want to call the set_register() method to do that for you and get the added bonus of being guaranteed that the value will be limited to the acceptable range of values for that call.

Ok, enough of the sidebar on testing. Let’s get back to implementation. If we look over the instructions in part 1 we can see some instructions will require bit manipulation, so we’re going to need a set of method for performing a set of common bit twiddling tasks. For example, given a value we will need methods to set, clear, toggle, and test a particular bit in the value. Let’s see what these look like:

```# Bit twiddling routines
def bit_set(self, val: int, bit: int) -> int:
return val | (1 << bit)

def bit_clear(self, val: int, bit: int) -> int:
return val & ~(1 << bit)

def bit_toggle(self, val: int, bit: int) -> int:
return val ^ (1 << bit)

def bit_test(self, val: int, bit: int) -> int:
return (val & (1 << bit)) >> bit
```

Go ahead and write some tests for the above routines. Make sure you know what they do and how to use them. These routines mirror the processes that occur in real hardware so it is important to have a good grasp of bit manipulations and how to use them.

These routines lead me to a closely related subject. That of status bits. Recall from part 1 that the CPU contains a STATUS register. This register contains bits that act as flags for certain predefined conditions. These flags almost always represent the state or value of the accumulator (ACC, R0) register. However, they may also represent the results of a CPR instruction. The CPR (Compare) instruction places it’s results in the COMP register (R5) and sets the flags in the Status register to represent the results. The flags bits consist of the N flag which is set if the result of an operation is negative, the V flag which is set if the result of an operation overflows the range of values that can be stored in the machine, the C flag which is set when a carry or borrow occurs in an arithmetic operation, and the Z or zero flag which indicates the previous operation resulted in a zero value.

These flags are used to control program flow via branch instructions. The BEQ instructions is usually preceded with a CPR instruction which compares two values by subtracting one from the other. If the result is zero, the two values were equal and the branch occurs. The overflow and carry flags are used in mathematical operations to test if a result is valid. The N flag can be used to indicate if one value is less or greater than another.

We need to implement methods to set and test these bits in the STATUS register. First, we need to define in which bit of the STATUS register each flags resides. To do this I created a new file called status_bits.py. It simply holds the definition of each flag’s bit position.

```C: int = 0
Z: int = 1
V: int = 6
N: int = 7```

As you can see, the two flags reside at each end of the low order byte of the STATUS register. Once we import this file into our main CPU file we will have access to these values. We’ll need these in our STATUS flag routines which follow:

```# STATUS Flags routines
def set_flags(self, flags: str):
flags = flags.upper()
if flags.__contains__('C'):
self.REGFILE[self.STATUS] |= (1 << status_bits.C)
if flags.__contains__('Z'):
self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
if flags.__contains__('V'):
self.REGFILE[self.STATUS] |= (1 << status_bits.V)
if flags.__contains__('N'):
self.REGFILE[self.STATUS] |= (1 << status_bits.N)

def clear_flags(self, flags: str):
flags = flags.upper()
if flags.__contains__('C'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
if flags.__contains__('Z'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
if flags.__contains__('V'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
if flags.__contains__('N'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

def test_flag(self, flag: str) -> int:
flags = flag.upper()
if flags.__contains__('C'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
if flags.__contains__('Z'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
if flags.__contains__('V'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
if flags.__contains__('N'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

def set_value_flags(self, val: int):
# Note we can't set the overflow
# flag here as we need both input
# and result values to computer
# overflow.
if val == 0:
self.set_flags('Z')
else:
self.clear_flags('Z')

if val > 0b0111_1111_1111_1111:
self.set_flags('N')
else:
self.clear_flags('N')

if (val & 0x10000) >> 17:
self.set_flags('C')
else:
self.clear_flags('C')

def set_overflow_flags(self, v1: int, v2: int, result: int):
if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
# Both values have the sign bit set
# So the result should have the sign
# bit unset. If not, we have overflow
if not (result & (1 << 15)) > 0:
self.set_flags('V')
else:
self.clear_flags('V')
elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
# Both values have the sign bits unset
# So, the result should have the sign
# bit unset.
if ((result & (1 << 15)) >> 15) == 1:
self.set_flags('V')
else:
self.clear_flags('V')

def set_carry(self, v1, v2, result):
# value must be unmasked or it won't include the 17th bit.
if ((result & (1 << 16)) >> 16) == 1:
self.set_flags('C')
else:
self.clear_flags('C')

def set_borrow(self, v1, v2):
if v1 < v2:
self.set_flags('C')
else:
self.clear_flags('C')
```

Again, write some tests for these routines. Read up on overflow condition detection and binary arithmetic and be sure you understand what is going on here.  Note that for overflow we can’t just do a test and set, as the overflow condition occurs in two’s complement arithmetic and requires that we know the state of the values previous to the operation performed in order to properly calculate the overflow.

Also, the carry flag can only be set using the raw (unmasked) result of an operation, as it depends on the 17th bit of the result being tested. If we use a mask value to limit our results to 16 bits then there will be no carry bit to test, so this method must be called before masking the results to 16 bits.

Now it would be nice to have some methods to allow us to peek into the processor and see its state. The debugger in your IDE is good for this, but having a nice dump of the registers, memory and flags would allow us to visually check the results of an operation without needing the debugger.  Let’s implement a few routines for this just to make our lives easier.

```# Misc Methods
def dump(self):
print('\n\nSweet16-GP CPU')
print('---------------------------------------------------------')
self.dump_status()
self.dump_registers()
self.dump_memory()

def dump_status(self):
result = ''
status = self.REGFILE[self.STATUS]
#print('Status 0b{:016b}'.format(status))
if self.bit_test(status,status_bits.C):
result += 'C, '
if self.bit_test(status, status_bits.Z):
result += 'Z, '
if self.bit_test(status, status_bits.V):
result += 'V, '
if self.bit_test(status, status_bits.N):
result += 'N, '
result = result[:-2]
print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
print('---------------------------------------------------------')
print(str(result) +' \n')

def dump_registers(self):
idx = 0
col = 0
max_cols = 4
print('Registers')
print('---------------------------------------------------------')
for r in self.REGFILE:
if idx == 0:
print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 4:
print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 5:
print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)),end=',  ')
elif idx == 6:
print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 7:
print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
else:
# General purpose register
print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
idx += 1
col += 1
if col >= max_cols:
col = 0
print()
print()

def dump_memory(self, max_bytes=256):
col = 0
row = 0

max_cols = 16
max_rows = int(max_bytes/max_cols)
idx = 0
print('\nAddr   Data                                  MEMORY DUMP')
print('------------------------------------------------------------------------------------------------------')
for b in self.MEM:
if col == 0:
print('0x{:04x}'.format(idx), end=':  ')
print('0x{:02x}'.format(b), end=', ')
else:
print('0x{:02x}'.format(b), end=', ')
col += 1
if col >= max_cols:
row += 1
col = 0
print()
if row >= max_rows:
print()
break;
idx += 1
print()
```

That’s quite a bit of code for a post. So let’s see it all together.

```"""
Sweet16GP an implementation of an 8/16 bit CPU Emulator
based on Steve Wozniak's SWEET16 virtual machine.
"""

from time import sleep, time
import status_bits

class Sweet16GP:
MEM = []
REGFILE = [0, 0, 0, 0, 0, 0, 0, 0]
ACC = 0
RETSTACK = 4
COMP = 5
STATUS = 6
PC = 7
cur_instr = 0
halt_flag = False

def __init__(self):
self.cold_boot()

# booting routines
def cold_boot(self, mem_size=256):
self.init_ram(mem_size)
self.warm_boot()

def warm_boot(self):
self.init_regfile()

# Memory handling routines
def init_ram(self, mem_size=265):
self.MEM = []
for addr in range(mem_size):
self.MEM.append(0x00)

def poke_byte(self, addr: int, val: int):
self.MEM[addr] = val & 0xff

def peek_byte(self, addr: int) -> int:
return self.MEM[addr] & 0xff

def poke_word(self, addr: int, val: int):
self.poke_byte(addr, (val & 0xff))
self.poke_byte(addr+1, (val & 0xff00) >> 8)

def peek_word(self, addr: int) -> int:
lo = self.peek_byte(addr)
hi = self.peek_byte(addr+1)
val = ((hi << 8) + lo) & 0xffff
return (hi << 8) + lo

# Register handling routines
def init_regfile(self):
for id in range(len(self.REGFILE)):
self.REGFILE[id] = 0x00

def get_register(self, id: int) -> int:
return self.REGFILE[id] & 0xffff

def get_register_lsb(self, id: int) -> int:
return self.REGFILE[id] & 0xff

def get_register_msb(self, id: int) -> int:
return (self.REGFILE[id] & 0xff00) >> 8

def set_register(self, id: int, val: int):
self.REGFILE[id] = val & 0xffff

def set_register_lsb(self, id: int, val: int):
hi = self.REGFILE[id] & 0xff00
self.REGFILE[id] = (hi | (val & 0xff)) & 0xffff

def set_register_msb(self, id: int, val: int):
lo = self.REGFILE[id] & 0xff
self.REGFILE[id] = ((val & 0xff) << 8) | lo

def inc_register(self, id: int):
val = self.get_register(id)
val = (val + 1) & 0xffff
self.set_register(id, val)

def dec_register(self, id: int):
val = self.get_register(id)
val = (val - 1) & 0xffff
self.set_register(id, val)

# Memory handling routines
def init_ram(self, mem_size=265):
self.MEM = []
for addr in range(mem_size):
self.MEM.append(0x00)

def poke_byte(self, addr: int, val: int):
self.MEM[addr] = val & 0xff

def peek_byte(self, addr: int) -> int:
return self.MEM[addr] & 0xff

def poke_word(self, addr: int, val: int):
self.poke_byte(addr, (val & 0xff))
self.poke_byte(addr + 1, (val & 0xff00) >> 8)

def peek_word(self, addr: int) -> int:
lo = self.peek_byte(addr)
hi = self.peek_byte(addr + 1)
val = ((hi << 8) + lo) & 0xffff
return (hi << 8) + lo

# Bit twiddling routines
def bit_set(self, val: int, bit: int) -> int:
return val | (1 << bit)

def bit_clear(self, val: int, bit: int) -> int:
return val & ~(1 << bit)

def bit_toggle(self, val: int, bit: int) -> int:
return val ^ (1 << bit)

def bit_test(self, val: int, bit: int) -> int:
return (val & (1 << bit)) >> bit

# STATUS Flags routines
def set_flags(self, flags: str):
flags = flags.upper()
if flags.__contains__('C'):
self.REGFILE[self.STATUS] |= (1 << status_bits.C)
if flags.__contains__('Z'):
self.REGFILE[self.STATUS] |= (1 << status_bits.Z)
if flags.__contains__('V'):
self.REGFILE[self.STATUS] |= (1 << status_bits.V)
if flags.__contains__('N'):
self.REGFILE[self.STATUS] |= (1 << status_bits.N)

def clear_flags(self, flags: str):
flags = flags.upper()
if flags.__contains__('C'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.C)
if flags.__contains__('Z'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.Z)
if flags.__contains__('V'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.V)
if flags.__contains__('N'):
self.REGFILE[self.STATUS] &= ~(1 << status_bits.N)

def test_flag(self, flag: str) -> int:
flags = flag.upper()
if flags.__contains__('C'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.C)) >> status_bits.C
if flags.__contains__('Z'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.Z)) >> status_bits.Z
if flags.__contains__('V'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.V)) >> status_bits.V
if flags.__contains__('N'):
return (self.REGFILE[self.STATUS] & (1 << status_bits.N)) >> status_bits.N

def set_value_flags(self, val: int):
# Note we can't set the overflow
# flag here as we need both input
# and result values to computer
# overflow.
if val == 0:
self.set_flags('Z')
else:
self.clear_flags('Z')

if val > 0b0111_1111_1111_1111:
self.set_flags('N')
else:
self.clear_flags('N')

if (val & 0x10000) >> 17:
self.set_flags('C')
else:
self.clear_flags('C')

def set_overflow_flags(self, v1: int, v2: int, result: int):
if ((v1 & (1 << 15)) >> 15 & (v2 & (1 << 15)) >> 15) == 1:
# Both values have the sign bit set
# So the result should have the sign
# bit unset. If not, we have overflow
if not (result & (1 << 15)) > 0:
self.set_flags('V')
else:
self.clear_flags('V')
elif ((v1 & (1 << 15)) >> 15 | (v2 & (1 << 15)) >> 15) == 0:
# Both values have the sign bits unset
# So, the result should have the sign
# bit unset.
if ((result & (1 << 15)) >> 15) == 1:
self.set_flags('V')
else:
self.clear_flags('V')

def set_carry(self, v1, v2, result):
# value must be unmasked or it won't include the 17th bit.
if ((result & (1 << 16)) >> 16) == 1:
self.set_flags('C')
else:
self.clear_flags('C')

def set_borrow(self, v1, v2):
if v1 < v2:
self.set_flags('C')
else:
self.clear_flags('C')

# Instruction decoding routines
def decode_inst(self, instr):
pass

# Misc Methods
def dump(self):
print('\n\nSweet16-GP CPU')
print('---------------------------------------------------------')
self.dump_status()
self.dump_registers()
self.dump_memory()

def dump_status(self):
result = ''
status = self.REGFILE[self.STATUS]
# print('Status 0b{:016b}'.format(status))
if self.bit_test(status, status_bits.C):
result += 'C, '
if self.bit_test(status, status_bits.Z):
result += 'Z, '
if self.bit_test(status, status_bits.V):
result += 'V, '
if self.bit_test(status, status_bits.N):
result += 'N, '
result = result[:-2]
print('Status Flags: 0b{:04b}'.format(self.REGFILE[self.STATUS]))
print('---------------------------------------------------------')
print(str(result) + ' \n')

def dump_registers(self):
idx = 0
col = 0
max_cols = 4
print('Registers')
print('---------------------------------------------------------')
for r in self.REGFILE:
if idx == 0:
print('ACC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 4:
print('RETPTR (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 5:
print('COMPARE (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 6:
print('STATUS (R{:02}):0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
elif idx == 7:
print('PC (R{:02}): 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
else:
# General purpose register
print('R{:02}: 0x{:04x}'.format(idx, (self.REGFILE[idx] & 0xFFFF)), end=',  ')
idx += 1
col += 1
if col >= max_cols:
col = 0
print()
print()

def dump_memory(self, max_bytes=256):
col = 0
row = 0

max_cols = 16
max_rows = int(max_bytes / max_cols)
idx = 0
print('\nAddr   Data                                  MEMORY DUMP')
print('------------------------------------------------------------------------------------------------------')
for b in self.MEM:
if col == 0:
print('0x{:04x}'.format(idx), end=':  ')
print('0x{:02x}'.format(b), end=', ')
else:
print('0x{:02x}'.format(b), end=', ')
col += 1
if col >= max_cols:
row += 1
col = 0
print()
if row >= max_rows:
print()
break;
idx += 1
print()

# CPU control routines
def run(self):
while not self.halt_flag and self.get_register(self.PC) < len(self.MEM):
self.cur_instr = self.peek_byte(self.get_register(self.PC))
print(f'PC: {hex(self.get_register(self.PC))}, cur_instr: {hex(self.cur_instr)}')
self.decode_inst(self.cur_instr)
sleep(1)
self.inc_register(self.PC)

if __name__ == '__main__':
cpu = Sweet16GP()
cpu.set_register(0, 0xffae)
cpu.poke_byte(0, 0xbc)
cpu.poke_byte(1, 0xd7)
cpu.poke_word(4, 0xabcd)
cpu.dump()```

Now test this code. You only need to write some code to set values in the registers and memory, then call the dump methods and visually inspect the results to ensure they are as expected. This will give us a nice visual for the results of a program run.

Ok, that’s a lot for today. We now have a large set of housekeeping methods. Next time we will revisit the run() method and learn how to decode our instructions .

## Salsa – A Teaching Operating System – Part 3

0 Comments

Last time we discussed the x86 assembly language. We really only scratch the surface of this subject. There is much more to learn. Today we need to address addressing modes (pun intended) and the stack.

First, what is an address mode? When we have an instruction such as “mov” that accesses memory, that address must come from somewhere. The address mode determines how the address for the memory location in the instruction is obtained. For example:

mov ebx, 0x7cff

mov word [ebx], 0x7ce0

In the above example, the first instruction “mov ebx, 0x7cff” load the immediate value “0x7cff” into the EBX register. This is known as immediate mode. In this mode the data to be loaded into the register is part of the instruction and is encoded into the actual machine instruction. You’ll recognize immediate mode by the fact that no memory access (other than fetching the instruction itself) is needed to perform the instruction.

In the second example above, the value 0x7ce0 is moved not into the ebx register but into the memory location pointed to by the value in ebx. So, with the two instructions above 0x7ceo would be placed into memory at address 0x7cff because that is the value in ebx. Anytime you see a register in square braces you can think of this as the register saying “Use the address I contain”. This mode of addressing is called register indirect.The x86 family supports many addressing modes. However, we wont need them all. We’ll introduce others as we move along.

One more thing we should comment on is the use of the “word” value in the second instruction. Many assemblers can infer the proper data size to transfer from memory based on the register used. However, nasm may require the use of a data size to tell it how many bytes to transfer from memory. Here we use the “word” size to transfer 16-bits (two bytes) of data. We can also use byte (8-bits), dword (32-bits), and qword (64-bits). However, for our boot loader we wont be using any 64 bit instructions or registers. This is because our processor will initially be in “real” mode (16-bit mode).

Let’s see another example of storing a byte to memory using an address in the base register bx:

mov byte [ebx], al

As stated, this instruction will move the byte stored in register al (the low order byte of ax) to the memory location pointed to by the value in base register bx.

Now what if we want to transfer the high order byte of ax to memory?

We, we can just use the ah register from the ax pair. Like so:

mov byte [ebx+1], ah

Notice that here we passed a constant offset to the destination by adding 1 to the ebx register. This is very handy when we know ahead of time where the byte should be placed in respect to the base value.

Most x86 assembly instructions (using Intel Syntax) will follow the pattern:

<operation> <destination>, <source>

For example:

mov [ebx], eax

This instruction moves the value in eax the location pointed to by ebx.

Some instructions have an implicit source or destination. For example:

add ebx, 4

This instruction uses ebx as both a source and destination as the instruction takes the value in ebx, adds the immediate value 4 and then places the result back into ebx.

Now let’s look at a simple loop example:

``````   xor eax, eax ; zeros out eax
mov ecx, 10 ; load counter with 10
start:
inc eax ; increment eax, our loop code goes here
loop start``````

The ecx register is usually used to store a count when doing repetitive operations. The line “start:” is a label. Labels are special markers in the code that basically names the current code address and must end with a colon. This allows you to refer to points in your code by a human readable label rather than trying to calculate and use the address actual value.

Here we’ve loaded ecx with the number of times we want to repeat our block of code (in this case the inc eax instruction) and marked the start of the block of code with the label “start”. Finally, we use the “loop” operation followed by the label of the block we want to repeat.

The loop operation decrements the value of ecx. Then if ecx is non-zero, it jumps to the address (label) provided. When ecx is zero, execution will continue to the instruction following the “loop” operation.

We’ve seen enough assembly language to get our feet wet. Now let’s return to our operating system.

### Displaying Text

One of the first task we want our OS to perform will be to print something to the screen so we know it’s alive. We’ve seen how to use the BIOS’s teletype function to print characters to the screen. But how does the BIOS do this?

When the PC is first powered on the BIOS initializes the video display to a 16 color text mode. It places video memory at location 0xB8000. We can write directly to video memory to display text. However, we need to know a bit more first.

The default video mode stores it’s display data as two byte values. The low order byte determines the character to display and the high order byte determines the foreground and background color of the character.

So how to we know what values correspond to which characters? Well the default is to support the ASCII character set. You can find ASCII character set tables all over the internet. But for completeness check out this one.

Now, what about that color byte? The high order byte of each pair contains two nibbles (4-bit values). The lower order nibble (bits 0-3) contain the four bit code for the foreground color and the high order nibble contains the background color. To see the sixteen possible colors supported check out this link: https://en.wikipedia.org/wiki/BIOS_color_attributes.

Now let’s try displaying some colorful text to the screen in video mode 0. In your favorite text editor, enter the following x86 assembly code:

``````;
; File: alphabet.asm
; Auth: Randall Morgan <rmorgan62@gmail.com>
; Desc: Program to display the English alphabet on the display in
; Mode 0, a 25 (row) x 40 (cols) x 16 color display mode.
;

start:
mov byte al, 'A'     ; Character
mov byte ah, 0x0F    ; Color: background 0-black, foreground f = white
mov ecx, 26     ; 26 letters in the alphabet
mov ebx, 0xB8000 ; Set base register to start of video memory

loop_start:
mov word [ebx], ax  ; Move character and color code into video memory
inc al              ; Increment al for next character
inc ah              ; Increment next color pair
add ebx, 2          ; Move to next character
loop loop_start
end:
jmp end         ; infinitly jump here

times 510-(\$-\$\$) db 0 ; Pad file to 510 bytes
dw 0xaa55          ; Add the magic number``````

Save this file in slasa/src as alphabet.asm. Then cd to the salsa directory and execute the following command at the terminal:

nasm src/alphabet.asm -o /bin/alphabet.bin

This will assemble the program into machine code and the BIOS will happily execute it for us as long as we half the magic number properly placed at the end of a 512 byte code block. We wont even need to place it on a 1.44MB disk image for qemu to run it.

Next we need to run the binary file in qemu. We can do this with the following command:

qemu-system-i386 bin/alphabet.bin

This command requires that you are still in the salsa directory. If everything works you should see something like this:

Notice the alphabet in the top left of the screen. Notice that we’ve not only printed the alphabet but we have also changed the foreground and background colors of each character. Now we’re getting somewhere…

Ok, next we need to learn a little more about the x86 architecture before we move on to our next step in OS development.

## The Stack

A stack is a simple First-In-Last-Out data structure. The hardware stack used by the x86 is a reserved section of RAM and is used to store temporary data such as local variables and intermediate values, and parameters.

Keeping track of the stack and the next open location in memory that we can push data to is the job of the SP (Stack Pointer) register. On the 8086 this register is 16-bits and is named SP. On 32 bit x86 architectures the register has been extended to 32 bits and has been renamed ESP (Extended Stack Pointer).

To initialize a stack we only need to set the stack pointer to a valid location in memory. It should be noted that the x86 stack grows downward. So each time we push a byte onto the stack the stack pointer (sp) is decremented. If our stack is initialized to 0xFFFF and we push the al register onto the stack it will save the contents of al to 0xFFFF and then decrement the stack pointer to 0xFFFE. If we then push the bl register, the contents of bl will be placed in memory at 0xFFFE and then the stack pointer (sp) will be decremented again, leaving SP with a value of 0xFFFD. When we pop (remove) a value off the stack, it is first incremented and then the value at it’s new location is returned.

In our example above, the stack is pointing to 0xFFFD which is the next available memory location we can use. If we pop bl, SP is incremented to point to 0xFFFE and then the value at 0XFFFE is returned. Now, we don’t have to push and pop values to/from the same registers all the time. Though often this is the case. It should also be mentioned that it is convention to refer to the top of the stack as the high address (the address SP was initialized with) and the bottom of the stack as the address currently in SP. All memory address are unsigned. So they never go negative. If we initialized SP to 0xFFFF in a 64KB memory system ans the popped a value without first pushing a value onto the stack, SP would underflow with out error and wrap it’s value around to 0x0000. However, with x86 we need to initialize the stack pointer to a high value so it has space to ground downward.

## Branch Instructions

We’ve been using the jmp instruction without really touching on it. The x86 architecture has many branch (sometimes called flow-control) instructions. Some of these instructions are:

• jmp – unconditional jump, i.e. jump always
• je – jump when equal
• jne – jump when not equal
• jz – jump when last result was zero
• jg – jump when greater than
• jge – jump when greater than or equal to
• jl – jump when less than
• jle – jump when less than or equal to

These instructions are usually used in combination with a label:

jz start ; jump if the result of the last operation was zero

These instructions are the only instructions that can modify the value of the IP (Instruction Pointer) register. The IP cannot be directly modified.

This brings us to the Flag register. This register contains various bits that are used to indicate various state information. The most common types of flags used in the Flag register are the Error flags which indicate error states such as arithmetic overflow. Status flags such as the interrupt enable flag, and result flags, such as the carry flag which is set if a carry or borrow occurs during an arithmetic operation.

The Flag register cannot be accessed or modified except for the use of “popfd” and “pushfd” instructions.

The most common use of the Flag register is in combination with the “cmp” (Compare) instruction which subtracts one operand from the other, setting the c (carry), z (zero), and sign (negative). Usually a “cmp” instruction is followed by one of the conditional branch operations. For example:

``````mov eax, [ebx]
cmp eax, 0x0a
jne some_label``````

The above code load a value pointed to be ebx into the eax register then compares that value with the decimal value 10 (0x0a). If the value in eax is not equal to 10 then the conditional jump, “jne some_label” (jump not equal) is taken. Otherwise execution continues with the next instruction.

We’re gaining ground in our understanding of the x86 architecture and assembly language. To get comfortable with writing and assembling programs take a little time and try to write the following programs:

• Write an x86 assembly program to fill the BIOS boot screen with ‘Q’ characters.
• Modify the above program to print each row in a different foreground and background color.
• Write a program that loads the last byte of the boot sector (byte 512) and subtracts 0x55 from it. If the result is zero, print “zero” to the screen. If the result is not zero, print the resulting value to the screen in decimal. Hint: Checkout the ascii character codes for decimal digits.

OK, that’s enough for today. Next time we’ll look at a few registers we haven’t mentioned before. The x86 debug registers. We’ll also start on a few routines we’ll need for our boot loader and OS.

Until next time, Happy Coding!

Newsletter Powered By : XYZScripts.com