Sweet16 an 8/16 CPU Design

This entry is part 1 of 8 in the series Sweet16-GP CPU: A Complete Development Cycle

Part 1

CPU Block Diagram

I was recently browsing some old issues of Byte for information on an 8-bit video display chip when I came across an article by Woz (Stephen Wozniak) in the November 1977 issue of Byte Magazine. The article, entitled “SWEET16: The 6502 Dream Machine”, discussed issues Woz had while developing the Apple BASIC interpreter. The issues revolved around the complex gymnastics one had to do on the 6502 to handle 16-bit values. Steve’s ingenious idea at the time was to create a machine in software that he could then send instructions to, to handle the 16-bit gymnastics and provide the results back to BASIC. He dubbed this software-emulated machine the SWEET16.

He first mentioned SWEET16 in the May issue of Byte Magazine in 1977. The November issue gave an in-depth article complete with source code and a lovely break-down of each supported instruction. I recall reading this article back then and again in the early 1980s and thinking it wouldn’t take many alterations to make the SWEET16 a more general processor. I thought then that it might even be possible to take a few PLD… At the time I had concluded that building the SWEET16 using 74xx chips was cost and space prohibitive.

When I stumbled across this article again, I figured it was finally time to take a stab at modifying the SWEET16 into a more general-purpose processor and perhaps even lay it out on an FPGA. I have a few lying around, and old Cyclone II and IV dev boards are pretty cheap on Amazon now. Even the Cyclone I has plenty of logic to implement a bare-bones processor like the SWEET16. However, the dedicated purpose of the SWEET16 and its original implementation in software meant that it lacked many instructions that would be required by a more general-purpose processor.

What keeps the SWEET16 from being a general-purpose processor is its lack of data manipulation instructions. Woz had a special use case for the SWEET16: it was responsible for dealing with 16-bit address space operations. So the only data manipulation operations the SWEET16 needed were simple addition and subtraction. Most of the instructions provide byte and word-sized stack operations, and byte and word-sized memory move operations. A more general-purpose processor, however, needs several more data manipulation instructions such as logical AND, OR, XOR, and NOT. I also wanted to add bit shifting and rotation as well as multiply and divide instructions. However, with the instruction encoding scheme of the SWEET16, there simply weren’t enough open opcodes to support the new instructions.

So, my first task was to figure out how I could modify Steve’s original instruction set and maintain the simplicity of the original design. I studied the instruction set of the SWEET16, which is neatly organized to allow a developer to assemble machine code on his local biologically embedded device (his brain) without the need for an external assembler running on a nonbiological machine (computer). I wanted to maintain the ease of assembling the instruction set. Steve’s solution was to use a single byte opcode and break it into two single-digit hex values. This division was instrumental to the ease of biological assembly. However, I needed more instructions than the original opcodes could support. What to do…

I decided that I could do without a lot of the branch instructions provided in the original instruction set. Woz had positive and negative versions of each branch-on-flag instruction. Also, there was an instruction meant for returning to the 6502 from the emulated SWEET16 that simply wasn’t needed in a real hardware processor. So I took half the branch instructions (all the negative ones, such as BNE) and re-coded them for other purposes. This meant that a little flexibility in test and branch was given up in exchange for a more general instruction set.

 

Instruction Set Architecture

The GP instruction opcode setup is very similar to that of the SWEET16. All register instruction opcodes are formed by using a byte where the lower 3 bits represent the register and the upper 5 bits represent the operation to be performed. This implies that the GP has fewer registers than the SWEET16 and indeed it does. While the SWEET16 provided 16 registers the GP only includes 8 registers. Having 5 bits instead of four to indicate the operation requires that we cut the number of registers in half. So registers in the GP are R0 (ACC), R1 – R3 (General purpose), R4 (RETURN STACK), R5 (COMPARE RESULT), R6 (STATUS), R7 (PC).

The non-register instructions have the upper 5-bits of the byte set to 0. The lower 3 bits then represent the operation to be performed. This limits the non-register operations to values 0 – 7. In the Sweet16-GP emulator, the 0x00 code will be used for a HALT instruction. This opcode was used to return to Apple BASIC in the original SWEET16.
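
The 5-bit/3-bit split described above can be sketched in a few lines of Python. This is only an illustration of the encoding as described in this section; the function names `decode` and `is_register_instruction` are hypothetical and are not taken from the Sweet16-GP emulator.

```python
def decode(opcode):
    """Split one opcode byte into (operation, low3)."""
    operation = (opcode >> 3) & 0x1F  # upper 5 bits select the operation
    low3 = opcode & 0x07              # lower 3 bits: register, or non-register op
    return operation, low3


def is_register_instruction(opcode):
    """Register instructions have a non-zero value in the upper 5 bits."""
    return ((opcode >> 3) & 0x1F) != 0


# SET R3 is encoded as 0x08 + 3 = 0x0B: operation 1, register 3.
print(decode(0x0B))                   # (1, 3)
# RTS is 0x07: the upper 5 bits are 0, so the low 3 bits pick the operation.
print(is_register_instruction(0x07))  # False
```

Because the register always occupies the low 3 bits, an opcode can still be assembled by hand by adding the register number to the base opcode.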

 

Registers

The Sweet16-GP registers are designated R0 – R7. Registers R1 through R3 are general purpose. Other registers have special purposes:

  • Register R0 is the ACC (accumulator).
  • Register R4 is the subroutine return address stack pointer RETSTACK.
  • Register R5 is the compare result register COMP.
  • Register R6 is the status flag register STATUS.
  • Register R7 is the program counter PC.

These registers may be referred to by their numerical ids or their names i.e.: ACC, RETSTACK, COMP, STATUS, PC.

 

Status Flags

The Sweet16-GP processor contains four status flags. These flags are set to 1 to reflect the results of operations. These flags are tested by branch instructions to determine if the branch should be taken.

The flags contained in the lower 8 bits of the status register STATUS (R6) are:

  • Bit 0 is the Carry flag (C).
  • Bit 1 is the Zero flag (Z).
  • Bit 6 is the Overflow flag (V).
  • Bit 7 is the Negative flag (N).

The Carry flag is set when an ADD or SUB instruction results in a carry into the 17th bit or a borrow from the 17th bit. It is also set when the high-order bit shifted out by the SHL instruction is a 1, or the low-order bit shifted out by the SHR instruction is a 1.

The Zero flag is set anytime the result of an operation is zero. This includes math and logic operations.

The Overflow bit is set anytime a math operation results in a two’s complement overflow. It is up to the programmer to determine if this flag is relevant for any given operation.

The Negative flag is set any time the result of an operation on register data leaves the 16th bit (the most significant bit) set.

All flags are cleared on boot up. Flags are generally referred to by their designated letter: C for Carry, Z for Zero, V for Overflow, and N for Negative.
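
To make the flag rules concrete, here is a minimal Python sketch of how the four flags could be derived for a 16-bit ADD. The bit positions come from the list above; the function name `add16_flags` and the exact way the emulator computes the flags are assumptions made for illustration.

```python
C_BIT, Z_BIT, V_BIT, N_BIT = 0, 1, 6, 7  # bit positions in STATUS (R6)


def add16_flags(a, b):
    """Add two 16-bit values; return (result, status_byte)."""
    total = a + b
    result = total & 0xFFFF
    status = 0
    if total > 0xFFFF:                      # carry into the 17th bit
        status |= 1 << C_BIT
    if result == 0:                         # result is zero
        status |= 1 << Z_BIT
    # Two's complement overflow: both operands share a sign bit
    # and the sign bit of the result differs from it.
    if (~(a ^ b) & (a ^ result)) & 0x8000:
        status |= 1 << V_BIT
    if result & 0x8000:                     # 16th bit set
        status |= 1 << N_BIT
    return result, status


result, status = add16_flags(0x7FFF, 0x0001)
print(hex(result), hex(status))  # 0x8000 0xc0 (V and N set)
```

Adding 0x0001 to 0xFFFF with the same function wraps to 0x0000 and sets C and Z instead.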

Non-Register Instructions

HALT

[00]

(Halt)

The program execution stops when this instruction is encountered.

Example:

end:          HALT                                ;Halt processor

 

BRA

[01] [dd]

(Branch Always)

An effective address (ea) is calculated by adding the signed displacement byte (dd) to the program counter. The program counter contains the address of the instruction immediately following the BRA instruction, that is, the address of BRA + 2. The displacement is a signed two’s complement value in the range of -128 to +127. Branch conditions are not changed.

Example:

;
; Simple test for BRA
;
start: SET R0, 0xFFF0
SET R2, 0x000F
ADD R2
BRA 0x03
BYTE 0xff
BYTE 0xff
done: HALT
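
The effective-address rule described for BRA can be written out as a short Python sketch. It follows the text of the description only (address of the branch + 2, plus a sign-extended displacement); the function name `branch_target` is hypothetical, and the actual emulator may compute the target differently.

```python
def branch_target(branch_addr, dd):
    """Return the effective address for a branch located at branch_addr."""
    # Sign-extend the displacement byte: 0x80..0xFF map to -128..-1.
    displacement = dd - 0x100 if dd & 0x80 else dd
    # The PC already points past the 2-byte branch instruction.
    return (branch_addr + 2 + displacement) & 0xFFFF


# A branch at 0x0007 with displacement 0x02 skips the next two bytes.
print(hex(branch_target(0x0007, 0x02)))  # 0xb
```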

 

BRC

[02] [dd]

(Branch Carry Set)

A branch is effected only if the carry bit of the status register is set. Branch conditions are not changed.

Example:

;
; Test for BRC
; On Exit ACC = 0x0000, N flag is set, and PC = 0x000C, R2 = 0x0001
;
start: SET R0, 0xFFF0 ; Load ACC with 0xFFF0
SET R2, 0x0001 ; Load R2 with 0x0001
ADD R2 ; ADD R2 to ACC, result in ACC
BRC 0x02 ; Branch to HALT instruction if carry is set
BRA 0xFC ; Otherwise branch back and try again
end: HALT ; STOP Execution

 

BRZ

[03] [dd]

(Branch Zero Set)

A branch is effected only if the prior operation’s result was zero. Branch conditions are not changed.

Example:

;
; Test for BRZ
; On Exit: ACC = 0x0, R6 = 0x80, PC = 0x09
start: SET R0, 0x000F ; Load ACC with 0x000F
DEC ACC ; Decrement ACC
BRZ 0x02 ; Branch to HALT instruction
BRA 0xFC ; Otherwise branch back and try again
end: HALT ; STOP Execution

 

BRN

[04] [dd]

(Branch Negative)

A branch is effected only if the prior operation resulted in a negative (MSB = 1). Branch conditions are not changed.

Example:

;
; Test for BRN
; On Exit: ACC = 0x8000, R2 = 0x01, PC = 0x0C, N and V flags Set
;
start: SET R0, 0x7FF0 ; Load ACC with 0x7FF0
SET R2, 0x0001 ; Load R2 with 0x0001
ADD R2 ; ADD R2 to ACC, result in ACC
BRN 0x02 ; Branch to HALT instruction if negative
BRA 0xFC ; Otherwise branch back and try again
end: HALT ; STOP Execution

 

BRV

[05] [dd]

(Branch Overflow)

A branch is effected only if the prior operation resulted in a two’s complement overflow. Branch conditions are not changed.

Example:

;
; Test for BRV
; On Exit: ACC = 0x8000, R1 = 0x0010, R2 = 0x01, PC = 0x13, N and V flags Set
;
start: SET R0, 0x7FF0 ; Load ACC with 0x7FF0
SET R2, 0x0001 ; Load R2 with 0x0001
SET R1, 0x0000 ; Load R1 with 0x00
SET R3, 0x0000
calc: ADD R2 ; ADD R2 to ACC, result in ACC
INC R1 ; Count how many times we Add R2 to ACC (R0)
BRV end ; Branch to HALT instruction on overflow
BRA calc ; Otherwise branch back to calc (data block FB)
end: HALT ; STOP Execution

 

BSR

[06] [a-low] [a-high]

(Branch Sub Routine)

The current PC + 2 is pushed on a “return address” stack whose pointer is in R4 (RETSTACK) and R4 is incremented by 2. The carry is cleared and branch conditions are set to represent the value in R0 (ACC). Then, the branch to the effective address (PC + 2 + dd) is taken.

Example:

;
; Test of BSR and RTS
;
init: SET RETSTACK, 0x0020 ; Sets the return address stack pointer

start0: SET R0, 0x000f
BSR sub_cnt
start1: DEC R0
BRN done
BRA start1
BYTE 0xff
BYTE 0xff

done: HALT
BYTE 0xff
BYTE 0xff

sub_cnt: INC R1
RTS
BYTE 0xff
BYTE 0xff

 

RTS

[07]

(Return from Sub Routine)

RETSTACK (R4) is decremented by 2 and the return value is popped off the stack and loaded into the PC, effectively returning execution to the address placed on the stack during execution of the BSR instruction.

Example: (See: BSR)
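
The return stack used by BSR and RTS grows upward: BSR stores the return address and then adds 2 to R4, and RTS subtracts 2 from R4 before reading the address back. The Python sketch below illustrates just that push/pop behaviour. The helper names are hypothetical, and storing the low byte first is an assumption (it mirrors the byte order STD uses).

```python
RETSTACK = 4  # R4 holds the return stack pointer


def push_return(mem, regs, return_addr):
    """BSR side: store the return address, then advance R4 by 2."""
    sp = regs[RETSTACK]
    mem[sp] = return_addr & 0xFF                         # low byte (assumed order)
    mem[(sp + 1) & 0xFFFF] = (return_addr >> 8) & 0xFF   # high byte
    regs[RETSTACK] = (sp + 2) & 0xFFFF


def pop_return(mem, regs):
    """RTS side: step R4 back by 2, then read the return address."""
    sp = (regs[RETSTACK] - 2) & 0xFFFF
    regs[RETSTACK] = sp
    return mem[sp] | (mem[(sp + 1) & 0xFFFF] << 8)


mem = bytearray(0x10000)   # 64 KiB of memory
regs = [0] * 8             # R0..R7
regs[RETSTACK] = 0x0020    # same initial value as the BSR example
push_return(mem, regs, 0x0006)
print(hex(regs[RETSTACK]), hex(pop_return(mem, regs)))  # 0x22 0x6
```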

 

Register Instructions

SET

[08] + Rn [c-low] [c-high]

(Set)

The two-byte constant (c-low and c-high) is loaded into Rn (n = 0 – 7) and branch conditions are set to represent Rn’s new value.

Example:

;
; On Exit R3 contains the hex value 0xdefa
;
start: SET R3, 0xdefa
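
For hand assembly, the three bytes of a SET instruction follow directly from the encoding above: the opcode is 0x08 plus the register number, followed by the low and then the high byte of the constant. A small Python sketch (the helper name `encode_set` is hypothetical):

```python
def encode_set(reg, value):
    """Return the 3 machine-code bytes for SET Rn, value."""
    if not 0 <= reg <= 7:
        raise ValueError("register must be in the range 0..7")
    opcode = 0x08 + reg              # [08] + Rn
    c_low = value & 0xFF             # low byte of the constant comes first
    c_high = (value >> 8) & 0xFF     # then the high byte
    return bytes([opcode, c_low, c_high])


print(encode_set(3, 0xDEFA).hex())   # 0bfade
```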

 

LD Rn

[10] + Rn

(Load)

The ACC (R0) is loaded from Rn. Branch conditions are updated to reflect the new value of ACC.

Example:

;
; On Exit ACC (R0) contains 0xF0FA
; Branch conditions are set to reflect
; the new value in ACC.
;
start: SET R3, 0xF0FA
LD R3

 

ST Rn

[18] + Rn

(Store)

The contents of ACC are stored in Rn and the branch conditions are set to reflect the new value of Rn. The carry flag is cleared, and the contents of ACC are left undisturbed.

Example:

;
; On Exit R2 contains the hex value 0xffef
; which was transferred from ACC. ACC is
; left unaltered.
;
start: SET ACC, 0xffef
ST R2

 

LD @Rn

[20] + Rn

(Load indirect)

The low order ACC byte is loaded from the memory location whose address resides in Rn, and the high order byte of ACC is cleared. Branch conditions are set to reflect the final state of ACC whose contents will always be positive. The carry flag is cleared. After transfer Rn is incremented by 1.

Example:

;
; Test LD @Rn
;
start: SET R3, 0x00FF ; Load pointer
LD @R3 ; Load ACC from memory pointer in R3

 

ST @Rn

[28] + Rn

(Store indirect)

The low order byte of ACC is stored into the memory location whose address resides in Rn. Branch conditions are set to reflect the two-byte contents of ACC (R0). The carry flag is cleared and Rn is incremented by 1.

Example:

;
; Test ST @Rn
;
start: SET R3, 0xFFE0 ; Set pointer
SET ACC, 0x0A ; Set ACC value
ST @R3 ; Store to memory

 

LDD @Rn

[30] + Rn

(Load double byte indirect)

The low order byte of ACC (R0) is loaded from the memory location whose address resides in Rn, and Rn is incremented by 1. Then the high order byte of ACC is loaded from the memory location pointed to by the updated Rn. Rn is incremented once again by 1. Branch conditions are set to reflect the final ACC contents, and the carry flag is cleared.

Example:

;
; Test LDD @Rn
;
start: SET R3, 0x00F0 ; Load pointer
LDD @R3 ; Load ACC from memory pointed to by R3

 

STD @Rn

[38] + Rn

(Store double indirect)

The low order byte of ACC (R0) is stored into the memory location whose address resides in Rn, and Rn is incremented by 1. The high order byte of ACC is then stored into the memory location pointed to by (the incremented) Rn. Rn is again incremented by 1. Branch conditions are set to reflect the contents of ACC, and the carry flag is cleared.

Example:

;
; Test STD @Rn
;
start: SET R3, 0xFFE0 ; Set pointer
SET R0, 0x0A0B ; Set ACC value
STD @R3 ; Store both bytes to memory

 

POP @Rn

[40] + Rn

(Pop indirect)

Rn is decremented by 1. Then the low order byte of ACC (R0) is loaded from the memory location whose address resides in (the decremented) Rn. The high order byte of ACC is cleared and the status bits are set to reflect the final value of ACC, whose content will always be positive. The carry flag is cleared. Because Rn is decremented prior to loading the ACC, single byte stacks can be implemented using ST @Rn and POP @Rn.

Example:

;
; Test POP @Rn
; On Exit: ACC contains 0xaa
;
start: SET R3, 0x00fe ; Load data pointer
SET R0, 0xaa ; Load ACC with data
ST @R3 ; Store data in memory at 0xfe
SET R0, 0x00 ; Clear ACC
POP @R3 ; Reload data into ACC
HALT
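
The single byte stack built from ST @Rn and POP @Rn can be modelled in a few lines of Python. This is a sketch of the pointer behaviour only: status flags are left out, and the helper names and the bytearray memory model are assumptions rather than the emulator's actual code.

```python
def st_indirect(mem, regs, n):
    """ST @Rn: store the low byte of ACC at [Rn], then increment Rn."""
    mem[regs[n]] = regs[0] & 0xFF
    regs[n] = (regs[n] + 1) & 0xFFFF


def pop_indirect(mem, regs, n):
    """POP @Rn: decrement Rn, then load ACC from [Rn] (high byte cleared)."""
    regs[n] = (regs[n] - 1) & 0xFFFF
    regs[0] = mem[regs[n]]


mem = bytearray(0x10000)   # 64 KiB of memory
regs = [0] * 8             # R0..R7, R0 is ACC
regs[3] = 0x00FE           # data pointer, as in the example above
regs[0] = 0x00AA           # data to push
st_indirect(mem, regs, 3)  # push: mem[0xFE] = 0xAA, R3 becomes 0xFF
regs[0] = 0x0000           # clear ACC
pop_indirect(mem, regs, 3) # pop: R3 goes back to 0xFE, ACC = 0xAA
print(hex(regs[0]), hex(regs[3]))  # 0xaa 0xfe
```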

 

STP @Rn

[48] + Rn

(Store Pop indirect)

Rn is decremented by 1 and the low order byte of ACC (R0) is stored into the memory location whose address resides in Rn. Status bits are set to reflect the final 16-bit value of ACC. The contents of ACC are left undisturbed. STP @Rn and POP @Rn can be used together to move blocks of memory beginning with the greatest address and working downward.

Example:

;
; Test STP
; This instruction is used in combination
; with POP @Rn to move blocks of memory
; starting from the high address and working
; down to the low end of the block.
;
start: SET R3, 0xB000 ; initialize pointers.
SET R2, 0xA000
POP @R3 ; Get data from source block
STP @R2 ; Store in destination block
POP @R3
STP @R2
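
The downward block move that POP @Rn and STP @Rn make possible can be sketched in Python as a loop of pre-decrementing loads and stores. The function name `move_block_down` and its parameters are hypothetical; both pointers start one past the last byte of their block, just as R3 and R2 do in the example above.

```python
def move_block_down(mem, src_end, dst_end, count):
    """Copy count bytes, starting at the high end of each block."""
    src, dst = src_end, dst_end
    for _ in range(count):
        src = (src - 1) & 0xFFFF   # POP @R3 decrements before loading
        byte = mem[src]
        dst = (dst - 1) & 0xFFFF   # STP @R2 decrements before storing
        mem[dst] = byte
    return src, dst                # final pointer values


mem = bytearray(0x10000)
mem[0xAFFE:0xB000] = b"\x11\x22"   # two source bytes just below 0xB000
move_block_down(mem, 0xB000, 0xA000, 2)
print(mem[0x9FFE:0xA000].hex())    # 1122
```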

 

ADD Rn

[50] + Rn

(Add)

The value in ACC and the value in Rn are added and the result is placed in ACC. The status flags are set to reflect the final value of ACC.

Example:

;
; Test ADD Rn
; On Exit: ACC contains 0xff
;
start: SET R1, 0x00fe ; load R1 with 0xfe
SET R0, 0x01 ; load ACC with 1
ADD R1 ; ACC gets 0x01 + 0xfe = 0xff
HALT

 

SUB Rn

[58] + Rn

(Subtract)

The value in Rn is subtracted from the value in ACC (R0) and the results are placed into ACC. The status flags are set to reflect the final value of ACC.

Example:

;
; Test SUB Rn
; On Exit: ACC contains 0xfd
;
start: SET R1, 0x01 ; load R1 with 1
SET R0, 0xfe ; load ACC with 0xfe
SUB R1 ; ACC gets 0xfe - 0x01 = 0xfd
end: HALT ; STOP Execution

 

MUL Rn

[60] + Rn

(Multiply)

The value in ACC (R0) is multiplied by the value in Rn. The result is stored in ACC and the status flags are set to reflect the final value of ACC.

Example:

;
; Test MUL Rn
; On Exit: ACC contains 0x04
;
start: SET R1, 0x02 ; load R1 with 2
SET R0, 0x02 ; load ACC with 2
MUL R1 ; ACC gets 0x02 * 0x02 = 0x04
end: HALT ; STOP Execution

 

DIV Rn

[68] + Rn

(Divide)

The value in ACC (R0) is divided by the value in Rn. The result is stored in ACC and the status flags are set to reflect the final value of ACC.

Example:

;
; Test DIV Rn
; On Exit: ACC contains 0x01
;
start: SET R1, 0x02 ; load R1 with 2
SET R0, 0x02 ; load ACC with 2
DIV R1 ; ACC gets 0x02 / 0x02 = 0x01
end: HALT ; STOP Execution

 

AND Rn

[70] + Rn

(Logical and)

The value of ACC is logically AND’d with the value in Rn. The results are placed in ACC. Status flags are set to reflect the final value of ACC.

Example:

;
; Test AND
; On Exit: ACC contains 0x2
;
start: SET R0, 0xA ; Load 0b1010 into ACC
SET R2, 0x2 ; Load 0b0010 into R2
AND R2 ; Logical AND ACC with R2
HALT

 

OR Rn

[78] + Rn

(Logical or)

The value in ACC is OR’d with the value of Rn. The result is stored in ACC. The status flags are set to reflect the final value of ACC. The carry flag is cleared.

Example:

;
; Test OR
; On Exit: ACC contains 0xFA
;
start: SET R0, 0xA ; Load 0b1010 into ACC
SET R2, 0xF2 ; Load 0b1111_0010 into R2
OR R2 ; Logical OR ACC with R2
HALT

 

XOR Rn

[80] + Rn

(Logical xor)

The value in ACC is XOR’d with Rn and the result stored in ACC. The status flags are set to reflect the final value of ACC. The carry flag is cleared.

Example:

;
; Test XOR
; On Exit: ACC contains 0xF8
;
start: SET R0, 0xA ; Load 0b1010 into ACC
SET R2, 0xF2 ; Load 0b1111_0010 into R2
XOR R2 ; Logical XOR ACC with R2
HALT

 

NOT

[88] + Rn

(Logical not)

The contents of Rn are logically inverted. The condition flags are set to reflect the final value of Rn. The carry flag is cleared.

Example:

;
; Test NOT
; On Exit: ACC contains 0xFFF5
;
start: SET R0, 0xA ; Load 0b1010 into ACC
NOT R0
HALT

 

SHL

[90] + Rn

(Logical shift left)

The contents of Rn are logically shifted to the left by 1 place. The high order bit which falls off the left side is placed in the carry status flag. The branch conditions are set to reflect the final value of Rn.

Example:

;
; Test SHL
; On Exit: R1 contains 0x8
;
start: SET R1, 0x2 ; Load 0b0010 into R1
SHL R1
SHL R1
HALT

 

SHR

[98] + Rn

(Logical shift right)

The contents of Rn are logically shifted to the right by 1 place. The low order bit that falls off the right end is placed in the carry status flag. The branch conditions are set to reflect the final value of Rn.

Example:

;
; Test SHR
; On Exit: R1 contains 0x0A
;
start: SET R1, 0xAF ; Load 0b1010_1111 into R1
SHR R1
SHR R1
SHR R1
SHR R1
HALT
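
Both shift instructions can be sketched as small Python functions that return the shifted 16-bit value together with the bit that falls off the end, which is the bit that goes into the carry flag. The function names are hypothetical and the other status flags are not modelled.

```python
def shl16(value):
    """Logical shift left by 1; returns (result, carry_out)."""
    carry = (value >> 15) & 1             # high-order bit falls off the left
    return (value << 1) & 0xFFFF, carry


def shr16(value):
    """Logical shift right by 1; returns (result, carry_out)."""
    carry = value & 1                     # low-order bit falls off the right
    return (value >> 1) & 0xFFFF, carry


r1 = 0x00AF
for _ in range(4):                        # four SHR instructions in a row
    r1, carry = shr16(r1)
print(hex(r1), carry)                     # 0xa 1
```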

 

ROL

[A0] + Rn

(Rotate left)

The contents of Rn are rotated to the left by 1 bit position. The high order bit that is rotated out of bit 15 is placed in the vacant bit 0 position. The status flags are set to reflect the final value of Rn, and the carry flag is cleared.

Example:

;
; Test ROL
; On Exit: R1 contains 0xFDEA
;
start: SET R1, 0xAFDE ; Load 0b1010_1111_1101_1110 into R1
ROL R1
ROL R1
ROL R1
ROL R1
HALT

 

ROR

[A8] + Rn

(Rotate right)

The contents of Rn are rotated right by 1 position. The low order bit that is rotated out of bit position 0 is placed in bit position 15. The status flags are set to reflect the final value of Rn, and the carry flag is cleared.

Example:

;
; Test ROR
; On Exit: R1 contains 0xEAFD
;
start: SET R1, 0xAFDE ; Load 0b1010_1111_1101_1110 into R1
ROR R1
ROR R1
ROR R1
ROR R1
HALT
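
The exit values quoted in the ROL and ROR tests (0xFDEA and 0xEAFD from 0xAFDE) correspond to rotating across the full 16-bit register. A minimal Python sketch of that behaviour, with hypothetical function names and no flag handling:

```python
def rol16(value):
    """Rotate a 16-bit value left by 1; the top bit wraps around to bit 0."""
    return ((value << 1) | (value >> 15)) & 0xFFFF


def ror16(value):
    """Rotate a 16-bit value right by 1; bit 0 wraps around to the top bit."""
    return ((value >> 1) | ((value & 1) << 15)) & 0xFFFF


left = right = 0xAFDE
for _ in range(4):                 # four rotations move one hex digit
    left = rol16(left)
    right = ror16(right)
print(hex(left), hex(right))       # 0xfdea 0xeafd
```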

 

POPD @Rn

[E0] + Rn

(Pop double byte indirect)

Rn is decremented by 1 and the high order ACC (R0) byte is loaded from the memory location whose address resides in Rn. Rn is then decremented by 1 again, and the low order byte of ACC is loaded from the memory location whose address resides in (the decremented by two) Rn. Status flags are set to reflect the final value of ACC. The carry flag is cleared. Because the value of Rn is decremented before loading each byte half of ACC, double byte (16-bit) stacks can be implemented using STD @Rn and POPD @Rn. Rn is then the stack pointer.

Example:

;
; Test POPD @Rn
; On Exit: ACC contains 0x0A0D
;
start: SET R3, 0x0D ; Load pointer
SET R0, 0x0A0D ; Load ACC with data
STD @R3 ; Store the data
SET R0, 0x00 ; Clear ACC
POPD @R3 ; Reload the data into ACC
end: HALT
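
The 16-bit stack formed by STD @Rn and POPD @Rn can be sketched in Python as follows. Only the memory and pointer behaviour from the descriptions is modelled; the helper names and the bytearray memory are assumptions, and status flags are left out.

```python
def std_indirect(mem, regs, n):
    """STD @Rn: store ACC low byte, then high byte, incrementing Rn each time."""
    mem[regs[n]] = regs[0] & 0xFF
    regs[n] = (regs[n] + 1) & 0xFFFF
    mem[regs[n]] = (regs[0] >> 8) & 0xFF
    regs[n] = (regs[n] + 1) & 0xFFFF


def popd_indirect(mem, regs, n):
    """POPD @Rn: decrement Rn and load the high byte, then repeat for the low byte."""
    regs[n] = (regs[n] - 1) & 0xFFFF
    high = mem[regs[n]]
    regs[n] = (regs[n] - 1) & 0xFFFF
    low = mem[regs[n]]
    regs[0] = (high << 8) | low


mem = bytearray(0x10000)
regs = [0] * 8
regs[3] = 0x000D             # stack pointer, as in the example above
regs[0] = 0x0A0D             # value to push
std_indirect(mem, regs, 3)   # R3 becomes 0x0F
regs[0] = 0x0000             # clear ACC
popd_indirect(mem, regs, 3)  # R3 returns to 0x0D
print(hex(regs[0]), hex(regs[3]))  # 0xa0d 0xd
```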

 

CPR

[E8] + Rn

(Compare)

The ACC (R0) contents are compared to the contents of Rn by performing a 16-bit binary subtraction, subtracting Rn from ACC (R0). The results are placed in the R5 (COMP) register and status flags are set to reflect the final value of the COMP register. ACC is left unmodified.

Example:

;
; Test CPR Rn
; On Exit: R5 (COMP) contains 0xFB00
; and the STATUS register's N bit is set.
;
start: SET R1, 0x0F0D ; Load R1 with the value to compare against
SET R0, 0x0A0D ; Load ACC with data
CPR R1 ; Compare ACC with R1
end: HALT
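
The compare operation is a 16-bit subtraction whose result lands in R5 (COMP) instead of ACC. Here is a minimal Python sketch of that step, using a hypothetical `cpr` helper that models only the result and the Negative flag, for the case of ACC = 0x0A0D compared against 0x0F0D:

```python
COMP = 5  # R5 receives the comparison result


def cpr(regs, n):
    """Subtract Rn from ACC into COMP; return (difference, negative_flag)."""
    difference = (regs[0] - regs[n]) & 0xFFFF   # wrap to 16 bits
    regs[COMP] = difference                     # ACC itself is not modified
    negative = bool(difference & 0x8000)        # 16th bit set means negative
    return difference, negative


regs = [0] * 8
regs[0] = 0x0A0D
regs[1] = 0x0F0D
difference, negative = cpr(regs, 1)
print(hex(difference), negative)   # 0xfb00 True
```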

 

INC Rn

[F0] + Rn

(Increment)

The value of Rn is incremented by 1. The status flags are set to reflect the final value of Rn and the carry flag is cleared.

Example:

;
; Test INC Rn
; On Exit: R2 contains 0xAF0B
;
start: SET R2, 0xAF0A ; Load data into R2
INC R2
end: HALT

 

DEC Rn

[F8] + Rn

(Decrement)

The value of Rn is decremented by 1. The status flags are set to reflect the final value of Rn. The carry flag is cleared.

Example:

;
; Test DEC Rn
; On Exit: R2 contains 0xAF09
;
start: SET R2, 0xAF0A ; Load data into R2
DEC R2
end: HALT

 

To test my new instruction set, I decided to develop a software emulator for what I am now calling the Sweet16-GP (GP). To keep things simple, I grabbed Python and set out to write a software emulator. The project took only a couple of days. You can learn more about the emulator in my next post. For now, familiarize yourself with the instruction set.
