Writing x86 Assembler
In part 1 we created a 1.44MB floppy disk image and placed a manually created boot sector on it with a very short machine code program that provided an infinite loop. We wont be able to do much more without a basic understanding of the x86 family of processors and their assembly language. So we’ll take a short detour here and discuss the processor first and then their assembly language.
The x86 family of processors began life as the Intel 8080 8-bit processor. Intel had produced various processors for cash registers, calculators and other such devices. The 8080 was their first general purpose processor. The 8080 had eight 8-bit registers and two 16-bit registers. Registers A, B, C, D, E, H, L, and Flag were all 8 bits. The SP (Stack Pointer) and PC (Program Counter) registers were both 16 bits. However, these 8-bit registers may be used in pairs as 16-bit registers. These pairs are BC, DE, and HL.
The Flag register contains 5 significant bits used to flag special conditions or states. These are the S, Z, AC, P, and C (or CY) flag bits. S is set if the result of an operation is negative. Z is set if the result of an operation is zero. P is set if the number of 1s in the result are even (Even Parity). The C (Carry/Borrow) flag is set if the last addition operation resulted in a carry or if the last subtraction operation required a borrow. The AC (Auxiliary Carry) flag (Sometimes referred to as the H or Half Carry flag) is set when a carry occurs from bit three to bit four during a BCD (Binary Coded Decimal) operation.
These flags are used by conditional instructions such as jump or branch instructions. The flags can be copied to the Accumulator (A register) and the A register and Flag register can be addressed as a single PSW (Program Status Word).
When power is applied to the processor it waits a short time for it’s clock signal to stabilize. Then places all it’s registers and flags into a known state called the Initial State. Next the processor loads an instruction from instruction memory. The bits in the instruction are then decoded to turn on and off various internal signals. These instructions include encoded information about what the instruction should do, what registers the operation should be performed on, and where the results should be placed. Next, the decoded operation is executed and if required, the results are stored. This set of steps is common to all microprocessors. It is often referred to as the Fetch, Decode, Execute Cycle. Even processors that execute an instruction in a single cycle complete all these steps. They just have to do it by dividing that single cycle into three or four sub-cycle time steps.
Now you may wonder why we are spending time learning about the 8080 when our focus is on x86 machines? Well, the x86 family of processors are just the younger, bigger, more powerful, siblings of the 8080.
Intel followed the 8080 with the 8085 (basically an easier to use 8080 but with slower speed) and then the 8086 & 8088. The 8086 is the first x86 processor and it contains most of the 8080 instructions and registers. However, it also contains so much more! With the 8080 most instructions work only on 8-bit data. Their are a few for manipulating 16 bit data but they are limited in scope. The 8086 however is a true 16-bit machine.
In the 8086 the data and accumulator registers became 16-bits wide. However, they still contain the original 8-bit registers of the 8080. The 16-bit registers can address both their high and low order bytes separately using the H or L suffice. Some of the registers where renamed to reduce confusion I’m sure, with the new meaning of H and L.
Also notice that the PC has been renamed IP (Instruction Pointer as compared to Program Counter in the 8080). It still functions exactly the same. It just points to the next instruction in instruction memory to be executed. The Stack Pointer (SP) is just a16 bit version of the SP in the 8080.
The 8086 also gained several new registers. The BP (Base Pointer), SI (Source Index), DI (Destination Index), CS (Code Segment), DS (Data Segment), SS (Stack Segment), ES (Extra Segment). In total the 8086 has 14 16-bit registers. The 16 bit general purpose registers we all given a suffix of ‘X’. So the 8080’s A register’s 16 bit counter part in the 8086 is the AX register.
The 8086 had a 20-bit address space or 1 megabyte. This was a large jump from the 64k of the 8080. The 20-bit address had to be formed from 16-bit sources. So the memory was partitioned into sixteen 64K segments. The maximum memory addressable by 16-bits. Each segment is identified by a 16-bit segment value ranging from 0x0000 to 0xFFFF. Within each segment, a memory location is selected by a 16-bit offset (the number of bytes from the beginning of the segment). This offset is often referred to as the logical address and it is not a real address but an address into a real segmented address space. This segment/offset combination is often written separating the two parts with a colon, for example: A4FB:4872h. The segment value is first, A4FB then the colon followed by the offset value. So here the we have segment A4FB with offset 4872, Here the h post script means that the values are in Hexadecimal.
To get a real address from this notation the segment value must take the offset value (A4FB) and multiply it by 16 (shift the value 4-bit to the left) and add the offset. So in this example A4FB:4872h becomes a real 20-bit physical address of A9822h.
All of this memory segmentation will be important as we move forward. The 8086 can only actually address 64K of memory at a time. This is why we need the segment registers to allow us to access additional memory beyond 64K.
With this scheme there is a large amount of overlap. Every 16 bytes we start a new segment. So any address ending in 0h is the beginning of a new segment. These 16 byte spaces are called paragraphs. Also note that in the 8086 the segments may overlap, as a result the segment:offset is not unique.
As we have seen assembly language is the lowest human readable form of program code. But assembly isn’t just a single language. In fact there is a unique assembly language for each microprocessor architecture. Also, each architecture may have multiple dialects of assembly because different Assembler (A tool used to translate Assembly Code to Machine executable code) developers chose to implement their version of assembly language differently. This is the case with the linux “AS” assembler and the “NASM” assembler. “AS” uses the AT&T dialect of x86 assembly language and NASM uses the Intel dialect. These two dialects are very different in terms of syntax:
Intel Syntax: mov eax, 1 (instruction destination, source)
AT&T Syntax: movl $1, %eax (instruction source, destination)
The Intel syntax is pretty self explanatory. In the above example, the amount of data which is to be moved is inferred from the size of the register (32 bits in the case of eax). The addressing mode used is inferred from the operands themselves.
There are some quirks when it comes to the AT&T syntax. First, notice the suffix at the end of the
mov instruction. This stands for
long and signifies 32 bits of data. Other instruction suffixes include
w for a word (16 bits – not to be confused with the word size of your CPU!),
q for a quad-word (64 bits) and
b for a single byte. While not always required, typically you will see assembly code which uses AT&T syntax explicitly state the amount of data being operated on by the instruction.
We will write our assembly code using the Intel style of x86 assembly code. To translate our assembly code to machine code we will be using the nasm assembler.
Another reason we will need assembly code is that many of the lowest level machine instructions cannot be accessed through high level languages such as C or C++. So under some circumstances assembly or machine code is our only option.
Some Useful Assembly Instructions
Let’s look at a typical x86 assembly instruction:
mov ax, 0x10
This instruction has three parts. The first is the instruction mnemonic “mov”. which in this case represents the move operation. The second part is the destination register “ax”. Which is the 16-bit version of the A register. X is to denote this register as extended from the original 8-bit version. The final part of the instruction above is the value 0f 0x10. A hexadecimal value. This instruction will move the value of 0x10 into the A register.
Now, you might be wondering about that EAX register shown in the AT&T and Intel examples above. We’ll these are 32-bit extended-extended registers that was first used in the 80386 processor. We will discuss 32-bit architectures in a later post. First, I want to get a 16-bit system to boot a real stub OS. There are also 64-bit registers. The 64-bit architecture was first introduced by Intel in their Itanium processor back in 1998. Through 64 bit chips had been produced for specialty computers (i.e. super computers). In 1998 Intel and HP partnered to produce the first Itanium chips. The Itanium was released in 2001. Intel and HP focused this architecture on the server and high end systems market. The original x64 instruction set was designed by AMD (an Intel competitor) and was released in 2000. The instruction set was then implemented by AMD, Intel, and VIA. The 8080 and 8086 registers are still found in the x64 architecture. Their 64-bit counterparts are prefixed with an “R”. I guess it would be confusing to have an extended-extended-extended register.
Here’s a table of Register Naming from 8 to 64 bits
The suffix on the new R8-R15 registers represents ‘B’ for byte, ‘W’ for word (16-bits), ‘D’ for double word (32-bits), and no suffix for the full 64-bits.
OK, One more detour before we write a bit of x86 assembly code.
Interrupts are a mechanism that allow the CPU to be notified of some event. Basically, an interrupt stops the processing of the current task and branches the CPU to a new location in the code called the Interrupt Service Routine or ISR. In the x86 architecture interrupts can be triggered by hardware or software signals. One important use for interrupts is in the handling of I/O devices. Let’s take the keyboard for example. If the CPU had to poll the keyboard for data it would eat up a lot of processor time. This time would be more efficiently used to execute our program instead of continuously asking the keyboard if a key had been pressed. Now if we allow the keypress on the keyboard to trigger an interrupt, then the ISR can simply read the keystroke from the keyboard and pass it on to our program. Once the ISR is finished handling the keypress, it routines to the same code and state before the interrupt occurred. If you think about it, this is just a hardware triggered function call.
Interrupts cal also be turned off and off within the CPU. When they are turned off the CPU will ignore any interrupts that occur. The x86 also include some interrupts that cannot be turned off. These are called “Non-Maskable Interrupts”.
As it turns out the BIOS uses software interrupts to call it’s routines. The BIOS defines many ISRs for typical tasks such as ‘int 0x10’ for for display related routines, and ‘int 0x13’ for disk related routines. The actual method within the BIOS’s ISRs that is called looks at the CPU AX register (in most cases) to determine the actual function to call. So prior to triggering a software interrupt for a BIOS routine, you must first load the function code into one of the CPU registers. This is typically the AX/AH register. However, you need to look up the actual registers used and those that are modified during the ISR. Here’s a document that gives a great list of useful DOS and BIOS routines. Another great source of info on BIOS routines is the Phoenix BIOS User Manual.
Interrupts also have a priority. Higher priority interrupts my interrupt a lower priority interrupts and instructions. Each interrupt is represented by a number. This number is also used as the index into an vector table. When an interrupt occurs, the CPU stops what it is doing, looks up the address of the IRS for the current interrupt in the interrupt vector table and then jumps to that location. The ISR must be placed at the location stored in the vector table.
OK, so we’ve covered the basics of assembly instructions, registers, and interrupts. I guess it’s time to get down to the metals and write some code!
Let’s start by trying to get our boot sector to print something. This will give us a visual indication that the BIOS actually found and executed our code.
; ; File: salsa/src/hello-salsa.asm ; Desc: A simple boot sector that prints "Salsa Booting..." using the BIOS routines. ; Auth: Randall Morgan ; Date: 02/27/2019 ; mov ah, 0x0e ; int 10/ah = 0eh -> scrolling teletype BIOS routine mov al, 'S' int 0x10 mov al, 'a' int 0x10 mov al, 'l' int 0x10 mov al, 's' int 0x10 mov al, 'a' int 0x10 mov al, ' ' int 0x10 mov al, 'B' int 0x10 mov al, 'o' int 0x10 mov al, 'o' int 0x10 mov al, 't' int 0x10 mov al, 'i' int 0x10 mov al, 'n' int 0x10 mov al, 'g' int 0x10 mov al, '.' int 0x10 mov al, '.' int 0x10 mov al, '.' int 0x10 jmp $ ; Jump to the current address ; ; Padding and magic BIOS number. ; times 510-($-$$) db 0 ; Pad the boot sector with zeros dw 0xaa55 ; Last two bytes are the magic number.
This code is not the most efficient. I wanted to keep things simple for our first program. The first thing we do here is move the value 0x0E into the the high order 8 bits of the AX register (AH). This value is the function code for the BIOS function we want to execute. We will pass this function code in AH to the BIOS ISR for interrupt 0x10. This function, called the “Scrolling Teletype Character Write” routine simply takes what it finds in the lower 8 bits of the AX register (AL) and tries to print it to the screen at the next character position available. The rest of the program simple passes each character of the string “Salsa Booting…” to the BIOS routine. Once the entire string has been printed, we execute a jump to the currently location. This results in an infinite loop.
The times 510-($-$$) db 0 line may be cryptic at first. So let’s break it down as you’ll need to use it or similar commands often. The ‘db 0’ instructs the assembler to ‘define a byte’ at the current location with the value 0. The single dollar sign ‘$’ is the assembler’s token for the current address and the double dollar sign ‘$$’ is the assembler’s token for the origin address (where our code began). So ‘$-$$’ results in calculating the length of our code. Next this value is subtracted from the value 510 and then assembler macro “times” is called with this value. Times then repeats the “db 0” command for (510-length of our code) which results in padding our code with bytes of zeros out to 510 total bytes. At this point our code is 510 bytes long. We then call “dw 0xaa55”. This defines a word (two bytes) at the current location with the value 0xaa55. Which of course is our magic number that tells the BIOS it has found a boot sector.
One note here. You may have noticed that I used db (define byte) for the 0 padding values. This made the calculation easier. However, I used dw (define word) for the magic number. Using define word here garuantees that the two eight bit bytes of the word will be stored in the correct order i.e. with the correct endianess.
Running Salsa Booting
Now it’s time to run our code. First we need to assemble it into machine code and the make a disk image and place our machine code into it’s boot sector. Run the following command in the salsa/src directory to assemble the file.
$ nasm salsa.asm -o hello-salsa.bin
This will give you a binary file of machine code. Next move the hello-salsa.bin file to your salsa/bin folder.
In the salsa/bin folder run the following command:
$ dd if=/dev/zero of=hello-salsa.img bs=512 count=2880
This will build our blank 1.44MB floppy image. Now we need to copy our binary file to the beginning of this disk. We can do that with the following command:
$ dd if=hello-salsa.bin of=hello-salsa.img
Now that we have our floppy disk image we can run it in qemu with the following command:
$ qemu-system-i386 hello-salsa.img
The result is nothing spectacular. However, we’ve had to cover a lot of ground to get here! So be proud of yourself!
Here’s a screen shot of the program.
I think we’ve covered quite a bit for this post. Go read about x86 registers and interrupts and the BIOS routines. I’ll get to work on the next post where we will implement our boot loader.
Until then, Happy Coding!