Header Image - Randall Morgan

Category Archives

3 Articles

Salsa – A Teaching Operating System – Part 3

This entry is part 3 of 3 in the series Salsa OS

Last time we discussed the x86 assembly language. We really only scratch the surface of this subject. There is much more to learn. Today we need to address addressing modes (pun intended) and the stack.

First, what is an address mode? When we have an instruction such as “mov” that accesses memory, that address must come from somewhere. The address mode determines how the address for the memory location in the instruction is obtained. For example:

mov ebx, 0x7cff

mov word [ebx], 0x7ce0

In the above example, the first instruction “mov ebx, 0x7cff” load the immediate value “0x7cff” into the EBX register. This is known as immediate mode. In this mode the data to be loaded into the register is part of the instruction and is encoded into the actual machine instruction. You’ll recognize immediate mode by the fact that no memory access (other than fetching the instruction itself) is needed to perform the instruction.

In the second example above, the value 0x7ce0 is moved not into the ebx register but into the memory location pointed to by the value in ebx. So, with the two instructions above 0x7ceo would be placed into memory at address 0x7cff because that is the value in ebx. Anytime you see a register in square braces you can think of this as the register saying “Use the address I contain”. This mode of addressing is called register indirect.The x86 family supports many addressing modes. However, we wont need them all. We’ll introduce others as we move along.

One more thing we should comment on is the use of the “word” value in the second instruction. Many assemblers can infer the proper data size to transfer from memory based on the register used. However, nasm may require the use of a data size to tell it how many bytes to transfer from memory. Here we use the “word” size to transfer 16-bits (two bytes) of data. We can also use byte (8-bits), dword (32-bits), and qword (64-bits). However, for our boot loader we wont be using any 64 bit instructions or registers. This is because our processor will initially be in “real” mode (16-bit mode).

Let’s see another example of storing a byte to memory using an address in the base register bx:

mov byte [ebx], al

As stated, this instruction will move the byte stored in register al (the low order byte of ax) to the memory location pointed to by the value in base register bx.

Now what if we want to transfer the high order byte of ax to memory?

We, we can just use the ah register from the ax pair. Like so:

mov byte [ebx+1], ah

Notice that here we passed a constant offset to the destination by adding 1 to the ebx register. This is very handy when we know ahead of time where the byte should be placed in respect to the base value.

Most x86 assembly instructions (using Intel Syntax) will follow the pattern:

<operation> <destination>, <source>

For example:

mov [ebx], eax

This instruction moves the value in eax the location pointed to by ebx.

Some instructions have an implicit source or destination. For example:

add ebx, 4

This instruction uses ebx as both a source and destination as the instruction takes the value in ebx, adds the immediate value 4 and then places the result back into ebx.

Now let’s look at a simple loop example:

   xor eax, eax ; zeros out eax
   mov ecx, 10 ; load counter with 10
   inc eax ; increment eax, our loop code goes here
loop start

The ecx register is usually used to store a count when doing repetitive operations. The line “start:” is a label. Labels are special markers in the code that basically names the current code address and must end with a colon. This allows you to refer to points in your code by a human readable label rather than trying to calculate and use the address actual value.

Here we’ve loaded ecx with the number of times we want to repeat our block of code (in this case the inc eax instruction) and marked the start of the block of code with the label “start”. Finally, we use the “loop” operation followed by the label of the block we want to repeat.

The loop operation decrements the value of ecx. Then if ecx is non-zero, it jumps to the address (label) provided. When ecx is zero, execution will continue to the instruction following the “loop” operation.

We’ve seen enough assembly language to get our feet wet. Now let’s return to our operating system.

Displaying Text

One of the first task we want our OS to perform will be to print something to the screen so we know it’s alive. We’ve seen how to use the BIOS’s teletype function to print characters to the screen. But how does the BIOS do this?

When the PC is first powered on the BIOS initializes the video display to a 16 color text mode. It places video memory at location 0xB8000. We can write directly to video memory to display text. However, we need to know a bit more first.

The default video mode stores it’s display data as two byte values. The low order byte determines the character to display and the high order byte determines the foreground and background color of the character.

So how to we know what values correspond to which characters? Well the default is to support the ASCII character set. You can find ASCII character set tables all over the internet. But for completeness check out this one.

Now, what about that color byte? The high order byte of each pair contains two nibbles (4-bit values). The lower order nibble (bits 0-3) contain the four bit code for the foreground color and the high order nibble contains the background color. To see the sixteen possible colors supported check out this link: https://en.wikipedia.org/wiki/BIOS_color_attributes.

Now let’s try displaying some colorful text to the screen in video mode 0. In your favorite text editor, enter the following x86 assembly code:

; File: alphabet.asm
; Auth: Randall Morgan <rmorgan62@gmail.com>
; Desc: Program to display the English alphabet on the display in 
; Mode 0, a 25 (row) x 40 (cols) x 16 color display mode.

    mov byte al, 'A'     ; Character
    mov byte ah, 0x0F    ; Color: background 0-black, foreground f = white
    mov ecx, 26     ; 26 letters in the alphabet
    mov ebx, 0xB8000 ; Set base register to start of video memory
    mov word [ebx], ax  ; Move character and color code into video memory
    inc al              ; Increment al for next character
    inc ah              ; Increment next color pair
    add ebx, 2          ; Move to next character
loop loop_start
    jmp end         ; infinitly jump here
 times 510-($-$$) db 0 ; Pad file to 510 bytes
    dw 0xaa55          ; Add the magic number

Save this file in slasa/src as alphabet.asm. Then cd to the salsa directory and execute the following command at the terminal:

nasm src/alphabet.asm -o /bin/alphabet.bin

This will assemble the program into machine code and the BIOS will happily execute it for us as long as we half the magic number properly placed at the end of a 512 byte code block. We wont even need to place it on a 1.44MB disk image for qemu to run it.

Next we need to run the binary file in qemu. We can do this with the following command:

qemu-system-i386 bin/alphabet.bin

This command requires that you are still in the salsa directory. If everything works you should see something like this:

Printing to Bios default Mode 0 screen

Notice the alphabet in the top left of the screen. Notice that we’ve not only printed the alphabet but we have also changed the foreground and background colors of each character. Now we’re getting somewhere…

Ok, next we need to learn a little more about the x86 architecture before we move on to our next step in OS development.

The Stack

A stack is a simple First-In-Last-Out data structure. The hardware stack used by the x86 is a reserved section of RAM and is used to store temporary data such as local variables and intermediate values, and parameters.

Keeping track of the stack and the next open location in memory that we can push data to is the job of the SP (Stack Pointer) register. On the 8086 this register is 16-bits and is named SP. On 32 bit x86 architectures the register has been extended to 32 bits and has been renamed ESP (Extended Stack Pointer).

To initialize a stack we only need to set the stack pointer to a valid location in memory. It should be noted that the x86 stack grows downward. So each time we push a byte onto the stack the stack pointer (sp) is decremented. If our stack is initialized to 0xFFFF and we push the al register onto the stack it will save the contents of al to 0xFFFF and then decrement the stack pointer to 0xFFFE. If we then push the bl register, the contents of bl will be placed in memory at 0xFFFE and then the stack pointer (sp) will be decremented again, leaving SP with a value of 0xFFFD. When we pop (remove) a value off the stack, it is first incremented and then the value at it’s new location is returned.

In our example above, the stack is pointing to 0xFFFD which is the next available memory location we can use. If we pop bl, SP is incremented to point to 0xFFFE and then the value at 0XFFFE is returned. Now, we don’t have to push and pop values to/from the same registers all the time. Though often this is the case. It should also be mentioned that it is convention to refer to the top of the stack as the high address (the address SP was initialized with) and the bottom of the stack as the address currently in SP. All memory address are unsigned. So they never go negative. If we initialized SP to 0xFFFF in a 64KB memory system ans the popped a value without first pushing a value onto the stack, SP would underflow with out error and wrap it’s value around to 0x0000. However, with x86 we need to initialize the stack pointer to a high value so it has space to ground downward.

Branch Instructions

We’ve been using the jmp instruction without really touching on it. The x86 architecture has many branch (sometimes called flow-control) instructions. Some of these instructions are:

  • jmp – unconditional jump, i.e. jump always
  • je – jump when equal
  • jne – jump when not equal
  • jz – jump when last result was zero
  • jg – jump when greater than
  • jge – jump when greater than or equal to
  • jl – jump when less than
  • jle – jump when less than or equal to

These instructions are usually used in combination with a label:

jz start ; jump if the result of the last operation was zero

These instructions are the only instructions that can modify the value of the IP (Instruction Pointer) register. The IP cannot be directly modified.

This brings us to the Flag register. This register contains various bits that are used to indicate various state information. The most common types of flags used in the Flag register are the Error flags which indicate error states such as arithmetic overflow. Status flags such as the interrupt enable flag, and result flags, such as the carry flag which is set if a carry or borrow occurs during an arithmetic operation.

The Flag register cannot be accessed or modified except for the use of “popfd” and “pushfd” instructions.

The most common use of the Flag register is in combination with the “cmp” (Compare) instruction which subtracts one operand from the other, setting the c (carry), z (zero), and sign (negative). Usually a “cmp” instruction is followed by one of the conditional branch operations. For example:

mov eax, [ebx]
cmp eax, 0x0a
jne some_label

The above code load a value pointed to be ebx into the eax register then compares that value with the decimal value 10 (0x0a). If the value in eax is not equal to 10 then the conditional jump, “jne some_label” (jump not equal) is taken. Otherwise execution continues with the next instruction.

We’re gaining ground in our understanding of the x86 architecture and assembly language. To get comfortable with writing and assembling programs take a little time and try to write the following programs:

  • Write an x86 assembly program to fill the BIOS boot screen with ‘Q’ characters.
  • Modify the above program to print each row in a different foreground and background color.
  • Write a program that loads the last byte of the boot sector (byte 512) and subtracts 0x55 from it. If the result is zero, print “zero” to the screen. If the result is not zero, print the resulting value to the screen in decimal. Hint: Checkout the ascii character codes for decimal digits.

OK, that’s enough for today. Next time we’ll look at a few registers we haven’t mentioned before. The x86 debug registers. We’ll also start on a few routines we’ll need for our boot loader and OS.

Until next time, Happy Coding!

Salsa – An Operating System for Teaching -Part 2

This entry is part 2 of 3 in the series Salsa OS

Writing x86 Assembler

In part 1 we created a 1.44MB floppy disk image and placed a manually created boot sector on it with a very short machine code program that provided an infinite loop. We wont be able to do much more without a basic understanding of the x86 family of processors and their assembly language. So we’ll take a short detour here and discuss the processor first and then their assembly language.

x86 Processors

The x86 family of processors began life as the Intel 8080 8-bit processor. Intel had produced various processors for cash registers, calculators and other such devices. The 8080 was their first general purpose processor. The 8080 had eight 8-bit registers and two 16-bit registers. Registers A, B, C, D, E, H, L, and Flag were all 8 bits. The SP (Stack Pointer) and PC (Program Counter) registers were both 16 bits. However, these 8-bit registers may be used in pairs as 16-bit registers. These pairs are BC, DE, and HL.

Intel 8080 Registers

The Flag register contains 5 significant bits used to flag special conditions or states. These are the S, Z, AC, P, and C (or CY) flag bits. S is set if the result of an operation is negative. Z is set if the result of an operation is zero. P is set if the number of 1s in the result are even (Even Parity). The C (Carry/Borrow) flag is set if the last addition operation resulted in a carry or if the last subtraction operation required a borrow. The AC (Auxiliary Carry) flag (Sometimes referred to as the H or Half Carry flag) is set when a carry occurs from bit three to bit four during a BCD (Binary Coded Decimal) operation.

These flags are used by conditional instructions such as jump or branch instructions. The flags can be copied to the Accumulator (A register) and the A register and Flag register can be addressed as a single PSW (Program Status Word).

CPU Operation

When power is applied to the processor it waits a short time for it’s clock signal to stabilize. Then places all it’s registers and flags into a known state called the Initial State. Next the processor loads an instruction from instruction memory. The bits in the instruction are then decoded to turn on and off various internal signals. These instructions include encoded information about what the instruction should do, what registers the operation should be performed on, and where the results should be placed. Next, the decoded operation is executed and if required, the results are stored. This set of steps is common to all microprocessors. It is often referred to as the Fetch, Decode, Execute Cycle. Even processors that execute an instruction in a single cycle complete all these steps. They just have to do it by dividing that single cycle into three or four sub-cycle time steps.

Now you may wonder why we are spending time learning about the 8080 when our focus is on x86 machines? Well, the x86 family of processors are just the younger, bigger, more powerful, siblings of the 8080.

Intel followed the 8080 with the 8085 (basically an easier to use 8080 but with slower speed) and then the 8086 & 8088. The 8086 is the first x86 processor and it contains most of the 8080 instructions and registers. However, it also contains so much more! With the 8080 most instructions work only on 8-bit data. Their are a few for manipulating 16 bit data but they are limited in scope. The 8086 however is a true 16-bit machine.

Intel 8086 Registers

In the 8086 the data and accumulator registers became 16-bits wide. However, they still contain the original 8-bit registers of the 8080. The 16-bit registers can address both their high and low order bytes separately using the H or L suffice. Some of the registers where renamed to reduce confusion I’m sure, with the new meaning of H and L.

Also notice that the PC has been renamed IP (Instruction Pointer as compared to Program Counter in the 8080). It still functions exactly the same. It just points to the next instruction in instruction memory to be executed. The Stack Pointer (SP) is just a16 bit version of the SP in the 8080.

The 8086 also gained several new registers. The BP (Base Pointer), SI (Source Index), DI (Destination Index), CS (Code Segment), DS (Data Segment), SS (Stack Segment), ES (Extra Segment). In total the 8086 has 14 16-bit registers. The 16 bit general purpose registers we all given a suffix of ‘X’. So the 8080’s A register’s 16 bit counter part in the 8086 is the AX register.

8086 Memory

The 8086 had a 20-bit address space or 1 megabyte. This was a large jump from the 64k of the 8080. The 20-bit address had to be formed from 16-bit sources. So the memory was partitioned into sixteen 64K segments. The maximum memory addressable by 16-bits. Each segment is identified by a 16-bit segment value ranging from 0x0000 to 0xFFFF. Within each segment, a memory location is selected by a 16-bit offset (the number of bytes from the beginning of the segment). This offset is often referred to as the logical address and it is not a real address but an address into a real segmented address space. This segment/offset combination is often written separating the two parts with a colon, for example: A4FB:4872h. The segment value is first, A4FB then the colon followed by the offset value. So here the we have segment A4FB with offset 4872, Here the h post script means that the values are in Hexadecimal.

To get a real address from this notation the segment value must take the offset value (A4FB) and multiply it by 16 (shift the value 4-bit to the left) and add the offset. So in this example A4FB:4872h becomes a real 20-bit physical address of A9822h.

All of this memory segmentation will be important as we move forward. The 8086 can only actually address 64K of memory at a time. This is why we need the segment registers to allow us to access additional memory beyond 64K.

With this scheme there is a large amount of overlap. Every 16 bytes we start a new segment. So any address ending in 0h is the beginning of a new segment. These 16 byte spaces are called paragraphs. Also note that in the 8086 the segments may overlap, as a result the segment:offset is not unique.

Assembly Language

As we have seen assembly language is the lowest human readable form of program code. But assembly isn’t just a single language. In fact there is a unique assembly language for each microprocessor architecture. Also, each architecture may have multiple dialects of assembly because different Assembler (A tool used to translate Assembly Code to Machine executable code) developers chose to implement their version of assembly language differently. This is the case with the linux “AS” assembler and the “NASM” assembler. “AS” uses the AT&T dialect of x86 assembly language and NASM uses the Intel dialect. These two dialects are very different in terms of syntax:

Intel Syntax: mov eax, 1 (instruction destination, source)

AT&T Syntax: movl $1, %eax (instruction source, destination)

The Intel syntax is pretty self explanatory. In the above example, the amount of data which is to be moved is inferred from the size of the register (32 bits in the case of eax). The addressing mode used is inferred from the operands themselves.

There are some quirks when it comes to the AT&T syntax. First, notice the suffix at the end of the mov instruction. This stands for long and signifies 32 bits of data. Other instruction suffixes include w for a word (16 bits – not to be confused with the word size of your CPU!), q for a quad-word (64 bits) and b for a single byte. While not always required, typically you will see assembly code which uses AT&T syntax explicitly state the amount of data being operated on by the instruction.

We will write our assembly code using the Intel style of x86 assembly code. To translate our assembly code to machine code we will be using the nasm assembler.

Another reason we will need assembly code is that many of the lowest level machine instructions cannot be accessed through high level languages such as C or C++. So under some circumstances assembly or machine code is our only option.

Some Useful Assembly Instructions

Let’s look at a typical x86 assembly instruction:

mov ax, 0x10

This instruction has three parts. The first is the instruction mnemonic “mov”. which in this case represents the move operation. The second part is the destination register “ax”. Which is the 16-bit version of the A register. X is to denote this register as extended from the original 8-bit version. The final part of the instruction above is the value 0f 0x10. A hexadecimal value. This instruction will move the value of 0x10 into the A register.

Now, you might be wondering about that EAX register shown in the AT&T and Intel examples above. We’ll these are 32-bit extended-extended registers that was first used in the 80386 processor. We will discuss 32-bit architectures in a later post. First, I want to get a 16-bit system to boot a real stub OS. There are also 64-bit registers. The 64-bit architecture was first introduced by Intel in their Itanium processor back in 1998. Through 64 bit chips had been produced for specialty computers (i.e. super computers). In 1998 Intel and HP partnered to produce the first Itanium chips. The Itanium was released in 2001. Intel and HP focused this architecture on the server and high end systems market. The original x64 instruction set was designed by AMD (an Intel competitor) and was released in 2000. The instruction set was then implemented by AMD, Intel, and VIA. The 8080 and 8086 registers are still found in the x64 architecture. Their 64-bit counterparts are prefixed with an “R”. I guess it would be confusing to have an extended-extended-extended register.

Here’s a table of Register Naming from 8 to 64 bits


The suffix on the new R8-R15 registers represents ‘B’ for byte, ‘W’ for word (16-bits), ‘D’ for double word (32-bits), and no suffix for the full 64-bits.

OK, One more detour before we write a bit of x86 assembly code.


Interrupts are a mechanism that allow the CPU to be notified of some event. Basically, an interrupt stops the processing of the current task and branches the CPU to a new location in the code called the Interrupt Service Routine or ISR. In the x86 architecture interrupts can be triggered by hardware or software signals. One important use for interrupts is in the handling of I/O devices. Let’s take the keyboard for example. If the CPU had to poll the keyboard for data it would eat up a lot of processor time. This time would be more efficiently used to execute our program instead of continuously asking the keyboard if a key had been pressed. Now if we allow the keypress on the keyboard to trigger an interrupt, then the ISR can simply read the keystroke from the keyboard and pass it on to our program. Once the ISR is finished handling the keypress, it routines to the same code and state before the interrupt occurred. If you think about it, this is just a hardware triggered function call.

Interrupts cal also be turned off and off within the CPU. When they are turned off the CPU will ignore any interrupts that occur. The x86 also include some interrupts that cannot be turned off. These are called “Non-Maskable Interrupts”.

As it turns out the BIOS uses software interrupts to call it’s routines. The BIOS defines many ISRs for typical tasks such as ‘int 0x10’ for for display related routines, and ‘int 0x13’ for disk related routines. The actual method within the BIOS’s ISRs that is called looks at the CPU AX register (in most cases) to determine the actual function to call. So prior to triggering a software interrupt for a BIOS routine, you must first load the function code into one of the CPU registers. This is typically the AX/AH register. However, you need to look up the actual registers used and those that are modified during the ISR. Here’s a document that gives a great list of useful DOS and BIOS routines. Another great source of info on BIOS routines is the Phoenix BIOS User Manual.

Interrupts also have a priority. Higher priority interrupts my interrupt a lower priority interrupts and instructions. Each interrupt is represented by a number. This number is also used as the index into an vector table. When an interrupt occurs, the CPU stops what it is doing, looks up the address of the IRS for the current interrupt in the interrupt vector table and then jumps to that location. The ISR must be placed at the location stored in the vector table.

OK, so we’ve covered the basics of assembly instructions, registers, and interrupts. I guess it’s time to get down to the metals and write some code!

Let’s start by trying to get our boot sector to print something. This will give us a visual indication that the BIOS actually found and executed our code.

; File: salsa/src/hello-salsa.asm
; Desc: A simple boot sector that prints "Salsa Booting..." using the BIOS routines.
; Auth: Randall Morgan
; Date: 02/27/2019

    mov ah, 0x0e    ; int 10/ah = 0eh -> scrolling teletype BIOS routine
    mov al, 'S'
    int 0x10
    mov al, 'a'
    int 0x10
    mov al, 'l'
    int 0x10
    mov al, 's'
    int 0x10
    mov al, 'a'
    int 0x10
    mov al, ' '
    int 0x10
    mov al, 'B'
    int 0x10
    mov al, 'o'
    int 0x10
    mov al, 'o'
    int 0x10
    mov al, 't'
    int 0x10
    mov al, 'i'
    int 0x10
    mov al, 'n'
    int 0x10
    mov al, 'g'
    int 0x10
    mov al, '.'
    int 0x10
    mov al, '.'
    int 0x10
    mov al, '.'
    int 0x10
    jmp $       ; Jump to the current address
    ; Padding and magic BIOS number.
    times 510-($-$$) db 0   ; Pad the boot sector with zeros
    dw 0xaa55               ; Last two bytes are the magic number.

This code is not the most efficient. I wanted to keep things simple for our first program. The first thing we do here is move the value 0x0E into the the high order 8 bits of the AX register (AH). This value is the function code for the BIOS function we want to execute. We will pass this function code in AH to the BIOS ISR for interrupt 0x10. This function, called the “Scrolling Teletype Character Write” routine simply takes what it finds in the lower 8 bits of the AX register (AL) and tries to print it to the screen at the next character position available. The rest of the program simple passes each character of the string “Salsa Booting…” to the BIOS routine. Once the entire string has been printed, we execute a jump to the currently location. This results in an infinite loop.

The times 510-($-$$) db 0 line may be cryptic at first. So let’s break it down as you’ll need to use it or similar commands often. The ‘db 0’ instructs the assembler to ‘define a byte’ at the current location with the value 0. The single dollar sign ‘$’ is the assembler’s token for the current address and the double dollar sign ‘$$’ is the assembler’s token for the origin address (where our code began). So ‘$-$$’ results in calculating the length of our code. Next this value is subtracted from the value 510 and then assembler macro “times” is called with this value. Times then repeats the “db 0” command for (510-length of our code) which results in padding our code with bytes of zeros out to 510 total bytes. At this point our code is 510 bytes long. We then call “dw 0xaa55”. This defines a word (two bytes) at the current location with the value 0xaa55. Which of course is our magic number that tells the BIOS it has found a boot sector.

One note here. You may have noticed that I used db (define byte) for the 0 padding values. This made the calculation easier. However, I used dw (define word) for the magic number. Using define word here garuantees that the two eight bit bytes of the word will be stored in the correct order i.e. with the correct endianess.

Running Salsa Booting

Now it’s time to run our code. First we need to assemble it into machine code and the make a disk image and place our machine code into it’s boot sector. Run the following command in the salsa/src directory to assemble the file.

$ nasm salsa.asm -o hello-salsa.bin

This will give you a binary file of machine code. Next move the hello-salsa.bin file to your salsa/bin folder.

In the salsa/bin folder run the following command:

$ dd if=/dev/zero of=hello-salsa.img bs=512 count=2880

This will build our blank 1.44MB floppy image. Now we need to copy our binary file to the beginning of this disk. We can do that with the following command:

$ dd if=hello-salsa.bin of=hello-salsa.img

Now that we have our floppy disk image we can run it in qemu with the following command:

$ qemu-system-i386 hello-salsa.img

The result is nothing spectacular. However, we’ve had to cover a lot of ground to get here! So be proud of yourself!

Here’s a screen shot of the program.

I think we’ve covered quite a bit for this post. Go read about x86 registers and interrupts and the BIOS routines. I’ll get to work on the next post where we will implement our boot loader.

Until then, Happy Coding!

Salsa – An Operating System for Teaching – Part 1

by SysOps 0 Comments
This entry is part 1 of 3 in the series Salsa OS

It seems as one grows older you naturally take on the responsibility of teaching the next generation. Whether it be ranching, machining , or hardware/software development. What ever your background you will naturally find those younger who look to you for advice on one subject or another.

I too seem to have come to that point in my life when those younger than I seem to look to me for advice on hardware, software, and life issues. I find myself teaching how things came to be as they are now. Having some historical context helps many understand the why and how.

As you know, I’ve given many talks on hardware and software and regularly teach topics on the intersection of hardware and software to software developers. Many of these developers have little to no understanding of how software controls hardware or how hardware enables software manipulation of a system. I believe that having a basic understanding of these topics only makes for better developers. Just as I believe that having a knowledge of multiple programming languages and programming paradigms can only make you a better developer.

I have long taught that a good developer has one or two languages they are well versed in and another half dozen that they are familiar enough with to become productive in a weekend. One language doesn’t fit all problems well. In the same way a carpenter wouldn’t have a single hammer in his toolbox a software developer shouldn’t know only a single language. Who would want to write a simple text parsing script in assembly or C? Who would tackle a high speed computer vision problem in python without calling out to C/C++ libraries?

But I digress, the purpose of this series of articles is to introduce a basic Operating System for teaching OS development and the hardware boot process. As usual, I will start with some background. We will move on to a legacy BIOS (Basic Input Output System) boot system and then eventually cover the move to UEFI (Unified Extensible Firmware Interface). So let’s get started.

A Little History

To write your own OS you need to understand a few basics. We will concentrate on x86 and x86-64 hardware. My aim here is not to produce a fully functional OS that can replace Windows, Linux, or iOS. Instead, I only want to show you how to get a basic OS up and running. I’ll give you enough information and point you in the right direction so that if you’re interested, you can find further knowledge. Sometimes the trick to learning a complex subject is knowing just enough to be able to ask the right questions. But you have to have some knowledge to know what questions to ask, and where to ask them. That’s what Salsa is all about, teaching you what to ask next and help you identify where to learn more.

Before we get to far into this article let me say that for a novice developer OS development is difficult. I may leave a lot of details out that a novice may need. However, I’ll try to reduce each topic to the core essentials. Some of the material will require you to seek more information than I have provided here. At the very least you’ll need a familiarity with some kind of programming language.

The first thing we should cover is what is an operating system? In the early days of computing computers came as hardware only. You were provided a manual of the hardware and instruction set. These instructions where either binary machine code or later assembly mnemonics. The programmer had to develop an intimate knowledge of the computer’s hardware and how it all worked together. Each system was very different from the others. No one language ran on more than a single computer (later, a single computer manufacturer’s line). These computers had no operating system. When the machine was turned on, the programmer had to input a loader program that read in the program to be ran. This loader program was usually rewritten as the programs it loaded changed. Eventually, some programmers began to write more universal loaders that could load any program into the computer’s memory and then jump to the beginning of the program to start executing it.

One issue with loaders and the programs of the day was that the addresses of the various subroutines and indeed the address of the program itself was hard-coded into the program. So if the size or location of the loader changed the programs had to have all addresses recalculated. The addition of a single byte to the loader caused every line of the program to be recompiled. At this time these programs where compiled by hand! There were no compilers or interpreters.

To give you a feel for how this was done let’s look at a fictitious machine and it’s machine language:

The Fiction 1 Computational Machine

Instruction Set
Machine Code : Instruction
00000000 : No operation
00000001 : Load Accumulator A with the value following the current byte
00000010 : Load Accumulator B with the value following the current byte
00000011 : Add the values in Accumulators A and B and place the results in A
00000100 : Store the contents of Accumulator A at the memory address stored in the following byte
00000101 : Load the Accumulator A with the value stored at the address stored in the following byte
00000110 : Branch to program address stored in the following byte
00000111 : Subtract the value in Accumulator B from the value in Accumulator A and store in A
00001000 : Test Carry/Barrow Flag. The flag is set when an addition or subtraction results in a carry or barrow. When tested, sets the Accumulator A to zero if the flag was set, otherwise 255 is placed in Accumulator A.
00001001 : Branch to the address stored in the following byte if the Accumulator A contains 255
00001010 : Move the value in Accumulator A into Accumulator B
00001011 : Move the value in Accumulator B into Accumulator A

OK, looking at the machine instructions above let’s try to write a simple program that takes two numbers from memory locations 1000 and 1001 and multiplies them:

The first thing you may notice is that there is no multiplication instructions. So how do we multiply two numbers? Well, multiplication is just repetitive addition. So we simply use addition to do our multiplication. Here’s the pseudo code:
1. Load A with first value
2. Load B with second value
3. Add A and B, result is stored in A
4. Decrement B
5. Is B zero?
6. Yes, we’re done
7. No, go back to step 3

Our program:
Notes: Data Address 1011 will be used to store intermediate results
Data Address 1100 will be used for temporary storage of counter
Address Instruction
00000000 00000101 # Load Accumulator A with second value
00000001 00001001 # Address of value to place in B
00000010 00001010 # Move value to B
00000011 00000101 # Load A with first value

00000100 00001100 # Address of value to place
00000100 00000011 # Add A and B and store result in A
00000101 00000100 # Save value in A
00000110 00001011 # Data Memory location to hold intermediate result
00000111 00001011 # Move value in B to A
00001000 00000010 # Load B with the value 1
00001001 00000111 # Subtract B from A and store result in A, Also sets the Carry/Borrow flag
00001010 00000100 # Save value in A
00001011 00001100 # Data Memory counter storage location
00001100 00001000 # Test flag and set A value, if flag was set, done!
00001101 00001001 # Branch if flag was cleared, destroys A
00001110 00000110 # Branch to address stored in next byte
00001111 00001110 # Forever loop… jump to self
00010000 00000101 # Retrieve counter in A
00010001 00001100 # Counter temp storage address
00010010 00001010 # Move A to B
00010011 00000101 # Retrieve current value in A
00010100 00000110 # Jump back to step 3 (Add)
00010101 00000100 # Address of step 3

The above machine program for our fictitious machine had to be hand assemble. You simply look up the value of each of the instructions in the instruction listing and then write them down. Once this was done, you then entered these values into the control panel of the machine. This was usually done via a set of switches. For our machine we would need to set 8 switches for the address and 8 more switches for the instruction byte. Then press some other switch to load the data into program memory.

Once our program is finished it simply loops at address 00001110 forever. In a real system you would store the result in a Data Memory location to be read by another program, or display it on a set of lamps. Note that all the addresses of the data and routines are hard-coded.

Eventually, programmers developed mnemonics for the machine instruction values. These came to be known as Assembly code. Assembly is a very low level abstraction on the actual machine code. A single assembly code instructions such as LDA #3 (load A with a value of 3) usually results in a single machine code instruction. However, some assembly language instructions can also result in a handful of machine instructions. Assembly language is much easier for humans to read and write. So it has become the standard language for ultra-low-level code and no-one but the chip designers use machine code anymore.

Fiction 1 Assembly Mnemonics

Mnemonic : Instruction
NOP : No operation
LDA <val> : Load Accumulator A with the value following the current byte
LDB <val> : Load Accumulator B with the value following the current byte
ADD : Add the values in Accumulators A and B and place the results in A
STA ADDR : Store the contents of Accumulator A at the memory address stored in the following byte
LDA [ADDR] : Load the Accumulator A with the value stored at the address stored in the following byte
BRA ADDR : Branch to program address stored in the following byte
SUB : Subtract the value in Accumulator B from the value in Accumulator A and store in A
TST : Test Carry/Barrow Flag. The flag is set when an addition or subtraction results in a carry or barrow. When tested, sets the Accumulator A to zero if the flag was set, otherwise 255 is placed in Accumulator A.
BRC ADDR : Branch to the address stored in the following byte if the Accumulator A contains 255
MOV A, B : Move the value in Accumulator A into Accumulator B
MOV B, A : Move the value in Accumulator B into Accumulator A

Using the assembly mnemonics our program becomes:

Address Instruction
LDA [00001001b] # Load Accumulator A with
MOV A, B # Move value to B
LDA [00001100b] # Load A with value
lp01: ADD # Add A and B and store result in A
STA [00001011b] # Save value in A
MOV B, A # Move value in B to A
LDB 00000001b # Load B with the value 1
SUB # Subtract B from A and store result in A, Also sets the Carry/Borrow flag
STA [00001100b] # Save value in A
TST # Test flag and set A value, if flag was set, done!
loop: BRC loop # Branch if flag was cleared, destroys A
LDA [0001100b] # Retrieve counter in A counter from temp storage
MOV A, B # Move A to B
LDA [00001100b] # Retrieve current value in A
BRA lp01 # Jump back to step 3 (Add

As you can see, using assembly mnemonics the code becomes much easier to read and understand. Looking through the code it doesn’t take long to understand that when a value is given in square brackets it refers to the value stored at the given memory address. If the value is not in brackets it is used as presented. These Mnemonics are much easier to remember and understand when compared to the machine code values shown above.

To see an example of what early computers were like visit: http://oldcomputers.net/altair-8800.html. You may also enjoy this video demonstration of the Altair 8800: https://www.youtube.com/watch?v=vAhp_LzvSWk. The Altair was my first computer. My father purchased a kit and put it together and allowed me to help. We then spent many hours writing simple programs for it and building our own peripherals. The Altair was the start of personal computers. While there were a couple of small computers before it (see http://www.computermuseum.20m.com/kenbak.htm), they never quite took off. The Altair however, found it’s way to the cover of Popular Electronics and found admiration from the likes of Bill Gates who wrote the first BASIC interpreter for the Altair 8800. Which resulted in the Microsoft Corporation.

Ok, so we’ve seen how it was done in the early days. Now our program doesn’t use a loader. However, for most computers loaders are required. The programmer would write a program that started just past the end of the loader program and would use addresses from that point on. Eventually, David Wheeler wrote a relocatable linker/loader for the EDSAC computer. This allowed programmers to forget about the program’s actual location in memory and let the linker/loader figure it out.

With the advent of larger more complex computer systems the need arose to have a set of routines that were permanently stored on the computer and could be called by a management program. These libraries of routines provided canned code for interacting with I/O devices like paper tape readers. They also contained the loaders to load programs from paper tape into memory and begin executing that code. These libraries grew as did the complexity and number of peripherals. Eventually, these machine were produced with a large library of hardware management routines and schedulers to allow compute jobs to be multiplexed in a time-sharing operation. It was these early libraries that eventually became the kernels of today’s operating systems.

Today’s Operating Systems now provide a host of services. However, there is still a small library of code that comes with the hardware and is responsible for initializing the hardware. This code, called the BIOS or UEFI and is executed when the hardware is first powered on. It is responsible for calling the OS loader or “Boot Loader”. The BIOS is older than the UEFI and was introduced in the first PCs back in the 1970’s. It can still be found in an emulation mode in current PCs using UEFI. UEFI was designed to replace the BIOS. It provides many more routines and services and is in step with today’s hardware. That said, we’ll first use the BIOS as it is important to understand the BIOS and how to boot using a BIOS system. In later articles we’ll examine the UEFI.

The Boot Process´╗┐

The topics we’re going to cover are:

  • How a computer boots using the BIOS
  • How to write low-level programs where there is no operating system.
  • How to configure the CPU so that we can use it’s extended feature.
  • How to bootstrap code written in higher level languages so we can use it to write more complex code.
  • How to create some fundamental services such as device drivers, file systems, and multi-tasking systems.

When we boot our computer, it must start up without any notion of an operating system. Somehow, it must load the operating system. Whether it be Windows, Linux, or DOS. It must load the operating system from some form of permanent storage such as ROM, Hard Disk, Floppy Disk, SD Card, or USB thumb drive.

The pre-OS environment of our computers offers very little in the way of services. At this this stage even a simple file system doesn’t exist. What we do have is a simple collection of basic routines stored in a ROM (usually EEPROM these days). These routines provide access to the system’s basic I/O devices such as screen, keyboard, and disks.

The BIOS performs some low-level tasks such as initializing hardware and testing the basic system, it boots the operating system which is stored on a connected device such as a hard drive. However, the BIOS cannot simply load the operating system from the device. It has no idea of a file system. So the BIOS must read the operating system from a predetermined location. This location is called the Boot Sector. The boot sector is usually 512 bytes long. It is easiest for the BIOS to locate the OS at at the beginning of the disk at Cylinder 0, Head 0, Sector 0. This is the location known as the boot sector. However, most disks will not contain the OS and it would be a shame to waste this storage space on those disks when it could be used for data storage. So the BIOS needs a way to detect if an OS is present in the boot sector. To do this, the BIOS looks for a “Magic Number”, placed at the end of the 512 byte boot sector. This two byte magic number is 0xAA55. So on power up after some initialization work, the BIOS searches through the connected storage devices looking for the magic number. When it finds the magic number it knows that it has found the OS and can load it into memory and jump to the location to start execution of the OS.

If we use a program such as a Hex-Editor that will let us write raw byte values to a file, we can craft a simple boot sector.

e9 fd ff 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa

Referring to the code above, the first three bytes are a simple infinite loop in machine code. The last two bytes are the magic number. You may wonder if I got this wrong? I thought you said the magic number was AA55?

Well, I did and it is. CPU’s can store data either Lest Significant Byte (LSB) first or Most Significant Byte (MSB) First . This data storage is known as Endianess. If a CPU stores the the data LSB first, it is said to be Little-Endian. If it stores the data MSB first it is Big-Endian. The x86 CPU family is little endian and so the LSB is stored first.

Most of the time you can ignore a CPU’s endianess. However, if you have to debug compiled code in a debugger you need to know how the machine code is stored or you may completely misread it.

This program is likely the smallest program you can write. But it is a valid bootable program. We can run this program either using an emulator such as qemu or Bochs. I’ll concentrate on using Qemu here.

Creating And Booting Our Stub

First you’ll need a Hex Editor. On Linux you can install Bless Hex Editor with the command: apt install bless On Window you can install wxHexEditor. It can be downloaded from SourceForge or, you may use the hex editor of your choosing. The only requirement is that it must allow you to save the raw bytes to a file.

On Windows 10 you can use the Linux Windows Sub-System. You can find more info here: https://docs.microsoft.com/en-us/windows/wsl/install-win10 This will give you access to most of the needed Linux commands and tools used here. On Windows 10 or earlier versions such as Vista , you can install Cygwin, or MinGW to gain access to the Unix-like tools we will be using. This includes the gcc compiler and the nasm assembler.

In addition to the hex editor you will need an emulator. For Linux, MacOS, and Windows I recommend using Qemu. However, Bochs Emulator can also be used. Note that Bochs requires a configuration file to be setup before using it. Qemu will work out of the box. Qemu can be installed on Linux using the command: apt install qemu On Windows you can download the Windows installer here: https://qemu.weilnetz.de/w64/ . On Windows you’ll need to add it to your path after installation. Their is a good Youtube video on this: https://www.youtube.com/watch?v=al1cnTjeayk&t=17s. You can find the CygWin Project home page here: https://www.cygwin.com/. When installing CygWin make sure to install the gcc compiler and the nasm assembler.

Now that you have your tools installed let’s create our stub program. This program will do nothing but loop infinity. However, it will prove that we can position code and the magic number in the right places to be found and executed by the BIOS.

Open your hex editor and the three hex values e9, fd, ff into the first three locations. Next fill the editor with 00 byte values up to the 0x1ff position (511 in hex). Then replace the last two byte with the magic number remembering to place them using little-endian format. So your file should look like the binary listing above with 55, aa being the last two bytes.

Create a project folder named “salsa”. In that folder create the sub-folders salsa/scr and salsa/bin. Save your hex file as “stub.bin” in the salsa/bin folder. Next, open a terminal in salsa/bin and run the command:

$ dd if=/dev/zero of=salsa.img bs=512 count=2880

Be careful using dd as you can corrupt or even erase your hard drives using it. dd is a utility on Linux for creating disk images. It is powerful and with power comes great responsibility! Read the docs on dd and get to know it. The command I presented above takes an input file defined by the if parameter. Here we passed /dev/zero which is is a special file that provides as many null characters (ASCII NUL, 0x00) as are read from it. A typical uses is to provide a character stream for initializing data storage. Which is exactly what we use it for here.

The bs parameter tells dd the block size to produce and the of parameter is the output file name. The count parameter tell dd how many blocks to copy. If we do the math, 2880 blocks of 512 bytes gives us 1474560. This is the size we need for our 1.44 MB disk image.

Next we need to copy our binary data to the boot sector of our disk image. To do this we use the command:

$ dd if=stub.bin of=salsa.img

Now we have a 1.44 MB floppy disk image file that we can use with Qemu or Bochs. So let’s try and boot it. Use the command:

$ qemu-system-i386 salsa.img

You should see something like:

Since our stub only runs an infinite loop nothing more will happen. No matter how long you wait. So the next step will be to get our stub to do something useful. For that we’re going to need an x86 Assembler.

We’ve covered a lot in this post. So we’ll end here. Be proud that you got the BIOS to locate your code and execute it. In the next post we will learn about the processor and the x86 assembly language. We;ll need to have some basic knowledge of these topics before moving on to build something useful.

Until next time; Happy Coding!

Newsletter Powered By : XYZScripts.com