Salsa – An Operating System for Teaching – Part 1

by SysOps
This entry is part 1 of 3 in the series Salsa OS

It seems as one grows older you naturally take on the responsibility of teaching the next generation. Whether it be ranching, machining , or hardware/software development. What ever your background you will naturally find those younger who look to you for advice on one subject or another.

I too seem to have come to that point in my life when those younger than I seem to look to me for advice on hardware, software, and life issues. I find myself teaching how things came to be as they are now. Having some historical context helps many understand the why and how.

As you know, I’ve given many talks on hardware and software and regularly teach topics on the intersection of hardware and software to software developers. Many of these developers have little to no understanding of how software controls hardware or how hardware enables software manipulation of a system. I believe that having a basic understanding of these topics only makes for better developers. Just as I believe that having a knowledge of multiple programming languages and programming paradigms can only make you a better developer.

I have long taught that a good developer has one or two languages they are well versed in and another half dozen that they are familiar enough with to become productive in a weekend. One language doesn’t fit all problems well. In the same way a carpenter wouldn’t have a single hammer in his toolbox a software developer shouldn’t know only a single language. Who would want to write a simple text parsing script in assembly or C? Who would tackle a high speed computer vision problem in python without calling out to C/C++ libraries?

But I digress, the purpose of this series of articles is to introduce a basic Operating System for teaching OS development and the hardware boot process. As usual, I will start with some background. We will move on to a legacy BIOS (Basic Input Output System) boot system and then eventually cover the move to UEFI (Unified Extensible Firmware Interface). So let’s get started.

A Little History

To write your own OS you need to understand a few basics. We will concentrate on x86 and x86-64 hardware. My aim here is not to produce a fully functional OS that can replace Windows, Linux, or iOS. Instead, I only want to show you how to get a basic OS up and running. I’ll give you enough information and point you in the right direction so that if you’re interested, you can find further knowledge. Sometimes the trick to learning a complex subject is knowing just enough to be able to ask the right questions. But you have to have some knowledge to know what questions to ask, and where to ask them. That’s what Salsa is all about, teaching you what to ask next and help you identify where to learn more.

Before we get to far into this article let me say that for a novice developer OS development is difficult. I may leave a lot of details out that a novice may need. However, I’ll try to reduce each topic to the core essentials. Some of the material will require you to seek more information than I have provided here. At the very least you’ll need a familiarity with some kind of programming language.

The first thing we should cover is what is an operating system? In the early days of computing computers came as hardware only. You were provided a manual of the hardware and instruction set. These instructions where either binary machine code or later assembly mnemonics. The programmer had to develop an intimate knowledge of the computer’s hardware and how it all worked together. Each system was very different from the others. No one language ran on more than a single computer (later, a single computer manufacturer’s line). These computers had no operating system. When the machine was turned on, the programmer had to input a loader program that read in the program to be ran. This loader program was usually rewritten as the programs it loaded changed. Eventually, some programmers began to write more universal loaders that could load any program into the computer’s memory and then jump to the beginning of the program to start executing it.

One issue with loaders and the programs of the day was that the addresses of the various subroutines and indeed the address of the program itself was hard-coded into the program. So if the size or location of the loader changed the programs had to have all addresses recalculated. The addition of a single byte to the loader caused every line of the program to be recompiled. At this time these programs where compiled by hand! There were no compilers or interpreters.

To give you a feel for how this was done let’s look at a fictitious machine and it’s machine language:

The Fiction 1 Computational Machine

Instruction Set
Machine Code : Instruction
00000000 : No operation
00000001 : Load Accumulator A with the value following the current byte
00000010 : Load Accumulator B with the value following the current byte
00000011 : Add the values in Accumulators A and B and place the results in A
00000100 : Store the contents of Accumulator A at the memory address stored in the following byte
00000101 : Load the Accumulator A with the value stored at the address stored in the following byte
00000110 : Branch to program address stored in the following byte
00000111 : Subtract the value in Accumulator B from the value in Accumulator A and store in A
00001000 : Test Carry/Barrow Flag. The flag is set when an addition or subtraction results in a carry or barrow. When tested, sets the Accumulator A to zero if the flag was set, otherwise 255 is placed in Accumulator A.
00001001 : Branch to the address stored in the following byte if the Accumulator A contains 255
00001010 : Move the value in Accumulator A into Accumulator B
00001011 : Move the value in Accumulator B into Accumulator A

OK, looking at the machine instructions above let’s try to write a simple program that takes two numbers from memory locations 1000 and 1001 and multiplies them:

The first thing you may notice is that there is no multiplication instructions. So how do we multiply two numbers? Well, multiplication is just repetitive addition. So we simply use addition to do our multiplication. Here’s the pseudo code:
1. Load A with first value
2. Load B with second value
3. Add A and B, result is stored in A
4. Decrement B
5. Is B zero?
6. Yes, we’re done
7. No, go back to step 3

Our program:
Notes: Data Address 1011 will be used to store intermediate results
Data Address 1100 will be used for temporary storage of counter
Address Instruction
00000000 00000101 # Load Accumulator A with second value
00000001 00001001 # Address of value to place in B
00000010 00001010 # Move value to B
00000011 00000101 # Load A with first value

00000100 00001100 # Address of value to place
00000100 00000011 # Add A and B and store result in A
00000101 00000100 # Save value in A
00000110 00001011 # Data Memory location to hold intermediate result
00000111 00001011 # Move value in B to A
00001000 00000010 # Load B with the value 1
00001001 00000111 # Subtract B from A and store result in A, Also sets the Carry/Borrow flag
00001010 00000100 # Save value in A
00001011 00001100 # Data Memory counter storage location
00001100 00001000 # Test flag and set A value, if flag was set, done!
00001101 00001001 # Branch if flag was cleared, destroys A
00001110 00000110 # Branch to address stored in next byte
00001111 00001110 # Forever loop… jump to self
00010000 00000101 # Retrieve counter in A
00010001 00001100 # Counter temp storage address
00010010 00001010 # Move A to B
00010011 00000101 # Retrieve current value in A
00010100 00000110 # Jump back to step 3 (Add)
00010101 00000100 # Address of step 3

The above machine program for our fictitious machine had to be hand assemble. You simply look up the value of each of the instructions in the instruction listing and then write them down. Once this was done, you then entered these values into the control panel of the machine. This was usually done via a set of switches. For our machine we would need to set 8 switches for the address and 8 more switches for the instruction byte. Then press some other switch to load the data into program memory.

Once our program is finished it simply loops at address 00001110 forever. In a real system you would store the result in a Data Memory location to be read by another program, or display it on a set of lamps. Note that all the addresses of the data and routines are hard-coded.

Eventually, programmers developed mnemonics for the machine instruction values. These came to be known as Assembly code. Assembly is a very low level abstraction on the actual machine code. A single assembly code instructions such as LDA #3 (load A with a value of 3) usually results in a single machine code instruction. However, some assembly language instructions can also result in a handful of machine instructions. Assembly language is much easier for humans to read and write. So it has become the standard language for ultra-low-level code and no-one but the chip designers use machine code anymore.

Fiction 1 Assembly Mnemonics

Mnemonic : Instruction
NOP : No operation
LDA <val> : Load Accumulator A with the value following the current byte
LDB <val> : Load Accumulator B with the value following the current byte
ADD : Add the values in Accumulators A and B and place the results in A
STA ADDR : Store the contents of Accumulator A at the memory address stored in the following byte
LDA [ADDR] : Load the Accumulator A with the value stored at the address stored in the following byte
BRA ADDR : Branch to program address stored in the following byte
SUB : Subtract the value in Accumulator B from the value in Accumulator A and store in A
TST : Test Carry/Barrow Flag. The flag is set when an addition or subtraction results in a carry or barrow. When tested, sets the Accumulator A to zero if the flag was set, otherwise 255 is placed in Accumulator A.
BRC ADDR : Branch to the address stored in the following byte if the Accumulator A contains 255
MOV A, B : Move the value in Accumulator A into Accumulator B
MOV B, A : Move the value in Accumulator B into Accumulator A

Using the assembly mnemonics our program becomes:

Address Instruction
LDA [00001001b] # Load Accumulator A with
MOV A, B # Move value to B
LDA [00001100b] # Load A with value
lp01: ADD # Add A and B and store result in A
STA [00001011b] # Save value in A
MOV B, A # Move value in B to A
LDB 00000001b # Load B with the value 1
SUB # Subtract B from A and store result in A, Also sets the Carry/Borrow flag
STA [00001100b] # Save value in A
TST # Test flag and set A value, if flag was set, done!
loop: BRC loop # Branch if flag was cleared, destroys A
LDA [0001100b] # Retrieve counter in A counter from temp storage
MOV A, B # Move A to B
LDA [00001100b] # Retrieve current value in A
BRA lp01 # Jump back to step 3 (Add

As you can see, using assembly mnemonics the code becomes much easier to read and understand. Looking through the code it doesn’t take long to understand that when a value is given in square brackets it refers to the value stored at the given memory address. If the value is not in brackets it is used as presented. These Mnemonics are much easier to remember and understand when compared to the machine code values shown above.

To see an example of what early computers were like visit: You may also enjoy this video demonstration of the Altair 8800: The Altair was my first computer. My father purchased a kit and put it together and allowed me to help. We then spent many hours writing simple programs for it and building our own peripherals. The Altair was the start of personal computers. While there were a couple of small computers before it (see, they never quite took off. The Altair however, found it’s way to the cover of Popular Electronics and found admiration from the likes of Bill Gates who wrote the first BASIC interpreter for the Altair 8800. Which resulted in the Microsoft Corporation.

Ok, so we’ve seen how it was done in the early days. Now our program doesn’t use a loader. However, for most computers loaders are required. The programmer would write a program that started just past the end of the loader program and would use addresses from that point on. Eventually, David Wheeler wrote a relocatable linker/loader for the EDSAC computer. This allowed programmers to forget about the program’s actual location in memory and let the linker/loader figure it out.

With the advent of larger more complex computer systems the need arose to have a set of routines that were permanently stored on the computer and could be called by a management program. These libraries of routines provided canned code for interacting with I/O devices like paper tape readers. They also contained the loaders to load programs from paper tape into memory and begin executing that code. These libraries grew as did the complexity and number of peripherals. Eventually, these machine were produced with a large library of hardware management routines and schedulers to allow compute jobs to be multiplexed in a time-sharing operation. It was these early libraries that eventually became the kernels of today’s operating systems.

Today’s Operating Systems now provide a host of services. However, there is still a small library of code that comes with the hardware and is responsible for initializing the hardware. This code, called the BIOS or UEFI and is executed when the hardware is first powered on. It is responsible for calling the OS loader or “Boot Loader”. The BIOS is older than the UEFI and was introduced in the first PCs back in the 1970’s. It can still be found in an emulation mode in current PCs using UEFI. UEFI was designed to replace the BIOS. It provides many more routines and services and is in step with today’s hardware. That said, we’ll first use the BIOS as it is important to understand the BIOS and how to boot using a BIOS system. In later articles we’ll examine the UEFI.

The Boot Process´╗┐

The topics we’re going to cover are:

  • How a computer boots using the BIOS
  • How to write low-level programs where there is no operating system.
  • How to configure the CPU so that we can use it’s extended feature.
  • How to bootstrap code written in higher level languages so we can use it to write more complex code.
  • How to create some fundamental services such as device drivers, file systems, and multi-tasking systems.

When we boot our computer, it must start up without any notion of an operating system. Somehow, it must load the operating system. Whether it be Windows, Linux, or DOS. It must load the operating system from some form of permanent storage such as ROM, Hard Disk, Floppy Disk, SD Card, or USB thumb drive.

The pre-OS environment of our computers offers very little in the way of services. At this this stage even a simple file system doesn’t exist. What we do have is a simple collection of basic routines stored in a ROM (usually EEPROM these days). These routines provide access to the system’s basic I/O devices such as screen, keyboard, and disks.

The BIOS performs some low-level tasks such as initializing hardware and testing the basic system, it boots the operating system which is stored on a connected device such as a hard drive. However, the BIOS cannot simply load the operating system from the device. It has no idea of a file system. So the BIOS must read the operating system from a predetermined location. This location is called the Boot Sector. The boot sector is usually 512 bytes long. It is easiest for the BIOS to locate the OS at at the beginning of the disk at Cylinder 0, Head 0, Sector 0. This is the location known as the boot sector. However, most disks will not contain the OS and it would be a shame to waste this storage space on those disks when it could be used for data storage. So the BIOS needs a way to detect if an OS is present in the boot sector. To do this, the BIOS looks for a “Magic Number”, placed at the end of the 512 byte boot sector. This two byte magic number is 0xAA55. So on power up after some initialization work, the BIOS searches through the connected storage devices looking for the magic number. When it finds the magic number it knows that it has found the OS and can load it into memory and jump to the location to start execution of the OS.

If we use a program such as a Hex-Editor that will let us write raw byte values to a file, we can craft a simple boot sector.

e9 fd ff 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa

Referring to the code above, the first three bytes are a simple infinite loop in machine code. The last two bytes are the magic number. You may wonder if I got this wrong? I thought you said the magic number was AA55?

Well, I did and it is. CPU’s can store data either Lest Significant Byte (LSB) first or Most Significant Byte (MSB) First . This data storage is known as Endianess. If a CPU stores the the data LSB first, it is said to be Little-Endian. If it stores the data MSB first it is Big-Endian. The x86 CPU family is little endian and so the LSB is stored first.

Most of the time you can ignore a CPU’s endianess. However, if you have to debug compiled code in a debugger you need to know how the machine code is stored or you may completely misread it.

This program is likely the smallest program you can write. But it is a valid bootable program. We can run this program either using an emulator such as qemu or Bochs. I’ll concentrate on using Qemu here.

Creating And Booting Our Stub

First you’ll need a Hex Editor. On Linux you can install Bless Hex Editor with the command: apt install bless On Window you can install wxHexEditor. It can be downloaded from SourceForge or, you may use the hex editor of your choosing. The only requirement is that it must allow you to save the raw bytes to a file.

On Windows 10 you can use the Linux Windows Sub-System. You can find more info here: This will give you access to most of the needed Linux commands and tools used here. On Windows 10 or earlier versions such as Vista , you can install Cygwin, or MinGW to gain access to the Unix-like tools we will be using. This includes the gcc compiler and the nasm assembler.

In addition to the hex editor you will need an emulator. For Linux, MacOS, and Windows I recommend using Qemu. However, Bochs Emulator can also be used. Note that Bochs requires a configuration file to be setup before using it. Qemu will work out of the box. Qemu can be installed on Linux using the command: apt install qemu On Windows you can download the Windows installer here: . On Windows you’ll need to add it to your path after installation. Their is a good Youtube video on this: You can find the CygWin Project home page here: When installing CygWin make sure to install the gcc compiler and the nasm assembler.

Now that you have your tools installed let’s create our stub program. This program will do nothing but loop infinity. However, it will prove that we can position code and the magic number in the right places to be found and executed by the BIOS.

Open your hex editor and the three hex values e9, fd, ff into the first three locations. Next fill the editor with 00 byte values up to the 0x1ff position (511 in hex). Then replace the last two byte with the magic number remembering to place them using little-endian format. So your file should look like the binary listing above with 55, aa being the last two bytes.

Create a project folder named “salsa”. In that folder create the sub-folders salsa/scr and salsa/bin. Save your hex file as “stub.bin” in the salsa/bin folder. Next, open a terminal in salsa/bin and run the command:

$ dd if=/dev/zero of=salsa.img bs=512 count=2880

Be careful using dd as you can corrupt or even erase your hard drives using it. dd is a utility on Linux for creating disk images. It is powerful and with power comes great responsibility! Read the docs on dd and get to know it. The command I presented above takes an input file defined by the if parameter. Here we passed /dev/zero which is is a special file that provides as many null characters (ASCII NUL, 0x00) as are read from it. A typical uses is to provide a character stream for initializing data storage. Which is exactly what we use it for here.

The bs parameter tells dd the block size to produce and the of parameter is the output file name. The count parameter tell dd how many blocks to copy. If we do the math, 2880 blocks of 512 bytes gives us 1474560. This is the size we need for our 1.44 MB disk image.

Next we need to copy our binary data to the boot sector of our disk image. To do this we use the command:

$ dd if=stub.bin of=salsa.img

Now we have a 1.44 MB floppy disk image file that we can use with Qemu or Bochs. So let’s try and boot it. Use the command:

$ qemu-system-i386 salsa.img

You should see something like:

Since our stub only runs an infinite loop nothing more will happen. No matter how long you wait. So the next step will be to get our stub to do something useful. For that we’re going to need an x86 Assembler.

We’ve covered a lot in this post. So we’ll end here. Be proud that you got the BIOS to locate your code and execute it. In the next post we will learn about the processor and the x86 assembly language. We;ll need to have some basic knowledge of these topics before moving on to build something useful.

Until next time; Happy Coding!

Series NavigationSalsa – An Operating System for Teaching -Part 2 >>