Sunday, April 13, 2014

Writing a Very, Very Simple Emulator


In this post I will be covering an interesting topic that I myself came in contact with a few years ago now; Writing an emulator. For the purposes of simplicity and learning, I will present a very minimal architecture that we will model in software, but the principles learned here will be applicable to existing hardware.

Common for most architectures is the presence of various forms of state tracking mechanisms. Random access memory is one such mechanism. Here an application can store data that can be fetched and manipulated to progress or revert the state of the application. Another state tracking mechanism that is usually less known unless you are interested in hardware design or have been working with low-level languages are the so-called processor registers. Many architectures can only perform operations on registers, meaning data has to be loaded from RAM to a register, manipulated, and then stored back into RAM. Some registers on a processor has a specific purpose, and is not available for general purpose computing. Two such registers are the stack pointer which points to the top value of the stack, and the instruction pointer which points to current instruction that is being executed by the processor.

For the purposes of this tutorial we will be emulating a 8-bit CPU architecture called CRAP, which is an acronym for something. Use your imagination. Being an 8-bit architecture entails that the processor can only distinguish between 256 unique memory locations. On our architecture the size of each location is 8 bits. CRAP has an instruction pointer, but no stack pointer or any general purpose registers, since it can work directly on memory in RAM. All CRAP machines are expected to have the full amount of RAM available to function properly.

Each instruction is three bytes wide. The first byte is the opcode. It tells the processor what to do (addition, subtraction, multiplication, stack push, etc). The second byte will be the destination operand (let's call it A), while the third byte will be the source operand (called B), so that a complete instruction will result in something similar; A = A + B should the opcode be addition. All operations are unsigned.

Both code and program storage share the 256 bytes of RAM. Code is stored in reverse from address 255 down to zero. The programmer is free to store program state in any memory location since we don't have any stack pointer. However, the programmer must be certain not to store program state where code is stored, lest the program become corrupted, or the programmer actually intends to modify the code at run-time.

The instruction set contains the following opcodes where A and B are the operands:

0 Does nothing.
1 Returns control to parent process (terminate execution).
2 Sets value in address A to value in address B.
3 Sets value in address A to value B.
4 Adds value in address A and value in address B and stores result in address A.
5 Subtracts value in address A and value in address B and stores result in address A.
6 Multiplies value in address A and value in address B and stores result in address A.
7 Divides value in address A and value in address B and stores result in address A.
8 Ands value in address A and value in address B and stores result in address A.
9 Ors value in address A and value in address B and stores result in address A.
10 Xors value in address A and value in address B and stores result in address A.
11 Shifts value in address A by value in address B to the right and stores result in address A.
12 Shifts value in address A by value in address B to the left and stores result in address A.
13 Skips one instruction if value in address A is less than value in address B.
14 Skips one instruction if value in address A is greater than value in address B.
15 Skips one instruction if value in address A is equal to value in address B.
16 Sets instruction pointer to value in address A.
17 Prints a string of values starting from address A with the size of value in address B.
18 Prints a string of characters starting from address A with the size of value in address B.*

What does a machine like this look like in code. Here's one implementation.

class CRAP
{
private:
  unsigned char I; // instruction pointer
  unsigned char RAM[256]; // random access memory
public:
  void SetProgram(const std::list<unsigned char> &program);
  void Run( void );
};

First, we need to load a program into RAM. This is what the SetProgram function is for. Remember that the program is loaded into memory in reverse beginning from address 255.

void CRAP::SetProgram(const std::list<unsigned char> &program)
{
  std::list<unsigned char>::const_iterator instr = program.begin();
  for (unsigned char i = 0; i < (unsigned char)program.size(); ++i, ++ instr) {
    RAM[255-i] = *instr;
  }
  RAM[255 - program.size()] = 1; // safe-guard in case there is no program-return
}

Second, we need to implement the Run() function. This is the meat of the emulator. Basically, this function will initially set the instruction pointer to 255. This is the starting point for all programs. The program will then read three bytes from RAM in reverse using the instruction pointer and then perform an operation depending on . This process will repeat until the operation byte has the value 1 at which point the function will terminate.

There are two ways of writing the main execution loop; Using either a switch-case-centric design, or function table. According to my tests a switch-case is about four times faster than the function table, so we'll go with that.

void CRAP::Run( void )
{
  I = 255; // initialize program counter
  while (true) {
    unsigned char O = RAM[I--]; // operation
    unsigned char A = RAM[I--]; // first operand
    unsigned char B = RAM[I--]; // second operand
    switch (O) {
      case 0:  break;
      case 1: return;
      case 2: RAM[A] = RAM[B]; break;
      case 3: RAM[A] = B; break;
      case 4: RAM[A] += RAM[B]; break;
      case 5: RAM[A] -= RAM[B]; break;
      case 6: RAM[A] *= RAM[B]; break;
      case 7: RAM[A] /= RAM[B]; break;
      case 8: RAM[A] &= RAM[B]; break;
      case 9: RAM[A] |= RAM[B]; break;
      case 10: RAM[A] ^= RAM[B]; break;
      case 11: RAM[A] >>= RAM[B]; break;
      case 12: RAM[A] <<= RAM[B]; break;
      case 13: if (RAM[A] < RAM[B]) { I -= 3; } break;
      case 14: if (RAM[A] > RAM[B]) { I -= 3; }; break;
      case 15: if (RAM[A] == RAM[B]) { I -= 3; }; break;
      case 16: I = RAM[A]; break;
      case 17: for (unsigned int i = 0; i < RAM[B]; ++i) { std::cout << (unsigned int)RAM[(unsigned char)(A+i)]; } break;
      case 18: for (unsigned int i = 0; i < RAM[B]; ++i) { std::cout << RAM[(unsigned char)(A+i)]; } break;
      default: std::cout << "Illegal instruction"; return;
    }
  }
}

I would also like to mention that you programmers that have ever implemented a calculator have already implemented the core ideas that make an emulator work. Take a look at this.

// Format: A + B (where + can be substituted for -, *, or /)

void ProcessOperation(const std::vector<std::string> &input)
{
  // input: Let's say the input is a user string that has
  // been split up into individual words by a parser
  if (input.size() != 3 || input[1].size() != 1) {
    std::cout << Invalid formatting.” << std::endl;
    return;
  }

  int A = atoi(input[0]);
  char instruction = input[1][0];
  int B = atoi(input[2]);

  switch (instruction) {
    case '+': std::cout << ans=” << A + B << std::endl; break;
    case '-': std::cout << ans=” << A - B << std::endl; break;
    case '*': std::cout << ans=” << A * B << std::endl; break;
    case '/': std::cout << ans=” << A / B << std::endl; break;
    default: std::cout << Invalid operation << std::endl;
  }

}

It's the exact same thing that an emulator does! It reads the operation and the operands, and performs an action on those operands based on the operation. The major difference is that an emulator may or may not have more complex operations, operations that involve memory addresses rather than constant values, and state preservation, but hey, an advanced calculator can do that as well!

Now we really only need one more component to complete our emulator. Need is somewhat of a strong word. We already have a working emulator. However, it would be nice and convenient to actually write a program externally and load it into memory so we don't need to hard-code each program and recompile the entire emulator to run it. I'll keep this as simple as possible:

bool LoadProgram(const char *filename, std::list<unsigned char> &out)
{
  out.clear();
  std::ifstream fin(filename);
  if (!fin.is_open()) {
    std::cout << "Could not read file" << std::endl;
    return false;
  }
  unsigned int op;
  while (fin >> op) {
    out.push_back((unsigned char)op);
  }
  const float instructionCount = float(out.size()) / 3.0f;
  if (instructionCount != int(instructionCount)) {
    std::cout << "Incomplete instruction" << std::endl;
    out.clear();
    return false;
  }
  if (out.size() > 255) {
    std::cout << "Program too large" << std::endl;
    out.clear();
    return false;
  }
  return true;
}

This function reads plain text files rather than binary files. That makes reading and writing programs a bit easier, but not as easy as if we were to write an assembler that builds programs from human-readable text like a modern assembler. I'll leave implementing that as an exercise for the reader.

To test out the emulator create a new file with the following contents and load it using the LoadProgram function. See if you can decipher what it does before you run it.

3 0 72
3 1 101
3 2 108
3 3 108
3 4 111
3 5 44
3 6 32
3 7 119
3 8 111
3 9 114
3 10 108
3 11 100
3 12 10
3 13 13
18 0 13
3 0 67
3 1 111
3 2 117
3 3 110
3 4 116
3 5 32
3 6 116
3 7 111
3 8 32
3 9 49
3 10 48
3 11 48
3 12 58
3 13 10
3 14 14
18 0 14
3 0 1
3 1 1
3 2 10
3 3 100
3 4 147
17 0 1
18 2 1
4 0 1
14 0 3
16 4 0
1 0 0

Knowing what you know now, I will leave writing an emulator for the CHIP-8 architecture using only the specs for that system as an exercise. By reading only the spec and avoiding looking at the slew of other CHIP-8 emulators that exist, you will know what it is like to write an emulator for a system that has yet to be fully understood by the emulation community.

* Note that the spec is ambiguous in this matter. It says nothing about what value corresponds to what character. A common mistake would be to just assume that the host system character table is the one we should use, but that is not necessarily true. Furthermore, the host system character table might not be the same across all platforms, meaning there will definitely be times where values do not correspond to the correct characters. However, we lack information on the matter, and so we must accept using the host character table for now. This is a perfect example of ambiguities that are faced when emulating actual hardware.