The Decoding Table

Let us talk about instruction decoding. What is instruction decoding? It is the process of taking opcodes and operands and doing what they tell you. Instruction decoding is the heart of a processor emulation and in mine takes up the vast majority of lines of code.

In his blog, Andre Weissflog characterises two strategies for instruction decoding:

  • the giant switch
  • the algorithmic decoder

The giant switch is a huge switch statement with a case for every opcode and all the native code for a particular operation is in the case for its opcode. As long as the compiler has an efficient way to render a very large switch as a machine code, this is probably the fastest way to implement an emulator. On the 6502, the switch would have exactly 256 cases (some of which are for undocumented effects). On the Z80, it would be substantially more due to prefixed instructions. Andre’s first emulator for the Z80 used this model, but he created a Python program to generate the giant switch on the fly.

The algorithmic decoder relies on the fact that opcodes generally have a logical structure and takes them apart to find things like address modes, operation, etc. There is still a switch, in fact, multiple switches to perform the operation and fetch operands etc, but the switches are much smaller and can be easily hand coded.

I take a different approach which is a variation on the algorithmic decoder:

  • The decoding table

With the algorithmic decoder, you have to do a load of shifts and masks to get the various bits out, but for any one opcode these shifts and masks always produce the same result. For example, most of the eight bit loads split into three parts:

01 ppp qqq

where ppp and qqq represent the destination and source register respectively. The algorthmic decoder has to mask all but the top two bits to find out it is a load, then mask and shift the p group and then mask the q group to get the two registers1. Why not create a table containing the outcome of all the shifts and masks? Then all we have to do is look up the opcode in the table (the decoder table). In fact, once we have the table, it doesn’t matter where in the opcode the fields are and we can add other information like the number of t-states needed. In fact, we don’t actually care what the fields in the opcode are, we can describe a much more abstract structure based on the semantics of the opcode.

Most Z80 instructions have a source operand and a destination operand, one of which might be implicit. For example, all the 8-bit arithmetic opcodes have a source explicitly defined and a destination which is implicitly the a register. Our decoder table defines the source for each opcode, the destination for each opcode (always the a register) and the operation e.g. addd8, sub8 etc. Here are the table entries for some of the 8-bit adds.

OpcodeInfo(opcode: 0x80, cycles: 4, functionGroup: .addA8, source: .b, dest: .a),
OpcodeInfo(opcode: 0x81, cycles: 4, functionGroup: .addA8, source: .c, dest: .a),
OpcodeInfo(opcode: 0x82, cycles: 4, functionGroup: .addA8, source: .d, dest: .a),
OpcodeInfo(opcode: 0x83, cycles: 4, functionGroup: .addA8, source: .e, dest: .a),
OpcodeInfo(opcode: 0x84, cycles: 4, functionGroup: .addA8, source: .h, dest: .a),
OpcodeInfo(opcode: 0x85, cycles: 4, functionGroup: .addA8, source: .l, dest: .a),
OpcodeInfo(opcode: 0x86, cycles: 7, functionGroup: .addA8, source: .addressInHL, dest: .a),
OpcodeInfo(opcode: 0x87, cycles: 4, functionGroup: .addA8, source: .a, dest: .a),

OpcodeInfo is a struct containing all of the decoding information for one instruction.  Most of the fields have defaults, for many of the fields to make the code cleaner. Foe example, some instructions require a displacement byte. There is a boolean field in the opcode info called “dispacement” which defaults to false (and hence is not shown above). When it’s true, the decoder will get the displacement byte. For example, add a,(ix+d) has this entry:

OpcodeInfo(opcode: 0x86, cycles: 15, functionGroup: .addA8, displacement: true, source: .addressInIX_d, dest: .a),

Every entry in the table starts with the opcode itself. This is included as a check that the entry is in the right place. When the Z80 loads the table, it runs through every entry checking that the opcode matches its index in the table.

My execute loop looks something like this:

let opcodeInfo = decoderTable[instruction]
if opcodeInfo.displacment
    // fetch the displacement byte
switch opcodeInfo.source
    // Fetch the source operand
switch opcodeInfo.functionGrooup
    // Perform the operation
switch opcodeInfo.dest
    // Save the destination
cycleCount += opcodeInfo.cycles

I’m almost certain that this approach results in a slower machine than either of the other two approaches but it does have some advantages.

  • Easy to add new instructions. Once the first add instruction is done, all the others are just new entries in the table with the same function group.
  • You can have entries in the table that are not opcodes. For example, you could make entry number 256 a pseudo instruction that does a reset.
  • It feels more satisfying. It almost feels like microprogramming a proper processor.

The decoding table also makes the prefix instructions almost trivially easy to implement: they just temporarily switch the decoding table to a new different decoding table. The opcode info above for add a,(ix+d) is actually not in the normal decoding table but in a completely separate decoding table that is switched in for one fetch-execute iteration when the opcode is 0xdd.

This last point leads us to the other major disadvantage as compared to the algorithmic decoder, the size. There is a separate decoder table for the standard opcodes and each prefix i.e. for: no prefix, dd, ed, fd, cb. Also two extra tables for dd, cb and fd, cb i.e. a dd prefix followed by a cb prefix or an fd prefix followed by a cb prefix. The source file for these tables is 87kb which is bigger, on its own, than all of  Andre Weissflog’s decoding source put together. He doesn’t comment as much as I do though! The transistor count of the original Z80 was around 8,000 for the whole CPU. I’m pretty sure my compiled tables must come in at well over 8,000 bytes.

1The registers are numbered 0 = b, 1 = c, 2 = d, 3 = e, 4 = h, 5 = l, 7 = a. 6 should be f – the flags – but the opcode is repurposed for hl indirect mode.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.