My Fifth Rethink on Execution Tokens

In which I continue my war against associated values

In my previous post, I ended by trying to eliminate the CompiledWord cases that use associated values. These are:

case parsedLiteral(Int)
case undefined(Word)
case redefined(Word)
case definition(Cell)

I’ve got rid of the last of these, but it was a difficult struggle. I did it by introducing a new word .jsr similar to .pushNext. So, .definition(n) becomes .jsr .parsedLiteral(n). Unfortunately, I hit many issues as a result. In particular, it transpired I needed to implement two versions of .jsr, one for normal words and one for immediate words, otherwise, even immediate words got compiled. I am somewhat suspicious of my solution, as a result. It doesn’t feel particularly clean.

Anyway, the code is tagged blog-1176-1. For those interested, the performance hasn’t suffered much

Time of run: 3.510276212s, words executed: 4778203, mips: 1.3612042789298315
["121393"]

The next step was to remove .undefined and .redefined. This turned out to be much simpler than .definition. They were only there to allow me to feed the raw string into the interpreter function as a compiled word. Once all the processing done in the needsWordName state had been factored out into a separate function, they could go. Timings were pretty much unchanged:

Time of run: 3.488767197s, words executed: 4778203, mips: 1.3695964018776572
["121393"]

The code is tagged blog 1176-2.

That just left .parsedLiteral to remove. It was only there to wrap an arbitrary integer as a CompiledWord which is an enumeration. We could remove it if we used a different kind of type as CompiledWord. For example, it could be an alias for Int. Or, it could be encoded as an OptionSet. This seems more interesting in the short term because it opens the possibility for flag easy flag manipulation, for example, the fact that a word is a primitive is already encoded in the top bit. There might be a performance advantage in encoding the immediate and compile only flags there too and also small constants. My plan was

  • Change CompiledWord to an OptionSet type with a rawValue of Int type
  • Use a raw valued enumeration to allocate numbers to the primitives.
  • .parsedLiteral is just any CompiledWord treated as its rawValue.

Having made the changes (and eliminated the token property in favour of rawValue), performance was disappointingly slightly down. The loss seemed mostly due to the elimination of token. I can only guess that token, being read only, allowed extra optimisations where it was used.

Time of run: 3.649850963s, words executed: 4778203, mips: 1.3091501676201458
["121393"]

The code is tagged blog-1176-3.

Changing CompiledWord to an OptionSet opened up a number of possibilities. For example, we can store the compile-only and immediate bits directly in CompiledWord. We can also eliminate .jsr and .jsrImmediate as primitives and use the combination of the primitive bit and the immediate bit to tell us which one to do. The layout of a CompiledWord at this point looks like:

 63 62 61 60                                         0     
+--+--+--+--------------------------------------------+
|  |  |  | Word list index (61 bits)                  |
+--+--+--+--------------------------------------------+
 ^  ^  ^
 |  |  +- compile only
 |  +---- immediate
 +------- primitive

Implementing the flags as shown above improves performance considerably.

Time of run: 2.575161858s, words executed: 4778203, mips: 1.8554961837276482
["121393"]

The code is tagged blog-1176-4

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.