In which we interpret Forth using Forth and a bit of cheating
With an implementation of refill
, we are in the position of implementing the interpreter loop in Forth itself. In the previous post, we defined the interpreter loop like this:
attempt to read a line into the input buffer
while last read was successful
while there are words in the input buffer
*read a word* and find its execution token
if we are interpreting
execute the word
else
compile the word
attempt to read a line into the input buffer
At first, we are going to cheat. Everything inside the outer loop is going to be as it is now and we will create a primitive that will call a suitably modified version of interpret(buffer:)
, which, for now, I’ll call interpetCurrentBuffer()
. Over time, more of the code in the new function will be replaced with pure Forth. What we need to implement in Forth is
refill
while stack top != 0
interpret current buffer
refill
exit interpreter
I put in an explicit line to “exit the interpreter” because I’m not quite sure how that is going to work at this point. If we just run off the end of the word list, a stack underflow exception will be thrown. We need to avoid that – or process it differently for the return stack.
Another point is that the Forth code in this loop can use only primitives because the non primitives that are loaded at initialisation require the interpreter to be working in order to define them. In particular, none of the high level control structures can be used: we can’t use begin
… while
… repeat
for example. With the control flow implemented just with branches, our loop becomes:
branch doRefill
doInterpret:
interpret current buffer
doRefill:
refill
0<>branch? doInterpret
exit interpreter
This is equivalent to the previous version. Note that it uses the trick of moving the loop condition to the end of the loop and jumping to it straight away. This saves one refill
instruction and one branch for each iteration of the loop at the expense of one extra branch for the first time through.
All we need to do is hand code that into Forth and create the necessary primitives. The code for this is tracked in the branch feature/interpreter-in-forth.
Some time passes...
It turns out that there was a problem with my plan. It turns out that interpreting the current buffer involves a call to the Forth interpreter, which would mean a recursive call, if we just use the current code. The compiling side is fine, it’s just interpreting that is a problem.
The solution, I think, is to implement the inner loop which iterates for each word on the current line in Forth and have the function that performs the “interpret or compile” bit manufacture a call to the interpreted word. At a high level, the inner loop looks like this (assuming that “interpret or compile” leaves the word on the stack and a control word to tell Forth what happened):
parse a word
while word parsed successfully
interpret or compile word
if interpreting
if execution token
execute
else
do nothing (would be a number and it's already on the stack)
parse a word
Thus, we need a primitive to parse a word that leaves a flag on the stack to say “parsed successfully”, and we need a primitive to “interpret or compile” that leaves a flag on the stack that determines what happened and, below it, the execution token or literal that was found (if interpreting). We’ll implement the code in Swift first before re-implementing in Forth.
Phase 1
I extracted the code in interpretCurrentBuffer()
that looks up a word and either executes or compiles it into interpretOrCompile(word:)
. This returns an enumeration so that the caller knows if the word was compiled or if something needs executing. The something that needs executing is still placed in wordLists[0]
as it was before. The code for this stage is tagged blog-1872-1.
Phase 2
Instead of an array to be executed or compiled, lookup(word:)
returns an enumeration case to differentiate between literals and words. Literals are just placed on the stack instead of executing [.pushNext, value]
. wordLists[0]
is only filled in in the case when interpretation is needed. The code for this phase is tagged blog-1872-2.
Phase 3
We no longer construct a small program to be executed in wordLists[0]
when we want to execute a word from the input source. We now put its execution token on the data stack and call execute
. wordLists[0] no longer needs to change for every word to be interpreted. Instead, it’s hard coded to just call execute
.
Phase 4
We start chiselling away at the bits implemented in Swift, reimplementing them in wordLists[0]
, starting with interpretOrCompile(word:)
not returning a result but putting a flag on the stack. wordLists[0]
only calls execute
if the flag is true. This code is tagged blog/1849/4.
Phase 5
We create a primitive called .interpretOrCompile
that calls the Swift function from phase 4. The interpreter code is modified to use it and the Swift interpreter just calls the Forth interpreter code. The Forth interpreter code is a hand compiled version of
.interpretOrCompile 2 0=branch execute
This calls interpretOrCompile(word:)
(it assumes Swift has parsed a word) and then, if that results in a word that does not need executing (top of the data stack is false), it branches forward over the execute
.
The Swift interpreter loop now looks like
private func interpretCurrentBuffer() throws
{
var shouldExit = false
while !shouldExit
{
if let word = inputParser.next()
{
currentWord = word
assert(wordNumber == 0, "Interpreting new word with non zero word number")
wordIndex = 0
try resume()
}
else if !returnStackIsEmpty
{
try doExit()
try resume()
}
else
{
shouldExit = true
}
}
}
This code is tagged blog/1872/5.
Phase 6
In phase 6, more of the interpreter is written in Forth. In particular, the code to read a word from the input is now in Forth and the the Swift loop has been replaced by a Forth loop. I had hoped to use the Forth word word
to read each word, but its behaviour was problematic in a couple of ways. It’s not designed to handle all white space as the same delimiter and it doesn’t handle reaching the end of a line in quite the right way for my interpreter and it has a problem with comments. The last two issues are bugs, but I couldn’t be bothered to fix it for the purposes of the current exercise. Instead I created a new Forth word that is used only for interpreting input.
This is the Forth code for the interpretation loop:
6 branch
.interpretOrCompile 2 0=branch? execute
.wordForInterpreter 1 + dup 1 - c@
dup
-17 0<>branch?
drop drop
This code can’t use any non primitive words which is why the loop is implemented as a couple of branches (one conditional). Also, the five words following .wordForInterpreter
are my implementation of the Forth word count
which is a non primitive.
The Swift interpreter loop has gone and the code now looks like this (tag blog/1872/6):
private func interpretCurrentBuffer() throws
{
assert(wordNumber == 0, "Cannot interpret a buffer if not in interpreter")
wordIndex = 0
try resume()
while !returnStackIsEmpty
{
// We are at the end of the current buffer but evaluating
// We need to go back into the end evaluate routine
try doExit()
try resume()
}
}
Even this is more complex than it needs to be. The while
loop is only there because resume()
always exits when it reaches the end of word list 0, even when it’s not at the bottom of the stack. Changing the loop condition in resume()
from
while wordNumber != 0 || wordIndex < wordLists[wordNumber].endIndex
to
while !returnStackIsEmpty || (wordNumber == 0 && wordIndex < wordLists[0].endIndex)
allows us to do away with the loop in interpretCurrentBuffer()
. In fact, the function now just sets wordIndex
to zero and runs word 0, so we can do that inline in the caller. It loses about five MIPS, but is simpler code (tag blog/1872/7).
At this point, the loop that interprets a line of Forth is now fully implemented in Forth. This has simplified our Swift code significantly. In particular, evaluate
is much simplified and improved. The next stage is to implement the “outer” loop that fetches each line of input in Forth. But that is for another blog.
One thought on “Forth interpreting Itself”