Implementing Forth in Go and C

(eli.thegreenplace.net)

126 points | by Bogdanp 12 hours ago

7 comments

  • AdieuToLogic 45 minutes ago
    Speaking of Forth... Here's a fun bit of trivia.

    PowerPC and Intel Macs used to use Open Firmware[0] to define their firmware (BIOS). Macs with M1[1] architectures and subsequent generations may also, but I cannot attest to this.

    For those Macs which do use Open Firmware[0], a key implementation component of it is FCode[2]:

      FCode is a Forth dialect compliant to ANS Forth, that is 
      available in two different forms: source and bytecode. 
      FCode bytecode is the compiled form of FCode source.
    
    So if anyone asks, "who uses Forth anymore?", a correct answer is - a large percentage of everyday people.

    0 - https://www.openbios.org/Open_Firmware.html

    1 - https://en.wikipedia.org/wiki/Apple_M1

    2 - https://www.openbios.org/Forth_FCode.html

  • s-macke 11 hours ago
    Nice to see another developer who tried to implement Forth in Go (here’s mine: [0]). It’s not easy and usually takes several iterations.

    His implementation, while working for his use cases, is actually quite far from the originals. He uses separate Go structures for words, memory, for-loops, etc. Most of the default Forth implementations of words will fail, since they expect the heap and stack to behave in a certain way. Also, every major word is implemented directly in Go, with no bootstrapping.

    > However, it's insufficient for the hacker level, because the host language interpreter (the one in Go) has all the control, so it's impossible to implement IF...THEN in Forth

    Well, I did it with a ... heap []any ... data structure. Works well enough.

    [0] https://github.com/s-macke/Forthly

    • eliben 11 hours ago
      Thanks for the note. Much of the blog post is dealing with this - how the approach I took for the Go implementation is not sufficient for the hacker level - because the code isn't stored anywhere but remains as text. This isn't a Go issue, it's a result of the specific design decision of how to implement it.

      The second implementation - in C - does it the "right way", with both code and data living in the same memory, and thus allows all the usual Forth shenanigans. Again, this has all to do with the design approach and not the implementation language. Nothing precludes one from writing such an implementation in Go :-)

    • n4r9 5 hours ago
      One of your "Click to run" examples is called "faculty calculation". I'm not a forth expert but it looks like a factorial function. Is "faculty" a replacement for "factorial" in some contexts?
      • chielk 4 hours ago
        German, Dutch, Danish, Norwegian, and Swedish use the same word for faculty (university) and factorial (mathematics), so I'm guessing it's a mistranslation from Fakultät/faculteit/fakultet.
  • bertili 11 hours ago
    Forth can be beautifully and efficiently implemented in portable c++ using the using continuation passing style via the clang musttail attribute. See Tails [1].

    [1] https://github.com/snej/tails

  • anyfoo 1 hour ago
    I must say, implementing Forth in almost any high-level language, even C, can be awkward and almost antithetical. A "high-level" Forth implementation often merely presents a Forth-compatible runtime, which runs the risk of missing what makes Forth so unique and useful. (The author actually seems to understand this: "The first implementation I tried is stubbornly different. Can we just make a pure interpreter?")

    That's because of two main reasons. First, Forth relies heavily on passing control to the next word, instead of calling word implementations as subroutines. This is difficult to do correctly, and dynamically, in C. I suppose you could get by with computed gotos, but...

    Secondly, Forth and assembly (or more generally any lower-level instruction/register view of what it's running on) go hand in hand. Forth words may carry their own assembly implementation within them, or they may be partially assembly.

    Without those two things, you don't really have a Forth system, just a Forth interpreter. And you cannot define new primitive words, or words with very custom compilation semantics, from within Forth itself.

    And therein also lies Forth's strength. It is very easily on-the-fly expandable, and it can even bootstrap itself. It is a great tool for machine-level exploration: Do complex low-level things with a lot less boilerplate.

    The author mentions that this second attempt is a "hacker level" implementation, but short of reading the source code does not seem to go into any detail what that means exactly, and how it's implemented?

  • johnisgood 9 hours ago
    I feel like as if the article itself is not finished at the end:

    > How can you know the arity of the functions without adding explicit comments? Sure, if you have a handful of words like bar and foo you know like the back of your hand, this is easy. But imagine reading a large, unfamiliar code base full of code like this and trying to comprehend it.

    Cool project by the way! Well done.

    • eliben 6 hours ago
      Thanks for the feedback! What do you feel is not finished?
      • johnisgood 5 hours ago
        I think (unless I missed it) it could be mentioned that typically you are supposed to have as small words as possible, with least amount of stack shuffling, and comment code as much as possible (stack effects). That said, I have not had to maintain large Forth projects so I have no clue how well I would fare ^^.

        BTW, have you heard of https://factorcode.org? You might like it!

  • NoToP 10 hours ago
    Go forth and C
  • astrobe_ 7 hours ago
    > Look at that stack wrangling! It's really hard to follow what goes where without the detailed comments showing the stack layout on the right of each instruction (a common practice for Forth programs).

    The proposed definition is not particularly difficult to follow. This a readability argument, and we know it is 90% subjective. OP could be a rookie Lisper and I could be a grey beard Lisper: I don't see a problem here, cause I don't see the parens any more.

    Second, Forth as 2 stacks, and it shouldn't be overlooked:

        : +! dup >R @ + R> ! ;
    
    Of course, this will still be opaque to those here not familiar with Forth, but basically this definition copies the address you need twice to the "other" stack, fetches the memory value, adds, then retrieves the memory address to store the result.

    Third, stack comments are as common as training wheels on bicycles. Ok, a bit more common actually. But you get the idea: you need them when you have not yet learned to feel forces and balance your weight accordingly. This is also a key point which is not visible because it is not syntax nor semantics, but design. Forth programmers are very insistent on simplification and factoring for this reason. Take them seriously. If you see 2+ instances of the same sequence of words, don't worry about execution speed, factor it.

    Moore explained that definitions are like abbreviations except they can take a parameter or two. Follow that. He also isn't joking when he says than messing with more than two parameters on the stack (like the proposed +!) is already too much. The author even agrees, in a way.

    When you do all that, this is when you can make cherries rain on your cake: you can rewrite in machine code the critical definitions, and it is easy to do because they are so short and simple. You can make up for the cost of interpretation and the cost of convenience factoring, if needed.

    > How can you know the arity of the functions without adding explicit comments?

    Oh, the classic example with foo and bar that give zero clue about what they do. Compare with the "+!" proposed by the author himself - any Forth programmer instantly knows it most likely takes 2 arguments and returns nothing. If it doesn't they will argue that it is as terrible name and you should know better.

    This is one of the biggest challenges of Forth: stop FUDing yourself with "what ifs". They won't happen. Ok, sometimes they do happen. But by this time you will have built some confidence, stopped guarding against alien invasions from the next multiverse (which simplifies coding a lot), and you will be able to deal with those unexpected turn of events.

    I won't pretend the problem doesn't exist at all, though. But it is far smaller than this type of argument makes it seem to be. It can be a problem when reading someone else's code who has an unfamiliar coding style, granted. But I have written thousands of lines of Forth and I rarely have this problem.

    Also, a note on the C equivalent: again, it is focused on the syntax, but we know that semantics are where the fun is: do foo or bar have nasty side effects? Doesn't bar return a signed integer when foo expects an unsigned? I would argue that C's more "natural" syntax creates an illusion of understanding which makes it more friendly than it really is. Maybe you've felt it if you reviewed some code once, and then later had to debug or adapt it.