Took Dr. Thain's compilers class in college! It was the best. He's an excellent instructor, and the course project made me build a working C-style compiler step by step. I think the sample project here is pretty much the project we did; highly recommend following through the entire thing!
The Dragon book is a weird one. While a classic, it only covers the basics of the theory of computation decently, and does just a brief run through the rest of compilers.
None of the modern grad-level compiler topics are decently covered. The Dragon book is also not a very practical book.
The Tiger book is a much better introductory read, balancing practical and theoretical aspects.
Most truly advanced compiler topics are not really covered by any single book.
Even the preface to the 2006 edition suggests that the authors think of it as, largely, a graduate-level text:
> It takes at least two quarters or even two semesters to cover all or most of the material in this book. It is common to cover the first half in an undergraduate course and the second half of the book -- stressing code optimization -- in a second course at the graduate or mezzanine level.
Sometimes I see people who design languages and build compilers, and I find them truly amazing. I once tried making a language myself because I was curious, but it was so difficult that I just settled for a simple C backend. The people contributing to LLVM probably know everything down to assembly generation. They're truly incredible.
>The people contributing to LLVM probably know everything down to assembly generation. they're truly incredible.
Not really. I was a web dev who then switched to a compiler job with LLVM as its foundation.
LLVM itself is huge, and it is not trivial to be familiar with all of its areas and mechanisms, but writing simple passes, fixing bugs, and fixing regressions does not require any fancy knowledge.
If you don’t mind me asking, why and how did you make the switch? Going from webdev to compilers seems like a strong U turn that’s not easy to pull off, especially because the resources on compilers out there are extremely scarce
I'd been working full time for about 3 years as a C# dev while doing higher ed on weekends, and I was about to pick a thesis topic.
I was searching for something challenging, found a random post on a programming forum about how compilers are hard, and decided to give it a try.
I had the advantage of having accidentally picked up a significant amount of experience with handwritten parsers (at my first job we built a renderer for a custom Markdown-like language as a PoC, and even during my high school apprenticeship I rolled my own CSV parser instead of using libs, because... I'm not sure why, I probably didn't know how to use a package manager or something).
I started reading about it a lot, e.g. the Dragon book (but it wasn't that useful tbh, too math heavy).
And after a year of jumping into it from time to time, I'd implemented a small compiler from a custom language to LLVM IR to WebAssembly (via LLVM).
Then I had to find a new job (we were very poorly paid), and I was interested in the semiconductor industry because it was gaining traction (e.g. the Chip War book) and it felt way more engineering oriented than web dev. Web dev tech decisions felt very religious to me, driven by fancy conferences and blog posts.
And since the semiconductor industry often touches compilers, that was an opportunity for a better salary, interesting projects, and a future transition to compilers.
I joined a semiconductor company as a C# dev, and then, due to project cancellations and layoffs, I managed to join the compiler team. I stressed hard during the first months since I had to learn a new language, a new ecosystem, tools, approaches, and techniques (e.g. debugging), and the only familiar thing was LLVM, which I was very much a beginner at.
After that initial shock things got better, but I feel like I still need to improve my knowledge of modern hardware, modern computer architecture, etc.
Debugging is a very, very useful cross-stack skill :)
Assembly generation is actually pretty simple; it's optimizing everything that's difficult. Writing an assembler is a great way to get acquainted with compiler construction, because you don't need to think about optimization, types, or the other features that make high-level languages complicated.
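To give a feel for how small an assembler can be, here's a minimal two-pass sketch. The instruction set, the opcode numbers, and the `assemble` function are all made up for illustration; they don't correspond to any real machine. Pass one records label addresses, pass two emits bytes.

```python
# Hypothetical one-byte opcodes for an imaginary machine (not a real ISA).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}


def assemble(source):
    """Assemble text into bytes: one byte per opcode, one byte per operand."""
    # Strip comments (everything after ';') and blank lines.
    lines = []
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()
        if line:
            lines.append(line)

    # Pass 1: walk the program once to learn the address of every label.
    labels = {}
    instructions = []
    address = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            parts = line.split()
            instructions.append(parts)
            address += len(parts)  # opcode byte plus one byte per operand

    # Pass 2: emit bytes, replacing label names with their addresses.
    output = bytearray()
    for mnemonic, *operands in instructions:
        output.append(OPCODES[mnemonic.upper()])
        for operand in operands:
            if operand in labels:
                value = labels[operand]
            else:
                value = int(operand, 0)  # accepts decimal and 0x-prefixed hex
            output.append(value & 0xFF)
    return bytes(output)


program = """
start:
    LOAD 5        ; put 5 in the accumulator
    ADD 0x10
loop:
    JMP loop      ; spin forever
    HALT
"""
machine_code = assemble(program)
```

The two passes are what let `JMP` refer to a label defined later in the file; everything else is table lookup.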
I actually started my first compiler by allowing (only) inline assembler first, and then starting to wrap higher-level constructs around it.
It adds a little bit of complexity (you need to be very clear on how you handle registers) but it worked surprisingly well, and it makes it easy to build up the complexity step by step.
It also meant I could bootstrap the compiler itself with just an assembler.
Sadly I lost the source decades ago.
(Making assembler an integral construct of a higher-level language is also not a unique approach - there's Randall Hyde's High-Level Assembly[1] and others.)
I feel you. My dream is to build my own compiler for WASM and use it later for my own pet projects. But after a 9-5 job I'm exhausted and can only work for 1 hour max.
But I still have hope that I will make it. Maybe a coding agent will help me, but I don't like the idea that I need help from AI to build this. I want to create compilers with my own hands.
Honestly, whipping up a lexer/parser and a REPL is one of my favorite ways to learn a new language. You can cover a lot of ground in a "real" language by just doing the frontend implementation of your own made-up language grammar and a little eval loop, and it's great for learning/teaching because you don't get bogged down in trying to solve some actual problem.
Which is to say: no shame in just settling for that simple C backend!
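For anyone who hasn't tried it, here's roughly how little is needed: a minimal sketch of a lexer, a recursive-descent parser that evaluates as it goes, and a REPL, for a made-up arithmetic grammar. The names (`tokenize`, `Parser`, `evaluate`, `repl`) and the grammar itself are just illustrative choices, not anything from the book.

```python
import re

# A token is either a run of digits or any single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Turn source text into a list of ('num', int) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif op.strip():  # skip stray whitespace matches
            tokens.append(("op", op))
    return tokens


class Parser:
    """Recursive descent over: expr -> term (('+'|'-') term)*, and so on."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, val = self.next()
        if kind == "num":
            return val
        if (kind, val) == ("op", "-"):  # unary minus
            return -self.factor()
        if (kind, val) == ("op", "("):
            value = self.expr()
            if self.next() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")


def evaluate(text):
    """Parse and evaluate one expression, rejecting leftover input."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return value


def repl():
    """Read a line, evaluate it, print the result, until end of input."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if not line.strip():
            continue
        try:
            print(evaluate(line))
        except (SyntaxError, ZeroDivisionError) as err:
            print("error:", err)
```

Call `repl()` to play with it. One function per precedence level is the whole trick; adding variables or function calls mostly means adding cases to `factor`.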
Along with TAPL, I like Essentials of Programming Languages. It's an introductory text that goes through a series of interpreters, each adding new language features and covering alternative ways to implement things or accomplish a goal (for example, it covers various methods of parameter passing, like pass by reference or pass by value).
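To illustrate the parameter-passing point, here's a tiny sketch of the difference as an interpreter might model it. This is my own simplification, not code from the book: variables are bound to mutable cells (the hypothetical `Ref` class), and the two calling conventions differ only in which cell the callee receives.

```python
class Ref:
    """A mutable cell holding one value; a variable is a name bound to a cell."""

    def __init__(self, value):
        self.value = value


def call_by_value(body, arg_cell):
    """The callee gets a fresh cell initialized with the argument's current value."""
    param = Ref(arg_cell.value)
    body(param)


def call_by_reference(body, arg_cell):
    """The callee's parameter is the caller's own cell."""
    body(arg_cell)


def increment(param):
    # Stands in for a procedure body that assigns to its parameter.
    param.value = param.value + 1


x = Ref(10)
call_by_value(increment, x)
after_value = x.value        # caller's variable is unchanged

call_by_reference(increment, x)
after_reference = x.value    # caller's variable sees the assignment
```

The procedure body is identical in both cases; only the binding step at the call site changes, which is why it's such a nice feature to vary from one interpreter to the next.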
Types and Programming Languages (TAPL) by Benjamin C. Pierce. Basically IMO the main differentiator between languages is their type system. It’s basically table stakes now for programming languages to provide generics so users can write their own type-safe containers, and one simply cannot implement this without some theoretical background like TAPL.
In contrast you can easily skim Chapter 7, Semantic Analysis and realize this book gives extremely rudimentary information on type systems. Even if you were to design a dynamically typed language, this book doesn’t cover user-defined structs let alone modern essentials like user-defined sum types or closures.
Programming language design is way weirder than any textbook would make you think.
For example, the first rule of language design is: "Sandwich Helix." https://xkcd.com/3003/
You'll also want to study how human language evolves, e.g. through "vernacularization," which is the laziness that drives people to create short words and phrases for ideas they need to communicate often.
You'll want to learn about the differing purposes of formal and informal constructions, and about why and how meaning itself drifts with time.
You'll want to learn why "DSL" is a nonsense word and what it means for a language to be embedded in more than one domain (e.g. English is both spoken and written).
Even a simpler discipline like API design will be incredibly enlightening. How do you get users to upgrade from an old version of your API to a new version? You need to learn how to guide people towards behaving how you want without any kind of coercion: the only power you have is the power to offer incentives. Can you figure out how to create a stable equilibrium of social behavior out of dreamed-up nonsense? If you remember nothing else remember this: every language design decision that now seems inevitable and set in stone, once seemed completely, pointlessly arbitrary. Survivorship bias is very tricky to reason about! Every attempt to create a language is "creating a 15th standard" (https://xkcd.com/927/), but you have to remember that that doesn't mean that it will fail. Every language that has ever succeeded has passed through this seemingly-impassable gate!
The last thing to know is why it is hard. It's like you're writing a dictionary for a made up language, Elvish, say. It would be a moderate size task to write a dictionary for Elvish with the definitions for your words all in English. What's hard would be writing an Elvish dictionary all in Elvish. If you change the definition of one word that's easy, but then you need to revisit the whole rest of the dictionary to use the new definition idiomatically.
The best part of the blind AI hatred is you can call literally anything AI slop without presenting evidence and the anti-slop loyalists will hate it without any evidence.
I do value both correct, high-quality AI usage and non-AI works. It would be nice if we could have a bar for the AI stuff that makes sense instead of dismissing people's work blindly.
I guarantee half the folk commenting “ai slop” on people’s projects are folk who never read people’s code even before AI. Now they get to dismiss it without providing any specifics and feel superior.
50% of folks seems a pretty strong signal, though. No?
While I agree that simply calling something "AI slop" is not constructive, it is not my job to voluntarily review LLM-extruded crap. In the past I would provide constructive criticism because there was an actual conversation taking place. The producer had put at least enough thought into it such that my engagement didn't feel like replying to a chatbot, but that's what it feels like now, so unless I see some considerable effort and original thought on the author's part, I am likely to drop a "slop" comment and move on with my day.
https://news.ycombinator.com/threads?id=swordlucky666 - Sort of embarrassing that anyone vouched for this comment. Reading their history, they copy/paste the same comment or use an autogenerator (probably not an LLM, that'd produce better results) to generate comments like this one.
> The discussion on Spanish traders set the standa raises interesting points. In practical applications, the key challenge is balancing performance with maintainability. Would be valuable to see more concrete examples of trade-offs. [emphasis added]
They glob out part of the submission title (and took too much and cut off in the middle of a word) generating a delightful nonsense descriptor (the italicized bit). The title being:
> Spanish traders set the standard for GnuCash database design
> Sort of embarrassing that anyone vouched for this comment
I'm the one who vouched it. I don't check commenters' history before doing it. Maybe I should. My LLM-detector is apparently broken (especially on short posts). At face value I saw nothing wrong with it, so I vouched it.
I often check the history for comments that start off [dead] because I like to see if:
1. There's a problem with the account (and then inform them and maybe notify the mods myself). Sometimes people get shadowbanned without good reason (usually the result of automoderation, not an explicit human moderator action) or for something that happened years ago, but their history is pretty clean since.
2. To see if the person is just a spammer (as is the case here).
Though I only do this if the comment seems like something that oughtn't be dead (as this one does on a first glance). The comment is shallow, but not necessarily wrong or bad. It just adds little to the discussion since it boils down to "The book is an introductory text." which we also get from reading the submission title.
Yeah, I take the same approach. I'm generally very pro AI use, to the point that I wouldn't really mind AI comments if they added value (though it's fine that they'll remain against the rules here). Sadly, it really has become necessary to read the comment history before vouching, given that they all seem to be low-quality noise.
>(probably not an LLM, that'd produce better results)
The last 3 years were a paradigm shift. How do you know if a comment was generated by AI? If it's written better than a human comment... if it's too good.
Introduction to Compilers and Language Design (2021) - https://news.ycombinator.com/item?id=31388741 - May 2022 (68 comments)
Introduction to Compilers and Language Design - https://news.ycombinator.com/item?id=19728749 - April 2019 (1 comment)
> This book offers a one semester introduction [...] enabling the reader to build a simple compiler that accepts a *C-like language*
But they also felt kind of detached from realities of industrial languages and compilers.
These days Crafting Interpreters is probably the best suggestion.
The Tiger book also comes in C and Java versions.
I started reading about it a lot, e.g. the Dragon book (but it wasn't that useful tbh, too math heavy)
or https://www.cs.cornell.edu/courses/cs6120/2020fa/self-guided...
or playlist like this: https://www.youtube.com/watch?v=wgHIkdUQbp0
[1] https://en.wikipedia.org/wiki/High_Level_Assembly
Anyway, I hope you will make it. :)
Both compilers and language design are as old as this industry, and there is too much knowledge in them for a single course.
This one is OK, better than most similar courses based on, say, the Dragon book.
How do you guarantee it, exactly? AI told you in an authoritative tone?
Here's a comical one:
https://news.ycombinator.com/item?id=48445529