Biome of Ideas

Understanding Pratt Parsing

Introduction

I've forgotten how I came across Pratt parsing specifically. I had been working on an interpreter for a programming language based on the one loosely described in Greg Michaelson's An Introduction to Functional Programming Through Lambda Calculus. I had managed to implement simple arithmetic, and even extended the basic lambda calculus spec with assignment expressions (a feat which I was very proud of.)

However, some parts of my implementation felt a bit hacky (for example, how I had implemented letrec), and my implementation of lazy evaluation, while mostly complete, ultimately turned out to be buggy.

At first, I decided to rewrite the project from scratch. One major guiding factor was to narrow the scope of the project, borrowing some advice from Zed Shaw. I got around to writing a new tokenizer, and then I was on to writing the parser. My initial parser used the recursive descent technique, taking advantage of the simplified lambda calculus grammar used in Michaelson's text (for example, parentheses are always used for application terms there.)

This time though, I wanted to try something different. And so, rummaging through the internets, I stumbled across Pratt parsing.

"It's like a burrito"

Understanding Pratt parsing ended up being much harder than I expected. I ended up searching through a bunch of examples online:

  1. Alex Kladov's Rust-based tutorial.
  2. Eli Bendersky's Python-based tutorial.
  3. Bob Nystrom's intro to the subject, as well as the relevant chapter in his Crafting Interpreters.
  4. Vaughan Pratt's original paper. Shout-out to the legend who put this up as a GitHub Pages site!
  5. Douglas Crockford's celebrated article on the subject deserves honorable mention, though my JavaScript is currently rusty and so I didn't look into it in any depth.

Alex Kladov in his post calls Pratt parsing the "monad tutorial of syntactic analysis". As I had recently become familiar with the concept of "monad" in a philosophical sense, I asked ChatGPT where the reference comes from: it turns out that it's an inside joke involving Haskell monads. I'm not an expert, but my readings have given me enough of an inkling to see the connection: Pratt parsing isn't a discrete "thing" with exactly one shape: it's more of a technique, if you will—a design pattern—which can assume various manifestations.

A good example of this is iteration, because we all recognize it when we see it, even when we don't know the language—but no one syntactic construction sufficiently defines it. For-loops, while-loops, Python generator functions, and optimized tail-recursive functions all count as iteration.

After struggling for some time, I finally managed to distill the essence of the algorithm, which I present here as pseudocode:

parse(level):
    t ← next(stream)
    acc ← nud_dispatch(t)

    while level < precedence(peek(stream)):
        t ← next(stream)
        acc ← led_dispatch(t, acc)

    return acc

Unwrapping what this does exactly is a surprisingly nuanced task, precisely because the algorithm is more about technique than structure. Because of this, I won't pretend to be up to the task here. Nevertheless, a brief synopsis is warranted.

There is a global stream of tokens, such that next(stream) consumes and returns the next token, and peek(stream) returns the next token without consuming it. The function nud_dispatch interprets t as a null denotation, or "nud" for short, and initializes acc. The function led_dispatch interprets t as a left denotation, or "led" for short. It accumulates a value into acc using the existing value of acc (hence the choice of name.) Both dispatch functions call parse recursively in all but the most trivial cases.

When peeking the stream reveals a token whose precedence is not higher than the current level, the while loop exits and acc is returned.

The algorithm is initialized by calling parse(0).
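
To make the synopsis concrete, here is one way the pseudocode could be rendered as runnable Python. The tokenization, precedence table, and operator set are illustrative choices of mine, and I pass the stream around explicitly rather than keeping it global; think of it as a sketch of the technique, not the code from any of the tutorials above (or from my own calculator).

import re

class Stream:
    """A token stream exposing next() and peek(), as in the pseudocode."""
    def __init__(self, text):
        # Numbers and single-character operators/parentheses only.
        self.tokens = re.findall(r"\d+|[-+*/^()]", text) + ["<end>"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def next(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}

def precedence(token):
    # Anything that can't continue an expression ("<end>", ")") gets 0,
    # so the while loop in parse() always stops there.
    return PRECEDENCE.get(token, 0)

def nud_dispatch(token, stream):
    # Null denotation: the token begins a (sub)expression.
    if token == "(":
        value = parse(stream, 0)
        stream.next()          # consume the matching ")"
        return value
    return int(token)          # a number literal

def led_dispatch(token, acc, stream):
    # Left denotation: the token continues an expression whose left part is acc.
    if token == "^":
        # Right-associative: parse the right-hand side one level lower.
        return acc ** parse(stream, precedence(token) - 1)
    rhs = parse(stream, precedence(token))
    if token == "+":
        return acc + rhs
    if token == "-":
        return acc - rhs
    if token == "*":
        return acc * rhs
    if token == "/":
        return acc / rhs

def parse(stream, level):
    token = stream.next()
    acc = nud_dispatch(token, stream)
    while level < precedence(stream.peek()):
        token = stream.next()
        acc = led_dispatch(token, acc, stream)
    return acc

print(parse(Stream("2 + 3 * 4 ^ 2"), 0))   # prints 50
print(parse(Stream("2 ^ 3 ^ 2"), 0))       # prints 512 (right-associative)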

Down To Brass Tacks

My approach was to take Eli Bendersky's full source code at the bottom of his post, and start chiseling away at it. What I ended up with was the same simple arithmetic calculator, only with a different architecture: I moved away from Eli's object-oriented approach towards something closer to the formulation given in the previous section.

In the end, I was amazed at how simple and robust the actual implementation turned out to be! I feel that what I came up with (at this stage, anyway) is arguably simpler than even many of the examples I initially came across. For example, it isn't necessary to leave space between precedence levels (10, 20, etc.), since an enum takes care of any ordering needed. Nor is it necessary to reserve separate even and odd precedence levels for handling right associativity. Say you have precedence levels MULTIPLICATION = 2 and EXPONENTIATION = 3. To make exponentiation right-associative, you parse its right-hand operand at level EXPONENTIATION - 1, and even though that value equals MULTIPLICATION, there is no clash: the loop's strict inequality means that a peeked token of equal precedence simply ends the loop, which is exactly the behavior you want. I found this to be one of the more remarkable aspects of the algorithm.
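
To make the enum point concrete, here is how the sketch above could spell its precedence levels; the Level name and its members are just my own illustrative choices.

from enum import IntEnum

class Level(IntEnum):
    # Consecutive values are fine: the enum's ordering is all the loop needs.
    NONE = 0
    ADDITION = 1
    MULTIPLICATION = 2
    EXPONENTIATION = 3

PRECEDENCE = {"+": Level.ADDITION, "-": Level.ADDITION,
              "*": Level.MULTIPLICATION, "/": Level.MULTIPLICATION,
              "^": Level.EXPONENTIATION}

# In led_dispatch, the right-associative case becomes:
#     return acc ** parse(stream, Level.EXPONENTIATION - 1)
# Level.EXPONENTIATION - 1 equals Level.MULTIPLICATION, and that's fine.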

Wanting More

To be fair, my calculator app technically doesn't parse arithmetic expressions: it evaluates them wholesale. This is OK: instead of accumulating an AST, I'm accumulating an arithmetic result.
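
To illustrate the difference, the led_dispatch from the earlier sketch could just as easily accumulate a tree instead of a number; the nested-tuple representation here is an arbitrary choice of mine, not what my calculator actually does.

def led_dispatch(token, acc, stream):
    # Build an AST node (operator, left, right) instead of evaluating.
    if token == "^":
        return ("^", acc, parse(stream, precedence(token) - 1))
    return (token, acc, parse(stream, precedence(token)))

# With nud_dispatch still returning int(token) for literals,
# parse(Stream("2 + 3 * 4"), 0) now yields ("+", 2, ("*", 3, 4)).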

Because of how compelling the calculator app turned out to be, I decided to stop work on the lambda calculus project, and instead work on expanding the calculator into a full-blown programming language, albeit a simple one. I've already made progress in this direction: in addition to arithmetic (including trig functions!), the application currently supports variable assignment.

Ideally, I'd like something with

I'll see how many of these I manage.