Trivial Recursive-Descent Parsing

Adding a tiny parsing module to Relision

Having gotten the basic REPL working, I needed to begin building the parsing stuff. For that, I decided on porting my idiotically simple recursive-descent parser library to Rust because that seemed like a thing I might do, and I’m nothing if not me.

The original Elision parser used ANTLR. (I would link to this, but I think it predates the history of Elision in Github, which starts with the handover to ORNL.) The ANTLR-based parser worked well for a while but the files soon became numerous and very large and bootstrap parsing was taking too long.

Logan Lamb and I each built a parser (a friendly competition fueled by lack of communication) to replace the ANTLR parser. Logan used a recursive descent parser library (I think it was the Parboiled parser, seen here), and I… wrote a tiny class in Scala that used a two-buffer approach (it would fill one buffer while the other was parsed) and allowed for rapid parsing of left-linear grammars (as seen here).

I then had a different, embedded project that required faster parsing of files in C, so I ported the basics to C99. You can still find this as SPSPS, along with a JSON parser. Again, it turned out to be faster than the alternatives. Good design? Bad alternatives? You decide.

The C99 version handles ASCII-encoded files. I had started to rewrite it to handle UTF-8 encoded files… but got busy with other things. Now that I’m starting on a parser for Relision, I’ve decided to start with the little recursive descent structures that just seem to keep working for me. I’ve rewritten the parser primitives in Rust and we will see how it goes. It consists of three structs (Parser, Loc, and ParserError). The Parser provides a set of simple methods to “peek” at the character stream and “consume” characters from it. It’s a bit complicated by the error handling, but not that much! The Scala code used exceptions. The C99 code used an error field. Rust uses a custom Result.

use relision::parser;

fn parse_unsigned_integer<R: io::Read>(parser: &mut parser::Parser<R>) -> parser::Result<u64> {
    let result = parser.take_while(|ch| ch.is_digit(10))?;
    match result.parse::<u64>() {
        Ok(number) => Ok(number),
        Err(err) => Err(parser.error(err.to_string())),
    }
}

At this point it seems I can start designing a parser, so that’s probably a good next step.

Leave a Reply

Your email address will not be published. Required fields are marked *