{"id":1109,"date":"2019-01-06T21:48:14","date_gmt":"2019-01-07T02:48:14","guid":{"rendered":"https:\/\/stacyprowell.com\/blog\/?p=1109"},"modified":"2020-09-02T17:23:30","modified_gmt":"2020-09-02T21:23:30","slug":"parsing","status":"publish","type":"post","link":"https:\/\/stacyprowell.com\/blog\/2019\/01\/06\/parsing\/","title":{"rendered":"Trivial Recursive-Descent Parsing"},"content":{"rendered":"\n<p>Having gotten the basic REPL working, I needed to begin building the parsing stuff.  For that, I decided on porting my idiotically simple recursive-descent parser library to Rust because that seemed like a thing I might do, and I&#8217;m nothing if not me.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>The original <em>Elision <\/em>parser used <a data-external=\"true\" href=\"https:\/\/www.antlr.org\/\">ANTLR<\/a>.  (I would link to this, but I think it predates the history of Elision in Github, which starts with the handover to ORNL.)  The ANTLR-based parser worked well for a while but the files soon became numerous and very large and bootstrap parsing was taking too long.<\/p>\n\n\n\n<p>Logan Lamb and I each built a parser (a friendly competition fueled by lack of communication) to replace the ANTLR parser.  Logan used a recursive descent parser library (I think it was the Parboiled parser, seen <a href=\"https:\/\/github.com\/elision\/elision\/blob\/cd359940dc8d32758958f62a3fd563a4f1a0215a\/Elision\/src\/ornl\/elision\/parse\/AtomParser.scala\">here<\/a>), and I&#8230; wrote a tiny class in <a data-external=\"true\" href=\"https:\/\/scala-lang.org\">Scala<\/a> that used a two-buffer approach (it would fill one buffer while the other was parsed) and allowed for rapid parsing of left-linear grammars (as seen <a href=\"https:\/\/github.com\/elision\/elision\/blob\/master\/Elision\/src\/ornl\/elision\/parse\/ParseWorker.scala\">here<\/a>).<\/p>\n\n\n\n<p>I then had a different, embedded project that required faster parsing of files in C, so I ported the basics to C99.  You can still find this as <a data-external=\"true\" href=\"https:\/\/github.com\/sprowell\/spsps\">SPSPS<\/a>, along with a JSON parser.  Again, it turned out to be faster than the alternatives.  Good design?  Bad alternatives?  You decide.<\/p>\n\n\n\n<p>The C99 version handles ASCII-encoded files.  I had started to rewrite it to handle UTF-8 encoded files&#8230; but got busy with other things.  Now that I&#8217;m starting on a parser for <g class=\"gr_ gr_155 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace\" id=\"155\" data-gr-id=\"155\"><strong>Relision<\/strong><\/g>, I&#8217;ve decided to start with the little recursive descent structures that just seem to keep working for me.  I&#8217;ve rewritten the parser primitives in Rust and we will see how it goes.  It consists of three structs (<code>Parser<\/code>, <code>Loc<\/code>, and <code>ParserError<\/code>).  The <code>Parser<\/code> provides a set of simple methods to &#8220;peek&#8221; at the character stream and &#8220;consume&#8221; characters from it.  It&#8217;s a bit complicated by the error handling, but not that much!  The Scala code used exceptions.  The C99 code used an error field.  Rust uses a custom <code>Result<\/code>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"rust\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">use relision::parser;\n\nfn parse_unsigned_integer&lt;R: io::Read>(parser: &amp;mut parser::Parser&lt;R>) -> parser::Result&lt;u64> {\n    let result = parser.take_while(|ch| ch.is_digit(10))?;\n    match result.parse::&lt;u64>() {\n        Ok(number) => Ok(number),\n        Err(err) => Err(parser.error(err.to_string())),\n    }\n}\n<\/pre>\n\n\n\n<p>At this point it seems I can start designing a parser, so that&#8217;s probably a good next step.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Adding a tiny parsing module to Relision<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/posts\/1109"}],"collection":[{"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/comments?post=1109"}],"version-history":[{"count":10,"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/posts\/1109\/revisions"}],"predecessor-version":[{"id":1164,"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/posts\/1109\/revisions\/1164"}],"wp:attachment":[{"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/media?parent=1109"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/categories?post=1109"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/stacyprowell.com\/blog\/wp-json\/wp\/v2\/tags?post=1109"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}