The Long Road

Life is short, and the world is wide

When I started work on rewriting the term rewriter, I thought, “Here’s a fun project I can work on occasionally.” Occasionally turned into almost never, thanks to a number of factors.

  • My kids became fascinated with Minecraft and I had to build complex Redstone contraptions for them, which has been fun!
  • My kids became fascinated with Roblox and I had to build complex Lua scripts for them, which has been fun!
  • I was running 10 (ten!) different projects at work. This was less fun, but actually, I have an awesome job and really, really love being a researcher. ORNL is a wonderful place to work.
  • I decided to start teaching graduate classes for Tennessee Technological University on automated reverse engineering of compiled software because I love teaching.

So the rewriter took a back seat to everything else, and time passed.

In a surprising turn of events, I actually lost data! This will be shocking to folks who know me, because I am obsessive about backups, and backups of backups. I didn’t lose much data, but my hosting provider turned out to be… less than good, and I lost a lot of the design work done in the Wiki on the old site. I have since switched to a new hosting provider. Such is life. Anyway, that further discouraged me, and I’ve been slow (over a year, man!) to pick up the thread.

I built a pathological backup system (partially detailed in articles I wrote for Medium) that sits in the corner, quietly backs up all my sites (and the kiddos’ Minecraft servers!), and then replicates the data off-site. Everything is replicated directly, then checked into a git repo for tracking, and then the repo itself is replicated. Spinning disks are cheap. Data isn’t.

What to do with all those spinning drives? Build a NAS, of course.

Oh. We also bought a new house and moved. Yeah, that too. The kiddos all need their own bedrooms.

So basically there was no real excuse, until our rescue dog got cancer. A round of chemo left him in complete remission for six months. Now he’s starting another round of chemo because the cancer is back.

Little S’more, on our couch

So I am slowly starting back on rewriting the term rewriter. We’ll see how it goes.

Picking Apart the Legion PowerShell

Kindred Security does a great job of pulling apart the Legion PowerShell credential stealer on YouTube, but I thought I would do a little more work to break down the PowerShell commands used in all their gory detail.

If you haven’t watched Kindred Security’s video, go do it now. It should be linked above. I’ll wait.

Continue reading “Picking Apart the Legion PowerShell”

Documentation

Documentation is a story about the past. Usually fictional.

A job is not done until you have bragged about it. Let’s call that “documentation.”

I’ve created a wiki to host this documentation, and you can find it here.

One of the magical things about creating user documentation is that you quickly realize what a horrible user experience you have created, and the code starts changing (and improving, one hopes). “Just create a FooFactory instance from the AbstractFooFactoryGenerator, after initializing the configuration system and creating a Context object with your custom Frabulator.” Madness!

In any case, user documentation is now materializing on the wiki, and the code is changing in response. Let’s hope this is a positive development. Feel free to constructively criticize.

Trivial Recursive-Descent Parsing

Adding a tiny parsing module to Relision

Having gotten the basic REPL working, I needed to begin building the parsing stuff. For that, I decided on porting my idiotically simple recursive-descent parser library to Rust because that seemed like a thing I might do, and I’m nothing if not me.

The original Elision parser used ANTLR. (I would link to this, but I think it predates the history of Elision on GitHub, which starts with the handover to ORNL.) The ANTLR-based parser worked well for a while, but the files soon became numerous and very large, and bootstrap parsing was taking too long.

Logan Lamb and I each built a parser (a friendly competition fueled by lack of communication) to replace the ANTLR parser. Logan used a recursive descent parser library (I think it was the Parboiled parser, seen here), and I… wrote a tiny class in Scala that used a two-buffer approach (it would fill one buffer while the other was parsed) and allowed for rapid parsing of left-linear grammars (as seen here).

I then had a different, embedded project that required faster parsing of files in C, so I ported the basics to C99. You can still find this as SPSPS, along with a JSON parser. Again, it turned out to be faster than the alternatives. Good design? Bad alternatives? You decide.

The C99 version handles ASCII-encoded files. I had started to rewrite it to handle UTF-8 encoded files… but got busy with other things. Now that I’m starting on a parser for Relision, I’ve decided to begin with the little recursive-descent structures that just seem to keep working for me.

I’ve rewritten the parser primitives in Rust, and we will see how it goes. The library consists of three structs: Parser, Loc, and ParserError. The Parser provides a set of simple methods to “peek” at the character stream and “consume” characters from it. It’s a bit complicated by the error handling, but not that much! The Scala code used exceptions. The C99 code used an error field. Rust uses a custom Result.

use std::io;

use relision::parser;

// Parse a run of decimal digits from the stream and convert it to a u64,
// turning any conversion failure into a parser error at the current location.
fn parse_unsigned_integer<R: io::Read>(parser: &mut parser::Parser<R>) -> parser::Result<u64> {
    let result = parser.take_while(|ch| ch.is_digit(10))?;
    match result.parse::<u64>() {
        Ok(number) => Ok(number),
        Err(err) => Err(parser.error(err.to_string())),
    }
}
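
If you want a feel for the overall shape, here is a small, self-contained sketch of a Parser, Loc, and ParserError trio with peek and consume methods. The names and signatures are illustrative guesses only (and this version reads from a plain string rather than an io::Read source), so don’t mistake it for the actual Relision code.

// A self-contained sketch of the Parser / Loc / ParserError trio described
// above. Names and signatures are illustrative, not the actual Relision API,
// and the input is a plain string rather than an io::Read source.

/// A position in the input, used when reporting errors.
#[derive(Debug, Clone, Copy)]
pub struct Loc {
    pub line: usize,
    pub column: usize,
}

/// A parse error carrying a message and the location where it occurred.
#[derive(Debug)]
pub struct ParserError {
    pub loc: Loc,
    pub message: String,
}

/// Result alias so parsing functions can use the `?` operator.
pub type Result<T> = std::result::Result<T, ParserError>;

/// A character-stream parser offering peek / consume primitives.
pub struct Parser {
    chars: Vec<char>,
    index: usize,
    loc: Loc,
}

impl Parser {
    pub fn new(input: &str) -> Self {
        Parser { chars: input.chars().collect(), index: 0, loc: Loc { line: 1, column: 1 } }
    }

    /// Look at the next character without consuming it.
    pub fn peek(&self) -> Option<char> {
        self.chars.get(self.index).copied()
    }

    /// Consume the next character, tracking line and column for error reports.
    pub fn consume(&mut self) -> Option<char> {
        let ch = self.peek()?;
        self.index += 1;
        if ch == '\n' {
            self.loc.line += 1;
            self.loc.column = 1;
        } else {
            self.loc.column += 1;
        }
        Some(ch)
    }

    /// Consume characters while the predicate holds. Returns a Result so
    /// callers can chain it with `?`, even though this version cannot fail.
    pub fn take_while(&mut self, pred: impl Fn(char) -> bool) -> Result<String> {
        let mut text = String::new();
        while let Some(ch) = self.peek() {
            if !pred(ch) {
                break;
            }
            text.push(ch);
            self.consume();
        }
        Ok(text)
    }

    /// Build an error tagged with the current location.
    pub fn error(&self, message: impl Into<String>) -> ParserError {
        ParserError { loc: self.loc, message: message.into() }
    }
}

fn main() -> Result<()> {
    let mut parser = Parser::new("42 apples");
    let digits = parser.take_while(|ch| ch.is_ascii_digit())?;
    let number: u64 = digits.parse().map_err(|e| parser.error(format!("{}", e)))?;
    println!("parsed {} at line {}, column {}", number, parser.loc.line, parser.loc.column);
    Ok(())
}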

At this point it seems I can start designing a parser, so that’s probably a good next step.

The Relision REPL

Putting the REPL before the horse

I’ve made a few starts at a rewrite of Elision focusing on implementing the terms, but I have been much too busy to make much progress. One of the issues is that as the terms are implemented I need to write a large number of tests. That’s okay – testing is good – but I end up with a lot of code that gets thrown away at some point.

Instead, I decided to write the read, evaluate, print loop (REPL) first this time, followed by iterative implementation of the parser and terms. This means that I can write the tests in the rewriter’s language and just modify the plumbing as I make changes. It also means I can “prime” the system by writing the bootstrapping library as I go.

So that’s the plan. The REPL is now done, complete with configuration, history, command-line processing, etc.
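
For flavor, here is a bare-bones sketch of the loop itself, with all of the configuration, history, and command-line handling stripped away (and a made-up :quit command). The real REPL layers those features on top of essentially this shape.

use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    loop {
        // Read: print a prompt and grab one line of input.
        write!(stdout, "relision> ")?;
        stdout.flush()?;
        let mut line = String::new();
        if stdin.lock().read_line(&mut line)? == 0 {
            break; // End of input (Ctrl-D).
        }
        let input = line.trim();
        if input == ":quit" {
            break; // This command name is invented for the sketch.
        }
        // Evaluate: the parser and terms do not exist yet, so simply echo the
        // input where evaluation will eventually go.
        let result = input;
        // Print: show the result, then loop.
        writeln!(stdout, "{}", result)?;
    }
    Ok(())
}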

So now it is on to implementing the terms.

Relision

Rewriting a term rewriter library… in Rust!

I’m the primary author of the Elision term rewriter library, which I donated to Oak Ridge National Laboratory a few years back.  Elision was a core part of the Hyperion static analysis tool, and that tool has subsequently been licensed to a private company, Lenvio (now being renamed as “Affirm Logic”), to grow and improve.  Elision is written in Scala and fits well with the Hyperion system, which is written in a mix of Java and Python.

There are many things I like about Elision, and many things I don’t.  In particular, there was a notion of “metavariables” that I really disliked, and some unusually cryptic notation that just needs to go.  Finally, the choice of Scala had some consequences (such as running on the JVM) that made some of the things we hoped to do (like running on Titan or Summit) hard.

Relision is not a rewrite of Elision, but a new term rewriting library, being written (this time) in Rust.  I considered writing it in C++, but decided that the guarantees that Rust provides, combined with the fact that Rust has become (reasonably) mature, make it the right language to use.

The emphasis in Relision is going to be on performance.  Elision had quite good performance, but I think with native code and concurrency we can do better.

Anyway, that’s my goal and we will see how far I manage to get.