Skip to content

Adam Chalmers

Tag: #nom

Parsing bitstreams with Nom

Programming languages generally only manipulate bytes (groups of 8 bits). It can be pretty tricky to manipulate single bits. But sometimes you need to -- for example, a DNS header has some 4-bit numbers, and encodes some boolean flags into single bits. So we really need a way to parse binary data without chunking it up into bytes of 8 bits.

Luckily, Nom can do this! In the last blog post, we learned how to parse text files with Nom. The trick is to start with simple parsers that parse a few characters at a time. Then, using combinators, combine those simple parsers into more complex parsers that can deserialize an entire structured file. We can reuse this approach for parsing binary data too. Let's see how!

Parsing Text with Nom

"Parsing" is turning a stream of raw text or binary into some structured data types, i.e. a Rust type that your code can understand and use. This isn't the textbook definition of parsing, but damnit, this is my blog and my opinion. This tutorial is about nom, my favourite Rust parsing library. It uses a parser combinator approach: you start writing tiny parsers that match, say, a single number or a character. These become building blocks for larger parsers, that match, say, a date or a phone number. By combining many small parsers together, you can build a big parser that decodes a file or stream into nice Rust structs and enums. In this tutorial we'll use Nom to parse the input file to an Advent of Code puzzle.