# A Tour of Safe Tracing GC Designs in Rust

Posted by Manish Goregaokar on April 05, 2021 in programming, rust

I’ve been thinking about garbage collection in Rust for a long time, ever since I started working on Servo’s JS layer. I’ve designed a GC library, worked on GC integration ideas for Rust itself, worked on Servo’s JS GC integration, and helped out with a couple other GC projects in Rust.

As a result, I tend to get pulled into GC discussions fairly often. I enjoy talking about GCs – don’t get me wrong – but I often end up going over the same stuff. Being lazy I’d much prefer to be able to refer people to a single place where they can get up to speed on the general space of GC design, after which it’s possible to have more in depth discussions about the specific tradeoffs necessary.

I’ll note that some of the GCs in this post are experiments or unmaintained. The goal of this post is to showcase these as examples of design, not necessarily general-purpose crates you may wish to use, though some of them are usable crates as well.

### A note on terminology

A thing that often muddles discussions about GCs is that according to some definition of “GC”, simple reference counting is a GC. Typically the definition of GC used in academia broadly refers to any kind of automatic memory management. However, most programmers familiar with the term “GC” will usually liken it to “what Java, Go, Haskell, and C# do”, which can be unambiguously referred to as tracing garbage collection.

Tracing garbage collection is the kind which keeps track of which heap objects are directly reachable (“roots”), figures out the whole set of reachable heap objects (“tracing”, also, “marking”), and then cleans them up (“sweeping”).

Throughout this blog post I will use the term “GC” to refer to tracing garbage collection/collectors unless otherwise stated1.

## Why write GCs for Rust?

(If you already want to write a GC in Rust and are reading this post to get ideas for how, you can skip this section. You already know why someone would want to write a GC for Rust)

Every time this topic is brought up someone will inevitably go “I thought the point of Rust was to avoid GCs” or “GCs will ruin Rust” or something. As a general rule it’s good to not give too much weight to the comments section, but I think it’s useful to explain why someone may wish for GC-like semantics in Rust.

There are really two distinct kinds of use cases. Firstly, sometimes you need to manage memory with cycles and Rc<T> is inadequate for the job since Rc-cycles get leaked. petgraph or an arena are often acceptable solutions for this kind of pattern, but not always, especially if your data is super heterogeneous. This kind of thing crops up often when dealing with concurrent datastructures; for example crossbeam has an epoch-based memory management system which, while not a full tracing GC, has a lot of characteristics in common with GCs.

For this use case it’s rarely necessary to design a custom GC, you can look for a reusable crate like gc 2.

The second case is far more interesting in my experience, and since it cannot be solved by off-the-shelf solutions tends to crop up more often: integration with (or implementation of) programming languages that do use a garbage collector. Servo needs to do this for integrating with the Spidermonkey JS engine and luster needed to do this for implementing the GC of its Lua VM. boa, a pure Rust JS runtime, uses the gc crate to back its garbage collector.

Sometimes when integrating with a GCd language you can get away with not needing to implement a full garbage collector: JNI does this; while C++ does not have native garbage collection, JNI gets around this by simply “rooting” (we’ll cover what that means in a bit) anything that crosses over to the C++ side3. This is often fine!

The downside of this is that every interaction with objects managed by the GC has to go through an API call; you can’t “embed” efficient Rust/C++ objects in the GC with ease. For example, in browsers most DOM types (e.g. Element) are implemented in native code; and need to be able to contain references to other native GC’d types (it should be possible to inspect the children of a Node without needing to call back into the JavaScript engine).

So sometimes you need to be able to integrate with a GC from a runtime; or even implement your own GC if you are writing a runtime that needs one. In both of these cases you typically want to be able to safely manipulate GC’d objects from Rust code, and even directly put Rust types on the GC heap.

## Why are GCs in Rust hard?

In one word: Rooting. In a garbage collector, the objects “directly” in use on the stack are the “roots”, and you need to be able to identify them. Here, when I say “directly”, I mean “accessible without having to go through other GC’d objects”, so putting an object inside a Vec<T> does not make it stop being a root, but putting it inside some other GC’d object does.

Unfortunately, Rust doesn’t really have a concept of “directly on the stack”:

struct Foo {
bar: Option<Gc<Bar>>
}
// this is a root
let bar = Gc::new(Bar::new());
// this is also a root
let foo = Gc::new(Foo::new());
// bar should no longer be a root (but we can't detect that!)
foo.bar = Some(bar);
// but foo should still be a root here since it's not inside
// another GC'd object
let v = vec![foo];


Rust’s ownership system actually makes it easier to have fewer roots since it’s relatively easy to state that taking &T of a GC’d object doesn’t need to create a new root, and let Rust’s ownership system sort it out, but being able to distinguish between “directly owned” and “indirectly owned” is super tricky.

Another aspect of this is that garbage collection is really a moment of global mutation – the garbage collector reads through the heap and then deletes some of the objects there. This is a moment of the rug being pulled out under your feet. Rust’s entire design is predicated on such rug-pulling being very very bad and not to be allowed, so this can be a bit problematic. This isn’t as bad as it may initially sound because after all the rug-pulling is mostly just cleaning up unreachable objects, but it does crop up a couple times when fitting things together, especially around destructors and finalizers4. Rooting would be far easier if, for example, you were able to declare areas of code where “no GC can happen”5 so you can tightly scope the rug-pulling and have to worry less about roots.

### Destructors and finalizers

It’s worth calling out destructors in particular. A huge problem with custom destructors on GCd types is that the custom destructor totally can stash itself away into a long-lived reference during garbage collection, leading to a dangling reference:

struct LongLived {
dangle: RefCell<Option<Gc<CantKillMe>>>
}

struct CantKillMe {
// set up to point to itself during construction
self_ref: RefCell<Option<Gc<CantKillMe>>>
long_lived: Gc<LongLived>
}

impl Drop for CantKillMe {
fn drop(&mut self) {
// attach self to long_lived
*self.long_lived.dangle.borrow_mut() = Some(self.self_ref.borrow().clone().unwrap());
}
}

let long = Gc::new(LongLived::new());
{
let cant = Gc::new(CantKillMe::new());
*cant.self_ref.borrow_mut() = Some(cant.clone());
// cant goes out of scope, CantKillMe::drop is run
// cant is attached to long_lived.dangle but still cleaned up
}

// Dangling reference!
let dangling = long.dangle.borrow().unwrap();


The most common solution here is to disallow destructors on types that use #[derive(Trace)], which can be done by having the custom derive generate a Drop implementation, or have it generate something which causes a conflicting type error.

You can additionally provide a Finalize trait that has different semantics: the GC calls it while cleaning up GC objects, but it may be called multiple times or not at all. This kind of thing is typical in GCs outside of Rust as well.

## How would you even garbage collect without a runtime?

In most garbage collected languages, there’s a runtime that controls all execution, knows about every variable in the program, and is able to pause execution to run the GC whenever it likes.

Rust has a minimal runtime and can’t do anything like this, especially not in a pluggable way your library can hook in to. For thread local GCs you basically have to write it such that GC operations (things like mutating a GC field; basically some subset of the APIs exposed by your GC library) are the only things that may trigger the garbage collector.

Concurrent GCs can trigger the GC on a separate thread but will typically need to pause other threads whenever these threads attempt to perform a GC operation that could potentially be invalidated by the running garbage collector.

While this may restrict the flexibility of the garbage collector itself, this is actually pretty good for us from the side of API design: the garbage collection phase can only happen in certain well-known moments of the code, which means we only need to make things safe across those boundaries. Many of the designs we shall look at build off of this observation.

## Commonalities

Before getting into the actual examples of GC design, I want to point out some commonalities of design between all of them, especially around how they do tracing:

### Tracing

“Tracing” is the operation of traversing the graph of GC objects, starting from your roots and perusing their children, and their children’s children, and so on.

In Rust, the easiest way to implement this is via a custom derive:

// unsafe to implement by hand since you can get it wrong
unsafe trait Trace {
fn trace(&mut self, gc_context: &mut GcContext);
}

#[derive(Trace)]
struct Foo {
vec: Vec<Gc<Bar>>,
extra_thing: Gc<Baz>,
just_a_string: String
}


The custom derive of Trace basically just calls trace() on all the fields. Vec’s Trace implementation will be written to call trace() on all of its fields, and String’s Trace implementation will do nothing. Gc<T> will likely have a trace() that marks its reachability in the GcContext, or something similar.

This is a pretty standard pattern, and while the specifics of the Trace trait will typically vary, the general idea is roughly the same.

I’m not going to get into the actual details of how mark-and-sweep algorithms work in this post; there are a lot of potential designs for them and they’re not that interesting from the point of view of designing a safe GC API in Rust. However, the general idea is to keep a queue of found objects initially populated by the root, trace them to find new objects and queue them up if they’ve not already been traced. Clean up any objects that were not found.

### Immutable-by-default

Another commonality between these designs is that a Gc<T> is always potentially shared, and thus will need tight control over mutability to satisfy Rust’s ownership invariants. This is typically achieved by using interior mutability, much like how Rc<T> is almost always paired with RefCell<T> for mutation, however some approaches (like that in josephine) do allow for mutability without runtime checking.

Some GCs are single-threaded, and some are multi-threaded. The single threaded ones typically have a Gc<T> type that is not Send, so while you can set up multiple graphs of GC types on different threads, they’re essentially independent. Garbage collection only affects the thread it is being performed for, all other threads can continue unhindered.

Multithreaded GCs will have a Send Gc<T> type. Garbage collection will typically, but not always, block any thread which attempts to access data managed by the GC during that time. In some languages there are “stop the world” garbage collectors which block all threads at “safepoints” inserted by the compiler; Rust does not have the capability to insert such safepoints and blocking threads on GCs is done at the library level.

Most of the examples below are single-threaded, but their API design is not hard to extend towards a hypothetical multithreaded GC.

## rust-gc

The gc crate is one I wrote with Nika Layzell mostly as a fun exercise, to figure out if a safe GC API is possible. I’ve written about the design in depth before, but the essence of the design is that it does something similar to reference counting to keep track of roots, and forces all GC mutations go through special GcCell types so that they can update the root count. Basically, a “root count” is updated whenever something becomes a root or stops being a root:

struct Foo {
bar: GcCell<Option<Gc<Bar>>>
}
// this is a root (root count = 1)
let bar = Gc::new(Bar::new());
// this is also a root (root count = 1)
let foo = Gc::new(Foo::new());
// .borrow_mut()'s RAII guard unroots bar (sets its root count to 0)
*foo.bar.borrow_mut() = Some(bar);
// foo is still a root here, no call to .set()
let v = vec![foo];

// at destrucion time, foo's root count is set to 0


The actual garbage collection phase will occur when certain GC operations are performed at a time when the heap is considered to have gotten reasonably large according to some heuristics.

While this is essentially “free” on reads, this is a fair amount of reference count traffic on any kind of write, which might not be desired; often the goal of using GCs is to avoid the performance characteristics of reference-counting-like patterns. Ultimately this is a hybrid approach that’s a mix of tracing and reference counting6.

gc is useful as a general-purpose GC if you just want a couple of things to participate in cycles without having to think about it too much. The general design can apply to a specialized GC integrating with another language runtime since it provides a clear way to keep track of roots; but it may not necessarily have the desired performance characteristics.

## Servo’s DOM integration

Servo is a browser engine in Rust that I used to work on full time. As mentioned earlier, browser engines typically implement a lot of their DOM types in native (i.e. Rust or C++, not JS) code, so for example Node is a pure Rust object, and it contains direct references to its children so Rust code can do things like traverse the tree without having to go back and forth between JS and Rust.

Servo’s model is a little weird: roots are a different type, and lints enforce that unrooted heap references are never placed on the stack:

#[dom_struct] // this is #[derive(JSTraceable)] plus some markers for lints
pub struct Node {
// the parent type, for inheritance
eventtarget: EventTarget,
// in the actual code this is a different helper type that combines
// the RefCell, Option, and Dom, but i've simplified it to use
// stdlib types for this example
prev_sibling: RefCell<Option<Dom<Node>>>,
next_sibling: RefCell<Option<Dom<Node>>>,
// ...
}

impl Node {
fn frob_next_sibling(&self) {
// fields can be accessed as borrows without any rooting
if let Some(next) = self.next_sibling.borrow().as_ref() {
next.frob();
}
}

fn get_next_sibling(&self) -> Option<DomRoot<Node>> {
// but you need to root things for them to escape the borrow
// .root() turns Dom<T> into DomRoot<T>
self.next_sibling.borrow().as_ref().map(|x| x.root())
}

fn illegal(&self) {
// this line of code would get linted by a custom lint called unrooted_must_root
// (which works somewhat similarly to the must_use stuff that Rust does)
let ohno: Dom<Node> = self.next_sibling.borrow_mut().take();
}
}


Dom<T> is basically a smart pointer that behaves like &T but without a lifetime, whereas DomRoot<T> has the additional behavior of rooting on creation (and unrooting on Drop). The custom lint plugin essentially enforces that Dom<T>, and any DOM structs (tagged with #[dom_struct]) are never accessible on the stack aside from through DomRoot<T> or &T.

I wouldn’t recommend this approach; it works okay but we’ve wanted to move off of it for a while because it relies on custom plugin lints for soundness. But it’s worth mentioning for completeness.

## Josephine (Servo’s experimental GC plans)

Given that Servo’s existing GC solution depends on plugging in to the compiler to do additional static analysis, we wanted something better. So Alan designed Josephine (“JS affine”), which uses Rust’s affine types and borrowing in a cleaner way to provide a safe GC system.

Josephine is explicitly designed for Servo’s use case and as such does a lot of neat things around “compartments” and such that are probably irrelevant unless you specifically wish for your GC to integrate with a JS engine.

I mentioned earlier that the fact that the garbage collection phase can only happen in certain well-known moments of the code actually can make things easier for GC design, and Josephine is an example of this.

Josephine has a “JS context”, which is to be passed around everywhere and essentially represents the GC itself. When doing operations which may trigger a GC, you have to borrow the context mutably, whereas when accessing heap objects you need to borrow the context immutably. You can root heap objects to remove this requirement:

// cx is a JSContext, node is a JSManaged<'a, C, Node>
// assuming next_sibling and prev_sibling are not Options for simplicity

// borrows cx for 'b
let next_sibling: &'b Node = node.next_sibling.borrow(cx);
println!("Name: {:?}", next_sibling.name);
// illegal, because cx is immutably borrowed by next_sibling
// node.prev_sibling.borrow_mut(cx).frob();

// read from next_sibling to ensure it lives this long
println!("{:?}", next_sibling.name);

let ref mut root = cx.new_root();
// no longer needs to borrow cx, borrows root for 'root instead
let next_sibling: JSManaged<'root, C, Node> = node.next_sibling.in_root(root);
// now it's fine, no outstanding borrows of cx
node.prev_sibling.borrow_mut(cx).frob();

// read from next_sibling to ensure it lives this long
println!("{:?}", next_sibling.name);


new_root() creates a new root, and in_root ties the lifetime of a JS managed type to the root instead of to the JSContext borrow, releasing the borrow of the JSContext and allowing it to be borrowed mutably in future .borrow_mut() calls.

Note that .borrow() and .borrow_mut() here do not have runtime borrow-checking cost despite their similarities to RefCell::borrow(), they instead are doing some lifetime juggling to make things safe. Creating roots typically does have runtime cost. Sometimes you may need to use RefCell<T> for the same reason it’s used in Rc, but mostly only for non-GCd fields.

Custom types are typically defined in two parts as so:

#[derive(Copy, Clone, Debug, Eq, PartialEq, JSTraceable, JSLifetime, JSCompartmental)]
pub struct Element<'a, C> (pub JSManaged<'a, C, NativeElement<'a, C>>);

pub struct NativeElement<'a, C> {
name: JSString<'a, C>,
parent: Option<Element<'a, C>>,
children: Vec<Element<'a, C>>,
}


where Element<'a> is a convenient copyable reference that is to be used inside other GC types, and NativeElement<'a> is its backing storage. The C parameter has to do with compartments and can be ignored for now.

A neat thing worth pointing out is that there’s no runtime borrow checking necessary for manipulating other GC references, even though roots let you hold multiple references to the same object!

let parent_root = cx.new_root();
let parent = element.borrow(cx).parent.in_root(parent_root);
let ref mut child_root = cx.new_root();

// could potentially be a second reference to element if it was
// the first child
let first_child = parent.children[0].in_root(child_root);

// this is okay, even though we hold a reference to parent
// via element.parent, because we have rooted that reference so it's
// now independent of whether element.parent changes!
first_child.borrow_mut(cx).parent = None;


Essentially, when mutating a field, you have to obtain mutable access to the context, so there will not be any references to the field itself still around (e.g. element.borrow(cx).parent), only to the GC’d data within it, so you can change what a field references without invalidating other references to the contents of what the field references. This is a pretty cool trick that enables GC without runtime-checked interior mutability, which is relatively rare in such designs.

## Unfinished design for a builtin Rust GC

For a while a couple of us worked on a way to make Rust itself extensible with a pluggable GC, using LLVM stack map support for finding roots. After all, if we know which types are GC-ish, we can include metadata on how to find roots for each function, similar to how Rust functions currently contain unwinding hooks to enable cleanly running destructors during a panic.

We never got around to figuring out a complete design, but you can find more information on what we figured out in my and Felix’s posts on this subject. Essentially, it involved a Trace trait with more generic trace methods, an auto-implemented Root trait that works similar to Send, and compiler machinery to keep track of which Root types are on the stack.

This is probably not too useful for people attempting to implement a GC, but I’m mentioning it for completeness’ sake.

Note that pre-1.0 Rust did have a builtin GC (@T, known as “managed pointers”), but IIRC in practice the cycle-management parts were not ever implemented so it behaved exactly like Rc<T>. I believe it was intended to have a cycle collector (I’ll talk more about that in the next section).

## bacon-rajan-cc (and cycle collectors in general)

Nick Fitzgerald wrote bacon-rajan-cc to implement _“Concurrent Cycle Collection in Reference Counted Systems”__ by David F. Bacon and V.T. Rajan.

This is what is colloquially called a cycle collector; a kind of garbage collector which is essentially “what if we took Rc<T> but made it detect cycles”. Some people do not consider these to be tracing garbage collectors, but they have a lot of similar characteristics (and they do still “trace” through types). They’re often categorized as “hybrid” approaches, much like gc.

The idea is that you don’t actually need to know what the roots are if you’re maintaining reference counts: if a heap object has a reference count that is more than the number of heap objects referencing it, it must be a root. In practice it’s pretty inefficient to traverse the entire heap, so optimizations are applied, often by applying different “colors” to nodes, and by only looking at the set of objects that have recently have their reference counts decremented.

A crucial observation here is that if you only focus on potential garbage, you can shift your definition of “root” a bit, when looking for cycles you don’t need to look for references from the stack, you can be satisfied with references from any part of the heap you know for a fact is reachable from things which are not potential garbage.

A neat property of cycle collectors is while mark and sweep tracing GCs have their performance scale by the size of the heap as a whole, cycle collectors scale by the size of the actual garbage you have 7. There are of course other tradeoffs: deallocation is often cheaper or “free” in tracing GCs (amortizing those costs by doing it during the sweep phase) whereas cycle collectors have the constant allocator traffic involved in cleaning up objects when refcounts reach zero.

The way bacon-rajan-cc works is that every time a reference count is decremented, the object is added to a list of “potential cycle roots”, unless the reference count is decremented to 0 (in which case the object is immediately cleaned up, just like Rc). It then traces through this list; decrementing refcounts for every reference it follows, and cleaning up any elements that reach refcount 0. It then traverses this list again and reincrements refcounts for each reference it follows, to restore the original refcount. This basically treats any element not reachable from this “potential cycle root” list as “not garbage”, and doesn’t bother to visit it.

Cycle collectors require tighter control over the garbage collection algorithm, and have differing performance characteristics, so they may not necessarily be suitable for all use cases for GC integration in Rust, but it’s definitely worth considering!

## cell-gc

Jason Orendorff’s cell-gc crate is interesting, it has a concept of “heap sessions”. Here’s a modified example from the readme:

use cell_gc::Heap;

// implements IntoHeap, and also generates an IntListRef type and accessors
#[derive(cell_gc_derive::IntoHeap)]
struct IntList<'h> {
tail: Option<IntListRef<'h>>
}

fn main() {
// Create a heap (you'll only do this once in your whole program)
let mut heap = Heap::new();

heap.enter(|hs| {
// Allocate an object (returns an IntListRef)
let obj1 = hs.alloc(IntList { head: 17, tail: None });
assert_eq!(obj1.tail(), None);

// Allocate another object
let obj2 = hs.alloc(IntList { head: 33, tail: Some(obj1) });

// mutate tail
obj2.set_tail(None);
});
}


All mutation goes through autogenerated accessors, so the crate has a little more control over traffic through the GC. These accessors help track roots via a scheme similar to what gc does; where there’s an IntoHeap trait used for modifying root refcounts when a reference is put into and taken out of the heap via accessors.

Heap sessions allow for the heap to moved around, even sent to other threads, and their lifetime prevents heap objects from being mixed between sessions. This uses a concept called generativity; you can read more about generativity in “You Can’t Spell Trust Without Rust” ch 6.3, by Aria Beingessner, or by looking at the indexing crate.

## Interlude: The similarities between async and GCs

The next two examples use machinery from Rust’s async functionality despite having nothing to do with async I/O, and I think it’s important to talk about why that should make sense. I’ve tweeted about this before: I and Catherine West figured this out when we were talking about her GC idea based on async.

You can see some of this correspondence in Go: Go is a language that has both garbage collection and async I/O, and both of these use the same “safepoints” for yielding to the garbage collector or the scheduler. In Go, the compiler needs to automatically insert code that checks the “pulse” of the heap every now and then, and potentially runs garbage collection. It also needs to automatically insert code that can tell the scheduler “hey now is a safe time to interrupt me if a different goroutine wishes to run”. These are very similar in principle – they’re both essentially places where the compiler is inserting “it is okay to interrupt me now” checks, sometimes called “interruption points” or “yield points”.

Now, Rust’s compiler does not automatically insert interruption points. However, the design of async in Rust is essentially a way of adding explicit interruption points to Rust. foo().await in Rust is a way of running foo() and expecting that the scheduler may interrupt the code in between. The design of Future and Pin<P> come out of making this safe and pleasant to work with.

As we shall see, this same machinery can be used for creating safe interruption points for GCs in Rust.

## Shifgrethor

shifgrethor is an experiment by Saoirse to try and build a GC that uses Pin<P> for managing roots. They’ve written extensively on the design of shifgrethor on their blog. In particular, the post on rooting goes through how rooting works.

The basic design is that there’s a Root<'root> type that contains a Pin<P>, which can be immovably tied to a stack frame using the same idea behind pin-utilspin_mut!() macro:

letroot!(root);
let gc: Gc<'root, Foo> = root.gc(Foo::new());


The fact that root is immovable allows for it to be treated as a true marker for the stack frame over anything else. The list of rooted types can be neatly stored in an ordered stack-like vector in the GC implementation, popping when individual roots go out of scope.

If you wish to return a rooted object from a function, the function needs to accept a Root<'root>:

fn new<'root>(root: Root<'root>) -> Gc<'root, Self> {
root.gc(Self {
// ...
}
}


All GC’d types have a 'root lifetime of the root they trace back to, and are declared with a custom derive:

#[derive(GC)]
struct Foo<'root> {
#[gc] bar: GcStore<'root, Bar>,
}


GcStore is a way to have fields use the rooting of their parent. Normally, if you wanted to put Gc<'root2, Bar<'root2>> inside Foo<'root1> you would not be able to because the lifetimes derive from different roots. GcStore, along with autogenerated accessors from #[derive(GC)], will set Bar’s lifetime to be the same as Foo when you attempt to stick it inside Foo.

This design is somewhat similar to that of Servo where there’s a pair of types, one that lets us refer to GC types on the stack, and one that lets GC types refer to each other on the heap, but it uses Pin<P> instead of a lint to enforce this safely, which is way nicer. Root<'root> and GcStore do a bunch of lifetime tweaking that’s reminiscent of Josephine’s rooting system, however there’s no need for an &mut JsContext type that needs to be passed around everywhere.

## gc-arena

gc-arena is Catherine West’s experimental GC design for her Lua VM, luster.

The gc-arena crate forces all GC-manipulating code to go within arena.mutate() calls, between which garbage collection may occur.

#[derive(Collect)]
#[collect(no_drop)]
struct TestRoot<'gc> {
number: Gc<'gc, i32>,
many_numbers: GcCell<Vec<Gc<'gc, i32>>>,
}

make_arena!(TestArena, TestRoot);

let mut arena = TestArena::new(ArenaParameters::default(), |mc| TestRoot {
number: Gc::allocate(mc, 42),
many_numbers: GcCell::allocate(mc, Vec::new()),
});

arena.mutate(|_mc, root| {
assert_eq!(*((*root).number), 42);
root.numbers.write(mc).push(Gc::allocate(mc, 0));
});


Mutation is done with GcCell, basically a fancier version of Gc<RefCell<T>>. All GC operations require a MutationContext (mc), which is only available within arena.mutate().

Only the arena root may survive between mutate() calls, and garbage collection does not happen during .mutate(), so rooting is easy – just follow the arena root. This crate allows for multiple GCs to coexist with separate heaps, and, similarly to cell-gc, it uses generativity to enforce that the heaps do not get mixed.

So far this is mostly like other arena-based systems, but with a GC.

The really cool part of the design is the gc-sequence crate, which essentially builds a Future-like API (using a Sequence trait) on top of gc-arena that can potentially make this very pleasant to use. Here’s a modified example from a test:

#[derive(Collect)]
#[collect(no_drop)]
struct TestRoot<'gc> {
test: Gc<'gc, i32>,
}

make_sequencable_arena!(test_sequencer, TestRoot);
use test_sequencer::Arena as TestArena;

let arena = TestArena::new(ArenaParameters::default(), |mc| TestRoot {
test: Gc::allocate(mc, 42),
});

let mut sequence = arena.sequence(|root| {
sequence::from_fn_with(root.test, |_, test| {
if *test == 42 {
Ok(*test + 10)
} else {
Err("will not be generated")
}
})
.and_then(|_, r| Ok(r + 12))
.and_chain(|_, r| Ok(sequence::ok(r - 10)))
.then(|_, res| res.expect("should not be error"))
.chain(|_, r| sequence::done(r + 10))
.map(|r| sequence::done(r - 60))
.flatten()
.boxed()
});

loop {
match sequence.step() {
Ok((_, output)) => {
assert_eq!(output, 4);
return;
}
Err(s) => sequence = s,
}
}


This is very similar to chained callback futures code; and if it could use the Future trait would be able to make use of async to convert this callback heavy code into sequential code with interrupt points using await. There were design constraints making Future not workable for this use case, though if Rust ever gets generators this would work well, and it’s quite possible that another GC with a similar design could be written, using async/await and Future.

Essentially, this paints a picture of an entire space of Rust GC design where GC mutations are performed using await (or yield if we ever get generators), and garbage collection can occur during those yield points, in a way that’s highly reminiscent of Go’s design.

## Moving forward

As is hopefully obvious, the space of safe GC design in Rust is quite rich and has a lot of interesting ideas. I’m really excited to see what folks come up with here!

If you’re interested in reading more about GCs in general, “A Unified Theory of Garbage Collection” by Bacon et al and the GC Handbook are great reads.

Thanks to Andi McClure, Jason Orendorff, Nick Fitzgerald, and Nika Layzell for providing feedback on drafts of this blog post

1. I’m also going to completely ignore the field of conservative stack-scanning tracing GCs where you figure out your roots by looking at all the stack memory and considering anything with a remotely heap-object-like bit pattern to be a root. These are interesting, but can’t really be made 100% safe in the way Rust wants them to be unless you scan the heap as well.

2. Which currently does not have support for concurrent garbage collection, but it could be added.

3. Some JNI-using APIs are also forced to have explicit rooting APIs to give access to things like raw buffers.

4. In general, finalizers in GCs are hard to implement soundly in any language, not just Rust, but Rust can sometimes be a bit more annoying about it.

5. Spolier: This is actually possible in Rust, and we’ll get into it further in this post!

6. Such hybrid approaches are common in high performance GCs; “A Unified Theory of Garbage Collection” by Bacon et al. covers a lot of the breadth of these approaches.

7. Firefox’s DOM actually uses a mark & sweep tracing GC mixed with a cycle collector for this reason. The DOM types themselves are cycle collected, but JavaScript objects are managed by the Spidermonkey GC. Since some DOM types may contain references to arbitrary JS types (e.g. ones that store callbacks) there’s a fair amount of work required to break cycles manually in some cases, but it has performance benefits since the vast majority of DOM objects either never become garbage or become garbage by having a couple non-cycle-participating references get released.

# Arenas in Rust

Posted by Manish Goregaokar on March 15, 2021 in programming, rust, tidbits

There’s been some discussion about arenas in Rust recently, and I thought I’d write about them.

Arenas aren’t something you would typically reach for in Rust so fewer people know about them; you only really see them in applications for various niche use cases. Usually you can use an arena by pulling in a crate and not using additional unsafe, so there’s no need to be particularly skittish around them in Rust, and it seems like it would be useful knowledge, especially for people coming to Rust from fields where arenas are more common.

Furthermore, there’s a set of really cool lifetime effects involved when implementing self-referential arenas, that I don’t think have been written about before.

I’m mostly writing this to talk about the cool lifetime effects, but I figured it’s worth writing a general introduction that has something for all Rustaceans. If you know what arenas are and just want the cool lifetimes you can skip directly to the section on implementing self-referential arenas. Otherwise, read on.

## What’s an arena?

An arena is essentially a way to group up allocations that are expected to have the same lifetime. Sometimes you need to allocate a bunch of objects for the lifetime of an event, after which they can all be thrown away wholesale. It’s inefficient to call into the system allocator each time, and far more preferable to preallocate a bunch of memory for your objects, cleaning it all up at once once you’re done with them.

Broadly speaking, there are two reasons you might wish to use an arena:

Firstly, your primary goal may be to reduce allocation pressure, as mentioned above. For example, in a game or application, there may be large mishmash of per-frame-tick objects that need to get allocated each frame, and then thrown away. This is extremely common in game development in particular, and allocator pressure is something gamedevs tend to care about. With arenas, it’s easy enough to allocate an arena, fill it up during each frame and clear it out once the frame is over. This has additional benefits of cache locality: you can ensure that most of the per-frame objects (which are likely used more often than other objects) are usually in cache during the frame, since they’ve been allocated adjacently.

Another goal might be that you want to write self referential data, like a complex graph with cycles, that can get cleaned up all at once. For example, when writing compilers, type information will likely need to reference other types and other such data, leading to a complex, potentially cyclic graph of types. Once you’ve computed a type you probably don’t need to throw it away individually, so you can use an arena to store all your computed type information, cleaning the whole thing up at once when you’re at a stage where the types don’t matter anymore. Using this pattern allows your code to not have to worry about whether the self-referential bits get deallocated “early”, it lets you make the assumption that if you have a Ty it lives as long as all the other Tys and can reference them directly.

These two goals are not necessarily disjoint: You may wish to use an arena to achieve both goals simultaneously. But you can also just have an arena that disallows self referential types (but has other nice properties). Later in this post I’m going to implement an arena that allows self-referential types but is not great on allocation pressure, mostly for ease of implementation. Typically if you’re writing an arena for self-referential types you can make it simultaneously reduce allocator pressure, but there can be tradeoffs.

## How can I use an arena in Rust?

Typically to use an arena you can just pull in a crate that implements the right kind of arena. There are two that I know of that I’ll talk about below, though a cursory search of “arena” on crates.io turns up many other promising candidates.

I’ll note that if you just need cyclic graph structures, you don’t have to use an arena, the excellent petgraph crate is often sufficient. slotmap is also useful; it’s a map-like datastructure useful for self-referential data, based on generational indexing.

### Bumpalo

Bumpalo is a fast “bump allocator”, which allows heterogenous contents, and only allows cycles if you do not care about destructors getting run.

use bumpalo::Bump;

// (example slightly modified from bumpalo docs)

// Create a new arena to bump allocate into.
let bump = Bump::new();

// Allocate values into the arena.
let scooter = bump.alloc(Doggo {
cuteness: u64::max_value(),
age: 8,
scritches_required: true,
});

// Happy birthday, Scooter!
scooter.age += 1;


Every call to Bump::alloc() returns a mutable reference to the allocated object. You can allocate different objects, and they can even reference each other1. By default it does not call destructors on its contents; however you can use bumpalo::boxed (or custom allocators on Nightly) to get this behavior. You can similarly use bumpalo::collections to get bumpalo-backed vectors and strings. bumpalo::boxed will not be allowed to participate in cycles.

### typed-arena

typed-arena is an arena allocator that can only store objects of a single type, but it does allow for setting up cyclic references:

// Example from typed-arena docs

use std::cell::Cell;
use typed_arena::Arena;

struct CycleParticipant<'a> {
other: Cell<Option<&'a CycleParticipant<'a>>>,
}

let arena = Arena::new();

let a = arena.alloc(CycleParticipant { other: Cell::new(None) });
let b = arena.alloc(CycleParticipant { other: Cell::new(None) });

// mutate them after the fact to set up a cycle
a.other.set(Some(b));
b.other.set(Some(a));


Unlike bumpalo, typed-arena will always run destructors on its contents when the arena itself goes out of scope2.

## Implementing a self-referential arena

Self referential arenas are interesting because, typically, Rust is very very wary of self-referential data. But arenas let you clearly separate the step of “I don’t care about this object” and “this object can be deleted” in a way that is sufficient to allow self-referential and cyclic types.

It’s pretty rare to need to implement your own arena – bumpalo and typed-arena cover most of the use cases, and if they don’t cover yours you probably can find something that does on crates.io. But if you really need to, or if you’re interested in the nitty-gritty lifetime details, this section is for you.

For people less familiar with lifetimes: the lifetimes in the syntaxes &'a Foo and Foo<'b> mean different things. 'a in &'a Foo is the lifetime of Foo, or, at least the lifetime of this reference to Foo. 'b in Foo<'b> is a lifetime parameter of Foo, and typically means something like “the lifetime of data Foo is allowed to reference”.

The key to implementing an arena Arena with entries typed as Entry is in the following rules:

• Arena and Entry should both have a lifetime parameter: Arena<'arena> and Entry<'arena>
• Arena methods should all receive Arena<'arena> as &'arena self, i.e. their self type is &'arena Arena<'arena>
• Entry should almost always be passed around as &'arena Entry<'arena> (it’s useful to make an alias for this)
• Use interior mutability; &mut self on Arena will make everything stop compiling. If using unsafe for mutability, make sure you have a PhantomData for RefCell<Entry<'arena>> somewhere.

That’s basically it from the lifetime side, the rest is all in figuring what API you want and implementing the backing storage. Armed with the above rules you should be able to make your custom arena work with the guarantees you need without having to understand what’s going on with the underlying lifetimes.

Let’s go through an implementation example, and then dissect why it works.

### Implementation

My crate elsa implements an arena in 100% safe code in one of its examples. This arena does not save on allocations since elsa::FrozenVec requires its contents be behind some indirection, and it’s not generic, but it’s a reasonable way to illustrate how the lifetimes work without getting into the weeds of implementing a really good arena with unsafe.

The example implements an arena of Person<'arena> types, Arena<'arena>. The goal is to implement some kind of directed social graph, which may have cycles.

use elsa::FrozenVec;

struct Arena<'arena> {
people: FrozenVec<Box<Person<'arena>>>,
}


elsa::FrozenVec is an append-only Vec-like abstraction that allows you to call .push() without needing a mutable reference, and is how we’ll be able to implement this arena in safe code.

Each Person<'arena> has a list of people they follow but also keeps track of people who follow them:

struct Person<'arena> {
pub follows: FrozenVec<PersonRef<'arena>>,
pub reverse_follows: FrozenVec<PersonRef<'arena>>,
pub name: &'static str,
}

// following the rule above about references to entry types
type PersonRef<'arena> = &'arena Person<'arena>;


The lifetime 'arena is essentially “the lifetime of the arena itself”. This is where it starts getting weird: typically if your type has a lifetime parameter, the caller gets to pick what goes in there. You don’t get to just say “this is the lifetime of the object itself”, the caller would typically be able to instantiate an Arena<'static> if they wish, or an Arena<'a> for some 'a. But here we’re declaring that 'arena is the lifetime of the arena itself; clearly something fishy is happening here.

Here’s where we actually implement the arena:

impl<'arena> Arena<'arena> {
fn new() -> Arena<'arena> {
Arena {
people: FrozenVec::new(),
}
}

fn add_person(&'arena self, name: &'static str,
follows: Vec<PersonRef<'arena>>) -> PersonRef<'arena> {
let idx = self.people.len();
self.people.push(Box::new(Person {
name,
follows: follows.into(),
reverse_follows: Default::default(),
}));
let me = &self.people[idx];
for friend in &me.follows {
// We're mutating existing arena entries to add references,
// potentially creating cycles!
friend.reverse_follows.push(me)
}
me
}

fn dump(&'arena self) {
// code to print out every Person, their followers, and the people who follow them
}
}


Note the &'arena self in add_person.

A good implementation here would typically separate out code handling the higher level invariant of “if A follows B then B reverse_follows A”, but this is just an example.

And finally, we can use the arena like this:

fn main() {
let arena = Arena::new();
let best_friend = arena.add_person("best friend", vec![lonely]);
let threes_a_crowd = arena.add_person("threes a crowd", vec![lonely, best_friend]);
let _everyone = arena.add_person("follows everyone", vec![rando, threes_a_crowd, lonely, best_friend]);
arena.dump();
}


In this case all of the “mutability” happens in the implementation of the arena itself, but it would be possible for this code to add entries directly to the follows/reverse_follows lists, or Person could have RefCells for other kinds of links, or whatever.

So how does this work? As I said earlier, with such abstractions in Rust, the caller typically has freedom to set the lifetime based on what they do with it. For example, if you have a HashMap<K, &'a str>, the 'a will get set based on the lifetime of what you try to insert.

When you construct the Arena its lifetime parameter is indeed still unconstrained, and we can test this by checking that the following code, which forcibly constrains the lifetime, still compiles.

let arena: Arena<'static> = Arena::new();


But the moment you try to do anything with the arena, this stops working:

let arena: Arena<'static> = Arena::new();

error[E0597]: arena does not live long enough
--> examples/mutable_arena.rs:5:18
|
4  |     let arena: Arena<'static> = Arena::new();
|                -------------- type annotation requires that arena is borrowed for 'static
5  |     let lonely = arena.add_person("lonely", vec![]);
|                  ^^^^^ borrowed value does not live long enough
...
11 | }
| - arena dropped here while still borrowed


The add_person method is somehow suddenly forcing the 'arena parameter of Arena to be set to its own lifetime, constraining it (and making it impossible to force-constrain it to be anything else with type annotations).

What’s going on here is a neat interaction with the &'arena self signature of add_person (i.e. self is &'arena Arena<'self>), and the fact that 'arena in Arena<'arena> is an invariant lifetime.

Usually in your Rust programs, lifetimes are a little bit stretchy-squeezy. The following code compiles just fine:

// ask for two strings *with the same lifetime*
fn take_strings<'a>(x: &'a str, y: &'a str) {}

// string literal with lifetime 'static
let lives_forever = "foo";
// owned string with shorter, local lifetime
let short_lived = String::from("bar");

// still works!
take_strings(lives_forever, &*short_lived);


In this code, Rust is happy to notice that while lives_forever and &*short_lived have different lifetimes, it’s totally acceptable to pretend lives_forever has a shorter lifetime for the duration of the take_strings function. It’s just a reference, a reference valid for a long lifetime is also valid for a shorter lifetime.

The thing is, this stretchy-squeeziness is not the same for all lifetimes! The nomicon chapter on subtyping and variance goes into detail on why this is the case, but a general rule of thumb is that most lifetimes are “squeezy”3 like the one in &'a str above, but if some form of mutability is involved, they are rigid, also known as “invariant”. You can also have “stretchy”4 lifetimes if you’re using function types, but they’re rare.

Our Arena<'arena> is using interior mutability (via the FrozenVec) in a way that makes 'arena invariant.

Let’s look at our two lines of code again. When the compiler sees the first line of the code below, it constructs arena, whose lifetime we’ll call 'a. At this point the type of arena is Arena<'?>, where '? is made up notation for a yet-unconstrained lifetime.

let arena = Arena::new();


Let’s actually rewrite this to be clearer on what the lifetimes are.

let arena = Arena::new(); // type Arena<'?>, lives for 'a

// explicitly write the self that gets constructed when you call add_person
let ref_to_arena = &arena; // type &'a Arena<'?>
let lonely = Arena::add_person(ref_to_arena, "lonely", vec![]);



Remember the second rule I listed earlier?

Arena methods should all receive Arena<'arena> as &'arena self, i.e. their self type is &'arena Arena<'arena>

We followed this rule; the signature of add_person is fn add_person(&'arena self). This means that ref_to_arena is forced to have a lifetime that matches the pattern &'arena Arena<'arena>. Currently its lifetime is &'a Arena<'?>, which means that '? is forced to be the same as 'a, i.e. the lifetime of the arena variable itself. If the lifetime weren’t invariant, the compiler would be able to squeeze other lifetimes to fit, but it is invariant, and the unconstrained lifetime is forced to be exactly one lifetime.

And by this rather subtle sleight of hand we’re able to force the compiler to set the lifetime parameter of Arena<'arena> to the lifetime of its instance.

After this, the rest is pretty straightforward. Arena<'arena> holds entries of type Person<'arena>, which is basically a way of saying “a Person that is allowed to reference items of lifetime 'arena, i.e. items in Arena”. type PersonRef<'arena> = &'arena Person<'arena> is a convenient shorthand for “a reference to a Person that lives in Arena and is allowed to reference objects from it”.

So a thing I’ve not covered so far is how this can be safe in the presence of destructors. If your arena is allowed to have cyclic references, and you write a destructor reading from those cyclic references, whichever participant in the cycle that is deleted later on will have dangling references.

This gets to a really obscure part of Rust, even more obscure than variance. You almost never need to really understand this, beyond “explicit destructors subtly change borrow check behavior”. But it’s useful to know to get a better mental model of what’s going on here.

If we add the following code to our arena example:

impl<'arena> Drop for Person<'arena> {
fn drop(&mut self) {
println!("goodbye {:?}", self.name);
for friend in &self.reverse_follows {
// potentially dangling!
println!("\t\t{}", friend.name);
}
}
}


we actually get this error:

error[E0597]: arena does not live long enough
--> examples/mutable_arena.rs:5:18
|
5  |     let lonely = arena.add_person("lonely", vec![]);
|                  ^^^^^ borrowed value does not live long enough
...
11 | }
| -
| |
| arena dropped here while still borrowed
| borrow might be used here, when arena is dropped and runs the destructor for type Arena<'_>


The presence of destructors subtly changes the behavior of the borrow checker around self-referential lifetimes. The exact rules are tricky and explained in the nomicon, but essentially what happened was that the existence of a custom destructor on Person<'arena> made 'arena in Person (and thus Arena) a lifetime which is “observed during destruction”. This is then taken into account during borrow checking – suddenly the implicit drop() at the end of the scope is known to be able to read 'arena data, and Rust makes the appropriate conclusion that drop() will be able to read things after they’ve been cleaned up, since destruction is itself a mutable operation, and drop() is run interspersed in it.

Of course, a reasonable question to ask is how we can store things like Box and FrozenVec in this arena if destructors aren’t allowed to “wrap” types with 'arena. The reason is that Rust knows that Drop on Box cannot inspect person.follows because Box does not even know what Person is, and has promised to never try and find out. This wouldn’t necessarily be true if we had a random generic type since the destructor can call trait methods (or specialized blanket methods) which do know how to read the contents of Person, but in such a case the subtly changed borrow checker rules would kick in again. The stdlib types and other custom datastructures achieve this with an escape hatch, #[may_dangle] (also known as “the eyepatch”5), which allows you to pinky swear that you won’t be reading from a lifetime or generic parameter in a custom destructor.

This applies to crates like typed-arena as well; if you are creating cycles you will not be able to write custom destructors on the types you put in the arena. You can write custom destructors with typed-arena as long as you refrain from mutating things in ways that can create cycles; so you will not be able to use interior mutability to have one arena entry point to another.

Thanks to Mark Cohen and Nika Layzell for reviewing drafts of this post.

1. But not in a cyclic way; the borrow checker will enforce this!

2. You may wonder how it is safe for destructors to be safely run on cyclic references – after all, the destructor of whichever entry gets destroyed second will be able to read a dangling reference. We’ll cover this later in the post but it has to do with drop check, and specifically that if you attempt to set up cycles, the only explicit destructors allowed on the arena entries themselves will be ones on appropriately marked types.

3. The technical term for this is “covariant lifetime”

4. The technical term for this is “contravariant lifetime”

5. Because you’re claiming the destructor “can’t see” the type or lifetime, see?

# Integrating Rust and C++ in Firefox

Posted by Manish Goregaokar on February 22, 2021 in c++, programming, rust

This post was originally drafted in August 2018, but I never got around to finishing it. As such, parts of its framing (e.g. the focus on bindgen) are outdated, given the relative infancy of the interop space at the time. I was recently told that the post is still useful in this form so I decided to finish and publish it anyway, while attempting to mark outdated things as such when I notice them. Everything after the allocators section was written near the time of publication.

In 2017 I worked on the Stylo project, uplifting Servo’s CSS engine (“style system”) into Firefox’s browser engine (“Gecko”). This involved a lot of gnarly FFI between Servo’s Rust codebase and Firefox’s C++ codebase. There were a lot of challenges in doing this, and I feel like it’s worth sharing things from our experiences.

If you’re interested in Rust integrations, you may find this talk by Katharina on Rust - C++ FFI, and this blog post by Henri on integrating encoding-rs into Firefox useful as well.

## Who is this post for?

So, first off the bat, I’ll mention that when integrating Rust into a C++ codebase, you want to avoid having integrations as tight as Stylo. Don’t do what we did; make your Rust component mostly self-contained so that you just have to maintain something like ten FFI functions for interacting with it. If this is possible to do, you should do it and your life will be much easier. Pick a clean API boundary, define a straightforward API, use cbindgen or bindgen if necessary without any tricks, and you should be good to go.

That said, sometimes you have to have gnarly integrations, and this blog post is for those use cases. These techniques mostly use bindgen in their examples, however you can potentially use them with hand-rolled bindings or another tool as well. If you’re at this level of complexity, however, the potential for mistakes in the hand-rolled bindings is probably not worth it.

Note from 2021: cxx is probably a better tool for many of the use cases here, though many of the techniques still transfer.

## What was involved in Stylo’s FFI?

So, what made Stylo’s FFI so complicated?

It turns out that browsers are quite monolithic. You can split them into vaguely-defined components, but these components are still tightly integrated. If you intend to replace a component, you may need to make a jagged edge of an integration surface.

The style system is more self-contained than other parts, but it’s still quite tightly integrated.

The main job of a “style system” is to take the CSS rules and DOM tree, and run them through “the cascade”1 with an output of “computed styles” tagged on each node in the tree. So, for example, it will take a document like the following:

<style type="text/css">
body {
font-size: 12px;
}
div {
height: 2em;
}
</style>
<body>
<div id="foo"></div>

</body>


and turn it into something like:

• <body> has a font-size of 12px, everything else is the default
• the div #foo has a computed height of 24px 2, everything else is the default. It “inherits” the font-size from <body> as 12px

From a code point of view, this means that Stylo takes in Gecko’s C++ DOM tree. It parses all the CSS, and then runs the cascade on the tree. It stores computed styles on each element in a way that Gecko can read very cheaply.

Style computation can involve some complex steps that require calling back into C++ code. Servo’s style system is multithreaded, but Gecko is mostly designed to work off of a single main thread per process, so we need to deal with this impedence mismatch.

Since the output of Stylo is C++-readable structs, Stylo needs to be able to read and write nontrivial C++ abstractions. Typical FFI involves passing values over a boundary, never to be seen again, however here we’re dealing with persistent state that is accessed by both sides.

To sum up, we have:

• Lots and lots of back-and-forth FFI
• Rust code regularly dealing with nontrivial C++ abstractions
• A need for nontrivial abstractions to be passed over FFI

All of this conspires to make for some really complicated FFI code.

# The actual techniques

I’ll try to structure this so that the more broadly useful (and/or less gnarly) techniques come earlier in the post.

## The basics of bindgen

Bindgen is a tool that generates Rust bindings for structs and functions from the provided C or C++ header files. It’s often used for writing Rust bindings to existing C/C++ libraries, however it’s useful for integrations as well.

To use it for an integration, write a header file containing the functions your Rust code needs (referencing structs from other header files if necessary), and run bindgen on it. For some codebases, doing this once and checking in the generate file suffices, but if your C++ code is going to change a lot, run it as a build dependency instead. Beware that this can adversely impact build times, since your Rust build now has a partial C++ compilation step.

For large C++ codebases, pulling in a single header will likely pull in a lot of stuff. You should allowlist, blocklist, and/or mark things as opaque to reduce the amount of bindings generated. It’s best to go the allowlisting route — give bindgen an allowlisted list of functions / structs to generate bindings for, and it will transitively generate bindings for any dependencies they may have. Sometimes even this will end up generating a lot, it’s sometimes worth finding structs you’re not using and marking them as opaque so that their bindings aren’t necessary. Marking something as opaque replaces it with an array of the appropriate size and alignment, so from the Rust side it’s just some bits you don’t care about and can’t introspect further.

Bindgen does support some C++ features (you may need to pass -x c++). This is pretty good for generating bindings to e.g. templated structs. However, it’s not possible to support all C++ features here, so you may need to blocklist, opaqueify, or use intermediate types if you have some complicated C++ abstractions in the deps. You’ll typically get an error when generating bindings or when compiling the generated bindings, so don’t worry about this unless that happens.

Bindgen is quite configurable. Stylo has a script that consumes a large toml file containing all of the configuration.

## cbindgen

We don’t use cbindgen in Stylo, but it’s used for Webrender. It does the inverse of what bindgen does: given a Rust crate, it generates C headers for its public extern "C" API. It’s also quite configurable.

## cxx

cxx is the cool new hotness in 2021, which kind of approaches the problem from both sides, enabling you to write Rust bindings for C++ and C++ bindings for Rust. It’s definitely worth checking out, a lot of the things that are hard to make work with bindgen are trivial in cxx. For example, it automatically figures out what types need to be opaque, it automatically converts between &T and T* across FFI, and it is overall more targeted for the use case of an FFI layer where Rust and C++ both call each other.

## Bindgen-aided C++ calling Rust

So bindgen helps with creating things for Rust to call and manipulate, but not in the opposite direction. cbindgen can help here, but I’m not sure if it’s advisable to have both bindgen and cbindgen operating near each other on the same codebase.

In Stylo we use a bit of a hack for this. Firstly, all FFI functions defined in C++ that Rust calls are declared in one file, and are all named Gecko_*. Bindgen supports regexes for things like allowlisting, so this naming scheme makes it easy to deal with.

We also declare the FFI functions defined in Rust that C++ calls in another file, named Servo_*. They’re also all defined in one place.

However, there’s nothing ensuring that the signatures match! If we’re not careful, there may be mismatches, causing bad things to happen at link time or runtime. We use a small autogenerated unit test to ensure the validity of the signatures.

This is especially important as we do things like type replacement, and we need tests to ensure that the rug isn’t pulled out from underneath us.

## Type replacing for fun and profit

Using blocklisting in conjunction with the --raw-line/raw_line() flag, one can effectively ask bindgen to “replace” types. Blocklisting asks bindgen not to generate bindings for a type, however bindgen will continue to generate bindings referring to that type if necessary. (Unlike opaque types where bindgen generates an opaque binding for the type and uses it everywhere). --raw-line lets you request bindgen to add a line of raw rust code to the file, and such a line can potentially define or import a new version of the type you blocklisted. Effectively, this lets you replace types.

Bindgen generates unit tests ensuring that the layout of your structs is correct (run them!), so if you accidentally replace a type with something incompatible, you will get warnings at the struct level (functions may not warn).

There are various ways this can be used:

### Safe references across FFI

Note from 2021: cxx does this automatically

Calling into C++ (and accepting data from C++) is unsafe. However, there’s no reason we should have to worry about this more than we have to. For example, it would be nice if accessor FFI functions – functions which take a foreign object and return something from inside it – could use lifetimes. It would be even nicer if nullability were represented on the FFI boundary so that you don’t miss null checks, and can assume non-nullness when the C++ API is okay with it.

In Stylo, we have lots of functions like the following:

RawGeckoNodeBorrowedOrNull Gecko_GetLastChild(RawGeckoNodeBorrowed node);


which bindgen translates to:

extern "C" {
fn Gecko_GetLastChild(x: &RawGeckoNode) -> Option<&RawGeckoNode>;
}


Using the bindgen build script on a provided list of borrow-able types, we’ve told bindgen that:

• FooBorrowedOrNull is actually Option<&Foo>
• FooBorrowed is actually &Foo

Option<&Foo> is represented as a single nullable pointer in Rust, so this is a clean translation. We’re forced to null-check it, but once we do we can safely assume that the reference is valid. Furthermore, due to lifetime elision the actual signature of the FFI function is fn Gecko_GetLastChild<'a>(x: &'a RawGeckoNode) -> Option<&'a RawGeckoNode>, which ensures we won’t let the returned reference outlive the passed reference. Lifetime elision means that we can call C++ functions “safely” with the appropriate lifetime requirements, even though C++ has no such concept!

Note that this is shifting some of the safety invariants to the C++ side: We rely on the C++ to give us valid references, and we rely on it to not have nulls when the type is not marked as nullable. Most C++ codebases internally rely on such invariants for safety anyway, so this isn’t much of a stretch.

We do this on both sides, actually: Many of our Rust-defined extern "C" functions that C++ calls get to be internally-safe because the types let us assume the validity of the pointers obtained from C++.

### Making C++ abstractions Rust-accessible

A very useful thing to do here is to replace various C++ abstractions with Rust versions of them that share semantics. In Gecko, most strings are stored in nsString/nsAString/etc.

We’ve written an nsstring crate that represents layout-compatible nsStrings in a more Rusty way, with Rusty APIs. We then ask bindgen to replace Gecko nsStrings with these.

Usually it’s easier to just write an impl for the bindgen-generated abstraction, however sometimes you must replace it:

• When the abstraction internally does a lot of template stuff not supported by bindgen
• When you want the code for the abstraction to be in a separate crate

## Potential pitfall: Passing C++ classes by-value over FFI

It’s quite tempting to do stuff like

RefPtr<Foo> Servo_Gimme(...);


where you pass complicated classes by-value over FFI (RefPtr is Gecko’s variant of Rc<T>/Arc<T>).

This works on some systems, but is broken on MSVC: The ABI for passing non-POD types through functions is different. The linker usually notices this and complains, but it’s worth avoiding this entirely.

In Stylo we handle this by using some macro-generated intermediate types which are basically the same thing as the original class but without any constructors/destructors/operators. We convert to/from these types immediately before/after the FFI call, and on the Rust side we do similar conversions to Rust-compatible abstractions.

## Sharing abstractions with destructors

If you’re passing ownership of collections or other templated types across FFI, you probably want Rust code to be able to destroy C++ objects, and vice versa.

One way of doing this is to implement Drop on the generated struct. If you have class MyString, you can do:

class MyString {
// ...
~MyString();
}

void MyString_Destroy(*MyString x) {
x->~MyString()
}

impl Drop for bindings::MyString {
fn drop(&mut self) {
// (bindgen only)
bindings::MyString::destruct(self)
// OR
bindings::MyString_Destroy(self)
}
}


The MyString_Destroy isn’t necessary with bindgen – bindgen will generate a MyString::destruct() function for you – but be careful, this will make your generated bindings very platform-specific, so be sure to only do this if running them at build time. In general, when bindgen generates C++ methods, your bindings become platform specific and are best regenerated at build time3.

In Stylo we went down the route of manually defining _Destroy() functions since we started off with checked-in platform-agnostic bindings, however we could probably switch to using destruct() if we want to now.

When it comes to generic types, it’s a bit trickier, since Drop can’t be implemented piecewise on a generic type (you cannot impl Drop for MyVector<Foo>). You have to do something like:

template<typename T>
class MyVector {
// ...
}

// Deallocate buffer, but do not call destructors on elements
void MyVector_Deallocate_Buffer(MyVector<void>* x);

// assume we have an implementation of Iterator for MyVector<T> somewhere

impl<T> Drop for bindings::MyVector<T> {
fn drop(&mut self) {
for v in self.iter_mut() {
// calls the destructor for v, if any
std::ptr::drop_in_place(v)
}
bindings::MyVector_Deallocate_Buffer(self as *mut MyVector<T> as *mut MyVector<c_void>)
}
}



Note that if you forget to add a Drop implementation for T, this will silently forget to clean up the contents of the vector. See the next section for some ways to handle this by creating a “safe” mirror type.

## Mirror types

C++ libraries often have useful templated abstractions, and it’s nice to be able to manipulate them from Rust. Sometimes, it’s possible to just tack on semantics on the Rust side (either by adding an implementation or by doing type replacement), but in some cases this is tricky.

For example, Gecko has RefPtr<T>, which is similar to Rc<T>, except the actual refcounting logic is up to T to implement (it can choose between threadsafe, non-threadsafe, etc), which it does by writing AddRef() and Release() methods.

We mirror this in Rust by having a trait:

/// Trait for all objects that have Addref() and Release
/// methods and can be placed inside RefPtr<T>
pub unsafe trait RefCounted {
/// Bump the reference count.
/// Decrease the reference count.
unsafe fn release(&self);
}

/// A custom RefPtr implementation to take into account Drop semantics and
/// a bit less-painful memory management.
pub struct RefPtr<T: RefCounted> {
ptr: *mut T,
_marker: PhantomData<T>,
}


We implement the RefCounted trait for C++ types that are wrapped in RefPtr which we wish to access through Rust. We have some macros that make this easier to do. We have to have such a trait, because otherwise Rust code wouldn’t know how to manage various C++ types.

However, RefPtr<T> here can’t be the type that ends up being used in bindgen. Rust doesnt let us do things like impl<T: RefCounted> Drop for RefPtr<T> 4, so we can’t effectively make this work with the bindgen generated type unless we write a RefCounted implementation for every refcounted type that shows up in the bindgen output at all – which would be a lot of work.

Instead, we let bindgen generate its own RefPtr<T>, called structs::RefPtr<T> (all the structs that bindgen generates for Gecko go in a structs:: module). structs::RefPtr<T> itself doesn’t have enough semantics to be something we can pass around willy-nilly in Rust code without causing leaks. However, it has some methods that allow for conversion into the “safe” mirror RefPtr<T> (but only if T: RefCounted). So if you need to manipulate a RefPtr<T> in a C++ struct somewhere, you immediately use one of the conversion methods to get a safe version of it first, and then do things to it. Refcounted types that don’t have the RefCounted implementation won’t have conversion methods: they may exist in the data you’re manipulating, however you won’t be able to work with them.

In general, whenever attaching extra semantics to generic bindgen types doesn’t work create a mirror type that’s completely safe to use from Rust, with a trait that gates conversion to the mirror type.

## Potential pitfall: Allocators

If you’re passing heap-managed abstractions across FFI, be careful about which code frees which objects. If your Rust and C++ code don’t share allocators, deallocating memory allocated on the other side can have disastrous consequences.

If you’re building a cdylib or staticlib with Rust (this is likely if you’re linking it with a C++ application), the compiler will by default pick the system allocator (malloc), so if your C++ application also uses the same you’re all set.

On some platforms when building rlibs and binaries, Rust may choose jemalloc instead. It’s also possible that your C++ code uses a different allocator (lots of applications use allocators like jemalloc or tcmalloc, some have their own custom allocators like tor_malloc in Tor).

In such cases you have one of three options:

• Avoid transferring ownership of heap-allocated items, only share things as borrowed references
• Call destructors over FFI, as detailed in the section on destructors above
• Set Rust’s allocator to be the same as documented in the std::alloc module. Basically, can use the #[global_allocator] attribute to select which allocator you wish to use, and if necessary you can implement the GlobalAlloc trait on a custom allocator type that calls into whatever custom allocator C++ is using.

Note from 2021: Most stdlib collections (Vec, for example) now have an optional “custom allocator” parameter that can be used to swap in a different allocator for a specific use site.

## Arcs over FFI: Triomphe

This isn’t really a generalizable technique, but it’s pretty cool and generally instructive, so I’m including it here.

Stylo uses a lot of Arcs. A lot of them. The entire computation of styles makes heavy use of Arc::make_mut’s copy-on-write semantics so that we can build up the style tree in parallel but not have to make unnecessary copies of duplicated/defaulted styles for each element.

Many of these Arcs need to be readable from C++. Rust’s Arc, however, consists of a pointer to an allocation containing a refcount and the data, so if C++ needs to get access to the data it needs to know the layout of the Arc allocation, which we’d rather not do5.

We picked a different route: We created a crate duplicating Arc<T> which behaves almost exactly the same as Arc<T>, but it can be converted to OffsetArc<T> which has its pointer point to the middle of the allocation, where the T begins. To C++, this just looks like a *const T! We were then able to make it work with RefPtr<T> on the C++ side so that C++ can transparently read from the OffsetArc<T>, and only needs to call into Rust if it wishes to clone or drop it.

The external version of this crate can be found in triomphe. It contains a bunch of other goodies that are additionally useful outside of the FFI world, like ArcBorrow which is essentially “&Arc<T> without double indirection”, UniqueArc<T>, a mutable Arc<T> known to be uniquely owned, and ArcUnion<T, U>, which is a space-efficient union of Arc<T> and Arc<U>.

## Other pitfalls

### Transparent

It’s very tempting to wrap C++ types in tuple structs and pass them over FFI. For example, one might imagine that the following is okay:

struct Wrapper(bindings::SomeCppType);

extern "C" {
// C++ signature: SomeCppType get_cpp_type();
fn get_cpp_type() -> Wrapper;
}


This kind of thing is quite useful to get around coherence, or for adding additional semantics to a type.

While there’s basically one obvious way Wrapper can be represented, ABI stuff can be tricky, and Rust’s layout isn’t defined. It is safer to use #[repr(transparent)], which guarantees that Wrapper will have the same representation as the type it contains.

### C enums

Rust supports C-like enums, but there’s a crucial difference between them. In C, it is not undefined behavior for an enum to have an unlisted value. In fact, the following pattern is not uncommon:

enum Flags {
Flag1 = 0b0001,
Flag2 = 0b0010,
Flag3 = 0b0100,
Flag4 = 0b1000;
};


where the enum is actually used for bitflags, and Flag1 | Flag2 and 0 are both valid values for Flags.

This is not the case in Rust. If you are type-replacing C enums with Rust ones, make sure they are #[repr(C)]. The Rust compiler uses invalid enum values as space for packing other information while optimizing types, for example Rust is able to represent Option<Option<... 255 times .. Option<bool>> as a single byte.

If you are working with a C enum that is used for bitflags like above, please use an integer type instead. #[repr(C)] on enums in Rust guarantees layout, but it is still undefined behavior for any enum to take on invalid values.

### ABI concerns

ABIs can be tricky. If you just use bindgen with no special flags, you can be pretty much guaranteed to have an okay ABI, but as you start doing type replacements, stuff can get murkier.

Firstly, make sure you’re not passing owned C++ classes with destructors/etc across FFI boundaries. See above for why. There’s a bunch of subtle stuff here, but you can avoid most of it it if you just don’t pass these things across FFI in an owned way.

Also, try to make sure everything is #[repr(C)] across the boundary. Rust’s improper-ctypes lints will help here.

## Should C++ APIs be unconditionally unsafe?

Before I get into this, I want to reiterate that most of the recommendations in this post are for complex C++-Rust integrations, which are likely to only crop up when attempting to rewrite parts of a large C++ codebase in Rust. Such codebases have unique needs and it’s important to calibrate for that when judging what’s right for them.

I recall when this Chromium post and Steve’s cxx post came out, there was a bunch of discussion about C++ functions not being universally marked unsafe. Essentially, a lot of people are of the opinion that all FFI into C++ (or C) should be unconditionally marked unsafe (and that tools like cxx should follow these rules).

Back then I wrote a Reddit comment about my thoughts on this. It’s a comment that’s the length of a blog post in and of itself so I’m not going to reproduce all of it here, but I’ll try to get the gist. I highly suggest you read it instead of this section.

In short, I would recommend people in large, complex codebases doing heavy C++ interop to generally be okay with marking functions calling into C++ as “safe” provided that function would be considered “safe to call without thinking too much about it” on the C++ side, whatever that means for your codebase.

From my post on “undefined” vs “unsafe”, for Rust I define “safe” as

Basically, in Rust a bit of code is “safe” if it cannot exhibit undefined behavior under all circumstances of that code being used.

C++ doesn’t have a rigid language-level concept of safety that can be applied the same way. Instead, most C++ code follows a similar heuristic:

a bit of code is “safe” if it cannot exhibit undefined behavior under all expected circumstances of that code being used.

This is, perhaps, not as good or useful a heuristic as the one we have for Rust, but it’s still a heuristic that gets used in deciding how careful one needs to be when using various APIs. After all, there are plenty of giant C++ codebases out there, they have got to be able to reason about safety somehow.

When you decide to meld together a C++ and Rust codebase, or start rewriting parts of a C++ codebase in Rust, you have already in essence decided for a large part of the codebase to not exactly follow Rust’s safety rules (but hopefully still be safe). There is little to be gained by making that an explicit part of your FFI boundary. Rather, it is more useful to save unsafe on the FFI boundary for truly unsafe functions which you actually do need to be careful to call.

unsafe is useful for finding potential sources of badness in your codebase. For a tightly-integrated Rust/C++ codebase it’s already well known that the C++-side is introducing badness, marking every simple C++ getter as unsafe will lead to alarm fatigue and make it harder to find the real problems.

It’s worth figuring out where this boundary lies for you. Tools like cxx make it straightforward to call C++ functions through a safe interface, and it’s valuable to make use of that support.

Again, before going down this route it’s worth wondering if you really need tight Rust-C++ integration. When possible, it’s always better to pick a small, well-defined API boundary, rather than Stylo-esque tight integration with shared objects and a highly criscrossed callgraph.

These days cxx is probably the most complete tool for such integrations. bindgen and cbindgen are still quite good, but cxx is C++-first, with a lot more magic, and generally seems to Just Work without too much configuration.

autocxx is a cool concept by Adrian Taylor which melds bindgen and cxx to make something even more magical. It’s currently experimental, but I’m going to be watching it with interest.

Overall the field of Rust and C++ integration is at a stage where it’s mature enough for integrations to be possible without too much effort, but there are still tons of ways things could be improved and I’m super excited to see that happen as more people work on such integrations!

Thanks to Adam Perry, Adrian Taylor, katie martin, Nika Layzell, and Tyler Mandry for reviewing drafts of this post

1. The cascade in “Cascading Style Sheets” is the process used to take all the potential rules which could apply to an element and find the “most applicable” one that gets actually used.

2. The em unit is font-size-relative, so 2em with a font-size of 12px is computed to 2 * 12 = 24px

3. C++ name mangling is not standardized, so any function with the C++ ABI will generate a #[link_name = "_Z1foobarbaz"] attribute on the Rust side, and the exact string used here will differ across compiler implementations and platforms. Since GCC and Clang follow the same scheme, most people will encounter this problem when their code doesn’t work on Windows due to MSVC using a different scheme.

4. Drop impls are restricted in a bunch of ways for safety, in particular you cannot write impl<T: RefCounted> Drop for RefPtr<T> unless RefPtr is defined as RefPtr<T: RefCounted>. It’s not possible to have a generic type that has an impl of Drop for only some possible instantiations of its generics.

5. Rust’s standard library does not typically guarantee anything about the layout of its types, and furthermore, Rust does not make many guarantees about the stability of most types without a #[repr] attribute. This would work, but it would be brittle and prone to breakage.

# On Voting Systems

Posted by Manish Goregaokar on October 09, 2019 in elections, politics, programming

Election season is starting up again, and as with many other topics I’m seeing a lot of overconfident takes from people in tech wanting to “solve” how voting works with naïve techy solutions. Hell, even a presidential candidate seems to have proposed an extremely uninformed plan for “fixing” voting using blockchain technology.

Last year I wrote a thread on Twitter covering some of the essential properties good voting systems uphold as well as how they prevent fraud. It was through the lens of Alameda County’s voting system, where I’ve volunteered as a poll worker in the past (and intend to do again). I’ve been meaning to write down the contents of that thread in blog form for a while, and now seemed like a good opportunity to do it.

I’ll be explaining more about most of these properties later, but ideally, a good voting system should uphold:

• Secret ballot: Nobody, not even you, can verify who you voted for after you’re out of the polling place, to prevent vote-buying and coercion.
• Auditable paper trail: We should be able to audit the election. Paper trails are usually the most robust way to enable effective audits.
• Obviousness: It should be relatively obvious what individuals should be doing when they need to mark their ballots. A system that you can easily “mess up” with is a bad system.
• Accessibility: It should not exclude individuals with disabilities from being able to vote.

## How voting works in Alameda County

I’ll first go over how voting in my county works. The system isn’t perfect, but it’s pretty good, and it’s a good springboard for understanding how voting systems in general can work. There’s a poll worker guide you can refer to if you’re really interested in all the specifics.

Broadly speaking, there are four ways to vote:

• By mail
• In person at certain government offices, before election day (“early voting”)
• In person on election day at a polling place
• Provisionally, in person on election day at a polling place

Voting by mail is pretty straightforward: When you register you can choose to vote by mail (or you can choose to do so online after the fact). You get a ballot in the mail, along with a special envelope. You fill in the ballot at your leisure, stick it in the envelope, write your name/address on the envelope, sign it, and mail it back. There are also convenient ballot dropboxes all over the place in case you’re a millenial like me and don’t want to figure out how to buy stamps1.

If you’re voting by mail you can also show up at any polling place on the day of the election and drop off your ballots in a sealed bin. At the polling place I helped run roughly half of the people coming in were just there to drop off their vote by mail ballots!

Voting by mail is by far the easiest option here. Sadly not all counties support it2. In some states this is even the default option.

As I understand it, voting in person at designated government offices3 is pretty much the same as voting in person at a polling place, it’s just run by government employees instead of volunteers and open for a few weeks before election day.

### In person voting

If you’ve chosen to vote in person, you are supposed to turn up at your assigned polling place (you get your assignment in the mail along with other voter info booklets).

There’s a copy of the list of people assigned to the polling place posted outside, and another with the poll workers inside. When you tell your name to the poll workers, they cross your name off the list, and you have to sign your name next to it4.

• If your name isn’t on the list, the poll workers will try and find your assigned precinct and inform you that you can go there instead, but you can still choose to vote provisionally at the existing precinct.
• If your name isn’t on the list of all voters (perhaps you registered very late, or were unable to register), you can also vote provisionally.
• If your name is on the list but marked as voting-by-mail (and you want to vote in person), you can vote normally only if you surrender your mail ballot (which poll workers will mark as spoiled and put in a separate pouch).
• If you lost/didn’t receive your ballot, you can always vote provisionally.

When you are voting normally, signing your name on the list fraudulently is illegal.

If it is your first time voting, you need to show some form of ID, but it doesn’t need to be photo ID and even a utility bill is fine.

Once you’re done signing, you’ll be given your ballot cards and a privacy sleeve folder so you can carry your filled ballots around. Because this is California and there are tons of local and state measures, we had 4 (!!) ballot cards, six sides to fill in5. Usually a poll worker will also detach the ballot stubs in front of you and hand them to you to keep. You can use these to check the status (but not the contents!) of your ballot online.

You take your cards to a voting booth, fill them in, and come back. A poll worker will then help you feed your ballot cards into a scanner machine. This machine will reject cards with any problems — which you can fix, rerequesting new ballot cards if necessary, but you then have to spoil and return the old ballot card.

The machine keeps an externally-visible tally of the number of ballots submitted, and an internal tally of all the votes made, ignoring write-ins. It also internally stores ballot cards in one of two bins (depending on write-ins). These bins are verified to be empty when polls open, and are inaccessible till polls close.

It’s important to note that the scanner is not a load-bearing component of the system: It could be replaced with a locked bin with a slot, and the system would still work. The scanner enables one to get preliminary results for the precinct, and provides a way to double-check results.

And that’s it! You’ll be given an I Voted sticker, and you can go home!

### Using a voting machine

In case you think you will have trouble filling out a ballot card in pen (e.g. if your vision is heavily impared), there’s an alternate way to vote that doesn’t involve a pen. Instead, we have a machine which has a touchscreen and an audio unit, which prompts the voter for their selection for each ballot item on the touchscreen or audio unit. When they’re done, the machine will print out a “receipt” listing their choices inside a sealed box with a glass window, so they can verify that their vote was recorded correctly6. Once they’re done the sealed box will scroll the “receipt” out of view so that the next voter can’t see it.

The sealed box is called a Voter-Verified Paper Trail box: the election runners no longer need to trust the machine’s internal memory, they can trust the paper trail inside the box (which, while produced by a potentially-untrustworthy machine, was verified by the voters), and the machine’s internal memory is simply a way to double-check (and get fast preliminary results).

### Provisional voting

There are many, many situations in which you may not be able to vote normally. Perhaps you showed up at the wrong precinct but don’t have time to go to the right one. Perhaps you were signed up for vote-by-mail but didn’t receive (or lost) your ballot. Perhaps you recently moved into the county and weren’t able to register in time. Perhaps you were a first-time in-person voter and didn’t have some form of ID.

In such a case you can always vote provisionally. The beauty of this system is that it removes most liability from poll workers: we don’t have any reason to turn people away from the polls, all we can do is refuse to let people vote normally (and instead vote provisionally) in the case of any inconsistencies. This is not to say that malicious poll workers can’t turn people away; it’s illegal but it happens. But well-meaning poll workers cannot, by accident, disenfranchise a voter because we are always allowed to give them a provisional ballot, and that’s an easy rule to follow.

With provisional voting, the voters are given the same ballot cards, but they’re also given an envelope with a form on it. This envelope is equivalent to a voter registration form, (re)registering them in their appropriate county/district7. They vote on the ballot cards normally, but instead of submitting the ballots to the scanner, they put them in the envelope, which goes into a sealed bin8. You’re also given a number you can call to check the status of your ballot.

When you vote provisionally, the registrar of voters will manually process your envelope, remaking your ballot on the right set of cards if necessary, and feeding them into a scanner machine.

### Integrity checks

Underlying this system is a bevy of integrity checks. There’s an intricate seal system, with numbered seals of varying colors. Some are to be added and never removed, some are to be removed after checking the number, some are never supposed to be touched, some are added at the beginning of the day and removed at the end of the day.

For example, during setup we check that the bins in the scanner are empty, and seal it with a numbered seal. This number is noted down on a form, along with some numbers from the scanner/touchscreen display. The first person to vote is asked to verify all this, and signs the form along with the poll workers.

Election officials drop in multiple times during the day, and may check these numbers. At the end of the day, the numbers of all seals used, and any physical seals that were removed are sent back along with all the ballots.

Various ballot counts are also kept track of. We keep track of the number of provisional ballots, the number of submitted regular ballots (also kept track by the scanner), the number of ballot cards used, and the number of unused ballots left over. Everything needs to match up at the end of the day, and all unused ballots are sent back. These counts are also noted down.

Poll watchers are allowed to be around for most of this, though I think they’re not allowed to touch anything. I think poll watchers are also allowed to be around when the actual ballots are being counted by election officials.

### Immediate local results

As mentioned before, the scanner isn’t a crucial part of the system, but if it happens to be working it can be used to get immediate local results. At the end of the day, the scanner prints out a bunch of stuff, including vote totals for races which got more than N votes (N=20, IIRC), so you get immediate results for your precinct. This printout is supposed to be taped to the polling place doors for everyone to see, and presumably the registrar of voters uses the copy submitted to them to publish quick preliminary results.

Using paper ballots doesn’t mean that we have to give up all the benefits of computers doing some of the work for us! We can still use computers to get fast results, without relying on them for the integrity of the system.

## Properties of this voting system

This system has some crucial properties.

### Secret ballot

It’s well known that nobody is supposed to be able to see who you voted for. But a crucial part of this system is that, once you submit your ballots, you can’t see who you voted for either. Of course, you probably can remember, but you have no proof. On the face of it this sounds like a bad property — wouldn’t it be nicer if people could verify that their vote was counted correctly?

The problem is that if I can verify that my vote was counted correctly, someone else can coerce me into doing this in front of them to ensure I voted a certain way. Any system that gives me the ability to verify my vote gives people who have power over me (or just people who want to buy my vote) the same ability.

Provisional voting doesn’t quite have this property, but it’s supposed to be for edge cases. Vote by mail trades off some of this property for convenience; people can now see who you voted for while you’re voting (and the people you live with can fradulently vote on your behalf, too).

### Conservation of ballots (Auditable paper trail)

The total number of ballots in the system is roughly conserved and kept track of. If you’re registered to vote by mail, you cannot request a normal ballot without surrendering your vote by mail ballot and envelope (which we mark as spoiled and put in a separate pouch). If you re-request a ballot card because you made a mistake, the old card needs to be similarly spoiled and put away separately. It’s one set of ballot cards per voter, and almost all potential aberrations in this property result in a provisional vote9. Even provisional votes are converted to normal ballot cards in the end.

Eventually, there will be a giant roomful of ballots that cannot be traced back to their individual voters, but it can still be traced back to the entirety of the voters — it’s hard to put a ballot into the system without a corresponding voter. This is perfect — the ballots can be hand-counted, but they can’t be individually corellated with their respective voters.

You don’t even need to recount the entire set of ballots to perform an audit, risk limiting audits are quite effective and much more efficient to carry out.

### Paper ballots

The fact that they can (and should) be hand counted is itself an important property. Hand counting of ballots can be independently verified in ways that can’t be done for software. Despite not being able to trace a ballot back to its voter, there still is a paper trail of integrity for the ballots as a bulk unit.

This property leads to [software independance]: while we may use software in the process, it’s not possible for a bug in the software to cause an undetectable error in the final vote counts.

### Obviousness

Figuring out what to do in the voting booth isn’t hard. You’re allowed to request assistance, but you’ll rarely have to. There are systems (like the scanner’s error checking) that are designed to ensure you don’t mess up, but the system is quite sound even without them; they just provide an additional buffer.

Compare this with the problems some Texas voting machines had last midterm. The machines were somewhat buggy, but, crucially, there was an opaque right and wrong way to use them, and some voters accidentally used it the wrong way, and then didn’t check the final page before submitting. This kind of thing should never happen in a good voting system.

It’s really important that the system is intuitive and hard to make mistakes in.

## Fraud prevention

So, how is this robust against fraud?

Firstly, voter fraud isn’t a major problem in the US, and it’s often used as an excuse to propagate voter suppression tactics, which are a major problem here.

But even then, we probably want our system to be robust against fraud.

Let’s see how an individual might thwart this system. They could vote multiple times, under assumed identites. This doesn’t scale well and isn’t really worth it: to influence an election you’d need to do this many times, or have many individuals do it a couple times, and the chance of getting caught (e.g., the people who you are voting as may come by and try to vote later, throwing up a flag) and investigated scales exponentially with the number of votes. That’s not worth it at all.

Maybe poll workers could do something malicious. Poll worker manipulation would largely exist in the form of submitting extra ballots. But that’s hard because the ballot counts need to match the list of voters. So you have the same problem as individual voters committing fraud: if the actual voter shows up, they’ll notice. Poll workers could wait till the end of the day to do this, but then to make any kind of difference you’d have to do a bulk scan of ballots, and that’s very noticeable. Poll workers would have to collude to make anything like this work, and poll watchers (and government staff) may be present.

Poll workers can also discard ballots to influence an election. But you can’t do this in front of the voters, and the receptacles with non-defaced ballots are all sealed so you can’t do this when nobody is watching without having to re-seal (which means you need a new numbered seal, which the election office will notice). The scanner’s inner receptacle is opened at the end of the day but you can’t tamper with that without messing up the various counts.

Election officials have access to giant piles of ballots and could mess with things there, but I suspect poll watchers are present during the ballot sorting and counting process, and again, it’s hard to tamper with anything without messing up the various counts.

Overall, this system is pretty robust. It’s important to note that fraud prevention is achieved by more social means, not technical means: there are seals, counts, and various properties of the system, but no computers involved in any crucial roles.

## Techy solutions for voting

In general, amongst the three properties of “secret ballot”, “obviousness”, and “auditable paper trail”, computer-based voting systems almost always fail at one, and usually fail at two.

A lot of naïve tech solutions for voting are explicitly designed to not have the secret ballot property: they are instead designed specifically to let voters check that what their vote was counted as after the election. As mentioned earlier, this is a problem for vote-buying and coercion.

It’s theoretically possible to have a system where you can ensure your ballot, specifically, was correctly counted after the election, without losing the secret ballot property: ThreeBallot is a cool example of such a system, though it fails the “obviousness” property.

Most systems end up not having an auditable paper trail since they rely on machines to record votes. This is vulnerable to bugs in the machine: you end up having to trust the output of the machine. Buggy/vulnerable voting machines are so common that every year at DEFCON people get together to hack the latest voting machines, and typically succeed.

Voting machines can still produce a paper trail: Voter-Verified Paper Trail systems partially succeed in doing this. They’re not as good with maintaining the “conservation of ballots” property that makes tampering much harder, and they’re not as good on the “obviousness” part since people need to check the VVPAT box for what their vote was recorded as.

Ballot-Marking devices are a bit better at this: These still produce paper ballots, it’s just that the ballot is marked by the machine on your behalf. There’s still a bit of an “obviousness” fail in that people may not double check the marked ballot, but at least there’s a nice paper trail with ballot conservation! Of course, these only work if the produced ballot is easily human-readable.

It’s not impossible to design good voting systems that rely on technology, but it’s hard to maintain the same properties you can with paper ballots. If you want to try, please keep the properties listed above in mind.

### Blockchain?

Every now and then people will suggest using blockchains for voting. This is a pretty large design space, but …. most of these proposals are extremely naïve and achieve very little.

For one, most of them are of the category that lose the “secret ballot” property, instead producing some kind of identifier you’re able to check in some published blockchain. This lets you see what your vote was after the election, and as I’ve covered already that’s not a good thing.

Even if this process only lets you verify that your vote was counted (but not what it was), it typically involves some understanding of cryptography to spot-check the validity of the machine output (e.g. you need to verify that some hash is the hash of your actual vote or something). This fails the obviousness property.

Blockchains don’t really bring much to the table here. They’re decent for byzantine fault tolerance in a space without a central authority, but elections do have some form of central authority and we’re not getting rid of that. The anonymity properties of blockchains can usually be achieved without blockchains for things like elections.

There are some kinds of cryptography that can be useful for auditability — zero knowledge proofs and homomorphic encryption come to mind — but you don’t need blockchains to use these, and using these still requires some form of technology as a key part of the voting system and this makes other properties of the system harder to achieve.

## Become a poll worker!

It’s still a bit early for the next election, but I highly recommend you volunteer to be a poll worker for your county if you can!

It’s really fun, you get to learn about the inner workings of voting systems, and you get to meet a lot of people!

Thanks to Nika Layzell, Sunjay Varma, Jane Lusby, and Arshia Mufti for providing feedback on drafts of this blog post.

1. Last year they required postage, but I they’ve changed that with a law this year. Yay!

2. Ostensibly because of fears of voter fraud, but they’re largely unfounded — in practice this just reduces turnout

3. I think for Alameda county the only such office is the Registrar of Voters in Oakland

4. The crossing-off and signing lists are different, but this isn’t too important.

5. I remember one particularly curmudgeonly voter loudly grumbling about all the propositions as they were voting. One doesn’t “vote” in California, one fills out social studies homework.

6. I don’t quite recall how the verifiability works for people using the audio unit, they may be allowed to ask someone else to verify for them?

7. If you vote in a different precinct, or worse, a different county, the ballot cards may not contain all the same races, so voting provisionally from the wrong district means that you only get to vote for the races common to both ballot cards.

8. It’s imperative that these do not go into the scanner (since that operation cannot be undone), and to prevent this poll workers are instructed to not give provisional voters a secrecy sleeve as the envelope acts as a secrecy sleeve. Whoever is supervising the scanner will only allow people with secrecy sleeves to slip their ballots into the scanner.

9. The exception is using the touchscreen machine, where you get to vote without using up a ballot card on voting day. However, tallies for the machine are kept separately, and I think these too are eventually turned into normal ballot cards.

# Rust Governance: Scaling Empathy

Posted by Manish Goregaokar on February 04, 2019 in programming, rust

There’s been a lot of talk about improving Rust’s governance model lately. As we decompress from last year’s hectic edition work, we’re slowly starting to look at all the bits of debt we accumulated, and organizational debt is high on that list.

I’ve been talking in private with people about a bunch of these things for quite a while now, and I felt it worthwhile to write down as much of my thoughts as I can before the Rust All Hands in Berlin this week.

In the interest of brevity1 I’m going to assume the reader is roughly familiar with most of the stuff that’s happened with the Rust community in the past few years. I’m probably going to omit concrete examples of incidents, both to avoid mischaracterizing individual actions (as with most such analyses, I wish to talk in more general terms about trends), and also just because it would take me forever to write this if I were to supply all the layers of context. If you feel something is inaccurate, please let me know.

This blog post is probably going to reach the eyes of non-Rust-community members. You’re welcome to read it, but please accept my apologies in advance if it doesn’t make any sense. This is something that I initially planned to circulate as a private post (writing for a general audience is hard), but I felt this would be more widely useful. However due to time constraints I haven’t had time to edit it to make it acceptable to a wider audience.

## The symptoms

Before I actually get into it, I’d like to carefully delineate what the problem is that I’m talking about. Or more accurately, the symptoms I am talking about — as I’ll explain soon I feel like these are not the actual problem but symptoms of a more general problem.

Basically, as time has gone by our decisionmaking process has become more and more arduous, both for community members and the teams. Folks have to deal with:

• The same arguments getting brought up over and over again
• Derailing
• Not feeling heard
• Just getting exhausted by all the stuff that’s going on

The RFC process is the primary exhibitor of these symptoms, but semi-official consensus-building threads on internals.rust-lang.org have similar problems.

Aaron has written some extremely empathetic blog posts about a bunch of these problems, starting with concrete examples and ending with a takeaway of a bunch of values for us to apply as well as thoughts on what our next steps can be. I highly recommend you read them if you haven’t already.

Fundamentally I consider our problems to be social problems, not technical ones. In my opinion, technical solutions like changing the discussion forum format may be necessary but are not sufficient for fixing this.

## The scaling problem

I contend that all of these issues are symptoms of an underlying scaling issue, but also a failure of how our moderation works.

The scaling issue is somewhat straightforward. Such forum discussions are inherently N-to-N discussions. When you leave a comment, you’re leaving a comment for everyone to read and interpret, and this is hard to get right. It’s much easier to have one-on-one discussions because it’s easy to build a shared vocabulary and avoid misunderstandings; any misunderstandings can often be quickly detected and corrected.

I find that most unpleasant technical arguments stem from an unenumerated mismatch of assumptions, or sometimes what I call a mismatch of axioms (i.e. when there is fundamental conflict between core beliefs). A mismatch of assumptions, if identified, can be resolved, leading to an amicable conclusion. Mismatches of axioms are harder to resolve, however recognizing them can take most of the vitriol out of an argument, because both parties will understand each other, even if they don’t agree. In such situations the end result may leave one or both parties unhappy, but rarely angry. (It’s also not necessary that axiom mismatches leave people unhappy, embracing positive sum thinking helps us come to mutually beneficial conclusions)

All of these mismatches are easy to identify in one-on-one discussions, because it’s easy to switch gears to the meta discussion for a bit.

One-on-one discussions are pleasant. They foster empathy.

N-to-N discussions are not. It’s harder to converge on this shared vocabulary amongst N other people. It’s harder to identify these mismatches, partly because it’s hard to switch into the meta-mode of a discussion at all, but also because there’s a lot going on. It’s harder to build empathy.

As we’ve grown, discussion complexity has grown quadratically, and we’re not really attempting to relinearize them.

### Hanabi and parallel universes

I quite enjoy the game of Hanabi. It’s a game of information and trust, and I find it extremely fun, especially with the right group.

Hanabi is a cooperative game. You can see everyone’s cards (or tiles) but your own, and information-sharing is severely restricted. The goal is to play the right cards in the right order to collectively win. The gimmick is to share additional information through the side-channel of the choice of move you make.

A very common occurrence in this game is that people start making plans in their mind. You typically have a decent understanding of what information everyone has, and you can use this to make predictions as to what everyone’s moves will be. With this in mind, you can attempt to “set up” situations where the game progresses rapidly in a short period of time. This is somewhat necessary for the game to work, but a common pitfall is for these plans to be extremely elaborate, leading to frustration as the game doesn’t actually play out as planned.

The core issue behind this is forgetting that you actually can’t see the entire game state, since your own cards are hidden. It’s not just you who has plans — everyone does! And each of those plans is incomplete since they’re missing a piece of the picture, just as you are.

In Hanabi it’s very easy to forget that you’re missing a piece of the picture — in competitive card games you mostly can’t see the game state since everyone else’s cards are hidden. But in Hanabi you can see most of the cards and it’s easy to forget that your own four cards are hidden from you.

So what ends up happening is that due to incomplete information, everyone is operating in their own little parallel universe, occasionally getting frustrated when it becomes clear that other players are not operating in the same universe. As long as you recognize the existence of these parallel universes beforehand you’re fine, but if you don’t you will be frustrated.

This is largely true of N-to-N discussions as well. Because most of what’s being said makes sense to an individual in a particular way, it’s very easy for them to forget that other people may not share your assumptions and thus may be on a different page. Every time someone leaves a comment, different people may interpret it differently, “forking” the common understanding of the state of the discussion into multiple parallel universes. Eventually there are enough parallel universes that everyone’s talking past each other.

One thing I often prefer doing in such cases is to have a one on one discussion with people who disagree with me — typically the shared understanding that is the end result of such discussions is super useful and can be brought back to the discussion as something that all participants interpret the same way. I’m not consistent in doing this — in the midst of a heated argument it’s easy to get too wrapped up in the argument to think about getting results and I’ve certainly had my time arguing instead of resolving — but overall whenever I’ve chosen to do this it’s been a useful policy.

This is a good example of how relinearization and communication can help move N-to-N discussions along. Operating in different parallel universes is kind of the point of Hanabi, but it’s not the point of having a technical discussion.

## The moderation problem

In a technical discussion, broadly speaking, I find that there are three kinds of comments disagreeing with you:

• Constructive: Comments which disagree with you constructively. We’re glad these exist, disagreement can hurt but is necessary for us to collaboratively reach the best outcomes.
• Disruptive: Comments which may be written in good faith but end up being disruptive. For example, this includes people who don’t read enough of the discussion and end up rehashing the same points. It also includes taking discussions off topic. These kinds of things are problematic but not covered by the code of conduct.
• Abrasive: Comments which are rude/abrasive. These are covered by the code of conduct. The mod team tries to handle these.

(For a long time I and Aaron had a shared vocabulary of “Type A, B, C” for these, mostly because I’m often unimaginative when it comes to such things, thanks to Mark for coming up with, better, descriptive titles)

Note that while I’m talking about “disruptive” comments it’s not a judgement on the intent of the participants, but rather a judgement on the harm it has caused.

The second category – disruptive comments – are the thing we’re currently unable to handle well. They snowball pretty badly too — as more and more of these collect, more and more people get frustrated and in turn leave comments that cause further disruption. As the discussion progresses into more and more “parallel universes” it also just becomes easier for a comment to be disruptive.

The Rust moderation team operates mostly passively, we simply don’t have the scale2 to watch for and nip these things in the bud. Active moderation requires a degree of involvement we cannot provide. So while the best response would be to work with participants and resolve issues early as we see them crop up, we typically get pulled in at a point where some participants are already causing harm, and our response has to be more severe. It’s a bit of a catch-22: it’s not exactly our job to deal with this stuff3, but by the time it becomes our job (or even, by the time we notice), most acceptable actions for us to take are extremely suboptimal. The problem with passive moderation is that it’s largely reactive — it’s harder to proactively nudge the discussion in the right direction when you don’t even notice what’s going on until it’s too late. This is largely okay for dealing with bad-faith actors (the main goal of the mod team); it’s hard to prevent someone from deciding to harass someone else. But for dealing with disruptive buildups, we kind of need something different.

## Participation guidelines

Part of the solution here is recognizing that spaces for official discussion are different from community hangout spaces. Our code of conduct attempts to handle abrasive behavior, which can disrupt discussions anywhere, but the comments that can disrupt consensus building in official discussions aren’t really covered. Nor are the repercussions of code of conduct violations really appropriate for such disruptive comments anyway.

A proposal I’ve circulated in the past is to have a notion of participation guidelines. Discussions in team spaces (RFCs, pre-RFCs, discord/zulip/IRC channels during team meetings) follow a set of rules set forth by the individual teams. It might be worth having a base set of participation guidelines defined by the core team. Something like the following is a very rough strawman:

• Don’t have irrelevant discussions during team meetings on Discord/IRC/Zulip
• Don’t take threads off topic
• Don’t rehash discussions

We ask people to read these before participating, but also breaking these rules isn’t considered serious, it just triggers a conversation (and maybe the hiding/deletion of a comment). If someone repeatedly breaks these rules they may be asked to not participate in a given thread anymore. The primary goal here is to empower team members to better deal with disruptive comments by giving them a formalized framework. Having codified rules helps team members confidently deal with such situations without having to worry as much about drawing direct ire from affected community members.

A base participation guidelines document can also be a value statement, not just a set of rules but also set of values. These values can be things like:

• “We explicitly value high empathy interactions”
• “How everyone is feeling is everyone’s business”

(h/t Adam for the articulate wording here)

Having such words written somewhere — both the high level values we expect people to hold, and the individual behaviors we expect people to exhibit (or not exhibit) — is really valuable in and of itself, even if not enforced. The value of such documents is not that everyone reads them before participating — most don’t — but they serve as a good starting point for people interested in learning how to best conduct themselves, as well as an easy place to point people to where they’re having trouble doing so.

On its own, I find that this is a powerful framework but may not achieve the goal of improving the situation. I recently realized that this actually couples really well with a different idea I’ve been talking about for quite a while now, the idea of having facilitators:

## Facilitators

A common conflict I see occurring is that in many cases it’s a team’s job to think about and opine on a technical decision, but it’s also the team’s job to shepherd the discussion for that decision. This often works out great, but it also leads to people just feeling unheard. It kinda hurts when someone who has just strongly disagreed with you goes on to summarize the state of the discussion in a way that you feel you’ve been unfairly represented. The natural response to that for most people isn’t to work with that person and try to be properly represented, it’s to just get angry, leading to less empathy over time.

By design, Rust team members are partisan. The teams exist to build well-informed, carefully crafted opinions, and present them to the community. They also exist to make final decisions based on the results of a consensusbuilding discussion, which can involve picking sides. This is fine, there is always going to be some degree of partisanship amongst decisionmakers, or decisions would not get made.

Having team members also facilitate discussions is somewhat at odds with all of this. Furthermore, I feel like they don’t have enough bandwidth to do this well anyway. Some teams do have a concept of “sheriffs”, but this is more of an onramp to full team membership and the role of a sheriff is largely the same as the role of a team member, just without a binding vote.

I feel like it would be useful to have a group of (per-team?) facilitators to help with this. Facilitators are people who are interested in seeing progress happening, and largely don’t have much of an opinion on a given discussion, or are able to set aside this opinion in the interest of moving a discussion forward. They operate largely at the meta level of the discussion. Actions they may take are:

• Summarizing the discussion every now and then
• Calling out one sided discussions
• Encouraging one-on-one tangents to be discussed elsewhere (perhaps creating a space for them, like an issue)
• Calling out specific people to do a thing that helps move the discussion forward. For example, something like “hey @Manishearth, I noticed you’ve been vocal in arguing that Rust should switch to whitespace-sensitive syntax, could you summarize all the arguments made by people on your side?” would help.
• Reinforcing positive behavior
• Occasionally pinging participants privately to help them improve their comments
• Attempting to identify the root cause of a disagreement, or empowering people to work together to identify this. This one is important but tricky. I’ve often enjoyed doing it — noticing the core axiomatic disagreement at play and spelling it out is a great feeling. But I’ve also found that it’s incredibly hard to do when you’re emotionally involved, and I’ve often needed a nudge from someone else to get there.

At a high level, the job of the facilitators is to:

• help foster empathy between participants
• help linearize complex discussions
• nudge towards cooperative behavior, away from adversarial behavior. Get people playing not to win, but to win-win.

It’s important to note that facilitators don’t make decisions — the team does. In fact, they almost completely avoid making technical points, they instead keep their comments largely at the meta level, perhaps occasionally making factual corrections.

The teams could do most of this themselves4, but as I’ve mentioned before it’s harder for others to not perceive all of your actions as partisan when some of them are. Furthermore, it can come off as patronizing at times.

This is also something the moderation team could do, however it’s much harder to scale the moderation team this way. Given that the moderation team deals with harassment and stuff like that, we need to be careful about how we build it up. On the other hand facilitating discussions is largely a public task, and the stakes aren’t as high: screwups can get noticed, and they don’t cause much harm. As a fundamentally proactive moderation effort, most actions taken will be to nudge things in a positive direction; getting this wrong usually just means that the status quo is maintained, not that harm is caused. Also, from talking to people it seems that while very few people want to be involved in moderating Rust, this notion of facilitating sounds much more fun and rewarding (I’d love to hear from people who would like to help).

And to me, this pairs really well with the idea of participation guidelines: teams can write down how they want discussions to take place on their venues, and facilitators can help ensure this works out. It’s good to look at the participation guidelines less as a set of rules and more as an aspiration for how we conduct ourselves, with the facilitators as a means to achieving that goal.

There are a lot of specifics we can twiddle with this proposal. For example, we can have a per-team group of appointed facilitators (with no overlap with the team), and for a given discussion one facilitator is picked (if they don’t have time or feel like they have strong opinions, try someone else). But there’s also no strong need for there to be such a group, facilitators can be picked as a discussion is starting, too. I don’t expect most discussions to need facilitators, so this is mostly reserved for discussions we expect will get heated, or discussions that have started to get heated. I’m not really going to spend time analysing these specifics; I have opinions but I’d rather have us figure out if we want to do something like this and how before getting into the weeds.

## Prospective outcomes

The real goal here is to bootstrap better empathy within the community. In an ideal world we don’t need facilitators, instead everyone is able to facilitate well. The explicitly non-partisan nature of facilitators is useful, but if everyone was able to operate in this manner it would largely be unnecessary. But as with any organization, being able to horizontally scale specific skills is really tricky without specialization.

I suspect that in the process of building up such a team of facilitators, we will also end up building a set of resources that can help others learn to act the same way, and eventually overall improve how empathetic our community is.

The concept of facilitators directly addresses the moderation problem, but it also handles the scaling problem pretty well! Facilitators are key in re-linearizing the n-to-n discussions, bringing the “parallel universes” together again. This should overall help people (especially team members) who are feeling overwhelmed by all the things that are going on.

This also helps with concerns people have that they’re not getting heard, as facilitators are basically posed as allies on all sides of the argument; people whose primary goal is to help communication happen.

Overall what I’ve proposed here isn’t a fully-formed idea; but it’s the seed of one. There are a lot of interesting bits to discuss and build upon. I’m hoping through this post we might push forward some of the discussions about governance — both by providing a strawman idea, as well as by providing a perspective on the problem that I hope is useful.

I’m really interested to hear what people think!

Thanks to Aaron, Ashley, Adam, Ember, Arshia, Michael, Sunjay, Nick and other people I’ve probably forgotten for having been part of these discussions with me over the last few years, helping me refine my thoughts

1. I am way too verbose for “brief” to be an accurate description of anything I write, but might as well try

2. Scaling the moderation team properly is another piece of this puzzle that I’m working on; we’ve made some progress recently.

3. I helped draft our moderation policy, so this is a somewhat a lack of foresight on my part, but as I’ll explain later it’s suboptimal for the mod team to be dealing with this anyway.

4. In particular, I feel like Aaron has done an excellent and consistent job of facilitating discussions this way in many cases.

# Converting a WebGL Application to WebVR

Posted by Manish Goregaokar on September 11, 2018 in js, programming, web

I wrote a post for Mozilla Hacks on converting WebGL applications to WebVR, you can read it there

# Why I Enjoy Blogging

Posted by Manish Goregaokar on August 26, 2018 in programming, writing

I started this blog three years ago, moving from my older blog, hoping to written about programming, math, physics, books, and miscellenia. I’ve not quite written about everything I wanted to, but I’ve been very very happy with the experience of blogging. wc says I’ve written almost 75k words, which is mind-boggling to me!

I often get asked by others — usually trying to decide if they should start blogging — what it’s like. I also often try to convince friends to blog by enumerating why I think it’s awesome. Might as well write it down so that it’s generally useful for everyone! 😃

## Blogging helps cement my understanding of things!

I’ve often noticed that I’ll start blogging about something I think I understand, and it turns out that my understanding of the subject was somewhat nebulous. Turns out it’s pretty easy to convince ourselves that we understand something.

The act of writing stuff down helps cement my own understanding — words are usually not as nebulous as thoughts so I’m forced to figure out little details.

I recall when I wrote my post on how Rust’s thread safety guarantees work, I thought I understood Send and Sync in Rust. I understood what they did, but I didn’t have a clear mental model for them. I obtained this mental model through the process of writing the post; to be able to explain it to others I had to first explain it to myself.

I point out this post in particular because this was both one of the first posts for me where I’d noticed this, and, more importantly, my more concrete mental model led to me finding a soundness bug in Rust’s standard library. When I was thinking about my mental model I realized “an impl that looks like this should never exist”, so I grepped the source code and found one1.

I’ve even noticed a difference between one-on-one explaining and explaining things through blog posts. I love explaining things one-on-one, it’s much easier to tailor the explanation to the other person’s background, as well as what they’re actually asking for help with. Plus, it’s interactive. A lot of my posts are of the “okay I get this question a lot I’m going to write down the answer so I don’t have to repeat myself” kind and I’ve found that I’ve often learned things from these despite having talked about the thing in the article contents multiple times.

I guess it’s basically that blogging is inherently one-many — you’re trying to explain to a whole group of people with varied backgrounds — which means you need to cover all your bases2 and explain everything together instead of the minimum necessary.

## It’s really fun to revisit old posts!

Okay, I’ll admit that I never really write blog posts with this in mind. But when I do reread them, I’m usually quite thankful I wrote them!

I’m a fan of rereading in general, I’ve reread most of my favorite books tens of times; I make a yearly pilgrimage to James Mickens’ website; I reread many of my favorite posts and articles on the internet; and I often reread my own posts from the past.

Sometimes I’ll do it because I want a refresher in a topic. Sometimes I’ll do it because I’m bored. Whatever the reason, it’s always been a useful and fun thing to do.

Rereading old posts is a great way to transport myself back to my mindset from when I wrote the post. It’s easy to see progress in my understanding of things as well as in my writing. It’s interesting to note what I thought super important to include in the post then that I consider totally obvious now3. It’s interesting to relearn what I’ve forgotten. It’s reassuring to realize that my terrible jokes were just as terrible as they are now.

One of my favorite posts to reread is this really long one on generalized zero knowledge proofs. It’s the longest post I’ve written so far4, and it’s on a topic I don’t deal with often — cryptography. Not only does it help put me back in a mindset for thinking about cryptography, it’s about something super interesting but also complicated enough that rereading the post is like learning it all over again.

## It lets me exercise a different headspace!

I like programming a lot, but if programming was all I did, I’d get tired pretty quickly. When I was a student learning physics I’d often contribute to open source in my spare time, but now I write code full time so I’m less inclined to do it in my free time5.

But I still sometimes feel like doing programmery things in my spare time just … not programming.

Turns out that blogging doesn’t tire me out the same way! I’m sure that if I spent the whole day writing I’d not want to write when I go home, but I don’t spend the whole day writing, so it’s all good. It’s refreshing to sit down to write a blog post and discover a fresh reserve of energy. I’m not sure if this is the right term, but I usually call this “using a different headspace”.

I’ve also started using this to plan my work, I mix up the kinds of headspace I’m employing for various tasks so that I feel energetic throughout the day.

This is also why I really enjoy mentoring — mentoring often requires the same effort from me as fixing it myself, but it’s a different headspace I’m employing so it’s less tiring.

## Blogging lets me be lazy!

I often find myself explaining things often. I like helping folks and explaining things, but I’m also lazy6, so writing stuff down really makes stuff easy for me! If folks ask me a question I can give a quick answer and then go “if you want to learn more, I’ve written about it here!”. If folks are asking a question a lot, there’s probably something missing in the documentation or learning materials about it. Some things can be fixed upstream in documentation, but other things — like “how should I reason about modules in Rust?” deserve to be tackled as a separate problem and addressed with their own post.

(Yes, this post is in this category!)

## It’s okay if folks have written about it before!

A common question I’ve gotten is “Well I can write about X but … there are a lot of other posts out there about it, should I still?”

Yes!!

People think differently, people learn differently, and people come from different backgrounds. Existing posts may be useful for some folks but less useful for others.

My personal rule of thumb is that if it took me some effort to understand something after reading about it, that’s something worth writing about, so it’s easier to understand for others like me encountering the subject.

One of my favorite bloggers, Julia Evans very often writes posts explaining computer concepts. Most of the times these have been explained before in other blog posts or manuals. But that doesn’t matter — her posts are different, and they’re amazing. They’re upbeat, fun to read, and often get me excited to learn more about things I knew about but never really looked at closely before.

## I kinda feel it’s my duty to?

There’s a quote by Toni Morrison I quite enjoy:

I tell my students, ‘When you get these jobs that you have been so brilliantly trained for, just remember that your real job is that if you are free, you need to free somebody else. If you have some power, then your job is to empower somebody else. This is not just a grab-bag candy game.

I enjoy it so much I concluded my talk at RustFest Kyiv with it!

I have the privilege of having time to do things like blogging and mentoring. Given that, I feel that it really is my duty to share what I know as much as possible; to help others attempting to tread the path I’m treading; and to battle against tribal knowledge.

When it comes to programming I’m mostly “self-taught”. But when I say that, I really mean that I wasn’t taught in a traditional way by other humans — I learned things by trying stuff out and reading what others had written. I didn’t learn Rust by taking rustc and pretending to be a fuzzer and just trying random nonsense till stuff made sense, I went through the tutorial (and then started exploring by trying random stuff). I didn’t figure out cool algorithms by discovering them from first principles, I picked them up from books and blog posts. I’m “self-taught” because I’ve been in charge of my learning process, but I’ve definitely relied on the work of other people throughout this process.

This means that for me, personally, knowledge-sharing is especially important. If I had to spend time figuring something out, I should make it easier for the next people to try7.

(All this said, I probably don’t blog as much as I should)

## You should blog too!

I wish everyone wrote more. I know not everyone has the time/privilege to do this, but if you do, I urge you to start!

I feel like tips on how to blog would fill up an entire other blog post, but Julia Evans has multiple posts on this that I strongly recommend. Feel free to ask me for review on posts!

As for the technicalities of setting up a blog, my colleague Emily recently wrote a great post about doing this with Jekyll. This blog uses Octopress which is similar to set up.

Thanks to Arshia, QuietMisdreavus, and Alex for reviewing drafts of this blog post.

1. Who needs to look for unsoundness with rigorous formal verification when you have grep

2. Incidentally, I find there’s a similar dynamic when it comes to forum discussions vs hashing things out one-on-one, it’s way harder to get anywhere with forum discussions because they’re one-many and you have to put in that much more work to empathize with everyone else and also phrase things in a way that is resilient to accidental misinterpretation.

3. This is especially important as I get more and more “used” to subjects I’m familiar with – it’s easy to lose the ability to explain things when I think half of it is obvious.

4. This is probably the real reason I love rereading it — I like being verbose and would nest parentheses and footnotes if society let me

5. I also am in general less inclined to do technical things in my free time and have a better work-life balance, glad that worked out!

6. See blog title

7. One of my former title ideas for this post was “Knowledge is Theft”, riffing off of this concept, but I felt that was a bit too tongue-in-cheek.

# The Future of Clippy

Posted by Manish Goregaokar on June 05, 2018 in programming, rust

We’ve recently been making lots of progress on future plans for clippy and I thought I’d post an update.

For some background, Clippy is the linter for Rust. We have more than 250 lints, and are steadily growing.

## Clippy and Nightly

Sadly, Clippy has been nightly-only for a very long time. The reason behind this is that to perform its analyses it hooks into the compiler so that it doesn’t have to reimplement half the compiler’s info to get things like type information. But these are internal APIs and as such will never stabilize, so Clippy needs to be used with nightly Rust.

We’re hoping this will change soon! The plan is that Clippy will eventually be distributed by Rustup, so something like rustup component add clippy will get you the clippy binary.

The first steps are happening, we’re planning on setting it up so that when it compiles Rustup will be able to fetch a clippy component (however this won’t be the recommended way to use clippy until we figure out the workflow here, so sit tight!)

Eventually, clippy will probably block nightlies1; and after a bunch of cycles of letting that work itself out, hopefully clippy will be available with the stable compiler. There’s a lot of stuff that needs to be figured out, and we want to do this in a way that minimally impacts compiler development, so this may move in fits and starts.

## Lint audit

A couple months ago Oliver and I2 did a lint audit in Clippy. Previously, clippy lints were classified as simply “clippy”, “clippy_pedantic”, and “restriction”. “restriction” was for allow-by-default lints for things which are generally not a problem but may be something you specifically want to forbid based on the situation, and “pedantic” was for all the lints which were allow-by-default for other reasons.

Usually these reasons included stuff like “somewhat controversial lint”, “lint is very buggy”, or for lints which are actually exceedingly pedantic and may only be wanted by folks who very seriously prefer their code to be perfect.

We had a lot of buggy lints, and these categories weren’t as helpful. People use clippy for different reasons. Some folks only care about clippy catching bugs, whereas others want its help enforcing the general “Rust Style”.

So we came up with a better division of lints:

• Correctness (Deny): Probable bugs, e.g. calling .clone() on &&T, which clones the (Copy) reference and not the actual type
• Style (Warn): Style issues; where the fix usually doesn’t semantically change the code. For example, having a method named into_foo() that doesn’t take self by-move
• Complexity (Warn): For detecting unnecessary code complexities and helping simplify them. For example, replacing .filter(..).next() with .find(..)
• Perf (Warn): Detecting potential performance footguns, like using Box<Vec<T>> or calling .or(foo()) instead of or_else(foo).
• Pedantic (Allow): Controversial or exceedingly pedantic lints
• Nursery (Allow): For lints which are buggy or need more work
• Restriction (Allow): Lints for things which are not usually a problem, but may be something specific situations may dictate disallowing.

and applied it to the codebase. You can see the results on our lint list

Some lints could belong in more than one group, and we picked the best one in that case. Feedback welcome!

## Clippy 1.0

In the run up to making Clippy a rustup component we’d like to do a 1.0 release of Clippy. This involves an RFC, and pinning down an idea of stability.

The general plan we have right now is to have the same idea of lint stability as rustc; essentially we do not guarantee stability under #[deny(lintname)]. This is mostly fine since deny only affects the current crate (dependencies have their lints capped) so at most you’ll be forced to slap on an allow somewhere after a rustup.

With specifics, this means that we’ll never remove lints. We may recategorize them, or “deprecate” them (which makes the lint do nothing, but keeps the name around so that #[allow(lintname)] doesn’t break the build aside from emitting a warning).

We’ll also not change what individual lints do fundamentally. The kinds of changes you can expect are:

• Entirely new lints
• Fixing false positives (a lint may no longer lint in a buggy case)
• Fixing false negatives (A case where the lint should be linting but doesn’t is fixed)
• Bugfixes (When the lint panics or does something otherwise totally broken)

When fixing false negatives this will usually be fixing things that can be understood as comfortably within the scope of the lint as documented/named

I’ll be posting an RFC soonish that both contains this general plan of stability, as well as a list of the current lint categorization for folks to discuss.

Anyway, thought I’d just post a general update on everything, since stuff’s changing quickly.

There’s still time for stable or even just reliably rustuppable nightly clippy to happen but the path to it is pretty clear now!

1. As in, if clippy is broken there will not be a nightly that day. Rustfmt and RLS work this way right now AIUI.

2. Okay, mostly Oliver

# Down a Rusty Rabbit Hole

Posted by Manish Goregaokar on April 12, 2018 in programming, rust

Last week I fell down a rather interesting rabbit hole in Rust, which was basically me discovering a series of quirks of the Rust compiler/language, each one leading to the next when I asked “why?”.

It started when someone asked why autogenerated Debug impls use argument names like __arg_0 which start with a double underscore.

This happened to be my fault. The reason we used a double underscore was that while a single underscore tells rustc not to warn about a possibly-unused variable, there’s an off- by-default clippy lint that warns about variables that start with a single underscore that are used, which can be silenced with a double underscore. Now, the correct fix here is to make the lint ignore derive/macros (which I believe we did as well), but at the time we needed to add an underscore anyway so a double underscore didn’t seem worse.

Except of course, this double underscore appears in the docs. Oops.

Ideally the rustc derive infrastructure would have a way of specifying the argument name to use so that we can at least have descriptive things here, but that’s a bit more work (I’m willing to mentor this work though!). So I thought I’d fix this by at least removing the double underscore, and making the unused lint ignore #[derive()] output.

While going through the code to look for underscores I also discovered a hygiene issue. The following code throws a bunch of very weird type errors:

pub const __cmp: u8 = 1;

#[derive(PartialOrd, PartialEq)]
pub enum Foo {
A(u8), B(u8)
}


(playpen)

error[E0308]: mismatched types
--> src/main.rs:6:7
|
6 |     A(u8), B(u8)
|       ^^^ expected enum std::option::Option, found u8
|
= note: expected type std::option::Option<std::cmp::Ordering>
found type u8
.....


This is because the generated code for PartialOrd contains the following:

match foo.cmp(bar) {
Some(Ordering::Equal) => .....,
__cmp => __cmp,
}


__cmp can both be a binding to a wildcard pattern match as well as a match against a constant named __cmp, and in the presence of such a constant it resolves to the constant, causing type errors.

One way to fix this is to bind foo.cmp(bar) to some temporary variable x and use that directly in a _ => x branch.

I thought I could be clever and try cmp @ _ => cmp instead. match supports syntax where you can do foo @ <pattern>, where foo is bound to the entire matched variable. The cmp here is unambiguously a binding; it cannot be a pattern. So no conflicting with the const, problem solved!

So I made a PR for both removing the underscores and also fixing this. The change for __cmp is no longer in that PR, but you can find it here.

Except I hit a problem. With that PR, the following still breaks:

pub const cmp: u8 = 1;

#[derive(PartialOrd, PartialEq)]
pub enum Foo {
A(u8), B(u8)
}


throwing a slightly cryptic error:

error[E0530]: match bindings cannot shadow constants
--> test.rs:9:7
|
4 | pub const cmp: u8 = 1;
| ---------------------- a constant cmp is defined here
...
9 |     B(u8)
|       ^^^ cannot be named the same as a constant


You can see a reduced version of this error in the following code:

pub const cmp : u8 = 1;

fn main() {
match 1 {
cmp @ _ => ()
}
}


(playpen)

Huh. Wat. Why? cmp @ _ seems to be pretty unambiguous, what’s wrong with it shadowing a constant?

Turns out bindings cannot shadow constants at all, for a rather subtle reason:

const A: u8 = ...; // A_const
let A @ _ = ...; // A_let
match .. {
A => ...; // A_match
}


What happens here is that constants and variables occupy the same namespace. So A_let shadows A_const here, and when we attempt to match, A_match is resolved to A_let and rejected (since you can’t match against a variable), and A_match falls back to resolving as a fresh binding pattern, instead of resolving to a pattern that matches against A_const.

This is kinda weird, so we disallow shadowing constants with variables. This is rarely a problem because variables are lowercase and constants are uppercase. We could technically allow this language-wise, but it’s hard on the implementation (and irrelevant in practice) so we don’t.

So I dropped that fix. The temporary local variable approach is broken as well since you can also name a constant the same as the local variable and have a clash (so again, you need the underscores to avoid surprises).

But then I realized that we had an issue with removing the underscores from __arg_0 as well.

The following code is also broken:

pub const __arg_0: u8 = 1;

#[derive(Debug)]
struct Foo(u8);


(playpen)

error[E0308]: mismatched types
--> src/main.rs:3:10
|
3 | #[derive(Debug)]
|          ^^^^^ expected mutable reference, found u8
|
= note: expected type &mut std::fmt::Formatter<'_>
found type u8


You can see a reduced version of this error in the following code:

pub const __arg_0: u8 = 1;

fn foo(__arg_0: bool) {}

error[E0308]: mismatched types
--> src/main.rs:3:8
|
3 | fn foo(__arg_0: bool) {}
|        ^^^^^^^ expected bool, found u8


(playpen)

This breakage is not an issue with the current code because of the double underscores – there’s a very low chance someone will create a constant that is both lowercase and starts with a double underscore. But it’s a problem when I remove the underscores since that chance shoots up.

Anyway, this failure is even weirder. Why are we attempting to match against the constant in the first place? fn argument patterns1 are irrefutable, i.e. all possible values of the type should match the argument. For example, fn foo(Some(foo): Option<u8>) {} will fail to compile with “refutable pattern in function argument: None not covered”.

There’s no point trying to match against constants here; because even if we find a constant it will be rejected later. Instead, we can unambiguously resolve identifiers as new bindings, yes?

Right?

Firm in my belief, I filed an issue.

I was wrong, it’s not going to always be rejected later. With zero-sized types this can totally still work:

struct S;

const C: S = S;

fn main() {
let C = S;
}


Here because S has only one state, matching against a constant of the type is still irrefutable.

I argued that this doesn’t matter – since the type has a single value, it doesn’t matter whether we resolved to a new binding or the constant; the value and semantics are the same.

This is true.

Except.

It was at this point that my table found itself in the perplexing state of being upside-down.

This is still really fine, zero-sized-constants-with-destructors is a pretty rare thing in Rust and I don’t really see folks relying on this behavior.

However I later realized that this entire detour was pointless because even if we fix this, we end up with a way for bindings to shadow constants. Which … which we already realized isn’t allowed by the compiler till we fix some bugs.

Damn.

The actual fix to the macro stuff is to use hygenic generated variable names, which the current infrastructure supports. I plan to make a PR for this eventually.

But it was a very interesting dive into the nuances of pattern matching in Rust.

1. Yes, function arguments in Rust are patterns. You can totally do things like (a, b): (u8, u8) in function arguments (like you can do in let

# Picking Apart the Crashing iOS String

Posted by Manish Goregaokar on February 15, 2018 in programming, unicode

So there’s yet another iOS text crash, where just looking at a particular string crashes iOS. Basically, if you put this string in any system text box (and other places), it crashes that process. I’ve been testing it by copy-pasting characters into Spotlight so I don’t end up crashing my browser.

The original sequence is U+0C1C U+0C4D U+0C1E U+200C U+0C3E, which is a sequence of Telugu characters: the consonant ja (జ), a virama ( ్ ), the consonant nya (ఞ), a zero-width non-joiner, and the vowel aa ( ా).

I was pretty interested in what made this sequence “special”, and started investigating.

So first when looking into this, I thought that the <ja, virama, nya> sequence was the culprit. That sequence forms a special ligature in many Indic scripts (ज्ञ in Devanagari) which is often considered a letter of its own. However, the ligature for Telugu doesn’t seem very “special”.

Also, from some experimentation, this bug seemed to occur for any pair of Telugu consonants with a vowel, as long as the vowel is not   ై (ai). Huh.

The ZWNJ must be doing something weird, then. <consonant, virama, consonant, vowel> is a pretty common sequence in any Indic script; but ZWNJ before a vowel isn’t very useful for most scripts (except for Bengali and Oriya, but I’ll get to that).

And then I saw that there was a sequence in Bengali that also crashed.

The sequence is U+09B8 U+09CD U+09B0 U+200C U+09C1, which is the consonant “so” (স), a virama ( ্ ), the consonant “ro” (র), a ZWNJ, and vowel u (  ু).

Before we get too into this, let’s first take a little detour to learn how Indic scripts work:

## Indic scripts and consonant clusters

Indic scripts are abugidas; which means that their “letters” are consonants, which you can attach diacritics to to change the vowel. By default, consonants have a base vowel. So, for example, क is “kuh” (kə, often transcribed as “ka”), but I can change the vowel to make it के (the “ka” in “okay”) का (“kaa”, like “car”).

Usually, the default vowel is the ə sound, though not always (in Bengali it’s more of an o sound).

Because of the “default” vowel, you need a way to combine consonants. For example, if you wished to write the word “ski”, you can’t write it as स + की (sa + ki = “saki”), you must write it as स्की. What’s happened here is that the स got its vowel “killed”, and got tacked on to the की to form a consonant cluster ligature.

You can also write this as स्‌की . That little tail you see on the स is known as a “virama”; it basically means “remove this vowel”. Explicit viramas are sometimes used when there’s no easy way to form a ligature, e.g. in ङ्‌ठ because there is no simple way to ligatureify ङ into ठ. Some scripts also prefer explicit viramas, e.g. “ski” in Malayalam is written as സ്കീ, where the little crescent is the explicit virama.

In unicode, the virama character is always used to form a consonant cluster. So स्की was written as <स,  ्, क,  ी>, or <sa, virama, ka, i>. If the font supports the cluster, it will show up as a ligature, otherwise it will use an explicit virama.

For Devanagari and Bengali, usually, in a consonant cluster the first consonant is munged a bit and the second consonant stays intact. There are exceptions – sometimes they’ll form an entirely new glyph (क + ष = क्ष), and sometimes both glyphs will change (ड + ड = ड्ड, द + म = द्म, द + ब = द्ब). Those last ones should look like this in conjunct form:

## Investigating the Bengali case

Now, interestingly, unlike the Telugu crash, the Bengali crash seemed to only occur when the second consonant is র (“ro”). However, I can trigger it for any choice of the first consonant or vowel, except when the vowel is  ো (o) or  ৌ (au).

Now, র is an interesting consonant in some Indic scripts, including Devanagari. In Devanagari, it looks like र (“ra”). However, it does all kinds of things when forming a cluster. If you’re having it precede another consonant in a cluster, it forms a little feather-like stroke, like in र्क (rka). In Marathi, that stroke can also look like a tusk, as in र्‍क. As a suffix consonant, it can provide a little “extra leg”, as in क्र (kra). For letters without a vertical stroke, like ठ (tha), it does this caret-like thing, ठ्र (thra).

Basically, while most consonants retain some of their form when put inside a cluster, र does not. And a more special thing about र is that this happens even when र is the second consonant in a cluster – as I mentioned before, for most consonant clusters the second consonant stays intact. While there are exceptions, they are usually specific to the cluster; it is only र for which this happens for all clusters.

It’s similar in Bengali, র as the second consonant adds a tentacle-like thing on the existing consonant. For example, প + র (po + ro) gives প্র (pro).

But it’s not just র that does this in Bengali, the consonant “jo” does as well. প + য (po + jo) forms প্য (pjo), and the য is transformed into a wavy line called a “jophola”.

So I tried it with য — , and it turns out that the Bengali crash occurs for য as well! So the general Bengali case is <consonant, virama, র OR য, ZWNJ, vowel>, where the vowel is not  ো or  ৌ.

## Suffix-joining consonants

So we’re getting close, here. At least for Bengali, it occurs when the second consonant is such that it often combines with the first consonant without modifying its form much.

In fact, this is the case for Telugu as well! Consonant clusters in Telugu are usually formed by preserving the original consonant, and tacking the second consonant on below!

For example, the original crashy string contains the cluster జ + ఞ, which looks like జ్ఞ. The first letter isn’t really modified, but the second is.

From this, we can guess that it will also occur for Devanagari with र. Indeed it does! U+0915 U+094D U+0930 U+200C U+093E, that is, <क,  ्, र, zwnj,  ा> (< ka, virama, ra, zwnj, aa >) is one such crashing sequence.

But this isn’t really the whole story, is it? For example, the crash does occur for “kro” + zwnj + vowel in Bengali, and in “kro” (ক্র = ক + র = ko + ro) the resultant cluster involves the munging of both the prefix and suffix. But the crash doesn’t occur for द्ब or ड्ड. It seems to be specific to the letter, not the nature of the cluster.

Digging deeper, the reason is that for many fonts (presumably the ones in use), these consonants form “suffix joining consonants”1 (a term I made up) when preceded by a virama. This seems to correspond to the pstf OpenType feature, as well as vatu.

For example, the sequence virama + क gives   ्क, i.e. it renders a virama with a placeholder followed by a क.

But, for र, virama + र renders  ्र, which for me looks like this:

In fact, this is the case for the other consonants as well. For me,  ्र  ্র  ্য  ్ఞ  ్క (Devanagari virama-ra, Bengali virama-ro, Bengali virama-jo, Telugu virama-nya, Telugu virama-ka) all render as “suffix joining consonants”:

(This is true for all Telugu consonants, not just the ones listed).

An interesting bit is that the crash does not occur for <र, virama, र, zwnj, vowel>, because र-virama-र uses the prefix-joining form of the first र (र्र). The same occurs for র with itself or ৰ or য. Because the virama is “stickier” to the left in these cases, it doesn’t cause a crash. (h/t hackbunny for discovering this using a script to enumerate all cases).

Kannada also has “suffix joining consonants”, but for some reason I cannot trigger the crash with it. Ya in Gurmukhi is also suffix-joining.

## The ZWNJ

The ZWNJ is curious. The crash doesn’t happen without it, but as I mentioned before a ZWNJ before a vowel doesn’t really do anything for most Indic scripts. In Indic scripts, a ZWNJ can be used to explicitly force a virama if used after the virama (I used it to write स्‌की in this post), however that’s not how it’s being used here.

In Bengali and Oriya specifically, a ZWNJ can be used to force a different vowel form when used before a vowel (e.g. রু vs র‌ু), however this bug seems to apply to vowels for which there is only one form, and this bug also applies to other scripts where this isn’t the case anyway.

The exception vowels are interesting. They’re basically all vowels that are made up of two glyph components. Philippe Verdy points out:

And why this bug does not occur with some vowels is because these are vowels in two parts, that are first decomposed into two separate glyphs reordered in the buffer of glyphs, while other vowels do not need this prior mapping and keep their initial direct mapping from their codepoints in fonts, which means that this has to do to the way the ZWNJ looks for the glyphs of the vowels in the glyphs buffer and not in the initial codepoints buffer: there’s some desynchronization, and more probably an uninitialized data field (for the lookup made in handling ZWNJ) if no vowel decomposition was done (the same data field is correctly initialized when it is the first consonnant which takes an alternate form before a virama, like in most Indic consonnant clusters, because the a glyph buffer is created.

## Generalizing

So, ultimately, the full set of cases that cause the crash are:

Any sequence <consonant1, virama, consonant2, ZWNJ, vowel> in Devanagari, Bengali, and Telugu, where:

• consonant2 is suffix-joining (pstf/vatu) – i.e. र, র, য, ৰ, and all Telugu consonants
• consonant1 is not a reph-forming letter like र/র (or a variant, like ৰ)
• vowel does not have two glyph components, i.e. it is not   ై,   ো, or   ৌ

This leaves one question open:

Why doesn’t it apply to Kannada? Or, for that matter, Khmer, which has a similar virama-like thing called a “coeng”?

## Are these valid strings?

A recurring question I’m getting is if these strings are valid in the language, or unicode gibberish like Zalgo text. Breaking it down:

• All of the rendered glyphs are valid. The original Telugu one is the root of the word for “knowledge” (and I’ve taken to calling this bug “forbidden knowledge” for that reason).
• In Telugu and Devanagari, there is no functional use of the ZWNJ as used before a vowel. It should not be there, and one would not expect it in typical text.
• In Bengali (also Oriya), putting a ZWNJ before some vowels prevents them from ligatureifying, and this is mentioned in the Unicode spec. However, it seems rare for native speakers to use this.
• In all of these scripts, putting a ZWNJ after viramas can be used to force an explicit virama over a ligature. That is not the position ZWNJ is used here, but it gives a hint that this might have been a mistype. Doing this is also rare at least for Devanagari (and I believe for the other two scripts as well)
• Android has an explicit key for ZWNJ on its keyboards for these languages2, right next to the spacebar. iOS has this as well on the long-press of the virama key. Very easy to mistype, at least for Android.

So while the crashing strings are usually invalid, and when not, very rare, they are easy enough to mistype.

An example by @FakeUnicode was the string “For/k” (or “Foŕk”, if accents were easier to type). A slash isn’t something you’d normally type there, and the produced string is gibberish, but it’s easy enough to type by accident.

Except of course that the mistake in “For/k”/”Foŕk” is visually obvious and would be fixed; this isn’t the case for most of the crashing strings.

## Conclusion

I don’t really have one guess as to what’s going on here – I’d love to see what people think – but my current guess is that the “affinity” of the virama to the left instead of the right confuses the algorithm that handles ZWNJs after viramas into thinking the ZWNJ applies to the virama (it doesn’t, there’s a consonant in between), and this leads to some numbers not matching up and causing a buffer overflow or something. Philippe’s diagnosis of the vowel situation matches up with this.

An interesting thing is that I can cause this crash to happen more reliably in browsers by clicking on the string.

Additionally, sometimes it actually renders in spotlight for a split second before crashing; which means that either the crash isn’t deterministic, or it occurs in some process after rendering. I’m not sure what to think of either. Looking at the backtraces, the crash seems to occur in different places, so it’s likely that it’s memory corruption that gets uncovered later.

I’d love to hear if folks have further insight into this.

Update: Philippe on the Unicode mailing list has an interesting theory

Yes, I could attach a debugger to the crashing process and investigate that instead, but that’s no fun 😂

1. Philippe Verdy points out that these may be called “phala forms” at least for Bengali

2. I don’t think the Android keyboard needs this key; the keyboard seems very much a dump of “what does this unicode block let us do”, and includes things like Sindhi-specific or Kashmiri-specific characters for the Marathi keyboard as well as extremely archaic characters, whilst neglecting more common things like the eyelash reph (which doesn’t have its own code point but is a special unicode sequence; native speakers should not be expected to be aware of this sequence).