In Pursuit of Laziness

Manish Goregaokar’s blog

GC Support in Rust: API Design

Recently we (Felix, Niko, and I) have been working on getting compiler-level GC support for Rust. The plan is to provide a base set of APIs and intrinsics on which GCs can be built, without including an actual GC itself. This blog post serves as a status update and a pre-pre-RFC on the designs. I’m also going to walk through the process of coming up with the current design. We’ll soon be posting more detailed design docs and discussion about some of the unresolved bits.

The motivation behind this is the same as my motivation for writing rust-gc. Firstly, it makes it possible to integrate with languages which themselves have a GC. Being able to safely pass around GCd types in Rust is very useful when writing libraries for Node, Python, or Ruby in Rust.

Secondly, some algorithms are much neater when a GC is involved. Persistent data structures, for example, are easier to deal with when you have a GC. There are ways around this requirement, but it’s nice to have the full range of options.

Rust tries to be safe without a GC, and this doesn’t change that — we envision that GCs in Rust will be rarely used except for some very specific use cases like the ones listed above.

Compiler support isn’t strictly necessary for a GC in Rust to be safe. rust-gc manages to work without compiler support (except for a #[derive()] plugin). However, there’s a lot of manual tracking of roots involved, which has a much higher cost than compiler-backed GCs. This is suboptimal — we want GC support to be as efficient as possible.

Design goals

We’re considering GCs designed as a Gc<T> object, which, like Rc<T>, can be explicitly wrapped around a value to move it to the GC heap. A pervasive GC (where every Rust object is GCd) is an explicit non-goal; if you need a GC everywhere a different language may make more sense. We’re expecting Gc<T> to be used only where needed, much like how Rc<T> is today.

We want this to work well with other Rust abstractions. Things like Vec<Gc<T>> should be completely legal, for example.

We want implementors to have total freedom in how Gc<T> is represented – they define the type, not the compiler. The compiler provides traits and intrinsics which can be used to find the GC roots. It should be possible for implementors to provide safe APIs for Gc<T>. There will be no canonical Gc<T> in the stdlib.

We are trying to support multiple GCs in a single binary. This should be a pretty niche thing to need, but it strengthens the behavior of GCs as libraries (and not magical one-time things like custom allocators). One possible use case for this is if a library internally uses a GC to run some algorithm, and this library is used by an application which uses a GC for some other reason (perhaps to talk to Node). Interacting GCs are hard to reason about, though. The current design leaves this decision up to the GC designer — while it is possible to let your GCd object contain objects managed by a different GC, this requires some explicit extra work. Interacting GCs are a very niche use case[1], so this isn’t an ability we’re adamant on supporting.

We also would like it to be safe to use trait objects with the GC. This raises some concerns which I’ll address in depth later in this post.

Core design

The core idea is to use LLVM stack maps to keep track of roots.

In a tracing GC, a “root” is basically something which can be directly reached without going through other GC objects. In our case, roots will be values of Gc<T> that end up on the stack, or in non-GC heap boxes which are themselves reachable from the stack. Some examples:

struct Foo {
    bar: Gc<Bar>,
}

struct Bar {
    inner: Gc<bool>,
}

// `bar` is a root
let bar = Gc::new(Bar::new());
// `bar.inner` is not a root, since it can't be
// accessed without going through `bar`

// `foo.bar` is a root:
let foo = Foo::new(); // This is a root


// `inner` is not a root, because it is a borrowed reference
let inner = &bar.inner;

// `rooted_bool` is a root, since it is a `Gc<bool>` on the stack
// (cloning has the same behavior as that on `Rc<T>`: it creates a
// new reference to the same value)
let rooted_bool = bar.inner.clone();

// `boxed_bar` is a root. While the Gc<Bar> is not on the stack,
// it can be reached without dereferencing another `Gc<T>`
// or passing through a borrowed reference
let boxed_bar = Box::new(Gc::new(Bar::new()));

When figuring out which objects are live (“tracing”), we need to have this initial set of “roots” which contain the list of things directly reachable from the stack. From here, the GC can rifle through the fields and subfields of the roots till it finds other GCd objects, which it can mark as live and continue the process with.

Most runtimes for GCd languages have efficient ways of obtaining this list of roots. Contrast this with conservative collectors like Boehm, which read in the whole stack and consider anything which looks like a pointer to the GC heap to be a root. rust-gc’s approach is inefficient too, because it incurs an additional reference-counting cost on copying and mutation.

However, the list of current roots is known at compile time; it’s just a matter of which variables are live at any point. We store this list of live variables in a per-call-site “stack map”. To find all the roots, you walk up the call stack, and for each call site look up its entry in the stack map, which will contain the stack offsets of all the roots (and other metadata if we need it). LLVM has native support for this. The stack map is stored in a separate section, so there is no runtime performance hit during regular execution; however, some optimizations may be inhibited by turning on GC.

So basically a GC will have access to a walk_roots<F>(f: F) where F: FnMut(..) intrinsic that will yield all the roots to the provided function (which can then mark them as such and start tracing).

I’m not going to focus on the implementation of this intrinsic for this blog post — this might be the subject of a later blog post by Felix who is working on this.

Instead, I’m focusing on the higher-level API.

Identifying rootables

The first problem we come across with the design mentioned above is that the compiler doesn’t yet know how to distinguish between a root and a non-root. We can’t mark every variable as a root; that would bloat the stack maps and make walking the roots a very expensive operation.

A very simple way of doing this is via a trait, Root.

// in libcore

unsafe trait Root {}

// auto-trait, anything containing
// a Root will itself be Root
unsafe impl !Root for .. {}

// references are never roots
unsafe impl<'a, T> !Root for &'a T {}


// in a gc impl

struct Gc<T> {
    // ..
}

unsafe impl<T> Root for Gc<T> {}

If we detect Root objects that are directly reachable, we consider them to be roots.

This has a flaw: it doesn’t actually tell us how to find roots inside container types. What would we do if there was a Box<Gc<T>> or a Vec<Gc<T>> on the stack? We can stick their entry in the stack map, but the GC needs to know what to do with them!

We could store some type information in the map and let the GC hardcode how to root each container type. This isn’t extensible though; the GC will have to be able to handle types from arbitrary crates too. Additionally, we have to solve this problem anyway for tracing — when tracing we need to be able to find all values “contained” within a particular value, which is the same operation we need to do to find roots.

For this purpose, we introduce the Trace trait:

// in libcore
unsafe trait Trace {
    fn trace(&self);
}

// in libcollections
// (or any third-party collections library)
unsafe impl<T: Trace> Trace for Vec<T> {
    fn trace(&self) {
        for i in self {
            i.trace()
        }
    }
}

// in gc library

// only allow trace objects
struct Gc<T: Trace> {
    // ..
}

unsafe impl<T: Trace> Trace for Gc<T> {
    fn trace(&self) {
        // mark `self`

        // Don't actually trace contained fields,
        // because there may be cycles and we'd recurse infinitely
    }
}

// in consumer of gc library

// autoderived impl will call `bar.trace()` and `baz.trace()`
#[derive(Trace)]
struct Foo {
    bar: Gc<Bar>,
    baz: SomeType,
}

(These traits are unsafe to implement because an incorrect implementation can lead to a reachable value getting cleaned up by the GC, which is unsafe)

Basically, an implementation of Trace will yield all values owned by the object, unless that object is a GC struct like Gc<T>, in which case the GC implementor will have it mark the object. This way, calling .trace() will walk all fields and subfields of an object recursively, until it finds all of the contained Gc<T>s.

This has an issue with multiple GCs, though — we don’t want the GCs to interact unless they want to, and with the Trace trait being shared one GC object may accidentally contain a different GC object.

We need to introduce the concept of a tracer here.

// in libcore

trait Tracer : Any {}

unsafe trait Trace {
    fn trace(&self, tracer: &mut Tracer);
}


// in libcollections

// impl doesn't care about the tracer
unsafe impl<T: Trace> Trace for Vec<T> {
    fn trace(&self, tracer: &mut Tracer) {
        for i in self {
            i.trace(tracer)
        }
    }
}

// in gc library

struct MyTracer {} // more complicated tracers may have state
impl Tracer for MyTracer {}


struct Gc<T: Trace> {
    // ..
}

unsafe impl<T: Trace> Trace for Gc<T> {
    fn trace(&self, tracer: &mut Tracer) {
        if let Some(tracer) = tracer.downcast_mut::<MyTracer>() {
            // mark self
        } else {
            panic!("Don't mix GCs!");
            // If you want to support multiple GCs interacting with each other,
            // you can let this else block trace the contents.
            // Beware, interacting GCs have subtle safety issues.
        }
    }
}

This also makes it easier to distinguish between rooting and tracing operations. While the operations are similar (“to root/trace a value, walk its fields recursively till you find all of the Gcs, and root/mark those”), the code we run at the leaf Gc<T> nodes is different. In the previous model, this could have been solved with a global static boolean that identifies if the code is currently walking roots or tracing, but with the Tracer trait object we can just pass in different tracer values.

We’re not yet sure if we should be lumping root walking and tracing in a single trait; so we might end up with a second Scan trait that works similarly.

Note that we’re not getting rid of the Root trait here. This is because Root and Trace have slightly incompatible purposes – Root signals to the compiler if something definitely contains roots, whereas Trace marks things which are safe to put inside a GC. bool is Trace, but not Root. Vec<Gc<T>> is Trace and Root, Vec<bool> is Trace but not Root. &T and &mut T are neither. Trace will actually show up in trait bounds for GC code. Root will only be analysed by the compiler itself, bounds like R: Root probably won’t show up.

There should not be any types which are Root but not Trace, because this means the compiler won’t know what to do with them!

Now, when generating the stack map, we include the stack offset of all Root objects in scope, as well as appropriate dynamic dispatch vtable pointers for the Trace implementation[2]. Walking the stack involves calling the trace method on each entry in the stack map for each call site.

Unresolved problems

There are a lot of these. Suggestions very welcome.

Trait objects

Trait objects provide an interesting challenge. They may or may not contain roots, but what’s more important is that trait objects in libraries that know nothing about GC may also contain roots.

For example, if a library is dealing with a Box<SomeTrait>, and your code feeds it a Box<SomeRoot as SomeTrait>, the trait object is now a root. If a GC is triggered while in this call (perhaps by a callback), then this trait object should be counted as a root.

But this library didn’t depend on the GC, and when it was compiled, it wasn’t compiled with stack map entries for this GC object.

There are two solutions here. The first is to recompile everything (including libstd) from scratch with GC support on, and put all owned trait objects in the stack maps. They will have an extra generated trace entry in the vtable that will ignore the object if it isn’t a root. To put trait objects inside Gc<T>, you will have to explicitly use Box<Trait+Trace>, however – this magical trace entry is just for collecting roots.

The second solution is to simply not allow casting Root objects to owned trait objects. I feel that there are use cases for both – the former has extra bloat and requires a custom libstd (which could be distributed via rustup if necessary), but the latter restricts how you use trait objects. Servo, for example, would probably prefer the latter since we don’t put our DOM objects in owned trait objects. But other GC users may want maximum flexibility. Letting people choose this via a codegen flag (which can be controlled via cargo) might be a good idea.

Should it be Trace<T>?

There is a dynamic dispatch cost on rooting/tracing any Gc<T> leaf with the tracer model.

This can be obviated by having it be:

trait Trace<T: Tracer> {
    fn trace(&self, tracer: &mut T);
}

Most types would implement Trace<T>, and GCs can implement Trace<SpecificTracer>, and only require their contents to be Trace<SpecificTracer>. This lets the type system forbid interacting GCs instead of having it done at runtime.
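A rough sketch of what this might look like (all names are illustrative; only the shape follows the design above):

// Hypothetical sketch of the parametrized design: ordinary containers are
// generic over the tracer, while a GC implements Trace only for its own
// tracer type.

trait Tracer {}

unsafe trait Trace<T: Tracer> {
    fn trace(&self, tracer: &mut T);
}

struct MyTracer; // this GC's tracer
impl Tracer for MyTracer {}

struct Gc<T: Trace<MyTracer>> {
    ptr: *const T, // stand-in for the real representation
}

// Containers just forward to their contents, for any tracer...
unsafe impl<Tr: Tracer, T: Trace<Tr>> Trace<Tr> for Vec<T> {
    fn trace(&self, tracer: &mut Tr) {
        for i in self {
            i.trace(tracer)
        }
    }
}

// ...but Gc<T> is only Trace for its own tracer, so the type system
// rejects mixing it into a different GC's heap.
unsafe impl<T: Trace<MyTracer>> Trace<MyTracer> for Gc<T> {
    fn trace(&self, _tracer: &mut MyTracer) {
        // mark `self` here
    }
}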

This has multiple downsides, however:

  • #[derive(Trace)] becomes #[derive(Trace<MyTracer>)] for things containing Gc<T> (because Gc<T> is not Trace<T> for all T, and macro expansion runs before this information can be computed).
  • If there are multiple GCs, there are multiple Trace<T> vtable pointers in the stack map. Not all libs know about the other GC when being compiled, so you need to defer generation of these stack map entries somehow.
  • The heuristics for forbidding types which are Root but not Trace<T> become subtler. You have to effectively forbid types which are Root but do not have an impl of Trace<T> for at least one tracer T that is active in the compilation.

Non-Trace collections on the stack

If something like the following, defined by a third-party library:

struct Foo<T> {
    x: T,
    // ...
}

doesn’t implement Trace, it’s still okay to use Foo<RootedThing> on the stack, because we can figure out that the inner T is what we need to root.

However, if a third-party MyVec<T> (which behaves like a vector) contains RootedThings, and is on the stack, the compiler doesn’t know what to do with it. Lack of a Trace bound makes it impossible to put such types on the GC heap, but there’s no restriction on putting these types on the stack. As I mentioned before, we can simply forbid the existence of types which are Root but not Trace (MyVec<RootedThing> is Root). This is already done with Copy and Drop.

There’s a subtle difference between this and the Copy/Drop forbidding. Copy and Drop are always explicitly implemented. On the other hand, Root is an auto trait and automatically implements itself on types containing roots. This means that we can’t necessarily forbid such types being created at impl time — third party collections like above for example won’t contain Root types until they are monomorphised. We can error during monomorphization, but this error might not be very user-friendly, like template errors in C++.

Another solution is to make Root into ?Root, much like ?Sized. This means that the writers of collections will explicitly opt in to allowing GCd things inside them. This probably would lead to a lot more churn, however. But the diagnostics would be clearer.

Turns out that this actually works with half-decent diagnostics. This doesn’t forbid the existence of types which impl Root but not Trace, however. It simply avoids autoderiving Root on types which aren’t Trace. But this behavior can be changed. (In fact, it was changed while this post was being written!)

It becomes more complicated with the Trace<T> design, though. Having Root<T> might fix this, but then you have to deal with the auto trait generics.

One solution for the auto trait generics is to simply not include Root in the stdlib. Instead, require code like the following:

// in gc library

struct MyTracer {/* .. */}

struct Gc<T: Trace<MyTracer>> {
    // ...
}

#[gc_root_trait]
unsafe trait MyRoot: Trace<MyTracer> {}

unsafe impl !MyRoot for .. {}

unsafe impl<T: Trace<MyTracer>> MyRoot for Gc<T> {}

This can be further simplified by completely removing the rooting trait requirement and instead require #[gc(tracer=MyTracer)] on all GC structs. This, however, is a bit more special and we lose the free diagnostics that you get from utilizing the type system.

Are Root-containing raw pointers Root?

For the auto-trait to work, types like Vec<ContainsRoot> should also be marked as Root.

This can be done by just marking *const T and *mut T as Root if T is Root using an impl in libcore. However, borrowed types like Iter will also be dragged into this. We only want types which own Root things to be considered roots.

The alternative is to not require this, and solely rely on PhantomData. Vec<T> also contains a PhantomData<T>, which gives the compiler a hint that it owns a T. On the other hand, Iter<'a, T> contains a PhantomData<&'a T>, which hints that it borrows a T. This is already used by the compiler to determine drop soundness, so we can just use the same thing to determine Root types. This is already supported by the autotrait infrastructure.
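For illustration, this is the pattern, mirroring what Vec<T> and Iter<'a, T> already do in the standard library (the types below are made up):

// Sketch: how PhantomData distinguishes "owns a T" from "borrows a T".

use std::marker::PhantomData;

struct MyOwningCollection<T> {
    ptr: *const T,
    len: usize,
    // Tells the compiler this type owns `T`s, so if `T` is a root,
    // this collection should be treated as a root as well.
    _marker: PhantomData<T>,
}

struct MyBorrowingIter<'a, T: 'a> {
    ptr: *const T,
    // Tells the compiler this type merely borrows `T`s; borrows are
    // reached through their owner, so this is not itself a root.
    _marker: PhantomData<&'a T>,
}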

A downside here is that we’re relying more on producers of unsafe code remembering to use PhantomData. I’m not 100% certain about this, but triggering dropck unsoundness by neglecting PhantomData is still pretty hard (and often requires types like arenas), whereas forgetting a root can very easily cause a GC segfault. I do not consider this to be a major downside.

Finalizers and Drop

The following code is unsafe:

struct Foo {
    bar: Gc<Bar>,
    baz: Gc<Baz> // baz can contain a Bar
}
impl Drop for Foo {
    fn drop(&mut self) {
        println!("{:?}", *self.bar);
        // or
        *self.baz.bar.borrow_mut() = self.bar.clone();
    }
}

// Foo itself is used as a `Gc<Foo>`

The problem is that destructors run in the sweep cycle of a GC, in some order. This means that bar may have already been collected when Foo’s destructor runs. While in many cases this can be solved with a smart collection algorithm, in the case where there’s a cycle being collected there’s nowhere safe to start.

Additionally, further mutation of the graph after collection may extend the lifetime of a to-be-collected variable.

A simple solution is to forbid all GC accesses during the collection phase. However, this means dereferences too, and this will incur a cost on all GCd types – they stop being simple pointer accesses. This solution places the burden on the GC implementor, instead of the compiler.

We have enough information in the type system to solve this – we can forbid Drop impls on types which are explicitly Root. But it turns out that this isn’t enough. Consider:

trait MyTrait {
    fn do_the_thing(&self);
}
struct HijackableType<T: MyTrait> {
    x: T,
}

impl<T: MyTrait> Drop for HijackableType<T> {
    fn drop(&mut self) {
        self.x.do_the_thing();
    }
}

// in other library

struct Bar {
    inner: Gc<Baz>
}

impl MyTrait for Bar {
    fn do_the_thing(&self) {
        println!("{:?}", *self.inner)
    }
}

HijackableType<Bar> now has an unsafe destructor. Stopping this behavior requires forbidding Drop impls on structs with trait bounds, but that is too restrictive.

This may end up having a similar solution to the “all roots must be Trace” issue. Warning on monomorphizations isn’t enough, we need to be able to allow Vec<ContainsRoot>, but not HijackableType<ContainsRoot>. Making this distinction without poisoning half the generics out there is tricky.

The notion of a hijackable type is actually already important for sound generic drop impls, see RFC 1327 (dropck eyepatch), RFC 1238 (nonparametric dropck), and their predecessor, RFC 0769 (sound generic drop). We might be able to rely on this, but would need to introduce additional constraints in dropck.

Fortunately, there is always the fallback solution of requiring the implementor to enforce this constraint at runtime.


  1. Firefox does have a garbage collector and a cycle collector which interact, though, so it’s not something which is unthinkable.

  2. If there is an active stack drop flag for the value, that will need to be included too.

Fun Crypto Problem: Designing an Anonymous Reputation System

One of the reasons I like crypto is that it’s a gold mine of interesting problems which feel like they are impossible to solve and if a solution exists, it must be magic.

The other day, I came across one such problem here, by @SarahJamieLewis

Is there a scheme where A can give reputation points to B, & later, B presenting as C can prove their reputation (without revealing A or B)?

(I recommend trying to solve this yourself before reading ahead)

The problem isn’t completely defined because we don’t know how “reputation” is supposed to work. A simple model is to think of it as currency, and use Bitcoin as a proxy. Of course, a real reputation system probably would be different from currency. There might only be a small set of authorized reputation “sellers” (this can still be built on top of Bitcoin, or you can use a system similar to the CA system for TLS certificates). Or there might be a system in which each person can vote for another person at most once (though this needs to be designed in a way that is resilient to sybil attacks).

Let us assume that there is a ledger out there, where each ledger entry is a certificate saying that entity X has given one reputation point to entity Y. A public key is included, where the private key is only known to Y. This model cleanly applies to both Bitcoin and CA systems – in Bitcoin, the transaction is the “certificate”, and in the CA system the certificate is the certificate.

For additional anonymity, you can have a different private key for each certificate. I’m going to assume this is the case, though the solutions don’t change much if it isn’t.
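As a rough sketch (field names and types are purely illustrative), a ledger entry in this model might look like:

// One reputation grant: X gave a point, and included a public key whose
// private half only the recipient knows.
struct Certificate {
    issuer: String,      // entity X, the reputation giver
    pubkey: Vec<u8>,     // per-certificate public key; the recipient holds the private key
    signature: Vec<u8>,  // X's signature over the grant
}

type Ledger = Vec<Certificate>;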

Solution via ZKP

I’m very fond of the concept of a zero-knowledge proof, and when you have a hammer everything looks like a nail.

So my first solution was one involving zero-knowledge proofs.

Construct the problem “Given the certificates in this ledger and X private keys, prove that these private keys each have one certificate they correspond to, and that the keys are distinct”.

In this problem, the certificates (public keys) are hardcoded, whereas the private keys are inputs. This sort of algorithm can be written as a combinatorial logic circuit, assuming that the method of signing can be. We can then perform a zero-knowledge proof of this problem using the ZKP for general execution outlined here. The prover inserts their private keys into the algorithm, runs the algorithm, and proves that the execution was faithful and had an output of true using the ZKP.

Since the ZKP doesn’t leak any information about its inputs, it doesn’t leak which certificates were the ones for which the prover had private keys, so it doesn’t leak the identities of A or B.

However, this is overkill. The general ZKP gets very expensive as the size of the algorithm grows, and since the ledger was hardcoded in it, this ZKP will probably take a while (or a lot of computational power) to execute. One can perform it with a subset of the ledger picked by the prover, but repeating the process may slowly reveal the identity of the prover via the intersection of these subsets.

Solution via secret-sharing

(This solution is technically a ZKP too, but it doesn’t use the “general” ZKP algorithm, which, while expensive, can be used for any combinatorial verification algorithm)

Once I’d gotten the “use a ZKP!” solution out of my system, I thought about it more and realized that the problem is very close to a secret-sharing one.

Secret-sharing is when you want to have a cryptographic “lock” (a shared secret) which can only be revealed/opened when the requisite quorum of (any) X keys out of N total keys is used.

Shamir’s secret sharing is a nice algorithm using polynomials that lets you do this.
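As a rough illustration, here is a minimal Rust sketch of Shamir’s scheme over a small prime field. The prime, the hardcoded coefficients, and the use of u64 secrets are simplifications for the example; a real implementation would use a large field and a cryptographically secure RNG.

// Minimal sketch of X-of-N Shamir secret sharing over a prime field.

const P: u64 = 2_147_483_647; // 2^31 - 1, a Mersenne prime

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

// Modular inverse via Fermat's little theorem.
fn inv_mod(x: u64) -> u64 { pow_mod(x, P - 2) }

// Evaluate f(x) = secret + c1*x + c2*x^2 + ... (mod P).
fn eval(secret: u64, coeffs: &[u64], x: u64) -> u64 {
    let mut acc = 0;
    for &c in coeffs.iter().rev() {
        acc = (acc * x + c) % P;
    }
    (acc * x + secret) % P
}

// Recover f(0) from `threshold` points by Lagrange interpolation at 0.
fn recover(points: &[(u64, u64)]) -> u64 {
    let mut secret = 0;
    for (i, &(xi, yi)) in points.iter().enumerate() {
        let mut num = 1;
        let mut den = 1;
        for (j, &(xj, _)) in points.iter().enumerate() {
            if i != j {
                num = num * (P - xj) % P;            // (0 - xj)
                den = den * ((P + xi - xj) % P) % P; // (xi - xj)
            }
        }
        secret = (secret + yi * num % P * inv_mod(den)) % P;
    }
    secret
}

fn main() {
    let secret = 123_456_789;
    // Degree-2 polynomial: any 3 of the 5 shares recover the secret.
    let coeffs = [948_121, 77_003_421]; // would be random in practice
    let shares: Vec<(u64, u64)> = (1..=5).map(|x| (x, eval(secret, &coeffs, x))).collect();
    assert_eq!(recover(&shares[0..3]), secret);
    assert_eq!(recover(&[shares[4], shares[1], shares[2]]), secret);
}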

In this situation, we want to prove that we have X private keys out of N total certificates in the ledger.

The verifier (Victor) can construct a secret sharing problem with a single secret and N secret-sharing-keys (in the case of Shamir, these would be N x,y-coordinate pairs). Each such key is paired with a certificate, and is encrypted with the corresponding public key of that certificate.

The prover (Peggy) is given all of these encrypted secret-sharing keys, as well as the certificates they correspond to.

If Peggy legitimately has X reputation, she has the X private keys necessary to obtain X of the secret sharing keys by decrypting them. From this, she can obtain the secret. By showing the secret to Victor, she has proven that she has at least X private keys corresponding to certificates in the ledger, and thus has at least X reputation. In the process, which certificates were involved is not revealed (so both the reputation-giver and the reputation-receiver stay anonymous).

Or was it?

Victor can construct a malicious secret sharing problem. Such a problem would basically reveal a different secret depending on which secret-sharing-keys Peggy uses. For example, in Shamir’s secret sharing, Victor can just give N random coordinates. Any X of those coordinates will always determine a curve of degree X-1, but the curves obtained from different sets of X coordinates will probably have different constant terms (and thus different secrets).

The secret-sharing problem needs to be transmitted in a way that makes it possible for Peggy to verify that it’s not malicious.

One way to do it is to make it possible to uncover all the secret-sharing-keys, but only after the secret has been found. In Shamir’s algorithm, this can be done by pre-revealing the x coordinates and only encrypting the y coordinates. Once Peggy has found the secret, she has the entire polynomial curve, and can input the remaining x coordinates into the curve to find the remaining secret sharing keys (and then verify that they have been encrypted properly).

This is almost perfect. User “otus” on Crypto Stack Exchange pointed out my mistake.

The problem with this scheme (and the previous one to a lesser degree) is that Peggy could simply brute-force the values of the y coordinates beforehand.

This can be solved by using nonces. Instead of encrypting each y-coordinate, Victor encrypts each y-coordinate, plus a nonce. So, instead of encrypting the y-coordinate “42”, a string like “da72ke8lv0q-42” will be encrypted.

On decryption, it is easy to extract the coordinate from the plaintext (presumably the scheme used to add the nonce would be decided upon beforehand). However, we can’t brute-force for the plaintext anymore, because the ciphertext isn’t the encryption of a low-entropy value like a regular, smallish number; it’s the encryption of a relatively high-entropy value.

So far, this prevents brute forcing, but it also prevents Peggy from verifying that the secret-sharing problem was non-malicious, since she doesn’t know the nonces. Nor can these be pre-shared with her, since she can just use them to brute force again.

The solution here is for Victor to use the shared secret as a symmetric key, encrypt all of the nonces with it, and share them with Peggy. Until Peggy knows this key, she cannot use the nonces to brute force. Once she knows this key, she can decrypt the values for the nonces and use them to verify that the nonces are correct.

This is exactly the property we need. If Peggy doesn’t have enough private keys (reputation points), she won’t have the secret and can’t prove her reputation to Victor. Once Peggy does have the quorum of keys, she will know the symmetric key, be able to decrypt the nonces, and use these nonces to verify that the other N-X ciphertexts fall on the curve which she has obtained. Once she has verified this, she can present the shared secret/symmetric key to Victor, who will know that she had enough keys to crack the secret sharing problem and thus has at least X reputation.


This was quite an entertaining problem to solve (and it got me thinking about ZKPs again, which made me write my previous post). Thanks, Sarah!

Got an alternate solution (or other similar fun problems)? Let me know!

Interactive Sudoku Zero-knowledge Proof

Back in March I was particularly interested in Zero-Knowledge Proofs. At the time, I wrote a long blog post introducing them and explaining how the ZKP for generic execution works.

I was really enjoying learning about them, so I decided to do a presentation on them in my crypto course. Sadly there wasn’t going to be time for explaining the structure of the proof for general execution, but I could present something more fun: Sudoku.

Sudoku solutions can be proven via ZKP. That is to say, if Peggy has a solution to Victor’s Sudoku problem, she can prove that she has a valid solution without ever revealing any information about her solution to Victor (aside from the fact that it is valid).

To make the ZKP easier to explain, I wrote an interactive version of it.

I planned to write about it then, but completely forgot till now. Oops.

I’m first going to explain how the ZKP is carried out before I explain how the interactive verifier works. If you aren’t familiar with ZKPs, you might want to read my previous post on the subject up to and including the part about proving graph colorings.

Proving Sudoku

This proof is going to be carried out very similarly to the graph coloring proof. Indeed, Sudoku can be reduced to a graph coloring problem, though that’s not how we’re going to obtain the ZKP.

Victor has a Sudoku problem:

Peggy has a solution:

In order to not leak information about her solution, Peggy permutes it:

Basically, there is a 1-1 mapping between the old digits and the new ones. In this specific permutation, all 3s are replaced by 4s, all 1s by 5s, etc.

She now commits to this permutation by committing to every individual cell. A random nonce is obtained for each cell, and the contents of that cell are hashed along with the nonce. This is the same commitment procedure used in the graph coloring ZKP.
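Here is a minimal sketch of that commit/verify step in Rust, assuming the sha2 crate and the nonce-digit format the interactive verifier uses later (SHA256 of nonce + "-" + digit); the fixed nonces are only there to keep the example short, and in the real protocol each one is freshly random per cell, per round.

use sha2::{Digest, Sha256};

// Commit to one permuted cell: hash "nonce-digit" and return it as hex.
fn commit(digit: u8, nonce: &str) -> String {
    let hash = Sha256::digest(format!("{}-{}", nonce, digit).as_bytes());
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

// Check a revealed (digit, nonce) pair against a previously sent commitment.
fn verify(commitment: &str, digit: u8, nonce: &str) -> bool {
    commit(digit, nonce) == commitment
}

fn main() {
    // A (tiny) permuted "row" and per-cell nonces.
    let row = [4u8, 5, 7];
    let nonces = ["wmdqatobck", "fmcbpzkgyp", "dktuqvrsss"];

    // Peggy sends only the commitments.
    let commitments: Vec<String> =
        row.iter().zip(&nonces).map(|(&d, n)| commit(d, n)).collect();

    // Later, when Victor asks for this row, she reveals digits and nonces,
    // and Victor checks them against the commitments.
    for ((d, n), c) in row.iter().zip(&nonces).zip(&commitments) {
        assert!(verify(c, *d, n));
    }
}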

These commitments are now sent over to Victor.

Victor ponders for a bit, and demands that Peggy reveal the third row of the sudoku square.

(Note that this is the non-permuted problem statement)

This row is marked in orange. There are some additional elements marked in green, which I shall get to shortly.

Peggy reveals the permuted values for this row:

Victor can now verify that all digits 1-9 appear within this permuted row, and that they match the commitments. This means that they appear in the original solution too (since permutation doesn’t change this fact), and, at least for this row, the solution is correct. If Peggy didn’t have a solution, there was a chance she’d be caught in this round if Victor had asked for the right set of 9 squares to be revealed.

The procedure can be repeated (with a new permutation each time) to minimize this chance, with Victor asking to reveal a row, column, or 3x3 subsquare each time, until he is certain that Peggy has a solution.

But wait! This only works towards proving that Peggy has a valid Sudoku solution, not that this is the solution to Victor’s specific problem. Victor only verified that each row/column/subsquare had no duplicates, a property which is true for all sudoku solutions!

This is where the green squares come in. For any given set of “orange squares” (a row, column, or 3x3 subsquare), we take the “preset” digits appearing in the problem statement (In this case: 7, 8, and 6) in that set of squares. All other instances of those digits preset in the problem statement form the set of “green squares”:

Peggy reveals the permuted values for both the green and orange squares each time:

In addition to verifying that there are no duplicates in the orange squares, Victor additionally verifies that the permutation is consistent. For example, the 7th element in that row is a 6, which is already preset in the problem statement. There are two other 6s in the problem statement, one in the 5th row 8th column, and one in the 7th row 1st column. If the permutation is consistent, their corresponding squares in the revealed portion of the permuted solution should all have the same digit. In this case, that number is 1. Similarly, the 5th element in that row is a preset 8, and there’s a corresponding green square in the 5th row last column that also has an 8. In the permuted solution, Victor verifies that they both have the same digit, in this case 7.

This lets Victor ensure that Peggy has a solution to his sudoku problem. The fact that two given squares must share the same digit is invariant under permutations, so this can be safely verified. In fact, a sudoku problem is really just a problem saying “Fill these 81 squares with 9 symbols such that there are no duplicates in any row/column/subsquare, and these three squares have the same symbol in them, and these five squares have the same symbol in them, and …”. So that’s all we verify: There should be no duplicates, and the digits in certain sets of squares should be the same.

Note that revealing the green squares doesn’t reveal additional information about Peggy’s solution. Assuming Peggy’s solution is correct, from comparing the problem statement with the revealed/permuted values, Victor already knows that in the permutation, 7 has become 6, 8 has become 7, and 6 has become 1. So he already knows what the other preset green squares contain, he is just verifying them.

We cannot reveal anything more than the green squares, since that would reveal additional information about the permutation and thus the solution.

Edit: This actually still isn’t enough, which was pointed out to me by “dooglius” here. Basically, if the sudoku problem has two digits which only appear once each, there is nothing that can stop Peggy from coming up with a solution where these two digits have been changed to something else (since they’ll never be in a green square). Fixing this is easy, we allow Victor to ask Peggy to reveal just the permuted values of the presets (without simultaneously revealing a row/column/subsquare). Victor can then verify that the preset-permutation mapping is consistent (all presets of the same value map to the same permutation) and 1-1.

This check actually obviates the need of the green squares entirely. As long as there is a chance that Victor will ask for the presets to be revealed instead of a row/column/subsquare, Peggy cannot try to trick Victor with the solution of a different sudoku problem without the risk of getting caught when Victor asks for the presets to be revealed. However, the green squares leak no information, so there’s no problem in keeping them as a part of the ZKP as a way to reduce the chances of Peggy duping Victor.

The interactive verifier

Visit the interactive verifier. There’s a sudoku square at the top which you can fill with a problem, and you can fill the solution in on the first square on the Prover side – fill this in and click Start. Since I know nobody’s going to actually do that, there’s a “Fill with known problem/solution” that does this for you.

Once you’ve initiated the process, the ball is in the Prover’s court. The Prover must first permute the solution by clicking the Permute button. You can edit the permutation if you like (to introduce a flaw), or manually do this after clicking the button.

Once you’ve clicked the button, generate nonces by clicking the next one, “Populate Nonces”. These, too, can be edited. You can generate hashes (which can also be edited) by clicking the next button, and after that send the hashes (commitments) over to the Verifier’s side.

The ball is now in the Verifier’s court. As you can see, there’s a set of hashes on the Verifier’s side. The Verifier only knows the problem statement and whatever is visible on their side of the screen, and nothing more.

You, acting on behalf of the Verifier, can now select a row/column/subsquare/preset using the dropdown and text box on the Verifier. As you select, the orange/green squares that are going to be revealed will be shown. When satisfied with your choice, click “Reveal”, and the Prover will populate your squares with the permuted values and nonces. “Verify” will verify that:

  • The appropriate elements and hashes are revealed
  • The hash is equal to SHA256(nonce + "-" + digit)
  • The orange squares contain distinct digits.
  • The green squares contain digits that match with the orange squares they correspond to from the problem solution

Once you click verify, it will show the probability of correctness (this isn’t an exact value, it’s calculated using an approximate formula that doesn’t depend on the problem statement), and the ball moves back into Peggy’s court, who can permute her solution again and continue. The probability slowly increases each round.

Doing this manually till it reaches 99% is boring, so there’s a button at the top (“Run automatically”) which can be clicked to run it for a given number of rounds, at any stage in the process once started. If you tamper with one of the values in the permuted solution, and run it for ~20 runs, it usually gets caught.

Have fun!

Starting at Mozilla

I got a job!

I’m now working at Mozilla as a Research Engineer, on Servo.

I started two weeks ago, and so far I’m really enjoying it! I feel quite lucky to get to work on an open source project, with an amazing and helpful team. Getting to do most of my work in Rust is great, too :)

So far I’ve been working on the network stack (specifically, “making fetch happen”), and I’ll probably be spending time on DOM things as well.

Really excited to see how this goes!

Exploring Zero-Knowledge Proofs

Follow up article to this one here

So recently I read this article about how the Bitcoin community had executed what’s known as a “Zero-Knowledge Contingent Payment”; a pretty neat concept.

What was really interesting for me was the (simplified) underlying algorithm for generic zero knowledge proofs. It took me a while (and some questions asked to helpful folks on the Internet) to understand it fully, but the concept is quite intriguing and sounds rather magical. I thought I’d explain it here in an accessible way, both so that others can get it and to improve my own understanding.

I intend this article to be read by people with a programming or mathematical background[1], who have some understanding of what logic gates are. Please let me know if you feel that something is inadequately (or wrongly) explained.

So what is a zero knowledge proof?

Let’s say Alice has a problem she wants to solve. It could be a mathematical problem, like factorizing a large number, or coloring a map (or graph!) with only three colors, or solving a Sudoku puzzle. Anything where you can write a program to verify the solution.

She doesn’t have the resources to solve the problem herself, but wants to buy a solution from someone else. She makes the problem public, and Bob says he has a solution.

However, Alice is skeptical. She’s not sure if Bob is telling the truth here, and would like some evidence that he does indeed have a solution. At the same time, Bob is not willing to share his solution with Alice without getting paid. They also don’t want to involve a third party; let’s say this is a rather Important Sudoku Puzzle that affects National Security ¯\_(ツ)_/¯.

What Alice and Bob need here is a way for Bob to prove to Alice that he has the solution, without sharing the solution, and without involving a third party[2].

It turns out that this is totally possible (magical, right!). There’s a quick example on Wikipedia of a simple proof of a non-mathematical fact – whether or not someone has a key to a particular door.

For proving more complicated problems, we have to digress into some basic crypto first

Interlude: Hashes and commitments

Feel free to skip this if you know what a hash function and a nonce is.

In cryptography, there’s something called a “hash function”. In essence it’s an “irreversible” function whose output is known as a “hash”, with the following three properties:

  • It’s not computationally intensive to calculate the hash of an input
  • Given a hash, it’s a computationally hard problem to calculate an input to the hash function that results in this hash, usually involving brute force
  • It’s also a computationally hard problem, given an input and a hash, to find a different input (especially a different input that is similar to the first one) that produces the same hash.

Note that multiple values may result in the same hash.

The result of this is basically that hashes are hard to forge. If Bob shares a hash Y = H(X) with Alice, where X is some secret data and H is a hash function, if Bob reveals X at some later point, by checking that Y = H(X), Alice can be reasonably certain that the value shared by Bob was indeed the original input to the hash function and not tampered with in a way that the same hash was produced. Similarly, Bob can be certain that knowing only Y, Alice cannot reverse-engineer X since the hash function is “irreversible”.

This brings us to the concept of a commitment. Hashes can be used as described above to “commit” to a value. If Bob decides on a number X, and makes its hash Y public, he has committed to this value without revealing it. When he does decide to reveal it, he is forced to reveal X and not some modified bogus value, thus making the “commitment” binding.

Some of you may have noticed a flaw here: It’s hard to commit to small numbers, or things that come from a restricted set. If Bob wishes to commit to the number 5 (without revealing it), or the color red (out of a set of three colors), Alice can just try H(0) to H(9) or H(red), H(green), H(blue) and find out which one matches. After all, hashes aren’t supposed to be resilient to brute force attacks, and brute force attacks become very easy when the set of inputs is tiny.

A solution to this is to use a nonce (also known as a “trapdoor”). Bob commits to 5 by hashing the string 5-vektvzkjyfdqtnwry, where vektvzkjyfdqtnwry is a random value he selected, known as a “nonce”. When Bob wishes to reveal the value, he just reveals 5-vektvzkjyfdqtnwry and Alice is convinced that the original value committed to was indeed 5. Of course, this requires some agreement on the format of the nonce; in this case the nonce is just “everything after the dash”. Note that the nonce is private, and only revealed when Bob wishes to reveal the committed number.

Note that each new commitment should use a new nonce. Otherwise, information can be leaked; for example if Bob needs to commit to three numbers (say, 2, 5, 2) in a way that they can be individually revealed, he shouldn’t compute the hashes for 2-vektvzkjyfdqtnwry, 5-vektvzkjyfdqtnwry, 2-vektvzkjyfdqtnwry, since the first and last hashes will be equal and Alice will know that the committed values behind them are probably the same too (something which you may not wish to reveal).

Another issue that can turn up is a “rainbow table”, where one party comes into the game with a precomputed table of hashes of all strings up till a certain number of characters. One solution for this is to increase the nonce size; however, since Bob decides the nonces, it’s possible for him to smartly select them if he’s the one with a table. The solution here is to use a “salt”, which is a large random string combined with the committed value before hashing. Bob and Alice could, for example, mutually decide on a salt of asdcjyxeafxjvikfzmnyfqsehsxwxsfywbreb, and when Bob wishes to commit to the number 5, he hashes asdcjyxeafxjvikfzmnyfqsehsxwxsfywbreb-5-vektvzkjyfdqtnwry. Note that salts work similarly to nonces here; however, the salt is known publicly (you can model it as a last-minute modification of the agreed-upon hash function H, since H'(X) = H(add_salt(X))). In some cases, you may also want a per-instance salt, which is mutually decided every time Bob wants to compute a hash.

Hashes are a useful building block for many things; they’re a key component in password security, as well as being part of all kinds of cryptographic protocols. In this post we’ll mainly focus on their ability to be used as an unbreakable commitment.

Back to your regularly scheduled blog post.

Coloring graphs

The classic example of zero knowledge proofs is graph coloring. I’ll run through a quick explanation, though it’s explained beautifully here too.

Let’s say Alice has a graph:

No, not that kind, Alice. The other graph.

She wants it colored such that no two adjacent nodes share a color. This is an NP-complete problem (so it can take up a lot of computational resources to solve). Of course, this graph is small and easy to color, but that’s just for the sake of this blog post.

Bob, using his trusty Crayola™ 3-crayon set[3], has managed to come up with a valid coloring:

He wishes to prove that he has this to Alice, without revealing it or involving a third party. Probably for National Security Reasons. Something something Nicolas Cage.

Bob and Alice meet, and Alice gives him a large piece of paper with the (uncolored) graph drawn on it.

Bob goes into a private room, and colors it. He also covers each graph node with a hat. Alice now enters the room.

Alice chooses an adjacent pair of nodes. Let’s say she chooses 1 and 2. Bob removes those two hats (since Alice is watching him, he has no chance to tamper with the colorings underneath the hats before revealing them). Now, Alice knows the colors of nodes 1 and 2:

This lets her verify that nodes 1 and 2 had different colorings in the graph Bob drew.

Note that this doesn’t actually tell her anything about Bob’s coloring aside from the increased probability of correctness. The colors can always be permuted, so any valid coloring would give the same answer here if the colors were permuted so that 1 is red and 2 is blue. This is important; we don’t want to leak information about Bob’s solution aside from the fact that it is correct.

Nor is this information enough to verify correctness. Bob could have equally drawn a wrong coloring.

(clearly someone wasn’t paying attention in kindergarten)

Since Alice only looked at nodes 1 and 2, she didn’t see anything wrong with the graph. But if she had by chance picked nodes 3 and 4, Bob’s deception would have been revealed.

So she only has 14% (1/7) certainty[4] that Bob’s graph is correct.

However, we can run this experiment again. Bob can permute the colors, draw on a fresh copy of the graph, and ask Alice to choose another pair of adjacent nodes. She can check this, and the probability of correctness will increase to around 27% (1 - (6/7)*(6/7)).

Since Bob has permuted the colors, Alice cannot use the information from the previous round to glean any information about Bob’s solution in this round. Of course, Bob is free to produce a completely different coloring (one that is not a permutation), with a different flaw this time. Regardless of where the flaw is, Alice still has a chance of uncovering it each time.

This can continue until Alice is satisfied that there is a very little chance that Bob has cheated. For example, after 60 rounds, Alice would be 99.99% certain.
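To put numbers on this: if Bob’s coloring has exactly one bad edge out of the 7, each round catches him with probability 1/7, so after \(n\) independent rounds

\[ P(\text{Bob is caught}) = 1 - \left(\tfrac{6}{7}\right)^n, \]

which is roughly 27% for two rounds and 99.99% for sixty.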

Note that this didn’t actually involve any cryptography; it was an algorithm based on information flow. However, if you want this to work securely (in the current solution Alice could push Bob away and reveal all the nodes herself) and make it work without requiring Alice and Bob to be in the same location, you need to use hashes.

Remember when Bob colored the graph whilst alone in the secret room? Once Alice had entered the room, this coloring was committed. There was no way for Bob to tamper with this coloring.

We do the same thing here. After obtaining a valid coloring, Bob commits to this coloring by calculating some hashes.

Node  Color (private)  Nonce (private)  Hash
1     red              wmdqatobck       e1f957bedcceeb217305bfa12cbee4abac36eff1
2     blue             fmcbpzkgyp       87d9d7239909c28ec8d73a3b9a99673cbf870046
3     green            dktuqvrsss       a40bafb81149937c77ae55589aff1b53d9c043d8
4     blue             auhbyuzkmz       b3503962937850f7c1b59cf4b827ca40a62b122a
5     red              gfunjcmygk       d8db52bb36ca595b9231180c1055fe3958c3ea7d


(The hashes here are calculated using SHA-1 for the hashing algorithm. It’s not considered very secure anymore, but the secure ones all output huge hashes which spill over the page)

Bob sends the public part of the table (the node-hash mapping) to Alice. Alice asks for nodes 1 and 2, and Bob reveals the entire table entry for those two nodes (including the nonce).

Note that since Alice now knows the color and nonce for nodes 1 and 2, she can verify that the colors shown are indeed the ones Bob committed to. echo red-wmdqatobck | sha1sum if you want to check on a local Unixy shell.

As in the previous case, Alice can repeat this algorithm until she reaches an acceptable level of certainty (each time with a permutation of colors and a new set of nonces).

A lot of zero knowledge proofs (but not all!) are inherently probabilistic and interactive. They involve multiple rounds; in each round the prover (Bob) commits to something, and the verifier (Alice) challenges the prover to reveal some of it. The process repeats, with the certainty on the verifier’s side approaching 100% as more and more rounds happen.

Zero Knowledge Proof for General Execution

It turns out that you can have a ZKP exchange for the execution of any algorithm that can be transcribed into combinatorial logic. In other words, you should be able to write the program without loops and recursion, though loops bounded by a constant are allowed[5]. This isn’t as restrictive as it seems; usually verification is a straightforward task not involving convoluted loops. The examples above (graph coloring[6], sudoku, prime factorization[7]) can all be verified without loops.

The algorithm shown here is by Gregory Maxwell, originally published here. It’s somewhat inefficient, but it demonstrates the idea behind ZKP for general execution. As mentioned there, it can be optimized using techniques described in this paper.

Let’s get started. Any combinatorial program can be decomposed into a bunch of AND and NOT gates, taking in a bunch of input values and giving out one or more output values. For simplicity let’s assume that the problem statement (i.e. the specific sudoku puzzle, etc) that needs verifying is embedded inside the program, and the final output of the program is just a single boolean indicating whether or not the input is a solution. This algorithm, however, can work for programs with arbitrarily large outputs.

Alice and Bob do this decomposition. They also agree on a numbering of the AND gates. Let’s say that there are N AND gates. We’re mostly going to ignore the NOT gates for the purpose of this article – they’re there, but they aren’t modified or anything.

Creating encrypted AND gates

Now, Bob creates 4*N encrypted AND gates. This is an AND gate, but with the inputs and outputs all muddled up.

This is a regular AND gate:

This is an encrypted AND gate:

(yes, it can be identical to an AND gate)

So is this:

and this:

Basically, each input and the output may or may not be inverted. We can model this in a different way, there is an encryption key corresponding to each input and output. This key is XORd with the input/output (so if the key is 1, the wire is inverted, and if the key is 0, the wire is not inverted).

A regular AND gate has a truth table as follows:

Input 1 Input 2 Output
0 0 0
1 0 0
0 1 0
1 1 1


This truth table, encrypted (with the input keys \(e_1 = 1, e_2 = 0\) and output key \(e_o = 1\)) is:

Encrypted Input 1 Encrypted Input 2 Encrypted Output
1 0 1
0 0 1
1 1 1
0 1 0


So, if the encrypted gate gets the (encrypted) inputs 1 and 0, its (encrypted) output will be 1.

Since XOR is its own inverse (\(x \oplus y \oplus y\) is just \(x\)), if we wish to encrypt an input before piping it through this gate, we just XOR it with the relevant input key. If we wish to decrypt the output, we again XOR it with the output key. The XOR gates being applied will just cancel out with the internal encryption gates. In other words, encryption and decryption are done with the same operation!

To recap, the dotted box below is an encrypted AND gate. An encrypted input enters from the left, and is decrypted by the internal XOR gate to obtain the actual input, which is piped through the AND gate. To encrypt an input so that it can be passed into this gate, one uses the same key with an XOR (not shown in the diagram). Similarly, the actual output of the AND gate exits on the right, and is encrypted by the XOR gate at the output to get the “encrypted output” (the wire that extends out of the box). To decrypt this, one must apply the same XOR operation to the encrypted output to recover the actual output of the gate.
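To make this concrete, here is a small self-contained Rust sketch of an encrypted AND gate, modelling each wire key as a boolean that gets XORed onto the wire. The key values match the \(e_1 = 1, e_2 = 0, e_o = 1\) example above; everything else is illustrative.

#[derive(Clone, Copy)]
struct EncryptedAnd {
    e1: bool, // key for input 1
    e2: bool, // key for input 2
    eo: bool, // key for the output
}

impl EncryptedAnd {
    // Takes encrypted inputs, returns the encrypted output:
    // decrypt (XOR with the input keys), AND, re-encrypt (XOR with the output key).
    fn eval(&self, enc_a: bool, enc_b: bool) -> bool {
        let a = enc_a ^ self.e1;
        let b = enc_b ^ self.e2;
        (a & b) ^ self.eo
    }
}

fn main() {
    let gate = EncryptedAnd { e1: true, e2: false, eo: true };

    // Matches the first row of the encrypted truth table above:
    // encrypted inputs (1, 0) give encrypted output 1.
    assert!(gate.eval(true, false));

    for a in [false, true] {
        for b in [false, true] {
            // Encrypting an input and decrypting the output both use plain XOR,
            // so once you hold the keys the encrypted gate behaves exactly
            // like a normal AND gate.
            let enc_out = gate.eval(a ^ gate.e1, b ^ gate.e2);
            assert_eq!(enc_out ^ gate.eo, a & b);
        }
    }
}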

Creating adaptation keys and commitments

Now, unlike regular AND gates, these encrypted AND gates cannot be composed. The output of an encrypted AND gate is encrypted, with a potentially different encryption key as to that of the next gate’s input. So, we insert an “adaptation key” between the two. For example, if the output of the first gate is connected to the first input of the second gate, we need to insert this operation between the two gates:

We XOR by \(e_o\) of the first gate (to decrypt), and then again XOR by \(e_1\) of the second gate (to reencrypt). This operation is the same as XORing by \(e_o \oplus e_1\), which is the “adaptation key”. Every pair of encrypted gates will have an adaptation key for every configuration they can be placed in.
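As a quick sanity check of that identity (the key bits below are arbitrary):

// Decrypting with the first gate's output key and re-encrypting with the
// second gate's input key is one XOR with the adaptation key e_o ^ e_1.

fn main() {
    let e_o = true;   // output key of the first encrypted gate
    let e_1 = false;  // first-input key of the second encrypted gate
    let adaptation_key = e_o ^ e_1;

    for wire in [false, true] {
        // XOR twice, or XOR once with the adaptation key: same result.
        assert_eq!(wire ^ e_o ^ e_1, wire ^ adaptation_key);
    }
}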

Alright. Bob creates a ton of these “encrypted gates”, and calculates all the adaptation keys. He also mixes up the truth tables of each encrypted gate[8].

Now, he commits to these truth tables. A commitment for each entry in each truth table is made, so he’ll end up with something like this:

Encrypted Input 1 Encrypted Input 2 Encrypted Output nonce commitment
0 0 1 .. H(001 + nonce)
1 0 1 .. H(101 + nonce)
0 1 0 .. H(010 + nonce)
1 1 1 .. H(111 + nonce)


He also commits to each of the adaptation keys and each of the encryption keys.

As usual, all the commitments will be sent to Alice. Alice will then have data like: “Commitment for Gate 1 entry 1: .., Commitment for Gate 2 entry 2:.., … Commitment for Gate 2 entry 1: .., …. Commitment for adaptation key between Gate 1’s output and Gate 2’s first input: .., Commitment for adaptation key between Gate 1’s output and Gate 2’s second input: .., Commitment for encryption key for Gate 1’s first input, …”.

Shuffling and revealing

These commitments are taken in some predefined order, and the resultant monster string is hashed (without a nonce). This “superhash” is used as the seed to a pseudorandom number generator which is used to shuffle the gates. Both Alice and Bob can calculate this shuffling.

This post-shuffle ordering is used after this point. The hash-shuffle is important here because it adds a layer of tamper protection. If Bob wishes to tamper with the, say 4th gate post-shuffle, Bob would have to create a bad gate before making the commitments; this changes the commitments, and thus the shuffle order, and so the tampered gate will not end up being the 4th gate. Basically, it’s hard to control where the tampered gate will end up.
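For illustration, here is a rough Rust sketch of how both parties could derive the same shuffle from the commitments. The sha2 crate and the toy xorshift PRNG are stand-ins; the post doesn’t prescribe a specific hash-to-shuffle construction.

use sha2::{Digest, Sha256};

// Toy deterministic PRNG; a real implementation would use a cryptographic one.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn shuffle_order(commitments: &[String], n_gates: usize) -> Vec<usize> {
    // The "superhash" over all commitments, in the agreed order.
    let mut hasher = Sha256::new();
    for c in commitments {
        hasher.update(c.as_bytes());
    }
    let digest = hasher.finalize();

    // Seed the PRNG from the first 8 bytes of the digest (nonzero seed).
    let mut seed_bytes = [0u8; 8];
    seed_bytes.copy_from_slice(&digest[..8]);
    let mut rng = XorShift64(u64::from_le_bytes(seed_bytes) | 1);

    // Fisher-Yates: Alice and Bob each compute exactly this permutation.
    let mut order: Vec<usize> = (0..n_gates).collect();
    for i in (1..n_gates).rev() {
        let j = (rng.next() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
    order
}

fn main() {
    let commitments = vec!["c0".to_string(), "c1".to_string(), "c2".to_string()];
    // Both parties run this independently and get the same gate ordering.
    assert_eq!(shuffle_order(&commitments, 8), shuffle_order(&commitments, 8));
}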

Now, out of the 4N gates, Bob takes the last 2N, and reveals everything about them: Their encryption keys, the nonces for their truth table commitments, and all adaptation keys between these gates (along with the nonces for the adaptation key commitments).

Alice ensures that everything in this revealed data adds up. All the truth tables, after decryption, should actually be AND gate truth tables. All adaptation keys must match up with their relevant encryption keys. All commitments must match up.

Double trouble!

Bob duplicates the AND-and-NOT-gate based circuit. He now has two identical circuits which take the same inputs, and have one output each. In itself this is pretty useless; this circuit is obviously redundant. However, in the context of encrypted gates, this redundancy becomes useful.

Bob drops the 2N encrypted gates into this double-circuit, using the post-shuffle ordering of encrypted gates and the predecided numbering9 of the AND gates in the circuit. He puts the necessary adaptation gates (i.e. an XOR operation with the relevant adaptation key) between encrypted AND gates to make the circuit work. Note that each “half” of the circuit has a different set of encrypted gates, and thus a different encryption key for each input. There are NOT gates here too (from the original circuit, which was made of ANDs and NOTs); they stay in place (the adaptation gate can go on either side of them) with no modifications or encryption.

Execution

Let’s recall that Bob is claiming to have the correct input for the original circuit – the input that makes that circuit output true.

Since Bob has all the encryption keys, he can encrypt this correct input to get the correct encrypted input, which should make the new circuit output true (well, encrypted true) as well.

Bob goes ahead and does this. He encrypts the input (since there are different encryption keys for either side of the circuit, he does this twice), and runs it through the circuit. He notes down the truth table entry utilized for each gate. He ensures that the output, once decrypted, is true (it should be, if everything has been done correctly till now).

Verification

He now reveals the details of the program execution to Alice. He reveals:

  • All adaptation gates involved (and their nonces, to verify the commitments)
  • All truth table entries involved in the execution (and their nonces …).
  • The output encryption key (and its nonce)
  • The encrypted inputs

Alice goes ahead and verifies that the commitments have not been reneged upon. Note that she also now has a full execution history. It’s an encrypted history – she can’t calculate the original input from it – but she can verify that the execution was faithfully carried out. While she doesn’t have the entire truth table for any encrypted gate, she has the entry that was used in the execution, which is enough. She just has to ensure that the inputs to a gate match the truth table entry, use the entry to see what the output is, apply the relevant adaptation key to get the input for the next gate, and repeat.

And there you have it. Alice has verified that Bob faithfully executed her verification circuit, and thus he must have the correct answer to her problem.

Tampering?

Let’s see if it’s possible for Bob to tamper with any of this. If Bob wishes to tamper with one of the gates, he has to do so before calculating the commitments, which changes the shuffle and means he can’t control where the tampered gate will end up in the final circuit. This is compounded by the fact that half the gates are revealed (so the tampered gate may end up in the wrong set), and that there are two copies of the circuit (so he needs to tamper with both sides simultaneously, requiring even more luck in getting the shuffle where he wants it).

The probability of Bob being able to execute a successful tamper can be reduced by increasing the number of revealed gates and increasing the duplication of the circuit. There is also the aforementioned fudge factor that can be introduced by having Alice choose where each encrypted gate should go after Bob has already provided commitments, and finally the procedure can be repeated as many times as necessary with a fresh set of encrypted gates to increase certainty. Unlike the graph coloring algorithm (where the uncertainty in a single run was large – if Bob has a couple of wrong edges there’s a relatively small chance he’ll get caught), here in a single run it is Bob who has a massive disadvantage, since he must tamper with exactly the right gates, and there’s very little chance that his tampered gates will fall in the right place based on Alice’s chosen ordering. Additionally, tampering with the gates in the first place is hard, since you need to avoid having them get revealed. I think that with reasonable choices for the level of duplication and the number of revealed gates (i.e., not asking for something like 1000 duplicated circuits), it’s possible for Alice to get a very high level of certainty without needing to conduct multiple rounds.

How about the opposite question: can Alice find out anything about the input, aside from the fact that it is correct, from the information she has? At first glance it seems like she can, because she can see the whole path of execution. In the case of a program with non-constant loops, this would be damning, since she can figure out how many iterations happened (and thus know the decrypted value for the number of loop iterations) and backtrack using that in a cleverly-written program. However, this program has no loops.

Looking at it closely, any encrypted history of execution can be changed into a different encrypted history of execution for the same nonencrypted execution by adding NOT gates wherever they don’t match, and then absorbing these NOT gates into the input or output keys (by NOTing them) of the adjacent encrypted AND gates. This means that without knowing the details of the encrypted gates, all histories of execution are equally possible for a given actual execution10. Therefore, knowing only a history of execution does not give you further information about the actual execution, since it could equally well have come from some other actual execution.

Bonus: Fixing the escrow and Bitcoin

(I’m going to assume basic knowledge of Bitcoin later on in this section)

After all this, we still only have a way for Bob to prove he has a solution. There’s no way of securely exchanging the solution for money (or whatever) without involving a trusted third party to handle the swap. This is known as escrow: a third party is given both items for swapping, and after checking that everything is in order, completes the swap.

We can build on this so that the third party is only trusted with the money, and cannot actually peek at the answer.

It’s pretty straightforward: Bob and Alice mutually agree on a shared secret “pad” P. Bob takes his answer, bitwise-XORs it with the pad (which is of the same length as the answer) to get the padded input X, and then hashes X to get the hash Y.
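In code, the setup on Bob’s side looks roughly like this (a sketch only; sha2 is an illustrative hash choice):

use sha2::{Digest, Sha256};

// Sketch: X = answer XOR P (the shared pad), Y = H(X).
// Bob later hands X to the escrow; Alice only ever learns Y until she pays.
fn pad_and_hash(answer: &[u8], pad: &[u8]) -> (Vec<u8>, [u8; 32]) {
    assert_eq!(answer.len(), pad.len()); // the pad is as long as the answer
    let padded: Vec<u8> = answer.iter().zip(pad).map(|(a, p)| a ^ p).collect();
    let hash: [u8; 32] = Sha256::digest(&padded).into();
    (padded, hash) // (X, Y)
}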

Now, initially we had a verification program which proves the statement “This input is a solution to Alice’s problem”. We modify this program so that it proves the following two statements:

  • This input is a solution to Alice’s problem
  • When the input is XORd with P, and subsequently hashed, the hash that comes out is Y

Alice and Bob now go through the ZKP algorithm and the above is proven. Of course, they must keep the exchange between themselves, since the value of the pad (which can be extracted from the circuit) must remain secret.

Assuming that Bob isn’t able to cause any hash collisions, Alice at this point would be happy with a number that, when hashed, gives Y. This is something that escrow can verify, since neither Y nor X can be reverse-engineered to get the original answer unless you have P.

Now, Alice puts the money in escrow, and notifies the third party handling escrow of the value of Y (the hash). Bob puts the padded input X in escrow as well. The third party verifies that Y is the hash of X, and releases the money to Bob and the padded input to Alice. Since Alice knows the pad P, she can XOR it with X to recover the original real input. Everyone walks away happy!

Well, maybe not. There is still the danger that the third party handling escrow walks away with the money. Why trust any one party?

It turns out that Bitcoin provides an alternative here. The technique described in Greg Maxwell’s article (called Zero-Knowledge Contingent Payment) builds upon the above protocol using “scripts” in Bitcoin.

The way a Bitcoin transaction works is that anyone (well, the first person) who can solve the embedded challenge is allowed to use the money contained in it. Like a piñata. Except with money instead of candy and public-key cryptography instead of a stick.

Most Bitcoin transactions pay directly to a person, and they use a standard kind of challenge (the actual script is here). If Alice wishes to pay Bob 5 BTC, Alice crafts a transaction which says “anyone with the private key behind Bob’s public key (his address) may spend this money”. Of course, in practice this means that only Bob can spend the money. Alice created a piñata which only Bob can break.

We can build on this to make the Bitcoin network behave as a trustworthy escrow. After having stepped through the zero-knowledge protocol and being confident that Y is the hash of the padded input, Alice crafts a transaction which says “anyone with a string that results in this hash may spend this money”11. Bob has this string; it is the padded answer X. He makes a transaction with X as part of the input script (so that he can claim the money); and the Bitcoin network accepts it. Assuming Alice and Bob are not able to tamper with each others’ local networks, by the time Alice sees the transaction containing X, the network should have accepted this transaction already (though it may not yet be part of the blockchain), and Bob should be getting his money.

(If the crucial part is just ensuring that the escrow doesn’t run off with the money, and you don’t care whether other people can see the answer, you can skip the padding step and directly hash the input. I believe the proof of concept executed in Greg’s post did this, but I’m not sure.)

Thanks to Shantanu Thakoor, eternaleye, and ebfull for feedback on drafts of this post


  1. I have some physics friends who would probably enjoy this too.

  2. Actually, you still need a trusted third party to make the money-swap work, but it can be done in a way that the National Secrets Sudoku Solution isn’t actually shared with the third party. The Bitcoin article linked above describes a way to do away with a trusted third party, instead replacing it with the implicitly trusted Bitcoin network. We’ll discuss this further at the end of the post.

  3. With free sharpener!

  4. There are seven edges. This is a conservative estimate, assuming that Bob’s graph has one bad edge. More mistakes increase this probability, but it becomes more cumbersome to calculate.

  5. We basically want to be able to write this as a series of sequentially-arranged logic gates. If a loop is bounded by a constant, it can just be unrolled. break and continue can be handled here, though goto cannot.

  6. Remember that the number of nodes and edges is already known, so we can just write a program “Check edge 1”, “Check edge 2”, … without needing to explicitly loop over everything

  7. Again, since the number being factorized is known beforehand, there are bounds on the sizes of its factors, and a multiplication circuit for a number of bounded size can be designed.

  8. mixing up a truth table doesn’t change how it works, but it makes it impossible to figure out the original entry just by knowing that your entry was the “third” entry or something

  9. You can actually add another fudge factor here by making Alice decide the gate numbering after having received gate commitments. If N isn’t that large, there’s still a small chance Bob can fake the output by permuting the original gates (and twiddling the nonces) until the tampered gates fall into the right spot. This removes that possibility to a reasonably high level of certainty, which can be strengthened by going through the whole procedure multiple times.

  10. We’re ignoring the commitments made by Bob here, which let us make the opposite statement – “this encrypted history of execution is the only one that’s possible given the commitments”. However, the commitments themselves don’t carry any new information per se; they instead lock in information which is revealed to you in the future (information which is not revealed at all cannot be reverse-engineered from the commitments, so that’s safe too). This means that Alice cannot use them to glean anything about the decrypted input, and we can ignore them for the time being.

  11. She should probably also add a clause that requires Bob’s private key to sign something, so that someone else can’t copy the answer from Bob’s transaction and steal the money. Additional work can be done to make it so that if the transaction goes unclaimed, Alice can reclaim the money.

Making Your Open Source Project Newcomer-friendly

One reason I really like open source is that it offers a lot of great opportunities for newish programmers to get some hands-on experience with real world problems. There’s only so much one can learn from small personal projects, but in open source one often gets to tackle interesting problems on large codebases — problems which wouldn’t occur in small/personal ones. There are also valuable skills related to collaboration to be learnt.

Because of this, I care quite a bit about making projects welcoming to new contributions, and try to improve this experience on projects I’m involved in. I’ve picked up a few tricks along the way. Most of these aren’t my ideas, I’ve gleaned them from watching people like Josh Matthews, Margaret Leibovic, Mike Conley, and Joel Maher do their thing. If you’re interested, here is a post by Margaret, and here are some of Josh’s slides from a presentation, both on the same subject.

Before I get started, bear in mind that making a project “newbie-friendly” isn’t something that magically happens. Like most things, it takes effort, but often this effort can come to fruition in the form of motivated contributors helping out on your project and eventually even becoming co-maintainers. It’s really worth it!

The simple stuff

There’s a lot of really easy stuff you can do to kickstart contributions to your own project. Most of this is obvious:

CONTRIBUTING.md

Add a CONTRIBUTING.md file. Keep it up to date. Link to it prominently from the README. The README should also have clear and detailed instructions for building the project. These two files are different – README is for those who want to use your project (perhaps by building from sources), CONTRIBUTING is for people who want to contribute.

Mention steps for getting involved: how to find something to work on, how to send a patch/make a pull request, a checklist of things to ensure your patch/PR satisfies before submission (e.g. passing tests, commit message guidelines, etc). Additionally, include some tips and tricks (like a link to the internal documentation) that can help new contributors, links to communication channels (IRC, Slack, Gitter, whatever) and anything else you may find helpful for someone considering contributing to your project. If you use some form of issue labeling, explanations of the labeling scheme can help folks find stuff they want to work on. An overview of the directory structure can be similarly helpful.

For some examples, check out the CONTRIBUTING.md files for servo and rust-clippy.

Maintain a list of easy bugs

More on this later, but try to use some form of tagging to mark easy bugs. I love this slide from Josh’s talk.

How to politely say f*** off

“Choose something to work on from our issue tracker.”
                                           - every project maintainer

Most bugs on issue trackers are nontrivial, steeped in jargon, devoid of actionable information, or otherwise inaccessible to the average new contributor. It’s relatively low effort for maintainers to recognize which bugs are “easy”, but it’s a lot of work for people unfamiliar with the project to figure this out.

A simple label on GitHub is all you need in most cases. Be sure to link to it from your contributing file!

Communicate!

Have open channels for communication. IRC is often the favorite here, though IRC is pretty alien for people getting involved in open source for the first time. If you’re using IRC, see if you can link to a web client (like Mibbit) with short instructions on how to join. Stuff like Gitter works too.

Mailing lists also work — everyone knows how to email! However, email can be intimidating to newcomers; many have an “omg I can’t ask my silly questions here!” attitude which stops them from progressing.

Explicitly inviting questions in each issue helps here. Clippy doesn’t have a mailing list or IRC channel (too small a project), but I encourage people working on new bugs to ping me on any communication channel they’d like. I’ve mentored people over email, GH issue threads, IRC, even reddit PMs, and it’s worked out fine in each case.

Often folks will PM you for help. Provide help, but encourage them to ask questions in the main venue. This has the twofold benefit of showing everyone that the main channel is open to questions, and it also helps people get quicker answers since someone else can answer if you’re not around.

Recognition

Celebrate new contributors. Tweet about them. Mention them in blog posts. Getting a two-line patch accepted in an open source project doesn’t sound like much, but when you’re just getting started, it’s a very awesome feeling. Make it more awesome! Both This Week In Rust and This Week In Servo mention new contributors (sometimes we tweet about it too), and I’ve often got very happy messages from these contributors about the mentions.

Add a code of conduct

A lot of folks have had bad experiences with people online, often in other open source communities, and may be wary about joining new ones. A code of conduct is a statement that unsavory behavior won’t be tolerated, which helps make the project more welcoming and appealing to these people, while simultaneously making it a nicer place for everyone. Of course, you should be prepared to enforce the code of conduct if the situation requires it.

I use the Rust code of conduct but the Contributor Covenant is good, too. Various language/framework communities often have their own favorite code of conduct. Pick one.

Empathize!

We often forget how hard it is to get started in something we’re an active part of. For example, for many of us the process of making a pull request is almost second nature.

However, not everyone is used to these things. I’ve seen contributors who can code well but haven’t used GitHub before have lots of trouble making and updating a pull request. The same applies to other workflow things, like code review, version control1, or build system peculiarities.

Keep this in mind when dealing with new contributors. These are skills which can be picked up relatively quickly, but those without them will have a frustrating experience and end up asking you a lot of questions.

Improving the newcomer experience

Alright, now you’ve gotten all the basics done. People now have a vague idea of how to contribute to your project. Let’s make it more concrete.

Mentoring

Don’t just leave an easy bug open. Offer to mentor it! This is a very fun and rewarding experience, and of course contributors are more likely to stick around in a project they perceive to be helpful and welcoming.

It’s often better to go one step further and give tips for fixing the issue before anyone even picks it up (example). Communication in open source has latency — the contributor might be on the other side of the planet, or might otherwise be contributing at a different time of the day than you2. Reducing the number of back-and-forth cycles is really helpful here, and giving some info so that a contributor can get started immediately without needing to wait for a response goes a long way in improving the newcomer experience.

Avoid creating a mentored bug where you yourself aren’t certain on how to fix it. Ideally, you should know the exact steps to take to fix a bug before marking it as mentored. Don’t divulge all the steps to the mentee, but the exercise of solving the bug yourself (without writing the code) ensures that there aren’t any hidden traps.

Mostly mentorship just involves answering questions and laying out a path for the mentee. Be sure to encourage questions in the first place! A lot of people, especially students, are intimidated when joining open source and try to stay as quiet as possible. For a healthy mentorship, you want them to ask questions. A lot. Encourage this.

Remember that in many cases the new contributor may be intimidated by you. For example, I’ve often come across new Firefox contributors (who I introduced to the project) asking me questions instead of their assigned mentor because “the mentor works for Mozilla and is way too awesome for me to bug with questions”. This wasn’t something the mentor told them (Firefox mentors are all very nice and helpful people), it was a conclusion they came to on their own — one which would impede their progress on the bug.

One trick that helps mitigate this is encouraging questions in your main channel. When people PM me with questions on IRC, I answer their questions, but also encourage them to ask in the main IRC channel next time. This is good for everyone — It gives the channel an aura of being “okay to ask questions in” (if other people see that questions are being asked and answered in the channel), and it also lets other maintainers jump in to help the new contributor in case I’m not around.

Once a new contributor has fixed a bug, mentorship isn’t over – it’s just started! See if you can find something more involved for them to work on in a related area of the codebase. Get to know the contributor too; a sense of familiarity goes a long way in reducing intimidation and other friction.

Tailoring process for newcomers

Most open source projects have a set of hoops you have to jump through for a pull request to be accepted. These are necessary for the health of the project and pretty straightforward for existing contributors, but they can be intimidating for new ones. They also add extra cycles of communication. I’ve often seen people put up almost-working patches and disappear after a few cycles, even though the bulk of the work was done and there were just process issues (or code nits) left over before merging; this can be quite disheartening. Reducing extra process helps mitigate it.

For example, Servo uses this great tool called Reviewable for code review. Regular contributors don’t have much friction whilst using this, so we use it wherever possible. However, for small pull requests from new contributors I avoid using Reviewable and instead opt to review directly from the GitHub interface. For these pull requests I don’t need Reviewable’s features, so I don’t lose much, but now the contributor has to go through one less hoop.

Similarly, for rust-clippy, I often make minor fixes and run the readme update script on behalf of the contributor instead of asking them to do it themselves. I usually check out the PR locally, run git merge pr-branch --no-commit --no-ff3, make edits, commit and push. This way the PR still gets marked as merged (commit --amend doesn’t do that), and the history stays bisectable.

OpenSSL uses a mailing list for patches; however, they allow contributions via GitHub too. Most seasoned contributors probably stick to the mailing list, but new contributors can use the familiar GitHub interface if they want, reducing friction.

Of course, cutting down on (necessary) process should only be done for the first one or two contributions; try to educate the newcomer about your processes as time passes.

An alternate way to tackle this issue is to go the other way around and teach process first. Give newcomers an extremely easy bug that just involves replacing a string or some other simple one-line fix, and help them push it through the process. This way, the next time they work on something, they’ll be familiar with the process and be able to devote more time to the actual code.

Creating easy bugs

At some point down this road many projects have a problem where there are people who want to contribute, but not enough suitable easy bugs.

One technique that has helped me create a lot of easy bugs is to just look out for separable and non-critical subfeatures when working on something. There often are things like polish or other small features which you don’t need to include in the main pull request, but you do anyway because it’s a few extra seconds of work. If you think it can be split out as an easy bug, go ahead and file it!

For example, whilst working on some form issues, instead of completely implementing something, I implemented just what I needed, and filed an easy bug. This is another bug with a very similar situation; I’d implemented the framework for form submission, made it work with <input>, and filed an easy bug for wiring it up to <button>.

Sometimes you may not find a subfeature that can be split out, but you may notice something else which could be improved. This is an example of such an issue. I was working on something else, and noticed that this area of the code could be designed better. Whilst I could have fixed it myself with very little effort as part of my other changes, I made it into an easy bug instead.

Simple refactorings can be a source of easy bugs too. These require familiarity with the language, but not much more, so they’re ideal for people new to the project.

It’s also possible to take a hard bug and make it easier, either by partially implementing it, or giving enough hints (code links, explanations, etc) that the hard part is already taken care of.

Avoid making “critical” (i.e., needs to land in a week or two) features into easy bugs. Even simple changes can take a while for new contributors (especially due to the nature of asynchronous communication, lack of time, and/or getting bogged down in the process). Easy bugs should be something which you eventually want, but are okay with taking longer to land. It’s very disheartening for a new contributor if they are working on something and a maintainer solves it for them because it was needed to land quickly. (Given enough time this will eventually happen for some bug; in such a case, see if you can provide a different bug for them to work on, and apologize.)

Discoverability

Make it super easy for newcomers to find a bug they want to work on; not just any easy bug!

Bugs Ahoy and What Can I do for Mozilla are both great examples of this. Servo has servo-starters.

There are also various sites where you can list your easy bugs, some of which are listed in Emily’s post.

Projects and more involved participation

Having easy bugs and mentoring newcomers is just one step. You probably want to have these newcomers working on harder stuff, projects, and perhaps eventually maintaining/reviewing!

For many people these steps may not necessarily require involvement from you; I’ve seen professional software developers move their way to being a maintainer with very little mentorship just because they’re experienced enough to figure out how the project works on their own.

However, many of your contributors may be students or otherwise inexperienced; indeed they may be contributing to your project to gain this experience and become better developers. Such people can become valuable members of the team with some effort.

This mostly involves nudging people towards harder bugs and/or projects. It’s also very valuable to maintain a list of “student projects” (noncritical but large bodies of work). These can be picked up by contributors or sometimes students wishing to do a project for course credit.

It’s important to try and provide a logical series of issues instead of picking things randomly around the project so that the contributor can focus on one part of the codebase while starting out. If the issues all culminate in a large feature, even better.

Joel Maher and the Mozilla Tools team have started a pretty great program called “Quarter of Contribution” which provides focused mentorship for a particular project. It seems to work out well. Programs like Google Summer of Code and Outreachy also provide ways for new contributors to try out your project at a significant level of involvement.

Creating such projects or harder bugs is a nontrivial problem, and I don’t have a clear idea on how this can be done (aside from using similar techniques as listed in the “creating easy bugs” section above). Ideas welcome!

Projects aren’t always necessary here either. Depending on the contributor, sometimes you can present them with a regular (i.e., not “easy” or otherwise earmarked) issue to work on, provide some hints, and tell them to try and figure stuff out without your help (or with less help). Stay involved, and encourage them to ask questions of others or initiate discussions, but try to stick to observing. It’s really fun to watch someone figure stuff out on their own. I did this with a contributor here, where I only provided the initial hint and the code review; as well as here, where I encouraged the contributor to initiate and direct the relevant bikeshedding on various channels without my involvement. The contributor is now more in touch with the Rust community and codebase as a result; and for me I enjoyed watching him figure stuff out and direct discussions on his own.

Mentor! Share!

I’m still exploring these techniques myself. I’ve had great results applying some of these to clippy, and we already apply many of them to Servo. I’m slowly working on applying these techniques to Rust.

While these techniques have worked well for me on some of these projects, I’m always open to hearing about more ideas for making it easier for newcomers to contribute, so please let me know if you have any ideas or experiences to share!

Mentoring, while a lot of work, is an insanely rewarding experience, and I hope you try to incorporate it into your open source projects!

Thanks to Josh Matthews, James Graham, Emily Dunham, and Joel Maher for feedback on drafts of this post

discuss: HN


  1. I’ve lost track of the number of times I’ve helped someone through git rebase and merge conflicts on Servo

  2. This often happens with open source projects with paid staff – the staff is around during the workday, but the contributors are around during the evenings, so there’s less overlap.

  3. There are various reasons why you should not do this, mainly because non-merge-related changes in merge commits are hard to track down. Be aware of the downsides and use this trick judiciously.

Designing a GC in Rust

For a while I’ve been working on a garbage collector for Rust with Michael Layzell. I thought this would be a good time to talk of our design and progress so far.

Motivation

“Wait”, you ask, “why does Rust need a garbage collector”? Rust is supposed to work without a GC, that’s one of its main selling points!

True. Rust does work pretty well without a GC. It’s managed to do without one so far, and we still have all sorts of well-written crates out there (none of which use a GC).

But Rust is not just about low-cost memory safety. It’s also about choosing your costs and guarantees. Box<T> and stack allocation are not always sufficient, sometimes one needs to reach for something like Rc<T> (reference counting). But even Rc is not perfect; it can’t handle cycles between pointers. There are solutions to that issue like using Weak<T>, but that only works in limited cases (when you know what the points-to graph looks like at compile time), and isn’t very ergonomic.
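As a quick illustration of the cycle problem (my own minimal example, not from the post): two nodes that point at each other through Rc will never be freed, because each keeps the other’s count above zero.

use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    other: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(Some(a.clone())) });
    *a.other.borrow_mut() = Some(b.clone());
    // `a` and `b` now reference each other, so neither reference count ever
    // drops to zero and both allocations are leaked when they go out of scope.
}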

Cases where one needs to maintain a complicated, dynamic graph are where a GC becomes useful. Similarly, if one is writing an interpreter for a GCd language, having a GC in Rust would simplify things a lot.

Not to say that one should pervasively use a GC in Rust. Similar to Rc<T>, it’s best to use regular ownership-based memory management as much as possible, and sprinkle Rc/Gc in places where your code needs it.

Previous designs

This isn’t the first GC in Rust. Automatic memory management has existed before in various forms, but all were limited.

Besides the ones listed below, Nick Fitzgerald’s cycle collector based on this paper exists and is something that you should look into if you’re interested. There’s also an RFC by Peter Liniker which sketches out a design for an immutable GC.

Core Rust GC(s)

Rust itself had a garbage collector until a bit more than a year ago. These “managed pointers” (@T) were part of the language. They were removed later with a plan to make GC a library feature.

I believe these were basically reference counted (cycle collected?) pointers with some language integration, but I’m not sure.

Nowadays, the only forms of automatic memory management in Rust are Rc and Arc, which are nonatomic and atomic reference counted pointers respectively. In other words, they keep track of the number of shared references via a reference count (incremented when the pointer is cloned, decremented when destructors run). If the reference count reaches zero, the contents are cleaned up.

This is a pretty useful abstraction, however, as mentioned above, it doesn’t let you create cycles without leaking them.

Spidermonkey

You can read more about Servo’s Spidermonkey bindings in this blog post (somewhat outdated, but still relevant)

In Servo we use bindings to the Spidermonkey Javascript engine. Since Javascript is a garbage collected language, the Rust representations of Javascript objects are also garbage collected.

Of course, this sort of GC isn’t really useful for generic use since it comes bundled with a JS runtime. However, the Rust side of the GC is of a design that could be used in an independent library.

The Rust side of the Spidermonkey GC is done through a bunch of smart pointers, and a trait called JSTraceable. JSTraceable is a trait which can “trace” recursively down some data, finding and marking all GC-managed objects inside it. This is autoderived using Rust’s plugin infrastructure, so a simple #[jstraceable] annotation will generate trace hooks for the struct it is on.

Now, we have various smart pointers. The first is JS<T>. This is opaque, but can be held by other GC-managed structs. To use this on the stack, this must be explicitly rooted, via .root(). This produces a Root<T>, which can be dereferenced to get the inner object. When the Root is created, the contained object is listed in a collection of “roots” in a global. A root indicates that the value is being used on the stack somewhere, and the GC starts tracing usage from these roots. When the Root<T> is destroyed, the root is removed.

The problem with this is that JS<T> doesn’t work on the stack. There is no way for the GC to know that we are holding on to JS<T> on the stack. So, if I copy a JS<T> to the stack, remove all references to it from objects in the GC heap, and trigger a collection, the JS<T> will still be around on the stack after collection since the GC can’t trace to it. If I attempt to root it, I may get a panic or a segfault depending on the implementation.

To protect against this, we have a bunch of lints. The relevant one here protects against JS<T> being carried around on the stack; but like most lints, it’s not perfect.

To summarize: Spidermonkey gives us a good GC. However using it for a generic Rust program is ill advised. Additionally, Servo’s wrappers around the GC are cheap, but need lints for safety. While it would probably be possible to write safer wrappers for general usage, it’s pretty impractical to carry around a JS runtime when you don’t need one.

However, Spidermonkey’s GC did inspire me to think more into the matter.

Brainstorming a design

For quite a while I’d had various ideas about GCs. Most were simplifications of Servo’s wrappers (there’s some complexity brought in there by Spidermonkey that’s not necessary for a general GC). Most were tracing/rooting with mark-and-sweep collection. All of them used lints. Being rather busy, I didn’t really work on it past that, but planned to work on it if I could find someone to work with.

One day, Michael pinged me on IRC and asked me about GCs. Lots of people knew that I was interested in writing a GC for Rust, and one of them directed him to me when he expressed a similar interest.

So we started discussing GCs. We settled on a tracing mark-and-sweep GC. In other words, the GC runs regular “sweeps” where it first “traces” the usage of all objects and marks them and their children as used, and then sweeps up all unused objects.

This model on its own has a flaw. It doesn’t know about GC pointers held on the stack as local variables (“stack roots”). There are multiple methods for solving this. We’ve already seen one above in the Spidermonkey design – maintain two types of pointers (one for the stack, one for the heap), and try very hard using static analysis to ensure that they don’t cross over.

A common model (used by GCs like Boehm, called “conservative GCs”) is to do something called “stack scanning”. In such a system, the GC goes down the stack looking for things which may perhaps be GC pointers. Generally the GC allocates objects in known regions of the memory, so a GC pointer is any value on the stack which belongs to one of these regions.

Of course, this makes garbage collection rather inefficient, and will miss cases like Box<Gc<T>> where the GCd pointer is accessible, but through a non-GC pointer.

We decided rather early on that we didn’t want a GC based on lints or stack scanning. Both are rather suboptimal solutions in my opinion, and very hard to make sound1. We were also hoping that Rust’s type system and ownership semantics could help us in designing a good, safe, API.

So, we needed a way to keep track of roots, and we needed a way to trace objects.

Tracing

The latter part was easy. We wrote a compiler plugin (well, we stole Servo’s tracing plugin which I’d written earlier) which autoderives an implementation of the Trace trait on any given struct or enum, using the same internal infrastructure that #[derive(PartialEq)] and the rest use. So, with just the following code, it’s easy to make a struct or enum gc-friendly:

#[derive(Trace)]
struct Foo {
    x: u8,
    y: Bar,
}

#[derive(Trace)]
enum Bar {
    Baz(u8), Quux
}

For a foo of type Foo, foo.trace() will expand to a call of foo.x.trace() and foo.y.trace(). bar.trace() will check which variant it is and call trace() on the u8 inside if it’s a Baz. For most structs this turns out to be a no-op and is often optimized away by inlining, but if a struct contains a Gc<T>, the special implementation of Trace for Gc<T> will “mark” the traceability of the Gc<T>. Types without Trace implemented cannot be used in types implementing Trace or in a Gc, which is enforced with a T: Trace bound on Gc<T>.
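Roughly speaking, the derive for the Foo and Bar above expands to something like the following hand-written impls (a sketch only; the actual trait in rust-gc has more to it than a single trace() method):

// Sketch of what #[derive(Trace)] generates for the Foo and Bar defined above.
impl Trace for Foo {
    fn trace(&self) {
        self.x.trace();
        self.y.trace();
    }
}

impl Trace for Bar {
    fn trace(&self) {
        match *self {
            Bar::Baz(ref inner) => inner.trace(),
            Bar::Quux => {}
        }
    }
}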

So, we have a way of walking the fields of a given object and finding inner Gc<T>s. Splendid. This lets us write the mark&sweep phase easily: Take the list of known reachable Gc<T>s, walk their contents until you find more Gc<T>s (marking all you find), and clean up any which aren’t reachable.

Rooting

Of course, now we have to solve the problem of keeping track of the known reachable Gc<T>s, i.e. the roots. This is a hard problem to solve without language support, and I hope that eventually we might be able to get the language hooks necessary to solve it. LLVM has support for tracking GCthings on the stack, and some day we may be able to leverage that in Rust.

As noted above, Spidermonkey’s solution was to have non-rooted (non-dereferencable) heap pointers, which can be explicitly converted to rooted pointers and then read.

We went the other way. All Gc<T> pointers, when created, are considered “rooted”. The instance of Gc<T> has a “rooted” bit set to true, and the underlying shared box (GcBox, though this is not a public interface) has its “root count” set to one.
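Conceptually, the layout looks something like this (field names and exact types are mine, for illustration; the real rust-gc definitions differ in detail):

use std::cell::Cell;
use std::ptr::NonNull;

// Illustrative layout only; not the actual rust-gc definitions.
struct GcBox<T: ?Sized> {
    roots: Cell<usize>, // how many rooted handles currently point at this box
    marked: Cell<bool>, // set during the mark phase of a collection
    data: T,
}

struct Gc<T: ?Sized> {
    rooted: Cell<bool>,     // is *this particular handle* acting as a root?
    ptr: NonNull<GcBox<T>>, // pointer to the shared, GC-managed allocation
}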

When this Gc<T> is cloned, an identical Gc<T> (with rooted bit set to true) is returned, and the underlying root count is incremented. Cloning a Gc does not perform a deep copy.

let a = Gc::new(20); // a.root = true, (*a.ptr).roots = 1, (*a.ptr).data = 20

// ptr points to the underlying box, which contains the data as well as
// GC metadata like the root count. `Gc::new()` will allocate this box

let b = a.clone(); // b.root = true, (*a.ptr).roots++, b.ptr = a.ptr

This is rather similar to how Rc works; however, Rc has no root field, and its equivalent of the roots counter is just the reference count.

For regular local sharing, it is recommended to just use a borrowed reference to the inner variable (borrowing works fine with rust-gc!) since there is no cost to creating this reference.

When a GC thing is put inside another GC thing, the first thing no longer can remain a root. This is handled by “unrooting” the first GC thing:

struct Foo {
    bar: u32,
    baz: Gc<u32>,
}

let a = Gc::new(20); // why anyone would want to GC an integer I'll never know
                     // but I'll stick with this example since it's simple

let b = Gc::new(Foo {bar: 1, baz: a});
// a.root = false, (*a.ptr).roots--
// b initialized similar to previous example

// `a` was moved into `b`, so now `a` cannot be accessed directly here
// other than through `b`, and `a` is no longer a root.
// To avoid moving a, passing `a.clone()` to `b` will work

Of course, we need a way to traverse the object passed to the Gc<T>, in this case Foo, and look for any contained Gc<T>s to unroot. Sound familiar? This needs the same mechanism that trace() needed! We add struct-walking root() and unroot() methods to the Trace trait which are auto-derived exactly the same way, and continue. (We don’t need root() right now, but we will need it later on).

Now, during collection, we can just traverse the list of GcBoxs and use the ones with a nonzero root count as roots for our mark traversal.
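Putting it together, a collection pass looks roughly like this (a heavily simplified sketch building on the illustrative GcBox above; the real code deals in raw pointers and allocation bookkeeping):

// Simplified sketch of one mark-and-sweep pass over all GcBoxes.
fn collect_garbage(heap: &mut Vec<Box<GcBox<dyn Trace>>>) {
    // Mark: every box with a nonzero root count is a root. Tracing its data
    // (the Trace impls mark any Gc<T> they reach) covers everything
    // transitively reachable from the roots.
    for gc_box in heap.iter() {
        if gc_box.roots.get() > 0 {
            gc_box.marked.set(true);
            gc_box.data.trace();
        }
    }

    // Sweep: drop anything that never got marked, and clear marks for next time.
    heap.retain(|gc_box| {
        let reachable = gc_box.marked.get();
        gc_box.marked.set(false);
        reachable
    });
}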

So far, so good. We have a pretty sound design for a GC that works … for immutable data.

Mutability

Like Rc<T>, Gc<T> is by default immutable. Rust abhors aliasable mutability, even in single threaded contexts, and both these smart pointers allow aliasing.

Mutation poses a problem for our GC, beyond the regular problems of aliasable mutability: It’s possible to move rooted things into heap objects and vice versa:

let x = Gc::new(20);

let y = Gc::new(None);

*y = Some(x); // uh oh, x is still considered rooted!

// and the reverse!

let y = Gc::new(Some(Gc::new(20)));

let x = y.take(); // x was never rooted!
// `take()` moves the `Some(Gc<u32>)` out of `y`, replaces it with `None`       

Since Gc<T> doesn’t implement DerefMut, none of this is possible — one cannot mutate the inner data. This is one of the places where Rust’s ownership/mutability system works out awesomely in our favor.

Of course, an immutable GC isn’t very useful. We can’t even create cycles in an immutable GC, so why would anyone need this in the first place2?

So of course, we needed to make it somehow mutable. People using Rc<T> solve this problem by using RefCell<T>, which maintains something similar to the borrow semantics at runtime and is internally mutable. RefCell<T> itself can’t be used by us since it doesn’t guard against the problem illustrated above (and hence won’t implement Trace, but a similar cell type would work).

So we created GcCell<T>. This behaves just like RefCell<T>, except that it will root() before beginning a mutable borrow, and unroot() before ending it (well, only if it itself is not rooted, which is tracked by an internal field similar to Gc<T>). Now, everything is safe:

#[derive(Trace)]
struct Foo {
    a: u8,
    b: GcCell<Gc<u8>>,
}

let x = Gc::new(20);

let y = Gc::new(Foo {a: 10, b: GcCell::new(Gc::new(30))});
{
    *y.b.borrow_mut() = x; // the `Gc(30)` from `y.b` was rooted by this call
                           // but since we don't actually use it here,
                           // the destructor gets rid of it.
                           // We could use swap() to retain access to it.
    // ...
    // x unrooted
}


// and the reverse case works too:

let y = Gc::new(GcCell::new(Some(Gc::new(20))));

let x = y.borrow_mut().take(); // the inner `Some(Gc(20))` gets rooted by `borrow_mut()`
                               // before `x` can access it

So now, mutation works too! We have a working garbage collector!

Open problems

Destructors

I believe this can be solved without lints, but it may require some upcoming features of Rust to be implemented first (like specialization).

In essence, destructors implemented on a value inside Gc<T> can be unsafe. This will only happen if they try to access values within a Gc<T> — if they do, they may come across a box that has already been collected, or they may lengthen the lifetime of a box scheduled to be collected.

The basic solution to this is to use “finalizers” instead of destructors. Finalizers, like in Java, are not guaranteed to run. However, we may need further drop hooks or trait specialization to make an airtight interface for this. I don’t have a concrete design for this yet, though.

Concurrency

Our model mostly just works in a concurrent situation (with thread safety tweaks, of course); in fact it’s possible to make it so that the concurrent GC will not “stop the world” unless someone tries to do a write to a GcCell. We have an experimental concurrent GC in this pull request. We still need to figure out how to make interop between both GCs safe, though we may just end up making them such that an object using one GC cannot be fed to an object using the other.

Performance

So far we haven’t really focused on performance; we’ve concentrated on ensuring safety. Our collection triggering algorithm, for example, was horribly inefficient; we had planned on improving it, and the wonderful Huon has since fixed it.

Similarly, we haven’t yet optimized storage. We have some ideas which we may work on later. (If you want to help, contributions welcome!)

Cross-crate deriving

Currently, an object deriving Trace should have children which themselves implement Trace. This isn’t always possible when members from another crate (which does not depend on rust-gc) are involved. At the moment, we allow an #[unsafe_ignore_trace] annotation on such fields (which excludes them from being traced – if that crate doesn’t transitively depend on rust-gc, its members cannot contain GCthings anyway unless generics are involved). It should be possible to detect whether or not this is safe, and/or autoderive Trace using the opt-in builtin traits framework (needs specialization to work), but at the moment we don’t do anything other than expose that annotation.
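Usage looks something like this (a sketch; `other_crate::Matrix` is a hypothetical foreign type with no Trace impl):

#[derive(Trace)]
struct Simulation {
    state: Gc<u32>,
    #[unsafe_ignore_trace]         // this field is skipped entirely during tracing
    scratch: other_crate::Matrix,  // hypothetical type from a crate that doesn't know about rust-gc
}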

Stdlib support for a global Trace trait that everyone derives would be awesome.

Conclusion

Designing a GC was a wonderful experience! I didn’t get to write much code (I was busy and Michael was able to implement most of it overnight because he’s totally awesome), but the long design discussions followed by trying to figure out holes in the GC design in every idle moment of the day were quite enjoyable. GCs are very hard to get right, but it’s very satisfying when you come up with a design that works! I’m also quite happy at how well Rust helped in making a safe interface.

I encourage everyone to try it out and/or find holes in our design. Contributions of all kind welcome, we’d especially love performance improvements and testcases.

Discuss: HN, Reddit


  1. I’m very skeptical that it’s possible to make either of these completely sound without writing lints which effectively rewrite a large chunk of the compiler

  2. There is a case to be made for an immutable GC which allows some form of deferred initialization of GC fields, however.

The World’s Most Over-engineered Alarm Clock

A few weeks ago, I set up my Raspberry Pi as a music server so that I could listen to music without having to deal with keeping my laptop in a certain corner of the room.

After setting it all up, it occurred to me: “I can do much more with this!”.

Waking up to go to class in the morning is always a challenge for me. It’s not that I don’t wake up — I often wake up, cancel all the alarms, and go back to bed. Half-asleep me somehow has the skill to turn off alarms, but not the discipline of going to class1. I’ve tried those apps that make you do math before you can cancel the alarm, and I’m able to do the math and go back to sleep without fully waking up in the process (Hey, I’m a physics student; math is what we do!).

So I decided to create an alarm clock. Not just any alarm clock, the most overengineered alarm clock I can think of.

It consists of the Pi connected to a speaker and kept in a hard-to-reach area of the room. The Pi is on a DHCP network. Every night, I ssh to the Pi, set the volume to full, and run a script which, using at and mpg123, will schedule jobs to run around the desired time of waking up. First, there will be a few pieces of soothing music (either violin music or parts of the Interstellar OST) run once or twice, a while before the time of waking up. Close to the time of waking up, there are a bunch of jobs where each one will run a string of annoying music. In my case, it’s the Minions’ banana song followed by Nyan Cat (I sometimes add more to this list).

So far so good.

Now, the soothing music gives asleep-me a chance to surrender and wake up before the Nyan Cat begins, and often the fear of Nyan Cat is a pretty good motivator to wake up. If I don’t wake up to the soft songs, the annoying ones invariably work.

At this stage I’m still pretty groggy and have an intense urge to go back to bed. However, turning off the alarm isn’t simple. Since it’s in a hard-to-reach area of the room, I can’t just turn it off. I need to get up, sit in my chair, turn on the laptop (which is hibernated/suspended), and kill the alarm via ssh.

This needs me to:

  • nmap the network to find the Pi (I’m the only one on this subnet who uses ssh, so this just needs a port filter)
  • ssh into the Pi, remembering the password (I haven’t done this yet but I could further complicate things by changing the password often to reduce muscle-memory)
  • killall mpg123 to kill the currently playing song
  • Cancel the remaining at jobs. This can be done with atq + atrm for every job (tedious and long), or with awk. If I’ve already fully woken up, I’m able to do the awk one, otherwise half-asleep me ends up doing the brute-force one, which is enough manual typing to shake the remaining bits of sleepiness off.

After this whole process, it’s pretty much guaranteed that I’m fully awake – there’s no going back now!

So far it’s worked pretty well (both when I’ve slept on time and when I haven’t). The first ten minutes after this I’m rather annoyed, but after that I’m happy I woke up. If half-asleep me eventually gets the muscle memory to get past this process, I should probably be able to tweak it to add more complexity or change the way it works.

Of course, having an arms race with oneself probably isn’t the best way to solve this problem. I suspect I’ll go back to regular alarms in a month or so, but it’s a fun experiment for now.

However, by the time I’m done with this alarm clock, I’ll either be waking up on time, or I’ll be able to Linux in my sleep, which is a win-win!


  1. As a fourth year student the fully-awake me also has a bit of trouble with this ;)

Adventures in Systems Programming: C++ Local Statics

For a while now I’ve been quite interested in compilers and systems programming in general; and I feel that an important feature of systems programming is that it’s relatively easy to figure out what a line of code does (modulo optimizations) at the OS or hardware level1. Conversely, it’s important to know how your tools work more than ever in systems programming. So when I see a language feature I’m not familiar with, I’m interested in finding out how it works under the hood.

I’m not a C++ expert. I can work on C++ codebases, but I’m not anywhere near knowing all of the features and nuances of C++. However, I am pretty good at Rust and understand a decent portion of the compiler internals. This gives me a great perspective — I’ve not yet internalized most C++ features to take them for granted, and I’m well equipped to investigate these features.

Today I came across some C++ code similar to the following2:

void foo() {
    static SomeType bar = Env()->someMethod();
    static OtherType baz = Env()->otherMethod(bar);
}

This code piqued my interest. Specifically, the local static stuff. I knew that when you have a static like

static int FOO = 1;

the 1 is stored somewhere in the .data section of the program. This is easily verified with gdb:

static int THING = 0xAAAA;

int main() {
 return 1;
}
$ g++ test.cpp -g
$ gdb a.out
(gdb) info addr THING
Symbol "THING" is static storage at address 0x601038.
(gdb) info symbol 0x601038
THING in section .data

This is basically a part of the compiled program as it is loaded into memory.

Similarly, when you have a static that is initialized with a function, it’s stored in the .bss section, and initialized before main(). Again, easily verified:

#include<iostream>
using namespace std;

int bar() {
 cout<<"bar called\n";
 return 0xFAFAFA;
}

static int THING = bar();

int main() {
 cout<<"main called\n";
 return 0;
}
$ ./a.out
bar called
main called
$ gdb a.out
(gdb) info addr THING
Symbol "THING" is static storage at address 0x601198.
(gdb) info symbol 0x601198
THING in section .bss

We can also leave statics uninitialized (static int THING;) and they will be placed in .bss3.

So far so good.

Now back to the original snippet:

void foo() {
    static SomeType bar = Env()->someMethod();
    static OtherType baz = Env()->otherMethod(bar);
}

Naïvely one might say that these are statics which are scoped locally to avoid name clashes. It’s not much different from static THING = bar() aside from the fact that it isn’t a global identifier.

However, this isn’t the case. What tipped me off was that this called Env(), and I wasn’t so sure that the environment was guaranteed to be properly initialized and available before main() is called 4.

Instead, these are statics which are initialized the first time the function is called.

#include<iostream>
using namespace std;

int bar() {
 cout<<"bar called\n";
 return 0xFAFAFA;
}

void foo() {
 cout<<"foo called\n";
 static int i = bar();
 cout<<"Static is:"<< i<<"\n";
}

int main() {
 cout<<"main called\n";
 foo();
 foo();
 foo();
 return 0;
}
$ g++ test.cpp
$ ./a.out
main called
foo called
bar called
Static is:16448250
foo called
Static is:16448250
foo called
Static is:16448250

Wait, “the first time the function is called”? Alarm bells go off… Surely there’s some cost to that! Let’s investigate.

$ gdb a.out
(gdb) disas bar
   // snip
   0x0000000000400c72 <+15>:    test   %al,%al
   0x0000000000400c74 <+17>:    jne    0x400ca4 <_Z3foov+65>
   0x0000000000400c76 <+19>:    mov    $0x6021f8,%edi
   0x0000000000400c7b <+24>:    callq  0x400a00 <__cxa_guard_acquire@plt>
   0x0000000000400c80 <+29>:    test   %eax,%eax
   0x0000000000400c82 <+31>:    setne  %al
   0x0000000000400c85 <+34>:    test   %al,%al
   0x0000000000400c87 <+36>:    je     0x400ca4 <_Z3foov+65>
   0x0000000000400c89 <+38>:    mov    $0x0,%r12d
   0x0000000000400c8f <+44>:    callq  0x400c06 <_Z3barv>
   0x0000000000400c94 <+49>:    mov    %eax,0x201566(%rip)        # 0x602200 <_ZZ3foovE1i>
   0x0000000000400c9a <+55>:    mov    $0x6021f8,%edi
   0x0000000000400c9f <+60>:    callq  0x400a80 <__cxa_guard_release@plt>
   0x0000000000400ca4 <+65>:    mov    0x201556(%rip),%eax        # 0x602200 <_ZZ3foovE1i>
   0x0000000000400caa <+71>:    mov    %eax,%esi
   0x0000000000400cac <+73>:    mov    $0x6020c0,%edi
   // snip

The instruction at +44 calls bar(), and it seems to be surrounded by calls to some __cxa_guard functions.

We can take a naïve guess at what this does: It probably just sets a hidden static flag on initialization which ensures that it only runs once.

Of course, the actual solution isn’t as simple. It needs to avoid data races, handle errors, and somehow take care of recursive initialization.

Let’s look at the spec and one implementation, found by searching for __cxa_guard.

Both of them show us the generated code for initializing things like local statics:

  if (obj_guard.first_byte == 0) {
    if ( __cxa_guard_acquire (&obj_guard) ) {
      try {
      // ... initialize the object ...;
      } catch (...) {
        __cxa_guard_abort (&obj_guard);
        throw;
      }
      // ... queue object destructor with __cxa_atexit() ...;
      __cxa_guard_release (&obj_guard);
    }
  }

Here, obj_guard is our “hidden static flag”, with some other extra data.

__cxa_guard_acquire and __cxa_guard_release acquire and release a lock around the initialization, and the guard machinery is also what catches recursive initialization. So this program will crash:

#include<iostream>
using namespace std;

void foo(bool recur);

int bar(bool recur) {
 cout<<"bar called\n";
 if(recur) {
    foo(false);
 }
 return 0xFAFAFA;
}

void foo(bool recur) {
 cout<<"foo called\n";
 static int i = bar(recur);
 cout<<"Static is:"<< i<<"\n";
}



int main() {
 foo(true);
 return 0;
}
$ g++ test.cpp
$ ./a.out
foo called
bar called
foo called
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
  what():  std::exception
Aborted (core dumped)

Here, to initialize i, bar() needs to be called, but bar() calls foo(), which needs i to be initialized, which again will call bar() (though this time it won’t recurse). If i weren’t static this would be fine, but now we have two calls trying to initialize i, and it’s unclear which value should be used.

The implementation is pretty interesting. Before looking at the code my quick guess was that the following would happen for local statics:

  • obj_guard is a struct containing a mutex and a flag with three states: “uninitialized”, “initializing”, and “initialized”. Alternatively, use an atomic state indicator.
  • When we try to initialize for the first time, the mutex is locked, the flag is set to “initializing”, the mutex is released, the value is initialized, and the flag is set to “initialized”.
  • If, when acquiring the mutex, the flag is “initialized”, don’t initialize again.
  • If, when acquiring the mutex, the flag is “initializing”, throw some exception.

(We need the tristate flag because without it recursion would cause deadlocks)
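
To make the guess concrete, here’s a minimal sketch of that scheme. The names (Guard, get_i) are made up and this is not the real __cxa_guard code; note also that, as written, a second thread racing on the same static would hit the “initializing” branch and throw rather than wait, which a real implementation would have to handle.

#include <iostream>
#include <mutex>
#include <stdexcept>

enum class State { Uninitialized, Initializing, Initialized };

struct Guard {
    std::mutex mtx;
    State state = State::Uninitialized;
};

int bar() { return 0xFAFAFA; }   // stand-in for the user's initializer

// Roughly what a compiler might conceptually emit for `static int i = bar();`:
Guard i_guard;
int i_storage;

int& get_i() {
    {
        std::lock_guard<std::mutex> lock(i_guard.mtx);
        if (i_guard.state == State::Initialized) return i_storage;
        if (i_guard.state == State::Initializing)
            throw std::runtime_error("recursive initialization");
        i_guard.state = State::Initializing;
    }   // drop the lock before running user code, so the initializer can use other statics
    i_storage = bar();
    std::lock_guard<std::mutex> lock(i_guard.mtx);
    i_guard.state = State::Initialized;
    return i_storage;
}

int main() {
    std::cout << "Static is:" << get_i() << "\n";
    std::cout << "Static is:" << get_i() << "\n";   // bar() is not called again
}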

I suppose that this implementation would work, though it’s not the one being used. The implementation in bionic (Android’s C library) is similar; it uses per-static atomics which indicate various states. However, it does not throw an exception on recursive initialization; instead it seems to deadlock5. This is okay because the C++ spec says (Section 6.7.4):

If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined.

However, the implementations in gcc/libstdc++ (also this version of libcppabi from Apple, which is a bit more readable) do something different. They use a global recursive mutex to handle reentrancy. A recursive mutex can be locked multiple times by the thread that already holds it, but cannot be locked by another thread until the owning thread unlocks it the same number of times. This means that recursion/reentrancy won’t cause deadlocks, but we still have one-thread-at-a-time access. What these implementations do is:

  • guard_object is a set of two flags, one which indicates if the static is initialized, and one which indicates that the static is being initialized (“in use”)
  • If the object is initialized, do nothing (this doesn’t use mutexes and is cheap). This isn’t exactly part of the implementation in the library, but is part of the generated code.
  • If it isn’t initialized, acquire the global recursive lock
  • If the object is initialized by the time the lock was acquired, unlock and return
  • If not, check the second guard_object flag to see whether the static is already being initialized. If it is “in use”, throw an exception.
  • If it wasn’t, mark the second flag of the static’s guard object as “in use”
  • Call the initialization function, propagating any errors
  • Unlock the global mutex
  • Mark the second flag as “not in use”

At any one time, only one thread will be in the process of running initialization routines, due to the global recursive mutex. Since the mutex is recursive, a function (e.g. bar()) used for initializing local statics may itself use (different) local statics. Due to the “in use” flag, the initialization of a local static may not recursively call its parent function without causing an error.
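
Here’s a rough sketch of that scheme, with made-up names (run_once, Guard) and with the __cxa_guard_abort error path and the exact unlock ordering glossed over; it is only meant to illustrate the steps listed above, not to mirror the actual library code.

#include <iostream>
#include <mutex>
#include <stdexcept>

static std::recursive_mutex init_lock;   // one lock shared by every local static

struct Guard {
    bool initialized = false;   // checked without the lock on the fast path
    bool in_use = false;        // set while this static's initializer is running
};

template <typename Init>
void run_once(Guard& g, Init init) {
    if (g.initialized) return;                            // fast path, no locking
    std::lock_guard<std::recursive_mutex> lock(init_lock);
    if (g.initialized) return;                            // someone else finished meanwhile
    if (g.in_use)                                         // we re-entered our own initializer
        throw std::runtime_error("recursive initialization");
    g.in_use = true;
    init();                 // may itself use *other* local statics; the mutex is recursive
    g.initialized = true;
    g.in_use = false;       // error handling (__cxa_guard_abort) omitted for brevity
}

Guard value_guard;
int value;

int main() {
    run_once(value_guard, [] { value = 0xFAFAFA; });
    run_once(value_guard, [] { std::cout << "never runs\n"; });
    std::cout << std::hex << value << "\n";
}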

This doesn’t need per-static atomics and doesn’t deadlock, but it has the cost of a global mutex which must be taken the first time each local static is initialized. In a highly threaded situation with lots of such statics, one might want to reconsider using local statics directly.

LLVM’s libcxxabi is similar to the libstdc++ implementation, but instead of a recursive mutex it uses a regular mutex (on non-ARM Apple systems) which is unlocked before __cxa_guard_acquire exits and tests for reentrancy by noting the thread ID in the guard object instead of the “in use” flag. Condvars are used for waiting for a thread to stop using an object. On other platforms, it seems to deadlock, though I’m not sure.

So here we have a rather innocent-looking feature that has some hidden costs and pitfalls. But now I can look at a line of code where this feature is being used, and have a good idea of what’s happening there. One step closer to being a better systems programmer!

Thanks to Rohan Prinja, Eduard Burtescu, and Nishant Sunny for reviewing drafts of this blog post


  1. Emphasis on relatively. This article will show that it’s definitely not “easy” all the time.

  2. This was JNI code which obtained a JNI environment and pulled out method/class IDs from it to be used later

  3. Unless it has a constructor or otherwise isn’t made out of trivially constructible types; in that case it is treated similarly to the previous case.

  4. I checked later, and it was indeed the case that global statics are initialized before Env() is ready

  5. I later verified this with a modification of the crashing program above stuck inside some JNI Android code.

How Rust Achieves Thread Safety

In every talk I have given till now, the question “how does Rust achieve thread safety?” has invariably come up1. I usually just give an overview, but this post provides a more comprehensive explanation for those who are interested.

See also: Huon’s blog post on the same topic

In my previous post I touched a bit on the Copy trait. There are other such “marker” traits in the standard library, and the ones relevant to this discussion are Send and Sync. I recommend reading that post if you’re not familiar with Rust wrapper types like RefCell and Rc, since I’ll be using them as examples throughout this post; but the concepts explained here are largely independent.

For the purposes of this post, I’ll restrict thread safety to mean no data races or cross-thread dangling pointers. Rust doesn’t aim to solve race conditions. However, there are projects which utilize the type system to provide some form of extra safety, for example rust-sessions attempts to provide protocol safety using session types.

These traits are auto-implemented using a feature called “opt-in builtin traits”. So, for example, if struct Foo contains only Sync fields, it will also be Sync, unless we explicitly opt out using impl !Sync for Foo {}. Similarly, if struct Foo contains at least one non-Sync type, it will not be Sync either, unless it explicitly opts in (unsafe impl Sync for Foo {}).
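
As a minimal sketch of what this looks like in practice (Wrapper and assert_send_sync are made-up names; the negative-impl syntax mentioned above needs a nightly feature, historically optin_builtin_traits and nowadays negative_impls):

struct Wrapper {
    ptr: *mut u8, // a raw pointer is neither Send nor Sync, so Wrapper isn't either...
}

// ...unless we opt back in, taking responsibility for the safety invariants ourselves:
unsafe impl Send for Wrapper {}
unsafe impl Sync for Wrapper {}

fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<Wrapper>(); // compiles only because of the unsafe impls above
    let w = Wrapper { ptr: std::ptr::null_mut() };
    println!("{:?}", w.ptr);
}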

This means that, for example, a Sender for a Send type is itself Send, but a Sender for a non-Send type will not be Send. This pattern is quite powerful; it lets one use channels with non-threadsafe data in a single-threaded context without requiring a separate “single threaded” channel abstraction.

At the same time, structs like Rc and RefCell which contain Send/Sync fields have explicitly opted out of one or more of these because the invariants they rely on do not hold in threaded situations.

It’s actually possible to design your own library with comparable thread safety guarantees outside of the compiler — while these marker traits are specially treated by the compiler, the special treatment is not necessary for their working. Any two opt-in builtin traits could be used here.

Send and Sync have slightly differing meanings, but are very intertwined.

Send types can be moved between threads without an issue. It answers the question “if this variable were moved to another thread, would it still be valid for use?”. Most objects which completely own their contained data qualify here. Notably, Rc doesn’t (since it is shared ownership). Another exception is LocalKey, which does own its data but isn’t valid from other threads. Borrowed data does qualify to be Send, but in most cases it can’t be sent across threads due to a constraint that will be touched upon later.

Even though a type like RefCell uses non-atomic borrow counting internally, it can be sent safely between threads, because sending it is a transfer of ownership (a move). Sending a RefCell to another thread is a move and makes it unusable from the original thread, so this is fine.
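
A quick illustration of this point (just a sketch using plain thread::spawn):

use std::cell::RefCell;
use std::thread;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);
    let handle = thread::spawn(move || {
        // `cell` was moved here; only this thread can reach it now, so the
        // non-atomic borrow tracking can't be raced.
        cell.borrow_mut().push(4);
        cell.into_inner()
    });
    println!("{:?}", handle.join().unwrap());
    // `cell` is no longer usable on the original thread: it was moved.
}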

Sync, on the other hand, is about shared access from multiple threads. It answers the question: “if multiple threads were all trying to access this data, would it be safe?”. Types like Mutex and other lock/atomic based types implement this, along with primitive types. Types with unsynchronized interior mutability or raw pointers generally are not Sync.

Sync is sort of a crutch to Send; it helps make other types Send when sharing is involved. For example, &T and Arc<T> are only Send when the inner data is Sync (there’s an additional Send bound in the case of Arc<T>). In words, stuff that has shared/borrowed ownership can be sent to another thread if the shared/borrowed data is itself safe to access from multiple threads.

RefCell, while Send, is not Sync because of its non-atomic borrow counting.
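
These relationships can be checked at compile time with a couple of throwaway helper functions (assert_send and assert_sync are made-up names):

use std::cell::RefCell;
use std::sync::Arc;

fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<Arc<u8>>();             // u8 is Send + Sync, so Arc<u8> is Send
    assert_sync::<Arc<u8>>();             // ...and Sync
    assert_send::<RefCell<u8>>();         // RefCell<u8> is Send...
    // assert_sync::<RefCell<u8>>();      // ...but not Sync: this would not compile
    // assert_send::<Arc<RefCell<u8>>>(); // and hence Arc<RefCell<u8>> is not Send
}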

Bringing it together, the gatekeeper for all this is thread::spawn(). It has the signature

pub fn spawn<F, T>(f: F) -> JoinHandle<T> where F: FnOnce() -> T, F: Send + 'static, T: Send + 'static

Admittedly, this is confusing/noisy, partially because it’s allowed to return a value, and also because it returns a handle from which we can block on a thread join. We can conjure a simpler spawn API for our needs though:

pub fn spawn<F>(f: F) where F: FnOnce(), F: Send + 'static

which can be called like:

let mut x = vec![1,2,3,4];

// `move` instructs the closure to move out of its environment
thread::spawn(move || {
   x.push(1);

});

// x is not accessible here since it was moved

In words, spawn() will take a callable (usually a closure) that will be called once, and contains data which is Send and 'static. Here, 'static just means that there is no borrowed data contained in the closure. This is the aforementioned constraint that prevents the sharing of borrowed data across threads; without it we would be able to send a borrowed pointer to a thread that could easily outlive the borrow, causing safety issues.

There’s a slight nuance here about closures — closures can capture outer variables, but by default they do so by reference (hence the move keyword). They automatically implement Send and Sync depending on what they capture. For more on their internal representation, see huon’s post. In this case, x was captured by-move, i.e. as Vec<T> (instead of being similar to &Vec<T> or something), so the closure itself can be Send. Without the move keyword, the closure would not be 'static since it contains borrowed content.

Since the closure inherits the Send/Sync/'static-ness of its captured data, a closure capturing data of the correct type will satisfy the F: Send+'static bound.

Some examples of things that are allowed and not allowed by this function (for the type of x):

  • Vec<T>, Box<T> are allowed because they are Send and 'static (when the inner type is of the same kind)
  • &T isn’t allowed because it’s not 'static. This is good: the spawned thread could easily outlive the borrow, leading to a use-after-free or otherwise breaking aliasing rules.
  • Rc<T> isn’t Send, so it isn’t allowed. We could have some other Rc<T>s hanging around, and end up with a data race on the refcount.
  • Arc<Vec<u32>> is allowed (Vec<T> is Send and Sync if the inner type is); we can’t cause a safety violation here. Iterator invalidation requires mutation, and Arc<T> doesn’t provide this by default.
  • Arc<Cell<T>> isn’t allowed. Cell<T> provides copying-based internal mutability, and isn’t Sync (so the Arc<Cell<T>> isn’t Send). If this were allowed, we could have cases where a larger struct is written to from different threads simultaneously, resulting in some random mishmash of the two writes. In other words, a data race.
  • Arc<Mutex<T>> or Arc<RwLock<T>> are allowed (for Send T). The inner types use threadsafe locks and provide lock-based internal mutability. They can guarantee that only one thread is writing to them at any point in time. For this reason, the mutexes are Sync regardless of the inner T (as long as it is Send), and Sync types can be shared safely with wrappers like Arc. From the point of view of the inner type, it’s only being accessed by one thread at a time (slightly more complex in the case of RwLock), so it doesn’t need to know about the threads involved. There can’t be data races when Sync types like these are involved; a short example follows this list.
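
Here is a small sketch of the Arc<Mutex<T>> case from the list above, using nothing but the standard library: several threads push into a shared vector, and the lock ensures only one of them does so at a time.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                // Only one thread at a time can hold the guard and mutate the Vec.
                data.lock().unwrap().push(i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("{:?}", data.lock().unwrap());
}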

As mentioned before, you can in fact create a Sender/Receiver pair of non-Send objects. This sounds a bit counterintuitive — shouldn’t we only be sending values which are Send? However, Sender<T> is only Send if T is Send; so even though we can use a Sender of a non-Send type, we cannot send the Sender itself to another thread, so it cannot be used to violate thread safety.
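
For instance (a small sketch): a channel of Rc<T> is perfectly usable on one thread, but the Sender itself can never be handed to another one.

use std::rc::Rc;
use std::sync::mpsc::channel;

fn main() {
    let (tx, rx) = channel();
    tx.send(Rc::new(42)).unwrap(); // fine: everything stays on this thread
    println!("{}", rx.recv().unwrap());

    // This would not compile: Sender<Rc<i32>> is not Send.
    // std::thread::spawn(move || tx.send(Rc::new(1)).unwrap());
}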

There is also a way to utilize the Send-ness of &T (which is not 'static) for some Sync T, namely thread::scoped. This function does not have the 'static bound, but it instead has an RAII guard which forces a join before the borrow ends. This allows for easy fork-join parallelism without necessarily needing a Mutex. Sadly, there are problems which crop up when this interacts with Rc cycles, so the API is currently unstable and will be redesigned. This is not a problem with the language design or the design of Send/Sync, rather it is a perfect storm of small design inconsistencies in the libraries.



  1. So much so that I added bonus slides about thread safety to the end of my deck, and of course I ended up using them at the talk I gave recently.