In Pursuit of Laziness

Manish Goregaokar’s blog

Making Your Open Source Project Newcomer-friendly

One reason I really like open source is that it offers a lot of great opportunities for newish programmers to get some hands-on experience with real world problems. There’s only so much one can learn from small personal projects; but in open source one often gets to tackle interesting problems on large codebases — problems which wouldn’t occur in small/personal ones. There are also valuable skills related to collaboration to be learnt.

Because of this, I care quite a bit about making projects welcoming to new contributions, and try to improve this experience on projects I’m involved in. I’ve picked up a few tricks along the way. Most of these aren’t my ideas, I’ve gleaned them from watching people like Josh Matthews, Margaret Leibovic, Mike Conley, and Joel Maher do their thing. If you’re interested, here is a post by Margaret, and here are some of Josh’s slides from a presentation, both on the same subject.

Before I get started, bear in mind that making a project “newbie-friendly” isn’t something that magically happens. Like most things, it takes effort, but often this effort can come to fruition in the form of motivated contributors helping out on your project and eventually even becoming co-maintainers. It’s really worth it!

The simple stuff

There’s a lot of really easy stuff you can do to kickstart contributions to your own project. Most of this is obvious:

CONTRIBUTING.md

Add a CONTRIBUTING.md file. Keep it up to date. Link to it prominently from the README. The README should also have clear and detailed instructions for building the project. These two files are different – README is for those who want to use your project (perhaps by building from sources), CONTRIBUTING is for people who want to contribute.

Mention steps for getting involved: how to find something to work on, how to send a patch/make a pull request, a checklist of things to ensure your patch/PR satisfies before submission (e.g. passing tests, commit message guidelines, etc). Additionally, include some tips and tricks (like a link to the internal documentation) that can help new contributors, links to communication channels (IRC, Slack, Gitter, whatever) and anything else you may find helpful for someone considering contributing to your project. If you use some form of issue labeling, explanations of the labeling scheme can help folks find stuff they want to work on. An overview of the directory structure can be similarly helpful.

For some examples, check out the CONTRIBUTING.md files for servo and rust-clippy.

Maintain a list of easy bugs

More on this later, but try to use some form of tagging to mark easy bugs. I love this slide from Josh’s talk.

How to politely say f*** off

“Choose something to work on from our issue tracker.”
                                           - every project maintainer

Most bugs on issue trackers are nontrivial, steeped with jargon, devoid of actionable information, or otherwise inaccessible to the average new contributor. It’s relatively low effort to recognize bugs which are “easy” for maintainers, but it’s a lot of work for people unfamiliar to this project to figure this out.

A simple label on GitHub is all you need in most cases. Be sure to link to it from your contributing file!

Communicate!

Have open channels for communication. IRC is often the favorite here, though IRC is pretty alien for people getting involved in open source for the first time. If you’re using IRC, see if you can link to a web client (like Mibbit) with short instructions on how to join. Stuff like Gitter works too.

Mailing lists also work — everyone knows how to email! However, email can be intimidating to newcomers; many have a “omg I can’t ask my silly questions here!” attitude which stops them from progressing.

Explicitly inviting questions in each issue helps here. Clippy doesn’t have a mailing list or IRC channel (too small a project), but I encourage people working on new bugs to ping me on any communication channel they’d like. I’ve mentored people over email, GH issue threads, IRC, even reddit PMs, and it’s worked out fine in each case.

Often folks will PM you for help. Provide help, but encourage them to ask questions in the main venue. This has the twofold benefit of showing everyone that the main channel is open to questions, and it also helps people get quicker answers since someone else can answer if you’re not around.

Recognition

Celebrate new contributors. Tweet about them. Mention them in blog posts. Getting a two-line patch accepted in an open source project doesn’t sound like much, but when you’re just getting started, it’s a very awesome feeling. Make it more awesome! Both This Week In Rust and This Week In Servo mention new contributors (sometimes we tweet about it too), and I’ve often got very happy messages from these contributors about the mentions.

Add a code of conduct

A lot of folks have had bad experiences with people online, often in other open source communities and may be wary about joining others. A code of conduct is a statement that unsavory behavior won’t be tolerated, which helps make the project more welcoming and appealing to these people, simultaneously making it a nicer place which is helpful for everyone. Of course, you should be prepared to enforce the code of conduct if the situation requires it.

I use the Rust code of conduct but the Contributor Covenant is good, too. Various language/framework communities often have their own favorite code of conduct. Pick one.

Empathize!

We often forget how hard it is to jumpstart in something we’re an active part of. For example, for many of us the process of making a pull request is almost second nature.

However, not everyone is used to these things. I’ve seen contributors who can code well but haven’t used Github in the past having lots of trouble making and updating a pull request. The same applies to other workflow things; like code review, version control1, or build system peculiarities.

Keep this in mind when dealing with new contributors. These are skills which can be picked up relatively quickly, but those without them will have a frustrating experience and end up asking you a lot of questions.

Improving the newcomer experience

Alright, now you’ve gotten all the basics done. People now have a vague idea of how to contribute to your project. Let’s make it more concrete.

Mentoring

Don’t just leave an easy bug open. Offer to mentor it! This is a very fun and rewarding experience, and of course contributors are more likely to stick around in a project they percieve to be helpful and welcoming.

It’s often better to go one step further and give tips for fixing the issue before anyone even picks it up (example). Communication in open source has latency — the contributor might be on the other side of the planet, or might otherwise be contributing at a different time of the day than you2. Reducing the number of back-and-forth cycles is really helpful here, and giving some info so that a contributor can get started immediately without needing to wait for a response goes a long way in improving the newcomer experience.

Avoid creating a mentored bug where you yourself aren’t certain on how to fix it. Ideally, you should know the exact steps to take to fix a bug before marking it as mentored. Don’t divulge all the steps to the mentee, but the exercise of solving the bug yourself (without writing the code) ensures that there aren’t any hidden traps.

Mostly mentorship just involves answering questions and laying out a path for the mentee. Be sure to encourage questions in the first place! A lot of people, especially students, are intimidated when joining open source and try to stay as quiet as possible. For a healthy mentorship, you want them to ask questions. A lot. Encourage this.

Remember that in many cases the new contributor may be intimidated by you. For example, I’ve often come across new Firefox contributors (who I introduced to the project) asking me questions instead of their assigned mentor because “the mentor works for Mozilla and is way too awesome for me to bug with questions”. This wasn’t something the mentor told them (Firefox mentors are all very nice and helpful people), it was a conclusion they came to on their own — one which would impede their progress on the bug.

One trick that helps mitigate this is encouraging questions in your main channel. When people PM me with questions on IRC, I answer their questions, but also encourage them to ask in the main IRC channel next time. This is good for everyone — It gives the channel an aura of being “okay to ask questions in” (if other people see that questions are being asked and answered in the channel), and it also lets other maintainers jump in to help the new contributor in case I’m not around.

Once a new contributor has fixed a bug, mentorship isn’t over – it’s just started! See if you can find something more involved for them to work in a related area of the codebase. Get to know the contributor too, a sense of familiarity goes a long way in reducing intimidation and other friction.

Tailoring process for newcomers

Most open source projects have a set of hoops you have to jump through for a pull request to be accepted. These are necessary for the health of the project and pretty straightforward for existing contributors, but can be intimidating for new ones. They also add extra cycles of communication. I’ve often seen people put up almost-working patches, and disappear after a few cycles — even though the bulk of the work was done and there were just process issues (or code nits) left over for merging; which can be quite disheartening. Reducing extra process helps mitigate this.

For example, Servo uses this great tool called Reviewable for code review. Regular contributors don’t have much friction whilst using this, so we use it wherever possible. However, for small pull requests from new contributors I avoid using Reviewable and instead opt to review directly from the GitHub interface. For these pull requests I don’t need Reviewable’s features, so I don’t lose much, but now the contributor has to go through one less hoop.

Similarly, for rust-clippy, I often make minor fixes and run the readme update script on behalf of the contributor instead of asking them to do it themselves. I usually check out the PR locally, run git merge pr-branch --no-commit --no-ff3, make edits, commit and push. This way the PR still gets marked as merged (commit --amend doesn’t do that), and the history stays bisectable.

OpenSSL uses a mailing list for patches, however they allow contributions via GitHub too. Most seasoned contributors probably stick to the mailing list, but new contributors can use the familiar GitHub interface if they want, reducing friction.

Of course, cutting down on (necessary) process should only be done for the first one or two contributions; try to educate the newcomer about your processes as time passes.

An alternate way to tackle this issue is to go the other way around and teach process first. Give newcomers an extremely easy bug that just involves replacing a string or some other simple one-line fix, and help them push it through the process. This way, the next time they work on something, they’ll be familiar with the process and be able to devote more time to the actual code.

Creating easy bugs

At some point down this road many projects have a problem where there are people who want to contribute, but not enough suitable easy bugs.

One technique that has helped me create a lot of easy bugs is to just look out for separable and non-critical subfeatures when working on something. There often are things like polish or other small features which you don’t need to include in the main pull request, but you do anyway because it’s a few extra seconds of work. If you think it can be split out as an easy bug, go ahead and file it!

For example, whilst working on some form issues, instead of completely implementing something, I implemented just what I needed, and filed an easy bug. This is another bug with a very similar situation; I’d implemented the framework for form submission, made it work with <input>, and filed an easy bug for wiring it up to <button>.

Sometimes you may not find a subfeature that can be split out, but you may notice something else which could be improved. This is an example of such an issue. I was working on something else, and noticed that this area of the code could be designed better. Whilst I could have fixed it myself with very little effort as part of my other changes, I made it into an easy bug instead.

Simple refactorings can be a source of easy bugs too. These require familiarity with the language, but not much more, so they’re ideal for people new to the project.

It’s also possible to take a hard bug and make it easier, either by partially implementing it, or giving enough hints (code links, explanations, etc) that the hard part is already taken care of.

Avoid making “critical” (i.e, needs to land in a week or two) features into easy bugs. Even simple changes can take a while for new contributors (especially due to the nature of asynchronous communication, lack of time, and/or getting bogged down in the process). Easy bugs should be something which you eventually want, but are okay with it taking longer to solve. It’s very disheartening for a new contributor if they are working on something and a maintainer solves it for them because it was needed to land quickly. (Given enough time this will eventually happen for some bug, in such a case see if you can provide a different bug for them to work on and apologize)

Discoverability

Make it super easy for newcomers to find a bug they want to work on; not just any easy bug!

Bugs Ahoy and What Can I do for Mozilla are both great examples of this. Servo has servo-starters.

There are also various sites where you can list your easy bugs, some of which are listed in Emily’s post.

Projects and more involved participation

Having easy bugs and mentoring newcomers is just one step. You probably want to have these newcomers working on harder stuff, projects, and perhaps eventually maintianing/reviewing!

For many people these steps may not necessarily require involvement from you; I’ve seen professional software developers move their way to being a maintainer with very little mentorship just because they’re experienced enough to figure out how the project works on their own.

However, many of your contributors may be students or otherwise inexperienced; indeed they may be contributing to your project to gain this experience and become better developers. Such people can become valuable members of the team with some effort.

This mostly involves nudging people towards harder bugs and/or projects. It’s also very valuable to maintain a list of “student projects” (noncritical but large bodies of work). These can be picked up by contributors or sometimes students wishing to do a project for course credit.

It’s important to try and provide a logical series of issues instead of picking things randomly around the project so that the contributor can focus on one part of the codebase while starting out. If the issues all culminate in a large feature, even better.

Joel Maher and the Mozilla Tools team have started a pretty great program called “Quarter of Contribution” which provides focused mentorship for a particular project. It seems to work out well. Programs like Google Summer of Code and Outreachy also provide ways for new contributors to try out your project at a significant level of involvement.

Creating such projects or harder bugs is a nontrivial problem, and I don’t have a clear idea on how this can be done (aside from using similar techniques as listed in the “creating easy bugs” section above). Ideas welcome!

Projects aren’t always necessary here either. Depending on the contributor, sometimes you can present them with a regular (i.e., not “easy” or otherwise earmarked) issue to work on, provide some hints, and tell them to try and figure stuff out without your help (or with less help). Stay involved, and encourage them to ask questions of others or initiate discussions, but try to stick to observing. It’s really fun to watch someone figure stuff out on their own. I did this with a contributor here, where I only provided the initial hint and the code review; as well as here, where I encouraged the contributor to initiate and direct the relevant bikeshedding on various channels without my involvement. The contributor is now more in touch with the Rust community and codebase as a result; and for me I enjoyed watching him figure stuff out and direct discussions on his own.

Mentor! Share!

I’m still exploring these techniques myself. I’ve had great results applying some of these to clippy, and we already apply many of them to Servo. I’m slowly working on applying these techniques to Rust.

While some of these projects I’m always open to hearing about more ideas for making it easier for newcomers to contribute, so please let me know if you have any ideas or experiences to share!

Mentoring, while a lot of work, is an insanely rewarding experience, and I hope you try to incorporate it into your open source projects!

Thanks to Josh Matthews, James Graham, Emily Dunham, and Joel Maher for feedback on drafts of this post

discuss: HN


  1. I’ve lost track of the number of times I’ve helped someone through git rebase and merge conflicts on Servo

  2. This often happens with open source projects with paid staff – the staff is around during the workday, but the contributors are around during the evenings, so there’s less overlap.

  3. There are various reasons why you should not do this, mainly because non-merge-related changes in merge commits are hard to track down. Be aware of the downsides and use this trick judiciously.

Designing a GC in Rust

For a while I’ve been working on a garbage collector for Rust with Michael Layzell. I thought this would be a good time to talk of our design and progress so far.

Motivation

“Wait”, you ask, “why does Rust need a garbage collector”? Rust is supposed to work without a GC, that’s one of its main selling points!

True. Rust does work pretty well without a GC. It’s managed to do without one so far, and we still have all sorts of well-written crates out there (none of which use a GC).

But Rust is not just about low-cost memory safety. It’s also about choosing your costs and guarantees. Box<T> and stack allocation are not always sufficient, sometimes one needs to reach for something like Rc<T> (reference counting). But even Rc is not perfect; it can’t handle cycles between pointers. There are solutions to that issue like using Weak<T>, but that only works in limited cases (when you know what the points-to graph looks like at compile time), and isn’t very ergonomic.

Cases where one needs to maintain a complicated, dynamic graph are where a GC becomes useful. Similarly, if one is writing an interpreter for a GCd language, having a GC in Rust would simplify things a lot.

Not to say that one should pervasively use a GC in Rust. Similar to Rc<T>, it’s best to use regular ownership-based memory management as much as possible, and sprinkle Rc/Gc in places where your code needs it.

Previous designs

This isn’t the first GC in Rust. Automatic memory management has existed before in various forms, but all were limited.

Besides the ones listed below, Nick Fitzgerald’s cycle collector based on this paper exists and is something that you should look into if you’re interested. There’s also an RFC by Peter Liniker which sketches out a design for an immutable GC.

Core Rust GC(s)

Rust itself had a garbage collector until a bit more than a year ago. These “managed pointers” (@T) were part of the language. They were removed later with a plan to make GC a library feature.

I believe these were basically reference counted (cycle collected?) pointers with some language integration, but I’m not sure.

Nowadays, the only form of automatic memory management in Rust are via Rc and Arc which are nonatomic and atomic reference counted pointers respectively. In other words, they keep track of the number of shared references via a reference count (incremented when it is cloned, decremented when destructors run). If the reference count reaches zero, the contents are cleaned up.

This is a pretty useful abstraction, however, as mentioned above, it doesn’t let you create cycles without leaking them.

Spidermonkey

You can read more about Servo’s Spidermonkey bindings in this blog post (somewhat outdated, but still relevant)

In Servo we use bindings to the Spidermonkey Javascript engine. Since Javascript is a garbage collected language, the Rust representations of Javascript objects are also garbage collected.

Of course, this sort of GC isn’t really useful for generic use since it comes bundled with a JS runtime. However, the Rust side of the GC is of a design that could be used in an independent library.

The Rust side of the Spidermonkey GC is done through a bunch of smart pointers, and a trait called JSTraceable. JSTraceable is a trait which can “trace” recursively down some data, finding and marking all GC-managed objects inside it. This is autoderived using Rust’s plugin infrastructure, so a simple #[jstraceable] annotation will generate trace hooks for the struct it is on.

Now, we have various smart pointers. The first is JS<T>. This is opaque, but can be held by other GC-managed structs. To use this on the stack, this must be explicitly rooted, via .root(). This produces a Root<T>, which can be dereferenced to get the inner object. When the Root is created, the contained object is listed in a collection of “roots” in a global. A root indicates that the value is being used on the stack somewhere, and the GC starts tracing usage from these roots. When the Root<T> is destroyed, the root is removed.

The problem with this is that JS<T> doesn’t work on the stack. There is no way for the GC to know that we are holding on to JS<T> on the stack. So, if I copy a JS<T> to the stack, remove all references to it from objects in the GC heap, and trigger a collection, the JS<T> will still be around on the stack after collection since the GC can’t trace to it. If I attempt to root it, I may get a panic or a segfault depending on the implementation.

To protect against this, we have a bunch of lints. The relevant one here protects against JS<T> from being carried around on the stack; but like most lints, it’s not perfect.

To summarize: Spidermonkey gives us a good GC. However using it for a generic Rust program is ill advised. Additionally, Servo’s wrappers around the GC are cheap, but need lints for safety. While it would probably be possible to write safer wrappers for general usage, it’s pretty impractical to carry around a JS runtime when you don’t need one.

However, Spidermonkey’s GC did inspire me to think more into the matter.

Brainstorming a design

For quite a while I’d had various ideas about GCs. Most were simplifications of Servo’s wrappers (there’s some complexity brought in there by Spidermonkey that’s not necessary for a general GC). Most were tracing/rooting with mark-and-sweep collection. All of them used lints. Being rather busy, I didn’t really work on it past that, but planned to work on it if I could find someone to work with.

One day, Michael pinged me on IRC and asked me about GCs. Lots of people knew that I was interested in writing a GC for Rust, and one of them directed him to me when he expressed a similar interest.

So we started discussing GCs. We settled on a tracing mark-and-sweep GC. In other words, the GC runs regular “sweeps” where it first “traces” the usage of all objects and marks them and their children as used, and then sweeps up all unused objects.

This model on its own has a flaw. It doesn’t know about GC pointers held on the stack as local variables (“stack roots”). There are multiple methods for solving this. We’ve already seen one above in the Spidermonkey design – maintain two types of pointers (one for the stack, one for the heap), and try very hard using static analysis to ensure that they don’t cross over.

A common model (used by GCs like Boehm, called “conservative GCs”) is to do something called “stack scanning”. In such a system, the GC goes down the stack looking for things which may perhaps be GC pointers. Generally the GC allocates objects in known regions of the memory, so a GC pointer is any value on the stack which belongs to one of these regions.

Of course, this makes garbage collection rather inefficient, and will miss cases like Box<Gc<T>> where the GCd pointer is accessible, but through a non-GC pointer.

We decided rather early on that we didn’t want a GC based on lints or stack scanning. Both are rather suboptimal solutions in my opinion, and very hard to make sound1. We were also hoping that Rust’s type system and ownership semantics could help us in designing a good, safe, API.

So, we needed a way to keep track of roots, and we needed a way to trace objects.

Tracing

The latter part was easy. We wrote a compiler plugin (well, we stole Servo’s tracing plugin which I’d written earlier) which autoderives an implementation of the Trace trait on any given struct or enum, using the same internal infrastructure that #[derive(PartialEq)] and the rest use. So, with just the following code, it’s easy to make a struct or enum gc-friendly:

1
2
3
4
5
6
7
8
9
10
#[derive(Trace)]
struct Foo {
    x: u8,
    y: Bar,
}

#[derive(Trace)]
enum Bar {
    Baz(u8), Quux
}

For a foo of type Foo foo.trace(), will expand to a call of foo.x.trace() and foo.y.trace(). bar.trace() will check which variant it is and call trace() on the u8 inside if it’s a Baz. For most structs this turns out to be a no-op and is often optimized away by inlining, but if a struct contains a Gc<T>, the special implementation of Trace for Gc<T> will “mark” the traceability of the Gc<T>. Types without Trace implemented cannot be used in types implementing Trace or in a Gc, which is enforced with a T: Trace bound on Gc<T>.

So, we have a way of walking the fields of a given object and finding inner Gc<T>s. Splendid. This lets us write the mark&sweep phase easily: Take the list of known reachable Gc<T>s, walk their contents until you find more Gc<T>s (marking all you find), and clean up any which aren’t reachable.

Rooting

Of course, now we have to solve the problem of keeping track of the known reachable Gc<T>s, i.e. the roots. This is a hard problem to solve without language support, and I hope that eventually we might be able to get the language hooks necessary to solve it. LLVM has support for tracking GCthings on the stack, and some day we may be able to leverage that in Rust.

As noted above, Spidermonkey’s solution was to have non-rooted (non-dereferencable) heap pointers, which can be explicitly converted to rooted pointers and then read.

We went the other way. All Gc<T> pointers, when created, are considered “rooted”. The instance of Gc<T> has a “rooted” bit set to true, and the underlying shared box (GcBox, though this is not a public interface) has its “root count” set to one.

When this Gc<T> is cloned, an identical Gc<T> (with rooted bit set to true) is returned, and the underlying root count is incremented. Cloning a Gc does not perform a deep copy.

1
2
3
4
5
6
let a = Gc::new(20); // a.root = true, (*a.ptr).roots = 1, (*a.ptr).data = 20

// ptr points to the underlying box, which contains the data as well as
// GC metadata like the root count. `Gc::new()` will allocate this box

let b = a.clone(); // b.root = true, (*a.ptr).roots++, b.ptr = a.ptr

This is rather similar to how Rc works, however there is no root field, and the roots counter is called a “reference counter”.

For regular local sharing, it is recommended to just use a borrowed reference to the inner variable (borrowing works fine with rust-gc!) since there is no cost to creating this reference.

When a GC thing is put inside another GC thing, the first thing no longer can remain a root. This is handled by “unrooting” the first GC thing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
struct Foo {
    bar: u32,
    baz: Gc<u32>,
}

let a = Gc::new(20); // why anyone would want to GC an integer I'll never know
                     // but I'll stick with this example since it's simple

let b = Gc::new(Foo {bar: 1, baz: a});
// a.root = false, (*a.ptr).roots--
// b initialized similar to previous example

// `a` was moved into `b`, so now `a` cannot be accessed directly here
// other than through `b`, and `a` is no longer a root.
// To avoid moving a, passing `a.clone()` to `b` will work

Of course, we need a way to traverse the object passed to the Gc<T>, in this case Foo, and look for any contained Gc<T>s to unroot. Sound familiar? This needs the same mechanism that trace() needed! We add struct-walking root() and unroot() methods to the Trace trait which are auto- derived exactly the same way, and continue. (We don’t need root() right now, but we will need it later on).

Now, during collection, we can just traverse the list of GcBoxs and use the ones with a nonzero root count as roots for our mark traversal.

So far, so good. We have a pretty sound design for a GC that works … for immutable data.

Mutability

Like Rc<T>, Gc<T> is by default immutable. Rust abhors aliasable mutability, even in single threaded contexts, and both these smart pointers allow aliasing.

Mutation poses a problem for our GC, beyond the regular problems of aliasable mutability: It’s possible to move rooted things into heap objects and vice versa:

1
2
3
4
5
6
7
8
9
10
11
12
let x = Gc::new(20);

let y = Gc::new(None);

*y = Some(x); // uh oh, x is still considered rooted!

// and the reverse!

let y = Gc::new(Some(Gc::new(20)));

let x = y.take(); // x was never rooted!
// `take()` moves the `Some(Gc<u32>)` out of `y`, replaces it with `None`       

Since Gc<T> doesn’t implement DerefMut, none of this is possible — one cannot mutate the inner data. This is one of the places where Rust’s ownership/mutability system works out awesomely in our favor.

Of course, an immutable GC isn’t very useful. We can’t even create cycles in an immutable GC, so why would anyone need this in the first place2?

So of course, we needed to make it somehow mutable. People using Rc<T> solve this problem by using RefCell<T>, which maintains something similar to the borrow semantics at runtime and is internally mutable. RefCell<T> itself can’t be used by us since it doesn’t guard against the problem illustrated above (and hence won’t implement Trace, but a similar cell type would work).

So we created GcCell<T>. This behaves just like RefCell<T>, except that it will root() before beginning a mutable borrow, and unroot() before ending it (well, only if it itself is not rooted, which is tracked by an internal field similar to Gc<T>). Now, everything is safe:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#[derive(Trace)]
struct Foo {
    a: u8,
    b: GcCell<Gc<u8>>,
}

let x = Gc::new(20);

let y = Gc::new(Foo {a: 10, b: Gc::new(30)});
{
    *y.b.borrow_mut() = x; // the `Gc(30)` from `y.b` was rooted by this call
                           // but since we don't actually use it here,
                           // the destructor gets rid of it.
                           // We could use swap() to retain access to it.
    // ...
    // x unrooted
}


// and the reverse case works too:

let y = Gc::new(GcCell::new(Some(Gc::new(20))));

let x = y.borrow_mut().take(); // the inner `Some(Gc(20))` gets rooted by `borrow_mut()`
                               // before `x` can access it

So now, mutation works too! We have a working garbage collector!

Open problems

Destructors

I believe this can be solved without lints, but it may require some upcoming features of Rust to be implemented first (like specialization).

In essence, destructors implemented on a value inside Gc<T> can be unsafe. This will only happen if they try to access values within a Gc<T> — if they do, they may come across a box that has already been collected, or they may lengthen the lifetime of a box scheduled to be collected.

The basic solution to this is to use “finalizers” instead of destructors. Finalizers, like in Java, are not guaranteed to run. However, we may need further drop hooks or trait specialization to make an airtight interface for this. I don’t have a concrete design for this yet, though.

Concurrency

Our model mostly just works in a concurrent situation (with thread safety tweaks, of course); in fact it’s possible to make it so that the concurrent GC will not “stop the world” unless someone tries to do a write to a GcCell. We have an experimental concurrent GC in this pull request. We still need to figure out how to make interop between both GCs safe, though we may just end up making them such that an object using one GC cannot be fed to an object using the other.

Performance

So far we haven’t really focused on performance, and worked on ensuring safety. Our collection triggering algorithm, for example, was horribly inefficient, though we planned on improving it. The wonderful Huon fixed this, though.

Similarly, we haven’t yet optimized storage. We have some ideas which we may work on later. (If you want to help, contributions welcome!)

Cross-crate deriving

Currently, an object deriving Trace should have Traceable children. This isn’t always possible when members from another crate (which does not depend on rust-gc) are involved. At the moment, we allow an #[unsafe_ignore_trace] annotation on fields which are of this type (which excludes it from being traced – if that crate doesn’t transitively depend on rust-gc, its members cannot contain GCthings anyway unless generics are involved). It should be possible to detect whether or not this is safe, and/or autoderive Trace using the opt-in builtin traits framework (needs specialization to work), but at the moment we don’t do anything other than expose that annotation.

Stdlib support for a global Trace trait that everyone derives would be awesome.

Conclusion

Designing a GC was a wonderful experience! I didn’t get to write much code (I was busy and Michael was able to implement most of it overnight because he’s totally awesome), but the long design discussions followed by trying to figure out holes in the GC design in every idle moment of the day were quite enjoyable. GCs are very hard to get right, but it’s very satisfying when you come up with a design that works! I’m also quite happy at how well Rust helped in making a safe interface.

I encourage everyone to try it out and/or find holes in our design. Contributions of all kind welcome, we’d especially love performance improvements and testcases.

Discuss: HN, Reddit


  1. I’m very skeptical that it’s possible to make either of these completely sound without writing lints which effectively rewrite a large chunk of the compiler

  2. There is a case to be made for an immutable GC which allows some form of deferred initialization of GC fields, however.

The World’s Most Over-engineered Alarm Clock

A few weeks ago, I set up my Raspberry Pi as a music server so that I could listen to music without having to deal with keeping my laptop in a certain corner of the room.

After setting it all up, it occurred to me: “I can do much more with this!”.

Waking up to go to class in the morning is always a challenge for me. It’s not that I don’t wake up — I often wake up, cancel all the alarms, and go back to bed. Half-asleep me somehow has the skill to turn off alarms, but not the discipline of going to class1. I’ve tried those apps that make you do math before you can cancel the alarm, and I’m able to do the math and go back to sleep without fully waking up in the process (Hey, I’m a physics student; math is what we do!).

So I decided to create an alarm clock. Not just any alarm clock, the most overengineered alarm clock I can think of.

It consists of the Pi connected to a speaker and kept in a hard-to-reach area of the room. The Pi is on a DHCP network. Every night, I ssh to the Pi, set the volume to full, and run a script which, using at and mpg123, will schedule jobs to run around the desired time of waking up. First, there will be a few pieces of soothing music (either violin music or parts of the Interstellar OST) run once or twice, a while before the time of waking up. Close to the time of waking up, there are a bunch of jobs where each one will run a string of annoying music. In my case, it’s the Minions’ banana song followed by Nyan Cat (I sometimes add more to this list).

So far so good.

Now, the soothing music gives asleep-me me a chance to surrender and wake up before the Nyan Cat begins, and often fear of Nyan Cat is a pretty good motivator to wake up. If I don’t wake up to the soft songs, the annoying ones invariably work.

At this stage I’m still pretty groggy and have the intense urge to go back to bed. However, turning off the alarm isn’t simple. Since it’s in a hard to reach area of the room, I can’t just turn it off. I need to get up, sit in my chair, and turn on the laptop (which is hibernated/suspended), and kill it via ssh.

This needs me to:

  • nmap the network to find the Pi (I’m the only one on this subnet who uses ssh, so this just needs a port filter)
  • ssh into the Pi, remembering the password (I haven’t done this yet but I could further complicate things by changing the password often to reduce muscle-memory)
  • killall mpg123 to kill the currently playing song
  • Cancel the remaining at jobs. This can be done with atq + atrm for every job (tedious and long), or with awk. If I’ve already fully woken up, I’m able to do the awk one, otherwise half-asleep me ends up doing the brute-force one, which is enough manual typing to shake the remaining bits of sleepiness off.

After this whole process, it’s pretty much guaranteed that I’m fully awake – there’s no going back now!

So far it’s worked pretty well (both when I’ve slept on time and when I haven’t). The first ten minutes after this I’m rather annoyed, but after that I’m happy I woke up. If half-asleep me eventually gets the muscle memory to get past this process, I should probably be able to tweak it to add more complexity or change the way it works.

Of course, having an arms race with oneself probably isn’t the best way to solve this problem. I suspect I’ll go back to regular alarms in a month or so, but it’s a fun experiment for now.

However, by the time I’m done with this alarm clock, I’ll either be waking up on time, or I’ll be able to Linux in my sleep, which is a win-win!


  1. As a fourth year student the fully-awake me also has a bit of trouble with this ;)

Adventures in Systems Programming: C++ Local Statics

For a while now I’ve been quite interested in compilers and systems programming in general; and I feel that an important feature of systems programming is that it’s relatively easy to figure out what a line of code does (modulo optimizations) at the OS or hardware level1. Conversely, it’s important to know how your tools work more than ever in systems programming. So when I see a language feature I’m not familiar with, I’m interested in finding out how it works under the hood.

I’m not a C++ expert. I can work on C++ codebases, but I’m not anywhere near knowing all of the features and nuances of C++. However, I am pretty good at Rust and understand a decent portion of the compiler internals. This gives me a great perspective — I’ve not yet internalized most C++ features to take them for granted, and I’m well equipped to investigate these features.

Today I came across some C++ code similar to the following2:

1
2
3
4
void foo() {
    static SomeType bar = Env()->someMethod();
    static OtherType baz = Env()->otherMethod(bar);
}

This code piqued my interest. Specifically, the local static stuff. I knew that when you have a static like

1
static int FOO = 1;

the 1 is stored somewhere in the .data section of the program. This is easily verified with gdb:

1
2
3
4
5
static int THING = 0xAAAA;

int main() {
 return 1;
}
1
2
3
4
5
6
$ g++ test.cpp -g
$ gdb a.out
(gdb) info addr THING
Symbol "THING" is static storage at address 0x601038.
(gdb) info symbol 0x601038
THING in section .data

This is basically a part of the compiled program as it is loaded into memory.

Similarly, when you have a static that is initialized with a function, it’s stored in the .bss section, and initialized before main(). Again, easily verified:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#include<iostream>
using namespace std;

int bar() {
 cout<<"bar called\n";
 return 0xFAFAFA;
}

static int THING = bar();

int main() {
 cout<<"main called\n";
 return 0;
}
1
2
3
4
5
6
7
8
$ ./a.out
bar called
main called
$ gdb a.out
(gdb) info addr THING
Symbol "THING" is static storage at address 0x601198.
(gdb) info symbol 0x601198
THING in section .bss

We can also leave statics uninitialized (static int THING;) and they will be placed in .bss3.

So far so good.

Now back to the original snippet:

1
2
3
4
void foo() {
    static SomeType bar = Env()->someMethod();
    static OtherType baz = Env()->otherMethod(bar);
}

Naïvely one might say that these are statics which are scoped locally to avoid name clashes. It’s not much different from static THING = bar() aside from the fact that it isn’t a global identifier.

However, this isn’t the case. What tipped me off was that this called Env(), and I wasn’t so sure that the environment was guaranteed to be properly initialized and available before main() is called 4.

Instead, these are statics which are initialized the first time the function is called.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include<iostream>
using namespace std;

int bar() {
 cout<<"bar called\n";
 return 0xFAFAFA;
}

void foo() {
 cout<<"foo called\n";
 static int i = bar();
 cout<<"Static is:"<< i<<"\n";
}

int main() {
 cout<<"main called\n";
 foo();
 foo();
 foo();
 return 0;
}
1
2
3
4
5
6
7
8
9
10
$ g++ test.cpp
$ ./a.out
main called
foo called
bar called
Static is:16448250
foo called
Static is:16448250
foo called
Static is:16448250

Wait, “the first time the function is called”? Alarm bells go off… Surely there’s some cost to that! Let’s investigate.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
$ gdb a.out
(gdb) disas bar
   // snip
   0x0000000000400c72 <+15>:    test   %al,%al
   0x0000000000400c74 <+17>:    jne    0x400ca4 <_Z3foov+65>
   0x0000000000400c76 <+19>:    mov    $0x6021f8,%edi
   0x0000000000400c7b <+24>:    callq  0x400a00 <__cxa_guard_acquire@plt>
   0x0000000000400c80 <+29>:    test   %eax,%eax
   0x0000000000400c82 <+31>:    setne  %al
   0x0000000000400c85 <+34>:    test   %al,%al
   0x0000000000400c87 <+36>:    je     0x400ca4 <_Z3foov+65>
   0x0000000000400c89 <+38>:    mov    $0x0,%r12d
   0x0000000000400c8f <+44>:    callq  0x400c06 <_Z3barv>
   0x0000000000400c94 <+49>:    mov    %eax,0x201566(%rip)        # 0x602200 <_ZZ3foovE1i>
   0x0000000000400c9a <+55>:    mov    $0x6021f8,%edi
   0x0000000000400c9f <+60>:    callq  0x400a80 <__cxa_guard_release@plt>
   0x0000000000400ca4 <+65>:    mov    0x201556(%rip),%eax        # 0x602200 <_ZZ3foovE1i>
   0x0000000000400caa <+71>:    mov    %eax,%esi
   0x0000000000400cac <+73>:    mov    $0x6020c0,%edi
   // snip

The instruction at +44 calls bar(), and it seems to be surrounded by calls to some __cxa_guard functions.

We can take a naïve guess at what this does: It probably just sets a hidden static flag on initialization which ensures that it only runs once.

Of course, the actual solution isn’t as simple. It needs to avoid data races, handle errors, and somehow take care of recursive initialization.

Let’s look at the spec and one implementation, found by searching for __cxa_guard.

Both of them show us the generated code for initializing things like local statics:

1
2
3
4
5
6
7
8
9
10
11
12
  if (obj_guard.first_byte == 0) {
    if ( __cxa_guard_acquire (&obj_guard) ) {
      try {
      // ... initialize the object ...;
      } catch (...) {
        __cxa_guard_abort (&obj_guard);
        throw;
      }
      // ... queue object destructor with __cxa_atexit() ...;
      __cxa_guard_release (&obj_guard);
    }
  }

Here, obj_guard is our “hidden static flag”, with some other extra data.

__cxa_guard_acquire and __cxa_guard_release acquire and release a lock to prevent recursive initialization. So this program will crash:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include<iostream>
using namespace std;

void foo(bool recur);

int bar(bool recur) {
 cout<<"bar called\n";
 if(recur) {
    foo(false);
 }
 return 0xFAFAFA;
}

void foo(bool recur) {
 cout<<"foo called\n";
 static int i = bar(recur);
 cout<<"Static is:"<< i<<"\n";
}



int main() {
 foo(true);
 return 0;
}
1
2
3
4
5
6
7
8
$ g++ test.cpp
$ ./a.out
foo called
bar called
foo called
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
  what():  std::exception
Aborted (core dumped)

Over here, to initialize i, bar() needs to be called, but bar() calls foo() which needs i to be initialized, which again will call bar() (though this time it won’t recurse). If i wasn’t static it would be fine, but now we have two calls trying to initialize i, and it’s unclear as to which value should be used.

The implementation is pretty interesting. Before looking at the code my quick guess was that the following would happen for local statics:

  • obj_guard is a struct containing a mutex and a flag with three states: “uninitialized”, “initializing”, and “initialized”. Alternatively, use an atomic state indicator.
  • When we try to initialize for the first time, the mutex is locked, the flag is set to “initializing”, the mutex is released, the value is initialized, and the flag is set to “initialized”.
  • If when acquiring the mutex, the value is “initialized”, don’t initialize again
  • If when acquiring the mutex, the value is “initializing”, throw some exception

(We need the tristate flag because without it recursion would cause deadlocks)

I suppose that this implementation would work, though it’s not the one being used. The implementation in bionic (the Android version of the C stdlib) is similar; it uses per-static atomics which indicate various states. However, it does not throw an exception when we have a recursive initialization, it instead seems to deadlock5. This is okay because the C++ spec says (Section 6.7.4)

If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined.

However, the implementations in gcc/libstdc++ (also this version of libcppabi from Apple, which is a bit more readable) do something different. They use a global recursive mutex to handle reentrancy. Recursive mutexes basically can be locked multiple times by a single thread, but cannot be locked by another thread till the locking thread unlocks them the same number of times. This means that recursion/reentrancy won’t cause deadlocks, but we still have one- thread-at-a-time access. What these implementations do is:

  • guard_object is a set of two flags, one which indicates if the static is initialized, and one which indicates that the static is being initialized (“in use”)
  • If the object is initialized, do nothing (this doesn’t use mutexes and is cheap). This isn’t exactly part of the implementation in the library, but is part of the generated code.
  • If it isn’t initialized, acquire the global recursive lock
  • If the object is initialized by the time the lock was acquired, unlock and return
  • If not, check if the static is being initialized from the second guard_object flag. If it is “in use”, throw an exception.
  • If it wasn’t, mark the second flag of the static’s guard object as being “in use”
  • Call the initialization function, bubble errors
  • Unlock the global mutex
  • Mark the second flag as “not in use”

At any one time, only one thread will be in the process of running initialization routines, due to the global recursive mutex. Since the mutex is recursive, a function (eg bar()) used for initializing local statics may itself use (different) local statics. Due to the “in use” flag, the initialization of a local static may not recursively call its parent function without causing an error.

This doesn’t need per-static atomics, and doesn’t deadlock, however it has the cost of a global mutex which is called at most once per local static. In a highly threaded situation with lots of such statics, one might want to reevaluate directly using local statics.

LLVM’s libcxxabi is similar to the libstdc++ implementation, but instead of a recursive mutex it uses a regular mutex (on non-ARM Apple systems) which is unlocked before __cxa_guard_acquire exits and tests for reentrancy by noting the thread ID in the guard object instead of the “in use” flag. Condvars are used for waiting for a thread to stop using an object. On other platforms, it seems to deadlock, though I’m not sure.

So here we have a rather innocent-looking feature that has some hidden costs and pitfalls. But now I can look at a line of code where this feature is being used, and have a good idea of what’s happening there. One step closer to being a better systems programmer!

Thanks to Rohan Prinja, Eduard Burtescu, and Nishant Sunny for reviewing drafts of this blog post


  1. Emphasis on relatively. This article will show that it’s definitely not “easy” all the time.

  2. This was JNI code which obtained a JNI environment and pulled out method/class IDs from it to be used later

  3. Unless it has a constructor or otherwise isn’t made out of trivially constructible types; in this case it is treated similar to the previous case.

  4. I checked later, and it was indeed the case that global statics are initialized before Env() is ready

  5. I later verified this with a modification of the crashing program above stuck inside some JNI Android code.

How Rust Achieves Thread Safety

In every talk I have given till now, the question “how does Rust achieve thread safety?” has invariably come up1. I usually just give an overview, but this provides a more comprehensive explanation for those who are interested

See also: Huon’s blog post on the same topic

In my previous post I touched a bit on the Copy trait. There are other such “marker” traits in the standard library, and the ones relevant to this discussion are Send and Sync. I recommend reading that post if you’re not familiar with Rust wrapper types like RefCell and Rc, since I’ll be using them as examples throughout this post; but the concepts explained here are largely independent.

For the purposes of this post, I’ll restrict thread safety to mean no data races or cross-thread dangling pointers. Rust doesn’t aim to solve race conditions. However, there are projects which utilize the type system to provide some form of extra safety, for example rust- sessions attempts to provide protocol safety using session types.

These traits are auto-implemented using a feature called “opt in builtin traits”. So, for example, if struct Foo contains only Sync fields, it will also be Sync, unless we explicitly opt out using impl !Sync for Foo {}. Similarly, if struct Foo contains at least one non-Sync type, it will not be Sync either, unless it explicitly opts in (unsafe impl Sync for Foo {})

This means that, for example, a Sender for a Send type is itself Send, but a Sender for a non-Send type will not be Send. This pattern is quite powerful; it lets one use channels with non-threadsafe data in a single-threaded context without requiring a separate “single threaded” channel abstraction.

At the same time, structs like Rc and RefCell which contain Send/Sync fields have explicitly opted out of one or more of these because the invariants they rely on do not hold in threaded situations.

It’s actually possible to design your own library with comparable thread safety guarantees outside of the compiler — while these marker traits are specially treated by the compiler, the special treatment is not necessary for their working. Any two opt-in builtin traits could be used here.

Send and Sync have slightly differing meanings, but are very intertwined.

Send types can be moved between threads without an issue. It answers the question “if this variable were moved to another thread, would it still be valid for use?”. Most objects which completely own their contained data qualify here. Notably, Rc doesn’t (since it is shared ownership). Another exception is LocalKey, which does own its data but isn’t valid from other threads. Borrowed data does qualify to be Send, but in most cases it can’t be sent across threads due to a constraint that will be touched upon later.

Even though types like RefCell use non-atomic reference counting, it can be sent safely between threads because this is a transfer of ownership (a move). Sending a RefCell to another thread will be a move and will make it unusable from the original thread; so this is fine.

Sync, on the other hand, is about synchronous access. It answers the question: “if multiple threads were all trying to access this data, would it be safe?”. Types like Mutex and other lock/atomic based types implement this, along with primitive types. Things containing pointers generally are not Sync.

Sync is sort of a crutch to Send; it helps make other types Send when sharing is involved. For example, &T and Arc<T> are only Send when the inner data is Sync (there’s an additional Send bound in the case of Arc<T>). In words, stuff that has shared/borrowed ownership can be sent to another thread if the shared/borrowed data is synchronous-safe.

RefCell, while Send, is not Sync because of the non atomic reference counting.

Bringing it together, the gatekeeper for all this is thread::spawn(). It has the signature

1
pub fn spawn<F, T>(f: F) -> JoinHandle<T> where F: FnOnce() -> T, F: Send + 'static, T: Send + 'static

Admittedly, this is confusing/noisy, partially because it’s allowed to return a value, and also because it returns a handle from which we can block on a thread join. We can conjure a simpler spawn API for our needs though:

1
pub fn spawn<F>(f: F) where F: FnOnce(), F: Send + 'static

which can be called like:

1
2
3
4
5
6
7
8
9
let mut x = vec![1,2,3,4];

// `move` instructs the closure to move out of its environment
thread::spawn(move || {
   x.push(1);

});

// x is not accessible here since it was moved

In words, spawn() will take a callable (usually a closure) that will be called once, and contains data which is Send and 'static. Here, 'static just means that there is no borrowed data contained in the closure. This is the aforementioned constraint that prevents the sharing of borrowed data across threads; without it we would be able to send a borrowed pointer to a thread that could easily outlive the borrow, causing safety issues.

There’s a slight nuance here about the closures — closures can capture outer variables, but by default they do so by-reference (hence the move keyword). They autoimplement Send and Sync depending on their capture clauses. For more on their internal representation, see huon’s post. In this case, x was captured by-move; i.e. as Vec<T> (instead of being similar to &Vec<T> or something), so the closure itself can be Send. Without the move keyword, the closure would not be `‘static’ since it contains borrowed content.

Since the closure inherits the Send/Sync/'static-ness of its captured data, a closure capturing data of the correct type will satisfy the F: Send+'static bound.

Some examples of things that are allowed and not allowed by this function (for the type of x):

  • Vec<T>, Box<T> are allowed because they are Send and 'static (when the inner type is of the same kind)
  • &T isn’t allowed because it’s not 'static. This is good, because borrows should have a statically-known lifetime. Sending a borrowed pointer to a thread may lead to a use after free, or otherwise break aliasing rules.
  • Rc<T> isn’t Send, so it isn’t allowed. We could have some other Rc<T>s hanging around, and end up with a data race on the refcount.
  • Arc<Vec<u32>> is allowed (Vec<T> is Send and Sync if the inner type is); we can’t cause a safety violation here. Iterator invalidation requires mutation, and Arc<T> doesn’t provide this by default.
  • Arc<Cell<T>> isn’t allowed. Cell<T> provides copying-based internal mutability, and isn’t Sync (so the Arc<Cell<T>> isn’t Send). If this were allowed, we could have cases where larger structs are getting written to from different threads simultaneously resulting in some random mishmash of the two. In other words, a data race.
  • Arc<Mutex<T>> or Arc<RwLock<T>> are allowed (for Send T). The inner types use threadsafe locks and provide lock-based internal mutability. They can guarantee that only one thread is writing to them at any point in time. For this reason, the mutexes are Sync regardless of the inner T (as long as it is Send), and Sync types can be shared safely with wrappers like Arc. From the point of view of the inner type, it’s only being accessed by one thread at a time (slightly more complex in the case of RwLock), so it doesn’t need to know about the threads involved. There can’t be data races when Sync types like these are involved.

As mentioned before, you can in fact create a Sender/Receiver pair of non-Send objects. This sounds a bit counterintuitive — shouldn’t we be only sending values which are Send? However, Sender<T> is only Send if T is Send; so even if we can use a Sender of a non-Send type, we cannot send it to another thread, so it cannot be used to violate thread safety.

There is also a way to utilize the Send-ness of &T (which is not 'static) for some Sync T, namely thread::scoped. This function does not have the 'static bound, but it instead has an RAII guard which forces a join before the borrow ends. This allows for easy fork-join parallelism without necessarily needing a Mutex. Sadly, there are problems which crop up when this interacts with Rc cycles, so the API is currently unstable and will be redesigned. This is not a problem with the language design or the design of Send/Sync, rather it is a perfect storm of small design inconsistencies in the libraries.

Discuss: HN, Reddit


  1. So much that I added bonus slides about thread safety to the end of my deck, and of course I ended up using them at the talk I gave recently

Github Streak: End-game and Post-mortem

More than a year ago I blogged (and blogged again later) about my ongoing github streak.

The GitHub streak has since gone on for more than 500 days1, and has been a really enriching experience.

Yesterday; I noticed something. I had reached end-game with this exercise. The streak is something I don’t think about anymore, and I don’t make any efforts to keep it going. It just … happens. My involvement in open source has reached a level where I don’t need to consciously try contributing daily; I have enough interests/responsibilities that the streak is a side effect. This is despite my current internship being at a place where the code I’m working on is not hosted on GitHub (unlike last year). If the streak breaks, I won’t particularly care; and I haven’t been caring for a while now.

… I think that’s amazing. Despite me not realizing it at the time, this is the state of affairs that such an exercise would ideally lead to — the initial motivation for the exercise replaced with something more substantial, until the excercise is no longer relevant.

I initially started this off after realizing that I had inadvertantly been contributing daily to open source for a week or so. In the past, my contributions to open source used to be in bursts, which meant that I would be out of touch at times. I decided to try and work on extending this. After around 30 days, I had a concrete habit. After around 40, I realized that I’d become much more efficient at working on random bugs (even in unfamiliar codebases), thus being able to spend more time writing real code.

Initially I had set a bunch of rules (listed in my original post), which had stuff like issues/readme edits not counting (and no date manipulation!). I tweaked the rules around the 80-mark to include issues/readmes when I had written code that day but it was lost in a commit squash or rebase. I think much later I dropped the rules about issues and readme edits entirely; just considering “anything that shows up on the punchcard which is not a result of date manipulation” to be valid. At that point I was already quite involved in multiple projects and didn’t care so much about the streak or the original rules — it was just a habit at that point.

Now, I’m a regular contributor to both the Servo and Rust projects. I also have a bunch of personal projects (some lints and syntax extensions for Rust, as well as a gc, along with a lot of older projects that I don’t actively work on but maintain) and am trying to regularly blog. I’ve also gotten well into the habit of sending pull requests for fixing mistakes or annoyances. When all this comes together, I end up with at least one contribution a day. Sometimes more. I have tons of things queued that I want to work on (both personal and as a part of Servo/Rust/elsewhere), if only I had the time.

If you do have enough spare time, I do recommend trying this. Once you get enough momentum the power of habit will keep it going, and if my case is anything of an indicator2 you’ll eventually have a good chunk of contributions and some concrete open source involvement.

Please note that GitHub streaks shouldn’t be used as a metric, ever. They’re great for self motivation. As a metric for tracking employee performance, or for filtering interview candidates; not so much. It’s way too rough a metric (like LoC written), and oversimplifies the work that goes into code. As far as using it to boost interview candidates; not everyone has the time or inclination to contribute to open source after a day job in programming, and that’s okay. I’m a physics student — programming is like a hobby for me3 which I’ll gladly do daily. Now that I have a programming intern, I’m pretty sure there will be days where I don’t want to program further after leaving the office.


  1. The punchcard on GitHub only shows 400-something because the streak got retroactively broken by some deletion or rebase — at that point I didn’t care enough to investigate

  2. It could equally be just a bunch of luck with meeting the right people and choosing the right projects

  3. Though now it’s a serious hobby which is a possible career option

Wrapper Types in Rust: Choosing Your Guarantees

This post is now a part of the official rust book

In my previous post I talked a bit about why the RWlock pattern is important for accessing data, which is why Rust enforces this pattern either at compile time or runtime depending on the abstractions used.

It occurred to me that there are many such abstractions in Rust, each with their unique guarantees. The programmer once again has the choice between runtime and compile time enforcement. It occurred to me that this plethora of “wrapper types”1 could be daunting to newcomers; in this post I intend to give a thorough explanation of what some prominent ones do and when they should be used.

I’m assuming the reader knows about ownership and borrowing in Rust. Nevertheless, I will attempt to keep the majority of this post accessible to those not yet familiar with these concepts. Aside from the two links into the book above, these two blog posts cover the topic in depth.

Basic pointer types

Box<T>

Box<T> is an “owned pointer” or a “box”. While it can hand out borrowed references to the data, it is the only owner of the data. In particular, when something like the following occurs:

1
2
3
let x = Box::new(1);
let y = x;
// x no longer accessible here

Here, the box was moved into y. As x no longer owns it, the compiler will no longer allow the programmer to use x after this. A box can similarly be moved out of a function by returning, and when a box (one which hasn’t been moved) goes out of scope, destructors are run, deallocating the inner data.

This abstraction is a low cost abstraction for dynamic allocation. If you want to allocate some memory on the heap and safely pass a pointer to that memory around, this is ideal. Note that you will only be allowed to share borrowed references to this by the regular borrowing rules, checked at compile time.

Interlude: Copy

Move/ownership semantics are not special to Box<T>; it is a feature of all types which are not Copy.

A Copy type is one where all the data it logically encompasses (usually, owns) is part of its stack representation2. Most types containing pointers to other data are not Copy, since there is additional data elsewhere, and simply copying the stack representation may accidentally share ownership of that data in an unsafe manner.

Types like Vec<T> and String which also have data on the heap are also not Copy. Types like the integer/boolean types are Copy

&T and raw pointers are Copy. Even though they do point to further data, they do not “own” that data. Whereas Box<T> can be thought of as “some data which happens to be dynamically allocated”, &T is thought of as “a borrowing reference to some data”. Even though both are pointers, only the first is considered to be “data”. Hence, a copy of the first should involve a copy of the data (which is not part of its stack representation), but a copy of the second only needs a copy of the reference. &mut T is not Copy because mutable aliases cannot be shared, and &mut T “owns” the data it points to somewhat since it can mutate.

Practically speaking, a type can be Copy if a copy of its stack representation doesn’t violate memory safety.

&T and &mut T

These are immutable and mutable references respectively. They follow the “read-write lock” pattern described in my previous post, such that one may either have only one mutable reference to some data, or any number of immutable ones, but not both. This guarantee is enforced at compile time, and has no visible cost at runtime. In most cases such pointers suffice for sharing cheap references between sections of code.

These pointers cannot be copied in such a way that they outlive the lifetime associated with them.

*const T and *mut T

These are C-like raw pointers with no lifetime or ownership attached to them. They just point to some location in memory with no other restrictions. The only guarantee that these provide is that they cannot be dereferenced except in code marked unsafe.

These are useful when building safe, low cost abstractions like Vec<T>, but should be avoided in safe code.

Rc<T>

This is the first wrapper we will cover that has a runtime cost.

Rc<T> is a reference counted pointer. In other words, this lets us have multiple “owning” pointers to the same data, and the data will be freed (destructors will be run) when all pointers are out of scope.

Internally, it contains a shared “reference count”, which is incremented each time the Rc is cloned, and decremented each time one of the Rcs goes out of scope. The main responsibility of Rc<T> is to ensure that destructors are called for shared data.

The internal data here is immutable, and if a cycle of references is created, the data will be leaked. If we want data that doesn’t leak when there are cycles, we need a garbage collector. I do not know of any existing GCs in Rust, but I am working on one with Michael Layzell and there’s another cycle collecting one being written by Nick Fitzgerald.

Guarantees

The main guarantee provided here is that the data will not be destroyed until all references to it are out of scope.

This should be used when you wish to dynamically allocate and share some data (read-only) between various portions of your program, where it is not certain which portion will finish using the pointer last. It’s a viable alternative to &T when &T is either impossible to statically check for correctness, or creates extremely unergonomic code where the programmer does not wish to spend the development cost of working with.

This pointer is not thread safe, and Rust will not let it be sent or shared with other threads. This lets one avoid the cost of atomics in situations where they are unnecessary.

There is a sister smart pointer to this one, Weak<T>. This is a non-owning, but also non-borrowed, smart pointer. It is also similar to &T, but it is not restricted in lifetime — a Weak<T> can be held on to forever. However, it is possible that an attempt to access the inner data may fail and return None, since this can outlive the owned Rcs. This is useful for when one wants cyclic data structures and other things.

Cost

As far as memory goes, Rc<T> is a single allocation, though it will allocate two extra words as compared to a regular Box<T> (for “strong” and “weak” refcounts).

Rc<T> has the computational cost of incrementing/decrementing the refcount whenever it is cloned or goes out of scope respectively. Note that a clone will not do a deep copy, rather it will simply increment the inner reference count and return a copy of the Rc<T>

Cell types

“Cells” provide interior mutability. In other words, they contain data which can be manipulated even if the type cannot be obtained in a mutable form (for example, when it is behind an &-ptr or Rc<T>).

The documentation for the cell module has a pretty good explanation for these.

These types are generally found in struct fields, but they may be found elsewhere too.

Cell<T>

Cell<T> is a type that provides zero-cost interior mutability, but only for Copy types. Since the compiler knows that all the data owned by the contained value is on the stack, there’s no worry of leaking any data behind references (or worse!) by simply replacing the data.

It is still possible to violate your own invariants using this wrapper, so be careful when using it. If a field is wrapped in Cell, it’s a nice indicator that the chunk of data is mutable and may not stay the same between the time you first read it and when you intend to use it.

1
2
3
4
5
6
7
let x = Cell::new(1);
let y = &x;
let z = &x;
x.set(2);
y.set(3);
z.set(4);
println!("{}", x.get());

Note that here we were able to mutate the same value from various immutable references.

This has the same runtime cost as the following:

1
2
3
4
5
6
7
let mut x = 1;
let y = &mut x;
let z = &mut x;
x = 2;
*y = 3;
*z = 4;
println!("{}", x;

but it has the added benefit of actually compiling successfully.

Guarantees

This relaxes the “no aliasing with mutability” restriction in places where it’s unnecessary. However, this also relaxes the guarantees that the restriction provides; so if one’s invariants depend on data stored within Cell, one should be careful.

This is useful for mutating primitives and other Copy types when there is no easy way of doing it in line with the static rules of & and &mut.

Gábor Lehel summed up the guarantees provided by Cell in a rather succinct manner:

The basic guarantee we need to ensure is that interior references can’t be invalidated (left dangling) by mutation of the outer structure. (Think about references to the interiors of types like Option, Box, Vec, etc.) &, &mut, and Cell each make a different tradeoff here. & allows shared interior references but forbids mutation; &mut allows mutation xor interior references but not sharing; Cell allows shared mutability but not interior references.

Ultimately, while shared mutability can cause many logical errors (as outlined in my previous post ), it can only cause memory safety errors when coupled with “interior references”. This is for types who have an “interior” whose type/size can itself be changed. One example of this is a Rust enum; where by changing the variant you can change what type is contained. If you have an alias to the inner type whilst the variant is changed, pointers within that alias may be invalidated. Similarly, if you change the length of a vector while you have an alias to one of its elements, that alias may be invalidated.

Since Cell doesn’t allow references to the insides of a type (you can only copy out and copy back in), enums and structs alike are safe to be aliased mutably within this.

This comment by Eddy also touches on the guarantees of Cell and the alternatives

Cost

There is no runtime cost to using Cell<T>, however if one is using it to wrap larger (Copy) structs, it might be worthwhile to instead wrap individual fields in Cell<T> since each write is a full copy of the struct.

RefCell<T>

RefCell<T> also provides interior mutability, but isn’t restricted to Copy types.

Instead, it has a runtime cost. RefCell<T> enforces the RWLock pattern at runtime (it’s like a single-threaded mutex), unlike &T/&mut T which do so at compile time. This is done by the borrow() and borrow_mut() functions, which modify an internal reference count and return smart pointers which can be dereferenced immutably and mutably respectively. The refcount is restored when the smart pointers go out of scope. With this system, we can dynamically ensure that there are never any other borrows active when a mutable borrow is active. If the programmer attempts to make such a borrow, the thread will panic.

1
2
3
4
5
6
7
8
9
let x = RefCell::new(vec![1,2,3,4]);
{
    println!("{:?}", *x.borrow())
}

{
    let my_ref = x.borrow_mut();
    my_ref.push(1);
}

Similar to Cell, this is mainly useful for situations where it’s hard or impossible to satisfy the borrow checker. Generally one knows that such mutations won’t happen in a nested form, but it’s good to check.

For large, complicated programs, it becomes useful to put some things in RefCells to make things simpler. For example, a lot of the maps in the ctxt struct in the rust compiler internals are inside this wrapper. These are only modified once (during creation, which is not right after initialization) or a couple of times in well-separated places. However, since this struct is pervasively used everywhere, juggling mutable and immutable pointers would be hard (perhaps impossible) and probably form a soup of &-ptrs which would be hard to extend. On the other hand, the RefCell provides a cheap (not zero-cost) way of safely accessing these. In the future, if someone adds some code that attempts to modify the cell when it’s already borrowed, it will cause a (usually deterministic) panic which can be traced back to the offending borrow.

Similarly, in Servo’s DOM we have a lot of mutation, most of which is local to a DOM type, but some of which crisscrosses the DOM and modifies various things. Using RefCell and Cell to guard all mutation lets us avoid worrying about mutability everywhere, and it simultaneously highlights the places where mutation is actually happening.

Note that RefCell should be avoided if a mostly simple solution is possible with & pointers.

Guarantees

RefCell relaxes the static restrictions preventing aliased mutation, and replaces them with dynamic ones. As such the guarantees have not changed.

Cost

RefCell does not allocate, but it contains an additional “borrow state” indicator (one word in size) along with the data.

At runtime each borrow causes a modification/check of the refcount.

Synchronous types

Many of the types above cannot be used in a threadsafe manner. Particularly, Rc<T> and RefCell<T>, which both use non-atomic ref counts, cannot be used this way. This makes them cheaper to use, but one needs thread safe versions of these too. They exist, in the form of Arc<T> and Mutex<T>/RWLock<T>

Note that the non-threadsafe types cannot be sent between threads, and this is checked at compile time. I’ll touch on how this is done in a later blog post.

There are many useful wrappers for concurrent programming in the sync module, but I’m only going to cover the major ones.

Arc<T>

Arc<T> is just a version of Rc<T> that uses an atomic reference count (hence, “Arc”). This can be sent freely between threads.

C++’s shared_ptr is similar to Arc, however in C++s case the inner data is always mutable. For semantics similar to that from C++, we should use Arc<Mutex<T>>, Arc<RwLock<T>>, or Arc<UnsafeCell<T>>3 (UnsafeCell<T> is a cell type that can be used to hold any data and has no runtime cost, but accessing it requires unsafe blocks). The last one should only be used if one is certain that the usage won’t cause any memory unsafety. Remember that writing to a struct is not an atomic operation, and many functions like vec.push() can reallocate internally and cause unsafe behavior (so even monotonicity4 may not be enough to justify UnsafeCell)

Guarantees

Like Rc, this provides the (thread safe) guarantee that the destructor for the internal data will be run when the last Arc goes out of scope (barring any cycles).

Cost

This has the added cost of using atomics for changing the refcount (which will happen whenever it is cloned or goes out of scope). When sharing data from an Arc in a single thread, it is preferable to share & pointers whenever possible.

Mutex<T> and RwLock<T>

Mutex<T> and RwLock<T> provide mutual-exclusion via RAII guards. For both of these, the mutex is opaque until one calls lock() on it, at which point the thread will block until a lock can be acquired, and then a guard will be returned. This guard can be used to access the inner data (mutably), and the lock will be released when the guard goes out of scope.

1
2
3
4
5
{
    let guard = mutex.lock();
    // guard dereferences mutably to the inner type
    *guard += 1;
} // lock released when destructor runs

RwLock has the added benefit of being efficient for multiple reads. It is always safe to have multiple readers to shared data as long as there are no writers; and RwLock lets readers acquire a “read lock”. Such locks can be acquired concurrently and are kept track of via a reference count. Writers must obtain a “write lock” which can only be obtained when all readers have gone out of scope.

Guarantees

Both of these provide safe shared mutability across threads, however they are prone to deadlocks. Some level of additional protocol safety can be obtained via the type system. An example of this is rust-sessions, an experimental library which uses session types for protocol safety.

Costs

These use internal atomic-like types to maintain the locks, and these are similar pretty costly (they can block all memory reads across processors till they’re done). Waiting on these locks can also be slow when there’s a lot of concurrent access happening.

Composition

A common gripe when reading Rust code is with stuff like Rc<RefCell<Vec<T>>> and more complicated compositions of such types.

Usually, it’s a case of composing together the guarantees that one needs, without paying for stuff that is unnecessary.

For example, Rc<RefCell<T>> is one such composition. Rc itself can’t be dereferenced mutably; because Rc provides sharing and shared mutability isn’t good, so we put RefCell inside to get dynamically verified shared mutability. Now we have shared mutable data, but it’s shared in a way that there can only be one mutator (and no readers) or multiple readers.

Now, we can take this a step further, and have Rc<RefCell<Vec<T>>> or Rc<Vec<RefCell<T>>>. These are both shareable, mutable vectors, but they’re not the same.

With the former, the RefCell is wrapping the Vec, so the Vec in its entirety is mutable. At the same time, there can only be one mutable borrow of the whole Vec at a given time. This means that your code cannot simultaneously work on different elements of the vector from different Rc handles. However, we are able to push and pop from the Vec at will. This is similar to an &mut Vec<T> with the borrow checking done at runtime.

With the latter, the borrowing is of individual elements, but the overall vector is immutable. Thus, we can independently borrow separate elements, but we cannot push or pop from the vector. This is similar to an &mut [T]5, but, again, the borrow checking is at runtime.

In concurrent programs, we have a similar situation with Arc<Mutex<T>>, which provides shared mutability and ownership.

When reading code that uses these, go in step by step and look at the guarantees/costs provided.

When choosing a composed type, we must do the reverse; figure out which guarantees we want, and at which point of the composition we need them. For example, if there is a choice between Vec<RefCell<T>> and RefCell<Vec<T>>, we should figure out the tradeoffs as done above and pick one.

Discuss: HN, Reddit


  1. I’m not sure if this is the technical term for them, but I’ll be calling them that throughout this post.

  2. By “stack representation” I mean the data on the stack when a value of this type is held on the stack. For example, a Vec<T> has a stack representation of a pointer and two integers (length, capacity). While there is more data behind the indirection of the pointer, it is not part of the stack-held portion of the Vec. Looking at this a different way, a type is Copy if a memcopy of the data copies all the data owned by it.

  3. Arc<UnsafeCell<T>> actually won’t compile since UnsafeCell<T> isn’t Send or Sync, but we can wrap it in a type and implement Send/Sync for it manually to get Arc<Wrapper<T>> where Wrapper is struct Wrapper<T>(UnsafeCell<T>).

  4. By this I mean a piece of data that has a monotonic consistency requirement; i.e. a counter or a monotonically growing stack

  5. &[T] and &mut [T] are slices; they consist of a pointer and a length and can refer to a portion of a vector or array. &mut [T] can have its elements mutated, however its length cannot be touched.

The Problem With Single-threaded Shared Mutability

Edit (Jan 2017): I re-discovered Niko’s post which touches on this and reaches for the same realization. I suspect I subconsciously got the idea for this from that post, at least in part.

This is a post that I’ve been meaning to write for a while now; and the release of Rust 1.0 gives me the perfect impetus to go ahead and do it.

Whilst this post discusses a choice made in the design of Rust; and uses examples in Rust; the principles discussed here apply to other languages for the most part. I’ll also try to make the post easy to understand for those without a Rust background; please let me know if some code or terminology needs to be explained.

What I’m going to discuss here is the choice made in Rust to disallow having multiple mutable aliases to the same data (or a mutable alias when there are active immutable aliases), even from the same thread. In essence, it disallows one from doing things like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
let mut x = Vec::new();
{
    let ptr = &mut x; // Take a mutable reference to `x`
    ptr.push(1); // Allowed
    let y = x[0]; // Not allowed (will not compile): as long as `ptr` is active,
                  // x cannot be read from ...
    x.push(1);    // .. or written to
}


// alternatively,

let mut x = Vec::new();
x.push(1); // Allowed
{
    let ptr = &x; // Create an immutable reference
    let y = ptr[0]; // Allowed, nobody can mutate
    let y = x[0]; // Similarly allowed
    x.push(1); // Not allowed (will not compile): as long as `ptr` is active,
               // `x` is frozen for mutation
}

This is essentially the “Read-Write lock” (RWLock) pattern, except it’s not being used in a threaded context, and the “locks” are done via static analysis (compile time “borrow checking”).

Newcomers to the language have the recurring question as to why this exists. Ownership semantics and immutable borrows can be grasped because there are concrete examples from languages like C++ of problems that these concepts prevent. It makes sense that having only one “owner” and then multiple “borrowers” who are statically guaranteed to not stick around longer than the owner will prevent things like use-after-free.

But what could possibly be wrong with having multiple handles for mutating an object? Why do we need an RWLock pattern? 1

It causes memory unsafety

This issue is specific to Rust, and I promise that this will be the only Rust-specific answer.

Rust enums provide a form of algebraic data types. A Rust enum is allowed to “contain” data, for example you can have the enum

1
2
3
4
enum StringOrInt {
    Str(String),
    Int(i64)
}

which gives us a type that can either be a variant Str, with an associated string, or a variant Int2, with an associated integer.

With such an enum, we could cause a segfault like so:

1
2
3
4
5
6
7
let x = Str("Hi!".to_string()); // Create an instance of the `Str` variant with associated string "Hi!"
let y = &mut x; // Create a mutable alias to x

if let Str(ref insides) = x { // If x is a `Str`, assign its inner data to the variable `insides`
    *y = Int(1); // Set `*y` to `Int(1), therefore setting `x` to `Int(1)` too
    println!("x says: {}", insides); // Uh oh!
}

Here, we invalidated the insides reference because setting x to Int(1) meant that there is no longer a string inside it. However, insides is still a reference to a String, and the generated assembly would try to dereference the memory location where the pointer to the allocated string was, and probably end up trying to dereference 1 or some nearby data instead, and cause a segfault.

Okay, so far so good. We know that for Rust-style enums to work safely in Rust, we need the RWLock pattern. But are there any other reasons we need the RWLock pattern? Not many languages have such enums, so this shouldn’t really be a problem for them.

Iterator invalidation

Ah, the example that is brought up almost every time the question above is asked. While I’ve been quite guilty of using this example often myself (and feel that it is a very appropriate example that can be quickly explained), I also find it to be a bit of a cop-out, for reasons which I will explain below. This is partly why I’m writing this post in the first place; a better idea of the answer to The Question should be available for those who want to dig deeper.

Iterator invalidation involves using tools like iterators whilst modifying the underlying dataset somehow.

For example,

1
2
3
4
5
let buf = vec![1,2,3,4];

for i in &buf {
    buf.push(i);
}

Firstly, this will loop infinitely (if it compiled, which it doesn’t, because Rust prevents this). The equivalent C++ example would be this one, which I use at every opportunity.

What’s happening in both code snippets is that the iterator is really just a pointer to the vector and an index. It doesn’t contain a snapshot of the original vector; so pushing to the original vector will make the iterator iterate for longer. Pushing once per iteration will obviously make it iterate forever.

The infinite loop isn’t even the real problem here. The real problem is that after a while, we could get a segmentation fault. Internally, vectors have a certain amount of allocated space to work with. If the vector is grown past this space, a new, larger allocation may need to be done (freeing the old one), since vectors must use contiguous memory.

This means that when the vector overflows its capacity, it will reallocate, invalidating the reference stored in the iterator, and causing use-after-free.

Of course, there is a trivial solution in this case — store a reference to the Vec/vector object inside the iterator instead of just the pointer to the vector on the heap. This leads to some extra indirection or a larger stack size for the iterator (depending on how you implement it), but overall will prevent the memory unsafety.

This would still cause problems with more complex situations involving multidimensional vectors, however.

“It’s effectively threaded”

Aliasing with mutability in a sufficiently complex, single-threaded program is effectively the same thing as accessing data shared across multiple threads without a lock

(The above is my paraphrasing of someone else’s quote; but I can’t find the original or remember who made it)

Edit (Jan 2017): I found the original, it’s a comment by kmc:

My intuition is that code far away from my code might as well be in another thread, for all I can reason about what it will do to shared mutable state.

Let’s step back a bit and figure out why we need locks in multithreaded programs. The way caches and memory work; we’ll never need to worry about two processes writing to the same memory location simultaneously and coming up with a hybrid value, or a read happening halfway through a write.

What we do need to worry about is the rug being pulled out underneath our feet. A bunch of related reads/writes would have been written with some invariants in mind, and arbitrary reads/writes possibly happening between them would invalidate those invariants. For example, a bit of code might first read the length of a vector, and then go ahead and iterate through it with a regular for loop bounded on the length. The invariant assumed here is the length of the vector. If pop() was called on the vector in some other thread, this invariant could be invalidated after the read to length but before the reads elsewhere, possibly causing a segfault or use-after-free in the last iteration.

However, we can have a situation similar to this (in spirit) in single threaded code. Consider the following:

1
2
3
4
5
let x = some_big_thing();
let len = x.some_vec.len();
for i in 0..len {
    x.do_something_complicated(x.some_vec[i]);
}

We have the same invariant here; but can we be sure that x.do_something_complicated() doesn’t modify x.some_vec for some reason? In a complicated codebase, where do_something_complicated() itself calls a lot of other functions which may also modify x, this can be hard to audit.

Of course, the above example is a simplification and contrived; but it doesn’t seem unreasonable to assume that such bugs can happen in large codebases — where many methods being called have side effects which may not always be evident.

Which means that in large codebases we have almost the same problem as threaded ones. It’s very hard to maintain invariants when one is not completely sure of what each line of code is doing. It’s possible to become sure of this by reading through the code (which takes a while), but further modifications may also have to do the same. It’s impractical to do this all the time and eventually bugs will start cropping up.

On the other hand, having a static guarantee that this can’t happen is great. And when the code is too convoluted for a static guarantee (or you just want to avoid the borrow checker), a single-threaded RWlock-esque type called RefCell is available in Rust. It’s a type providing interior mutability and behaves like a runtime version of the borrow checker. Similar wrappers can be written in other languages.

Edit: In case of many primitives like simple integers, the problems with shared mutability turn out to not be a major issue. For these, we have a type called Cell which lets these be mutated and shared simultaenously. This works on all Copy types; i.e. types which only need to be copied on the stack to be copied. (Unlike types involving pointers or other indirection)

This sort of bug is a good source of reentrancy problems too.

Safe abstractions

In particular, the issue in the previous section makes it hard to write safe abstractions, especially with generic code. While this problem is clearer in the case of Rust (where abstractions are expected to be safe and preferably low-cost), this isn’t unique to any language.

Every method you expose has a contract that is expected to be followed. Many times, a contract is handled by type safety itself, or you may have some error-based model to throw out uncontractual data (for example, division by zero).

But, as an API (can be either internal or exposed) gets more complicated, so does the contract. It’s not always possible to verify that the contract is being violated at runtime either, for example many cases of iterator invalidation are hard to prevent in nontrivial code even with asserts.

It’s easy to create a method and add documentation “the first two arguments should not point to the same memory”. But if this method is used by other methods, the contract can change to much more complicated things that are harder to express or check. When generics get involved, it only gets worse; you sometimes have no way of forcing that there are no shared mutable aliases, or of expressing what isn’t allowed in the documentation. Nor will it be easy for an API consumer to enforce this.

This makes it harder and harder to write safe, generic abstractions. Such abstractions rely on invariants, and these invariants can often be broken by the problems in the previous section. It’s not always easy to enforce these invariants, and such abstractions will either be misused or not written in the first place, opting for a heavier option. Generally one sees that such abstractions or patterns are avoided altogether, even though they may provide a performance boost, because they are risky and hard to maintain. Even if the present version of the code is correct, someone may change something in the future breaking the invariants again.

My previous post outlines a situation where Rust was able to choose the lighter path in a situation where getting the same guarantees would be hard in C++.

Note that this is a wider problem than just with mutable aliasing. Rust has this problem too, but not when it comes to mutable aliasing. Mutable aliasing is important to fix however, because we can make a lot of assumptions about our program when there are no mutable aliases. Namely, by looking at a line of code we can know what happened wrt the locals. If there is the possibility of mutable aliasing out there; there’s the possibility that other locals were modified too. A very simple example is:

1
2
3
4
5
6
7
8
9
10
11
fn look_ma_no_temp_var_l33t_interview_swap(&mut x, &mut y) {
    *x = *x + *y;
    *y = *x - *y;
    *x = *x - *y;
}
// or
fn look_ma_no_temp_var_rockstar_interview_swap(&mut x, &mut y) {
    *x = *x ^ *y;
    *y = *x ^ *y;
    *x = *x ^ *y;
}

In both cases, when the two references are the same3, instead of swapping, the two variables get set to zero. A user (internal to your library, or an API consumer) would expect swap() to not change anything when fed equal references, but this is doing something totally different. This assumption could get used in a program; for example instead of skipping the passes in an array sort where the slot is being compared with itself, one might just go ahead with it because swap() won’t change anything there anyway; but it does, and suddenly your sort function fills everything with zeroes. This could be solved by documenting the precondition and using asserts, but the documentation gets harder and harder as swap() is used in the guts of other methods.

Of course, the example above was contrived. It’s well known that those swap() implementations have that precondition, and shouldn’t be used in such cases. Also, in most swap algorithms it’s trivial to ignore cases when you’re comparing an element with itself, generally done by bounds checking.

But the example is a simplified sketch of the problem at hand.

In Rust, since this is statically checked, one doesn’t worry much about these problems, and robust APIs can be designed since knowing when something won’t be mutated can help simplify invariants.

Wrapping up

Aliasing that doesn’t fit the RWLock pattern is dangerous. If you’re using a language like Rust, you don’t need to worry. If you’re using a language like C++, it can cause memory unsafety, so be very careful. If you’re using a language like Java or Go, while it can’t cause memory unsafety, it will cause problems in complex bits of code.

This doesn’t mean that this problem should force you to switch to Rust, either. If you feel that you can avoid writing APIs where this happens, that is a valid way to go around it. This problem is much rarer in languages with a GC, so you might be able to avoid it altogether without much effort. It’s also okay to use runtime checks and asserts to maintain your invariants; performance isn’t everything.

But this is an issue in programming; and make sure you think of it when designing your code.

Discuss: HN, Reddit


  1. Hereafter referred to as “The Question”

  2. Note: Str and Int are variant names which I chose; they are not keywords. Additionally, I’m using “associated foo” loosely here; Rust does have a distinct concept of “associated data” but it’s not relevant to this post.

  3. Note that this isn’t possible in Rust due to the borrow checker.

Where Rust Really Shines

Yesterday I was working on a small feature for the Rust compiler, and came across a situation which really showcased Rust’s awesomeness as a language.

There was a struct which was exposed to an API, and I wished to give it access to a list of things known as “attributes”, where the list was a heap-allocated vector.

Now, I have two ways of actually giving the struct access to a vector. I can either clone it (i.e. make a copy of its contents), or use a reference (pointer) to it or its contents.

In a language like C++ there’s only once choice in this situation; that is to clone the vector1. In a large C++ codebase if I wished to use a pointer I would need to be sure that the vector isn’t deallocated by the time I’m done with it, and more importantly, to be sure that no other code pushes to the vector (when a vector overflows its capacity it will be reallocated, invalidating any other pointers to its contents).

For a smaller codebase this might be possible, but in this specific case it could have taken me a while to become sure of this. The code was related to the “expansion” portion of compilation, where the AST is expanded to a bigger AST. A lot of things change and get moved around, so it is reasonable to assume that it might not be possible to safely use it. I would have had to find out where the vector is originally stored; all the entry points for the code I was modifying, and make sure it isn’t being mutated (not as hard in Rust, but I would still need to muck around a large codebase). And then I would have to somehow make sure that nobody tries to mutate it in the future. This is a task which I would not even consider trying in C++.

However, I had another option here, because this was Rust. In Rust I can store a reference to the contents of the vector without fear of invalidation, since the compiler will prevent me from using the vector in a way that could cause unsafety. Such a reference is known as a slice.

Whilst in C++ I would have to manually go through a lot of code to be sure of safety (and even after all that be left with code that would be brittle to changes elsewhere the codebase), in Rust the compiler can do this for me!

Being able to do this was important — this code is called quite often for a regular compile, and all those extra allocations could be heavy, especially given that this was a feature that would be used by very few.

So first I started off by adding a field to the FieldInfo struct which was a slice of attributes. Notice that I added a lifetime specifier, the 'a to the struct definition.

1
2
3
4
5
6
/// Summary of the relevant parts of a struct/enum field.
pub struct FieldInfo<'a> {
    /// ...
    /// The attributes on the field
    pub attrs: &'a [ast::Attribute],
}

For those of you new to Rust, a lifetime is part of the type of a reference. It’s related to the scope of the reference, and generally can be treated as a generic parameter. So, for example, here, I have a FieldInfo with a lifetime parameter of 'a where 'a is the lifetime of the inner slice of attributes. If I construct this struct with slices from different scopes, its type will be different each time. Lifetimes can get automatically cast depending on their context however, and quite often they get elided away, so one doesn’t need to specify them that much (aside from struct/enum definitions). You can find more information in the Rust book

I then updated code everywhere to pass the attributes from their source to their destination through the chained methods.

An important thing to note here is that none of the lifetime specifiers you see now in the commit were added when I did this. For example, the return value of create_struct_pattern was (P<ast::Pat>, Vec<(Span, Option<Ident>, P<Expr>, &[ast::Attribute])>) at this point, not (P<ast::Pat>, Vec<(Span, Option<Ident>, P<Expr>, &'a [ast::Attribute])>). You can ignore the complicated types being passed around, for now just pretend that a slice of attributes was returned.

Now comes the magic. After these small changes necessary for the feature, I basically let the compiler do the rest of the work. See, at this point the code was wrong. I had forgotten lifetime specifiers in places where they were important, and still wasn’t sure if storing a reference would in fact be possible in the first place. However, the compiler was smart enough to figure things out for me. It would tell me to add lifetime specifiers, and I would add them.

First, the compiler asked me to add a lifetime to the FieldInfo parts of SubstructureFields. So, the following:

1
2
3
4
5
pub enum SubstructureFields<'a> {
    Struct(Vec<FieldInfo>),
    EnumMatching(usize, &'a ast::Variant, Vec<FieldInfo>),
    // ...
}

became

1
2
3
4
5
pub enum SubstructureFields<'a> {
    Struct(Vec<FieldInfo<'a>>),
    EnumMatching(usize, &'a ast::Variant, Vec<FieldInfo<'a>>),
    // ...
}

This needed to happen because elision doesn’t work for structs and enums, and besides, the compiler would need to know if the &ast::Variant was supposed to be the same lifetime as the parameter of the FieldInfos. I decided to just use the existing 'a parameter, which meant that yes, the &ast::Variant was supposed to live just as long. I could also have opted to give the FieldInfos a different lifetime by adding a 'b parameter, but I guessed that it would work this way too (knowing the origin of the fieldinfo and variant, and that implicit lifetime casting would fix most issues that cropped up). I didn’t need to think this out much, though — the compiler gave me a suggestion and I could simply copy it.

The next error was in create_enum_variant_pattern() and create_struct_pattern() as well as some other places.

Here, the method had a signature of

1
2
3
4
5
6
7
fn create_enum_variant_pattern(&self,
                               cx: &mut ExtCtxt,
                               enum_ident: ast::Ident,
                               variant: &ast::Variant,
                               prefix: &str,
                               mutbl: ast::Mutability)
-> (P<ast::Pat>, Vec<(Span, Option<Ident>, P<Expr>, &[ast::Attribute])>)

and I changed it to

1
2
3
4
5
6
7
fn create_enum_variant_pattern<'a>(&self,
                               cx: &mut ExtCtxt,
                               enum_ident: ast::Ident,
                               variant: &'a ast::Variant,
                               prefix: &str,
                               mutbl: ast::Mutability)
-> (P<ast::Pat>, Vec<(Span, Option<Ident>, P<Expr>, &'a [ast::Attribute])>)

In this case, the code was uncomfortable with taking a slice of attributes out of an arbitrary StructDef reference and returning it. What if the StructDef doesn’t live long enough? Generally the compiler internally figures out the lifetimes necessary and uses them here, but if you have too many references there’s no single way to make the fix. In this case, the compiler suggested I add a 'a to &StructDef and the returned &[Attribute], and I did so. The 'a lifetime was declared at the top of the impl, so it was the lifetime parameter of self2. This meant that the returned attribute of the function will have a lifetime tied to self and the input StructDef, and due to this it cannot outlive the inputs, which is what we wanted in the first place. In essence, I took a bit of code that was doing:

1
2
3
4
fn minicreate(&self, variant: &ast::Variant) -> &[ast::Attribute] {
    // do stuff
    // return variant.attributes
}

and changed it to

1
2
3
4
5
// we are sure that the returned slice cannot outlive the variant argument
fn minicreate<'a>(&self, variant: &'a ast::Variant) -> &'a [ast::Attribute] {
    // do stuff
    // return variant.attributes
}

Again, I didn’t need to think this out much (I’m only thinking it through now for this blog post). I followed the suggestion given to me by the compiler:

1
2
error: cannot infer an appropriate lifetime for automatic coercion due to conflicting requirements
help: consider using an explicit lifetime parameter as shown: fn create_enum_variant_pattern<'a>(&self, cx: &mut ExtCtxt, enum_ident: ast::Ident, variant: &'a ast::Variant, prefix: &str, mutbl: ast::Mutability) -> (P<ast::Pat>, Vec<(Span, Option<Ident>, P<Expr>, &'a [ast::Attribute])>)

There were a couple of similar errors elsewhere that were caused by tying these two lifetimes together. Since these methods were chained, updating the lifetimes of a child method would mean that I would have to now update the parent method which passes its arguments down to the children and returns a modification of its return value (and thus must now impose the same restrictions on its own signature). All of this was done by just listening to the suggestions of the compiler (which all contain a function signature to try out). In some cases I introduced a 'b lifetime, because tying it to 'a (the self lifetime parameter) was possibly too restrictive. All of this at the suggestion of the compiler.

While this all seems long and complicated, in reality it wasn’t. I simply added the field to the initial struct, tried compiling a couple of times to figure out which code needed updating to pass around the attributes, and then went through 3-4 more compilation attempts to fix the lifetimes. It didn’t take long, and I didn’t need to put much mental effort into it. I just listened to the compiler, and it worked.

And now I trust completely that that code will not cause any segfaults due to attempted access of a destroyed or moved vector. And this is despite the fact that I still don’t know where that particular vector is modified or destroyed — I didn’t explore that far because I didn’t need to! (or want to :P)

And this is one place Rust really shines. It lets you do optimizations which you wouldn’t dream of doing in C++. In fact, while the C++ way of looking at this problem would probably be to just clone and move on, most Rust programmers would think of using slices as the default, and not even consider it an “optimization”. And again, this wasn’t with much cognitive overhead; I could just follow the compiler and it fixed everything for me.


  1. Some people have pointed out that a shared pointer to the vector itself would work here too. This is correct, but a shared pointer also has a runtime overhead, and more importantly doesn’t prevent iterator invalidation. I had no idea how the vector was being used elsewhere, so this was a risk I didn’t want to take. Additionally, whilst a shared pointer to the vector itself is immune to the issue of the vector being moved, since this was an API, someone consuming the API might take a reference of an attribute and hold on to it long enough for it to become invalidated. This is something we can’t have either – an API consumer should not have to worry about where the pointers will invalidate.

  2. Note: This is not the lifetime of the reference &self, which is the lifetime of the pointer (&'b self), but the lifetime parameter of self, a TraitDef<'a>, which has a lifetime parameter for its child fields.

New Blog!

I’ll be moving from my old Blogger-powered blog to this new one powered by github pages and Octopress. I never enjoyed writing a blog in WYSIWYG or HTML (I would constantly switch between both and still get the formatting wrong); Markdown is my cup of tea.

I may “uplift” some of my favorite posts to this blog later.

I’m quite excited!