In Pursuit of Laziness

Manish Goregaokar’s blog

Font-size: An Unexpectedly Complex CSS Property

font-size is the worst.

It’s a CSS property probably everyone who writes CSS has used at some point. It’s pretty ubiquitous.

And it’s super complicated.

“But it’s just a number”, you say. “How can that be complicated?”

I too felt that way one time. And then I worked on implementing it for stylo.

Stylo is the project to integrate Servo’s styling system into Firefox. The styling system handles parsing CSS, determining which rules apply to which elements, running this through the cascade, and eventually computing and assigning styles to individual elements in the tree. This happens not only on page load, but also whenever various kinds of events (including DOM manipulation) occur, and is a nontrivial portion of pageload and interaction times.

Servo is in Rust, and makes use of Rust’s safe parallelism in many places, one of them being styling. Stylo has the potential to bring these speedups into Firefox, along with the added safety of the code being in a safer systems language.

Anyway, as far as the styling system is concerned, I believe that font-size is the most complex property it has to handle. Some properties may be more complicated when it comes to layout or rendering, but font-size is probably the most complex one in the department of styling.

I’m hoping this post can give an idea of how complex the Web can get, and also serve as documentation for some of these complexities. I’ll also try to give an idea of how the styling system works throughout this post.

Alright. Let’s see what is so complex about font-size.

The basics

The syntax of the property is pretty straightforward. You can specify it as:

  • A length (12px, 15pt, 13em, 4in, 8rem)
  • A percentage (50%)
  • A compound of the above, via a calc (calc(12px + 4em + 20%))
  • An absolute keyword (medium, small, large, x-large, etc)
  • A relative keyword (larger, smaller)

The first three are common amongst quite a few length-related properties. Nothing abnormal in the syntax.

The next two are interesting. Essentially, the absolute keywords map to various pixel values, and match the result of <font size=foo> (e.g. size=3 is the same as font-size: medium). The actual value they map to is not straightforward, and I’ll get to that later in this post.

The relative keywords basically scale the size up or down. The mechanism of the scaling was also complex, however this has changed. I’ll get to that too.

em and rem units

First up: em units. One of the things you can specify in any length-based CSS property is a value with an em or rem unit.

5em means “5 times the font-size of the element this is applied to”. 5rem means “5 times the font-size of the root element”.

The implication of this is that font-size needs to be computed before all the other properties (well, not quite, but we’ll get to that!) so that it is available when they are computed.

You can also use em units within font-size itself. In this case, it is computed relative to the font-size of the parent element, since you can’t use the font-size of the element to compute itself.
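
As a rough sketch (not Servo’s actual code; the names here are made up), resolving an em or rem unit boils down to picking the right reference size:

// Rough sketch, not Servo's actual code: picking the reference size for em/rem.
fn resolve_font_relative_px(
    factor: f32,          // the number before `em`/`rem`
    is_rem: bool,
    in_font_size: bool,   // are we computing the font-size property itself?
    self_font_px: f32,    // this element's own font-size
    parent_font_px: f32,  // the parent's font-size
    root_font_px: f32,    // the root element's font-size
) -> f32 {
    let reference = if is_rem {
        root_font_px
    } else if in_font_size {
        // em inside font-size refers to the parent, since the element's own
        // font-size is exactly what we're trying to compute
        parent_font_px
    } else {
        self_font_px
    };
    factor * reference
}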

Minimum font size

Browsers let you set a “minimum” font size in their preferences, and text will not be scaled below it. It’s useful for those with trouble seeing small text.

However, this doesn’t affect properties which depend on font-size via em units. So if you’re using a minimum font size, <div style="font-size: 1px; height: 1em; background-color: red"> will have a very tiny height (which you’ll notice from the color), but the text will be clamped to the minimum size.

What this effectively means is that you need to keep track of two separate computed font size values. There’s one value that is used to actually determine the font size used for the text, and one value that is used whenever the style system needs to know the font-size (e.g. to compute an em unit.)
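
A minimal sketch of what that bookkeeping might look like (field names invented for illustration):

// Sketch only: the two font-size values tracked when a minimum font size is set.
struct FontSizes {
    /// What the style system reports, e.g. when resolving em units.
    for_styling: f32,
    /// What the text is actually drawn with: the same value, but clamped.
    for_text: f32,
}

fn apply_min_font_size(computed: f32, min_size: f32) -> FontSizes {
    FontSizes {
        for_styling: computed,             // `height: 1em` still sees the tiny value
        for_text: computed.max(min_size),  // but the glyphs never shrink below the minimum
    }
}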

This gets slightly more complicated when ruby is involved. In ideographic scripts (usually, Han and Han-based scripts like Kanji or Hanja) it’s sometimes useful to have the pronunciation of each character above it in a phonetic script, for the aid of readers without proficiency in that script, and this is known as “ruby” (“furigana” in Japanese). Because these scripts are ideographic, it’s not uncommon for learners to know the pronunciation of a word but have no idea how to write it. An example would be 日本 (“nihon”, i.e. “Japan”) in Kanji with the ruby にほん in the phonetic Hiragana script above it.

As you can probably see, the phonetic ruby text is in a smaller font size (usually 50% of the font size of the main text1). The minimum font-size support respects this, and ensures that if the ruby is supposed to be 50% of the size of the text, the minimum font size for the ruby is 50% of the original minimum font size. This avoids clamped ruby text from ending up the same size as the text below it, which is pretty ugly.

Text zoom

Firefox additionally lets you zoom just the text of a page when zooming. If you have trouble reading small things, it’s great to be able to just blow up the text on the page without having the whole page get zoomed (which means you need to scroll around a lot).

In this case, em units of other properties do get zoomed as well. After all, they’re supposed to be relative to the text’s font size (and may have some relation to the text), so if that size has changed so should they.

(Of course, that argument could also apply to the min font size stuff. I don’t have an answer for why it doesn’t.)

This is actually pretty straightforward to implement. When computing absolute font sizes (including keywords), zoom them if text zoom is on. For everything else continue as normal.
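
In code, that might look roughly like this (a sketch with invented types, not the real implementation):

// Sketch: text zoom multiplies absolute sizes; relative sizes are left alone,
// since the size they are relative to has already been zoomed.
#[derive(Clone, Copy)]
enum SpecifiedFontSize {
    Px(f32),      // absolute lengths (px, pt, in, ...), already converted to px
    Keyword(f32), // the base size looked up for medium/small/..., in px
    Em(f32),      // a factor of the parent's font size
}

fn compute_font_size(spec: SpecifiedFontSize, text_zoom: f32, parent_px: f32) -> f32 {
    match spec {
        SpecifiedFontSize::Px(px) => px * text_zoom,
        SpecifiedFontSize::Keyword(base_px) => base_px * text_zoom,
        SpecifiedFontSize::Em(factor) => parent_px * factor,
    }
}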

Text zoom is also disabled within <svg:text> elements, which leads to some trickiness here.

Interlude: How the style system works

Before I go ahead it’s probably worth giving a quick overview of how everything works.

The responsibility of a style system is to take in CSS code and a DOM tree, and assign computed styles to each element.

There’s a distinction between “specified” and “computed” here. “specified” styles are in the format you specify in CSS, whereas computed styles are those that get attached to the elements, sent to layout, and inherited from. A given specified style may compute to different values when applied to different elements.

So while you can specify width: 5em, it will compute to something like width: 80px. Computed values are usually a cleaned up form of the specified value.

The style system will first parse the CSS, producing a bunch of rules usually containing declarations (a declaration is like width: 20%;, i.e. a property name and a specified value).

It then goes through the tree in top-down order (this is parallelized in Stylo), figuring out which declarations apply to each element and in which order – some declarations have precedence over others. Then it will compute each relevant declaration against the element’s style (and parent style, among other bits of info), and store this value in the element’s “computed style”.

There are a bunch of optimizations that Gecko and Servo do here to avoid duplicated work2. There’s a bloom filter for quickly checking if deep descendant selectors apply to a subtree. There’s a “rule tree” that helps cache effort from determining applicable declarations. Computed styles are reference counted and shared very often (since the default state is to inherit from the parent or from the default style).

But ultimately, this is the gist of what happens.
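
Here’s a very rough sketch of that flow (all types and names invented for illustration; the real thing is far more involved):

#[derive(Clone)]
struct Declaration {
    property: &'static str,
    value: &'static str,
}

#[derive(Clone, Default)]
struct ComputedStyle {
    font_size_px: f32,
    // ... one field per property in a real style system
}

struct Element {
    // pretend selector matching already produced these, in cascade order
    declarations: Vec<Declaration>,
    children: Vec<Element>,
}

fn style_subtree(element: &Element, parent: &ComputedStyle) {
    // inheritance: start from the parent's computed style
    let mut computed = parent.clone();
    for decl in &element.declarations {
        // compute each specified value against this element's (and parent's) style;
        // here we only pretend to handle one property
        if decl.property == "font-size" && decl.value == "medium" {
            computed.font_size_px = 16.0;
        }
    }
    // ...store `computed` on the element, then recurse (Stylo parallelizes this walk)
    for child in &element.children {
        style_subtree(child, &computed);
    }
}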

Keyword values

Alright, this is where it gets complicated.

Remember when I said font-size: medium was a thing that mapped to a value?

So what does it map to?

Well, it turns out, it depends on the font family. For the following HTML:

1
2
<span style="font: medium monospace">text</span>
<span style="font: medium sans-serif">text</span>

you get (codepen)


where the first one computes to a font-size of 13px, and the second one computes to a font-size of 16px. You can check this in the computed style pane of your devtools, or by using getComputedStyle().

I think the reason behind this is that monospace fonts tend to be wider, so the default font size (medium) is scaled so that they have similar widths, and all other keyword font sizes get shifted as well. The net result is that the same keyword computes to a smaller size for monospace than for sans-serif.

Firefox and Servo have a matrix that helps derive the values for all the absolute font-size keywords based on the “base size” (i.e. the computed value of font-size: medium). Actually, Firefox has three tables to support some legacy use cases like quirks mode (Servo has yet to add support for these tables). We query other parts of the browser for what the “base size” is based on the language and font family.
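
As a sketch, deriving the keyword sizes from a base size looks something like this (the real tables are keyed on the base size rather than using fixed ratios, so treat the numbers as illustrative):

// Illustrative only: map an absolute keyword to pixels given the base size
// (the computed value of `font-size: medium` for this family/language).
fn keyword_size_px(keyword: &str, base_size_px: f32) -> f32 {
    let ratio = match keyword {
        "xx-small" => 3.0 / 5.0,
        "x-small" => 3.0 / 4.0,
        "small" => 8.0 / 9.0,
        "medium" => 1.0,
        "large" => 6.0 / 5.0,
        "x-large" => 3.0 / 2.0,
        "xx-large" => 2.0,
        _ => 1.0,
    };
    base_size_px * ratio
}

// keyword_size_px("medium", 16.0) == 16.0  (sans-serif default)
// keyword_size_px("medium", 13.0) == 13.0  (monospace default)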

Wait, but what does the language have to do with this anyway? How does the language impact font-size?

It turns out that the base size depends on the font family and the language, and you can configure this.

Both Firefox and Chrome (using an extension) actually let you tweak which fonts get used on a per-language basis, as well as the default (base) font-size.

This is not as obscure as one might think. Default system fonts are often really ugly for non-Latin-using scripts. I have a separate font installed that produces better-looking Devanagari ligatures.

Similarly, some scripts are just more intricate than Latin. My default font size for Devanagari is set to 18 instead of 16. I’ve started learning Mandarin and I’ve set that font size to 18 as well. Hanzi glyphs can get pretty complicated and I still struggle to learn (and later recognize) them. A larger font size is great for this.

Anyway, this doesn’t complicate things too much. This does mean that the font family needs to be computed before font-size, which already needs to be computed before most other properties. The language, which can be set using a lang HTML attribute, is internally treated as a CSS property by Firefox since it inherits, and it must be computed earlier as well.

Not too bad. So far.

Now here’s the kicker. This dependence on the language and family inherits.

Quick, what’s the font-size of the inner div?

1
2
3
4
5
6
<div style="font-size: medium; font-family: sans-serif;"> <!-- base size 16 -->
    font size is 16px
    <div style="font-family: monospace"> <!-- base size 13 -->
        font size is ??
    </div>
</div>

For a normal inherited CSS property3, if the parent has a computed value of 16px, and the child has no additional values specified, the child will inherit a value of 16px. Where the parent got that computed value from doesn’t matter.

Here, font-size “inherits” a value of 13px. You can see this below (codepen):

font size is 16px
font size is ??

Basically, if the computed value originated from a keyword, whenever the font family or language change, font-size is recomputed from the original keyword with the new font family and language.

The reason this exists is because otherwise the differing font sizes wouldn’t work anyway! The default font size is medium, so basically the root element gets a font-size: medium and all elements inherit from it. If you change to monospace or a different language in the document you need the font-size recomputed.

But it doesn’t stop here. This even inherits through relative units (Not in IE).

1
2
3
4
5
6
7
8
9
<div style="font-size: medium; font-family: sans-serif;"> <!-- base size 16 -->
    font size is 16px
    <div style="font-size: 0.9em"> <!-- could also be font-size: 50%-->
        font size is 14.4px (16 * 0.9)
        <div style="font-family: monospace"> <!-- base size 13 -->
            font size is 11.7px! (13 * 0.9)
        </div>
    </div>
</div>

(codepen)

font size is 16px
font size is 14.4px (16 * 0.9)
font size is 11.7px! (13 * 0.9)

So we’re actually inheriting a font-size of 0.9*medium when we inherit from the second div, not 14.4px.

Another way of looking at it is whenever the font family or language changes, you should recompute the font-size as if the language and family were always that way up the tree.

Firefox code uses both of these strategies. The original Gecko style system handles this by actually going back to the top of the tree and recalculating the font size as if the language/family were different. I suspect this is inefficient, but the rule tree seems to be involved in making this slightly more efficient.

Servo, on the other hand, stores some extra data on the side when computing stuff, data which gets copied over to the child element. It basically stores the equivalent of saying “Yes, this font was computed from a keyword. The keyword was medium, and after that we applied a factor of 0.9 to it.”4

In both cases, this interacts with all the other font-size complexities, since the keyword origin needs to be carefully preserved through each of them.

In Servo, most of this gets handled via custom cascading functions for font-size.
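
In other words, the thing Servo inherits looks conceptually like this (names invented; the real types live in the font-size cascading code):

// Conceptual sketch of "keyword plus factor" inheritance.
#[derive(Clone, Copy)]
enum SizeKeyword {
    Medium, // plus Small, Large, etc. in the real thing
}

#[derive(Clone, Copy)]
struct KeywordInfo {
    keyword: SizeKeyword,
    factor: f32, // accumulated em/percentage applied on top of the keyword
}

// When font-family or language changes, recompute from the keyword against
// the *new* base size instead of inheriting the old pixel value.
fn recompute(info: KeywordInfo, new_base_px: f32) -> f32 {
    let keyword_px = match info.keyword {
        SizeKeyword::Medium => new_base_px,
    };
    keyword_px * info.factor
}

// In the example above: medium (13px for monospace) * 0.9 = 11.7px,
// rather than inheriting 14.4px from the sans-serif parent.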

Larger/smaller

So I mentioned that font-size: larger and smaller scale the size, but didn’t mention by what fraction.

According to the spec, if the font-size currently matches the value of an absolute keyword size (medium/large/etc), you should pick the value of the next/previous keyword sizes respectively.

If it is between two, find the same point between the next/previous two sizes.

This, of course, must play well with the weird inheritance of keyword font sizes mentioned before. In gecko’s model this isn’t too hard, since Gecko recalculates things anyway. In Servo’s model we’d have to store a sequence of applications of larger/smaller and relative units, instead of storing just a relative unit.

Additionally, when computing this during text-zoom, you have to unzoom before looking it up in the table, and then rezoom.

Overall, a bunch of complexity for not much gain — turns out only Gecko actually followed the spec here! All other browser engines used simple ratios here.

So my fix here was simply to remove this behavior from Gecko. That simplified things.

MathML

Firefox and Safari support MathML, a markup language for math. It doesn’t get used much on the Web these days, but it exists.

MathML has its own complexities when it comes to font-size. Specifically, scriptminsize, scriptlevel, and scriptsizemultiplier.

For example, in MathML, the text in the numerator or denominator of a fraction or the text of a superscript is 0.71 times the size of the text outside of it. This is because the default scriptsizemultiplier for MathML elements is 0.71, and these specific elements all get a default scriptlevel of +1.

Basically, scriptlevel=+1 means “multiply the font size by scriptsizemultiplier”, and scriptlevel=-1 is for dividing. This can be specified via a scriptlevel HTML attribute on an mstyle element. You can similarly tweak the (inherited) multiplier via the scriptsizemultiplier HTML attribute, and the minimum size via scriptminsize.

So, for example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
<math><msup>
    <mi>text</mi>
    <mn>small superscript</mn>
</msup></math><br>
<math>
    text
    <mstyle scriptlevel=+1>
        small
        <mstyle scriptlevel=+1>
            smaller
            <mstyle scriptlevel=-1>
                small again
            </mstyle>
        </mstyle>
    </mstyle>
</math>

will show as (you will need Firefox to see the rendered version, Safari supports MathML too but the support isn’t as good):

textsmall superscript
text small smaller small again

(codepen)

So this isn’t as bad. It’s as if scriptlevel is a weird em unit. No biggie, we know how to deal with those already.
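
Ignoring scriptminsize for a moment, the basic scaling is just repeated multiplication (a sketch, not the real code):

// scriptlevel +1 multiplies by the multiplier, -1 divides; nesting accumulates.
fn scriptlevel_size(parent_px: f32, multiplier: f32, level_delta: i32) -> f32 {
    parent_px * multiplier.powi(level_delta)
}

// scriptlevel_size(16.0, 0.71, 1)  -> ~11.4 (e.g. a superscript)
// scriptlevel_size(16.0, 0.71, -1) -> ~22.5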

Except you also have scriptminsize. This lets you set the minimum font size for changes caused by scriptlevel.

This means that scriptminsize will make sure scriptlevel never causes changes that make the font smaller than the min size, but it will ignore cases where you deliberately specify an em unit or a pixel value.

There’s already a subtle bit of complexity introduced here: scriptlevel now becomes another thing that tweaks how font-size inherits. Fortunately, in Firefox/Servo, scriptlevel (like scriptminsize and scriptsizemultiplier) is internally handled as a CSS property, which means that we can use the same framework we used for font-family and language here – compute the script properties before font-size, and if scriptlevel is set, force-recalculate the font size even if font-size itself was not set.

Interlude: early and late computed properties

In Servo the way we handle dependencies in properties is to have a set of “early” properties and a set of “late” properties (which are allowed to depend on early properties). We iterate the declarations twice, once looking for early properties, and once for late. However, now we have a pretty intricate set of dependencies, where font-size must be calculated after language, font-family, and the script properties, but before everything else that involves lengths. Additionally, font-family has to be calculated after all the other early properties due to another font complexity I’m not covering here.

The way we handle this is to pull font-size and font-family out during the early computation, but not deal with them until after the early computation is done.

At that stage we first handle the disabling of text-zoom, and then handle the complexities of font-family.

We then compute the font size. If a font size was specified, we just compute that. If it was not, but a font family, lang, or scriptlevel was specified, we force-compute the font size as inherited, which handles all the constraints.
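
A sketch of that two-pass structure (property names and helpers are invented for illustration):

#[derive(Clone, Copy)]
struct Decl {
    property: &'static str,
    value: &'static str,
}

fn is_early(property: &str) -> bool {
    matches!(property, "lang" | "font-family" | "font-size" | "script-level")
}

fn cascade(decls: &[Decl]) {
    let mut family = None;
    let mut size = None;

    // early pass: language, script properties, etc.; font-family and
    // font-size are pulled out and dealt with at the end of this pass
    for d in decls {
        match d.property {
            "font-family" => family = Some(*d),
            "font-size" => size = Some(*d),
            p if is_early(p) => println!("early: {} = {}", p, d.value),
            _ => {}
        }
    }

    if let Some(d) = family {
        println!("font-family: {}", d.value);
    }
    match size {
        Some(d) => println!("font-size: {}", d.value),
        // nothing specified, but family/lang/scriptlevel may have changed:
        // force-recompute font-size as if it were inherited
        None => println!("font-size: recompute as inherited"),
    }

    // late pass: everything that may depend on font-size (em units, etc.)
    for d in decls {
        if !is_early(d.property) {
            println!("late: {} = {}", d.property, d.value);
        }
    }
}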

Why scriptminsize gets complicated

When the font size has been clamped by scriptminsize, using an em unit in any other property will calculate the length from the clamped value, not from the “if nothing had been clamped” value – unlike the global minimum font size from earlier. So at first glance handling this seems straightforward: only consider the script min size when deciding to scale because of scriptlevel.

As always, it’s not that simple 😀:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
<math>
<mstyle scriptminsize="10px" scriptsizemultiplier="0.75" style="font-size:20px">
    20px
    <mstyle scriptlevel="+1">
        15px
        <mstyle scriptlevel="+1">
            11.25px
                <mstyle scriptlevel="+1">
                    would be 8.4375, but is clamped at 10px
                        <mstyle scriptlevel="+1">
                            would be 6.328125, but is clamped at 10px
                                <mstyle scriptlevel="-1">
                                    This is not 10px/0.75=13.3, rather it is still clamped at 10px
                                        <mstyle scriptlevel="-1">
                                            This is not 10px/0.75=13.3, rather it is still clamped at 10px
                                            <mstyle scriptlevel="-1">
                                                This is 11.25px again
                                                    <mstyle scriptlevel="-1">
                                                        This is 15px again
                                                    </mstyle>
                                            </mstyle>
                                        </mstyle>
                                </mstyle>
                        </mstyle>
                </mstyle>
        </mstyle>
    </mstyle>
</mstyle>
</math>

(codepen)

Basically, if you increase the level a bunch of times after hitting the min size, decreasing it by one should not immediately compute min size / multiplier. That would make things asymmetric; something with a net script level of +5 should have the same size as something with a net script level of +6 -1, provided the multiplier hasn’t changed.

So what happens is that the script level is calculated against the font size as if scriptminsize had never applied, and we only use that size if it is greater than the min size.

It’s not just a matter of keeping track of the script level at which clamping happened – the multiplier could change in the process and you need to keep track of that too. So this ends up in creating yet another font-size value to inherit.

To recap, we are now at four different notions of font size being inherited:

  • The main font size used by styling
  • The “actual” font size, i.e. the main font size but clamped by the min size
  • (In Servo only) The “keyword” size; i.e. the size stored as a keyword and ratio, if it was derived from a keyword
  • The “script unconstrained” size; the font size as if scriptminsize never existed.
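
As a sketch, the inherited font data now looks roughly like this (field names invented):

// The four notions of font size that get inherited (sketch only).
struct InheritedFontSizes {
    /// the main font size, used by styling (em units, etc.)
    computed: f32,
    /// the main size clamped by the user's minimum font size; what text is drawn with
    used: f32,
    /// Servo only: the keyword and accumulated ratio, if derived from a keyword
    keyword_info: Option<(&'static str, f32)>,
    /// the size as if scriptminsize had never applied
    script_unconstrained: f32,
}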

Another complexity here is that the following should still work:

1
2
3
4
5
6
7
8
<math>
<mstyle scriptminsize="10px" scriptsizemultiplier="0.75" style="font-size: 5px">
    5px
    <mstyle scriptlevel="-1">
        6.666px
    </mstyle>
</mstyle>
</math>

(codepen)

Basically, if you were already below the scriptminsize, reducing the script level (to increase the font size) should not get clamped, since then you’d get something too large.

This basically means you only apply scriptminsize if you are applying the script level to a value greater than the script min size.

In Servo, all of the MathML handling culminates in this wonderful function that is more comment than code, and some code in the functions near it.


So there you have it. font-size is actually pretty complicated. A lot of the web platform has hidden complexities like this, and it’s always fun to encounter more of them.

(Perhaps less fun when I have to implement them 😂)

Thanks to mystor, mgattozzi, bstrie, and projektir for reviewing drafts of this post


  1. Interestingly, in Firefox, this number is 50% for all ruby except for when the language is Taiwanese Mandarin (where it is 30%). This is because Taiwan uses a phonetic script called Bopomofo, and each Han glyph can be represented as a maximum of 3 Bopomofo letters. So it is possible to choose a reasonable minimum size such that the ruby never extends beyond the glyph below it. On the other hand, pinyin can be up to six letters, and Hiragana up to (I think) 5, and the corresponding “no overflow” scaling will be too tiny. So fitting them on top of the glyph is not a consideration and instead we elect to have a larger font size for better readability. Additionally, Bopomofo ruby is often set on the side of the glyph instead of on top, and 30% works better there. (h/t @upsuper for pointing this out)

  2. Other browser engines have other optimizations, I’m just less familiar with them

  3. Some properties are inherited, some are “reset”. For example, font-family is inherited — child elements inherit font family from the parent unless otherwise specified. However transform is not; transforming an element does not further transform its children.

  4. This won’t handle calcs, which is something I need to fix. Fixing this is trivial, you store an absolute offset in addition to the ratio.

Teaching Programming: Proactive vs Reactive

I’ve been thinking about this a lot these days. In part because of an idea I had but also due to this twitter discussion.

When teaching most things, there are two non-mutually-exclusive ways of approaching the problem. One is “proactive”1, which is where the teacher decides a learning path beforehand, and executes it. The other is “reactive”, where the teacher reacts to the student trying things out and dynamically tailors the teaching experience.

Most in-person teaching experiences are a mix of both. Planning beforehand is very important whilst teaching, but tailoring the experience to the student’s reception of the things being taught is important too.

In person, you can mix these two, and in doing so you get a “best of both worlds” situation. Yay!

But … we don’t really learn much programming in a classroom setup. Sure, some folks learn the basics in college for a few years, but everything they learn after that isn’t in a classroom situation where this can work2. I’m an autodidact, and while I have taken a few programming courses for random interesting things, I’ve taught myself most of what I know using various sources. I care a lot about improving the situation here.

With self-driven learning we have a similar divide. The “proactive” model corresponds to reading books and docs. Various people have proactively put forward a path for learning in the form of a book or tutorial. It’s up to you to pick one, and follow it.

The “reactive” model is not so well-developed. In the context of self-driven learning in programming, it’s basically “do things, make mistakes, hope that Google/Stackoverflow help”. It’s how a lot of people learn programming; and it’s how I prefer to learn programming.

It’s very nice to be able to “learn along the way”. While this is a long and arduous process, involving many false starts and a lack of a sense of progress, it can be worth it in terms of the kind of experience this gets you.

But as I mentioned, this isn’t as well-developed. With the proactive approach, there still is a teacher – the author of the book! That teacher may not be able to respond in real time, but they’re able to set forth a path for you to work through.

On the other hand, with the “reactive” approach, there is no teacher. Sure, there are Random Answers on the Internet, which are great, but they don’t form a coherent story. Neither can you really be your own teacher for a topic you do not understand.

Yet plenty of folks do this. Plenty of folks approach things like learning a new language by reading at most two pages of docs and then just diving straight in and trying stuff out. The only language I have not done this for is the first language I learned3 4.

I think it’s unfortunate that folks who prefer this approach don’t get the benefit of a teacher. In the reactive approach, teachers can still tell you what you’re doing wrong and steer you away from tarpits of misunderstanding. They can get you immediate answers and guidance. When we look for answers on stackoverflow, we get some of this, but it also involves a lot of pattern-matching on the part of the student, and we end up with a bad facsimile of what a teacher can do for you.

But it’s possible to construct a better teacher for this!

In fact, examples of this exist in the wild already!

The Elm compiler is my favorite example of this. It has amazing error messages.

The error messages tell you what you did wrong, sometimes suggest fixes, and help correct potential misunderstandings.

Rust does this too. Many compilers do. (Elm is exceptionally good at it)

One thing I particularly like about Rust is that for an error like E0373 you can run rustc --explain E0373 and get a terminal-friendly version of the help text.

Anyway, diagnostics basically provide a reactive component to learning programming. I’ve cared about diagnostics in Rust for a long time, and I often remind folks that many things taught through the docs can/should be taught through diagnostics too. Especially because diagnostics are a kind of soapbox for compiler writers — you can’t guarantee that your docs will be read, but you can guarantee that your error messages will. These days, while I don’t have much time to work on stuff myself I’m very happy to mentor others working on improving diagnostics in Rust.

Only recently did I realize why I care about them so much – they cater exactly to my approach to learning programming languages! If I’m not going to read the docs when I get started and try the reactive approach, having help from the compiler is invaluable.

I think this space is relatively unexplored. Elm might have the best diagnostics out there, and as diagnostics (helping all users of a language – new and experienced), they’re great, but as a teaching tool for newcomers they still have a long way to go. Of course, compilers like Rust are even further behind.

One thing I’d like to experiment with is a first-class tool for reactive teaching. In a sense, clippy is already something like this. Clippy looks out for antipatterns, and tries to help teach. But it also does many other things, and not all teaching moments are antipatterns.

For example, in C, this isn’t necessarily an antipattern:

struct thingy *result;
if (result = do_the_thing()) {
    frob(*result);
}

Many C codebases use if (foo = bar()). It is a potential footgun if you confuse it with ==, but there’s no way to be sure. Many compilers now have a warning for this that you can silence by doubling the parentheses, though.

In Rust, this isn’t an antipattern either:

fn add_one(mut x: u8) {
    x += 1;
}

let num = 0;
add_one(num);
// num is still 0

For someone new to Rust, they may feel that the way to have a function mutate arguments (like num) passed to it is to use something like mut x: u8. What this actually does is copies num (because u8 is a Copy type), and allows you to mutate the copy within the scope of the function. The right way to make a function that mutates arguments passed to it by-reference would be to do something like fn add_one(x: &mut u8). If you try the mut x thing for non-Copy values, you’d get a “use of moved value” error when you try to access num after calling add_one. This would help you figure out what you did wrong, and potentially that error could detect this situation and provide more specific help.

But for Copy types, this will just compile. And it’s not an antipattern – the way this works makes complete sense in the context of how Rust variables work, and is something that you do need to use at times.

So we can’t even warn on this. Perhaps in “pedantic clippy” mode, but really, it’s not a pattern we want to discourage. (At least in the C example that pattern is one that many people prefer to forbid from their codebase)
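
For reference, the by-reference version described above looks like this (same snippet style as before):

fn add_one(x: &mut u8) {
    *x += 1;
}

let mut num = 0;
add_one(&mut num);
// num is now 1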

But it would be nice if we could tell a learning programmer “hey, btw, this is what this syntax means, are you sure you want to do this?”. With explanations and the ability to dismiss the error.

In fact, you don’t even need to restrict this to potential footguns!

You can detect various things the learner is trying to do. Are they probably mixing up String and &str? Help them! Are they writing a trait? Give a little tooltip explaining the feature.

This is beginning to remind me of the original “office assistant” Clippy, which was super annoying. But an opt-in tool or IDE feature which gives helpful suggestions could still be nice, especially if you can strike a balance between being so dense it is annoying and so sparse it is useless.

It also reminds me of well-designed tutorial modes in games. Some games have a tutorial mode that guides you through a set path of doing things. Other games, however, have a tutorial mode that will give you hints even if you stray off the beaten path. Michael tells me that Prey is a recent example of such a game.

This really feels like it fits the “reactive” model I prefer. The student gets to mold their own journey, but gets enough helpful hints and nudges from the “teacher” (the tool) so that they don’t end up wasting too much time and can make informed decisions on how to proceed learning.

Now, rust-clippy isn’t exactly the place for this kind of tool. This tool needs the ability to globally “silence” a hint once you’ve learned it. rust-clippy is a linter, and while you can silence lints in your code, you can’t silence them globally for the current user. Nor does that really make sense.

But rust-clippy does have the infrastructure for writing stuff like this, so it’s an ideal prototyping point. I’ve filed this issue to discuss this topic.

Ultimately, I’d love to see this as an IDE feature.

I’d also like to see more experimentation in the department of “reactive” teaching — not just tools like this.

Thoughts? Ideas? Let me know!

thanks to Andre (llogiq) and Michael Gattozzi for reviewing this


  1. This is how I’m using these terms. There seems to be precedent in pedagogy for the proactive/reactive classification, but it might not be exactly the same as the way I’m using it.

  2. This is true for everything, but I’m focusing on programming (in particular programming languages) here.

  3. And when I learned Rust, it only had two pages of docs, aka “The Tutorial”. Good times.

  4. I do eventually get around to doing a full read of the docs or a book but this is after I’m already able to write nontrivial things in the language, and it takes a lot of time to get there.

Mentally Modelling Modules

The module and import system in Rust is sadly one of the many confusing things you have to deal with whilst learning the language. A lot of these confusions stem from a misunderstanding of how it works. In explaining this I’ve seen that it’s usually a common set of misunderstandings.

In the spirit of “You’re doing it wrong”, I want to try and explain one “right” way of looking at it. You can go pretty far1 without knowing this, but it’s useful and helps avoid confusion.



First off, just to get this out of the way, mod foo; is basically a way of saying “look for foo.rs or foo/mod.rs and make a module named foo with its contents”. It’s the same as mod foo { ... } except the contents are in a different file. This itself can be confusing at first, but it’s not what I wish to focus on here. The Rust book explains this more in the chapter on modules.

In the examples here I will just be using mod foo { ... } since multi-file examples are annoying, but keep in mind that the stuff here applies equally to multi-file crates.

Motivating examples

To start off, I’m going to provide some examples of Rust code which compiles. Some of these may be counterintuitive, based on your existing model.

pub mod foo {
    extern crate regex;

    mod bar {
        use foo::regex::Regex;
    }
}

(playpen)

use std::mem;


pub mod foo {
    // not std::mem::transmute!
    use mem::transmute;

    pub mod bar {
        use foo::transmute;
    }
}

(playpen)

pub mod foo {
    use bar;
    use bar::bar_inner;

    fn foo() {
        // this works!
        bar_inner();
        bar::bar_inner();
        // this doesn't
        // baz::baz_inner();

        // but these do!
        ::baz::baz_inner();
        super::baz::baz_inner();

        // these do too!
        ::bar::bar_inner();
        super::bar::bar_inner();
        self::bar::bar_inner();

    }
}

pub mod bar {
    pub fn bar_inner() {}
}
pub mod baz {
    pub fn baz_inner() {}
}

(playpen)

pub mod foo {
    use bar::baz;
    // this won't work
    // use baz::inner();

    // this will
    use self::baz::inner;
    // or
    // use bar::baz::inner

    pub fn foo() {
        // but this will work!
        baz::inner();
    }
}

pub mod bar {
    pub mod baz {
        pub fn inner() {}
    }
}

(playpen)

These examples remind me of the “point at infinity” in elliptic curve crypto or fake particles in physics or fake lattice elements in various fields of CS2. Sometimes, for something to make sense, you add in things that don’t normally exist. Similarly, these examples may contain code which is not traditional Rust style, but the import system still makes more sense when you include them.

Imports

The core confusion behind how imports work can really be resolved by remembering two rules:

  • use foo::bar::baz resolves foo relative to the root module (lib.rs or main.rs)
    • You can resolve relative to the current module by explicitly writing use self::foo::bar::baz
  • foo::bar::baz within your code3 resolves foo relative to the current module
    • You can resolve relative to the root by explicitly using ::foo::bar::baz

That’s actually … it. There are no further caveats. The rest of this is modelling what counts as “being within a module”.

Let’s take a pretty standard setup, where extern crate declarations are placed in the root module:

extern crate regex;

mod foo {
    use regex::Regex;

    fn foo() {
        // won't work
        // let ex = regex::Regex::new("");
        let ex = Regex::new("");
    }
}

When we say extern crate regex, we pull in the regex crate into the crate root. This behaves pretty similar to mod regex { /* contents of regex crate */}. Basically, we’ve imported the crate into the crate root, and since all use paths are relative to the crate root, use regex::Regex works fine inside the module.

Inline in code, regex::Regex won’t work because as mentioned before inline paths are relative to the current module. However, you can try ::regex::Regex::new("").

Since we’ve imported regex::Regex in mod foo, that name is now accessible to everything inside the module directly, so the code can just say Regex::new().

The way you can view this is that use blah and extern crate blah create an item named blah “within the module”, which is basically something like a symbolic link, saying “yes this item named blah is actually elsewhere but we’ll pretend it’s within the module”

The error message from this code may further drive this home:

use foo::replace;

pub mod foo {
    use std::mem::replace;
}

(playpen)

The error I get is

error: function `replace` is private
 --> src/main.rs:3:5
  |
3 | use foo::replace;
  |     ^^^^^^^^^^^^

There’s no function named replace in the module foo! But the compiler seems to think there is?

That’s because use std::mem::replace basically is equivalent to there being something like:

pub mod foo {
    fn replace(...) -> ... {
        ...
    }

    // here we can refer to `replace` freely (in inline paths)
    fn whatever() {
        // ...
        let something = replace(blah);
        // ...
    }
}

except it’s actually like a symlink to the function defined in std::mem. Because inline paths are relative to the current module, saying use std::mem::replace works as if you had defined a function replace in the same module, and you can refer to replace() without needing any extra qualification in inline paths.

This also makes pub use fit perfectly in our model. pub use says “make this symlink, but let others see it too”:

// works now!
use foo::replace;

pub mod foo {
    pub use std::mem::replace;
}


Folks often get annoyed when this doesn’t work:

mod foo {
    use std::mem;
    // nope
    // use mem::replace;
}

As mentioned before, use paths are relative to the root module. There is no mem in the root module, so this won’t work. We can make it work via self, which I mentioned before:

mod foo {
    use std::mem;
    // yep!
    use self::mem::replace;
}

Note that this brings overloading of the self keyword up to a grand total of four! Two cases which occur in the import/path system:

  • use self::foo means “find me foo within the current module”
  • use foo::bar::{self, baz} is equivalent to use foo::bar; use foo::bar::baz;
  • fn foo(&self) lets you define methods and specify if the receiver is by-move, borrowed, mutably borrowed, or other
  • Self within implementations lets you refer to the type being implemented on

Oh well, at least it’s not static.
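
As a quick illustration of the second import-related use of self above, here’s a small made-up example:

mod foo {
    pub mod bar {
        pub fn baz() {}
    }
}

// equivalent to `use foo::bar; use foo::bar::baz;`
use foo::bar::{self, baz};

fn main() {
    bar::baz(); // via the module import
    baz();      // via the direct import
}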




Going back to one of the examples I gave at the beginning:

use std::mem;


pub mod foo {
    use mem::transmute;

    pub mod bar {
        use foo::transmute;
    }
}

(playpen)

It should be clearer now why this works. The root module imports mem. Now, from everyone’s point of view, there’s an item called mem in the root.

Within mod foo, use mem::transmute works because use is relative to the root, and mem already exists in the root! When you use something, all child modules will see it as if it were actually belonging to the module. (Non-child modules won’t see it because of privacy, we saw an example of this already)

This is why use foo::transmute works from mod bar, too. bar can refer to the contents of foo via use foo::whatever, since foo is a child of the root module, and use is relative to the root. foo already has an item named transmute inside it because it imported one. Nothing in the parent module is private from the child, so we can use foo::transmute from bar.

Generally, the standard way of doing things is to either not use modules (just a single lib.rs), or, if you do use modules, put nothing other than extern crates and mods in the root. This is why we rarely see shenanigans like the above; there’s nothing in the crate root to import, aside from other crates specified by extern crate. The trick of “reimport something from the parent module” is also pretty rare because there’s basically no point to using that (just import it directly!). So this is not the kind of code you’ll see in the wild.



Basically, the way the import system works can be summed up as:

  • extern crate and use will act as if they were defining the imported item in the current module, like a symbolic link
  • use foo::bar::baz resolves the path relative to the root module
  • foo::bar::baz in an inline path (i.e. not in a use) will resolve relative to the current module
  • ::foo::bar::baz will always resolve relative to the root module
  • self::foo::bar::baz will always resolve relative to the current module
  • super::foo::bar::baz will always resolve relative to the parent module

Alright, on to the other half of this. Privacy.

Privacy

So how does privacy work?

Privacy, too, follows some basic rules:

  • If you can access a module, you can access all of its pub contents
  • A module can always access its child modules, but not recursively
    • This means that a module cannot access private items in its children, nor can it access private grandchildren modules
  • A child can always access its parent modules (and their parents), and all their contents
  • pub(restricted) is a proposal which extends this a bit, but it’s experimental so we won’t deal with it here

Giving some examples,

mod foo {
    mod bar {
        // can access `foo::foofunc`, even though `foofunc` is private

        pub fn barfunc() {}

    }
    // can access `foo::bar::barfunc()`, even though `bar` is private
    fn foofunc() {}
}
mod foo {
    mod bar {
        // We can access our parent and _all_ its contents,
        // so we have access to `foo::baz`. We can access
        // all pub contents of modules we have access to, so we
        // can access `foo::baz::bazfunc`
        use foo::baz::bazfunc;
    }
    mod baz {
        pub fn bazfunc() {}
    }
}

It’s important to note that this is all contextual; whether or not a particular path works is a function of where you are. For example, this works4:

pub mod foo {
    /* not pub */ mod bar {
        pub mod baz {
            pub fn bazfunc() {}
        }
        pub mod quux {
            use foo::bar::baz::bazfunc;
        }
    }
}

We are able to write the path foo::bar::baz::bazfunc even though bar is private!

This is because we still have access to the module bar, by being a descendent module.



Hopefully this is helpful to some of you. I’m not really sure how this can fit into the official docs, but if you have ideas, feel free to adapt it5!


  1. This is because most of these misunderstandings lead to a model where you think fewer things compile, which is fine as long as it isn’t too restrictive. Having a mental model where you feel more things will compile than actually do is what leads to frustration; the opposite can just be restrictive.

  2. One example closer to home is how Rust does lifetime resolution. Lifetimes form a lattice with 'static being the bottom element. There is no top element for lifetimes in Rust syntax, but internally there is the “empty lifetime” which is used during borrow checking. If something resolves to have an empty lifetime, it can’t exist, so we get a lifetime error.

  3. When I say “within your code”, I mean “anywhere but a use statement”. I may also term these as “inline paths”.

  4. Example adapted from this discussion

  5. Contact me if you have licensing issues; I still have to figure out the licensing situation for the blog, but am more than happy to grant exceptions for content being uplifted into official or semi-official docs.

Two Interpretations Diverged in a Yellow Wood

Whose words are these I think I know
His house is in the village though
He will not see me stopping here
To interpret his work as I go

My little student must think it queer
To read without some context near
Between the words and the intent
He wonders what the poem meant

He gives his head a little shake
To ask if there is some mistake
“That’s not what the author said!”
Providing another view instead

The words are lovely, dark, and deep
But I have literary criticism to preach
And miles to go before I sleep
And miles to go before I sleep







Seriously though, try reading The Road Not Taken as metacircular commentary on how the poem is very often “mis”interpreted, and the nature of interpretation / Death of the Author. It fits perfectly when you read “road” as “interpretation”.

(Yes, I know, the parody above is not based on The Road Not Taken but instead a different Frost poem. I was originally going to modify The Road Not Taken but realized all I had to do was change a few words to get there, which was no fun at all)

Prolonging Temporaries in Rust

A colleague of mine learning Rust had an interesting type / borrow checker error. The solution needs a less-used feature of Rust (which basically exists precisely for this kind of thing), so I thought I’d document it.

The code was like this:

let maybe_foo = if some_condition {
    thing.get_ref() // returns Option<&Foo>, borrowed from `thing`
} else {
    thing.get_owned() // returns Option<Foo>
};

use(maybe_foo);

If you want to follow along, here is a full program that does this (playpen):

#[derive(Debug)]
struct Foo;

struct Thingy {
    foo: Foo
}

impl Thingy {
    pub fn get_ref(&self) -> Option<&Foo> {
        Some(&self.foo)
    }
    pub fn get_owned(&self) -> Option<Foo> {
        Some(Foo)
    }
    pub fn new() -> Self {
        Thingy {
            foo: Foo
        }
    }
}



pub fn main() {
    let some_condition = true;
    let thing = Thingy::new();

    let maybe_foo = if some_condition {
        thing.get_ref() // returns Option<&Foo>, borrowed from `thing`
    } else {
        thing.get_owned() // returns Option<Foo>
    };

    println!("{:?}", maybe_foo);
}

I’m only going to be changing the contents of main() here.

What’s happening here is that a non-Copy type, Foo, is returned in an Option. In one case, we have a reference to the Foo, and in another case an owned copy.

We want to set a variable to these, but of course we can’t because they’re different types.

In one case, we have an owned Foo, and we can usually obtain a borrow from an owned type. For Option, there’s a convenience method .as_ref() that does this1. Let’s try using that (playpen):

let maybe_foo = if some_condition {
    thing.get_ref()
} else {
    thing.get_owned().as_ref()
};

This will give us an error.

error: borrowed value does not live long enough
  --> <anon>:32:5
   |
31 |         thing.get_owned().as_ref()
   |         ----------------- temporary value created here
32 |     };
   |     ^ temporary value dropped here while still borrowed
...
35 | }
   | - temporary value needs to live until here

error: aborting due to previous error

The problem is, thing.get_owned() returns an owned value. There’s nothing that it gets anchored to (we don’t set its value to a variable), so it is just a temporary – we can call methods on it, but once we’re done the value will go out of scope.

What we want is something like

let maybe_foo = if some_condition {
    thing.get_ref()
} else {
    let owned = thing.get_owned();
    owned.as_ref()
};

but this will still give a borrow error – owned will still go out of scope within the if block, and we need the reference to it to last as long as maybe_foo (outside the block) is supposed to last.

So this is no good.

An alternate solution here can be copying/cloning the Foo in the first case by calling .map(|x| x.clone()) or .cloned() or something. Sometimes you don’t want to clone, so this isn’t great.

Another solution here – the generic advice for dealing with values which may be owned or borrow – is to use Cow. It does incur a runtime check, though; one which can be optimized out if things are inlined enough.

What we need to do here is to extend the lifetime of the temporary returned by thing.get_owned(). We need to extend it past the scope of the if.

One way to do this is to have an Option outside that scope which we mutate (playpen).

let mut owned = None;
let maybe_foo = if some_condition {
    thing.get_ref()
} else {
    owned = thing.get_owned();
    owned.as_ref()
};

This works in this case, but in this case we already had an Option. If get_ref() and get_owned() returned &Foo and Foo respectively, then we’d need to do something like:

let mut owned = None;
let maybe_foo = if some_condition {
    thing.get_ref()
} else {
    owned = Some(thing.get_owned());
    owned.as_ref().unwrap()
};

which is icky since it introduces an unwrap.

What we really need is a way to signal to the compiler that it needs to hold on to that temporary for the scope of the enclosing block.

We can do that! (playpen)

let owned; // 😯😯😯😯😯
let maybe_foo = if some_condition {
    thing.get_ref()
} else {
    owned = thing.get_owned();
    owned.as_ref()
};

We know that Rust doesn’t do “uninitialized” variables. If you want to name a variable, you have to initialize it. let foo; feels rather like magic in this context, because it looks like we’ve declared an uninitialized variable.

What’s less well known is that Rust can do “deferred” initialization. Here, you declare a variable and can initialize it later, but expressions involving the variable can only exist in branches where the compiler knows it has been initialized.

This is the case here. We declared the owned variable beforehand. It now lives in the outer scope and won’t be destroyed until the end of the outer scope. However, the variable cannot be used directly in an expression in the first branch, or after the if. Doing so will give a compile time error saying use of possibly uninitialized variable: `owned`. We can only use it in the else branch because the compiler can see that it is unconditionally initialized in that branch.

We can still read the value of owned indirectly through maybe_foo from outside the branch. This is okay because the storage of owned is guaranteed to live as long as the outer scope, and maybe_foo borrows from it. The only time maybe_foo is set to a value inside owned is when owned has been initialized, so it is safe.
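
The same trick works in the non-Option variant mentioned earlier. Assuming hypothetical get_ref() and get_owned() methods that return &Foo and Foo directly, it would look like:

// Sketch: deferred initialization with hypothetical methods returning
// &Foo / Foo directly instead of Option<&Foo> / Option<Foo>.
let owned;
let foo_ref = if some_condition {
    thing.get_ref()             // &Foo, borrowed from `thing`
} else {
    owned = thing.get_owned();  // Foo, stored in the outer scope
    &owned                      // borrow it for as long as `owned` lives
};
println!("{:?}", foo_ref);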


  1. In my experience .as_ref() is the solution to many, many borrow check issues newcomers come across, especially those involving .map()

You’re Doing It Wrong

“You’re doing it wrong”

A common refrain in issue trackers and discussion forums everywhere. In isolation, it’s a variant of RTFM – give a non-answer when someone wants help, and bounce them back to a manual or docs which they probably have already read. Not very helpful, and not useful to anyone. Of course, one can accompany it with a nice explanation of how to do it right; “You’re doing it wrong” isn’t always a bad thing :)

Especially when it comes to programming languages, but in general in the context of any programming tool or library, “you’re doing it wrong” is almost always due to a “bad” mental model. The person, whilst learning, has built a mental model of how the tool works, but this doesn’t accurately reflect reality. Other times, it does reflect reality, but it does not reflect the mental model of the maintainers (there can be multiple valid ways of looking at something!), which leads to an impedance mismatch when reading docs or error messages.

In other cases, “doing it wrong” is a case of the XY problem, where the user has problem X, thinks they can solve it with solution Y, and ends up asking how to achieve Y. This happens pretty often — folks may be approaching your technology with prior experience with related things that work differently, and may think the same idioms apply.

When I was at WONTFIX, someone who had done support work in the past mentioned that one thing everyone learns in support is “the user is always wrong …. and it’s not their fault!”.

This is a pretty good template for an attitude to approach “doing it wrong” questions about your technology on online forums as well. And this doesn’t just benefit the users who ask questions, this attitude can benefit your technology!

Back when I used to be more active contributing to the Rust compiler, I also used to hang out in #rust a lot, and often answer newbie questions (now #rust-beginners exists too, and I hang out in both, but I don’t really actively participate as much). One thing I learned to do was probe deeper into why people hit that confusion in the first place. It’s almost always a “bad” mental model. Rust is rarely the first programming language folks learn, and people approach it with preconceptions about how programming works. This isn’t unique to Rust, this happens any time someone learns a language with a different paradigm — learning C or C++ after doing a GCd language, learning a functional language after an imperative one, statically typed after dynamic, or one of the many other axes by which programming languages differ.

Other times, it’s just assumptions they made when reading between the lines of whatever resource they used to learn the language.

So, anyway, folks often have a “bad” mental model. If we are able to identify that model and correct it, we have saved that person from potentially getting confused at every step in the future. Great!

With a tiny bit more effort, however, we can do one step better. Not for that person, but for ourselves! We can probe a bit more and try to understand what caused them to obtain that mental model. And fix the docs so that it never happens again! Of course, not everyone reads the docs, but that’s what diagnostics are for (in the case of errors). They’re a tool to help us nudge the user towards the right mental model, whilst helping them fix their immediate problem. Rust has for a long time had pretty great diagnostics, with improvements happening all the time1. I think this is at least in part due to the attitude of the folks in #rust; always trying to figure out how to preempt various confusions they see.

It’s a good attitude to have. I hope more folks, both in and out of the Rust community, approach “You’re doing it wrong” cases like that.


  1. Diagnostics issues are often the easiest way to contribute to the compiler itself, so if you want to contribute, I suggest starting there. Willing to mentor!

I Never Hear the Phrase ‘INHTPAMA’ Anymore

Imagine never hearing the phrase ‘INHTPAMA’ again.

Oh, that’s already the case? Bummer.

Often, when talking about Rust, folks refer to the core aliasing rule as “that &mut thing”, “compile-time RWLock” (or “compile-time RefCell”), or something similar. Basically, referring to the fact that you can’t mutate the data that is currently held via an & reference, and that you can’t mutate or read the data currently held via an &mut reference except through that reference itself.

It’s always bugged me that we really don’t have a name for this thing. It’s one of the core bits of Rust, and crops up often in discussions.

But we did have a name for it! It was “INHTPAMA” (which was later butchered into “INHTWAMA”).

This is a reference to Niko’s 2012 blog post, titled “Imagine Never Hearing The Phrase ‘aliasable, mutable’ again”. It’s where the aliasing rules came from. Go read it, it’s great. It talks about this weird language with at symbols and purity, but I assure you, that language is Baby Rust. Or maybe Teenage Rust. The lifecycle of rusts is complex and interesting and I don’t know how to categorize it.

The point of this post isn’t really to encourage reviving the use of “INHTWAMA”; it’s a rather weird acronym that will probably confuse folks. I would like to have a better way of referring to “that &mut thing”, but I’d prefer if it wasn’t a confusing acronym that carries no meaning of its own if you don’t know the history of it. That’s a recipe for making new community members feel like outsiders.

But that post is amazing and I’d hate to see it drop out of the collective memory of the Rust community.

Use Signal. Use Tor.

I went to send a missive today
As I have done so oft before
But I forgot to employ that scrap of advice
“Use Signal. Use Tor.”

Intercepted of course the missive was
By a ferocious beast of lore
Because I failed to use that bit of advice
“Use Signal. Use Tor.”

The beast was strong; and formidable
He hated the amendments four
I should have remembered that piece of advice
“Use Signal. Use Tor.”

I tried to reason with the beast
but he only wanted war
Do not neglect that important advice
“Use Signal. Use Tor.”

Here I lie in the belly of the beast
I shall discount this advice no more
If I ever manage to leave this place
I’ll use Signal, and Tor.

Heed this advice, children.
It’s not something to ignore
Always, always, always, always
Use Signal. Use Tor.

Why Quantum Computing Is Weird

I’ve been meaning to write about physics for a while. When I started this blog the intention was to write about a wide variety of interests, but I ended up focusing on programming, despite the fact that I was doing more physics than programming for most of the lifetime of this blog. Time to change that, and hopefully write about other non-programming topics too.

Quantum Computing. It’s the new hip thing that’s going to change the world1. Someday.

In its essence, where classical computing deals with “bits”, which are on/off states, quantum computing deals with “qubits”, which are probabilistic quantum states that are often a mixture of on and off. These have interesting properties which make certain kinds of so-far-hard computation very easy to perform.

The goal of this post is not to teach quantum computing, but rather to garner interest. I come to praise quantum computing, not bury it2. As a result, this post doesn’t require a background in physics. Having worked with very simple logic circuits is probably enough, though you may not even need that.

I’m basically going to sketch out an example of a very simple quantum algorithm. One that’s very logic-defying. It’s even logic-defying for many who have studied quantum mechanics; it certainly was for me. When I first learned it, I could understand why it worked, but there was a lot of dissonance between that understanding and my intuitive conviction that it was wrong.

The algorithm

This is a quantum circuit (specifically, the circuit for the Deutsch-Jozsa algorithm). It’s used to find out the nature of a black-box function f(x), which takes in one qubit and outputs another3. For now, you can try to interpret this circuit as if it were a regular logic circuit. You’ll soon see that this interpretation is wrong, but it’s useful for the purposes of this explanation.

To run this algorithm, you first construct an “oracle” out of the black-box function. The oracle, given inputs x and y, has outputs x and y ⊕ f(x) (where ⊕ is the symbol for XOR, the “exclusive OR”).

As with logic circuits, data flow here goes from left to right. This circuit has two constant inputs, a zero and a one. This is similar to how we might have constant “true” and “false” inputs to logic circuits.

They are then passed through “Hadamard gates”. These are like NOT gates, in that applying them twice is a no-op (they are their own inverse), but they’re not actually NOT gates. I like to describe them as “sideways NOT gates” since that description somewhat intuitively captures what’s going on with the qubits. What’s important to note here is that they have one input and one output, so they’re unaffected by the goings-on in a different wire.
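
For those who want a peek at the math (you don’t need it for the rest of this post), here’s a quick sketch of the standard definition of the Hadamard gate as a matrix; you can check from it that applying the gate twice is the identity:

H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad H\lvert 0\rangle = \frac{\lvert 0\rangle + \lvert 1\rangle}{\sqrt{2}}, \qquad H\lvert 1\rangle = \frac{\lvert 0\rangle - \lvert 1\rangle}{\sqrt{2}}, \qquad H^2 = I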

Once these inputs have been Hadamard’ed, they are fed to the oracle we constructed. The top input goes on to become the top output. It’s also passed through f(x) and XORd with the bottom input to make the bottom output.

The top output is then Hadamard’ed again, and finally we observe its value.

Here’s where the magic comes in. By observing the top output, we will know the nature of f(x)4.

Wait, what? The top output doesn’t appear to have any interaction with f(x) at all! How can that work?

In fact, we could try to rewrite this circuit such that the measured output definitely has no interaction with f(x) whatsoever, assuming that the Hadamard gate isn’t doing anything funky5 (it isn’t):

How in the world does this work?

Why it works

Sadly, I can’t give a satisfying explanation to exactly why this works. This requires some quantum mechanics background6 to grasp.

However, I can give a hopefully-satisfying explanation as to why our regular intuition doesn’t work here.

First and foremost: The rewritten circuit I showed above? It’s wrong. If this was a logic circuit, we could always do that, but in quantum computing, T-junctions like the following can’t exist:

This is due to the “No Cloning theorem”. Unlike regular logic circuits, you can’t just “duplicate” a qubit. In some cases (like this one), you can try to create a similar qubit via the same process (e.g. here we could take another 0 and pass it through a Hadamard gate), but it’s not the “same” qubit. Unlike bits, qubits have a stronger notion of unique identity.

And it’s this sense of identity that fuels this algorithm (and most of quantum computing).

You see, while the top output of the oracle was x, it wasn’t exactly the same x. This x had been mixed with the lower output. This means that the upper and lower outputs are now entangled, with their state depending on each other. In fact, it’s really misleading to show the output as two wires in the first place – it’s really a single “entangled” state of two qubits that can’t be decomposed as a “top half” and a “bottom half”. Of course, this way of representing quantum circuits is still used because it’s a tidy way of visualizing these circuits, and physicists are aware of the caveats involved.

So what happens is that when you observe the top output, you are really doing a partial observation on the combined state of the two outputs, and this includes some information about f(x), which leaks out when you perform the observation.

These properties of qubits make quantum circuits work significantly differently from regular logic ones. On one hand, this severely restricts what you can do with them, but at the same time, new avenues of erstwhile-impossible operations open up. Most useful quantum algorithms (like Shor’s factorization algorithm) involve a mixture of a classical algorithm and a quantum circuit due to this reason. It’s pretty cool!


  1. What isn’t?

  2. The abstruseness of physics lives after it; the coolness is oft interred with its bones.

  3. This actually can be generalized to a function with n input and n output qubits, and the circuit stays mostly the same, except the top “x” line becomes n lines all initialized to 0 and passing through n parallel H gates.

  4. Specifically, if the observation is 0, the function is constant, whereas if the observation is 1, the function is “balanced” (gives a different output for inputs 1 and 0)

  5. For Hadamard is an honorable gate. So are they all, all honorable gates.

  6. If you do have this background, it’s relatively straightforward; the Wikipedia page has the equations for it.

Understanding Git Filter-branch and the Git Storage Model

The other day Steve wanted git alchemy done on the Rust repo.

Specifically, he wanted the reference and nomicon moved out into their own repositories, preserving history. Both situations had some interesting quirks: the reference has lived in src/doc/reference/* and src/doc/reference.md, and the nomicon has lived in src/doc/nomicon, src/doc/tarpl, and at the top level in a separate git root.

As you can guess from the title of this post, the correct tool for this job is git filter-branch. My colleague Greg calls it “the swiss-army knife of Git history rewriting”.

I had some fun with filter-branch that day, and thought I’d finally write an accessible tutorial for it. A lot of folks treat filter-branch like rebase, but it isn’t, and this crucial difference can lead to many false starts. It certainly did for me back when I first learned it.

This kind of ties into the common bit of pedantry about the nature of a commit I keep seeing pop up:

Git commits appear to be diffs, but they’re actually file copies, but they’re actually ACTUALLY diffs.

So what is a git commit?

Generally we interact with git commits via git show or by looking at commits in a git GUI / web UI. Here, we see diffs. It’s natural to think of a commit as a diff; it’s the model that makes the most sense for the most common ways of interacting with commits. It also makes some sense from an implementation point of view: diffs seem like an efficient way of storing things.

It turns out that the “real” model is not this; it’s actually that each commit is a snapshot of the whole repo state at that point in time.

But actually, it isn’t quite that either: the underlying implementation does make use of deltas in packfiles, and some other tricks like copy-on-write forking.

Ultimately, arguing about the “real” mental model is mostly pedantry. There are multiple ways of looking at a commit. The documentation tends to implicitly think of them as “full copies of the entire file tree”, which is where most of the confusion about filter-branch comes from. But often it’s important to picture them as diffs, too.

Understanding the implementation can be helpful, especially when you break the repository whilst doing crazy things (I do this often). I’ve explained how it works in a later section; it’s not really a prerequisite for understanding filter-branch, but it’s interesting.

How do I rewrite history with git rebase?

This is where some of the confusion around filter-branch stems from. Folks have worked with rebase, and they think filter-branch is a generalized version of this. They’re actually quite different.

For those of you who haven’t worked with git rebase, it’s a pretty useful way of rewriting history, and is probably what you should use when you want to rewrite history, especially for maintaining clean git history in an unmerged under-review branch.

Rebase does a whole bunch of things. Its core task is this: given the current branch and a branch that you want to “rebase onto”, it takes all commits unique to your branch and applies them in order on top of the other one. Here, “apply” means “apply the diff of the commit, attempting to resolve any conflicts”. At times, it may ask you to manually resolve the conflicts, using the same tooling you use for conflicts during git merge.
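
For example, here’s a minimal sketch with hypothetical branch names: check out the feature branch, and rebase replays its unique commits on top of the current master.

$ git checkout my-feature   # hypothetical branch name
$ git rebase master         # replay my-feature's unique commits on top of master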

Rebase is much more powerful than that, though. git rebase -i will open up “interactive rebase”, which will show you the commits that are going to be rebased. In this interface, you can reorder commits, mark them for edits (wherein the rebase will stop at that commit and let you git commit --amend changes into it), and even “squash” commits which lets you mark a commit to be absorbed into the previous one. This is rather useful for when you’re working on a feature and want to keep your commits neat, but also want to make fixup patches to older commits. Filippo’s git fixup alias packages this particular task into a single git command. Changing EDITOR=true into EDITOR=: GIT_SEQUENCE_EDITOR=: will make it not even open the editor for confirmation and try to do the whole thing automatically.
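
If you don’t have an alias like that set up, git’s built-in fixup/autosquash machinery does roughly the same job. A sketch, with a placeholder commit hash:

$ git commit --fixup abc1234            # records a "fixup! ..." commit targeting abc1234
$ git rebase -i --autosquash abc1234^   # the interactive rebase then squashes it into place automatically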

git rebase -x some_command is also pretty neat; it lets you run a shell command on each step during a rebase.
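
For example, a sketch of using it to check that every rewritten commit still passes the tests (substitute whatever command your project uses):

$ git rebase -x "cargo test" master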

In this model, you are fundamentally thinking of commits as diffs. When you move around commits in the interactive rebase editor, you’re moving around diffs. When you mark things for squashing, you’re basically merging diffs. The whole process is about taking a set of diffs and applying them to a different “base commit”.

How do I rewrite history with git filter-branch?

filter-branch does not work with diffs. You’re working with the “snapshot” model of commits here, where each commit is a snapshot of the tree, and rewriting these commits.

What git filter-branch will do is for each commit in the specified branch, apply filters to the snapshot, and create a new commit. The new commit’s parent will be the filtered version of the old commit’s parent. So it creates a parallel commit DAG.

Because the filters apply on the snapshots instead of the diffs, there’s no chance for this to cause conflicts like in git rebase. In git rebase, if I have one commit that makes changes to a file, and I change the previous commit to just remove the area of the file that was changed, I’d have a conflict and git would ask me to figure out how the changes are supposed to be applied.

In git-filter-branch, if I do this, it will just power through. Unless you explicitly write your filters to refer to previous commits, the new commit is created in isolation, so it doesn’t worry about changes to the previous commits. If you had indeed edited the previous commit, the new commit will appear to undo those changes and apply its own on top of that.

filter-branch is generally for operations you want to apply pervasively to a repository. If you just want to tweak a few commits, it won’t work, since future commits will appear to undo your changes. git rebase is for when you want to tweak a few commits.

So, how do you use it?

The basic syntax is git filter-branch <filters> branch_name. You can use HEAD or @ to refer to the current branch instead of explicitly typing branch_name.

A very simple and useful filter is the subdirectory filter. It makes a given subdirectory the repository root. You use it via git filter-branch --subdirectory-filter name_of_subdir @. This is useful for extracting the history of a folder into its own repository.

Another useful filter is the tree filter; you can use it to do things like moving around, creating, or removing files. For example, if you want to move README.md to README in the entire history, you’d do something like git filter-branch --tree-filter 'mv README.md README' @ (you can also achieve this much faster with some manual work and rebase). The tree filter works by checking out each commit (in a separate temporary folder), running your filter on the working directory, adding any changes to the index (no need to git add yourself), and committing the new index.

The --prune-empty argument is useful here, as it removes commits which are now empty due to the rewrite.

Because it is checking out each commit, this filter is quite slow. When I initially was trying to do Steve’s task on the rust repo, I wrote a long tree filter and it was taking forever.

The faster version is the index filter. However, this is a bit trickier to work with (which is why I tend to use a tree filter if I can get away with it). What this does is operate directly on the index.

The “index” is basically where things go when you git add them. Running git add will create temporary objects for the added file, and modify the WIP index (directory tree) to include a reference to the new file or change an existing file reference to the new one. When you commit, this index is packaged up into a commit and stored as an object. (More on how these objects work in a later section)
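
You can peek at what the index holds with git ls-files -s. A quick sketch (the file name is made up and the blob hash below is a placeholder):

$ echo "hello" > newfile
$ git add newfile
$ git ls-files -s newfile
100644 <blob hash> 0	newfile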

Now, since this deals with files that are already stored as objects, git doesn’t need to unwrap these objects and create a working directory to operate on them. So, with --index-filter, you can operate on these in a much faster way. However, since you don’t have a working directory, stuff like adding and moving files can be trickier. You often have to use git update-index to make this work.

However, a useful index filter is one which just scrubs a file (or files) from history:

$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

The --ignore-unmatch makes the command still succeed if the file doesn’t exist. filter-branch will fail if one of the filters fails. In general I tend to write fallible filters like command1 1>/dev/null 2>/dev/null ; command2 1>/dev/null 2>/dev/null ; true, which makes the filter always succeed and also silences any stdout/stderr output (such output tends to make the progress screen fill up fast).

The --cached argument on git rm makes it operate only on the index, not the working directory. This is great, because we don’t have a working directory right now.

I rarely use git update-index so I’m not really going to try and explain how it can be used here. But if you need to do more complex operations in an index filter, that’s the way to go.

There are many other filters, like --commit-filter (which lets you rewrite or even discard a commit entirely), --msg-filter (for rewriting commit messages), and --env-filter (for changing things like author metadata or other environment variables). You can see a complete list, with examples, in the docs.
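
For example, here’s a sketch of an --env-filter that rewrites author metadata throughout history (the name and email addresses are made up):

$ git filter-branch -f --env-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
        export GIT_AUTHOR_NAME="New Name"
        export GIT_AUTHOR_EMAIL="new@example.com"
    fi' @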

How did I perform the rewrites on the reference and nomicon?

For the Rust Reference, I basically had to extract the history of src/doc/reference.md AND src/doc/reference/* (reference.md was split up into reference/*.md recently) into its own repository. This is an easy tree filter to write, but tree filters take forever.

Instead of trying my luck with an index filter, I decided to just make it so that the tree filter would be faster. I first extracted src/doc/:

$ git filter-branch -f --prune-empty --subdirectory-filter src/doc @

Now I had a branch that contained only the history of src/doc, with that directory as the new repository root. This is a much smaller repo than the entirety of Rust.

Now, I moved reference.md into reference/:

$ git filter-branch -f --prune-empty --tree-filter 'mkdir -p reference; mv reference.md reference 1>/dev/null 2>/dev/null; true' @

As mentioned before, the /dev/null and true bits are because the mv command will fail in some cases (when reference.md doesn’t exist), and I want it to just continue without complaining when that happens. I only care about moving instances of that file; if the file doesn’t exist in a given commit, that’s still okay.

Now, everything I cared about was within reference. The next step was simple:

$ git filter-branch -f --prune-empty --subdirectory-filter reference @

The whole process took maybe 10 minutes to run, most of the time being spent by the second command. The final result can be found here.

For the nomicon, the task was easier. The nomicon has always resided in src/doc/nomicon, src/doc/tarpl, or at the root. This last bit is interesting: when Alexis was working on the nomicon, he started off by hacking on it in a separate repo, then within that repo moved it to src/doc/tarpl, and finally performed a merge commit with rustc. There’s no inherent restriction in Git that all merges must have a common ancestor, so you can do stuff like this. I was quite surprised when I saw it, since it’s pretty uncommon in general, but really, many projects of that size have something like this. Servo and html5ever do too; it usually happens when a large component is merged in after being developed on the side.

This sounds complicated to work with, but it wasn’t that hard. I took the same subdirectory-filter’ed doc-directory branch I had used for the reference. Then I renamed tarpl/ to nomicon/ via a tree filter, and ran another subdirectory filter:

$ git filter-branch -f --prune-empty --tree-filter 'mv tarpl nomicon 1>/dev/null 2>/dev/null; true' @
$ git filter-branch -f --prune-empty --subdirectory-filter nomicon @

Now, I had the whole history of the nomicon in the root dir. Except for the commits made by Alexis before his frankenmerge, because these got removed in the first subdirectory filter (the commits were operating outside of src/doc, even though their contents eventually got moved there).

But, at this stage, I already had a branch with the nomicon at the root. Alexis’ original commits were also operating on the root directory. I can just rebase here, and the diffs of my commits will cleanly apply!

I found the commit (a54e64) where everything was moved to tarpl/, and took its parent (c7919f). Then I just ran git rebase --root c7919f, and everything rebased cleanly. This worked as expected: my branch’s history went back to the first child of a54e64 (with the files already moved), and a54e64 itself only moved files, so the diffs applied cleanly.

The final result can be found here.

Appendix: How are commits actually stored?

The way the actual implementation of a commit works is that each file being stored is hashed and stored in a compressed format, indexed by the hash. A directory (“tree”) will be a list of hashes, one for each file/directory inside it, alongside the filenames and other metadata. This list will be hashed and used everywhere else to refer to the directory.

A commit will reference the “tree” object for the root directory via its hash.

Now, if you make a commit changing some files, most of the files will be unchanged. So will most of the directories. So the commits can share the objects for the unchanged files/directories, reducing their size. This is basically a copy-on-write model. Furthermore, there’s a second optimization called a “packfile”, wherein instead of storing a file git will store a delta (a diff) and a reference to the file the diff must be applied to.

We can see this at work using git cat-file. cat-file lets you view objects in the “git filesystem”, which is basically a bunch of hash-indexed objects stored in .git/objects. You can view them directly by traversing that directory (they’re organized as a trie), but cat-file -p will let you pretty-print their contents since they’re stored in a binary format.

I’m working with the repo for the Rust Book, playing with commit 4822f2. It’s a commit that changes just one file (second-edition/src/ch15-01-box.md), perfect.

$ git show 4822f2baa69c849e4fa3b85204f219a16bde2f42
commit 4822f2baa69c849e4fa3b85204f219a16bde2f42
Author: Jake Goulding <...>
Date:   Fri Mar 3 14:07:24 2017 -0500

    Reorder sentence about a generic cons list.

diff --git a/second-edition/src/ch15-01-box.md b/second-edition/src/ch15-01-box.md
index 14c5533..29d8793 100644
--- a/second-edition/src/ch15-01-box.md
+++ b/second-edition/src/ch15-01-box.md
(diff omitted)

$ git cat-file -p 4822f2baa69c849e4fa3b85204f219a16bde2f42

tree ec7cd2821d4bcbafe08f3eca6ea60487bfdc1b52
parent 24cd100e061bb11c3f7f3219467d6d644c50d811
author Jake Goulding <...> 1488568044 -0500
committer GitHub <noreply@github.com> 1488568044 -0500

Reorder sentence about a generic cons list.

This tells us that the commit is a thing with some author information, a pointer to a parent, a commit message, and a “tree”. What’s this tree?

$ git cat-file -p ec7cd2821d4bcbafe08f3eca6ea60487bfdc1b52
100644 blob 4cab1f4d267628ab5f4f7c14b1b64a9d4b032409    .gitattributes
040000 tree e1dcc1c754d72450b03542b2106fcb67c78805ff    .github
100644 blob 4c699f440ac134c577cb6f67b04ec5b93c652440    .gitignore
100644 blob e86d887d84a839417c960faf877c9057a8dc6823    .travis.yml
100644 blob 7990f2738876fc0fbc2ca30f5f91e91745b0b8eb    README.md
040000 tree 17b33cb52a5abb67ff678a03e7ed88cf9f163c69    ci
040000 tree 0ffd2c1238345c1b0e99af6c1c618eee4a0bab58    first-edition
100644 blob 5d1d2bb79e1521b28dd1b8ff67f9b04f38d83620    index.md
040000 tree b7160f7d05d5b5bfe28bad029b1b490e310cff22    redirects
040000 tree d5672dd9ef15adcd1527813df757847d745e299a    second-edition

This is just a directory! You can see that each entry has a hash. We can use git cat-file -p to view each one. Looking at a tree object will just give us a subdirectory, but the blobs will show us actual files!

$ git cat-file -p 7990f2738876fc0fbc2ca30f5f91e91745b0b8eb # Show README
# The Rust Programming Language

[![Build Status](https://travis-ci.org/rust-lang/book.svg?branch=master)](https://travis-ci.org/rust-lang/book)

To read this book online, visit [rust-lang.github.io/book/][html].

(rest of file omitted)

So how does this share objects? Let’s look at the previous commit:

$ git cat-file -p 4822f2baa69c849e4fa3b85204f219a16bde2f42^ # `^` means "parent"
tree d219be3c5010f64960ddb609a849fc42a01ad31b
parent 21c063868f9d7fb0fa488b6f1124262f055d275b
author steveklabnik <...> 1488567224 -0500
committer steveklabnik <...> 1488567239 -0500

mdbook needs to be on the PATH for deploy

$ git cat-file -p d219be3c5010f64960ddb609a849fc42a01ad31b # the tree
100644 blob 4cab1f4d267628ab5f4f7c14b1b64a9d4b032409    .gitattributes
040000 tree e1dcc1c754d72450b03542b2106fcb67c78805ff    .github
100644 blob 4c699f440ac134c577cb6f67b04ec5b93c652440    .gitignore
100644 blob e86d887d84a839417c960faf877c9057a8dc6823    .travis.yml
100644 blob 7990f2738876fc0fbc2ca30f5f91e91745b0b8eb    README.md
040000 tree 17b33cb52a5abb67ff678a03e7ed88cf9f163c69    ci
040000 tree 0ffd2c1238345c1b0e99af6c1c618eee4a0bab58    first-edition
100644 blob 5d1d2bb79e1521b28dd1b8ff67f9b04f38d83620    index.md
040000 tree b7160f7d05d5b5bfe28bad029b1b490e310cff22    redirects
040000 tree d48b2e06970cf3a6ae65655c340922ae69723989    second-edition

If you look closely, all of these hashes are the same, except for the hash for second-edition. For the hashes which are the same, these objects are being shared across commits. The differing hash is d5672d in the newer commit, and d48b2e in the older one.

Let’s look at the objects:

$ git cat-file -p d5672d
100644 blob 82dc67a6b08f0eb62420e4da3b3aa9c0dc10911a    CONTRIBUTING.md
100644 blob 5cd51aa43f05416996c4ef055df5d6eb58fbe737    Cargo.lock
100644 blob 7ab2575fa5bf4abf6eaf767c72347580c9f769dd    Cargo.toml
100644 blob 96e9f0458b55a4047927de5bf04ceda89d772b2b    LICENSE-APACHE
100644 blob 5a56e6e8ed1909b4e4800aa8d2a0e7033ab4babe    LICENSE-MIT
100644 blob be1135fc6d28eca53959c7fc9ae191523e4bc96f    book.json
100644 blob 1400454f36840e916a7d7028d987c42fcb31b4db    dictionary.txt
100644 blob 5103c84d034d6e8a0e4b6090453ad2cdcde21537    doc-to-md.sh
040000 tree 6715d1d4c97e3d17a088922f687b8d9ffacb5953    dot
100644 blob f9e045c4c1824520534270a2643ebe68311503b8    nostarch.sh
040000 tree f8d9a9452b4bbaeba256b95d40b303cd5fb20a64    nostarch
100644 blob 0a2d16852c11355ef9d8758a304b812633dcf03c    spellcheck.sh
040000 tree 3f8db396566716299330cdd5f569fb0a0c4615dd    src
100644 blob 56677811f451084de7c3a2478587a09486209b14    style-guide.md
040000 tree 7601821a2ff38906332082671ea23e4074464dd2    tools

$ git cat-file -p d48b2e
100644 blob 82dc67a6b08f0eb62420e4da3b3aa9c0dc10911a    CONTRIBUTING.md
100644 blob 5cd51aa43f05416996c4ef055df5d6eb58fbe737    Cargo.lock
100644 blob 7ab2575fa5bf4abf6eaf767c72347580c9f769dd    Cargo.toml
100644 blob 96e9f0458b55a4047927de5bf04ceda89d772b2b    LICENSE-APACHE
100644 blob 5a56e6e8ed1909b4e4800aa8d2a0e7033ab4babe    LICENSE-MIT
100644 blob be1135fc6d28eca53959c7fc9ae191523e4bc96f    book.json
100644 blob 1400454f36840e916a7d7028d987c42fcb31b4db    dictionary.txt
100644 blob 5103c84d034d6e8a0e4b6090453ad2cdcde21537    doc-to-md.sh
040000 tree 6715d1d4c97e3d17a088922f687b8d9ffacb5953    dot
100644 blob f9e045c4c1824520534270a2643ebe68311503b8    nostarch.sh
040000 tree f8d9a9452b4bbaeba256b95d40b303cd5fb20a64    nostarch
100644 blob 0a2d16852c11355ef9d8758a304b812633dcf03c    spellcheck.sh
040000 tree f9fc05a6ff78b8211f4df931ed5e32c937aba66c    src
100644 blob 56677811f451084de7c3a2478587a09486209b14    style-guide.md
040000 tree 7601821a2ff38906332082671ea23e4074464dd2    tools

Again, these are the same, except for that of src. src has a lot of files in it, which will clutter this post, so I’ll run a diff on the outputs of cat-file:

$ diff -U5 <(git cat-file -p f9fc05a6ff78b8211f4df931ed5e32c937aba66c) <(git cat-file -p 3f8db396566716299330cdd5f569fb0a0c4615dd)
--- /dev/fd/63  2017-03-05 11:58:22.000000000 -0800
+++ /dev/fd/62  2017-03-05 11:58:22.000000000 -0800
@@ -63,11 +63,11 @@
 100644 blob ff6b8f8cd44f624e1239c47edda59560cdf491ae   ch14-02-publishing-to-crates-io.md
 100644 blob c53ef854a74b6c9fbd915be1bf824c6e78439c42   ch14-03-cargo-workspaces.md
 100644 blob 3fb59f9cc85b6b81994e83a34d542871a260a8f0   ch14-04-installing-binaries.md
 100644 blob e1cd1ca779fdf202af433108a8af6eda317f2717   ch14-05-extending-cargo.md
 100644 blob 3173cc508484cc447ebe42a024eac7d9e6c2ddcd   ch15-00-smart-pointers.md
-100644 blob 14c5533bb3b604c6e6274db278d1e7129f78d55d   ch15-01-box.md
+100644 blob 29d87933d6832374b87d98aa5588e09e0c1a4991   ch15-01-box.md
 100644 blob 47b35ed489d63ce6a885289fec01b7b16ba1afea   ch15-02-deref.md
 100644 blob 2d20c55cc8605c0c899bc4867adc6b6ea1f5c902   ch15-03-drop.md
 100644 blob 8e3fcf4e83fe1ce985a7c0b479b8b16701765aaf   ch15-04-rc.md
 100644 blob a4ade4ae8bf5296d79ed51d69506e71a83f9f489   ch15-05-interior-mutability.md
 100644 blob 3a4db5616c4f5baeb95d04ea40c6747e60181684   ch15-06-reference-cycles.md

As you can see, only the file that was changed in the commit has a new blob stored. If you view 14c553 and 29d879 you’ll get the pre- and post- commit versions of the file respectively.

So basically, each commit stores a tree of references to objects, often sharing nodes with other commits.

I haven’t had the opportunity to work with packfiles much, but they’re an additional optimization on top of this. Aditya’s post is a good intro to these.
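
If you want to see packfiles in action, git verify-pack can dump what a pack contains; here’s a sketch (the pack filename will differ in your repo):

$ git gc                                           # pack up loose objects first, if needed
$ git verify-pack -v .git/objects/pack/pack-*.idx | head
# each line lists an object's hash, type, and size; deltified objects also show their delta depth and base object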