In Pursuit of Laziness

Manish Goregaokar’s blog

Down a Rusty Rabbit Hole

Last week I fell down a rather interesting rabbit hole in Rust, which was basically me discovering a series of quirks of the Rust compiler/language, each one leading to the next when I asked “why?”.

It started when someone asked why autogenerated Debug impls use argument names like __arg_0 which start with a double underscore.

This happened to be my fault. The reason we used a double underscore was that while a single underscore tells rustc not to warn about a possibly-unused variable, there’s an off-by-default clippy lint that warns about variables that start with a single underscore that are used, which can be silenced with a double underscore. Now, the correct fix here is to make the lint ignore derive/macros (which I believe we did as well), but at the time we needed to add an underscore anyway so a double underscore didn’t seem worse.

Except of course, this double underscore appears in the docs. Oops.

Ideally the rustc derive infrastructure would have a way of specifying the argument name to use so that we can at least have descriptive things here, but that’s a bit more work (I’m willing to mentor this work though!). So I thought I’d fix this by at least removing the double underscore, and making the unused lint ignore #[derive()] output.

While going through the code to look for underscores I also discovered a hygiene issue. The following code throws a bunch of very weird type errors:

pub const __cmp: u8 = 1;

#[derive(PartialOrd, PartialEq)]
pub enum Foo {
    A(u8), B(u8)
}

(playpen)

error[E0308]: mismatched types
 --> src/main.rs:6:7
  |
6 |     A(u8), B(u8)
  |       ^^^ expected enum `std::option::Option`, found u8
  |
  = note: expected type `std::option::Option<std::cmp::Ordering>`
             found type `u8`
.....

This is because the generated code for PartialOrd contains the following:

match foo.partial_cmp(bar) {
    Some(Ordering::Equal) => .....,
    __cmp => __cmp,
}

In a pattern, __cmp can be either a new binding (matching anything, like a wildcard) or a match against a constant named __cmp, and in the presence of such a constant it resolves to the constant, causing type errors.

One way to fix this is to bind foo.cmp(bar) to some temporary variable x and use that directly in a _ => x branch.
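
That fix, sketched out (with partial_cmp and an illustrative temporary name; the real generated code is messier):

use std::cmp::Ordering;

// A sketch of the temporary-variable approach; `tmp` is an illustrative
// name, not what rustc actually generates.
fn partial_cmp_fields(foo: &u8, bar: &u8) -> Option<Ordering> {
    let tmp = foo.partial_cmp(bar);
    match tmp {
        Some(Ordering::Equal) => Some(Ordering::Equal), // compare the next field here
        _ => tmp, // a plain `_` can never resolve to a constant
    }
}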

I thought I could be clever and try cmp @ _ => cmp instead. match supports syntax where you can do foo @ <pattern>, where foo is bound to the entire matched variable. The cmp here is unambiguously a binding; it cannot be a pattern. So no conflicting with the const, problem solved!
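
For reference, here’s @ binding syntax in a toy example of my own:

fn main() {
    // `n @ <pattern>` binds `n` to the whole matched value.
    match 5u8 {
        n @ 1..=9 => println!("single digit: {}", n),
        n @ _ => println!("something else: {}", n),
    }
}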

So I made a PR for both removing the underscores and also fixing this. The change for __cmp is no longer in that PR, but you can find it here.

Except I hit a problem. With that PR, the following still breaks:

pub const cmp: u8 = 1;

#[derive(PartialOrd, PartialEq)]
pub enum Foo {
    A(u8), B(u8)
}

throwing a slightly cryptic error:

error[E0530]: match bindings cannot shadow constants
 --> test.rs:9:7
  |
4 | pub const cmp: u8 = 1;
  | ---------------------- a constant `cmp` is defined here
...
9 |     B(u8)
  |       ^^^ cannot be named the same as a constant

You can see a reduced version of this error in the following code:

pub const cmp: u8 = 1;

fn main() {
    match 1 {
        cmp @ _ => ()
    }
}

(playpen)

Huh. Wat. Why? cmp @ _ seems to be pretty unambiguous, what’s wrong with it shadowing a constant?

Turns out bindings cannot shadow constants at all, for a rather subtle reason:

const A: u8 = ...; // A_const
let A @ _ = ...; // A_let
match .. {
    A => ...; // A_match
}

What happens here is that constants and variables occupy the same namespace. So A_let shadows A_const here, and when we attempt to match, A_match is resolved to A_let and rejected (since you can’t match against a variable), and A_match falls back to resolving as a fresh binding pattern, instead of resolving to a pattern that matches against A_const.

This is kinda weird, so we disallow shadowing constants with variables. This is rarely a problem because variables are lowercase and constants are uppercase. We could technically allow this language-wise, but it’s hard on the implementation (and irrelevant in practice) so we don’t.


So I dropped that fix. The temporary local variable approach is broken as well since you can also name a constant the same as the local variable and have a clash (so again, you need the underscores to avoid surprises).

But then I realized that we had an issue with removing the underscores from __arg_0 as well.

The following code is also broken:

pub const __arg_0: u8 = 1;

#[derive(Debug)]
struct Foo(u8);

(playpen)

error[E0308]: mismatched types
 --> src/main.rs:3:10
  |
3 | #[derive(Debug)]
  |          ^^^^^ expected mutable reference, found u8
  |
  = note: expected type `&mut std::fmt::Formatter<'_>`
             found type `u8`

You can see a reduced version of this error in the following code:

pub const __arg_0: u8 = 1;

fn foo(__arg_0: bool) {}

error[E0308]: mismatched types
 --> src/main.rs:3:8
  |
3 | fn foo(__arg_0: bool) {}
  |        ^^^^^^^ expected bool, found u8

(playpen)

This breakage is not an issue with the current code because of the double underscores – there’s a very low chance someone will create a constant that is both lowercase and starts with a double underscore. But it’s a problem when I remove the underscores since that chance shoots up.

Anyway, this failure is even weirder. Why are we attempting to match against the constant in the first place? fn argument patterns1 are irrefutable, i.e. all possible values of the type should match the argument. For example, fn foo(Some(foo): Option<u8>) {} will fail to compile with “refutable pattern in function argument: None not covered”.
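
A quick illustration of irrefutability in function argument patterns (my own example):

// This compiles: `(a, b)` matches every possible (u8, u8).
fn swap((a, b): (u8, u8)) -> (u8, u8) {
    (b, a)
}

fn main() {
    assert_eq!(swap((1, 2)), (2, 1));
    // By contrast, `fn bad(Some(x): Option<u8>) {}` is refutable
    // (it doesn't cover None) and is rejected.
}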

There’s no point trying to match against constants here, because even if we find a constant it will be rejected later. Instead, we can unambiguously resolve identifiers as new bindings, yes?

Right?

Firm in my belief, I filed an issue.

I was wrong, it’s not going to always be rejected later. With zero-sized types this can totally still work:

struct S;

const C: S = S;

fn main() {
    let C = S;
}

Here because S has only one state, matching against a constant of the type is still irrefutable.

I argued that this doesn’t matter – since the type has a single value, it doesn’t matter whether we resolved to a new binding or the constant; the value and semantics are the same.

This is true.

Except.

Except for when destructors come in.

It was at this point that my table found itself in the perplexing state of being upside-down.
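
To sketch why destructors make the two resolutions observable (my own example; newer compilers require the PartialEq/Eq derives to use a constant as a pattern, and may reject this outright):

#[derive(PartialEq, Eq)]
struct S;

impl Drop for S {
    fn drop(&mut self) {
        println!("dropped!");
    }
}

const C: S = S;

fn main() {
    let C = S;
    // If `C` resolves to the constant, the pattern contains no bindings,
    // so the S value is dropped at the end of the `let` statement and
    // "dropped!" prints *before* the line below. If `C` were a fresh
    // binding, the S would live until the end of main() and "dropped!"
    // would print *after* it.
    println!("end of main");
}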

This is still really fine, zero-sized-constants-with-destructors is a pretty rare thing in Rust and I don’t really see folks relying on this behavior.

However I later realized that this entire detour was pointless because even if we fix this, we end up with a way for bindings to shadow constants. Which … which we already realized isn’t allowed by the compiler till we fix some bugs.

Damn.


The actual fix to the macro stuff is to use hygienic generated variable names, which the current infrastructure supports. I plan to make a PR for this eventually.

But it was a very interesting dive into the nuances of pattern matching in Rust.


  1. Yes, function arguments in Rust are patterns. You can totally do things like (a, b): (u8, u8) in function arguments (like you can do in let)

Picking Apart the Crashing iOS String

So there’s yet another iOS text crash, where just looking at a particular string crashes iOS. Basically, if you put this string in any system text box (and other places), it crashes that process. I’ve been testing it by copy-pasting characters into Spotlight so I don’t end up crashing my browser.

The original sequence is U+0C1C U+0C4D U+0C1E U+200C U+0C3E, which is a sequence of Telugu characters: the consonant ja (జ), a virama ( ్ ), the consonant nya (ఞ), a zero-width non-joiner, and the vowel aa ( ా).
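
If you want to poke at the sequence yourself, here’s a snippet of mine that builds it from the raw code points (printing it in a terminal is harmless; the crash was in iOS text layout):

fn main() {
    let s: String = ['\u{0C1C}', '\u{0C4D}', '\u{0C1E}', '\u{200C}', '\u{0C3E}']
        .iter()
        .collect();
    // 5 code points; 15 bytes, since each of these is 3 bytes in UTF-8.
    println!("{} ({} code points, {} bytes)", s, s.chars().count(), s.len());
}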

I was pretty interested in what made this sequence “special”, and started investigating.

So first when looking into this, I thought that the <ja, virama, nya> sequence was the culprit. That sequence forms a special ligature in many Indic scripts (ज्ञ in Devanagari) which is often considered a letter of its own. However, the ligature for Telugu doesn’t seem very “special”.

Also, from some experimentation, this bug seemed to occur for any pair of Telugu consonants with a vowel, as long as the vowel is not   ై (ai). Huh.

The ZWNJ must be doing something weird, then. <consonant, virama, consonant, vowel> is a pretty common sequence in any Indic script; but ZWNJ before a vowel isn’t very useful for most scripts (except for Bengali and Oriya, but I’ll get to that).

And then I saw that there was a sequence in Bengali that also crashed.

The sequence is U+09B8 U+09CD U+09B0 U+200C U+09C1, which is the consonant “so” (স), a virama ( ্ ), the consonant “ro” (র), a ZWNJ, and vowel u (  ু).

Before we get too into this, let’s first take a little detour to learn how Indic scripts work:

Indic scripts and consonant clusters

Indic scripts are abugidas, which means that their “letters” are consonants, to which you can attach diacritics to change the vowel. By default, consonants have a base vowel. So, for example, क is “kuh” (kə, often transcribed as “ka”), but I can change the vowel to make it के (the “ka” in “okay”) or का (“kaa”, like “car”).

Usually, the default vowel is the ə sound, though not always (in Bengali it’s more of an o sound).

Because of the “default” vowel, you need a way to combine consonants. For example, if you wished to write the word “ski”, you can’t write it as स + की (sa + ki = “saki”), you must write it as स्की. What’s happened here is that the स got its vowel “killed”, and got tacked on to the की to form a consonant cluster ligature.

You can also write this as स्‌की . That little tail you see on the स is known as a “virama”; it basically means “remove this vowel”. Explicit viramas are sometimes used when there’s no easy way to form a ligature, e.g. in ङ्‌ठ because there is no simple way to ligatureify ङ into ठ. Some scripts also prefer explicit viramas, e.g. “ski” in Malayalam is written as സ്കീ, where the little crescent is the explicit virama.

In Unicode, the virama character is always used to form a consonant cluster. So स्की was written as <स,  ्, क,  ी>, or <sa, virama, ka, i>. If the font supports the cluster, it will show up as a ligature, otherwise it will use an explicit virama.

For Devanagari and Bengali, usually, in a consonant cluster the first consonant is munged a bit and the second consonant stays intact. There are exceptions – sometimes they’ll form an entirely new glyph (क + ष = क्ष), and sometimes both glyphs will change (ड + ड = ड्ड, द + म = द्म, द + ब = द्ब). (The original post includes an image of those last conjunct forms, in case your font falls back to rendering them with explicit viramas.)

Investigating the Bengali case

Now, interestingly, unlike the Telugu crash, the Bengali crash seemed to only occur when the second consonant is র (“ro”). However, I can trigger it for any choice of the first consonant or vowel, except when the vowel is  ো (o) or  ৌ (au).

Now, র is an interesting consonant in some Indic scripts, including Devanagari. In Devanagari, it looks like र (“ra”). However, it does all kinds of things when forming a cluster. If you’re having it precede another consonant in a cluster, it forms a little feather-like stroke, like in र्क (rka). In Marathi, that stroke can also look like a tusk, as in र्‍क. As a suffix consonant, it can provide a little “extra leg”, as in क्र (kra). For letters without a vertical stroke, like ठ (tha), it does this caret-like thing, ठ्र (thra).

Basically, while most consonants retain some of their form when put inside a cluster, र does not. And a more special thing about र is that this happens even when र is the second consonant in a cluster – as I mentioned before, for most consonant clusters the second consonant stays intact. While there are exceptions, they are usually specific to the cluster; it is only र for which this happens for all clusters.

It’s similar in Bengali, র as the second consonant adds a tentacle-like thing on the existing consonant. For example, প + র (po + ro) gives প্র (pro).

But it’s not just র that does this in Bengali, the consonant “jo” does as well. প + য (po + jo) forms প্য (pjo), and the য is transformed into a wavy line called a “jophola”.

So I tried it with য — , and it turns out that the Bengali crash occurs for য as well! So the general Bengali case is <consonant, virama, র OR য, ZWNJ, vowel>, where the vowel is not  ো or  ৌ.

Suffix-joining consonants

So we’re getting close, here. At least for Bengali, it occurs when the second consonant is such that it often combines with the first consonant without modifying its form much.

In fact, this is the case for Telugu as well! Consonant clusters in Telugu are usually formed by preserving the original consonant, and tacking the second consonant on below!

For example, the original crashy string contains the cluster జ + ఞ, which looks like జ్ఞ. The first letter isn’t really modified, but the second is.

From this, we can guess that it will also occur for Devanagari with र. Indeed it does! U+0915 U+094D U+0930 U+200C U+093E, that is, <क,  ्, र, zwnj,  ा> (< ka, virama, ra, zwnj, aa >) is one such crashing sequence.

But this isn’t really the whole story, is it? For example, the crash does occur for “kro” + zwnj + vowel in Bengali, and in “kro” (ক্র = ক + র = ko + ro) the resultant cluster involves the munging of both the prefix and suffix. But the crash doesn’t occur for द्ब or ड्ड. It seems to be specific to the letter, not the nature of the cluster.

Digging deeper, the reason is that for many fonts (presumably the ones in use), these consonants form “suffix joining consonants”1 (a term I made up) when preceded by a virama. This seems to correspond to the pstf OpenType feature, as well as vatu.

For example, the sequence virama + क gives   ्क, i.e. it renders a virama with a placeholder followed by a क.

But, for र, virama + र renders  ्र, which takes a suffix-joining form (shown as an image in the original post, since font rendering varies).

In fact, this is the case for the other consonants as well. For me,  ्र  ্র  ্য  ్ఞ  ్క (Devanagari virama-ra, Bengali virama-ro, Bengali virama-jo, Telugu virama-nya, Telugu virama-ka) all render as “suffix joining consonants” (again, the original post includes an image of these).

(This is true for all Telugu consonants, not just the ones listed).

An interesting bit is that the crash does not occur for <र, virama, र, zwnj, vowel>, because र-virama-र uses the prefix-joining form of the first र (र्र). The same occurs for র with itself or ৰ or য. Because the virama is “stickier” to the left in these cases, it doesn’t cause a crash. (h/t hackbunny for discovering this using a script to enumerate all cases).

Kannada also has “suffix joining consonants”, but for some reason I cannot trigger the crash with it. Ya in Gurmukhi is also suffix-joining.

The ZWNJ

The ZWNJ is curious. The crash doesn’t happen without it, but as I mentioned before a ZWNJ before a vowel doesn’t really do anything for most Indic scripts. In Indic scripts, a ZWNJ can be used to explicitly force a virama if used after the virama (I used it to write स्‌की in this post), however that’s not how it’s being used here.

In Bengali and Oriya specifically, a ZWNJ can be used to force a different vowel form when used before a vowel (e.g. রু vs র‌ু), however this bug seems to apply to vowels for which there is only one form, and this bug also applies to other scripts where this isn’t the case anyway.

The exception vowels are interesting. They’re basically all vowels that are made up of two glyph components. Philippe Verdy points out:

And why this bug does not occur with some vowels is because these are vowels in two parts, that are first decomposed into two separate glyphs reordered in the buffer of glyphs, while other vowels do not need this prior mapping and keep their initial direct mapping from their codepoints in fonts, which means that this has to do to the way the ZWNJ looks for the glyphs of the vowels in the glyphs buffer and not in the initial codepoints buffer: there’s some desynchronization, and more probably an uninitialized data field (for the lookup made in handling ZWNJ) if no vowel decomposition was done (the same data field is correctly initialized when it is the first consonnant which takes an alternate form before a virama, like in most Indic consonnant clusters, because the a glyph buffer is created.

Generalizing

So, ultimately, the full set of cases that cause the crash are:

Any sequence <consonant1, virama, consonant2, ZWNJ, vowel> in Devanagari, Bengali, and Telugu, where:

  • consonant2 is suffix-joining (pstf/vatu) – i.e. र, র, য, ৰ, and all Telugu consonants
  • consonant1 is not a reph-forming letter like र/র (or a variant, like ৰ)
  • vowel does not have two glyph components, i.e. it is not   ై,   ো, or   ৌ
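
A rough sketch of how one might enumerate candidate sequences to test, in the spirit of hackbunny’s script (the Devanagari code-point ranges here are illustrative, not exhaustive):

fn main() {
    let virama = '\u{094D}'; // Devanagari virama
    let ra = '\u{0930}';     // Devanagari ra (suffix-joining)
    let zwnj = '\u{200C}';
    // Devanagari consonants ka through ha, minus ra itself (reph-forming).
    let consonants = ('\u{0915}'..='\u{0939}').filter(|&c| c != ra);
    // A few single-glyph dependent vowel signs.
    let vowels = ['\u{093E}', '\u{093F}', '\u{0940}', '\u{0941}'];
    for c1 in consonants {
        for &v in &vowels {
            let seq: String = [c1, virama, ra, zwnj, v].iter().collect();
            println!("{}", seq.escape_unicode());
        }
    }
}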

This leaves one question open:

Why doesn’t it apply to Kannada? Or, for that matter, Khmer, which has a similar virama-like thing called a “coeng”.

Are these valid strings?

A recurring question I’m getting is if these strings are valid in the language, or unicode gibberish like Zalgo text. Breaking it down:

  • All of the rendered glyphs are valid. The original Telugu one is the root of the word for “knowledge” (and I’ve taken to calling this bug “forbidden knowledge” for that reason).
  • In Telugu and Devanagari, there is no functional use of the ZWNJ as used before a vowel. It should not be there, and one would not expect it in typical text.
  • In Bengali (also Oriya), putting a ZWNJ before some vowels prevents them from ligatureifying, and this is mentioned in the Unicode spec. However, it seems rare for native speakers to use this.
  • In all of these scripts, putting a ZWNJ after a virama can be used to force an explicit virama over a ligature. That is not where the ZWNJ sits here, but it hints that this might have been a mistype. Doing this is also rare, at least for Devanagari (and I believe for the other two scripts as well)
  • Android has an explicit key for ZWNJ on its keyboards for these languages2, right next to the spacebar. iOS has this as well on the long-press of the virama key. Very easy to mistype, at least for Android.

So while the crashing strings are usually invalid, and when not, very rare, they are easy enough to mistype.

An example by @FakeUnicode was the string “For/k” (or “Foŕk”, if accents were easier to type). A slash isn’t something you’d normally type there, and the produced string is gibberish, but it’s easy enough to type by accident.

Except of course that the mistake in “For/k”/“Foŕk” is visually obvious and would be fixed; this isn’t the case for most of the crashing strings.

Conclusion

I don’t really have one guess as to what’s going on here – I’d love to see what people think – but my current guess is that the “affinity” of the virama to the left instead of the right confuses the algorithm that handles ZWNJs after viramas into thinking the ZWNJ applies to the virama (it doesn’t, there’s a consonant in between), and this leads to some numbers not matching up and causing a buffer overflow or something. Philippe’s diagnosis of the vowel situation matches up with this.

An interesting thing is that I can cause this crash to happen more reliably in browsers by clicking on the string.

Additionally, sometimes it actually renders in spotlight for a split second before crashing; which means that either the crash isn’t deterministic, or it occurs in some process after rendering. I’m not sure what to think of either. Looking at the backtraces, the crash seems to occur in different places, so it’s likely that it’s memory corruption that gets uncovered later.

I’d love to hear if folks have further insight into this.

Update: Philippe on the Unicode mailing list has an interesting theory

Yes, I could attach a debugger to the crashing process and investigate that instead, but that’s no fun 😂


  1. Philippe Verdy points out that these may be called “phala forms” at least for Bengali

  2. I don’t think the Android keyboard needs this key; the keyboard seems very much a dump of “what does this unicode block let us do”, and includes things like Sindhi-specific or Kashmiri-specific characters for the Marathi keyboard as well as extremely archaic characters, whilst neglecting more common things like the eyelash reph (which doesn’t have its own code point but is a special unicode sequence; native speakers should not be expected to be aware of this sequence).

A Rough Proposal for Sum Types in Go

Sum types are pretty cool. Just like how a struct is basically “This contains one of these and one of these”, a sum type is “This contains one of these or one of these”.

So for example, the following sum type in Rust:

enum Foo {
    Stringy(String),
    Numerical(u32)
}

or Swift:

enum Foo {
    case stringy(String)
    case numerical(Int)
}

would be one where it’s either Foo::Stringy (Foo::stringy for Swift), containing a String, or Foo::Numerical, containing an integer.

This can be pretty useful. For example, messages between threads are often of a “this or that or that or that” form.

The nice thing is, matching (switching) on these enums is usually exhaustive – you must list all the cases (or include a default arm) for your code to compile. This leads to a useful component of type safety – if you add a message to your message passing system, you’ll know where to update it.
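
For example, in Rust, a match on the Foo enum from earlier must cover every variant (my own sketch):

enum Foo {
    Stringy(String),
    Numerical(u32),
}

fn describe(foo: &Foo) -> String {
    // Removing either arm (without adding `_ => ...`) is a compile error,
    // so adding a new variant forces every match site to be updated.
    match foo {
        Foo::Stringy(s) => format!("string: {}", s),
        Foo::Numerical(n) => format!("number: {}", n),
    }
}

fn main() {
    println!("{}", describe(&Foo::Stringy("hi".into())));
    println!("{}", describe(&Foo::Numerical(42)));
}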

Go doesn’t have these. Go does have interfaces, which are dynamically dispatched. The drawback here is that you do not get the exhaustiveness condition, and consumers of your library can even add further cases. (And, of course, dynamic dispatch can be slow). You can get exhaustiveness in Go with external tools, but it’s preferable to have such things in the language IMO.

Many years ago when I was learning Go I wrote a blog post about what I liked and disliked as a Rustacean learning Go. Since then, I’ve spent a lot more time with Go, and I’ve learned to like each Go design decision that I initially disliked, except for the lack of sum types. Most of my issues arose from “trying to program Rust in Go”, i.e. using idioms that are natural to Rust (or other languages I’d used previously). Once I got used to the programming style, I realized that aside from the lack of sum types I really didn’t find much missing from the language. Perhaps improvements to error handling.

Now, my intention here isn’t really to sell sum types. They’re somewhat controversial for Go, and there are good arguments on both sides. You can see one discussion on this topic here. If I were to make a more concrete proposal I’d probably try to motivate this in much more depth. But even I’m not very strongly of the opinion that Go needs sum types; I have a slight preference for it.

Instead, I’m going to try and sketch this proposal for sum types that has been floating around my mind for a while. I end up mentioning it often and it’s nice to have something to link to. Overall, I think this “fits well” with the existing Go language design.

The proposal

The essence is pretty straightforward: Extend interfaces to allow for “closed interfaces”. These are interfaces that are only implemented for a small list of types.

Writing the Foo sum type above would be:

type Foo interface {
    SomeFunction()
    OtherFunction()
    for string, int
}

It doesn’t even need to have functions defined on it.

The interface functions can only be called if you have an interface object; they are not directly available on variant types without explicitly casting (Foo("...").SomeFunction()).

(I’m not strongly for the for keyword syntax, it’s just a suggestion. The core idea is that you define an interface and you define the types it closes over. Somehow.)

A better example would be an interface for a message-passing system for Raft:

type VoteRequest struct {
    CandidateId uint
    Term uint
    // ...
}

type VoteResponse struct {
    Term uint
    VoteGranted bool
    VoterId uint
}

type AppendRequest struct {
    //...
}

type AppendResponse struct {
    //...
}
// ...
type RaftMessage interface {
    for VoteRequest, VoteResponse, AppendRequest, AppendResponse
}

Now, you use type switches for dealing with these:

switch value := msg.(type) {
    case VoteRequest:
        if value.Term <= me.Term {
            me.reject_vote(value.CandidateId)
        } else {
            me.accept_vote(value.CandidateId, value.Term)
        }
    case VoteResponse: // ...
    case AppendRequest: // ...
    case AppendResponse: // ...
}

There is no need for the default case, unless you wish to leave one or more of the cases out.

Ideally, these could be implemented as inline structs instead of using dynamic dispatch. I’m not sure what this entails for the GC design, but I’d love to hear thoughts on this.

We also make it possible to add methods to closed interfaces. This is in the spirit of this proposal, where you allow

func (message RaftMessage) Process(me Me) error {
    // message handling logic
}

for closed interfaces.

This aligns more with how sum types are written and used in other languages; instead of assuming that each method will be a switch on the variant, you can write arbitrary code that may switch on the type but it can also just call other methods. This is really nice because you can write methods in both ways – if it’s a “responsibility of the inner type” kind of method, require it in the interface and delegate it to the individual types. If it’s a “responsibility of the interface” method, write it as a method on the interface as a whole. I kind of wish Rust had this, because in Rust you sometimes end up writing things like:

match foo {
    Foo::Stringy(s) => s.process(),
    Foo::Numerical(n) => n.process(),
    // ...
}

Yes, this would work better as a trait, but then you lose some niceties of Rust enums. With this proposal Go can have it both ways.
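
For comparison, here’s a sketch of what the trait version looks like in Rust (illustrative names): the match disappears, but so does exhaustive matching for consumers.

trait Process {
    fn process(&self) -> String;
}

struct Stringy(String);
struct Numerical(u32);

impl Process for Stringy {
    fn process(&self) -> String {
        format!("string: {}", self.0)
    }
}

impl Process for Numerical {
    fn process(&self) -> String {
        format!("number: {}", self.0)
    }
}

fn handle(item: &dyn Process) -> String {
    // No match needed; dynamic dispatch picks the right impl. But callers
    // can no longer exhaustively enumerate the possible concrete types.
    item.process()
}

fn main() {
    println!("{}", handle(&Stringy("hi".into())));
    println!("{}", handle(&Numerical(3)));
}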


Anyway, thoughts? This is a really rough proposal, and I’m not sure how receptive other Gophers will be to this, nor how complex its implementation would be. I don’t really intend to submit this as a formal proposal, but if someone else wants to they are more than welcome to build on this idea.

What Are Tokio and Async IO All About?

The Rust community lately has been focusing a lot on “async I/O” through the tokio project. This is pretty great!

But for many in the community who haven’t worked with web servers and related things it’s pretty confusing as to what we’re trying to achieve there. When this stuff was being discussed around 1.0, I was pretty lost as well, having never worked with this stuff before.

What’s all this Async I/O business about? What are coroutines? Lightweight threads? Futures? How does this all fit together?

What problem are we trying to solve?

One of Rust’s key features is “fearless concurrency”. But the kind of concurrency required for handling a large amount of I/O bound tasks – the kind of concurrency found in Go, Elixir, Erlang – is absent from Rust.

Let’s say you want to build something like a web service. It’s going to be handling thousands of requests at any point in time (known as the “c10k problem”). In general, the problem we’re considering is having a huge number of I/O bound (usually network I/O) tasks.

“Handling N things at once” is best done by using threads. But … thousands of threads? That sounds a bit much. Threads can be pretty expensive: Each thread needs to allocate a large stack, setting up a thread involves a bunch of syscalls, and context switching is expensive.

Of course, thousands of threads all doing work at once is not going to work anyway. You only have a fixed number of cores, and at any one time only one thread will be running on a core.

But for cases like web servers, most of these threads won’t be doing work. They’ll be waiting on the network. Most of these threads will either be listening for a request, or waiting for their response to get sent.

With regular threads, when you perform a blocking I/O operation, the syscall returns control to the kernel, which won’t yield control back, because the I/O operation is probably not finished. Instead, it will use this as an opportunity to swap in a different thread, and will swap the original thread back when its I/O operation is finished (i.e. it’s “unblocked”). Without Tokio and friends, this is how you would handle such things in Rust. Spawn a million threads; let the OS deal with scheduling based on I/O.
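
A minimal sketch of that thread-per-connection model in Rust, using plain blocking std networking (error handling kept crude):

use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // One OS thread per connection; the kernel swaps threads in and
        // out around blocking reads and writes.
        thread::spawn(move || {
            let mut buf = [0u8; 1024];
            if stream.read(&mut buf).is_ok() {
                let _ = stream.write_all(b"hello\n");
            }
        });
    }
    Ok(())
}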

But, as we already discovered, threads don’t scale well for things like this1.

We need “lighter” threads.

Lightweight threading

I think the best way to understand lightweight threading is to forget about Rust for a moment and look at a language that does this well, Go.

Instead of exposing OS threads, Go has lightweight threads, called “goroutines”. You spawn these with the go keyword. A web server might do something like this:

listener, err := net.Listen(...)
// handle err
for {
    conn, err := listener.Accept()
    // handle err

    // spawn goroutine:
    go handler(conn)
}

This is a loop which waits for new TCP connections, and spawns a goroutine with the connection and the function handler. Each connection will be a new goroutine, and the goroutine will shut down when handler finishes. In the meantime, the main loop continues executing, because it’s running in a different goroutine.

So if these aren’t “real” (operating system) threads, what’s going on?

A goroutine is an example of a “lightweight” thread. The operating system doesn’t know about these, it sees N threads owned by the Go runtime, and the Go runtime maps M goroutines onto them2, swapping goroutines in and out much like the operating system scheduler. It’s able to do this because Go code is already interruptible for the GC to be able to run, so the scheduler can always ask goroutines to stop. The scheduler is also aware of I/O, so when a goroutine is waiting on I/O it yields to the scheduler.

Essentially, a compiled Go function will have a bunch of points scattered throughout it where it tells the scheduler and GC “take over if you want” (and also “I’m waiting on stuff, please take over”).

When a goroutine is swapped on an OS thread, some registers will be saved, and the program counter will switch to the new goroutine.

But what about its stack? OS threads have a large stack with them, and you kinda need a stack for functions and stuff to work.

What Go used to do was segmented stacks. The reason a thread needs a large stack is that most programming languages, including C, expect the stack to be contiguous, and stacks can’t just be “reallocated” like we do with growable buffers since we expect stack data to stay put so that pointers to stack data continue to work. So we reserve all the stack we think we’ll ever need (~8MB), and hope we don’t need more.

But the expectation of stacks being contiguous isn’t strictly necessary. In Go, stacks are made of tiny chunks. When a function is called, it checks if there’s enough space on the stack for it to run, and if not, allocates a new chunk of stack and runs on it. So if you have thousands of threads doing a small amount of work, they’ll all get thousands of tiny stacks and it will be fine.

These days, Go actually does something different; it copies stacks. I mentioned that stacks can’t just be “reallocated” because we expect stack data to stay put. But that’s not necessarily true — because Go has a GC it knows what all the pointers are anyway, and it can rewrite pointers to stack data on demand.

Either way, Go’s rich runtime lets it handle this stuff well. Goroutines are super cheap, and you can spawn thousands without your computer having problems.

Rust used to support lightweight/“green” threads (I believe it used segmented stacks). However, Rust cares a lot about not paying for things you don’t use, and this imposes a penalty on all your code even if you aren’t using green threads, and it was removed pre-1.0.

Async I/O

A core building block of this is Async I/O. As mentioned in the previous section, with regular blocking I/O, the moment you request I/O your thread will not be allowed to run (“blocked”) until the operation is done. This is perfect when working with OS threads (the OS scheduler does all the work for you!), but if you have lightweight threads you instead want to replace the lightweight thread running on the OS thread with a different one.

Instead, you use non-blocking I/O, where the thread queues a request for I/O with the OS and continues execution. The I/O request is executed at some later point by the kernel. The thread then needs to ask the OS “Is this I/O request ready yet?” before looking at the result of the I/O.

Of course, repeatedly asking the OS if it’s done can be tedious and consume resources. This is why there are system calls like epoll. Here, you can bundle together a bunch of unfinished I/O requests, and then ask the OS to wake up your thread when any of these completes. So you can have a scheduler thread (a real thread) that swaps out lightweight threads that are waiting on I/O, and when there’s nothing else happening it can itself go to sleep with an epoll call until the OS wakes it up (when one of the I/O requests completes).

(The exact mechanism involved here is probably more complex)

So, bringing this to Rust, Rust has the mio library, which is a platform-agnostic wrapper around non-blocking I/O and tools like epoll/kqueue/etc. It’s a building block; and while those used to directly using epoll in C may find it helpful, it doesn’t provide a nice programming model like Go does. But we can get there.
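
To give a flavor of the building block, here’s roughly the shape of a mio event loop (this is the mio 0.8-style API with the os-poll/net features, from well after this post; details differ across versions):

use mio::net::TcpListener;
use mio::{Events, Interest, Poll, Token};

const SERVER: Token = Token(0);

fn main() -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);
    let mut listener = TcpListener::bind("127.0.0.1:8080".parse().unwrap())?;
    // Tell the OS we want readiness notifications for this socket.
    poll.registry()
        .register(&mut listener, SERVER, Interest::READABLE)?;
    loop {
        // Sleep until the OS says something we registered is ready.
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            if event.token() == SERVER {
                let (_conn, addr) = listener.accept()?;
                println!("new connection from {}", addr);
            }
        }
    }
}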

Futures

These are another building block. A Future is the promise of eventually having a value (in fact, in Javascript these are called Promises).

So for example, you can ask to listen on a network socket, and get a Future back (actually, a Stream, which is like a future but for a sequence of values). This Future won’t contain the response yet, but will know when it’s ready. You can wait() on a Future, which will block until you have a result, and you can also poll() it, asking it if it’s done yet (it will give you the result if it is).

Futures can also be chained, so you can do stuff like future.then(|result| process(result)). The closure passed to then itself can produce another future, so you can chain together things like I/O operations. With chained futures, poll() is how you make progress; each time you call it it will move on to the next future provided the existing one is ready.

This is a pretty good abstraction over things like non-blocking I/O.

Chaining futures works much like chaining iterators. Each and_then (or whatever combinator) call returns a struct wrapping around the inner future, which may contain an additional closure. Closures themselves carry their references and data with them, so this really ends up being very similar to a tiny stack!
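
Here’s a toy sketch of that wrapping (deliberately not the real futures crate API): a map-style combinator holds the inner future plus the closure, and its poll() drives the inner future before running the closure.

// A toy Future trait, just to show the shape of combinator chaining.
trait Future {
    type Item;
    fn poll(&mut self) -> Option<Self::Item>; // Some(value) once ready
}

// A trivially-ready future, for demonstration.
struct Ready<T>(Option<T>);

impl<T> Future for Ready<T> {
    type Item = T;
    fn poll(&mut self) -> Option<T> {
        self.0.take()
    }
}

// The combinator: a struct wrapping the inner future and the closure,
// much like Iterator adapters wrap the inner iterator.
struct Map<A, F> {
    inner: A,
    f: Option<F>,
}

impl<A: Future, B, F: FnOnce(A::Item) -> B> Future for Map<A, F> {
    type Item = B;
    fn poll(&mut self) -> Option<B> {
        // Drive the inner future; once it's ready, run the stored closure.
        let value = self.inner.poll()?;
        Some((self.f.take().expect("polled after completion"))(value))
    }
}

fn main() {
    let mut fut = Map { inner: Ready(Some(2)), f: Some(|x| x * 2) };
    assert_eq!(fut.poll(), Some(4));
}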

🗼 Tokio 🗼

Tokio’s essentially a nice wrapper around mio that uses futures. Tokio has a core event loop, and you feed it closures that return futures. What it will do is run all the closures you feed it, use mio to efficiently figure out which futures are ready to make a step3, and make progress on them (by calling poll()).

This actually is already pretty similar to what Go was doing, at a conceptual level. You have to manually set up the Tokio event loop (the “scheduler”), but once you do you can feed it tasks which intermittently do I/O, and the event loop takes care of swapping over to a new task when one is blocked on I/O. A crucial difference is that Tokio is single threaded, whereas the Go scheduler can use multiple OS threads for execution. However, you can offload CPU-critical tasks onto other OS threads and use channels to coordinate so this isn’t that big a deal.

While at a conceptual level this is beginning to shape up to be similar to what we had for Go, code-wise this doesn’t look so pretty. For the following Go code:

// error handling ignored for simplicity

func foo(...) ReturnType {
    data := doIo()
    result := compute(data)
    moreData := doMoreIo(result)
    moreResult := moreCompute(moreData)
    // ...
    return someFinalResult
}

The Rust code will look something like

// error handling ignored for simplicity

fn foo(...) -> Future<ReturnType, ErrorType> {
    do_io().and_then(|data| do_more_io(compute(data)))
          .and_then(|more_data| do_even_more_io(more_compute(more_data)))
    // ......
}

Not pretty. The code gets worse if you introduce branches and loops. The problem is that in Go we got the interruption points for free, but in Rust we have to encode this by chaining up combinators into a kind of state machine. Ew.

Generators and async/await

This is where generators (also called coroutines) come in.

Generators are an experimental feature in Rust. Here’s an example:

let mut generator = || {
    let mut i = 0;
    loop {
        yield i;
        i += 1;
    }
};
assert_eq!(generator.resume(), GeneratorState::Yielded(0));
assert_eq!(generator.resume(), GeneratorState::Yielded(1));
assert_eq!(generator.resume(), GeneratorState::Yielded(2));

Functions are things which execute a task and return once. On the other hand, generators return multiple times; they pause execution to “yield” some data, and can be resumed at which point they will run until the next yield. While my example doesn’t show this, generators can also finish executing like regular functions.

Closures in Rust are sugar for a struct containing captured data, plus an implementation of one of the Fn traits to make it callable.

Generators are similar, except they implement the Generator trait4, and usually store an enum representing various states.

The unstable book has some examples on what the generator state machine enum will look like.

This is much closer to what we were looking for! Now our code can look like this:

fn foo(...) -> Future<ReturnType, ErrorType> {
    let generator = || {
        let mut future = do_io();
        let data;
        loop {
            // poll the future, yielding each time it fails,
            // but if it succeeds then move on
            match future.poll() {
                Ok(Async::Ready(d)) => { data = d; break },
                Ok(Async::NotReady) => (),
                Err(..) => ...
            };
            yield future.polling_info();
        }
        let result = compute(data);
        // do the same thing for `do_more_io()`, etc
    };

    futurify(generator)
}

where futurify is a function that takes a generator and returns a future which on each poll call will resume() the generator, and return NotReady until the generator finishes executing.

But wait, this is even more ugly! What was the point of converting our relatively clean callback-chaining code into this mess?

Well, if you look at it, this code now looks linear. We’ve converted our callback code to the same linear flow as the Go code, however it has this weird loop-yield boilerplate and the futurify function and is overall not very neat.

And that’s where futures-await comes in. futures-await is a procedural macro that does the last-mile work of packaging away this boilerplate. It essentially lets you write the above function as

#[async]
fn foo(...) -> Result<ReturnType, ErrorType> {
    let data = await!(do_io());
    let result = compute(data);
    let more_data = await!(do_more_io());
    // ....
}

Nice and clean. Almost as clean as the Go code, just that we have explicit await!() calls. These await calls are basically providing the same function as the interruption points that Go code gets implicitly.

And, of course, since it’s using a generator under the hood, you can loop and branch and do whatever else you want as normal, and the code will still be clean.

Tying it together

So, in Rust, futures can be chained together to provide a lightweight stack-like system. With async/await, you can neatly write these future chains, and await provides explicit interruption points on each I/O operation. Tokio provides an event loop “scheduler” abstraction, which you can feed async functions to, and under the hood it uses mio to abstract over low level non-blocking I/O primitives.

These are components which can be used independently — you can use tokio with futures without using async/await. You can use async/await without using Tokio. For example, I think this would be useful for Servo’s networking stack. It doesn’t need to do much parallel I/O (not on the order of thousands of threads), so it can just use multiplexed OS threads. However, we’d still want to pool threads and pipeline data well, and async/await would help here.

Put together, all these components get something almost as clean as the Go stuff, with a little more explicit boilerplate. Because generators (and thus async/await) play nice with the borrow checker (they’re just enum state machines under the hood), Rust’s safety guarantees are all still in play, and we get to have “fearless concurrency” for programs having a huge quantity of I/O bound tasks!

Thanks to Arshia Mufti, Steve Klabnik, Zaki Manian, and Kyle Huey for reviewing drafts of this post


  1. Note that this isn’t necessarily true for all network server applications. For example, Apache uses OS threads. OS threads are often the best tool for the job.

  2. Lightweight threading is also often called M:N threading (also “green threading”)

  3. In general future combinators aren’t really aware of tokio or even I/O, so there’s no easy way to ask a combinator “hey, what I/O operation are you waiting for?”. Instead, with Tokio you use special I/O primitives that still provide futures but also register themselves with the scheduler in thread local state. This way when a future is waiting for I/O, Tokio can check what the most recent I/O operation was, and associate it with that future so that it can wake up that future again when epoll tells it that that I/O operation is ready.

  4. The Generator trait has a resume() function which you can call multiple times, and each time it will return any yielded data or tell you that the generator has finished running.

Rust in 2018

A week ago we put out a call for blog posts for what folks think Rust should do in 2018.

This is mine.

Overall focus

I think 2017 was a great year for Rust. Near the beginning of the year, after custom derive and a bunch of things stabilized, I had a strong feeling that Rust was “complete”. Not really “finished”, there’s still tons of stuff to improve, but this was the first time stable Rust was the language I wanted it to be, and was something I could recommend for most kinds of work without reservations.

I think this is a good signal to wind down the frightening pace of new features Rust has been getting. And that happened! We had the impl period, which took some time to focus on getting things done before proposing new things. And Rust is feeling more polished than ever.

Like Nick, I feel like 2018 should be boring. I feel like we should focus on polishing what we have, implementing all the things, and improving our approachability as a language.

Basically, I want to see this as an extended impl period.

This doesn’t mean I’m looking for a moratorium on RFCs, really. Hell, in the past few days I’ve posted one pre-pre-RFC1, one pre-RFC, and one RFC (from the pre-RFC). I’m mostly looking for prioritizing impl work over designing new things, but still having some focus on design.

Language

I think Rust still has some “missing bits” which make it hard to justify for some use cases. Rust’s async story is being fleshed out. We don’t yet have stable SIMD or stable inline ASM. The microcontroller story is kinda iffy. RLS/clippy need nightly. I’d like to see these crystallize and stabilize this year.

I think this year we need to continue to take a critical look at Rust’s ergonomics. Last year the ergonomics initiative was really good for Rust, and I’d like to see more of that. This is kind of at odds with my “focus on polishing Rust” statement, but fixing ergonomics is not just new features. It’s also about figuring out barriers in Rust, polishing mental models, improving docs/diagnostics, and in general figuring out how to best present Rust’s features. Starting dialogues about confusing bits of the language and figuring out the best mental model to present them with is something we should continue doing. Sometimes this may need new features, indeed, but not always. We must continue to take a critical look at how our language presents itself to newcomers.

Community

I’d like to see a stronger focus on mentoring. Mentoring on rustc, mentoring on major libraries, mentoring on Rust tooling, mentoring everywhere. This includes not just the mentors, but the associated infrastructure – contribution docs, sites like servo-starters and findwork, and similar tooling.

I’m also hoping for more companies to invest back into Rust. This year Buoyant became pretty well known within the community, and many of their employees are paid to work on various important parts of the Rust ecosystem. There are also multiple consulting groups that contribute to the ecosystem. It’s nice to see that “paid to work on Rust” is no longer limited to Mozilla, and this is crucial for the health of the language. I hope this trend continues.

Finally, I want to see more companies talk about Rust. Success stories are really nice to hear. I’ve heard many amazing success stories this year, but a lot of them are things which can’t be shared.

Governance

Last year we started seeing the limits of the RFC process. Large RFCs were stressful for both the RFC authors and participating community members, and rather opaque for newer community members wishing to participate. Alternative models have been discussed; I’d like to see more movement on this front.

I’d also like to grow the moderation team; it is currently rather small and doesn’t have the capacity to handle incidents in a timely fashion.

Docs / Learning

I’d like to see a focus on improving Rust for folks who learn the language by trying things over reading books 2 3.

This means better diagnostics, better alternative resources like rustbyexample, etc. Improving mentorship helps here as well.

Of course, I’d like to see our normal docs work continue to happen.


I’m overall really excited for 2018. I think we’re doing great on most fronts so far, and if we maintain the momentum we’ll have an even-more-awesome Rust by the end of this year!


  1. This isn’t a “pre rfc” because I’ve written it as a much looser sketch of the problem and a solution

  2. There is literally no programming language I’ve personally learned through a book or formal teaching. I’ve often read books after I know a language because it’s fun and instructive, but it’s always started out as “learn extreme basics” followed by “look at existing code, tweak stuff, and write your own code”.

  3. Back in my day Rust didn’t have a book, just this tiny thing called “The Tutorial”. grouches incessantly

Undefined vs Unsafe in Rust

Recently Julia Evans wrote an excellent post about debugging a segfault in Rust. (Go read it, it’s good)

One thing it mentioned was

I think “undefined” and “unsafe” are considered to be synonyms.

This is … incorrect. However, we in the Rust community have never really explicitly outlined the distinction, so that confusion is on us! This blog post is an attempt to clarify the difference of terminology as used within the Rust community. It’s a very useful but subtle distinction and I feel we’d be able to talk about safety more expressively if this was well known.

Unsafe means two things in Rust, yay

So, first off, the waters are a bit muddied by the fact that Rust uses unsafe to mean both “within an unsafe {} block” and “something Bad is happening here”. It’s possible to have safe code within an unsafe block; indeed this is the primary function of an unsafe block. Somewhat counterintuitively, the unsafe block’s purpose is to actually tell the compiler “I know you don’t like this code but trust me, it’s safe!” (where “safe” is the negation of the second meaning of “unsafe”, i.e. “something Bad is not happening here”).

Similarly, we use “safe code” to mean “code not using unsafe{} blocks” but also “code that is not unsafe”, i.e. “code where nothing bad happens”.

This blog post is primarily about the “something bad is happening here” meaning of “unsafe”. When referring to the other kind I’ll specifically say “code within unsafe blocks” or something like that.

Undefined behavior

In languages like C, C++, and Rust, undefined behavior is when you reach a point where the compiler is allowed to do anything with your code. This is distinct from implementation-defined behavior, where usually a given compiler/library will do a deterministic thing, however they have some freedom from the spec in deciding what that thing is.

Undefined behavior can be pretty scary. This is usually because in practice it causes problems when the compiler assumes “X won’t happen because it is undefined behavior”, and X ends up happening, breaking the assumptions. In some cases this does nothing dangerous, but often the compiler will end up doing wacky things to your code. Dereferencing a null pointer will sometimes cause segfaults (which is the compiler generating code that actually dereferences the pointer, making the kernel complain), but sometimes it will be optimized in a way that assumes it won’t and moves around code such that you have major problems.

Undefined behavior is a global property, based on how your code is used. The following function in C++ or Rust may or may not exhibit undefined behavior, based on how it gets used:

int deref(int* x) {
    return *x;
}

// do not try this at home
fn deref(x: *mut u32) -> u32 {
    unsafe { *x }
}

As long as you always call it with a valid pointer to an integer, there is no undefined behavior involved.

But in either language, if you use it with some pointer conjured out of thin air (or, like 0x01), that’s probably undefined behavior.

As it stands, UB is a property of the entire program and its execution. Sometimes you may have snippets of code that will always exhibit undefined behavior regardless of how they are called, but in general UB is a global property.

Unsafe behavior

Rust’s concept of “unsafe behavior” (I’m coining this term because “unsafety” and “unsafe code” can be a bit confusing) is far more scoped. Here, fn deref is “unsafe”1, even if you always call it with a valid pointer. The reason it is still unsafe is because it’s possible to trigger UB by only changing the “safe” caller code. I.e. “changes to code outside unsafe blocks can trigger UB if they include calls to this function”.

Basically, in Rust a bit of code is “safe” if it cannot exhibit undefined behavior under all circumstances of that code being used. The following code exhibits “safe behavior”:

unsafe {
    let x = 1;
    let raw = &x as *const u32;
    println!("{}", *raw);
}

We dereferenced a raw pointer, but we knew it was valid. Of course, actual unsafe blocks will usually be “actually totally safe” for less obvious reasons, and part of this is because unsafe blocks sometimes can pollute the entire module.
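
A classic sketch of that pollution (my own example): the unsafe block below is only sound because of an invariant that safe code elsewhere in the same module maintains.

mod container {
    pub struct TinyVec {
        data: [u8; 4],
        len: usize, // invariant: len <= 4, upheld by all code in this module
    }

    impl TinyVec {
        pub fn new() -> Self {
            TinyVec { data: [0; 4], len: 0 }
        }

        pub fn push(&mut self, byte: u8) {
            if self.len < 4 {
                self.data[self.len] = byte;
                self.len += 1; // a bug in this *safe* code...
            }
        }

        pub fn last(&self) -> Option<u8> {
            if self.len == 0 {
                return None;
            }
            // ...would make this unchecked access undefined behavior.
            unsafe { Some(*self.data.get_unchecked(self.len - 1)) }
        }
    }
}

fn main() {
    let mut v = container::TinyVec::new();
    v.push(42);
    assert_eq!(v.last(), Some(42));
}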

Basically, “safe” in Rust is a more local property. Code isn’t safe just because you only use it in a way that doesn’t trigger UB, it is safe because there is literally no way to use it such that it will do so. No way to do so without using unsafe blocks, that is2.

This is a distinction that’s possible to draw in Rust because it gives us the ability to compartmentalize safety. Trying to apply this definition to C++ is problematic; you can ask “is std::unique_ptr<T> safe?”, but you can always use it within code in a way that you trigger undefined behavior, because C++ does not have the tools for compartmentalizing safety. The distinction between “code which doesn’t need to worry about safety” and “code which does need to worry about safety” exists in Rust in the form of “code outside of unsafe {}” and “code within unsafe {}”, whereas in C++ it’s a lot fuzzier and based on expectations (and documentation/the spec).

So C++’s std::unique_ptr<T> is “safe” in the sense that it does what you expect but if you use it in a way counter to how it’s supposed to be used (constructing one from an invalid pointer, for example) it can blow up. This is still a useful sense of safety, and is how one regularly reasons about safety in C++. However it’s not the same sense of the term as used in Rust, which can be a bit more formal about what the expectations actually are.

So unsafe in Rust is a strictly more general concept – all code exhibiting undefined behavior in Rust is also “unsafe”, however not all “unsafe” code in Rust exhibits undefined behavior as written in the current program.

Rust furthermore attempts to guarantee that you will not trigger undefined behavior if you do not use unsafe {} blocks. This of course depends on the correctness of the compiler (it has bugs) and of the libraries you use (they may also have bugs) but this compartmentalization gets you most of the way there in having UB-free programs.


  1. Once again we have a slight difference between an “unsafe fn”, i.e. a function that needs an unsafe block to call and probably is unsafe, and an “unsafe function”, a function that exhibits unsafe behavior.

  2. This caveat and the confusing dual-usage of the term “safe” lead to the rather tautological-sounding sentence “Safe Rust code is Rust code that cannot cause undefined behavior when used in safe Rust code”

Font-size: An Unexpectedly Complex CSS Property

font-size is the worst.

It’s a CSS property probably everyone who writes CSS has used at some point. It’s pretty ubiquitous.

And it’s super complicated.

“But it’s just a number”, you say. “How can that be complicated?”

I too felt that way one time. And then I worked on implementing it for stylo.

Stylo is the project to integrate Servo’s styling system into Firefox. The styling system handles parsing CSS, determining which rules apply to which elements, running this through the cascade, and eventually computing and assigning styles to individual elements in the tree. This happens not only on page load, but also whenever various kinds of events (including DOM manipulation) occur, and is a nontrivial portion of pageload and interaction times.

Servo is in Rust, and makes use of Rust’s safe parallelism in many places, one of them being styling. Stylo has the potential to bring these speedups into Firefox, along with the added safety of the code being in a safer systems language.

Anyway, as far as the styling system is concerned, I believe that font-size is the most complex property it has to handle. Some properties may be more complicated when it comes to layout or rendering, but font-size is probably the most complex one in the department of styling.

I’m hoping this post can give an idea of how complex the Web can get, and also serve as documentation for some of these complexities. I’ll also try to give an idea of how the styling system works throughout this post.

Alright. Let’s see what is so complex about font-size.

The basics

The syntax of the property is pretty straightforward. You can specify it as:

  • A length (12px, 15pt, 13em, 4in, 8rem)
  • A percentage (50%)
  • A compound of the above, via a calc (calc(12px + 4em + 20%))
  • An absolute keyword (medium, small, large, x-large, etc)
  • A relative keyword (larger, smaller)

The first three are common amongst quite a few length-related properties. Nothing abnormal in the syntax.

The next two are interesting. Essentially, the absolute keywords map to various pixel values, and match the result of <font size=foo> (e.g. size=3 is the same as font-size: medium). The actual value they map to is not straightforward, and I’ll get to that later in this post.

The relative keywords basically scale the size up or down. The mechanism of the scaling was also complex, however this has changed. I’ll get to that too.

em and rem units

First up: em units. One of the things you can specify in any length-based CSS property is a value with an em or rem unit.

5em means “5 times the font-size of the element this is applied to”. 5rem means “5 times the font-size of the root element”.

The implications of this are that font-size needs to be computed before all the other properties (well, not quite, but we’ll get to that!) so that it is available during that time.

You can also use em units within font-size itself. In this case, it is computed relative to the font-size of the parent element, since you can’t use the font-size of the element to compute itself.
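
As a rough sketch, here's how a style system might resolve these units. This is illustrative Rust with made-up names, not Servo's actual code:

enum Unit { Px, Em, Rem }

/// Illustrative only: resolve a font-relative length to pixels.
/// `self_size` is the font-size of the element itself, `parent_size`
/// that of its parent, and `root_size` that of the root element.
fn resolve_px(value: f32, unit: Unit, computing_font_size: bool,
              self_size: f32, parent_size: f32, root_size: f32) -> f32 {
    match unit {
        Unit::Px => value,
        // em is relative to the element's own font-size, except when
        // computing font-size itself, where it uses the parent's size
        // (we can't use a value we haven't computed yet)
        Unit::Em => value * (if computing_font_size { parent_size } else { self_size }),
        // rem is always relative to the root element's font-size
        Unit::Rem => value * root_size,
    }
}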

Minimum font size

Browsers let you set a “minimum” font size in their preferences, and text will not be scaled below it. It’s useful for those with trouble seeing small text.

However, this doesn’t affect properties which depend on font-size via em units. So if you’re using a minimum font size, <div style="font-size: 1px; height: 1em; background-color: red"> will have a very tiny height (which you’ll notice from the color), but the text will be clamped to the minimum size.

What this effectively means is that you need to keep track of two separate computed font size values. There’s one value that is used to actually determine the font size used for the text, and one value that is used whenever the style system needs to know the font-size (e.g. to compute an em unit.)
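
In sketch form, the bookkeeping might look like this (hypothetical field names):

/// Illustrative: the two font sizes a style system has to carry
/// around when a minimum font size is configured.
struct ComputedFontSize {
    /// The value everything else (em units, etc.) is computed against.
    size: f32,
    /// The value the text is actually drawn with: the same size,
    /// but clamped to the user's minimum font size.
    used_size: f32,
}

fn apply_min_font_size(computed: f32, min: f32) -> ComputedFontSize {
    ComputedFontSize {
        size: computed,               // em units still see the tiny value
        used_size: computed.max(min), // but glyphs never go below `min`
    }
}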

This gets slightly more complicated when ruby is involved. In ideographic scripts (usually, Han and Han-based scripts like Kanji or Hanja) it’s sometimes useful to have the pronunciation of each character above it in a phonetic script, for the aid of readers without proficiency in that script, and this is known as “ruby” (“furigana” in Japanese). Because these scripts are ideographic, it’s not uncommon for learners to know the pronunciation of a word but have no idea how to write it. An example would be 日本 (“nihon”, i.e. “Japan”) in Kanji with the ruby にほん in the phonetic Hiragana script above it.

As you can probably see, the phonetic ruby text is in a smaller font size (usually 50% of the font size of the main text1). The minimum font-size support respects this, and ensures that if the ruby is supposed to be 50% of the size of the text, the minimum font size for the ruby is 50% of the original minimum font size. This avoids clamping from setting the ruby and its base text to the same size, which is pretty ugly.

Text zoom

Firefox additionally lets you zoom just the text of a page when zooming. If you have trouble reading small things, it’s great to be able to blow up the text on the page without having the whole page get zoomed (which would mean scrolling around a lot).

In this case, em units of other properties do get zoomed as well. After all, they’re supposed to be relative to the text’s font size (and may have some relation to the text), so if that size has changed so should they.

(Of course, that argument could also apply to the min font size stuff. I don’t have an answer for why it doesn’t.)

This is actually pretty straightforward to implement. When computing absolute font sizes (including keywords), zoom them if text zoom is on. For everything else continue as normal.

Text zoom is also disabled within <svg:text> elements, which leads to some trickiness here.
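
A sketch of the rule, with the svg-text carve-out (hypothetical names; the real code threads a zoom flag through the computation context):

/// Illustrative: apply text zoom when computing an absolute font size
/// (a px value or an absolute keyword). Relative units like em pick up
/// the zoom automatically, since they resolve against already-zoomed
/// sizes.
fn zoom_absolute_size(size_px: f32, zoom_factor: f32, in_svg_text: bool) -> f32 {
    if in_svg_text {
        size_px // text zoom is disabled inside <svg:text>
    } else {
        size_px * zoom_factor
    }
}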

Interlude: How the style system works

Before I go ahead it’s probably worth giving a quick overview of how everything works.

The responsibility of a style system is to take in CSS code and a DOM tree, and assign computed styles to each element.

There’s a distinction between “specified” and “computed” here. “specified” styles are in the format you specify in CSS, whereas computed styles are those that get attached to the elements, sent to layout, and inherited from. A given specified style may compute to different values when applied to different elements.

So while you can specify width: 5em, it will compute to something like width: 80px. Computed values are usually a cleaned up form of the specified value.

The style system will first parse the CSS, producing a bunch of rules usually containing declarations (a declaration is like width: 20%;, i.e. a property name and a specified value).

It then goes through the tree in top-down order (this is parallelized in Stylo), figuring out which declarations apply to each element and in which order – some declarations have precedence over others. Then it will compute each relevant declaration against the element’s style (and parent style, among other bits of info), and store this value in the element’s “computed style”.

There are a bunch of optimizations that Gecko and Servo do here to avoid duplicated work2. There’s a bloom filter for quickly checking if deep descendant selectors apply to a subtree. There’s a “rule tree” that helps cache effort from determining applicable declarations. Computed styles are reference counted and shared very often (since the default state is to inherit from the parent or from the default style).

But ultimately, this is the gist of what happens.

Keyword values

Alright, this is where it gets complicated.

Remember when I said font-size: medium was a thing that mapped to a value?

So what does it map to?

Well, it turns out, it depends on the font family. For the following HTML:

<span style="font: medium monospace">text</span>
<span style="font: medium sans-serif">text</span>

you get (codepen)

text text

where the first one computes to a font-size of 13px, and the second one computes to a font-size of 16px. You can check this in the computed style pane of your devtools, or by using getComputedStyle().

I think the reason behind this is that monospace fonts tend to be wider, so the default font size (medium) is scaled so that they have similar widths, and all other keyword font sizes get shifted as well.

Firefox and Servo have a matrix that helps derive the values for all the absolute font-size keywords based on the “base size” (i.e. the computed value of font-size: medium). Actually, Firefox has three tables to support some legacy use cases like quirks mode (Servo has yet to add support for these tables). We query other parts of the browser for what the “base size” is based on the language and font family.
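
The lookup is shaped roughly like this. The ratios below are in the spirit of the scaling table suggested by CSS and are for illustration only; the real Gecko tables are keyed differently and have quirks-mode variants:

#[derive(Clone, Copy)]
enum Keyword { XxSmall, XSmall, Small, Medium, Large, XLarge, XxLarge }

/// Illustrative only: derive an absolute keyword size from the
/// "base size" (the computed value of `medium`) for the current
/// family/language.
fn keyword_size(keyword: Keyword, base_size: f32) -> f32 {
    let ratio = match keyword {
        Keyword::XxSmall => 3.0 / 5.0,
        Keyword::XSmall => 3.0 / 4.0,
        Keyword::Small => 8.0 / 9.0,
        Keyword::Medium => 1.0,
        Keyword::Large => 6.0 / 5.0,
        Keyword::XLarge => 3.0 / 2.0,
        Keyword::XxLarge => 2.0,
    };
    base_size * ratio
}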

Wait, but what does the language have to do with this anyway? How does the language impact font-size?

It turns out that the base size depends on the font family and the language, and you can configure this.

Both Firefox and Chrome (using an extension) actually let you tweak which fonts get used on a per-language basis, as well as the default (base) font-size.

This is not as obscure as one might think. Default system fonts are often really ugly for non-Latin-using scripts. I have a separate font installed that produces better-looking Devanagari ligatures.

Similarly, some scripts are just more intricate than Latin. My default font size for Devanagari is set to 18 instead of 16. I’ve started learning Mandarin and I’ve set that font size to 18 as well. Hanzi glyphs can get pretty complicated and I still struggle to learn (and later recognize) them. A larger font size is great for this.

Anyway, this doesn’t complicate things too much. This does mean that the font family needs to be computed before font-size, which already needs to be computed before most other properties. The language, which can be set using a lang HTML attribute, is internally treated as a CSS property by Firefox since it inherits, and it must be computed earlier as well.

Not too bad. So far.

Now here’s the kicker. This dependence on the language and family inherits.

Quick, what’s the font-size of the inner div?

<div style="font-size: medium; font-family: sans-serif;"> <!-- base size 16 -->
    font size is 16px
    <div style="font-family: monospace"> <!-- base size 13 -->
        font size is ??
    </div>
</div>

For a normal inherited CSS property3, if the parent has a computed value of 16px, and the child has no additional values specified, the child will inherit a value of 16px. Where the parent got that computed value from doesn’t matter.

Here, font-size “inherits” a value of 13px. You can see this below (codepen):

font size is 16px
font size is ??

Basically, if the computed value originated from a keyword, whenever the font family or language change, font-size is recomputed from the original keyword with the new font family and language.

The reason this exists is because otherwise the differing font sizes wouldn’t work anyway! The default font size is medium, so basically the root element gets a font-size: medium and all elements inherit from it. If you change to monospace or a different language in the document you need the font-size recomputed.

But it doesn’t stop here. This even inherits through relative units (Not in IE).

<div style="font-size: medium; font-family: sans-serif;"> <!-- base size 16 -->
    font size is 16px
    <div style="font-size: 0.9em"> <!-- could also be font-size: 50%-->
        font size is 14.4px (16 * 0.9)
        <div style="font-family: monospace"> <!-- base size 13 -->
            font size is 11.7px! (13 * 0.9)
        </div>
    </div>
</div>

(codepen)

font size is 16px
font size is 14.4px (16 * 0.9)
font size is 11.7px! (13 * 0.9)

So we’re actually inheriting a font-size of 0.9*medium when we inherit from the second div, not 14.4px.

Another way of looking at it is whenever the font family or language changes, you should recompute the font-size as if the language and family were always that way up the tree.

Firefox code uses both of these strategies. The original Gecko style system handles this by actually going back to the top of the tree and recalculating the font size as if the language/family were different. I suspect this is inefficient, but the rule tree seems to be involved in making it slightly less so.

Servo, on the other hand, stores some extra data on the side when computing stuff, data which gets copied over to the child element. It basically stores the equivalent of saying “Yes, this font was computed from a keyword. The keyword was medium, and after that we applied a factor of 0.9 to it.”4
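
A sketch of that side data (names approximate; per footnote 4 this doesn't handle calcs yet, which would additionally need an absolute offset):

#[derive(Clone, Copy)]
enum SizeKeyword { Medium, Small, Large /* ... */ }

/// Illustrative: "this size came from a keyword, times a ratio".
struct KeywordInfo {
    keyword: SizeKeyword,
    /// Accumulated factor from relative units applied on top, e.g.
    /// 0.9 for `font-size: 0.9em` under a `medium` ancestor.
    factor: f32,
}

/// When font-family or language changes on a child, don't inherit the
/// parent's pixel value; re-derive it from the keyword against the new
/// family/language's base size.
fn recompute(info: &KeywordInfo, lookup_base: impl Fn(SizeKeyword) -> f32) -> f32 {
    lookup_base(info.keyword) * info.factor
}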

In both cases, this leads to a bunch of complexities in all the other font-size complexities, since they need to be carefully preserved through this.

In Servo, most of this gets handled via custom cascading functions for font-size.

Larger/smaller

So I mentioned that font-size: larger and smaller scale the size, but didn’t mention by what fraction.

According to the spec, if the font-size currently matches the value of an absolute keyword size (medium/large/etc), you should pick the value of the next/previous keyword sizes respectively.

If it is between two, find the same point between the next/previous two sizes.
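
Roughly, the spec's algorithm for larger looked like this (a loose sketch that elides edge cases at the ends of the table):

/// Illustrative: `table` holds the absolute keyword sizes for the
/// current family/language, in increasing order.
fn larger(current: f32, table: &[f32]) -> f32 {
    for i in 0..table.len() - 1 {
        let (lo, hi) = (table[i], table[i + 1]);
        if current >= lo && current < hi {
            // keep the same relative position, one keyword slot up
            let t = (current - lo) / (hi - lo);
            let next_hi = table[(i + 2).min(table.len() - 1)];
            return hi + t * (next_hi - hi);
        }
    }
    current * 1.2 // off the end of the table: fall back to a ratio
}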

This, of course, must play well with the weird inheritance of keyword font sizes mentioned before. In Gecko’s model this isn’t too hard, since Gecko recalculates things anyway. In Servo’s model we’d have to store a sequence of applications of larger/smaller and relative units, instead of storing just a relative unit.

Additionally, when computing this during text-zoom, you have to unzoom before looking it up in the table, and then rezoom.

Overall, a bunch of complexity for not much gain — it turns out only Gecko actually followed the spec here! All other browser engines used simple ratios.

So my fix here was simply to remove this behavior from Gecko. That simplified things.

MathML

Firefox and Safari support MathML, a markup language for math. It doesn’t get used much on the Web these days, but it exists.

MathML has its own complexities when it comes to font-size. Specifically, scriptminsize, scriptlevel, and scriptsizemultiplier.

For example, in MathML, the text in the numerator or denominator of a fraction or the text of a superscript is 0.71 times the size of the text outside of it. This is because the default scriptsizemultiplier for MathML elements is 0.71, and these specific elements all get a default scriptlevel of +1.

Basically, scriptlevel=+1 means “multiply the font size by scriptsizemultiplier”, and scriptlevel=-1 is for dividing. This can be specified via a scriptlevel HTML attribute on an mstyle element. You can similarly tweak the (inherited) multiplier via the scriptsizemultiplier HTML attribute, and the minimum size via scriptminsize.

So, for example:

<math><msup>
    <mi>text</mi>
    <mn>small superscript</mn>
</msup></math><br>
<math>
    text
    <mstyle scriptlevel=+1>
        small
        <mstyle scriptlevel=+1>
            smaller
            <mstyle scriptlevel=-1>
                small again
            </mstyle>
        </mstyle>
    </mstyle>
</math>

will show as (you will need Firefox to see the rendered version; Safari supports MathML too, but the support isn’t as good):

textsmall superscript
text small smaller small again

(codepen)

So this isn’t as bad. It’s as if scriptlevel is a weird em unit. No biggie, we know how to deal with those already.

Except you also have scriptminsize. This lets you set the minimum font size for changes caused by scriptlevel.

This means that scriptminsize will make sure scriptlevel never causes changes that make the font smaller than the min size, but it will ignore cases where you deliberately specify an em unit or a pixel value.

There’s already a subtle bit of complexity introduced here: scriptlevel now becomes another thing that tweaks how font-size inherits. Fortunately, scriptlevel (like scriptminsize and scriptsizemultiplier) is internally handled as a CSS property by Firefox/Servo, which means that we can use the same framework we used for font-family and language here – compute the script properties before font-size, and if scriptlevel is set, force-recalculate the font size even if font-size itself was not set.

Interlude: early and late computed properties

In Servo the way we handle dependencies in properties is to have a set of “early” properties and a set of “late” properties (which are allowed to depend on early properties). We iterate the declarations twice, once looking for early properties, and once for late. However, now we have a pretty intricate set of dependencies, where font-size must be calculated after language, font-family, and the script properties, but before everything else that involves lengths. Additionally, font-family has to be calculated after all the other early properties due to another font complexity I’m not covering here.

The way we handle this is to pull font-size and font-family out during the early computation, but not deal with them until after the early computation is done.

At that stage we first handle the disabling of text-zoom, and then handle the complexities of font-family.

We then compute the font size. If a font size was specified, we just compute that. If it was not, but a font family, lang, or scriptlevel was specified, we force compute as inherited, which handles all the constraints.
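
A loose sketch of this ordering (all types and names here are hypothetical):

// Hypothetical minimal types, just to show the shape of the algorithm.
enum Kind { Early, FontFamily, FontSize, Late }
struct Declaration { kind: Kind }
impl Declaration {
    fn compute(&self) { /* compute this property's value */ }
}

fn cascade(declarations: &[Declaration]) {
    let (mut family, mut size) = (None, None);

    // first pass: early properties (language, the script properties, ...),
    // pulling font-family and font-size out for later
    for d in declarations {
        match d.kind {
            Kind::FontFamily => family = Some(d),
            Kind::FontSize => size = Some(d),
            Kind::Early => d.compute(),
            Kind::Late => {}
        }
    }

    // font-family after the other early properties, then font-size;
    // if font-size is unspecified but family/lang/scriptlevel changed,
    // it still has to be force-recomputed as inherited
    if let Some(d) = family { d.compute(); }
    if let Some(d) = size { d.compute(); }

    // second pass: late properties, which may depend on font-size
    for d in declarations {
        if let Kind::Late = d.kind { d.compute(); }
    }
}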

Why scriptminsize gets complicated

Unlike with the other “minimum font size”, when the font size has been clamped by scriptminsize, using an em unit in any property will calculate the length from the clamped value, not the “if nothing had been clamped” value. So at first glance handling this seems straightforward: only consider the script min size when scaling because of scriptlevel.

As always, it’s not that simple 😀:

<math>
<mstyle scriptminsize="10px" scriptsizemultiplier="0.75" style="font-size:20px">
    20px
    <mstyle scriptlevel="+1">
        15px
        <mstyle scriptlevel="+1">
            11.25px
            <mstyle scriptlevel="+1">
                would be 8.4375, but is clamped at 10px
                <mstyle scriptlevel="+1">
                    would be 6.328125, but is clamped at 10px
                    <mstyle scriptlevel="-1">
                        This is not 10px/0.75=13.3, rather it is still clamped at 10px
                        <mstyle scriptlevel="-1">
                            This is 11.25px again
                            <mstyle scriptlevel="-1">
                                This is 15px again
                            </mstyle>
                        </mstyle>
                    </mstyle>
                </mstyle>
            </mstyle>
        </mstyle>
    </mstyle>
</mstyle>
</math>

(codepen)

Basically, if you increase the level a bunch of times after hitting the min size, decreasing it by one should not immediately compute min size / multiplier. That would make things asymmetric; something with a net script level of +5 should have the same size as something with a net script level of +6 -1, provided the multiplier hasn’t changed.

So what happens is that the script level is calculated against the font size as if scriptminsize had never applied, and we only use that size if it is greater than the min size.

It’s not just a matter of keeping track of the script level at which clamping happened – the multiplier could change in the process and you need to keep track of that too. So this ends up in creating yet another font-size value to inherit.

To recap, we are now at four different notions of font size being inherited:

  • The main font size used by styling
  • The “actual” font size, i.e. the main font size but clamped by the min size
  • (In Servo only) The “keyword” size; i.e. the size stored as a keyword and ratio, if it was derived from a keyword
  • The “script unconstrained” size; the font size as if scriptminsize never existed.

Another complexity here is that the following should still work:

<math>
<mstyle scriptminsize="10px" scriptsizemultiplier="0.75" style="font-size: 5px">
    5px
    <mstyle scriptlevel="-1">
        6.666px
    </mstyle>
</mstyle>
</math>

(codepen)

Basically, if you were already below the scriptminsize, reducing the script level (to increase the font size) should not get clamped, since then you’d get something too large.

This basically means you only apply scriptminsize if you are applying the script level to a value greater than the script min size.
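
Putting those rules together, the core computation looks something like this sketch (made-up names, eliding zoom and the other font-size interactions):

/// Illustrative: apply a scriptlevel change, tracking both the used
/// size and the "as if scriptminsize never existed" size.
fn apply_scriptlevel(
    parent_size: f32,          // parent's used font size
    parent_unconstrained: f32, // parent's size had scriptminsize never applied
    multiplier: f32,           // scriptsizemultiplier
    delta: i32,                // scriptlevel change, e.g. +1
    min_size: f32,             // scriptminsize
) -> (f32, f32) {
    let scale = multiplier.powi(delta);
    // always scale the unconstrained size: a net +5 must match +6 then -1
    let unconstrained = parent_unconstrained * scale;
    let size = if parent_size >= min_size {
        // applying scriptlevel to a value at or above the min: clamp
        unconstrained.max(min_size)
    } else {
        // already below the min (it was specified that small directly):
        // don't clamp, or scaling up would jump straight to the min size
        unconstrained
    };
    (size, unconstrained)
}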

In Servo, all of the MathML handling culminates in this wonderful function that is more comment than code, and some code in the functions near it.


So there you have it. font-size is actually pretty complicated. A lot of the web platform has hidden complexities like this, and it’s always fun to encounter more of them.

(Perhaps less fun when I have to implement them 😂)

Thanks to mystor, mgattozzi, bstrie, and projektir for reviewing drafts of this post


  1. Interestingly, in Firefox, this number is 50% for all ruby except for when the language is Taiwanese Mandarin (where it is 30%). This is because Taiwan uses a phonetic script called Bopomofo, and each Han glyph can be represented as a maximum of 3 Bopomofo letters. So it is possible to choose a reasonable minimum size such that the ruby never extends beyond the glyph below it. On the other hand, pinyin can be up to six letters, and Hiragana up to (I think) 5, and the corresponding “no overflow” scaling will be too tiny. So fitting them on top of the glyph is not a consideration and instead we elect to have a larger font size for better readability. Additionally, Bopomofo ruby is often set on the side of the glyph instead of on top, and 30% works better there. (h/t @upsuper for pointing this out)

  2. Other browser engines have other optimizations, I’m just less familiar with them

  3. Some properties are inherited, some are “reset”. For example, font-family is inherited — child elements inherit font family from the parent unless otherwise specified. However, transform is not; transforming an element does not further transform its children.

  4. This won’t handle calcs, which is something I need to fix. Fixing this is trivial, you store an absolute offset in addition to the ratio.

Teaching Programming: Proactive vs Reactive

I’ve been thinking about this a lot these days. In part because of an idea I had but also due to this twitter discussion.

When teaching most things, there are two non-mutually-exclusive ways of approaching the problem. One is “proactive”1, which is where the teacher decides a learning path beforehand, and executes it. The other is “reactive”, where the teacher reacts to the student trying things out and dynamically tailors the teaching experience.

Most in-person teaching experiences are a mix of both. Planning beforehand is very important whilst teaching, but tailoring the experience to the student’s reception of the things being taught is important too.

In person, you can mix these two, and in doing so you get a “best of both worlds” situation. Yay!

But … we don’t really learn much programming in a classroom setup. Sure, some folks learn the basics in college for a few years, but everything they learn after that isn’t in a classroom situation where this can work2. I’m an autodidact, and while I have taken a few programming courses for random interesting things, I’ve taught myself most of what I know using various sources. I care a lot about improving the situation here.

With self-driven learning we have a similar divide. The “proactive” model corresponds to reading books and docs. Various people have proactively put forward a path for learning in the form of a book or tutorial. It’s up to you to pick one, and follow it.

The “reactive” model is not so well-developed. In the context of self-driven learning in programming, it’s basically “do things, make mistakes, hope that Google/Stackoverflow help”. It’s how a lot of people learn programming; and it’s how I prefer to learn programming.

It’s very nice to be able to “learn along the way”. While this is a long and arduous process, involving many false starts and a lack of a sense of progress, it can be worth it in terms of the kind of experience this gets you.

But as I mentioned, this isn’t as well-developed. With the proactive approach, there still is a teacher – the author of the book! That teacher may not be able to respond in real time, but they’re able to set forth a path for you to work through.

On the other hand, with the “reactive” approach, there is no teacher. Sure, there are Random Answers on the Internet, which are great, but they don’t form a coherent story. Neither can you really be your own teacher for a topic you do not understand.

Yet plenty of folks do this. Plenty of folks approach things like learning a new language by reading at most two pages of docs and then just diving straight in and trying stuff out. The only language I have not done this for is the first language I learned3 4.

I think it’s unfortunate that folks who prefer this approach don’t get the benefit of a teacher. In the reactive approach, teachers can still tell you what you’re doing wrong and steer you away from tarpits of misunderstanding. They can get you immediate answers and guidance. When we look for answers on stackoverflow, we get some of this, but it also involves a lot of pattern-matching on the part of the student, and we end up with a bad facsimile of what a teacher can do for you.

But it’s possible to construct a better teacher for this!

In fact, examples of this exist in the wild already!

The Elm compiler is my favorite example of this. It has amazing error messages.

The error messages tell you what you did wrong, sometimes suggest fixes, and help correct potential misunderstandings.

Rust does this too. Many compilers do. (Elm is exceptionally good at it)

One thing I particularly like about Rust is that from that error you can try rustc --explain E0373 and get a terminal-friendly version of this help text.

Anyway, diagnostics basically provide a reactive component to learning programming. I’ve cared about diagnostics in Rust for a long time, and I often remind folks that many things taught through the docs can/should be taught through diagnostics too. Especially because diagnostics are a kind of soapbox for compiler writers — you can’t guarantee that your docs will be read, but you can guarantee that your error messages will. These days, while I don’t have much time to work on stuff myself I’m very happy to mentor others working on improving diagnostics in Rust.

Only recently did I realize why I care about them so much – they cater exactly to my approach to learning programming languages! If I’m not going to read the docs when I get started and try the reactive approach, having help from the compiler is invaluable.

I think this space is relatively unexplored. Elm might have the best diagnostics out there, and as diagnostics (helping all users of a language – new and experienced), they’re great, but as a teaching tool for newcomers, they still have a long way to go. Of course, compilers like Rust are even further behind.

One thing I’d like to experiment with is a first-class tool for reactive teaching. In a sense, clippy is already something like this. Clippy looks out for antipatterns, and tries to help teach. But it also does many other things, and not all teaching moments are antipatterns.

For example, in C, this isn’t necessarily an antipattern:

struct thingy *result;
if (result = do_the_thing()) {
    frob(*result);
}

Many C codebases use if (foo = bar()). It is a potential footgun if you confuse it with ==, but there’s no way to be sure. Many compilers now have a warning for this that you can silence by doubling the parentheses, though.

In Rust, this isn’t an antipattern either:

fn add_one(mut x: u8) {
    x += 1;
}

let num = 0;
add_one(num);
// num is still 0

For someone new to Rust, they may feel that the way to have a function mutate arguments (like num) passed to it is to use something like mut x: u8. What this actually does is copy num (because u8 is a Copy type), and allow you to mutate the copy within the scope of the function. The right way to make a function that mutates arguments passed to it by-reference would be to do something like fn add_one(x: &mut u8). If you try the mut x thing for non-Copy values, you’d get a “use of moved value” error when you try to access num after calling add_one. This would help you figure out what you did wrong, and potentially that error could detect this situation and provide more specific help.
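
For reference, here's the by-reference version, which really does mutate the caller's variable:

fn add_one(x: &mut u8) {
    *x += 1; // mutates the caller's u8 through the reference
}

fn main() {
    let mut num = 0;
    add_one(&mut num);
    assert_eq!(num, 1); // actually incremented
}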

But for Copy types, the mut x version will just compile. And it’s not an antipattern – the way this works makes complete sense in the context of how Rust variables work, and is something that you do need to use at times.

So we can’t even warn on this. Perhaps in “pedantic clippy” mode, but really, it’s not a pattern we want to discourage. (At least in the C example that pattern is one that many people prefer to forbid from their codebase)

But it would be nice if we could tell a learning programmer “hey, btw, this is what this syntax means, are you sure you want to do this?”. With explanations and the ability to dismiss the error.

In fact, you don’t even need to restrict this to potential footguns!

You can detect various things the learner is trying to do. Are they probably mixing up String and &str? Help them! Are they writing a trait? Give a little tooltip explaining the feature.

This is beginning to remind me of the original “office assistant” Clippy, which was super annoying. But an opt-in tool or IDE feature which gives helpful suggestions could still be nice, especially if you can strike a balance between being so dense it is annoying and so sparse it is useless.

It also reminds me of well-designed tutorial modes in games. Some games have a tutorial mode that guides you through a set path of doing things. Other games, however, have a tutorial mode that will give you hints even if you stray off the beaten path. Michael tells me that Prey is a recent example of such a game.

This really feels like it fits the “reactive” model I prefer. The student gets to mold their own journey, but gets enough helpful hints and nudges from the “teacher” (the tool) so that they don’t end up wasting too much time and can make informed decisions on how to proceed learning.

Now, rust-clippy isn’t exactly the place for this kind of tool. This tool needs the ability to globally “silence” a hint once you’ve learned it. rust-clippy is a linter, and while you can silence lints in your code, you can’t silence them globally for the current user. Nor does that really make sense.

But rust-clippy does have the infrastructure for writing stuff like this, so it’s an ideal prototyping point. I’ve filed this issue to discuss this topic.

Ultimately, I’d love to see this as an IDE feature.

I’d also like to see more experimentation in the department of “reactive” teaching — not just tools like this.

Thoughts? Ideas? Let me know!

thanks to Andre (llogiq) and Michael Gattozzi for reviewing this


  1. This is how I’m using these terms. There seems to be precedent in pedagogy for the proactive/reactive classification, but it might not be exactly the same as the way I’m using it.

  2. This is true for everything, but I’m focusing on programming (in particular programming languages) here.

  3. And when I learned Rust, it only had two pages of docs, aka “The Tutorial”. Good times.

  4. I do eventually get around to doing a full read of the docs or a book but this is after I’m already able to write nontrivial things in the language, and it takes a lot of time to get there.

Mentally Modelling Modules

The module and import system in Rust is sadly one of the many confusing things you have to deal with whilst learning the language. A lot of these confusions stem from a misunderstanding of how it works. In explaining this I’ve seen that it’s usually a common set of misunderstandings.

In the spirit of “You’re doing it wrong”, I want to try and explain one “right” way of looking at it. You can go pretty far1 without knowing this, but it’s useful and helps avoid confusion.



First off, just to get this out of the way, mod foo; is basically a way of saying “look for foo.rs or foo/mod.rs and make a module named foo with its contents”. It’s the same as mod foo { ... } except the contents are in a different file. This itself can be confusing at first, but it’s not what I wish to focus on here. The Rust book explains this more in the chapter on modules.

In the examples here I will just be using mod foo { ... } since multi-file examples are annoying, but keep in mind that the stuff here applies equally to multi-file crates.

Motivating examples

To start off, I’m going to provide some examples of Rust code which compiles. Some of these may be counterintuitive, based on your existing model.

pub mod foo {
    extern crate regex;

    mod bar {
        use foo::regex::Regex;
    }
}

(playpen)

use std::mem;


pub mod foo {
    // not std::mem::transmute!
    use mem::transmute;

    pub mod bar {
        use foo::transmute;
    }
}

(playpen)

pub mod foo {
    use bar;
    use bar::bar_inner;

    fn foo() {
        // this works!
        bar_inner();
        bar::bar_inner();
        // this doesn't
        // baz::baz_inner();

        // but these do!
        ::baz::baz_inner();
        super::baz::baz_inner();

        // these do too!
        ::bar::bar_inner();
        super::bar::bar_inner();
        self::bar::bar_inner();

    }
}

pub mod bar {
    pub fn bar_inner() {}
}
pub mod baz {
    pub fn baz_inner() {}
}

(playpen)

pub mod foo {
    use bar::baz;
    // this won't work
    // use baz::inner;

    // this will
    use self::baz::inner;
    // or
    // use bar::baz::inner;

    pub fn foo() {
        // but this will work!
        baz::inner();
    }
}

pub mod bar {
    pub mod baz {
        pub fn inner() {}
    }
}

(playpen)

These examples remind me of the “point at infinity” in elliptic curve crypto or fake particles in physics or fake lattice elements in various fields of CS2. Sometimes, for something to make sense, you add in things that don’t normally exist. Similarly, these examples may contain code which is not traditional Rust style, but the import system still makes more sense when you include them.

Imports

The core confusion behind how imports work can really be resolved by remembering two rules:

  • use foo::bar::baz resolves foo relative to the root module (lib.rs or main.rs)
    • You can resolve relative to the current module by explicitly writing use self::foo::bar::baz
  • foo::bar::baz within your code3 resolves foo relative to the current module
    • You can resolve relative to the root by explicitly using ::foo::bar::baz

That’s actually … it. There are no further caveats. The rest of this is modelling what constitutes “being within a module”.

Let’s take a pretty standard setup, where extern crate declarations are placed in the root module:

extern crate regex;

mod foo {
    use regex::Regex;

    fn foo() {
        // won't work
        // let ex = regex::Regex::new("");
        let ex = Regex::new("");
    }
}

When we say extern crate regex, we pull the regex crate into the crate root. This behaves pretty similarly to mod regex { /* contents of regex crate */ }. Basically, we’ve imported the crate into the crate root, and since all use paths are relative to the crate root, use regex::Regex works fine inside the module.

Inline in code, regex::Regex won’t work because, as mentioned before, inline paths are relative to the current module. However, you can try ::regex::Regex::new("").

Since we’ve imported regex::Regex in mod foo, that name is now accessible to everything inside the module directly, so the code can just say Regex::new().

The way you can view this is that use blah and extern crate blah create an item named blah “within the module”, which is basically something like a symbolic link, saying “yes this item named blah is actually elsewhere but we’ll pretend it’s within the module”

The error message from this code may further drive this home:

use foo::replace;

pub mod foo {
    use std::mem::replace;
}

(playpen)

The error I get is

error: function `replace` is private
 --> src/main.rs:3:5
  |
3 | use foo::replace;
  |     ^^^^^^^^^^^^

There’s no function named replace in the module foo! But the compiler seems to think there is?

That’s because use std::mem::replace basically is equivalent to there being something like:

pub mod foo {
    fn replace(...) -> ... {
        ...
    }

    // here we can refer to `replace` freely (in inline paths)
    fn whatever() {
        // ...
        let something = replace(blah);
        // ...
    }
}

except it’s actually like a symlink to the function defined in std::mem. Because inline paths are relative to the current module, saying use std::mem::replace works as if you had defined a function replace in the same module, and you can refer to replace() without needing any extra qualification in inline paths.

This also makes pub use fit perfectly in our model. pub use says “make this symlink, but let others see it too”:

// works now!
use foo::replace;

pub mod foo {
    pub use std::mem::replace;
}


Folks often get annoyed when this doesn’t work:

mod foo {
    use std::mem;
    // nope
    // use mem::replace;
}

As mentioned before, use paths are relative to the root module. There is no mem in the root module, so this won’t work. We can make it work via self, which I mentioned before:

mod foo {
    use std::mem;
    // yep!
    use self::mem::replace;
}

Note that this brings the overloading of the self keyword up to a grand total of four, the first two of which occur in the import/path system (all four are illustrated in the snippet after the list):

  • use self::foo means “find me foo within the current module”
  • use foo::bar::{self, baz} is equivalent to use foo::bar; use foo::bar::baz;
  • fn foo(&self) lets you define methods and specify if the receiver is by-move, borrowed, mutably borrowed, or other
  • Self within implementations lets you refer to the type being implemented on
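
All four in one self-contained snippet (the Counter type is made up for illustration):

use std::io::{self, Write}; // `self` in a brace list imports `std::io` itself

mod foo {
    pub fn bar() {}
}
use foo::bar; // at the crate root this is the same as `use self::foo::bar;`

struct Counter { n: u32 }

impl Counter {
    // `Self` refers to the type being implemented on (Counter)
    fn new() -> Self {
        Counter { n: 0 }
    }
    // `&mut self`: the receiver is a mutable borrow of the Counter
    fn increment(&mut self) {
        self.n += 1;
    }
}

fn main() {
    let mut c = Counter::new();
    c.increment();
    let _ = writeln!(io::stdout(), "counter: {}", c.n);
    bar();
}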

Oh well, at least it’s not static.




Going back to one of the examples I gave at the beginning:

1
2
3
4
5
6
7
8
9
10
use std::mem;


pub mod foo {
    use mem::transmute;

    pub mod bar {
        use foo::transmute;
    }
}

(playpen)

It should be clearer now why this works. The root module imports mem. Now, from everyone’s point of view, there’s an item called mem in the root.

Within mod foo, use mem::transmute works because use is relative to the root, and mem already exists in the root! When you use something, all child modules will see it as if it were actually belonging to the module. (Non-child modules won’t see it because of privacy; we saw an example of this already.)

This is why use foo::transmute works from mod bar, too. bar can refer to the contents of foo via use foo::whatever, since foo is a child of the root module, and use is relative to the root. foo already has an item named transmute inside it because it imported one. Nothing in the parent module is private from the child, so we can use foo::transmute from bar.

Generally, the standard way of doing things is to either not use modules (just a single lib.rs), or, if you do use modules, put nothing other than extern crates and mods in the root. This is why we rarely see shenanigans like the above; there’s nothing in the crate root to import, aside from other crates specified by extern crate. The trick of “reimport something from the parent module” is also pretty rare because there’s basically no point to using that (just import it directly!). So this is not the kind of code you’ll see in the wild.



Basically, the way the import system works can be summed up as:

  • extern crate and use will act as if they were defining the imported item in the current module, like a symbolic link
  • use foo::bar::baz resolves the path relative to the root module
  • foo::bar::baz in an inline path (i.e. not in a use) will resolve relative to the current module
  • ::foo::bar::baz will always resolve relative to the root module
  • self::foo::bar::baz will always resolve relative to the current module
  • super::foo::bar::baz will always resolve relative to the parent module

Alright, on to the other half of this. Privacy.

Privacy

So how does privacy work?

Privacy, too, follows some basic rules:

  • If you can access a module, you can access all of its pub contents
  • A module can always access its child modules, but not recursively
    • This means that a module cannot access private items in its children, nor can it access private grandchildren modules
  • A child can always access its parent modules (and their parents), and all their contents
  • pub(restricted) is a proposal which extends this a bit, but it’s experimental so we won’t deal with it here

Giving some examples,

mod foo {
    mod bar {
        // can access `foo::foofunc`, even though `foofunc` is private

        pub fn barfunc() {}

    }
    // can access `foo::bar::barfunc()`, even though `bar` is private
    fn foofunc() {}
}
mod foo {
    mod bar {
        // We can access our parent and _all_ its contents,
        // so we have access to `foo::baz`. We can access
        // all pub contents of modules we have access to, so we
        // can access `foo::baz::bazfunc`
        use foo::baz::bazfunc;
    }
    mod baz {
        pub fn bazfunc() {}
    }
}

It’s important to note that this is all contextual; whether or not a particular path works is a function of where you are. For example, this works4:

pub mod foo {
    /* not pub */ mod bar {
        pub mod baz {
            pub fn bazfunc() {}
        }
        pub mod quux {
            use foo::bar::baz::bazfunc;
        }
    }
}

We are able to write the path foo::bar::baz::bazfunc even though bar is private!

This is because we still have access to the module bar, by being a descendent module.



Hopefully this is helpful to some of you. I’m not really sure how this can fit into the official docs, but if you have ideas, feel free to adapt it5!


  1. This is because most of these misunderstandings lead to a model where you think fewer things compile, which is fine as long as it isn’t too restrictive. Having a mental model where you feel more things will compile than actually do is what leads to frustration; the opposite can just be restrictive.

  2. One example closer to home is how Rust does lifetime resolution. Lifetimes form a lattice with 'static being the bottom element. There is no top element for lifetimes in Rust syntax, but internally there is the “empty lifetime” which is used during borrow checking. If something resolves to have an empty lifetime, it can’t exist, so we get a lifetime error.

  3. When I say “within your code”, I mean “anywhere but a use statement”. I may also term these as “inline paths”.

  4. Example adapted from this discussion

  5. Contact me if you have licensing issues; I still have to figure out the licensing situation for the blog, but am more than happy to grant exceptions for content being uplifted into official or semi-official docs.

Two Interpretations Diverged in a Yellow Wood

Whose words are these I think I know
His house is in the village though
He will not see me stopping here
To interpret his work as I go

My little student must think it queer
To read without some context near
Between the words and the intent
He wonders what the poem meant

He gives his head a little shake
To ask if there is some mistake
“That’s not what the author said!”
Providing another view instead

The words are lovely, dark, and deep
But I have literary criticism to preach
And miles to go before I sleep
And miles to go before I sleep







Seriously though, try reading The Road Not Taken as metacircular commentary on how the poem is very often “mis”interpreted, and the nature of interpretation / Death of the Author. It fits perfectly when you read “road” as “interpretation”.

(Yes, I know, the parody above is not based on The Road Not Taken but instead a different Frost poem. I was originally going to modify The Road Not Taken but realized all I had to do was change a few words to get there, which was no fun at all)