Down a Rusty rabbit hole

Last week I fell down a rather interesting rabbit hole in Rust, which was basically me discovering a series of quirks of the Rust compiler/language, each one leading to the next when I asked “why?”.

It started when someone asked why autogenerated Debug impls use argument names like __arg_0 which start with a double underscore.

This happened to be my fault. The reason we used a double underscore was that while a single underscore tells rustc not to warn about a possibly-unused variable, there’s an off- by-default clippy lint that warns about variables that start with a single underscore that are used, which can be silenced with a double underscore. Now, the correct fix here is to make the lint ignore derive/macros (which I believe we did as well), but at the time we needed to add an underscore anyway so a double underscore didn’t seem worse.

Except of course, this double underscore appears in the docs. Oops.

Ideally the rustc derive infrastructure would have a way of specifying the argument name to use so that we can at least have descriptive things here, but that’s a bit more work (I’m willing to mentor this work though!). So I thought I’d fix this by at least removing the double underscore, and making the unused lint ignore #[derive()] output.

While going through the code to look for underscores I also discovered a hygiene issue. The following code throws a bunch of very weird type errors:

pub const __cmp: u8 = 1;

#[derive(PartialOrd, PartialEq)]
pub enum Foo {
    A(u8), B(u8)
}

(playpen)

error[E0308]: mismatched types
 --> src/main.rs:6:7
  |
6 |     A(u8), B(u8)
  |       ^^^ expected enum `std::option::Option`, found u8
  |
  = note: expected type `std::option::Option<std::cmp::Ordering>`
             found type `u8`
.....

This is because the generated code for PartialOrd contains the following:

match foo.cmp(bar) {
    Some(Ordering::Equal) => .....,
    __cmp => __cmp,
}

__cmp can both be a binding to a wildcard pattern match as well as a match against a constant named __cmp, and in the presence of such a constant it resolves to the constant, causing type errors.

One way to fix this is to bind foo.cmp(bar) to some temporary variable x and use that directly in a _ => x branch.

I thought I could be clever and try cmp @ _ => cmp instead. match supports syntax where you can do foo @ <pattern>, where foo is bound to the entire matched variable. The cmp here is unambiguously a binding; it cannot be a pattern. So no conflicting with the const, problem solved!

So I made a PR for both removing the underscores and also fixing this. The change for __cmp is no longer in that PR, but you can find it here.

Except I hit a problem. With that PR, the following still breaks:

pub const cmp: u8 = 1;

#[derive(PartialOrd, PartialEq)]
pub enum Foo {
    A(u8), B(u8)
}

throwing a slightly cryptic error:

error[E0530]: match bindings cannot shadow constants
 --> test.rs:9:7
  |
4 | pub const cmp: u8 = 1;
  | ---------------------- a constant `cmp` is defined here
...
9 |     B(u8)
  |       ^^^ cannot be named the same as a constant

You can see a reduced version of this error in the following code:

pub const cmp : u8 = 1;

fn main() {
    match 1 {
        cmp @ _ => ()
    }
}

(playpen)

Huh. Wat. Why? cmp @ _ seems to be pretty unambiguous, what’s wrong with it shadowing a constant?

Turns out bindings cannot shadow constants at all, for a rather subtle reason:

const A: u8 = ...; // A_const
let A @ _ = ...; // A_let
match .. {
    A => ...; // A_match
}

What happens here is that constants and variables occupy the same namespace. So A_let shadows A_const here, and when we attempt to match, A_match is resolved to A_let and rejected (since you can’t match against a variable), and A_match falls back to resolving as a fresh binding pattern, instead of resolving to a pattern that matches against A_const.

This is kinda weird, so we disallow shadowing constants with variables. This is rarely a problem because variables are lowercase and constants are uppercase. We could technically allow this language-wise, but it’s hard on the implementation (and irrelevant in practice) so we don’t.

So I dropped that fix. The temporary local variable approach is broken as well since you can also name a constant the same as the local variable and have a clash (so again, you need the underscores to avoid surprises).

But then I realized that we had an issue with removing the underscores from __arg_0 as well.

The following code is also broken:

pub const __arg_0: u8 = 1;

#[derive(Debug)]
struct Foo(u8);

(playpen)

error[E0308]: mismatched types
 --> src/main.rs:3:10
  |
3 | #[derive(Debug)]
  |          ^^^^^ expected mutable reference, found u8
  |
  = note: expected type `&mut std::fmt::Formatter<'_>`
             found type `u8`

You can see a reduced version of this error in the following code:

pub const __arg_0: u8 = 1;

fn foo(__arg_0: bool) {}

error[E0308]: mismatched types
 --> src/main.rs:3:8
  |
3 | fn foo(__arg_0: bool) {}
  |        ^^^^^^^ expected bool, found u8

(playpen)

This breakage is not an issue with the current code because of the double underscores – there’s a very low chance someone will create a constant that is both lowercase and starts with a double underscore. But it’s a problem when I remove the underscores since that chance shoots up.

Anyway, this failure is even weirder. Why are we attempting to match against the constant in the first place? fn argument patterns¹ are irrefutable, i.e. all possible values of the type should match the argument. For example, fn foo(Some(foo): Option<u8>) {} will fail to compile with “refutable pattern in function argument: None not covered”.

There’s no point trying to match against constants here; because even if we find a constant it will be rejected later. Instead, we can unambiguously resolve identifiers as new bindings, yes?

Right?

Firm in my belief, I filed an issue.

I was wrong, it’s not going to always be rejected later. With zero-sized types this can totally still work:

struct S;

const C: S = S;

fn main() {
    let C = S;
}

Here because S has only one state, matching against a constant of the type is still irrefutable.

I argued that this doesn’t matter – since the type has a single value, it doesn’t matter whether we resolved to a new binding or the constant; the value and semantics are the same.

This is true.

Except.

Except for when destructors come in.

It was at this point that my table found itself in the perplexing state of being upside-down.

This is still really fine, zero-sized-constants-with-destructors is a pretty rare thing in Rust and I don’t really see folks relying on this behavior.

However I later realized that this entire detour was pointless because even if we fix this, we end up with a way for bindings to shadow constants. Which … which we already realized isn’t allowed by the compiler till we fix some bugs.

Damn.

The actual fix to the macro stuff is to use hygenic generated variable names, which the current infrastructure supports. I plan to make a PR for this eventually.

But it was a very interesting dive into the nuances of pattern matching in Rust.

Yes, function arguments in Rust are patterns. You can totally do things like (a, b): (u8, u8) in function arguments (like you can do in let) ↩

In Pursuit of Laziness

Manish Goregaokar's blog

Down a Rusty Rabbit Hole