<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Category: Mozilla | In Pursuit of Laziness]]></title>
  <link href="http://manishearth.github.io/blog/categories/mozilla/atom.xml" rel="self"/>
  <link href="http://manishearth.github.io/"/>
  <updated>2024-08-21T01:01:09+00:00</updated>
  <id>http://manishearth.github.io/</id>
  <author>
    <name><![CDATA[Manish Goregaokar]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[So Zero It's ... Negative? (Zero-Copy #3)]]></title>
    <link href="http://manishearth.github.io/blog/2022/08/03/zero-copy-3-so-zero-its-dot-dot-dot-negative/"/>
    <updated>2022-08-03T00:00:00+00:00</updated>
    <id>http://manishearth.github.io/blog/2022/08/03/zero-copy-3-so-zero-its-dot-dot-dot-negative</id>
    <content type="html"><![CDATA[<p><em>This is part 3 of a three-part series on interesting abstractions for zero-copy deserialization I’ve been working on over the last year. This part is about eliminating the deserialization step entirely. Part 1 is about making it more pleasant to work with and can be found <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter/">here</a>, while Part 2 is about making it work for more types and can be found <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things/">here</a>. The posts can be read in any order, though only the first post contains an explanation of what zero-copy deserialization</em> is.</p>

<blockquote>
  <p>And when Alexander saw the breadth of his work, he wept. For there were no more copies left to zero.</p>

  <p>—Hans Gruber, after designing three increasingly unhinged zero-copy crates</p>
</blockquote>

<p><a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter/">Part 1</a> of this series attempted to answer the question “how can we make zero-copy deserialization <em>pleasant</em>”, while <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things/">part 2</a> answered “how do we make zero-copy deserialization <em>more useful</em>?”.</p>

<p>This part goes one step further and asks “what if we could avoid deserialization altogether?”.</p>

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Wait, what?
            </div>
        </div>

<p>Bear with me.</p>

<p>As mentioned in the previous posts, internationalization libraries like <a href="https://github.com/unicode-org/icu4x">ICU4X</a> need to be able to load and manage a lot of internationalization data. ICU4X in particular wants this part of the process to be as flexible and efficient as possible. The focus on efficiency is why we use zero-copy deserialization for basically everything, whereas the focus on flexibility has led to a robust and pluggable data loading infrastructure that allows you to mix and match data sources.</p>

<p>Deserialization is a <em>great</em> way to load data since it’s in and of itself quite flexible! You can put your data in a neat little package and load it off the filesystem! Or send it over the network! It’s even better when you have efficient techniques like zero-copy deserialization because the cost is low.</p>

<p>But the thing is, there is still a cost. Even with zero-copy deserialization, you have to <em>validate</em> the data you receive. It’s often a cost folks are happy to pay, but that’s not always the case.</p>

<p>For example, you might be, say, <a href="https://www.mozilla.org/en-US/firefox/">a web browser interested in using ICU4X</a>, and you <em>really</em> care about startup times. Browsers typically need to set up a lot of stuff at startup (and when opening a new tab!), and every millisecond counts when it comes to giving the user a smooth experience. Browsers also typically already ship with most of the internationalization data they need. Spending precious time deserializing data that you shipped with is suboptimal.</p>

<p>What would be ideal would be something that works like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">DATA</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Data</span> <span class="o">=</span> <span class="o">&amp;</span><span class="nn">serde_json</span><span class="p">::</span><span class="nd">deserialize!</span><span class="p">(</span><span class="nd">include_bytes!</span><span class="p">(</span><span class="s">"./testdata.json"</span><span class="p">));</span>
</code></pre></div></div>

<p>where you can have stuff get deserialized at compile time and loaded into a static. Unfortunately, Rust <code class="language-plaintext highlighter-rouge">const</code> support is not at the stage where the above code is possible whilst working within serde’s generic framework, though it might be in a year or so.</p>

<p>You <em>could</em> write a very unsafe version of <code class="language-plaintext highlighter-rouge">serde::Deserialize</code> that operates on fully trusted data and uses some data format that is easy to zero-copy deserialize whilst avoiding any kind of validation. However, this would still have some cost: you still have to scan the data to reconstruct the full deserialized output. More importantly, it would require a parallel universe of unsafe serde-like traits that everyone has to derive or implement, where even small bugs in manual implementations would likely cause memory corruption.</p>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Sounds like you need some format that needs no validation or scanning to zero-copy deserialize, and can be produced safely. But that doesn’t exist, does it?
            </div>
        </div>

<p>It does.</p>

<p>… but you’re not going to like where I’m going with this.</p>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Oh no.
            </div>
        </div>

<p>There is such a format: <em>Rust code</em>. Specifically, Rust code in <code class="language-plaintext highlighter-rouge">static</code>s. When compiled, Rust <code class="language-plaintext highlighter-rouge">static</code>s are basically “free” to load, beyond the typical costs involved in paging in memory. The Rust compiler trusts itself to be good at codegen, so it doesn’t need validation when loading a compiled <code class="language-plaintext highlighter-rouge">static</code> from memory. Codegen bugs are possible, but we have to trust the compiler about that for the rest of our program anyway!</p>

<p>This is even more “zero” than “zero-copy deserialization”! Regular “zero-copy deserialization” still involves a scanning step and potentially a validation step; it’s really more about “zero allocations” than actually avoiding <em>all</em> of the copies. On the other hand, there are truly no copies or anything going on when you load Rust statics; they’re ready to go as <code class="language-plaintext highlighter-rouge">&amp;'static</code> references!</p>

<p>We just have to figure out a way to “serialize to <code class="language-plaintext highlighter-rouge">const</code> Rust code” such that the resulting Rust code can be compiled into the binary, so that people who need to load trusted data into ICU4X can load it for free!</p>

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             What does “<code class="language-plaintext highlighter-rouge">const</code> code” mean in this context?
            </div>
        </div>

<p>In Rust, <code class="language-plaintext highlighter-rouge">const</code> code essentially is code that can be proven to be side-effect-free, and it’s the only kind of code allowed in <code class="language-plaintext highlighter-rouge">static</code>s, <code class="language-plaintext highlighter-rouge">const</code>s, and <code class="language-plaintext highlighter-rouge">const fn</code>s.</p>

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             I see! Does this code actually have to be “constant”?
            </div>
        </div>

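<p>Not quite! Rust supports mutation and even loops (<code class="language-plaintext highlighter-rouge">while</code> and <code class="language-plaintext highlighter-rouge">loop</code>, though not yet <code class="language-plaintext highlighter-rouge">for</code>) in <code class="language-plaintext highlighter-rouge">const</code> code! Ultimately, it has to be the kind of code that <em>can</em> be computed at compile time with no difference in behavior: so no reading from files or the network, and no random numbers.</p>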
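
<p>As a small illustration, here is a hypothetical <code class="language-plaintext highlighter-rouge">const fn</code> that uses mutation and a <code class="language-plaintext highlighter-rouge">while</code> loop. It is our own example, not something from ICU4X. The compiler evaluates it and stores the result in a <code class="language-plaintext highlighter-rouge">static</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Hypothetical helper: sums 1..=n using mutation and a `while` loop,
/// both of which are allowed in a `const fn` on stable Rust.
const fn triangular(n: u32) -&gt; u32 {
    let mut total = 0;
    let mut i = 1;
    while i &lt;= n {
        total += i;
        i += 1;
    }
    total
}

// Evaluated at compile time: the static is initialized to 55 in the binary.
static TRIANGLE_10: u32 = triangular(10);
</code></pre></div></div>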

<p>For a long time only very simple code was allowed in <code class="language-plaintext highlighter-rouge">const</code>, but over the last year the scope of what that environment can do has expanded greatly, and it’s actually possible to do complicated things here, which is precisely what enables us to actually do “serialize to Rust code” in a reasonable way.</p>

<h2 id="databake"><code class="language-plaintext highlighter-rouge">databake</code></h2>

<p><em>A lot of the design here can also be found in the <a href="https://docs.google.com/document/d/192l7yr6hVnG11Dr8a7mDLonIb6c8rr6zq-iswrZtlXE/edit">design doc</a>. While I did the bulk of the design for this crate, it was almost completely implemented by <a href="https://github.com/robertbastian">Robert</a>, who also worked on integrating it into ICU4X, and cleaned up the design in the process.</em></p>

<p>Enter <a href="https://docs.rs/databake"><code class="language-plaintext highlighter-rouge">databake</code></a> (née <code class="language-plaintext highlighter-rouge">crabbake</code>). <code class="language-plaintext highlighter-rouge">databake</code> is a crate that provides just this: the ability to serialize your types to <code class="language-plaintext highlighter-rouge">const</code> code that can then be used in <code class="language-plaintext highlighter-rouge">static</code>s, allowing for truly zero-cost data loading, no deserialization necessary!</p>

<p>The core entry point to <code class="language-plaintext highlighter-rouge">databake</code> is the <code class="language-plaintext highlighter-rouge">Bake</code> trait:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">trait</span> <span class="n">Bake</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">bake</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">ctx</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">CrateEnv</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">TokenStream</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A <code class="language-plaintext highlighter-rouge">TokenStream</code> is the type typically used in Rust <a href="https://doc.rust-lang.org/reference/procedural-macros.html">procedural macros</a> to represent a snippet of Rust code. The <code class="language-plaintext highlighter-rouge">Bake</code> trait allows you to take an instance of a type, and convert it to Rust code that represents the same value.</p>

<p>The <code class="language-plaintext highlighter-rouge">CrateEnv</code> object is used to track which crates are needed, so that it is possible for tools generating this code to let the user know which direct dependencies are needed.</p>
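
<p>To make the idea concrete without pulling in <code class="language-plaintext highlighter-rouge">proc_macro2</code>, here is a hypothetical, simplified analogue of <code class="language-plaintext highlighter-rouge">Bake</code>. The <code class="language-plaintext highlighter-rouge">ToConstCode</code> trait is our own invention and is not part of <code class="language-plaintext highlighter-rouge">databake</code>. It returns a <code class="language-plaintext highlighter-rouge">String</code> of Rust source instead of a <code class="language-plaintext highlighter-rouge">TokenStream</code>, and records required crates in a plain <code class="language-plaintext highlighter-rouge">BTreeSet</code> instead of a <code class="language-plaintext highlighter-rouge">CrateEnv</code>. It sketches the technique only; the real crate’s output looks different.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::collections::BTreeSet;

/// Hypothetical, simplified analogue of `databake::Bake`: turn a value into
/// Rust source that evaluates to an equal value, noting which crates it needs.
trait ToConstCode {
    fn to_const_code(&amp;self, deps: &amp;mut BTreeSet&lt;&amp;'static str&gt;) -&gt; String;
}

impl ToConstCode for u32 {
    fn to_const_code(&amp;self, _deps: &amp;mut BTreeSet&lt;&amp;'static str&gt;) -&gt; String {
        // Suffix the literal so its type is unambiguous in the generated code.
        format!("{}u32", self)
    }
}

impl ToConstCode for &amp;str {
    fn to_const_code(&amp;self, _deps: &amp;mut BTreeSet&lt;&amp;'static str&gt;) -&gt; String {
        // `Debug` for `str` yields a quoted, escaped literal that is valid Rust.
        format!("{:?}", self)
    }
}

// Mirrors the `Person` type from `bar::module`.
struct Person&lt;'a&gt; {
    name: &amp;'a str,
    age: u32,
}

impl ToConstCode for Person&lt;'_&gt; {
    fn to_const_code(&amp;self, deps: &amp;mut BTreeSet&lt;&amp;'static str&gt;) -&gt; String {
        // The generated path names crate `bar`, so record it as a dependency.
        deps.insert("bar");
        format!(
            "bar::module::Person {{ name: {}, age: {} }}",
            self.name.to_const_code(deps),
            self.age.to_const_code(deps)
        )
    }
}
</code></pre></div></div>

<p>Pasting the resulting string into a <code class="language-plaintext highlighter-rouge">static</code> initializer, with crate <code class="language-plaintext highlighter-rouge">bar</code> as a dependency, would reconstruct an equal <code class="language-plaintext highlighter-rouge">Person</code>. That round-trip property is what makes “serializing to Rust code” work.</p>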

<p>This trait is augmented by a <a href="https://docs.rs/databake/0.1.1/databake/derive.Bake.html"><code class="language-plaintext highlighter-rouge">#[derive(Bake)]</code></a> custom derive that can be used to apply it to most types automatically:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// inside crate `bar`, module `module.rs`</span>

<span class="k">use</span> <span class="nn">databake</span><span class="p">::</span><span class="n">Bake</span><span class="p">;</span>

<span class="nd">#[derive(Bake)]</span>
<span class="nd">#[databake(path</span> <span class="nd">=</span> <span class="nd">bar::module)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Person</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
   <span class="k">pub</span> <span class="n">name</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="nb">str</span><span class="p">,</span>
   <span class="k">pub</span> <span class="n">age</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As with most custom derives, this only works on structs and enums that contain other types that already implement <code class="language-plaintext highlighter-rouge">Bake</code>. Most types not involving mandatory allocation should be able to.</p>

<h2 id="how-to-use-it">How to use it</h2>

<p><code class="language-plaintext highlighter-rouge">databake</code> itself doesn’t really prescribe any particular code generation strategy. It can be used in a proc macro, in a <code class="language-plaintext highlighter-rouge">build.rs</code>, or even in a separate binary. ICU4X does the latter, since that’s just what ICU4X’s model for data generation is: clients can use the binary to customize the format and contents of the data they need.</p>

<p>So a typical way of using this crate might be to do something like this in <code class="language-plaintext highlighter-rouge">build.rs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::{</span><span class="n">env</span><span class="p">,</span> <span class="n">fs</span><span class="p">,</span> <span class="nn">path</span><span class="p">::</span><span class="n">Path</span><span class="p">};</span>

<span class="k">use</span> <span class="nn">databake</span><span class="p">::{</span><span class="n">Bake</span><span class="p">,</span> <span class="n">CrateEnv</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">quote</span><span class="p">::</span><span class="n">quote</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">some_dep</span><span class="p">::</span><span class="n">Data</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
   <span class="c1">// load data from file (relative to build.rs, which sits at the package root)</span>
   <span class="k">let</span> <span class="n">json_data</span> <span class="o">=</span> <span class="nd">include_str!</span><span class="p">(</span><span class="s">"src/data.json"</span><span class="p">);</span>

   <span class="c1">// deserialize from json</span>
   <span class="k">let</span> <span class="n">my_data</span><span class="p">:</span> <span class="n">Data</span> <span class="o">=</span> <span class="nn">serde_json</span><span class="p">::</span><span class="nf">from_str</span><span class="p">(</span><span class="n">json_data</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

   <span class="c1">// get a token tree out of it; the `CrateEnv` records which crates</span>
   <span class="c1">// the generated code depends on</span>
   <span class="k">let</span> <span class="n">crate_env</span> <span class="o">=</span> <span class="nn">CrateEnv</span><span class="p">::</span><span class="nf">default</span><span class="p">();</span>
   <span class="k">let</span> <span class="n">baked</span> <span class="o">=</span> <span class="n">my_data</span><span class="nf">.bake</span><span class="p">(</span><span class="o">&amp;</span><span class="n">crate_env</span><span class="p">);</span>

   <span class="c1">// Construct Rust code with this in a static</span>
   <span class="c1">// The quote macro is used by procedural macros to do easy codegen,</span>
   <span class="c1">// but it's useful in build scripts as well.</span>
   <span class="k">let</span> <span class="n">my_data_rs</span> <span class="o">=</span> <span class="nd">quote!</span> <span class="p">{</span>
      <span class="k">use</span> <span class="nn">some_dep</span><span class="p">::</span><span class="n">Data</span><span class="p">;</span>
      <span class="k">static</span> <span class="n">MY_DATA</span><span class="p">:</span> <span class="n">Data</span> <span class="o">=</span> #<span class="n">baked</span><span class="p">;</span>
   <span class="p">};</span>

   <span class="c1">// Write to file</span>
   <span class="k">let</span> <span class="n">out_dir</span> <span class="o">=</span> <span class="nn">env</span><span class="p">::</span><span class="nf">var_os</span><span class="p">(</span><span class="s">"OUT_DIR"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
   <span class="k">let</span> <span class="n">dest_path</span> <span class="o">=</span> <span class="nn">Path</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="o">&amp;</span><span class="n">out_dir</span><span class="p">)</span><span class="nf">.join</span><span class="p">(</span><span class="s">"data.rs"</span><span class="p">);</span>
   <span class="nn">fs</span><span class="p">::</span><span class="nf">write</span><span class="p">(</span>
      <span class="o">&amp;</span><span class="n">dest_path</span><span class="p">,</span>
      <span class="o">&amp;</span><span class="n">my_data_rs</span><span class="nf">.to_string</span><span class="p">()</span>
   <span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

   <span class="c1">// (Optional step omitted: run rustfmt on the file)</span>

   <span class="c1">// tell Cargo that we depend on this file</span>
   <span class="nd">println!</span><span class="p">(</span><span class="s">"cargo:rerun-if-changed=src/data.json"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
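
<p>The build script above only writes <code class="language-plaintext highlighter-rouge">data.rs</code>; the crate still has to pull it in. One common way, which is an assumption about your setup rather than something <code class="language-plaintext highlighter-rouge">databake</code> requires, is <code class="language-plaintext highlighter-rouge">include!(concat!(env!("OUT_DIR"), "/data.rs"));</code> in <code class="language-plaintext highlighter-rouge">lib.rs</code>. The sketch below inlines a hand-written stand-in for that generated file, with a hypothetical <code class="language-plaintext highlighter-rouge">some_dep::Data</code>, so that it compiles on its own:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical stand-in for the `some_dep` crate and its `Data` type.
mod some_dep {
    pub struct Data {
        pub greeting: &amp;'static str,
        pub counts: &amp;'static [u32],
    }
}

// In a real crate this section would be:
//     include!(concat!(env!("OUT_DIR"), "/data.rs"));
// Below is roughly what such a generated file could contain.
use some_dep::Data;
static MY_DATA: Data = Data {
    greeting: "hello",
    counts: &amp;[1u32, 2u32, 3u32],
};

/// Access the baked data: no parsing, no validation, no allocation.
pub fn data() -&gt; &amp;'static Data {
    &amp;MY_DATA
}
</code></pre></div></div>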

<h2 id="what-it-looks-like">What it looks like</h2>

<p>ICU4X generates all of its test data into JSON, <a href="https://docs.rs/postcard"><code class="language-plaintext highlighter-rouge">postcard</code></a>, and “baked” formats. For example, for <a href="https://github.com/unicode-org/icu4x/blob/7b52dbfe57043da5459c12627671a779d467dc0f/provider/testdata/data/json/decimal/symbols%401/ar-EG.json">this JSON data representing how a particular locale does numbers</a>, the “baked” data looks like <a href="https://github.com/unicode-org/icu4x/blob/7b52dbfe57043da5459c12627671a779d467dc0f/provider/testdata/data/baked/decimal/symbols_v1.rs#L24-L41">this</a>. That’s a rather simple data type, but we do use this for more complex data like <a href="https://raw.githubusercontent.com/unicode-org/icu4x/7b52dbfe57043da5459c12627671a779d467dc0f/provider/testdata/data/baked/datetime/datesymbols_v1.rs">date time symbol data</a>, which is unfortunately too big for GitHub to render normally.</p>

<p>ICU4X’s code for generating this is in <a href="https://github.com/unicode-org/icu4x/blob/3f4d841ef0b168031d837433d075308bbebf34b7/provider/datagen/src/databake.rs">this file</a>. It’s complicated primarily because ICU4X’s data generation pipeline is super configurable and complicated. At its core, for each piece of data, it <a href="https://github.com/unicode-org/icu4x/blob/3f4d841ef0b168031d837433d075308bbebf34b7/provider/datagen/src/databake.rs#L118">calls <code class="language-plaintext highlighter-rouge">tokenize()</code></a>, which is a thin wrapper around <a href="https://github.com/unicode-org/icu4x/blob/882e23403327620e4aafde28a9a407bcc6245a54/provider/core/src/datagen/payload.rs#L131-L136">calling <code class="language-plaintext highlighter-rouge">.bake()</code> on the data and some other stuff</a>. It then takes all of the data and organizes it into files like those linked above, populated with a static for each piece of data. In our case, we include all this generated Rust code into our “testdata” crate as a module, but there are many possibilities here!</p>

<p>For our “test” data, which is currently 2.7 MB in the <a href="https://docs.rs/postcard"><code class="language-plaintext highlighter-rouge">postcard</code></a> format (which is optimized for being lightweight), the same data ends up being 11 MB of JSON and 18 MB of generated Rust code! That’s … a lot of Rust code, and tools like rust-analyzer struggle to load it. It’s of course much smaller once compiled into the binary, though that’s much harder to measure, because Rust is quite aggressive at optimizing unused data out of the baked version (where it has ample opportunity to). From various unscientific tests, it seems like 2 MB of deduplicated postcard data corresponds to roughly 500 KB of deduplicated baked data. This makes sense, since one can expect baked data to be near the theoretical limit of how small the data can be without applying some heavy compression. Furthermore, while we deduplicate baked data at a per-locale level, it can also take advantage of LLVM’s ability to deduplicate statics further. If, for example, two different locales have <em>mostly</em> the same data for a given data key<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> with some differences, LLVM may be able to use the same statics for sub-data.</p>
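
<p>As a rough picture of “a static for each piece of data” with per-locale deduplication, here is a hypothetical, heavily simplified layout. The <code class="language-plaintext highlighter-rouge">DecimalSymbols</code> struct, the static names, the separator values, and the <code class="language-plaintext highlighter-rouge">lookup</code> function are illustrative only. They are not ICU4X’s actual generated code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Hypothetical, simplified data struct (not ICU4X's real type).
pub struct DecimalSymbols {
    pub decimal_separator: &amp;'static str,
    pub grouping_separator: &amp;'static str,
}

// One static per distinct piece of data (illustrative values).
static SYMBOLS_EN: DecimalSymbols = DecimalSymbols {
    decimal_separator: ".",
    grouping_separator: ",",
};
static SYMBOLS_DE: DecimalSymbols = DecimalSymbols {
    decimal_separator: ",",
    grouping_separator: ".",
};

/// Locales with identical data point at the same static, so it is stored once.
pub fn lookup(locale: &amp;str) -&gt; Option&lt;&amp;'static DecimalSymbols&gt; {
    match locale {
        "en" | "en-GB" =&gt; Some(&amp;SYMBOLS_EN),
        "de" =&gt; Some(&amp;SYMBOLS_DE),
        _ =&gt; None,
    }
}
</code></pre></div></div>

<p>This shows only the explicit, generator-level sharing. Any further merging of identical statics by LLVM happens separately and is up to the optimizer.</p>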

<h2 id="limitations">Limitations</h2>

<p><code class="language-plaintext highlighter-rouge">const</code> support in Rust still has a ways to go. For example, it doesn’t yet support creating objects like <code class="language-plaintext highlighter-rouge">String</code>s, whose contents live on the heap, though <a href="https://github.com/rust-lang/const-eval/issues/20">work is underway to allow this</a>. This isn’t a huge problem for us: all of our data already supports zero-copy deserialization, which means that for every instance of our data types, there is <em>some way</em> to represent it as a borrow from another <code class="language-plaintext highlighter-rouge">static</code>.</p>
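
<p>Here is a minimal sketch of why zero-copy-friendly types sidestep the heap limitation. It assumes a hypothetical <code class="language-plaintext highlighter-rouge">Person</code> type whose <code class="language-plaintext highlighter-rouge">name</code> is a <code class="language-plaintext highlighter-rouge">Cow&lt;'a, str&gt;</code>. The owned variant would need <code class="language-plaintext highlighter-rouge">String::from("Ferris")</code>, which is not a <code class="language-plaintext highlighter-rouge">const fn</code>. The borrowed variant can instead point at a string literal that already lives in the binary:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::borrow::Cow;

/// Hypothetical zero-copy-friendly type: `name` may borrow or own its data.
pub struct Person&lt;'a&gt; {
    pub name: Cow&lt;'a, str&gt;,
    pub age: u32,
}

// `Cow::Owned(String::from("Ferris"))` would be rejected here, because
// `String::from` is not a `const fn`. Borrowing a `&amp;'static str` works.
pub static FERRIS: Person&lt;'static&gt; = Person {
    name: Cow::Borrowed("Ferris"),
    age: 7,
};
</code></pre></div></div>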

<p>A more pesky limitation is that you can’t interact with traits in <code class="language-plaintext highlighter-rouge">const</code> environments. To some extent, were that possible, the purpose of this crate could also have been fulfilled by making the <code class="language-plaintext highlighter-rouge">serde</code> pipeline <code class="language-plaintext highlighter-rouge">const</code>-friendly<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and then the code snippet from the beginning of this post would work:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">DATA</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Data</span> <span class="o">=</span> <span class="o">&amp;</span><span class="nn">serde_json</span><span class="p">::</span><span class="nd">deserialize!</span><span class="p">(</span><span class="nd">include_bytes!</span><span class="p">(</span><span class="s">"./testdata.json"</span><span class="p">));</span>
</code></pre></div></div>

<p>This means that for things like <code class="language-plaintext highlighter-rouge">ZeroVec</code> (see <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things/">part 2</a>), we can’t just make their safe constructors <code class="language-plaintext highlighter-rouge">const</code> and pass in data to be validated (the validation code is all behind traits), so we have to construct them unsafely. This is somewhat unfortunate; however, if the <code class="language-plaintext highlighter-rouge">zerovec</code> byte representation had trouble roundtripping we would have larger problems anyway, so this doesn’t introduce a new surface of unsafety. We’re still able to validate things when <em>generating</em> the baked data; we just can’t get the compiler to also re-validate before agreeing to compile the <code class="language-plaintext highlighter-rouge">const</code> code.</p>
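
<p>The pattern can be sketched with a hypothetical, much-simplified byte-slice type. This is not <code class="language-plaintext highlighter-rouge">zerovec</code>’s actual API. Validation goes through a trait, so the safe constructor cannot be a <code class="language-plaintext highlighter-rouge">const fn</code> on stable Rust. Generated code instead calls an <code class="language-plaintext highlighter-rouge">unsafe</code> <code class="language-plaintext highlighter-rouge">const</code> constructor on bytes that were already validated when the data was baked:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::marker::PhantomData;

/// Hypothetical stand-in for a `ULE`-style validation trait.
pub trait Validate {
    const WIDTH: usize;
    fn is_valid(chunk: &amp;[u8]) -&gt; bool;
}

/// Hypothetical little-endian `u32` element type: any 4 bytes are valid.
pub struct LeU32;

impl Validate for LeU32 {
    const WIDTH: usize = 4;
    fn is_valid(_chunk: &amp;[u8]) -&gt; bool {
        true
    }
}

pub struct ByteSlice&lt;'a, T&gt; {
    bytes: &amp;'a [u8],
    _marker: PhantomData&lt;T&gt;,
}

impl&lt;'a, T: Validate&gt; ByteSlice&lt;'a, T&gt; {
    /// Safe constructor: it calls trait methods, so it cannot be a `const fn`.
    pub fn try_from_bytes(bytes: &amp;'a [u8]) -&gt; Option&lt;Self&gt; {
        let ok = bytes.len() % T::WIDTH == 0
            &amp;&amp; bytes.chunks_exact(T::WIDTH).all(T::is_valid);
        if ok {
            Some(Self { bytes, _marker: PhantomData })
        } else {
            None
        }
    }

    /// # Safety
    /// `bytes` must pass the same checks as `try_from_bytes`.
    pub const unsafe fn from_bytes_unchecked(bytes: &amp;'a [u8]) -&gt; Self {
        Self { bytes, _marker: PhantomData }
    }

    pub fn len(&amp;self) -&gt; usize {
        self.bytes.len() / T::WIDTH
    }
}

impl ByteSlice&lt;'_, LeU32&gt; {
    pub fn get(&amp;self, i: usize) -&gt; Option&lt;u32&gt; {
        let start = i.checked_mul(4)?;
        let c = self.bytes.get(start..start.checked_add(4)?)?;
        Some(u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
    }
}

// Roughly what generated code could do: these bytes were validated at bake time.
static NUMBERS: ByteSlice&lt;'static, LeU32&gt; =
    unsafe { ByteSlice::from_bytes_unchecked(&amp;[1, 0, 0, 0, 2, 0, 0, 0]) };
</code></pre></div></div>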

<h2 id="try-it-out">Try it out!</h2>

<p><a href="https://docs.rs/databake"><code class="language-plaintext highlighter-rouge">databake</code></a> is much less mature than <a href="https://docs.rs/yoke"><code class="language-plaintext highlighter-rouge">yoke</code></a> and <a href="https://docs.rs/zerovec"><code class="language-plaintext highlighter-rouge">zerovec</code></a>, but it does seem to work rather well so far. Try it out! Let me know what you think!</p>

<p><em>Thanks to <a href="https://twitter.com/plaidfinch">Finch</a>, <a href="https://twitter.com/yaahc_">Jane</a>, <a href="https://github.com/sffc">Shane</a>, and <a href="https://github.com/robertbastian">Robert</a> for reviewing drafts of this post</em></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>In ICU4X, a “data key” can be used to talk about a specific type of data, for example the decimal symbols data has a <code class="language-plaintext highlighter-rouge">decimal/symbols@1</code> data key. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Mind you, this would not be an easy task, but it would likely integrate with the ecosystem really well. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Zero-Copy All the Things! (Zero-Copy #2)]]></title>
    <link href="http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things/"/>
    <updated>2022-08-03T00:00:00+00:00</updated>
    <id>http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things</id>
    <content type="html"><![CDATA[<p><em>This is part 2 of a three-part series on interesting abstractions for zero-copy deserialization I’ve been working on over the last year. This part is about making zero-copy deserialization work for more types. Part 1 is about making it more pleasant to work with and can be found <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter/">here</a>, while Part 3 is about eliminating the deserialization step entirely and can be found <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-3-so-zero-its-dot-dot-dot-negative/">here</a>. The posts can be read in any order, though only the first post contains an explanation of what zero-copy deserialization</em> is.</p>

<h2 id="background">Background</h2>

<p><em>This section is the same as in the last article and can be skipped if you’ve read it</em></p>

<p>For the past year and a half I’ve been working full time on <a href="https://github.com/unicode-org/icu4x">ICU4X</a>, a new internationalization library in Rust being built under the Unicode Consortium as a collaboration between various companies.</p>

<p>There’s a lot I can say about ICU4X, but to focus on one core value proposition: we want it to be <em>modular</em> both in data and code. We want ICU4X to be usable on embedded platforms, where memory is at a premium. We want applications constrained by download size to be able to support all languages rather than pick a couple of popular ones because they cannot afford to bundle in all that data. As a part of this, we want loading data to be <em>fast</em> and pluggable. Users should be able to design their own data loading strategies for their individual use cases.</p>

<p>See, a key part of performing correct internationalization is the <em>data</em>. Different locales<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> do things differently, and all of the information on this needs to go somewhere, preferably not code. You need data on how a particular locale formats dates<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, or how plurals work in a particular language, or how to accurately segment languages like Thai, which are typically not written with spaces, so that you can insert line breaks in appropriate positions.</p>

<p>Given the focus on data, a <em>very</em> attractive option for us is zero-copy deserialization. In the process of trying to do zero-copy deserialization well, we’ve built some cool new libraries; this article is about one of them.</p>

<h2 id="what-can-you-zero-copy">What can you zero-copy?</h2>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             If you’re unfamiliar with zero-copy deserialization, check out the explanation in the <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter/">previous article</a>!
            </div>
        </div>

<p>In the <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter/">previous article</a> we explored how zero-copy deserialization could be made more pleasant to work with by erasing the lifetimes. In essence, we were expanding our capabilities on <em>what you can do with</em> zero-copy data.</p>

<p>This article is about expanding our capabilities on <em>what we can make</em> zero-copy data.</p>

<p>We previously saw this struct:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Serialize,</span> <span class="nd">Deserialize)]</span>
<span class="k">struct</span> <span class="n">Person</span> <span class="p">{</span>
    <span class="c1">// this field is nearly free to construct</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="c1">// constructing this will involve a small allocation and copy</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>
    <span class="c1">// this may take a while</span>
    <span class="n">rust_files_written</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>and made the <code class="language-plaintext highlighter-rouge">name</code> field zero-copy by replacing it with a <code class="language-plaintext highlighter-rouge">Cow&lt;'a, str&gt;</code>. However, we weren’t able to do the same with the <code class="language-plaintext highlighter-rouge">rust_files_written</code> field because <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a> does not handle zero-copy deserialization for things other than <code class="language-plaintext highlighter-rouge">[u8]</code> and <code class="language-plaintext highlighter-rouge">str</code>. Forget nested collections like <code class="language-plaintext highlighter-rouge">Vec&lt;String&gt;</code> (as <code class="language-plaintext highlighter-rouge">&amp;[&amp;str]</code>); even <code class="language-plaintext highlighter-rouge">Vec&lt;u32&gt;</code> (as <code class="language-plaintext highlighter-rouge">&amp;[u32]</code>) can’t be made zero-copy easily!</p>

<p>This is not a fundamental restriction of zero-copy deserialization; indeed, the excellent <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a> library is able to support data like this. However, it’s not as slam-dunk easy as <code class="language-plaintext highlighter-rouge">str</code> and <code class="language-plaintext highlighter-rouge">[u8]</code>, and it’s understandable that <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a> prefers not to pick sides on the tradeoffs involved and leaves it up to users.</p>

<p>So what’s the actual problem here?</p>

<h2 id="blefuscudian-bewilderment">Blefuscudian Bewilderment</h2>

<p>The short answer is: endianness, alignment, and for <code class="language-plaintext highlighter-rouge">Vec&lt;String&gt;</code>, indirection.</p>

<p>See, the way zero-copy deserialization works is by directly taking a pointer to the memory and declaring it to be the desired value. For this to work, that data <em>must</em> be of a kind that looks the same on all machines, and must be legal to take a reference to.</p>

<p>This is pretty straightforward for <code class="language-plaintext highlighter-rouge">[u8]</code> and <code class="language-plaintext highlighter-rouge">str</code>: their data is identical on every system. <code class="language-plaintext highlighter-rouge">str</code> does need a validation step to ensure it’s valid UTF-8, but the general thrust of zero-copy deserialization is to replace expensive deserialization with cheaper validation, so we’re fine with that.</p>
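<p>To make “validation instead of deserialization” concrete, here is a minimal std-only sketch. The helper <code>borrow_str</code> and the length-prefixed layout are hypothetical, for illustration only: <code>std::str::from_utf8</code> merely checks the bytes and hands back a <code>&amp;str</code> that points into the original buffer, with no allocation or copy.</p>

```rust
/// Borrows `buf[start..end]` as a `&str` after validating UTF-8.
/// Nothing is copied: the returned `&str` points into `buf`.
fn borrow_str(buf: &[u8], start: usize, end: usize) -> Option<&str> {
    let bytes = buf.get(start..end)?;
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // Hypothetical layout: one length byte followed by that many bytes of UTF-8 text.
    let buf: &[u8] = b"\x05hello";
    let len = buf[0] as usize;
    let name = borrow_str(buf, 1, 1 + len).expect("valid UTF-8");
    assert_eq!(name, "hello");
    // Same memory as the input: validation replaced the copy.
    assert_eq!(name.as_ptr(), buf[1..].as_ptr());
    // Invalid UTF-8 is rejected instead of being trusted blindly.
    assert!(borrow_str(&[0xFF, 0xFE], 0, 2).is_none());
}
```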

<p>On the other hand, the borrowed version of <code class="language-plaintext highlighter-rouge">Vec&lt;String&gt;</code>, <code class="language-plaintext highlighter-rouge">&amp;[&amp;str]</code>, is unlikely to look the same even across different executions of the program on the <em>same system</em>, because it contains pointers (indirection) that’ll change each time depending on the data source!</p>

<p>Pointers are hard. What about <code class="language-plaintext highlighter-rouge">Vec&lt;u32&gt;</code>/<code class="language-plaintext highlighter-rouge">[u32]</code>? Surely there’s nothing wrong with a pile of integers?</p>

<figure class="caption-wrapper center" style="width: 400px"><img class="caption" src="http://manishearth.github.io/images/post/castlevania-data.png" width="400" /><figcaption class="caption-text"><p><small>Dracula, dispensing wisdom on the subject of zero-copy deserialization.</small></p>
</figcaption></figure>

<p>This is where endianness and alignment come in. Firstly, a <code class="language-plaintext highlighter-rouge">u32</code> doesn’t look exactly the same on all systems: some systems are “big-endian”, where the integer <code class="language-plaintext highlighter-rouge">0x00ABCDEF</code> would be represented in memory as <code class="language-plaintext highlighter-rouge">[0x00, 0xAB, 0xCD, 0xEF]</code>, whereas others are “little-endian” and would represent it as <code class="language-plaintext highlighter-rouge">[0xEF, 0xCD, 0xAB, 0x00]</code>. Most systems these days are little-endian, but not all, so you may need to care about this.</p>

<p>This means that a <code class="language-plaintext highlighter-rouge">[u32]</code> serialized on a little-endian system would come out completely garbled on a big-endian system if we naïvely zero-copy deserialize it.</p>
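<p>A small std-only illustration of the problem, using the same integer as above. The helper <code>misread_le_as_be</code> is hypothetical: it simulates a big-endian reader by decoding little-endian bytes with <code>u32::from_be_bytes</code>.</p>

```rust
/// Simulates a big-endian reader naively reinterpreting the bytes
/// that a little-endian writer produced for `v`.
fn misread_le_as_be(v: u32) -> u32 {
    u32::from_be_bytes(v.to_le_bytes())
}

fn main() {
    let value: u32 = 0x00AB_CDEF;
    // One integer, two possible in-memory byte layouts.
    assert_eq!(value.to_be_bytes(), [0x00, 0xAB, 0xCD, 0xEF]);
    assert_eq!(value.to_le_bytes(), [0xEF, 0xCD, 0xAB, 0x00]);
    // Reading little-endian bytes as big-endian garbles the value.
    assert_eq!(misread_le_as_be(value), 0xEFCD_AB00);
    // Fixing one byte order for the serialized form avoids the problem.
    assert_eq!(u32::from_le_bytes(value.to_le_bytes()), value);
}
```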

<p>Secondly, a lot of systems impose <em>alignment</em> restrictions on types like <code class="language-plaintext highlighter-rouge">u32</code>. A <code class="language-plaintext highlighter-rouge">u32</code> cannot be found at any old memory address: on most modern systems it must be found at a memory address that’s a multiple of 4, and in Rust a <code class="language-plaintext highlighter-rouge">&amp;u32</code> must always be aligned to <code class="language-plaintext highlighter-rouge">u32</code>’s alignment, whatever the hardware tolerates. Similarly, a <code class="language-plaintext highlighter-rouge">u64</code> typically must be at a memory address that’s a multiple of 8, and so on. The slice of input data being deserialized, however, may start at any address. It’s possible to design a serialization framework where a particular field in the data is forced to have a particular alignment (<a href="https://docs.rs/rkyv/latest/rkyv/util/struct.AlignedVec.html">rkyv has this</a>); however, it’s kinda tricky and requires you to have control over the alignment of the original loaded data, which isn’t a part of serde’s model.</p>
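<p>A rough way to see the alignment problem in std-only Rust (the helper names are hypothetical): of any <code>align_of::&lt;u32&gt;()</code> consecutive byte offsets into a buffer, exactly one is suitably aligned, so a <code>u32</code> sitting at an arbitrary offset in the input generally can’t be borrowed as a <code>&amp;u32</code>.</p>

```rust
use std::mem::align_of;

/// True if `bytes` begins at an address where a `&u32` could legally point.
fn starts_u32_aligned(bytes: &[u8]) -> bool {
    bytes.as_ptr() as usize % align_of::<u32>() == 0
}

/// Of the first `align_of::<u32>()` offsets into `buf`, counts how many are u32-aligned.
/// `buf` must be at least `align_of::<u32>()` bytes long.
fn aligned_offset_count(buf: &[u8]) -> usize {
    (0..align_of::<u32>())
        .filter(|&i| starts_u32_aligned(&buf[i..]))
        .count()
}

fn main() {
    let buf = vec![0u8; 16];
    // Only one offset in each window of align_of::<u32>() bytes is usable,
    // no matter where the allocator happened to place `buf`.
    assert_eq!(aligned_offset_count(&buf), 1);
}
```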

<p>So how can we address this?</p>

<h2 id="zerovec-and-varzerovec">ZeroVec and VarZeroVec</h2>

<p><em>Much of the design here is explained in more detail in the <a href="https://github.com/unicode-org/icu4x/blob/main/utils/zerovec/design_doc.md">design doc</a>.</em></p>

<p>After <a href="https://github.com/unicode-org/icu4x/issues/78#issuecomment-817090204">a bunch of discussions</a> with <a href="https://github.com/sffc">Shane</a>, we designed and wrote <a href="https://docs.rs/zerovec"><code class="language-plaintext highlighter-rouge">zerovec</code></a>, a crate that attempts to solve this problem, in a way that works with <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a>.</p>

<p>The core abstractions of the crate are the two types, <a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroVec.html"><code class="language-plaintext highlighter-rouge">ZeroVec</code></a> and <a href="https://docs.rs/zerovec/latest/zerovec/enum.VarZeroVec.html"><code class="language-plaintext highlighter-rouge">VarZeroVec</code></a>, which are essentially zero-copy enabled versions of <code class="language-plaintext highlighter-rouge">Cow&lt;'a, [T]&gt;</code>, for fixed-size and variable-size <code class="language-plaintext highlighter-rouge">T</code> types.</p>

<p><a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroVec.html"><code class="language-plaintext highlighter-rouge">ZeroVec</code></a> can be used with any type implementing <a href="https://docs.rs/zerovec/latest/zerovec/ule/trait.ULE.html"><code class="language-plaintext highlighter-rouge">ULE</code></a> (more on what this means later), which by default includes all of the integer types and can be extended to <em>most</em> <code class="language-plaintext highlighter-rouge">Copy</code> types. It’s rather similar to <code class="language-plaintext highlighter-rouge">&amp;[T]</code>, but instead of returning <em>references</em> to its elements, it copies them out. While <a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroVec.html"><code class="language-plaintext highlighter-rouge">ZeroVec</code></a> is a <code class="language-plaintext highlighter-rouge">Cow</code>-like borrowed-or-owned type<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>, there is a fully borrowed variant <a href="https://docs.rs/zerovec/latest/zerovec/struct.ZeroSlice.html"><code class="language-plaintext highlighter-rouge">ZeroSlice</code></a> that it derefs to.</p>

<p>Similarly, <a href="https://docs.rs/zerovec/latest/zerovec/enum.VarZeroVec.html"><code class="language-plaintext highlighter-rouge">VarZeroVec</code></a> may be used with types implementing <a href="https://docs.rs/zerovec/latest/zerovec/ule/trait.VarULE.html"><code class="language-plaintext highlighter-rouge">VarULE</code></a> (e.g. <code class="language-plaintext highlighter-rouge">str</code>). It <em>is</em> able to hand out references: <code class="language-plaintext highlighter-rouge">VarZeroVec&lt;str&gt;</code> behaves very similarly to how <code class="language-plaintext highlighter-rouge">&amp;[str]</code> would work if such a type were allowed to exist in Rust. You can even nest them, making types like <code class="language-plaintext highlighter-rouge">VarZeroVec&lt;VarZeroSlice&lt;ZeroSlice&lt;u32&gt;&gt;&gt;</code>, the zero-copy equivalent of <code class="language-plaintext highlighter-rouge">Vec&lt;Vec&lt;Vec&lt;u32&gt;&gt;&gt;</code>.</p>

<p>There’s also a <a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroMap.html"><code class="language-plaintext highlighter-rouge">ZeroMap</code></a> type that provides a binary-search-based map that works with types compatible with either <a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroVec.html"><code class="language-plaintext highlighter-rouge">ZeroVec</code></a> or <a href="https://docs.rs/zerovec/latest/zerovec/enum.VarZeroVec.html"><code class="language-plaintext highlighter-rouge">VarZeroVec</code></a>.</p>

<p>So, for example, to make the following struct zero-copy:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>
<span class="k">struct</span> <span class="n">DataStruct</span> <span class="p">{</span>
    <span class="n">nums</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u32</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">chars</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">char</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">strs</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>you can do something like this:</p>

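<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">zerovec</span><span class="p">::{</span><span class="n">VarZeroVec</span><span class="p">,</span> <span class="n">ZeroVec</span><span class="p">};</span>

<span class="nd">#[derive(serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>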
<span class="k">pub</span> <span class="k">struct</span> <span class="n">DataStruct</span><span class="o">&lt;</span><span class="nv">'data</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">nums</span><span class="p">:</span> <span class="n">ZeroVec</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="nb">u32</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">chars</span><span class="p">:</span> <span class="n">ZeroVec</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="nb">char</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">strs</span><span class="p">:</span> <span class="n">VarZeroVec</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Once deserialized, the data can be accessed with <code class="language-plaintext highlighter-rouge">data.nums.get(index)</code> or <code class="language-plaintext highlighter-rouge">data.strs[index]</code>, etc.</p>

<p>Custom types can also be supported within these types with some effort. For example, if you’d like the following complex data to be zero-copy:</p>

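<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">collections</span><span class="p">::</span><span class="n">HashMap</span><span class="p">;</span>

<span class="c1">// Hash is needed for Date to be usable as a HashMap key</span>
<span class="nd">#[derive(Copy,</span> <span class="nd">Clone,</span> <span class="nd">PartialEq,</span> <span class="nd">Eq,</span> <span class="nd">Hash,</span> <span class="nd">Ord,</span> <span class="nd">PartialOrd,</span> <span class="nd">serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>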
<span class="k">struct</span> <span class="n">Date</span> <span class="p">{</span>
    <span class="n">y</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="n">m</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">d</span><span class="p">:</span> <span class="nb">u8</span>
<span class="p">}</span>

<span class="nd">#[derive(Clone,</span> <span class="nd">PartialEq,</span> <span class="nd">Eq,</span> <span class="nd">Ord,</span> <span class="nd">PartialOrd,</span> <span class="nd">serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>
<span class="k">struct</span> <span class="n">Person</span> <span class="p">{</span>
    <span class="n">birthday</span><span class="p">:</span> <span class="n">Date</span><span class="p">,</span>
    <span class="n">favorite_character</span><span class="p">:</span> <span class="nb">char</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>
<span class="p">}</span>

<span class="nd">#[derive(serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>
<span class="k">struct</span> <span class="n">Data</span> <span class="p">{</span>
    <span class="n">important_dates</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Date</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">important_people</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Person</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">birthdays_to_people</span><span class="p">:</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">Date</span><span class="p">,</span> <span class="n">Person</span><span class="o">&gt;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>you can do something like this:</p>

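<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">borrow</span><span class="p">::</span><span class="n">Cow</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">zerovec</span><span class="p">::{</span><span class="n">VarZeroVec</span><span class="p">,</span> <span class="n">ZeroMap</span><span class="p">,</span> <span class="n">ZeroVec</span><span class="p">};</span>

<span class="c1">// custom fixed-size ULE type for ZeroVec</span>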
<span class="nd">#[zerovec::make_ule(DateULE)]</span>
<span class="nd">#[derive(Copy,</span> <span class="nd">Clone,</span> <span class="nd">PartialEq,</span> <span class="nd">Eq,</span> <span class="nd">Ord,</span> <span class="nd">PartialOrd,</span> <span class="nd">serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>
<span class="k">struct</span> <span class="n">Date</span> <span class="p">{</span>
    <span class="n">y</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="n">m</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">d</span><span class="p">:</span> <span class="nb">u8</span>
<span class="p">}</span>

<span class="c1">// custom variable sized VarULE type for VarZeroVec</span>
<span class="nd">#[zerovec::make_varule(PersonULE)]</span>
<span class="nd">#[zerovec::derive(Serialize,</span> <span class="nd">Deserialize)]</span> <span class="c1">// add Serde impls to PersonULE</span>
<span class="nd">#[derive(Clone,</span> <span class="nd">PartialEq,</span> <span class="nd">Eq,</span> <span class="nd">Ord,</span> <span class="nd">PartialOrd,</span> <span class="nd">serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>
<span class="k">struct</span> <span class="n">Person</span><span class="o">&lt;</span><span class="nv">'data</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">birthday</span><span class="p">:</span> <span class="n">Date</span><span class="p">,</span>
    <span class="n">favorite_character</span><span class="p">:</span> <span class="nb">char</span><span class="p">,</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">Cow</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="nd">#[derive(serde::Serialize,</span> <span class="nd">serde::Deserialize)]</span>
<span class="k">struct</span> <span class="n">Data</span><span class="o">&lt;</span><span class="nv">'data</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">important_dates</span><span class="p">:</span> <span class="n">ZeroVec</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="n">Date</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="c1">// note: VarZeroVec always must reference the unsized ULE type directly</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">important_people</span><span class="p">:</span> <span class="n">VarZeroVec</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="n">PersonULE</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">birthdays_to_people</span><span class="p">:</span> <span class="n">ZeroMap</span><span class="o">&lt;</span><span class="nv">'data</span><span class="p">,</span> <span class="n">Date</span><span class="p">,</span> <span class="n">PersonULE</span><span class="o">&gt;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Unfortunately the inner “ULE type” workings are not <em>completely</em> hidden from the user, especially for <code class="language-plaintext highlighter-rouge">VarZeroVec</code>-compatible types, but the crate does a fair number of things to attempt to make it pleasant to work with.</p>

<p>In general, <code class="language-plaintext highlighter-rouge">ZeroVec</code> should be used for types that are fixed-size and implement <code class="language-plaintext highlighter-rouge">Copy</code>, whereas <code class="language-plaintext highlighter-rouge">VarZeroVec</code> is to be used with types that logically contain a variable amount of data, like vectors, maps, strings, and aggregates of the same. <code class="language-plaintext highlighter-rouge">VarZeroVec</code> will always be used with a dynamically sized type, yielding references to that type.</p>

<p>I’ve noted before that these types are like <code class="language-plaintext highlighter-rouge">Cow&lt;'a, T&gt;</code>; they can be dealt with in a mutable-owned fashion, but that’s not the primary focus of the crate. In particular, <code class="language-plaintext highlighter-rouge">VarZeroVec&lt;T&gt;</code> will be significantly slower to mutate than something like <code class="language-plaintext highlighter-rouge">Vec&lt;String&gt;</code>, since all operations work directly on the flat buffer format rather than on separately allocated elements. The general idea of this crate is that you will probably be <em>generating</em> your data in a situation without too many performance constraints, but you want the operation of <em>reading</em> the data to be fast. So, where necessary, the crate trades off mutation performance for deserialization/read performance. Still, it’s not terribly slow, just something to look out for and benchmark if necessary.</p>
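<p>To see why mutation costs more, here is a simplified, hypothetical flat layout (not <code>zerovec</code>’s real format or API) that stores every string in one buffer plus a list of start offsets. Replacing one element forces every later byte to shift and every later offset to be patched, whereas replacing an element of a <code>Vec&lt;String&gt;</code> touches only that one <code>String</code>.</p>

```rust
/// Hypothetical flat storage: all strings concatenated in `data`,
/// with `starts[i]` giving the byte offset where element `i` begins.
struct FlatStrs {
    starts: Vec<usize>,
    data: String,
}

impl FlatStrs {
    fn new(items: &[&str]) -> Self {
        let mut starts = Vec::new();
        let mut data = String::new();
        for s in items {
            starts.push(data.len());
            data.push_str(s);
        }
        FlatStrs { starts, data }
    }

    /// Returns element `i` (panics if out of range; fine for a sketch).
    fn get(&self, i: usize) -> &str {
        let end = self.starts.get(i + 1).copied().unwrap_or(self.data.len());
        &self.data[self.starts[i]..end]
    }

    /// Replaces element `i`, shifting all later bytes and patching all later
    /// offsets. Returns how many offsets had to be rewritten.
    fn replace(&mut self, i: usize, new: &str) -> usize {
        let start = self.starts[i];
        let end = self.starts.get(i + 1).copied().unwrap_or(self.data.len());
        self.data.replace_range(start..end, new);
        let delta = new.len() as isize - (end - start) as isize;
        let mut patched = 0;
        for s in &mut self.starts[i + 1..] {
            *s = (*s as isize + delta) as usize;
            patched += 1;
        }
        patched
    }
}

fn main() {
    let mut v = FlatStrs::new(&["a", "bb", "ccc", "dddd"]);
    // Growing the first element moves every later byte and three offsets.
    assert_eq!(v.replace(0, "zzzzz"), 3);
    assert_eq!(v.get(1), "bb");
    assert_eq!(v.get(3), "dddd");
}
```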

<h2 id="how-it-works">How it works</h2>

<p>Most of the crate is built on the <a href="https://docs.rs/zerovec/latest/zerovec/ule/trait.ULE.html"><code class="language-plaintext highlighter-rouge">ULE</code></a> and <a href="https://docs.rs/zerovec/latest/zerovec/ule/trait.VarULE.html"><code class="language-plaintext highlighter-rouge">VarULE</code></a> traits. Both of these traits are <code class="language-plaintext highlighter-rouge">unsafe</code> traits (though as shown above most users need not manually implement them). “ULE” stands for “unaligned little-endian”, and marks types which have no alignment requirements and have the same representation across endiannesses, preferring to be identical to the little-endian representation where relevant<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<p>There’s also a safe <a href="https://docs.rs/zerovec/latest/zerovec/ule/trait.AsULE.html"><code class="language-plaintext highlighter-rouge">AsULE</code></a> trait that allows one to convert a type between itself and some corresponding <code class="language-plaintext highlighter-rouge">ULE</code> type.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">unsafe</span> <span class="k">trait</span> <span class="n">ULE</span><span class="p">:</span> <span class="nb">Sized</span> <span class="o">+</span> <span class="nb">Copy</span> <span class="o">+</span> <span class="k">'static</span> <span class="p">{</span>
    <span class="c1">// Validate that a byte slice is appropriate to treat as a reference to this type</span>
    <span class="k">fn</span> <span class="nf">validate_byte_slice</span><span class="p">(</span><span class="n">bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="n">ZeroVecError</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="c1">// less relevant utility methods omitted</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">trait</span> <span class="n">AsULE</span><span class="p">:</span> <span class="nb">Copy</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">ULE</span><span class="p">:</span> <span class="n">ULE</span><span class="p">;</span>

    <span class="c1">// Convert to the ULE type</span>
    <span class="k">fn</span> <span class="nf">to_unaligned</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">ULE</span><span class="p">;</span>
    <span class="c1">// Convert back from the ULE type</span>
    <span class="k">fn</span> <span class="nf">from_unaligned</span><span class="p">(</span><span class="n">unaligned</span><span class="p">:</span> <span class="k">Self</span><span class="p">::</span><span class="n">ULE</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">trait</span> <span class="n">VarULE</span><span class="p">:</span> <span class="k">'static</span> <span class="p">{</span>
    <span class="c1">// Validate that a byte slice is appropriate to treat as a reference to this type</span>
    <span class="k">fn</span> <span class="nf">validate_byte_slice</span><span class="p">(</span><span class="n">_bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="n">ZeroVecError</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="c1">// Construct a reference to Self from a known-valid byte slice</span>
    <span class="c1">// This is necessary since VarULE types are dynamically sized and the working of the metadata</span>
    <span class="c1">// of the fat pointer varies between such types</span>
    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">from_byte_slice_unchecked</span><span class="p">(</span><span class="n">bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="k">Self</span><span class="p">;</span>

    <span class="c1">// less relevant utility methods omitted</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">ZeroVec&lt;T&gt;</code> takes in types that are <code class="language-plaintext highlighter-rouge">AsULE</code> and stores them internally as slices of their ULE types (<code class="language-plaintext highlighter-rouge">&amp;[T::ULE]</code>). Such slices can be freely zero-copy serialized. When you attempt to index a <code class="language-plaintext highlighter-rouge">ZeroVec</code>, it converts the value back to <code class="language-plaintext highlighter-rouge">T</code> on the fly, an operation that’s usually just an unaligned load.</p>

<p><code class="language-plaintext highlighter-rouge">VarZeroVec&lt;T&gt;</code> is a bit more complicated. The beginning of its memory stores the indices of every element in the vector, followed by the data for all of the elements just splatted one after the other. As long as the dynamically sized data can be represented in a <em>flat</em> fashion (without further internal indirection), it can implement <code class="language-plaintext highlighter-rouge">VarULE</code>, and thus be used in <code class="language-plaintext highlighter-rouge">VarZeroVec&lt;T&gt;</code>. <code class="language-plaintext highlighter-rouge">str</code> implements this, but so do <code class="language-plaintext highlighter-rouge">ZeroSlice&lt;T&gt;</code> and <code class="language-plaintext highlighter-rouge">VarZeroSlice&lt;T&gt;</code>, allowing for infinite nesting of <code class="language-plaintext highlighter-rouge">zerovec</code> types!</p>

<p><code class="language-plaintext highlighter-rouge">ZeroMap&lt;T&gt;</code> works similarly to the <a href="https://docs.rs/litemap"><code class="language-plaintext highlighter-rouge">litemap</code></a> crate, it’s a map built out of two vectors, using binary search to find keys. This isn’t always as efficient as a hash map but it can work well in a zero-copy way since it can just be backed by <code class="language-plaintext highlighter-rouge">ZeroVec</code> and <code class="language-plaintext highlighter-rouge">VarZeroVec</code>. There’s a bunch of trait infrastructure that allows it to automatically select <code class="language-plaintext highlighter-rouge">ZeroVec</code> or <code class="language-plaintext highlighter-rouge">VarZeroVec</code> for each of the key and value vectors based on the type of the key or value.</p>

<h2 id="what-about-rkyv">What about rkyv?</h2>

<p>An important question when we started down this path was: what about <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a>? It had at the time just received a fair amount of attention in the Rust community, and seemed like a pretty cool library targeting the same space.</p>

<p>And in general, if you’re looking for zero-copy deserialization, I wholeheartedly recommend looking at it! It’s an impressive library with a lot of thought put into it. When I was refining <a href="https://docs.rs/zerovec"><code class="language-plaintext highlighter-rouge">zerovec</code></a>, I learned a lot from <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a>, having some insightful discussions with <a href="https://github.com/djkoloski">David</a> and comparing notes on approaches.</p>

<p>The main sticking point, for us, was that <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a> works kinda separately from <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a>: it uses its own traits and own serialization mechanism. We really liked <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a>’s model and wanted to keep using it, especially since we wanted to support a variety of human-readable and non-human-readable data formats, including <a href="https://docs.rs/postcard"><code class="language-plaintext highlighter-rouge">postcard</code></a>, which is explicitly designed for low-resource environments. This becomes even more important for data interchange; we’d want programs written in other languages to be able to construct and send over data without necessarily being constrained to a particular wire format.</p>

<p>The goal of <a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroVec.html"><code class="language-plaintext highlighter-rouge">zerovec</code></a> is essentially to bring <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a>-like improvements to a <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a> universe without disrupting that universe too much. <code class="language-plaintext highlighter-rouge">zerovec</code> types, on human-readable formats like JSON, serialize to a normal human-readable representation of the structure, and on binary formats like <a href="https://docs.rs/postcard"><code class="language-plaintext highlighter-rouge">postcard</code></a>, serialize to a compact, zero-copy-friendly representation that Just Works.</p>

<h2 id="how-does-it-perform">How does it perform?</h2>

<p>So off the bat I’ll mention that <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a> maintains <a href="https://github.com/djkoloski/rust_serialization_benchmark">a very good benchmark suite</a> that I really need to get around to integrating with zerovec, but haven’t yet.</p>

<div class="discussion discussion-issue">
            <img class="bobblehead" width="60px" height="60px" title="Negative pion" alt="Speech bubble for character Negative pion" src="http://manishearth.github.io/images/pion-minus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Why not go do that first? It would make your post better!
            </div>
        </div>

<p>Well, I was delaying working on this post until I had those benchmarks integrated, but that’s not how executive function works, and at this point I’d rather publish with the benchmarks I have than delay further. I might update this post with the Good Benchmarks later!</p>

<div class="discussion discussion-issue">
            <img class="bobblehead" width="60px" height="60px" title="Negative pion" alt="Speech bubble for character Negative pion" src="http://manishearth.github.io/images/pion-minus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Hmph.
            </div>
        </div>

<p>The complete benchmark run details can be found <a href="https://gist.github.com/Manishearth/056a0ec12f9c943d71d214713d448ac0">here</a> (run via <code class="language-plaintext highlighter-rouge">cargo bench</code> at <a href="https://github.com/unicode-org/icu4x/tree/1e072b3248b93a974e21f3d01bc6a165eb272554/utils/zerovec"><code class="language-plaintext highlighter-rouge">1e072b32</code></a>). I’m pulling out some specific data points for illustration:</p>

<p><code class="language-plaintext highlighter-rouge">ZeroVec</code>:</p>

<table>
<thead><th>Benchmark</th><th>Slice</th><th>ZeroVec</th></thead>
<tbody>

   <tr><th>Deserialization (with <code>bincode</code>)</th></tr>
   <tr><th>Deserialize a vector of 100 u32s</th><td>141.55 ns</td><td>12.166 ns</td></tr>
   <tr><th>Deserialize a vector of 15 chars</th><td>225.55 ns</td><td>25.668 ns</td></tr>
   <tr><th>Deserialize and then sum a vector of 20 u32s</th><td>47.423 ns</td><td>14.131 ns</td></tr>

   <tr><th>Element fetching performance</th></tr>
   <tr><th>Sum a vector of 75 u32 elements</th><td>4.3091 ns</td><td>5.7108 ns</td></tr>
   <tr><th>Binary search a vector of 1000 u32 elements, 50 times</th><td>428.48 ns</td><td>565.23 ns</td></tr>
   <tr><th>Serialization</th></tr>

   <tr><th>Serialize a vector of 20 u32s</th><td>51.324 ns</td><td>21.582 ns</td></tr>
   <tr><th>Serialize a vector of 15 chars</th><td>195.75 ns</td><td>21.123 ns</td></tr>
</tbody>
</table>

<p><br />
In general we don’t care much about serialization performance; however, serialization is fast here because <code class="language-plaintext highlighter-rouge">ZeroVec</code>s are always stored in memory in the same form they are serialized in. This can make mutation slower. Fetching operations are a little bit slower on <code class="language-plaintext highlighter-rouge">ZeroVec</code>. The deserialization performance is where we see our real wins, sometimes more than ten times as fast!</p>

<p><code class="language-plaintext highlighter-rouge">VarZeroVec</code>:</p>

<p>The strings are randomly generated, picked with sizes between 2 and 20 code points, and the same set of strings is used for any given row.</p>

<table>
<thead><th>Benchmark</th><th><code>Vec&lt;String&gt;</code></th><th><code>Vec&lt;&amp;str&gt;</code></th><th>VarZeroVec</th></thead>
<tbody>

   <tr><th>Deserialize (len 100)</th><td>11.274 µs</td><td>2.2486 µs</td><td>1.9446 µs</td></tr>

   <tr><th>Count code points (len 100)</th><td colspan="2">728.99 ns</td><td>1265.0 ns</td></tr>
   <tr><th>Binary search for 1 element (len 500)</th><td colspan="2">57.788 ns</td><td>122.10 ns</td></tr>
   <tr><th>Binary search for 10 elements (len 500)</th><td colspan="2">451.40 ns</td><td>803.67 ns</td></tr>

</tbody>
</table>
<p><br /></p>

<p>Here, fetching operations are a bit slower since they need to read the indexing array, but there’s still a decent win for zero-copy deserialization. The deserialization wins stack up for more complex data: with <code class="language-plaintext highlighter-rouge">Vec&lt;String&gt;</code> you can get <em>most</em> of the wins by switching to <code class="language-plaintext highlighter-rouge">Vec&lt;&amp;str&gt;</code>, but that’s not necessarily possible for something more complex. We don’t currently have mutation benchmarks for <code class="language-plaintext highlighter-rouge">VarZeroVec</code>, but mutation can be slow and, as mentioned before, it’s not intended to be used much in client code.</p>
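<p>The indexing overhead can be sketched the same way. This is for illustration only, and the layout is an assumption that is not necessarily the real <code>VarZeroVec</code> format: a count, then <code>n + 1</code> little-endian <code>u32</code> offsets, then concatenated UTF-8 data. The <code>StrView</code> and <code>encode</code> names are also hypothetical. Parsing validates everything once up front. After that, each <code>get</code> reads two offsets before it can slice out a string.</p>

```rust
/// Hypothetical layout, for illustration only (not necessarily VarZeroVec's):
/// [n: u32 LE] [n + 1 offsets: u32 LE] [concatenated UTF-8 string data]
pub struct StrView<'a> {
    offsets: &'a [u8],
    data: &'a str,
}

/// Reads the `i`-th little-endian `u32` from `bytes` (caller ensures bounds).
fn read_u32(bytes: &[u8], i: usize) -> usize {
    let b = &bytes[i * 4..i * 4 + 4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize
}

impl<'a> StrView<'a> {
    /// Validates the whole buffer once; afterwards lookups never copy.
    pub fn parse(bytes: &'a [u8]) -> Option<StrView<'a>> {
        if bytes.len() < 4 {
            return None;
        }
        let n = read_u32(bytes, 0);
        let offsets_end = n.checked_add(1)?.checked_mul(4)?.checked_add(4)?;
        let offsets = bytes.get(4..offsets_end)?;
        let data = std::str::from_utf8(&bytes[offsets_end..]).ok()?;
        // Offsets must start at 0 and end exactly at the end of the data.
        if read_u32(offsets, 0) != 0 || read_u32(offsets, n) != data.len() {
            return None;
        }
        // Offsets must be non-decreasing and land on char boundaries.
        for i in 0..n {
            let (start, end) = (read_u32(offsets, i), read_u32(offsets, i + 1));
            if start > end || !data.is_char_boundary(start) || !data.is_char_boundary(end) {
                return None;
            }
        }
        Some(StrView { offsets, data })
    }

    pub fn len(&self) -> usize {
        self.offsets.len() / 4 - 1
    }

    /// Each lookup first reads two entries of the index array, then slices.
    pub fn get(&self, i: usize) -> Option<&'a str> {
        if i >= self.len() {
            return None;
        }
        let (start, end) = (read_u32(self.offsets, i), read_u32(self.offsets, i + 1));
        Some(&self.data[start..end])
    }
}

/// Builds the hypothetical layout above (assumes all sizes fit in a u32).
pub fn encode(strings: &[&str]) -> Vec<u8> {
    let mut out = (strings.len() as u32).to_le_bytes().to_vec();
    let mut offset: u32 = 0;
    out.extend_from_slice(&offset.to_le_bytes());
    for s in strings {
        offset += s.len() as u32;
        out.extend_from_slice(&offset.to_le_bytes());
    }
    for s in strings {
        out.extend_from_slice(s.as_bytes());
    }
    out
}
```

<p>A <code>Vec&lt;&amp;str&gt;</code> already holds a pointer and a length for each element. By contrast, <code>get</code> here does extra reads and arithmetic first, which is the sort of overhead behind the slower fetch numbers above.</p>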

<p>Some of this is still in flux; for example we are in the process of <a href="https://github.com/unicode-org/icu4x/pull/2306">making <code class="language-plaintext highlighter-rouge">VarZeroVec</code>’s buffer format configurable</a> so that users can pick their precise tradeoffs.</p>

<h2 id="try-it-out">Try it out!</h2>

<p>Similar to <a href="https://docs.rs/yoke"><code class="language-plaintext highlighter-rouge">yoke</code></a>, I don’t consider the <a href="https://docs.rs/zerovec/latest/zerovec/enum.ZeroVec.html"><code class="language-plaintext highlighter-rouge">zerovec</code></a> crate “done” yet, but it’s been in use in ICU4X for a year now and I consider it mature enough to recommend to others. Try it out! Let me know what you think!</p>

<p><em>Thanks to <a href="https://twitter.com/plaidfinch">Finch</a>, <a href="https://twitter.com/yaahc_">Jane</a>, and <a href="https://github.com/sffc">Shane</a> for reviewing drafts of this post.</em></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>A <em>locale</em> is typically a language and location, though it may contain additional information like the writing system or even things like the calendar system in use. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Bear in mind, this isn’t just a matter of picking a format like MM-DD-YYYY! Dates in just US English can look like <code class="language-plaintext highlighter-rouge">4/10/22</code> or <code class="language-plaintext highlighter-rouge">4/10/2022</code> or <code class="language-plaintext highlighter-rouge">April 10, 2022</code>, or <code class="language-plaintext highlighter-rouge">Sunday, April 10, 2022 C.E.</code>, or <code class="language-plaintext highlighter-rouge">Sun, Apr 10, 2022</code>, and that’s without even thinking about week numbers, quarters, or time! This quickly adds up to a decent amount of data for each locale. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>As mentioned in the previous post, while zero-copy deserializing, it is typical to use borrowed-or-owned types like <code class="language-plaintext highlighter-rouge">Cow</code> over pure borrowed types, because data in a human-readable format will not necessarily be able to zero-copy deserialize. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Most modern systems are little endian, so this imposes one fewer potential cost on conversion. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Not a Yoking Matter (Zero-Copy #1)]]></title>
    <link href="http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter/"/>
    <updated>2022-08-03T00:00:00+00:00</updated>
    <id>http://manishearth.github.io/blog/2022/08/03/zero-copy-1-not-a-yoking-matter</id>
    <content type="html"><![CDATA[<p><em>This is part 1 of a three-part series on interesting abstractions for zero-copy deserialization I’ve been working on over the last year. This part is about making zero-copy deserialization more pleasant to work with. Part 2 is about making it work for more types and can be found <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things/">here</a>; while Part 3 is about eliminating the deserialization step entirely and can be found <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-3-so-zero-its-dot-dot-dot-negative/">here</a>. The posts can be read in any order, though this post contains an explanation of what zero-copy deserialization</em> is.</p>

<h2 id="background">Background</h2>

<p>For the past year and a half I’ve been working full time on <a href="https://github.com/unicode-org/icu4x">ICU4X</a>, a new internationalization library in Rust being built under the Unicode Consortium as a collaboration between various companies.</p>

<p>There’s a lot I can say about ICU4X, but to focus on one core value proposition: we want it to be <em>modular</em> both in data and code. We want ICU4X to be usable on embedded platforms, where memory is at a premium. We want applications constrained by download size to be able to support all languages rather than pick a couple popular ones because they cannot afford to bundle in all that data. As a part of this, we want loading data to be <em>fast</em> and pluggable. Users should be able to design their own data loading strategies for their individual use cases.</p>

<p>See, a key part of performing correct internationalization is the <em>data</em>. Different locales<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> do things differently, and all of the information on this needs to go somewhere, preferably not code. You need data on how a particular locale formats dates<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, or how plurals work in a particular language, or how to accurately segment text in languages like Thai, which is typically written without spaces between words, so that you can insert line breaks in appropriate positions.</p>

<p>Given the focus on data, a <em>very</em> attractive option for us is zero-copy deserialization. In the process of trying to do zero-copy deserialization well, we’ve built some cool new libraries; this article is about one of them.</p>

<figure class="caption-wrapper center" style="width: 400px"><img class="caption" src="http://manishearth.github.io/images/post/cow-tools.png" width="400" /><figcaption class="caption-text"><p><small>Gary Larson, <a href="https://en.wikipedia.org/wiki/Cow_Tools">“Cow Tools”</a>, <em>The Far Side</em>. October 1982</small></p>
</figcaption></figure>

<h2 id="zero-copy-deserialization-the-basics">Zero-copy deserialization: the basics</h2>

<p><em>This section can be skipped if you’re already familiar with zero-copy deserialization in Rust</em></p>

<p>Deserialization typically involves two tasks, done in concert: validating the data, and constructing an in-memory representation that can be programmatically accessed; i.e., the final deserialized value.</p>

<p>Depending on the format, the former is typically rather fast, but the latter can be super slow, especially for variable-sized data, which needs a new allocation and often a large copy.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Serialize,</span> <span class="nd">Deserialize)]</span>
<span class="k">struct</span> <span class="n">Person</span> <span class="p">{</span>
    <span class="c1">// this field is nearly free to construct</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="c1">// constructing this will involve a small allocation and copy</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">String</span><span class="p">,</span>
    <span class="c1">// this may take a while</span>
    <span class="n">rust_files_written</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A typical binary data format will probably store this as a byte for the age, followed by the length of <code class="language-plaintext highlighter-rouge">name</code>, followed by the bytes for <code class="language-plaintext highlighter-rouge">name</code>, followed by another length for the vector, followed by a length and string data for each <code class="language-plaintext highlighter-rouge">String</code> value. Deserializing the <code class="language-plaintext highlighter-rouge">u8</code> age just involves reading it, but the other two fields require allocating sufficient memory and copying each byte over, in addition to any validation the types may need.</p>

<p>A common technique in this scenario is to skip the allocation and copy by simply <em>validating</em> the bytes and storing a <em>reference</em> to the original data. This can only be done for serialization formats where the data is represented identically in the serialized file and in the deserialized value.</p>
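<p>As a rough, std-only illustration of that difference, the sketch below parses a hypothetical layout for the <code>age</code> and <code>name</code> fields of <code>Person</code>. The layout is one byte for the age, one length byte, then the UTF-8 name bytes. Real formats typically use variable-length integers for lengths. Both functions perform the same validation, but only <code>parse_owned</code> allocates and copies.</p>

```rust
/// Hypothetical layout: [age: u8][name_len: u8][name bytes (UTF-8)].
pub struct OwnedPerson {
    pub age: u8,
    pub name: String,
}

pub struct BorrowedPerson<'a> {
    pub age: u8,
    pub name: &'a str,
}

/// Splits the buffer into the age and the (not yet validated) name bytes.
fn split_fields(bytes: &[u8]) -> Option<(u8, &[u8])> {
    let (&age, rest) = bytes.split_first()?;
    let (&len, rest) = rest.split_first()?;
    let name = rest.get(..len as usize)?;
    Some((age, name))
}

/// Copying deserialization: validate, then allocate a String and copy into it.
pub fn parse_owned(bytes: &[u8]) -> Option<OwnedPerson> {
    let (age, name_bytes) = split_fields(bytes)?;
    let name = std::str::from_utf8(name_bytes).ok()?.to_owned();
    Some(OwnedPerson { age, name })
}

/// Zero-copy deserialization: validate, then keep a reference into `bytes`.
pub fn parse_borrowed<'a>(bytes: &'a [u8]) -> Option<BorrowedPerson<'a>> {
    let (age, name_bytes) = split_fields(bytes)?;
    let name = std::str::from_utf8(name_bytes).ok()?;
    Some(BorrowedPerson { age, name })
}
```

<p>Note that <code>BorrowedPerson</code> picks up a lifetime parameter. That lifetime is the source of the ergonomic problems discussed below.</p>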

<p>When using <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a> in Rust, this is typically done by using a <a href="https://doc.rust-lang.org/stable/std/borrow/struct.Cow.html"><code class="language-plaintext highlighter-rouge">Cow&lt;'a, T&gt;</code></a> with <code class="language-plaintext highlighter-rouge">#[serde(borrow)]</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Serialize,</span> <span class="nd">Deserialize)]</span>
<span class="k">struct</span> <span class="n">Person</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">Cow</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

</code></pre></div></div>

<p>Now, when <code class="language-plaintext highlighter-rouge">name</code> is being deserialized, the deserializer only needs to validate that it is in fact a valid UTF-8 <code class="language-plaintext highlighter-rouge">str</code>, and the final value for <code class="language-plaintext highlighter-rouge">name</code> will be a reference into the original data being deserialized.</p>

<p>An <code class="language-plaintext highlighter-rouge">&amp;'a str</code> can also be used instead of the <code class="language-plaintext highlighter-rouge">Cow</code>; however, this makes the <code class="language-plaintext highlighter-rouge">Deserialize</code> impl much less general, since formats that do <em>not</em> store strings identically to their in-memory representation (e.g. JSON with strings that include escapes) will not be able to fall back to an owned value. As a result, owned-or-borrowed <a href="https://doc.rust-lang.org/stable/std/borrow/struct.Cow.html"><code class="language-plaintext highlighter-rouge">Cow&lt;'a, T&gt;</code></a> is often a cornerstone of good design when writing Rust code partaking in zero-copy deserialization.</p>
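<p>The fallback that <code>Cow</code> makes possible can be sketched without <code>serde</code>. The hypothetical <code>decode_string_body</code> function takes the text between the quotes of a JSON-like string literal. If there are no escapes, it borrows the input; otherwise, it builds an owned <code>String</code>. For brevity it only understands <code>\"</code>, <code>\\</code> and <code>\n</code>, whereas a real JSON deserializer handles the full escape set.</p>

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes the body of a JSON-like string literal.
/// Only `\"`, `\\` and `\n` are supported in this sketch.
pub fn decode_string_body<'a>(raw: &'a str) -> Option<Cow<'a, str>> {
    if !raw.contains('\\') {
        // Stored identically to its in-memory form: borrow, no allocation.
        return Some(Cow::Borrowed(raw));
    }
    // Escapes present: the in-memory form differs, so fall back to owning.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next()? {
                '"' => out.push('"'),
                '\\' => out.push('\\'),
                'n' => out.push('\n'),
                _ => return None, // unsupported escape in this sketch
            }
        } else {
            out.push(c);
        }
    }
    Some(Cow::Owned(out))
}
```

<p>A plain <code>&amp;'a str</code> return type could only express the first branch, which is exactly the generality that <code>Cow</code> buys back.</p>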

<div class="post-aside post-aside-note">You may notice that <code class="language-plaintext highlighter-rouge">rust_files_written</code> can’t be found in this new struct. This is because <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a>, out of the box, can’t handle zero-copy deserialization for anything other than <code class="language-plaintext highlighter-rouge">str</code> and <code class="language-plaintext highlighter-rouge">[u8]</code>, for very good reasons. Other frameworks like <a href="https://docs.rs/rkyv"><code class="language-plaintext highlighter-rouge">rkyv</code></a> can, however we’ve also managed to make this possible with <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a>. I’ll go in more depth about said reasons and our solution in <a href="http://manishearth.github.io/blog/2022/08/03/zero-copy-2-zero-copy-all-the-things/">part 2</a>.</div>

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Aren’t there still copies occurring here with the <code class="language-plaintext highlighter-rouge">age</code> field?
            </div>
        </div>

<p>Yes, “zero-copy” is somewhat of a misnomer; what it really means is “zero allocations” or, alternatively, “zero large copies”. Look at it this way: data like <code class="language-plaintext highlighter-rouge">age</code> does get copied, but unless you are, say, allocating a vector of <code class="language-plaintext highlighter-rouge">Person&lt;'a&gt;</code>, that copy only happens a handful of times, whether you are deserializing individual <code class="language-plaintext highlighter-rouge">Person&lt;'a&gt;</code>s or some struct that contains a couple of them. To have a large copy occur <em>without</em> involving allocations, your type would have to be that large on the stack in the first place. People generally avoid that, because it means a large copy every time you move the value around, even when you’re not deserializing.</p>

<h2 id="when-life-gives-you-lifetimes-">When life gives you lifetimes ….</h2>

<p>Zero-copy deserialization in Rust has one very pesky downside: the lifetimes. Suddenly, all of your deserialized types have lifetimes on them. Of course they would; they’re no longer self-contained, instead containing references to the data they were originally deserialized from!</p>

<p>This isn’t a problem unique to Rust, either: zero-copy deserialization always introduces more complex dependencies between your types, and different frameworks handle this differently, from leaving management of the lifetimes to the user to using reference counting or a GC to ensure the data sticks around. Rust serialization libraries can do stuff like this if they wish, too. In this case, <a href="https://docs.rs/serde"><code class="language-plaintext highlighter-rouge">serde</code></a>, in a very Rusty fashion, wants the library user to have control over the precise memory management here and surfaces this problem as a lifetime.</p>

<p>Unfortunately, lifetimes like these tend to make their way into everything. Every type holding onto your deserialized type needs a lifetime now and it’s likely going to become your users’ problem too.</p>

<p>Furthermore, Rust lifetimes are a purely compile-time construct. If your value is of a type with a lifetime, you need to know at compile time by when it will definitely no longer be in use, and you need to hold on to its source data until then. Rust’s design means that you don’t need to worry about getting this <em>wrong</em>, since the compiler will catch you, but you still need to <em>do it</em>.</p>

<p>All of this isn’t ideal for cases where you want to manage the lifetimes at runtime, e.g. if your data is being deserialized from a larger file and you wish to cache the loaded file as long as data deserialized from it is still around.</p>

<p>Typically in such cases you can use <a href="https://doc.rust-lang.org/stable/std/rc/struct.Rc.html"><code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code></a>, which is effectively the “runtime instead of compile time” version of the safe shared reference <code class="language-plaintext highlighter-rouge">&amp;'a T</code>. However, this only works for cases where you’re sharing homogeneous types, whereas in this case we’re attempting to share different types deserialized from one blob of data, which itself is of yet another type.</p>

<p>ICU4X would like users to be able to make use of caching and other data management strategies as needed, so this won’t do at all. For a while ICU4X had not one but <em>two</em> pervasive lifetimes threaded throughout most of its types: it was both confusing and not in line with our goals.</p>

<h2 id="-make-life-take-the-lifetimes-back">… make life take the lifetimes back</h2>

<p><em>Much of the design here is explained in the <a href="https://github.com/unicode-org/icu4x/blob/main/utils/yoke/design_doc.md">design doc</a></em></p>

<p>After <a href="https://github.com/unicode-org/icu4x/issues/667#issuecomment-828123099">a bunch of discussion</a> on this, primarily with <a href="https://github.com/sffc">Shane</a>, I designed <a href="https://docs.rs/yoke"><code class="language-plaintext highlighter-rouge">yoke</code></a>, a crate that attempts to provide <em>lifetime erasure</em> in Rust via self-referential types.</p>

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Wait, <em>lifetime</em> erasure?
            </div>
        </div>

<p>Like type erasure! “Type erasure” (in Rust, done using <code class="language-plaintext highlighter-rouge">dyn Trait</code>) lets you take a compile-time concept (the type of a value) and move it into something that can be decided at runtime. Analogously, the core value proposition of <code class="language-plaintext highlighter-rouge">yoke</code> is to take types burdened with the compile-time concept of lifetimes and allow those lifetimes to be decided at runtime instead.</p>
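<p>For readers less familiar with type erasure, here is a small sketch of the <code>dyn Trait</code> version. Values of three different concrete types share one <code>Vec</code>, and which type sits in each slot is only known at runtime. The <code>describe_all</code> function is just an illustrative name.</p>

```rust
use std::fmt::Display;

/// Type erasure with `dyn Trait`: every element is a `Box<dyn Display>`,
/// so the concrete type in each slot is only known at runtime.
pub fn describe_all(items: &[Box<dyn Display>]) -> Vec<String> {
    // Formatting each element goes through the trait object's vtable.
    items.iter().map(|item| item.to_string()).collect()
}
```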

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Doesn’t <code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code> already let you make lifetimes a runtime decision?
            </div>
        </div>

<p>Kind of: <code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code> on its own lets you <em>avoid</em> compile-time lifetimes, whereas <code class="language-plaintext highlighter-rouge">Yoke</code> works in situations where there is already a lifetime (e.g. due to zero-copy deserialization) that you want to paper over.</p>

<div class="discussion discussion-example">
            <img class="bobblehead" width="60px" height="60px" title="Confused pion" alt="Speech bubble for character Confused pion" src="http://manishearth.github.io/images/pion-nought.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Cool! What does that look like?
            </div>
        </div>

<p>The general idea is that you can take a zero-copy deserializable type like a <code class="language-plaintext highlighter-rouge">Cow&lt;'a, str&gt;</code> (or something more complicated) and “yoke” it to the value it was deserialized from, which we call a “cart”.</p>

<div class="discussion discussion-issue">
            <img class="bobblehead" width="60px" height="60px" title="Negative pion" alt="Speech bubble for character Negative pion" src="http://manishearth.github.io/images/pion-minus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             <em>*groan*</em> not another crate named with a pun, Manish.
            </div>
        </div>

<p>I will never stop.</p>

<p>Anyway, here’s what that looks like.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Some types explicitly mentioned for clarity</span>

<span class="c1">// load a file</span>
<span class="k">let</span> <span class="n">file</span><span class="p">:</span> <span class="nb">Rc</span><span class="o">&lt;</span><span class="p">[</span><span class="nb">u8</span><span class="p">]</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">read</span><span class="p">(</span><span class="s">"data.postcard"</span><span class="p">)</span><span class="o">?</span><span class="nf">.into</span><span class="p">();</span>

<span class="c1">// create a new Rc reference to the file data by cloning it,</span>
<span class="c1">// then use it as a cart for a Yoke</span>
<span class="k">let</span> <span class="n">y</span><span class="p">:</span> <span class="n">Yoke</span><span class="o">&lt;</span><span class="n">Cow</span><span class="o">&lt;</span><span class="k">'static</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span> <span class="nb">Rc</span><span class="o">&lt;</span><span class="p">[</span><span class="nb">u8</span><span class="p">]</span><span class="o">&gt;&gt;</span> <span class="o">=</span> <span class="nn">Yoke</span><span class="p">::</span><span class="nf">attach_to_cart</span><span class="p">(</span><span class="n">file</span><span class="nf">.clone</span><span class="p">(),</span> <span class="p">|</span><span class="n">contents</span><span class="p">|</span> <span class="p">{</span>
    <span class="c1">// deserialize from the file</span>
    <span class="k">let</span> <span class="n">cow</span><span class="p">:</span> <span class="n">Cow</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span> <span class="o">=</span>  <span class="nn">postcard</span><span class="p">::</span><span class="nf">from_bytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">contents</span><span class="p">);</span>
    <span class="n">cow</span>
<span class="p">})</span>

<span class="c1">// the string is still accessible with `.get()`</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span> <span class="n">y</span><span class="nf">.get</span><span class="p">())</span>

<span class="nf">drop</span><span class="p">(</span><span class="n">y</span><span class="p">);</span>
<span class="c1">// only now will the reference count on the file be decreased</span>
</code></pre></div></div>

<div class="post-aside post-aside-issue">Some of the APIs here may not quite work due to current compiler bugs. In this blog post I’m using the ideal version of these APIs for illustrative purposes, but it’s worth checking with the Yoke docs to see if you may need to use an alternate workaround API. <em>Most</em> of the bugs have been fixed as of Rust 1.61.</div>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             The example above uses <a href="https://docs.rs/postcard"><code class="language-plaintext highlighter-rouge">postcard</code></a>: <code class="language-plaintext highlighter-rouge">postcard</code> is a really neat <code class="language-plaintext highlighter-rouge">serde</code>-compatible binary serialization format, designed for use in resource-constrained environments. It’s quite fast and has a small code size; check it out!
            </div>
        </div>

<p>The type <code class="language-plaintext highlighter-rouge">Yoke&lt;Cow&lt;'static, str&gt;, Rc&lt;[u8]&gt;&gt;</code> is “a lifetime-erased <code class="language-plaintext highlighter-rouge">Cow&lt;str&gt;</code> ‘yoked’ to a backing data store ‘cart’ that is an <code class="language-plaintext highlighter-rouge">Rc&lt;[u8]&gt;</code>”. This means that the <code class="language-plaintext highlighter-rouge">Cow</code> contains references to data from the cart, and the <code class="language-plaintext highlighter-rouge">Yoke</code> holds on to the cart for as long as it is alive, which ensures the references from the <code class="language-plaintext highlighter-rouge">Cow</code> never dangle.</p>

<p>Most operations on the data within a <code class="language-plaintext highlighter-rouge">Yoke</code> operate via <code class="language-plaintext highlighter-rouge">.get()</code>, which in this case will return a <code class="language-plaintext highlighter-rouge">&amp;'a Cow&lt;'a, str&gt;</code>, where <code class="language-plaintext highlighter-rouge">'a</code> is the lifetime of the borrow taken by <code class="language-plaintext highlighter-rouge">.get()</code>. This keeps things safe. A <code class="language-plaintext highlighter-rouge">Cow&lt;'static, str&gt;</code> is not really safe to hand out in this case, since the <code class="language-plaintext highlighter-rouge">Cow</code> is not actually borrowing from static data. However, it’s fine as long as we shorten the lifetime to something tied to the <code class="language-plaintext highlighter-rouge">Yoke</code> during accesses.</p>

<p>Turns out, the <code class="language-plaintext highlighter-rouge">'static</code> found in <code class="language-plaintext highlighter-rouge">Yoke</code> types is actually a lie! Rust doesn’t really let you work with types with borrowed content without mentioning <em>some</em> lifetime, and here we want to relieve the compiler from its duty of managing lifetimes and manage them ourselves, so we need to give it <em>something</em> so that we can name the type, and <code class="language-plaintext highlighter-rouge">'static</code> is the only preexisting named lifetime in Rust.</p>

<p>The actual signature of <code class="language-plaintext highlighter-rouge">.get()</code> is <a href="https://docs.rs/yoke/latest/yoke/struct.Yoke.html#method.get">a bit weird</a> since it needs to be generic, but if our borrowed type is <code class="language-plaintext highlighter-rouge">Foo&lt;'a&gt;</code>, then the signature of <code class="language-plaintext highlighter-rouge">.get()</code> is something like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> <span class="n">Yoke</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&lt;</span><span class="k">'static</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="n">get</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="nv">'a</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">Foo</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="o">...</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
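<p>Handing out a shorter lifetime is fine because of <em>covariance</em>: a <code>Cow&lt;'static, str&gt;</code> can be used wherever a <code>Cow&lt;'a, str&gt;</code> is expected. The hypothetical <code>shorten</code> function below shows the compiler accepting the <code>&amp;'a Foo&lt;'a&gt;</code>-shaped signature from above, with <code>Cow</code> standing in for <code>Foo</code>. It is only a sketch of the principle; <code>yoke</code> itself routes this through the <code>Yokeable</code> trait rather than a free function.</p>

```rust
use std::borrow::Cow;

/// Hypothetical illustration of the lifetime shortening behind `.get()`:
/// because `Cow<'b, str>` is covariant in `'b`, a `&'a Cow<'static, str>`
/// can be returned as a `&'a Cow<'a, str>` with no unsafe code or conversion.
pub fn shorten<'a>(stored: &'a Cow<'static, str>) -> &'a Cow<'a, str> {
    stored
}
```

<p>Suppose the stored type were instead invariant in its lifetime, for example something containing a <code>Cell&lt;&amp;'a str&gt;</code>. In that case the compiler would reject this shortening, which is why types placed in a <code>Yoke</code> need to be covariant.</p>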

<p>For a type to be allowed within a <code class="language-plaintext highlighter-rouge">Yoke&lt;Y, C&gt;</code>, it must implement <code class="language-plaintext highlighter-rouge">Yokeable&lt;'a&gt;</code>. This trait is unsafe to implement manually; in most cases you should derive it with <code class="language-plaintext highlighter-rouge">#[derive(Yokeable)]</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Yokeable,</span> <span class="nd">Serialize,</span> <span class="nd">Deserialize)]</span>
<span class="k">struct</span> <span class="n">Person</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">age</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="nd">#[serde(borrow)]</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">Cow</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">let</span> <span class="n">person</span><span class="p">:</span> <span class="n">Yoke</span><span class="o">&lt;</span><span class="n">Person</span><span class="o">&lt;</span><span class="k">'static</span><span class="o">&gt;</span><span class="p">,</span> <span class="nb">Rc</span><span class="o">&lt;</span><span class="p">[</span><span class="nb">u8</span><span class="p">]</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nn">Yoke</span><span class="p">::</span><span class="nf">attach_to_cart</span><span class="p">(</span><span class="n">file</span><span class="nf">.clone</span><span class="p">(),</span> <span class="p">|</span><span class="n">contents</span><span class="p">|</span> <span class="p">{</span>
    <span class="nn">postcard</span><span class="p">::</span><span class="nf">from_bytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">contents</span><span class="p">)</span>
<span class="p">});</span>
</code></pre></div></div>

<p>Unlike most <code class="language-plaintext highlighter-rouge">#[derive]</code>s, <code class="language-plaintext highlighter-rouge">Yokeable</code> can be derived even if the fields do not already implement <code class="language-plaintext highlighter-rouge">Yokeable</code>, except for cases when fields with lifetimes also have other generic parameters. In such cases it typically suffices to tag the type with <code class="language-plaintext highlighter-rouge">#[yoke(prove_covariance_manually)]</code> and ensure any fields with lifetimes also implement <code class="language-plaintext highlighter-rouge">Yokeable</code>.</p>

<p>There’s a bunch more you can do with <code class="language-plaintext highlighter-rouge">Yoke</code>, for example you can “project” a yoke to get a new yoke with a subset of the data found in the initial one:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">person</span><span class="p">:</span> <span class="n">Yoke</span><span class="o">&lt;</span><span class="n">Person</span><span class="o">&lt;</span><span class="k">'static</span><span class="o">&gt;</span><span class="p">,</span> <span class="nb">Rc</span><span class="o">&lt;</span><span class="p">[</span><span class="nb">u8</span><span class="p">]</span><span class="o">&gt;&gt;</span> <span class="o">=</span> <span class="o">...</span><span class="err">.</span><span class="p">;</span>

<span class="k">let</span> <span class="n">person_name</span><span class="p">:</span> <span class="n">Yoke</span><span class="o">&lt;</span><span class="n">Cow</span><span class="o">&lt;</span><span class="k">'static</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span> <span class="nb">Rc</span><span class="o">&lt;</span><span class="p">[</span><span class="nb">u8</span><span class="p">]</span><span class="o">&gt;&gt;</span> <span class="o">=</span> <span class="n">person</span><span class="nf">.project</span><span class="p">(|</span><span class="n">p</span><span class="p">,</span> <span class="n">_</span><span class="p">|</span> <span class="n">p</span><span class="py">.name</span><span class="p">);</span>

</code></pre></div></div>

<p>This allows one to mix data coming from disparate Yokes.</p>

<p><code class="language-plaintext highlighter-rouge">Yoke</code>s are, perhaps surprisingly, <em>mutable</em> as well! They are, after all, primarily intended to be used with copy-on-write data, so there are ways to mutate them provided that no <em>additional</em> borrowed data sneaks in:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">person</span><span class="p">:</span> <span class="n">Yoke</span><span class="o">&lt;</span><span class="n">Person</span><span class="o">&lt;</span><span class="k">'static</span><span class="o">&gt;</span><span class="p">,</span> <span class="nb">Rc</span><span class="o">&lt;</span><span class="p">[</span><span class="nb">u8</span><span class="p">]</span><span class="o">&gt;&gt;</span> <span class="o">=</span> <span class="o">...</span><span class="err">.</span><span class="p">;</span>

<span class="c1">// make the name sound fancier</span>
<span class="n">person</span><span class="nf">.with_mut</span><span class="p">(|</span><span class="n">person</span><span class="p">|</span> <span class="p">{</span>
    <span class="c1">// this converts the `Cow` into an owned one if it was borrowed</span>
    <span class="n">person</span><span class="py">.name</span><span class="nf">.to_mut</span><span class="p">()</span><span class="nf">.push_str</span><span class="p">(</span><span class="s">", Esq."</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div></div>
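<p>The <code class="language-plaintext highlighter-rouge">to_mut()</code> call in the closure above is ordinary <code class="language-plaintext highlighter-rouge">std::borrow::Cow</code> behavior. Here is a minimal sketch that uses only the standard library and does not involve <code class="language-plaintext highlighter-rouge">yoke</code>. This <code class="language-plaintext highlighter-rouge">Person</code> struct and the <code class="language-plaintext highlighter-rouge">make_fancy</code>/<code class="language-plaintext highlighter-rouge">demo</code> helpers are hypothetical. It shows a borrowed <code class="language-plaintext highlighter-rouge">Cow</code> becoming owned on its first mutation, while the buffer it borrowed from stays untouched:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::borrow::Cow;

// Hypothetical stand-in for the `Person` type used in this post.
struct Person&lt;'a&gt; {
    name: Cow&lt;'a, str&gt;,
}

// `Cow::to_mut()` copies borrowed data into an owned `String` the first time
// it is called on a `Cow::Borrowed`, then hands out a `&amp;mut String`.
fn make_fancy(person: &amp;mut Person&lt;'_&gt;) {
    person.name.to_mut().push_str(", Esq.");
}

// Returns (the new name, whether the `Cow` is owned now, the original buffer).
fn demo() -&gt; (String, bool, String) {
    let buffer = String::from("Hans Gruber");
    let mut person = Person { name: Cow::Borrowed(buffer.as_str()) };
    make_fancy(&amp;mut person);
    let is_owned = matches!(person.name, Cow::Owned(_));
    let name = person.name.into_owned();
    (name, is_owned, buffer)
}
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">make_fancy</code> a second time would not copy again. Once the <code class="language-plaintext highlighter-rouge">Cow</code> is owned, <code class="language-plaintext highlighter-rouge">to_mut()</code> just returns a reference to the existing <code class="language-plaintext highlighter-rouge">String</code>.</p>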

<p>Overall, <code class="language-plaintext highlighter-rouge">Yoke</code> is a pretty powerful abstraction, useful for a host of situations involving zero-copy deserialization as well as other cases involving heavy borrowing. In ICU4X, the abstractions we use to load data always use <code class="language-plaintext highlighter-rouge">Yoke</code>s, allowing various data loading strategies (including caching) to be mixed.</p>

<h3 id="how-it-works">How it works</h3>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             Manish is about to say the word “covariant” so I’m going to get ahead of him and say: If you have trouble understanding this and the next section, don’t worry! The internal workings of his crate rely on multiple niche concepts that most Rustaceans never need to care about, even those working on otherwise advanced code.
            </div>
        </div>

<p><code class="language-plaintext highlighter-rouge">Yoke</code> works by relying on the concept of a <em>covariant lifetime</em>. The <a href="https://docs.rs/yoke/latest/yoke/trait.Yokeable.html"><code class="language-plaintext highlighter-rouge">Yokeable</code></a> trait looks like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">unsafe</span> <span class="k">trait</span> <span class="n">Yokeable</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span><span class="p">:</span> <span class="k">'static</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Output</span><span class="p">:</span> <span class="nv">'a</span><span class="p">;</span>
    <span class="c1">// methods omitted</span>
<span class="p">}</span>
</code></pre></div></div>

<p>and a typical implementation would look something like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="n">Yokeable</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">Cow</span><span class="o">&lt;</span><span class="k">'static</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">Cow</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="nb">str</span><span class="o">&gt;</span><span class="p">;</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This trait is implemented on the <code class="language-plaintext highlighter-rouge">'static</code> version of a type with a lifetime (which I will call <code class="language-plaintext highlighter-rouge">Self&lt;'static&gt;</code><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> in this post), and maps that type to its version with the lifetime <code class="language-plaintext highlighter-rouge">'a</code> (<code class="language-plaintext highlighter-rouge">Self&lt;'a&gt;</code>). It must only be implemented on types where the lifetime <code class="language-plaintext highlighter-rouge">'a</code> is <em>covariant</em>, i.e., where it’s safe to treat <code class="language-plaintext highlighter-rouge">Self&lt;'a&gt;</code> as <code class="language-plaintext highlighter-rouge">Self&lt;'b&gt;</code> when <code class="language-plaintext highlighter-rouge">'b</code> is a shorter lifetime. Most types with lifetimes fall into this category<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, especially in the space of zero-copy deserialization.</p>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             You can read more about variance in the <a href="https://doc.rust-lang.org/nomicon/subtyping.html">nomicon</a>!
            </div>
        </div>
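<p>Here is a small sketch that makes covariance a bit more concrete. It uses only the standard library, and the function and struct names are made up for illustration. Each function body just returns its argument, and it type-checks only because the types involved are covariant in their lifetime. The commented-out <code class="language-plaintext highlighter-rouge">&amp;mut</code> version is rejected because <code class="language-plaintext highlighter-rouge">&amp;mut T</code> is invariant in <code class="language-plaintext highlighter-rouge">T</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::borrow::Cow;

// `Cow&lt;'a, str&gt;` is covariant in `'a`: a `Cow&lt;'long, str&gt;` can be used where a
// `Cow&lt;'short, str&gt;` is expected, so returning the argument unchanged compiles.
fn shorten&lt;'long: 'short, 'short&gt;(value: Cow&lt;'long, str&gt;) -&gt; Cow&lt;'short, str&gt; {
    value
}

// A struct whose lifetime only appears in covariant positions is covariant too.
struct Person&lt;'a&gt; {
    name: Cow&lt;'a, str&gt;,
}

fn shorten_person&lt;'long: 'short, 'short&gt;(person: Person&lt;'long&gt;) -&gt; Person&lt;'short&gt; {
    person
}

// This one would NOT compile: `&amp;mut T` is invariant in `T`, so a
// `&amp;mut Cow&lt;'long, str&gt;` cannot be treated as a `&amp;mut Cow&lt;'short, str&gt;`.
//
// fn shorten_mut&lt;'long: 'short, 'short&gt;(
//     value: &amp;'short mut Cow&lt;'long, str&gt;,
// ) -&gt; &amp;'short mut Cow&lt;'short, str&gt; {
//     value
// }

// Mixes a `'static` value with a borrow of a caller-provided string.
fn total_len(local: &amp;str) -&gt; usize {
    let person: Person&lt;'static&gt; = Person { name: Cow::Borrowed("static data") };
    let short = shorten_person(person);
    let local_cow = shorten(Cow::Borrowed(local));
    short.name.len() + local_cow.len()
}
</code></pre></div></div>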

<p>For any <code class="language-plaintext highlighter-rouge">Yokeable</code> type <code class="language-plaintext highlighter-rouge">Foo&lt;'static&gt;</code>, you can name the version of that type with the lifetime <code class="language-plaintext highlighter-rouge">'a</code> as <code class="language-plaintext highlighter-rouge">&lt;Foo&lt;'static&gt; as Yokeable&lt;'a&gt;&gt;::Output</code>. The <code class="language-plaintext highlighter-rouge">Yokeable</code> trait exposes some methods that allow one to safely carry out the various transforms that are allowed on a type with a covariant lifetime.</p>

<p><code class="language-plaintext highlighter-rouge">#[derive(Yokeable)]</code> mostly relies on the compiler’s ability to determine whether a lifetime is covariant, and doesn’t actually generate much code! For most of the functions on <code class="language-plaintext highlighter-rouge">Yokeable</code>, the generated bodies are pure safe code, looking like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="n">Yokeable</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">Foo</span><span class="o">&lt;</span><span class="k">'static</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">Foo</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span><span class="p">;</span>
    <span class="k">fn</span> <span class="nf">transform</span><span class="p">(</span><span class="o">&amp;</span><span class="nv">'a</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span>
        <span class="k">self</span>
    <span class="p">}</span>
    <span class="k">fn</span> <span class="nf">transform_owned</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span>
        <span class="k">self</span>
    <span class="p">}</span>
    <span class="k">fn</span> <span class="n">transform_mut</span><span class="o">&lt;</span><span class="n">F</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span>
    <span class="k">where</span>
        <span class="n">F</span><span class="p">:</span> <span class="k">'static</span> <span class="o">+</span> <span class="k">for</span><span class="o">&lt;</span><span class="nv">'b</span><span class="o">&gt;</span> <span class="nf">FnOnce</span><span class="p">(</span><span class="o">&amp;</span><span class="nv">'b</span> <span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">Output</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// the exception: `&amp;mut T` is invariant in `T`, so the compiler can't</span>
        <span class="c1">// check this cast; the `'static` bound on `F` is what stops</span>
        <span class="c1">// shorter-lived borrowed data from sneaking in</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">f</span><span class="p">(</span><span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">transmute</span><span class="p">::</span><span class="o">&lt;&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="k">Self</span><span class="p">,</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">Output</span><span class="o">&gt;</span><span class="p">(</span><span class="k">self</span><span class="p">))</span> <span class="p">}</span>
    <span class="p">}</span>
    <span class="c1">// fn make() omitted since it's not as relevant</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The compiler knows these are safe because it knows that the type is covariant, and the <code class="language-plaintext highlighter-rouge">Yokeable</code> trait allows us to talk about types where these operations are safe, <em>generically</em>.</p>

<div class="discussion discussion-note">
            <img class="bobblehead" width="60px" height="60px" title="Positive pion" alt="Speech bubble for character Positive pion" src="http://manishearth.github.io/images/pion-plus.png" />
            <div class="discussion-spacer"></div>
            <div class="discussion-text">
             In other words, there’s a certain useful property about lifetime “stretchiness” that the compiler knows about, and we can check that the property applies to a type by generating code that the compiler would refuse to compile if the property did not apply.
            </div>
        </div>
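<p>You can try the same trick without the <code class="language-plaintext highlighter-rouge">yoke</code> crate. The sketch below uses a hypothetical, cut-down <code class="language-plaintext highlighter-rouge">MiniYokeable</code> trait. This is not <code class="language-plaintext highlighter-rouge">yoke</code>’s real trait, which is <code class="language-plaintext highlighter-rouge">unsafe</code> and has more methods. It keeps only the two methods whose bodies are safe code. A generic <code class="language-plaintext highlighter-rouge">view</code> helper then hands out a short-lived view of a stored <code class="language-plaintext highlighter-rouge">Foo&lt;'static&gt;</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::borrow::Cow;

// A cut-down, hypothetical stand-in for `yoke::Yokeable`: only the two
// methods that can be written in safe code are kept.
trait MiniYokeable&lt;'a&gt;: 'static {
    type Output: 'a;
    fn transform(&amp;'a self) -&gt; &amp;'a Self::Output;
    fn transform_owned(self) -&gt; Self::Output;
}

struct Foo&lt;'a&gt; {
    name: Cow&lt;'a, str&gt;,
    tags: Vec&lt;Cow&lt;'a, str&gt;&gt;,
}

// Implemented on the `'static` version, mapping it to the `'a` version.
impl&lt;'a&gt; MiniYokeable&lt;'a&gt; for Foo&lt;'static&gt; {
    type Output = Foo&lt;'a&gt;;
    // Both bodies compile only because `Foo` is covariant in its lifetime.
    // Add a `Cell&lt;&amp;'a str&gt;` field (invariant) and the compiler rejects them.
    fn transform(&amp;'a self) -&gt; &amp;'a Foo&lt;'a&gt; {
        self
    }
    fn transform_owned(self) -&gt; Foo&lt;'a&gt; {
        self
    }
}

// Generic code can now hand out a view tied to the borrow of `stored`.
fn view&lt;'a, Y: MiniYokeable&lt;'a&gt;&gt;(stored: &amp;'a Y) -&gt; &amp;'a Y::Output {
    stored.transform()
}

fn demo() -&gt; usize {
    let stored: Foo&lt;'static&gt; = Foo {
        name: Cow::Borrowed("example"),
        tags: vec![Cow::Owned(String::from("a")), Cow::Borrowed("b")],
    };
    let short: &amp;Foo&lt;'_&gt; = view(&amp;stored);
    short.name.len() + short.tags.len()
}
</code></pre></div></div>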

<p>Using this trait, <code class="language-plaintext highlighter-rouge">Yoke</code> then works by storing <code class="language-plaintext highlighter-rouge">Self&lt;'static&gt;</code> and transforming it to a shorter, more local lifetime before handing it out to any consumers, using the methods on <code class="language-plaintext highlighter-rouge">Yokeable</code> in various ways. Knowing that the lifetime is covariant is what makes it safe to do such lifetime “squeezing”. The <code class="language-plaintext highlighter-rouge">'static</code> is a lie, but it’s safe to do that kind of thing as long as the value isn’t actually accessed with the <code class="language-plaintext highlighter-rouge">'static</code> lifetime, and we take great care to ensure it doesn’t leak.</p>

<h2 id="better-conversions-zerofrom">Better conversions: ZeroFrom</h2>

<p>A crate that pairs well with this is <a href="https://docs.rs/zerofrom"><code class="language-plaintext highlighter-rouge">zerofrom</code></a>, primarily designed and written by <a href="https://github.com/sffc">Shane</a>. It comes with the <a href="https://docs.rs/zerofrom/latest/zerofrom/trait.ZeroFrom.html"><code class="language-plaintext highlighter-rouge">ZeroFrom</code></a> trait:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">trait</span> <span class="n">ZeroFrom</span><span class="o">&lt;</span><span class="nv">'zf</span><span class="p">,</span> <span class="n">C</span><span class="p">:</span> <span class="o">?</span><span class="nb">Sized</span><span class="o">&gt;</span><span class="p">:</span> <span class="nv">'zf</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">zero_from</span><span class="p">(</span><span class="n">other</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'zf</span> <span class="n">C</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The idea of this trait is to be able to work generically with types convertible to (often zero-copy) borrowed types.</p>

<p>For example, <code class="language-plaintext highlighter-rouge">Cow&lt;'zf, str&gt;</code> implements both <code class="language-plaintext highlighter-rouge">ZeroFrom&lt;'zf, str&gt;</code> and <code class="language-plaintext highlighter-rouge">ZeroFrom&lt;'zf, String&gt;</code>, as well as <code class="language-plaintext highlighter-rouge">ZeroFrom&lt;'zf, Cow&lt;'a, str&gt;&gt;</code>. It’s similar to the <a href="https://doc.rust-lang.org/stable/std/convert/trait.AsRef.html"><code class="language-plaintext highlighter-rouge">AsRef</code></a> trait, but it allows more flexibility in the kinds of borrowing that occur, and implementors are supposed to minimize the amount of copying during such a conversion. For example, when <code class="language-plaintext highlighter-rouge">ZeroFrom</code>-constructing a <code class="language-plaintext highlighter-rouge">Cow&lt;'zf, str&gt;</code> from some other <code class="language-plaintext highlighter-rouge">Cow&lt;'a, str&gt;</code>, it will <em>always</em> construct a <code class="language-plaintext highlighter-rouge">Cow::Borrowed</code>, even if the original <code class="language-plaintext highlighter-rouge">Cow&lt;'a, str&gt;</code> was owned.</p>
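<p>Here is a rough illustration of that “always borrow” behavior. It is written against a local copy of the trait shape shown above, called <code class="language-plaintext highlighter-rouge">MiniZeroFrom</code> so that it compiles without the <code class="language-plaintext highlighter-rouge">zerofrom</code> crate. These impls were written for this example and are not copied from the crate. The hypothetical <code class="language-plaintext highlighter-rouge">borrow_all</code> helper returns a <code class="language-plaintext highlighter-rouge">Cow::Borrowed</code> whether it is given a <code class="language-plaintext highlighter-rouge">str</code>, a <code class="language-plaintext highlighter-rouge">String</code>, or a <code class="language-plaintext highlighter-rouge">Cow</code> that is itself owned:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::borrow::Cow;

// Local copy of the trait shape shown above, so this compiles on its own.
trait MiniZeroFrom&lt;'zf, C: ?Sized&gt;: 'zf {
    fn zero_from(other: &amp;'zf C) -&gt; Self;
}

impl&lt;'zf&gt; MiniZeroFrom&lt;'zf, str&gt; for Cow&lt;'zf, str&gt; {
    fn zero_from(other: &amp;'zf str) -&gt; Self {
        Cow::Borrowed(other)
    }
}

impl&lt;'zf&gt; MiniZeroFrom&lt;'zf, String&gt; for Cow&lt;'zf, str&gt; {
    fn zero_from(other: &amp;'zf String) -&gt; Self {
        Cow::Borrowed(other.as_str())
    }
}

impl&lt;'zf, 'a&gt; MiniZeroFrom&lt;'zf, Cow&lt;'a, str&gt;&gt; for Cow&lt;'zf, str&gt; {
    fn zero_from(other: &amp;'zf Cow&lt;'a, str&gt;) -&gt; Self {
        // Borrow from whatever `other` holds, even a `Cow::Owned`:
        // no string data is copied.
        Cow::Borrowed(&amp;**other)
    }
}

// Hypothetical helper: zero-copy conversion from any supported source.
fn borrow_all&lt;'zf, C: ?Sized&gt;(source: &amp;'zf C) -&gt; Cow&lt;'zf, str&gt;
where
    Cow&lt;'zf, str&gt;: MiniZeroFrom&lt;'zf, C&gt;,
{
    &lt;Cow&lt;'zf, str&gt; as MiniZeroFrom&lt;'zf, C&gt;&gt;::zero_from(source)
}
</code></pre></div></div>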

<p><code class="language-plaintext highlighter-rouge">Yoke</code> has a convenient constructor <a href="https://docs.rs/yoke/latest/yoke/struct.Yoke.html#method.attach_to_zero_copy_cart"><code class="language-plaintext highlighter-rouge">Yoke::attach_to_zero_copy_cart()</code></a> that can create a <code class="language-plaintext highlighter-rouge">Yoke&lt;Y, C&gt;</code> out of a cart type <code class="language-plaintext highlighter-rouge">C</code> if <code class="language-plaintext highlighter-rouge">Y&lt;'zf&gt;</code> implements <code class="language-plaintext highlighter-rouge">ZeroFrom&lt;'zf, C&gt;</code> for all lifetimes <code class="language-plaintext highlighter-rouge">'zf</code>. This is useful when you want basic self-referential types but aren’t doing any fancy zero-copy deserialization.</p>

<h2 id="-make-life-rue-the-day-it-thought-it-could-give-you-lifetimes">… make life rue the day it thought it could give you lifetimes</h2>

<p>Life with this crate hasn’t been all peachy. We’ve, uh … <a href="https://github.com/rust-lang/rust/issues/90638">unfortunately</a> <a href="https://github.com/rust-lang/rust/issues/86703">discovered</a> <a href="https://github.com/rust-lang/rust/issues/88446">a</a> <a href="https://github.com/rust-lang/rust/issues/89436">toweringly</a> <a href="https://github.com/rust-lang/rust/issues/89196">large</a> <a href="https://github.com/rust-lang/rust/issues/84937">pile</a> <a href="https://github.com/rust-lang/rust/issues/89418">of</a> <a href="https://github.com/rust-lang/rust/issues/90950">gnarly</a> <a href="https://github.com/rust-lang/rust/issues/96223">compiler</a> <a href="https://github.com/rust-lang/rust/issues/91899">bugs</a>. A lot of this has its root in the fact that <code class="language-plaintext highlighter-rouge">Yokeable&lt;'a&gt;</code> in most cases is bound via <code class="language-plaintext highlighter-rouge">for&lt;'a&gt; Yokeable&lt;'a&gt;</code> (“<code class="language-plaintext highlighter-rouge">Yokeable&lt;'a&gt;</code> for all possible lifetimes <code class="language-plaintext highlighter-rouge">'a</code>”). The <code class="language-plaintext highlighter-rouge">for&lt;'a&gt;</code> is a niche feature known as a higher-ranked lifetime or trait bound (often referred to as “HRTB”), and while it’s always been necessary in some capacity for Rust’s typesystem to be able to reason about function pointers, it’s also always been rather buggy and is often discouraged for usages like this.</p>
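<p>Here is a minimal illustration of what <code class="language-plaintext highlighter-rouge">for&lt;'a&gt;</code> buys. It uses a plain closure bound rather than <code class="language-plaintext highlighter-rouge">Yokeable</code>, and the function name is hypothetical. The bound says the closure must accept a <code class="language-plaintext highlighter-rouge">&amp;str</code> of <em>any</em> lifetime. That is what lets the function lend it a borrow of its own local variable, whose lifetime the caller cannot name. The elided form <code class="language-plaintext highlighter-rouge">Fn(&amp;str) -&gt; usize</code> is sugar for the same higher-ranked bound.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// `for&lt;'a&gt;` makes the bound higher-ranked: `F` must accept a `&amp;'a str` for
// every lifetime `'a`, not one lifetime chosen by the caller. That lets this
// function lend `f` a borrow of a local the caller can't even name.
fn with_local_str&lt;F&gt;(f: F) -&gt; usize
where
    F: for&lt;'a&gt; Fn(&amp;'a str) -&gt; usize,
{
    let local = String::from("only lives inside this function");
    f(&amp;local)
}

// Without the higher-ranked bound the lifetime becomes a caller-chosen generic
// parameter, and this version is rejected (`local` does not live long enough):
//
// fn with_local_str_fixed&lt;'a, F: Fn(&amp;'a str) -&gt; usize&gt;(f: F) -&gt; usize {
//     let local = String::from("only lives inside this function");
//     f(&amp;local)
// }
</code></pre></div></div>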

<p>We’re using <code class="language-plaintext highlighter-rouge">for&lt;'a&gt;</code> so that we can talk about the lifetime of a type in a generic sense. Fortunately, there is a language feature under active development that will be better suited for this: <a href="https://rust-lang.github.io/generic-associated-types-initiative/index.html">Generic Associated Types</a>.</p>

<p>This feature isn’t stable yet, but, fortunately for <em>us</em>, most compiler bugs involving <code class="language-plaintext highlighter-rouge">for&lt;'a&gt;</code> <em>also</em> impact GATs, so we have been benefitting from the GAT work, and a lot of our bug reports have helped shore up the GAT code. Huge shout out to <a href="https://github.com/jackh726">Jack Huey</a> for fixing a lot of these bugs, and <a href="https://github.com/eddyb">eddyb</a> for helping out in the debugging process.</p>

<p>As of Rust 1.61, a lot of the major bugs have been fixed; however, there are still some bugs around trait bounds for which the <code class="language-plaintext highlighter-rouge">yoke</code> crate maintains some <a href="https://docs.rs/yoke/latest/yoke/trait_hack/index.html">workaround helpers</a>. In our experience, most compiler bugs here don’t <em>restrict</em> what you can do with the crate, but they may leave you with code that looks less than ideal. Overall, we still find it worth it: we’re able to do some really neat zero-copy stuff in a way that’s externally convenient (even if some of the internal code is messy), and we don’t have lifetimes everywhere.</p>
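<p>Generic associated types were stabilized in Rust 1.65, after this post was written. Here is a hedged sketch of the idea, using a hypothetical <code class="language-plaintext highlighter-rouge">GatYokeable</code> trait that is not an API offered by <code class="language-plaintext highlighter-rouge">yoke</code>. A GAT lets the trait name “this type at lifetime <code class="language-plaintext highlighter-rouge">'a</code>” directly as <code class="language-plaintext highlighter-rouge">Self::Output&lt;'a&gt;</code>, so generic code such as the <code class="language-plaintext highlighter-rouge">view</code> helper below needs no <code class="language-plaintext highlighter-rouge">for&lt;'a&gt;</code> bound:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::borrow::Cow;

// Hypothetical GAT-based phrasing of a Yokeable-like trait (needs Rust 1.65+).
trait GatYokeable: 'static {
    // "This type, at lifetime 'a"; `Self: 'a` always holds since `Self: 'static`.
    type Output&lt;'a&gt;: 'a
    where
        Self: 'a;
    fn transform&lt;'a&gt;(&amp;'a self) -&gt; &amp;'a Self::Output&lt;'a&gt;;
    fn transform_owned&lt;'a&gt;(self) -&gt; Self::Output&lt;'a&gt;;
}

impl GatYokeable for Cow&lt;'static, str&gt; {
    type Output&lt;'a&gt; = Cow&lt;'a, str&gt; where Self: 'a;
    // As before, these bodies compile only because `Cow` is covariant.
    fn transform&lt;'a&gt;(&amp;'a self) -&gt; &amp;'a Cow&lt;'a, str&gt; {
        self
    }
    fn transform_owned&lt;'a&gt;(self) -&gt; Cow&lt;'a, str&gt; {
        self
    }
}

// No `for&lt;'a&gt;` bound: `Y::Output&lt;'_&gt;` names the output at the borrow's lifetime.
fn view&lt;Y: GatYokeable&gt;(stored: &amp;Y) -&gt; &amp;Y::Output&lt;'_&gt; {
    stored.transform()
}
</code></pre></div></div>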

<h2 id="try-it-out">Try it out!</h2>

<p>While I don’t consider the <a href="https://docs.rs/yoke"><code class="language-plaintext highlighter-rouge">yoke</code></a> crate “done” yet, it’s been in use in ICU4X for a year now and I consider it mature enough to recommend to others. Try it out! Let me know what you think!</p>

<p><em>Thanks to <a href="https://twitter.com/plaidfinch">Finch</a>, <a href="https://twitter.com/yaahc_">Jane</a>, and <a href="https://github.com/sffc">Shane</a> for reviewing drafts of this post</em></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>A <em>locale</em> is typically a language and location, though it may contain additional information like the writing system or even things like the calendar system in use. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Bear in mind, this isn’t just a matter of picking a format like MM-DD-YYYY! Dates in just US English can look like <code class="language-plaintext highlighter-rouge">4/10/22</code> or <code class="language-plaintext highlighter-rouge">4/10/2022</code> or <code class="language-plaintext highlighter-rouge">April 10, 2022</code>, or <code class="language-plaintext highlighter-rouge">Sunday, April 10, 2022 C.E.</code>, or <code class="language-plaintext highlighter-rouge">Sun, Apr 10, 2022</code>, and that’s without even thinking about week numbers, quarters, or time! This quickly adds up to a decent amount of data for each locale. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>This isn’t real Rust syntax, since <code class="language-plaintext highlighter-rouge">Self</code> is always just <code class="language-plaintext highlighter-rouge">Self</code>, but we need to be able to refer to <code class="language-plaintext highlighter-rouge">Self</code> as a higher-kinded type in this scenario. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Types that aren’t covariant are ones involving mutability (<code class="language-plaintext highlighter-rouge">&amp;mut</code> or interior mutability) around the lifetime, and ones involving function pointers and trait objects. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A Tour of Safe Tracing GC Designs in Rust]]></title>
    <link href="http://manishearth.github.io/blog/2021/04/05/a-tour-of-safe-tracing-gc-designs-in-rust/"/>
    <updated>2021-04-05T00:00:00+00:00</updated>
    <id>http://manishearth.github.io/blog/2021/04/05/a-tour-of-safe-tracing-gc-designs-in-rust</id>
    <content type="html"><![CDATA[<p>I’ve been thinking about garbage collection in Rust for a long time, ever since I started working on <a href="https://github.com/servo/servo">Servo</a>’s JS layer. I’ve <a href="https://manishearth.github.io/blog/2015/09/01/designing-a-gc-in-rust/">designed a GC library</a>, <a href="https://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/">worked on GC integration ideas for Rust itself</a>, worked on Servo’s JS GC integration, and helped out with a <a href="https://github.com/asajeffrey/josephine">couple</a> <a href="https://github.com/kyren/gc-arena">other</a> GC projects in Rust.</p>

<p>As a result, I tend to get pulled into GC discussions fairly often. I enjoy talking about GCs – don’t get me wrong – but I often end up going over the same stuff. Being <a href="https://manishearth.github.io/blog/2018/08/26/why-i-enjoy-blogging/#blogging-lets-me-be-lazy">lazy</a>, I’d much prefer to be able to refer people to a single place where they can get up to speed on the general space of GC design, after which it’s possible to have more in-depth discussions about the specific tradeoffs necessary.</p>

<p>I’ll note that some of the GCs in this post are experiments or unmaintained. The goal of this post is to showcase these as examples of <em>design</em>, not necessarily general-purpose crates you may wish to use, though some of them are usable crates as well.</p>

<h3 id="a-note-on-terminology">A note on terminology</h3>

<p>A thing that often muddles discussions about GCs is that according to some definition of “GC”, simple reference counting <em>is</em> a GC. Typically the definition of GC used in academia broadly refers to any kind of automatic memory management. However, most programmers familiar with the term “GC” will usually liken it to “what Java, Go, Haskell, and C# do”, which can be unambiguously referred to as <em>tracing</em> garbage collection.</p>

<p>Tracing garbage collection is the kind which keeps track of which heap objects are directly reachable (“roots”), figures out the whole set of reachable heap objects (“tracing”, also “marking”), and then cleans up everything that is not reachable (“sweeping”).</p>

<p>Throughout this blog post I will use the term “GC” to refer to tracing garbage collection/collectors unless otherwise stated<sup id="fnref:0" role="doc-noteref"><a href="#fn:0" class="footnote" rel="footnote">1</a></sup>.</p>
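<p>Here is a deliberately tiny mark-and-sweep collector over an index-based heap that makes the roots / mark / sweep vocabulary concrete. It is a hypothetical sketch for illustration only: the <code class="language-plaintext highlighter-rouge">Heap</code> and <code class="language-plaintext highlighter-rouge">Object</code> types are made up, and it is not how any of the crates discussed below work. The caller passes in the roots explicitly. Marking walks everything reachable from them, and sweeping frees the rest, including unreachable cycles:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::collections::HashSet;

// Hypothetical toy heap: an object is named by its index in `objects`,
// and a `None` slot has been freed.
struct Object {
    name: &amp;'static str,
    edges: Vec&lt;usize&gt;, // indices of the objects this one points to
}

struct Heap {
    objects: Vec&lt;Option&lt;Object&gt;&gt;,
}

impl Heap {
    fn new() -&gt; Self {
        Heap { objects: Vec::new() }
    }

    fn alloc(&amp;mut self, name: &amp;'static str) -&gt; usize {
        self.objects.push(Some(Object { name, edges: Vec::new() }));
        self.objects.len() - 1
    }

    fn link(&amp;mut self, from: usize, to: usize) {
        if let Some(obj) = self.objects[from].as_mut() {
            obj.edges.push(to);
        }
    }

    /// Marks everything reachable from `roots`, then sweeps (frees) the rest.
    /// Returns the names of the freed objects in heap order.
    fn collect(&amp;mut self, roots: &amp;[usize]) -&gt; Vec&lt;&amp;'static str&gt; {
        // Mark ("tracing"): walk outward from the roots.
        let mut marked = HashSet::new();
        let mut pending: Vec&lt;usize&gt; = roots.to_vec();
        while let Some(idx) = pending.pop() {
            // `insert` returns false for already-marked objects, so cycles terminate.
            if !marked.insert(idx) {
                continue;
            }
            if let Some(obj) = &amp;self.objects[idx] {
                pending.extend(obj.edges.iter().copied());
            }
        }
        // Sweep: free every object that was not marked.
        let mut freed = Vec::new();
        for (idx, slot) in self.objects.iter_mut().enumerate() {
            if !marked.contains(&amp;idx) {
                if let Some(obj) = slot.take() {
                    freed.push(obj.name);
                }
            }
        }
        freed
    }
}
</code></pre></div></div>

<p>For example, with <code class="language-plaintext highlighter-rouge">a → b</code> plus an unreachable two-object cycle <code class="language-plaintext highlighter-rouge">c ⇄ d</code>, collecting with <code class="language-plaintext highlighter-rouge">a</code> as the only root frees <code class="language-plaintext highlighter-rouge">c</code> and <code class="language-plaintext highlighter-rouge">d</code> even though they reference each other.</p>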

<h2 id="why-write-gcs-for-rust">Why write GCs for Rust?</h2>

<p>(If you already want to write a GC in Rust and are reading this post to get ideas for <em>how</em>, you can skip this section. You already know why someone would want to write a GC for Rust)</p>

<p>Every time this topic is brought up someone will inevitably go “I thought the point of Rust was to avoid GCs” or “GCs will ruin Rust” or something. As a general rule it’s good to not give too much weight to the comments section, but I think it’s useful to explain why someone may wish for GC-like semantics in Rust.</p>

<p>There are really two distinct kinds of use cases. Firstly, sometimes you need to manage memory with cycles and <code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code> is inadequate for the job since <code class="language-plaintext highlighter-rouge">Rc</code>-cycles get leaked. <a href="https://docs.rs/petgraph/"><code class="language-plaintext highlighter-rouge">petgraph</code></a> or an <a href="https://manishearth.github.io/blog/2021/03/15/arenas-in-rust/">arena</a> are often acceptable solutions for this kind of pattern, but not always, especially if your data is super heterogeneous. This kind of thing crops up often when dealing with concurrent datastructures; for example <a href="https://docs.rs/crossbeam/"><code class="language-plaintext highlighter-rouge">crossbeam</code></a> has <a href="https://docs.rs/crossbeam/0.8.0/crossbeam/epoch/index.html">an epoch-based memory management system</a> which, while not a full tracing GC, has a lot of characteristics in common with GCs.</p>
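<p>The <code class="language-plaintext highlighter-rouge">Rc</code>-cycle leak is easy to observe. This sketch uses only the standard library, and the <code class="language-plaintext highlighter-rouge">Node</code> type and <code class="language-plaintext highlighter-rouge">dropped_after_scope</code> helper are hypothetical. Each node bumps a shared counter when it is dropped. Two nodes linked in one direction are both freed when their handles go out of scope. Once they point at each other, neither strong count ever reaches zero, so nothing is freed:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical node type: each node bumps a shared counter when dropped.
struct Node {
    next: RefCell&lt;Option&lt;Rc&lt;Node&gt;&gt;&gt;,
    drops: Rc&lt;Cell&lt;usize&gt;&gt;,
}

impl Drop for Node {
    fn drop(&amp;mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Creates two nodes with `a -&gt; b` (plus `b -&gt; a` if `make_cycle`), lets both
/// handles go out of scope, and returns how many nodes were actually freed.
fn dropped_after_scope(make_cycle: bool) -&gt; usize {
    let drops = Rc::new(Cell::new(0));
    {
        let a = Rc::new(Node { next: RefCell::new(None), drops: Rc::clone(&amp;drops) });
        let b = Rc::new(Node { next: RefCell::new(None), drops: Rc::clone(&amp;drops) });
        *a.next.borrow_mut() = Some(Rc::clone(&amp;b));
        if make_cycle {
            // Each node now keeps the other's strong count above zero.
            *b.next.borrow_mut() = Some(Rc::clone(&amp;a));
        }
    }
    drops.get()
}
</code></pre></div></div>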

<p>For this use case it’s rarely necessary to design a custom GC; you can look for a reusable crate like <a href="https://docs.rs/gc/"><code class="language-plaintext highlighter-rouge">gc</code></a> <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup>.</p>

<p>The second case is far more interesting in my experience, and since it cannot be solved by off-the-shelf solutions, it tends to crop up more often: integration with (or implementation of) programming languages that <em>do</em> use a garbage collector. <a href="https://github.com/servo/servo">Servo</a> needs to do this for integrating with the SpiderMonkey JS engine and <a href="https://github.com/kyren/luster">luster</a> needed to do this for implementing the GC of its Lua VM. <a href="https://github.com/jasonwilliams/boa/">boa</a>, a pure Rust JS runtime, uses the <a href="https://docs.rs/gc/"><code class="language-plaintext highlighter-rouge">gc</code></a> crate to back its garbage collector.</p>

<p>Sometimes when integrating with a GCd language you can get away with not needing to implement a full garbage collector: JNI does this; while C++ does not have native garbage collection, JNI gets around this by simply “rooting” (we’ll cover what that means in a bit) anything that crosses over to the C++ side<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup>. This is often fine!</p>

<p>The downside of this is that every interaction with objects managed by the GC has to go through an API call; you can’t “embed” efficient Rust/C++ objects in the GC with ease. For example, in browsers most DOM types (e.g. <a href="https://doc.servo.org/script/dom/element/struct.Element.html"><code class="language-plaintext highlighter-rouge">Element</code></a>) are implemented in native code and need to be able to contain references to other native GC’d types (it should be possible to inspect the <a href="https://doc.servo.org/script/dom/node/struct.Node.html#structfield.child_list">children of a <code class="language-plaintext highlighter-rouge">Node</code></a> without needing to call back into the JavaScript engine).</p>

<p>So sometimes you need to be able to integrate with a GC from a runtime; or even implement your own GC if you are writing a runtime that needs one. In both of these cases you typically want to be able to safely manipulate GC’d objects from Rust code, and even directly put Rust types on the GC heap.</p>

<h2 id="why-are-gcs-in-rust-hard">Why are GCs in Rust hard?</h2>

<p>In one word: Rooting. In a garbage collector, the objects “directly” in use on the stack are the “roots”, and you need to be able to identify them. Here, when I say “directly”, I mean “accessible without having to go through other GC’d objects”, so putting an object inside a <code class="language-plaintext highlighter-rouge">Vec&lt;T&gt;</code> does not make it stop being a root, but putting it inside some other GC’d object does.</p>

<p>Unfortunately, Rust doesn’t really have a concept of “directly on the stack”:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Foo</span> <span class="p">{</span>
    <span class="n">bar</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Gc</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;&gt;</span>
<span class="p">}</span>
<span class="c1">// this is a root</span>
<span class="k">let</span> <span class="n">bar</span> <span class="o">=</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Bar</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="c1">// this is also a root</span>
<span class="k">let</span> <span class="n">foo</span> <span class="o">=</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Foo</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="c1">// bar should no longer be a root (but we can't detect that!)</span>
<span class="n">foo</span><span class="py">.bar</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span><span class="n">bar</span><span class="p">);</span>
<span class="c1">// but foo should still be a root here since it's not inside</span>
<span class="c1">// another GC'd object</span>
<span class="k">let</span> <span class="n">v</span> <span class="o">=</span> <span class="nd">vec!</span><span class="p">[</span><span class="n">foo</span><span class="p">];</span>
</code></pre></div></div>

<p>Rust’s ownership system actually makes it easier to have fewer roots: it’s relatively easy to state that taking <code class="language-plaintext highlighter-rouge">&amp;T</code> of a GC’d object doesn’t need to create a new root and let Rust’s ownership system sort it out. Being able to distinguish between “directly owned” and “indirectly owned”, however, is super tricky.</p>

<p>Another aspect of this is that garbage collection is really a moment of global mutation – the garbage collector reads through the heap and then deletes some of the objects there. This is a moment of the rug being pulled out from under your feet. Rust’s entire design is predicated on such rug-pulling being <em>very very bad and not to be allowed</em>, so this can be a bit problematic. This isn’t as bad as it may initially sound because, after all, the rug-pulling is mostly just cleaning up unreachable objects, but it does crop up a couple of times when fitting things together, especially around destructors and finalizers<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>. Rooting would be far easier if, for example, you were able to declare areas of code where “no GC can happen”<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup> so you can tightly scope the rug-pulling and have to worry less about roots.</p>

<h3 id="destructors-and-finalizers">Destructors and finalizers</h3>

<p>It’s worth calling out destructors in particular. A huge problem with custom destructors on GCd types is that the custom destructor can stash itself away into a long-lived reference during garbage collection, leading to a dangling reference:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">LongLived</span> <span class="p">{</span>
    <span class="n">dangle</span><span class="p">:</span> <span class="n">RefCell</span><span class="o">&lt;</span><span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Gc</span><span class="o">&lt;</span><span class="n">CantKillMe</span><span class="o">&gt;&gt;&gt;</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">CantKillMe</span> <span class="p">{</span>
    <span class="c1">// set up to point to itself during construction</span>
    <span class="n">self_ref</span><span class="p">:</span> <span class="n">RefCell</span><span class="o">&lt;</span><span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Gc</span><span class="o">&lt;</span><span class="n">CantKillMe</span><span class="o">&gt;&gt;&gt;</span>
    <span class="n">long_lived</span><span class="p">:</span> <span class="nb">Gc</span><span class="o">&lt;</span><span class="n">LongLived</span><span class="o">&gt;</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="nb">Drop</span> <span class="k">for</span> <span class="n">CantKillMe</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// attach self to long_lived</span>
        <span class="o">*</span><span class="k">self</span><span class="py">.long_lived.dangle</span><span class="nf">.borrow_mut</span><span class="p">()</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span><span class="k">self</span><span class="py">.self_ref</span><span class="nf">.borrow</span><span class="p">()</span><span class="nf">.clone</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">let</span> <span class="n">long</span> <span class="o">=</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">LongLived</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="p">{</span>
    <span class="k">let</span> <span class="n">cant</span> <span class="o">=</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">CantKillMe</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
    <span class="o">*</span><span class="n">cant</span><span class="py">.self_ref</span><span class="nf">.borrow_mut</span><span class="p">()</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span><span class="n">cant</span><span class="nf">.clone</span><span class="p">());</span>
    <span class="c1">// cant goes out of scope, CantKillMe::drop is run</span>
    <span class="c1">// cant is attached to long_lived.dangle but still cleaned up</span>
<span class="p">}</span>

<span class="c1">// Dangling reference!</span>
<span class="k">let</span> <span class="n">dangling</span> <span class="o">=</span> <span class="n">long</span><span class="py">.dangle</span><span class="nf">.borrow</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
</code></pre></div></div>

<p>The most common solution here is to disallow destructors on types that use <code class="language-plaintext highlighter-rouge">#[derive(Trace)]</code>. This can be done by having the custom derive generate a <code class="language-plaintext highlighter-rouge">Drop</code> implementation itself, or by having it generate something that causes a conflicting-implementation error whenever the type also implements <code class="language-plaintext highlighter-rouge">Drop</code>.</p>

<p>You can additionally provide a <code class="language-plaintext highlighter-rouge">Finalize</code> trait that has different semantics: the GC calls it while cleaning up GC objects, but it may be called multiple times or not at all. This kind of thing is typical in GCs outside of Rust as well.</p>
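<p>As a rough sketch of both ideas, assuming hypothetical <code class="language-plaintext highlighter-rouge">MustNotImplDrop</code> and <code class="language-plaintext highlighter-rouge">Finalize</code> traits (names invented here, not taken from any particular crate): a marker trait with a blanket impl for every <code class="language-plaintext highlighter-rouge">Drop</code> type means the impl a derive would emit for <code class="language-plaintext highlighter-rouge">Node</code> only compiles while <code class="language-plaintext highlighter-rouge">Node</code> has no <code class="language-plaintext highlighter-rouge">Drop</code> impl. The finalizer only touches the object’s own data and makes no assumption about how often it runs.</p>

```rust
use std::cell::Cell;

/// Hypothetical finalizer hook: the collector may call it once, several
/// times, or never, so it must not resurrect `self` or stash it anywhere.
pub trait Finalize {
    fn finalize(&self) {}
}

/// Hypothetical marker trait with a blanket impl for every `Drop` type.
pub trait MustNotImplDrop {}

#[allow(drop_bounds)] // the `T: Drop` bound is deliberate here
impl<T: Drop> MustNotImplDrop for T {}

pub struct Node {
    pub finalize_calls: Cell<u32>,
}

// The kind of impl a `#[derive(Trace)]` could emit next to `Trace`.
// If someone also wrote `impl Drop for Node`, this would overlap with the
// blanket impl above and be rejected as a conflicting implementation.
impl MustNotImplDrop for Node {}

impl Finalize for Node {
    fn finalize(&self) {
        // Only update the object's own state.
        self.finalize_calls.set(self.finalize_calls.get() + 1);
    }
}
```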

<h2 id="how-would-you-even-garbage-collect-without-a-runtime">How would you even garbage collect without a runtime?</h2>

<p>In most garbage collected languages, there’s a runtime that controls all execution, knows about every variable in the program, and is able to pause execution to run the GC whenever it likes.</p>

<p>Rust has a minimal runtime and can’t do anything like this, especially not in a pluggable way your library can hook into. For thread-local GCs you basically have to write it such that GC operations (things like mutating a GC field; basically some subset of the APIs exposed by your GC library) are the only things that may trigger the garbage collector.</p>

<p>Concurrent GCs can trigger the GC on a separate thread but will typically need to pause other threads whenever these threads attempt to perform a GC operation that could potentially be invalidated by the running garbage collector.</p>

<p>While this may restrict the flexibility of the garbage collector itself, this is actually pretty good for us from the side of API design: the garbage collection phase can only happen in certain well-known moments of the code, which means we only need to make things safe across <em>those</em> boundaries. Many of the designs we shall look at build off of this observation.</p>
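<p>Here is a minimal sketch of that shape, assuming a hypothetical single-threaded <code class="language-plaintext highlighter-rouge">Heap</code> with a made-up “collect once the live count reaches a threshold” heuristic and no edges between objects. Only <code class="language-plaintext highlighter-rouge">alloc</code> can start a collection, and since it takes <code class="language-plaintext highlighter-rouge">&amp;mut self</code> while <code class="language-plaintext highlighter-rouge">get</code> hands out borrows tied to <code class="language-plaintext highlighter-rouge">&amp;self</code>, the borrow checker already refuses to keep a borrowed object alive across a call that might collect.</p>

```rust
/// Hypothetical single-threaded heap in which `alloc` is the only operation
/// that can start a collection. Objects have no edges, to keep it short;
/// an object survives a collection only if it was allocated as a root.
pub struct Heap {
    slots: Vec<Option<String>>, // `None` marks a freed slot
    rooted: Vec<bool>,
    threshold: usize,
    pub collections: usize,
}

impl Heap {
    pub fn new(threshold: usize) -> Self {
        Heap { slots: Vec::new(), rooted: Vec::new(), threshold, collections: 0 }
    }

    /// Takes `&mut self` because it may collect: no `&str` obtained from
    /// `get` can still be in use when this is called.
    pub fn alloc(&mut self, value: String, root: bool) -> usize {
        let live = self.slots.iter().filter(|s| s.is_some()).count();
        if live >= self.threshold {
            self.collect();
        }
        self.slots.push(Some(value));
        self.rooted.push(root);
        self.slots.len() - 1
    }

    /// Reads only need `&self`, so they can never overlap a collection.
    pub fn get(&self, id: usize) -> Option<&str> {
        self.slots[id].as_deref()
    }

    /// Private: callers cannot trigger a GC except through `alloc`.
    fn collect(&mut self) {
        for (slot, &rooted) in self.slots.iter_mut().zip(&self.rooted) {
            if !rooted {
                *slot = None;
            }
        }
        self.collections += 1;
    }
}
```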

<h2 id="commonalities">Commonalities</h2>

<p>Before getting into the actual examples of GC design, I want to point out some commonalities of design between all of them, especially around how they do tracing:</p>

<h3 id="tracing">Tracing</h3>

<p>“Tracing” is the operation of traversing the graph of GC objects, starting from your roots and perusing their children, and their children’s children, and so on.</p>

<p>In Rust, the easiest way to implement this is via a <a href="https://doc.rust-lang.org/book/ch19-06-macros.html#how-to-write-a-custom-derive-macro">custom derive</a>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// unsafe to implement by hand since you can get it wrong</span>
<span class="k">unsafe</span> <span class="k">trait</span> <span class="n">Trace</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">trace</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">gc_context</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">GcContext</span><span class="p">);</span>
<span class="p">}</span>

<span class="nd">#[derive(Trace)]</span>
<span class="k">struct</span> <span class="n">Foo</span> <span class="p">{</span>
    <span class="n">vec</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">Gc</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;&gt;</span><span class="p">,</span>
    <span class="n">extra_thing</span><span class="p">:</span> <span class="nb">Gc</span><span class="o">&lt;</span><span class="n">Baz</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">just_a_string</span><span class="p">:</span> <span class="nb">String</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The custom derive of <code class="language-plaintext highlighter-rouge">Trace</code> basically just calls <code class="language-plaintext highlighter-rouge">trace()</code> on all the fields. <code class="language-plaintext highlighter-rouge">Vec</code>’s <code class="language-plaintext highlighter-rouge">Trace</code> implementation will be written to call <code class="language-plaintext highlighter-rouge">trace()</code> on all of its elements, and <code class="language-plaintext highlighter-rouge">String</code>’s <code class="language-plaintext highlighter-rouge">Trace</code> implementation will do nothing. <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> will likely have a <code class="language-plaintext highlighter-rouge">trace()</code> that marks its reachability in the <code class="language-plaintext highlighter-rouge">GcContext</code>, or something similar.</p>
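<p>Written out by hand, those implementations might look roughly like the sketch below. <code class="language-plaintext highlighter-rouge">GcContext</code>, the mark bit, and the <code class="language-plaintext highlighter-rouge">Rc</code>-backed <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> are stand-ins invented for illustration, and <code class="language-plaintext highlighter-rouge">trace</code> takes <code class="language-plaintext highlighter-rouge">&amp;self</code> here rather than <code class="language-plaintext highlighter-rouge">&amp;mut self</code> so it can run through shared GC pointers. The <code class="language-plaintext highlighter-rouge">Foo</code> impl is roughly what the derive would expand to.</p>

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical tracing context; this sketch only counts newly marked objects.
pub struct GcContext {
    pub marked: usize,
}

/// Unsafe to implement by hand: skipping a field would hide live objects.
pub unsafe trait Trace {
    fn trace(&self, cx: &mut GcContext);
}

/// Toy GC pointer: an `Rc` plus a mark bit (nothing is ever collected here).
pub struct Gc<T> {
    inner: Rc<GcBox<T>>,
}

struct GcBox<T> {
    marked: Cell<bool>,
    value: T,
}

impl<T> Gc<T> {
    pub fn new(value: T) -> Self {
        Gc { inner: Rc::new(GcBox { marked: Cell::new(false), value }) }
    }
}

impl<T> Clone for Gc<T> {
    fn clone(&self) -> Self {
        Gc { inner: Rc::clone(&self.inner) }
    }
}

// Gc<T>: mark the pointee the first time it is reached, then trace its contents.
unsafe impl<T: Trace> Trace for Gc<T> {
    fn trace(&self, cx: &mut GcContext) {
        if !self.inner.marked.get() {
            self.inner.marked.set(true);
            cx.marked += 1;
            self.inner.value.trace(cx);
        }
    }
}

// Vec<T>: trace every element.
unsafe impl<T: Trace> Trace for Vec<T> {
    fn trace(&self, cx: &mut GcContext) {
        for item in self {
            item.trace(cx);
        }
    }
}

// String holds no GC pointers, so there is nothing to do.
unsafe impl Trace for String {
    fn trace(&self, _cx: &mut GcContext) {}
}

pub struct Bar;
pub struct Baz;
unsafe impl Trace for Bar {
    fn trace(&self, _cx: &mut GcContext) {}
}
unsafe impl Trace for Baz {
    fn trace(&self, _cx: &mut GcContext) {}
}

pub struct Foo {
    pub vec: Vec<Gc<Bar>>,
    pub extra_thing: Gc<Baz>,
    pub just_a_string: String,
}

// Roughly what `#[derive(Trace)]` would generate: trace each field in turn.
unsafe impl Trace for Foo {
    fn trace(&self, cx: &mut GcContext) {
        self.vec.trace(cx);
        self.extra_thing.trace(cx);
        self.just_a_string.trace(cx);
    }
}
```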

<p>This is a pretty standard pattern, and while the specifics of the <code class="language-plaintext highlighter-rouge">Trace</code> trait will typically vary, the general idea is roughly the same.</p>

<p>I’m not going to get into the actual details of how mark-and-sweep algorithms work in this post; there are a lot of potential designs for them and they’re not that interesting from the point of view of designing a safe GC <em>API</em> in Rust. However, the general idea is to keep a queue of found objects, initially populated by the roots; trace each queued object to find new objects, and queue those up if they’ve not already been traced. Once the queue is empty, clean up any objects that were <em>not</em> found.</p>
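<p>A minimal sketch of that queue-based mark phase followed by a sweep, assuming a hypothetical heap where objects are plain indices and each one lists the indices it references:</p>

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical heap: `objects[i]` holds the outgoing references of
/// object `i`, or `None` once that object has been freed.
pub struct Heap {
    pub objects: Vec<Option<Vec<usize>>>,
}

impl Heap {
    /// Mark-and-sweep starting from `roots`; returns how many objects were freed.
    pub fn collect(&mut self, roots: &[usize]) -> usize {
        let mut found: HashSet<usize> = HashSet::new();
        let mut queue: VecDeque<usize> = VecDeque::new();

        // The queue starts out holding the roots.
        for &root in roots {
            if found.insert(root) {
                queue.push_back(root);
            }
        }

        // Mark: trace each queued object, enqueueing children not yet found.
        while let Some(id) = queue.pop_front() {
            if let Some(children) = &self.objects[id] {
                for &child in children {
                    if found.insert(child) {
                        queue.push_back(child);
                    }
                }
            }
        }

        // Sweep: free every still-allocated object that was never found.
        let mut freed = 0;
        for (id, slot) in self.objects.iter_mut().enumerate() {
            if slot.is_some() && !found.contains(&id) {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

<p>Because marking starts only from the roots, an unreachable cycle is freed just like any other unreachable object.</p>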

<h3 id="immutable-by-default">Immutable-by-default</h3>

<p>Another commonality between these designs is that a <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> is always potentially shared, and thus will need tight control over mutability to satisfy Rust’s ownership invariants. This is typically achieved by using interior mutability, much like how <code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code> is almost always paired with <code class="language-plaintext highlighter-rouge">RefCell&lt;T&gt;</code> for mutation. However, some approaches (like that in <a href="https://github.com/asajeffrey/josephine">josephine</a>) do allow for mutability without runtime checking.</p>

<h3 id="threading">Threading</h3>

<p>Some GCs are single-threaded, and some are multi-threaded. The single-threaded ones typically have a <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> type that is not <code class="language-plaintext highlighter-rouge">Send</code>, so while you can set up multiple graphs of GC types on different threads, they’re essentially independent. Garbage collection only affects the thread it is being performed for; all other threads can continue unhindered.</p>
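<p>For example, a single-threaded design might make its pointer type <code class="language-plaintext highlighter-rouge">!Send</code> and keep its bookkeeping in a thread-local. In this hypothetical sketch, a per-thread counter of live objects stands in for the whole heap, and a <code class="language-plaintext highlighter-rouge">PhantomData&lt;*const ()&gt;</code> marker is what stops a <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> from crossing threads:</p>

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    // Hypothetical per-thread heap bookkeeping: just a live-object counter.
    static LIVE: Cell<usize> = Cell::new(0);
}

/// Toy single-threaded GC pointer. The `*const ()` marker makes it neither
/// `Send` nor `Sync`, so it can never leave the thread whose heap owns it.
pub struct Gc<T> {
    value: Box<T>,
    _not_send: PhantomData<*const ()>,
}

impl<T> Gc<T> {
    pub fn new(value: T) -> Self {
        LIVE.with(|live| live.set(live.get() + 1));
        Gc { value: Box::new(value), _not_send: PhantomData }
    }

    pub fn get(&self) -> &T {
        &self.value
    }
}

impl<T> Drop for Gc<T> {
    fn drop(&mut self) {
        LIVE.with(|live| live.set(live.get() - 1));
    }
}

/// Number of live objects on the *current* thread's heap only.
pub fn live_on_this_thread() -> usize {
    LIVE.with(|live| live.get())
}
```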

<p>Multithreaded GCs will have a <code class="language-plaintext highlighter-rouge">Send</code> <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> type. Garbage collection will typically, but not always, block any thread which attempts to access data managed by the GC during that time. In some languages there are “stop the world” garbage collectors which block all threads at “safepoints” inserted by the compiler; Rust does not have the capability to insert such safepoints, so blocking threads during a GC is done at the library level.</p>

<p>Most of the examples below are single-threaded, but their API design is not hard to extend towards a hypothetical multithreaded GC.</p>

<h2 id="rust-gc">rust-gc</h2>

<p>The <a href="https://docs.rs/gc/"><code class="language-plaintext highlighter-rouge">gc</code></a> crate is one I wrote with <a href="https://twitter.com/kneecaw/">Nika Layzell</a> mostly as a fun exercise, to figure out if a safe GC API is <em>possible</em>. I’ve <a href="https://manishearth.github.io/blog/2015/09/01/designing-a-gc-in-rust/">written about the design in depth before</a>, but the essence of the design is that it does something similar to reference counting to keep track of roots, and forces all GC mutations go through special <code class="language-plaintext highlighter-rouge">GcCell</code> types so that they can update the root count. Basically, a “root count” is updated whenever something becomes a root or stops being a root:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Foo</span> <span class="p">{</span>
    <span class="n">bar</span><span class="p">:</span> <span class="n">GcCell</span><span class="o">&lt;</span><span class="nb">Option</span><span class="o">&lt;</span><span class="nb">Gc</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;&gt;&gt;</span>
<span class="p">}</span>
<span class="c1">// this is a root (root count = 1)</span>
<span class="k">let</span> <span class="n">bar</span> <span class="o">=</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Bar</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="c1">// this is also a root (root count = 1)</span>
<span class="k">let</span> <span class="n">foo</span> <span class="o">=</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Foo</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="c1">// .borrow_mut()'s RAII guard unroots bar (sets its root count to 0)</span>
<span class="o">*</span><span class="n">foo</span><span class="py">.bar</span><span class="nf">.borrow_mut</span><span class="p">()</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span><span class="n">bar</span><span class="p">);</span>
<span class="c1">// foo is still a root here, no call to .set()</span>
<span class="k">let</span> <span class="n">v</span> <span class="o">=</span> <span class="nd">vec!</span><span class="p">[</span><span class="n">foo</span><span class="p">];</span>

<span class="c1">// at destrucion time, foo's root count is set to 0</span>
</code></pre></div></div>

<p>The actual garbage collection phase will occur when certain GC operations are performed at a time when the heap is considered to have gotten reasonably large according to some heuristics.</p>

<p>While reads are essentially “free”, any kind of write incurs a fair amount of reference-count traffic, which might not be desired; often the goal of using GCs is to <em>avoid</em> the performance characteristics of reference-counting-like patterns. Ultimately this is a hybrid approach that mixes tracing and reference counting<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>.</p>
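<p>To make that write traffic concrete, here is a simplified sketch of the root-count bookkeeping, assuming a toy <code class="language-plaintext highlighter-rouge">Gc&lt;T&gt;</code> and a hypothetical <code class="language-plaintext highlighter-rouge">GcSlot</code> standing in for a <code class="language-plaintext highlighter-rouge">GcCell</code> field. The real crate does this through <code class="language-plaintext highlighter-rouge">GcCell</code>’s borrow guards, as in the example above; this sketch uses a plain <code class="language-plaintext highlighter-rouge">set</code> method instead and handles a single pointer rather than a whole traced subgraph. Every clone, store, and drop writes to the count, while reading through an existing handle does not.</p>

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

struct GcBox<T> {
    roots: Cell<usize>, // number of rooted handles pointing here
    value: T,
}

/// Toy handle; `rooted` records whether this handle counts as a root.
pub struct Gc<T> {
    inner: Rc<GcBox<T>>,
    rooted: Cell<bool>,
}

impl<T> Gc<T> {
    /// A fresh handle on the stack is a root.
    pub fn new(value: T) -> Self {
        Gc { inner: Rc::new(GcBox { roots: Cell::new(1), value }), rooted: Cell::new(true) }
    }
    pub fn root_count(&self) -> usize {
        self.inner.roots.get()
    }
    /// Reading needs no root-count update.
    pub fn value(&self) -> &T {
        &self.inner.value
    }
    fn unroot(&self) {
        if self.rooted.replace(false) {
            self.inner.roots.set(self.inner.roots.get() - 1);
        }
    }
}

impl<T> Clone for Gc<T> {
    /// Cloning onto the stack creates another root (a count write).
    fn clone(&self) -> Self {
        self.inner.roots.set(self.inner.roots.get() + 1);
        Gc { inner: Rc::clone(&self.inner), rooted: Cell::new(true) }
    }
}

impl<T> Drop for Gc<T> {
    /// Dropping a rooted handle removes its root (another count write).
    fn drop(&mut self) {
        self.unroot();
    }
}

/// Hypothetical stand-in for a `GcCell<Option<Gc<T>>>` field of a GC'd object.
pub struct GcSlot<T> {
    inner: RefCell<Option<Gc<T>>>,
}

impl<T> GcSlot<T> {
    pub fn new() -> Self {
        GcSlot { inner: RefCell::new(None) }
    }
    /// Storing a handle inside the heap unroots it.
    pub fn set(&self, value: Gc<T>) {
        value.unroot();
        *self.inner.borrow_mut() = Some(value);
    }
    /// Taking a handle back out onto the stack produces a fresh root.
    pub fn get(&self) -> Option<Gc<T>> {
        self.inner.borrow().as_ref().cloned()
    }
}
```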

<p><a href="https://docs.rs/gc/"><code class="language-plaintext highlighter-rouge">gc</code></a> is useful as a general-purpose GC if you just want a couple of things to participate in cycles without having to think about it too much. The general design can apply to a specialized GC integrating with another language runtime since it provides a clear way to keep track of roots; but it may not necessarily have the desired performance characteristics.</p>

<h2 id="servos-dom-integration">Servo’s DOM integration</h2>

<p><a href="https://github.com/servo/servo">Servo</a> is a browser engine in Rust that I used to work on full time. As mentioned earlier, browser engines typically implement a lot of their DOM types in native (i.e. Rust or C++, not JS) code, so for example <a href="https://doc.servo.org/script/dom/element/struct.Element.html"><code class="language-plaintext highlighter-rouge">Node</code></a> is a pure Rust object, and it <a href="https://doc.servo.org/script/dom/node/struct.Node.html#structfield.child_list">contains direct references to its children</a> so Rust code can do things like traverse the tree without having to go back and forth between JS and Rust.</p>

<p>Servo’s model is a little weird: roots are a <em>different type</em>, and lints enforce that unrooted heap references are never placed on the stack:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[dom_struct]</span> <span class="c1">// this is #[derive(JSTraceable)] plus some markers for lints</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Node</span> <span class="p">{</span>
    <span class="c1">// the parent type, for inheritance</span>
    <span class="n">eventtarget</span><span class="p">:</span> <span class="n">EventTarget</span><span class="p">,</span>
    <span class="c1">// in the actual code this is a different helper type that combines</span>
    <span class="c1">// the RefCell, Option, and Dom, but i've simplified it to use</span>
    <span class="c1">// stdlib types for this example</span>
    <span class="n">prev_sibling</span><span class="p">:</span> <span class="n">RefCell</span><span class="o">&lt;</span><span class="nb">Option</span><span class="o">&lt;</span><span class="n">Dom</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&gt;&gt;&gt;</span><span class="p">,</span>
    <span class="n">next_sibling</span><span class="p">:</span> <span class="n">RefCell</span><span class="o">&lt;</span><span class="nb">Option</span><span class="o">&lt;</span><span class="n">Dom</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&gt;&gt;&gt;</span><span class="p">,</span>
    <span class="c1">// ...</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Node</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">frob_next_sibling</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// fields can be accessed as borrows without any rooting</span>
        <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">next</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.next_sibling</span><span class="nf">.borrow</span><span class="p">()</span><span class="nf">.as_ref</span><span class="p">()</span> <span class="p">{</span>
            <span class="n">next</span><span class="nf">.frob</span><span class="p">();</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">get_next_sibling</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">DomRoot</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
        <span class="c1">// but you need to root things for them to escape the borrow</span>
        <span class="c1">// .root() turns Dom&lt;T&gt; into DomRoot&lt;T&gt;</span>
        <span class="k">self</span><span class="py">.next_sibling</span><span class="nf">.borrow</span><span class="p">()</span><span class="nf">.as_ref</span><span class="p">()</span><span class="nf">.map</span><span class="p">(|</span><span class="n">x</span><span class="p">|</span> <span class="n">x</span><span class="nf">.root</span><span class="p">())</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">illegal</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// this line of code would get linted by a custom lint called unrooted_must_root</span>
        <span class="c1">// (which works somewhat similarly to the must_use stuff that Rust does)</span>
        <span class="k">let</span> <span class="n">ohno</span><span class="p">:</span> <span class="n">Dom</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&gt;</span> <span class="o">=</span> <span class="k">self</span><span class="py">.next_sibling</span><span class="nf">.borrow_mut</span><span class="p">()</span><span class="nf">.take</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">Dom&lt;T&gt;</code> is basically a smart pointer that behaves like <code class="language-plaintext highlighter-rouge">&amp;T</code> but without a lifetime, whereas <code class="language-plaintext highlighter-rouge">DomRoot&lt;T&gt;</code> has the additional behavior of rooting on creation (and unrooting on <code class="language-plaintext highlighter-rouge">Drop</code>). The custom lint plugin essentially enforces that <code class="language-plaintext highlighter-rouge">Dom&lt;T&gt;</code>, and any DOM structs (tagged with <code class="language-plaintext highlighter-rouge">#[dom_struct]</code>) are never accessible on the stack aside from through <code class="language-plaintext highlighter-rouge">DomRoot&lt;T&gt;</code> or <code class="language-plaintext highlighter-rouge">&amp;T</code>.</p>

<p>I wouldn’t recommend this approach; it works okay but we’ve wanted to move off of it for a while because it relies on custom plugin lints for soundness. But it’s worth mentioning for completeness.</p>

<h2 id="josephine-servos-experimental-gc-plans">Josephine (Servo’s experimental GC plans)</h2>

<p>Given that Servo’s existing GC solution depends on plugging in to the compiler to do additional static analysis, we wanted something better. So <a href="https://github.com/asajeffrey/">Alan</a> designed <a href="https://github.com/asajeffrey/josephine">Josephine</a> (“JS affine”), which uses Rust’s affine types and borrowing in a cleaner way to provide a safe GC system.</p>

<p>Josephine is explicitly designed for Servo’s use case and as such does a lot of neat things around “compartments” and such that are probably irrelevant unless you specifically wish for your GC to integrate with a JS engine.</p>

<p>As mentioned earlier, the fact that garbage collection can only happen at certain well-known moments in the code can actually make GC design easier, and Josephine is an example of this.</p>

<p>Josephine has a “JS context”, which is to be passed around everywhere and essentially represents the GC itself. When doing operations which may trigger a GC, you have to borrow the context mutably, whereas when accessing heap objects you need to borrow the context immutably. You can root heap objects to remove this requirement:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// cx is a `JSContext`, `node` is a `JSManaged&lt;'a, C, Node&gt;`</span>
<span class="c1">// assuming next_sibling and prev_sibling are not Options for simplicity</span>

<span class="c1">// borrows cx for `'b`</span>
<span class="k">let</span> <span class="n">next_sibling</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'b</span> <span class="n">Node</span> <span class="o">=</span> <span class="n">node</span><span class="py">.next_sibling</span><span class="nf">.borrow</span><span class="p">(</span><span class="n">cx</span><span class="p">);</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"Name: {:?}"</span><span class="p">,</span> <span class="n">next_sibling</span><span class="py">.name</span><span class="p">);</span>
<span class="c1">// illegal, because cx is immutably borrowed by next_sibling</span>
<span class="c1">// node.prev_sibling.borrow_mut(cx).frob();</span>

<span class="c1">// read from next_sibling to ensure it lives this long</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{:?}"</span><span class="p">,</span> <span class="n">next_sibling</span><span class="py">.name</span><span class="p">);</span>

<span class="k">let</span> <span class="k">ref</span> <span class="k">mut</span> <span class="n">root</span> <span class="o">=</span> <span class="n">cx</span><span class="nf">.new_root</span><span class="p">();</span>
<span class="c1">// no longer needs to borrow cx, borrows root for 'root instead</span>
<span class="k">let</span> <span class="n">next_sibling</span><span class="p">:</span> <span class="n">JSManaged</span><span class="o">&lt;</span><span class="nv">'root</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">Node</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">node</span><span class="py">.next_sibling</span><span class="nf">.in_root</span><span class="p">(</span><span class="n">root</span><span class="p">);</span>
<span class="c1">// now it's fine, no outstanding borrows of `cx`</span>
<span class="n">node</span><span class="py">.prev_sibling</span><span class="nf">.borrow_mut</span><span class="p">(</span><span class="n">cx</span><span class="p">)</span><span class="nf">.frob</span><span class="p">();</span>

<span class="c1">// read from next_sibling to ensure it lives this long</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{:?}"</span><span class="p">,</span> <span class="n">next_sibling</span><span class="py">.name</span><span class="p">);</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">new_root()</code> creates a new root, and <code class="language-plaintext highlighter-rouge">in_root</code> ties the lifetime of a JS managed type to the root instead of to the <code class="language-plaintext highlighter-rouge">JSContext</code> borrow, releasing the borrow of the <code class="language-plaintext highlighter-rouge">JSContext</code> and allowing it to be borrowed mutably in future <code class="language-plaintext highlighter-rouge">.borrow_mut()</code> calls.</p>

<p>Note that <code class="language-plaintext highlighter-rouge">.borrow()</code> and <code class="language-plaintext highlighter-rouge">.borrow_mut()</code> here have no runtime borrow-checking cost despite their similarity to <code class="language-plaintext highlighter-rouge">RefCell::borrow()</code>; instead, they do some lifetime juggling to make things safe. Creating roots typically does have a runtime cost. Sometimes you <em>may</em> need to use <code class="language-plaintext highlighter-rouge">RefCell&lt;T&gt;</code> for the same reason it’s used with <code class="language-plaintext highlighter-rouge">Rc</code>, but mostly only for non-GC’d fields.</p>

<p>Custom types are typically defined in two parts, as follows:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Copy,</span> <span class="nd">Clone,</span> <span class="nd">Debug,</span> <span class="nd">Eq,</span> <span class="nd">PartialEq,</span> <span class="nd">JSTraceable,</span> <span class="nd">JSLifetime,</span> <span class="nd">JSCompartmental)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Element</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="o">&gt;</span> <span class="p">(</span><span class="k">pub</span> <span class="n">JSManaged</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">NativeElement</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="o">&gt;&gt;</span><span class="p">);</span>

<span class="nd">#[derive(JSTraceable,</span> <span class="nd">JSLifetime,</span> <span class="nd">JSCompartmental)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">NativeElement</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">JSString</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">parent</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">Element</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="o">&gt;&gt;</span><span class="p">,</span>
    <span class="n">children</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Element</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">C</span><span class="o">&gt;&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">Element&lt;'a&gt;</code> is a convenient copyable reference that is to be used inside other GC types, and <code class="language-plaintext highlighter-rouge">NativeElement&lt;'a&gt;</code> is its backing storage. The <code class="language-plaintext highlighter-rouge">C</code> parameter has to do with compartments and can be ignored for now.</p>

<p>A neat thing worth pointing out is that there’s no runtime borrow checking necessary for manipulating other GC references, even though roots let you hold multiple references to the same object!</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">parent_root</span> <span class="o">=</span> <span class="n">cx</span><span class="nf">.new_root</span><span class="p">();</span>
<span class="k">let</span> <span class="n">parent</span> <span class="o">=</span> <span class="n">element</span><span class="nf">.borrow</span><span class="p">(</span><span class="n">cx</span><span class="p">)</span><span class="py">.parent</span><span class="nf">.in_root</span><span class="p">(</span><span class="n">parent_root</span><span class="p">);</span>
<span class="k">let</span> <span class="k">ref</span> <span class="k">mut</span> <span class="n">child_root</span> <span class="o">=</span> <span class="n">cx</span><span class="nf">.new_root</span><span class="p">();</span>

<span class="c1">// could potentially be a second reference to `element` if it was</span>
<span class="c1">// the first child</span>
<span class="k">let</span> <span class="n">first_child</span> <span class="o">=</span> <span class="n">parent</span><span class="py">.children</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="nf">.in_root</span><span class="p">(</span><span class="n">child_root</span><span class="p">);</span>

<span class="c1">// this is okay, even though we hold a reference to `parent`</span>
<span class="c1">// via element.parent, because we have rooted that reference so it's</span>
<span class="c1">// now independent of whether `element.parent` changes!</span>
<span class="n">first_child</span><span class="nf">.borrow_mut</span><span class="p">(</span><span class="n">cx</span><span class="p">)</span><span class="py">.parent</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>
</code></pre></div></div>

<p>Essentially, when mutating a field, you have to obtain mutable access to the context, so there will not be any references to the field itself still around (e.g. <code class="language-plaintext highlighter-rouge">element.borrow(cx).parent</code>), only to the GC’d data within it, so you can change what a field references without invalidating other references to the <em>contents</em> of what the field references. This is a pretty cool trick that enables GC <em>without runtime-checked interior mutability</em>, which is relatively rare in such designs.</p>
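<p>The core of that trick fits in a few lines without any GC at all. In this hypothetical sketch, <code class="language-plaintext highlighter-rouge">Context</code> owns every object and <code class="language-plaintext highlighter-rouge">Managed</code> is a copyable index handle; unlike Josephine’s <code class="language-plaintext highlighter-rouge">JSManaged</code>, it has no lifetime, and there is no rooting or collection. Reading ties the result to <code class="language-plaintext highlighter-rouge">&amp;Context</code> and writing requires <code class="language-plaintext highlighter-rouge">&amp;mut Context</code>, so copying a handle out of a field and then mutating the object it points to needs no runtime borrow checks:</p>

```rust
/// Hypothetical stand-in for a JS context: it owns every managed object.
pub struct Context<T> {
    heap: Vec<T>,
}

/// Copyable handle to a managed object (an index into the context's heap).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Managed(usize);

impl<T> Context<T> {
    pub fn new() -> Self {
        Context { heap: Vec::new() }
    }

    /// Allocation could trigger a GC in a real engine, so it needs `&mut self`.
    pub fn alloc(&mut self, value: T) -> Managed {
        self.heap.push(value);
        Managed(self.heap.len() - 1)
    }
}

impl Managed {
    /// Reading borrows the context immutably; the result cannot outlive that borrow.
    pub fn borrow<T>(self, cx: &Context<T>) -> &T {
        &cx.heap[self.0]
    }

    /// Writing borrows the context mutably, so no reference obtained from
    /// `borrow` can still be alive at this point.
    pub fn borrow_mut<T>(self, cx: &mut Context<T>) -> &mut T {
        &mut cx.heap[self.0]
    }
}

/// Toy element type: just a list of child handles.
pub struct Element {
    pub children: Vec<Managed>,
}
```

<p>Holding the <code class="language-plaintext highlighter-rouge">&amp;Element</code> returned by <code class="language-plaintext highlighter-rouge">parent.borrow(&amp;cx)</code> across a later <code class="language-plaintext highlighter-rouge">borrow_mut(&amp;mut cx)</code> call would be rejected at compile time, since <code class="language-plaintext highlighter-rouge">cx</code> cannot be borrowed mutably while it is still borrowed immutably; copying the <code class="language-plaintext highlighter-rouge">Managed</code> handle out first ends that borrow.</p>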

<h2 id="unfinished-design-for-a-builtin-rust-gc">Unfinished design for a builtin Rust GC</h2>

<p>For a while a couple of us worked on a way to make Rust <em>itself</em> extensible with a pluggable GC, using LLVM stack map support for finding roots. After all, if we know which types are GC-ish, we can include metadata on how to find roots for each function, similar to how Rust functions currently contain unwinding hooks to enable cleanly running destructors during a panic.</p>

<p>We never got around to figuring out a <em>complete</em> design, but you can find more information on what we figured out in <a href="https://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/">my</a> and <a href="http://blog.pnkfx.org/blog/categories/gc/">Felix’s</a> posts on this subject. Essentially, it involved a <code class="language-plaintext highlighter-rouge">Trace</code> trait with more generic <code class="language-plaintext highlighter-rouge">trace</code> methods, an auto-implemented <code class="language-plaintext highlighter-rouge">Root</code> trait that works similarly to <code class="language-plaintext highlighter-rouge">Send</code>, and compiler machinery to keep track of which <code class="language-plaintext highlighter-rouge">Root</code> types are on the stack.</p>

<p>This is probably not too useful for people attempting to implement a GC, but I’m mentioning it for completeness’ sake.</p>

<p>Note that pre-1.0 Rust did have a builtin GC (<code class="language-plaintext highlighter-rouge">@T</code>, known as “managed pointers”), but IIRC in practice the cycle-management parts were never implemented, so it behaved exactly like <code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code>. I believe it was intended to have a cycle collector (I’ll talk more about that in the next section).</p>

<h2 id="bacon-rajan-cc-and-cycle-collectors-in-general">bacon-rajan-cc (and cycle collectors in general)</h2>

<p><a href="https://fitzgeraldnick.com/">Nick Fitzgerald</a> wrote <a href="https://github.com/fitzgen/bacon-rajan-cc"><code class="language-plaintext highlighter-rouge">bacon-rajan-cc</code></a> to implement <em><a href="https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf">“Concurrent Cycle Collection in Reference Counted Systems”</a></em> by David F. Bacon and V.T. Rajan.</p>

<p>This is what is colloquially called a <em>cycle collector</em>; a kind of garbage collector which is essentially “what if we took <code class="language-plaintext highlighter-rouge">Rc&lt;T&gt;</code> but made it detect cycles”. Some people do not consider these to be <em>tracing</em> garbage collectors, but they have a lot of similar characteristics (and they do still “trace” through types). They’re often categorized as “hybrid” approaches, much like <a href="https://docs.rs/gc/"><code class="language-plaintext highlighter-rouge">gc</code></a>.</p>

<p>The idea is that you don’t actually need to <em>know</em> what the roots are if you’re maintaining reference counts: if a heap object has a reference count that is more than the number of heap objects referencing it, it must be a root. In practice it’s pretty inefficient to traverse the entire heap, so optimizations are applied, often by applying different “colors” to nodes, and by only looking at the set of objects that have recently had their reference counts decremented.</p>
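<p>The root-finding observation can be checked with a toy model. In this hypothetical sketch (not code from <code class="language-plaintext highlighter-rouge">bacon-rajan-cc</code>), the heap is a slice of objects identified by index, each with a reference count and a list of outgoing edges; subtracting every heap-internal reference from each count leaves exactly the references held from outside the heap. This is the naive whole-heap version that the optimizations above exist to avoid.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Hypothetical heap model: objects are identified by their index.
pub struct Obj {
    /// Total strong references: from the stack *and* from other objects.
    pub refcount: usize,
    /// Indices of the objects this object points to.
    pub edges: Vec&lt;usize&gt;,
}

/// Returns the indices of objects that are referenced from outside the heap.
/// Assumes the counts are consistent (each count covers its incoming edges).
pub fn find_roots(heap: &amp;[Obj]) -&gt; Vec&lt;usize&gt; {
    // Start from each object's full count...
    let mut external: Vec&lt;usize&gt; = heap.iter().map(|o| o.refcount).collect();
    // ...and subtract every reference that comes from another heap object.
    for obj in heap {
        for &amp;target in &amp;obj.edges {
            external[target] -= 1;
        }
    }
    // Whatever is left over must be held by something outside the heap.
    (0..heap.len()).filter(|&amp;i| external[i] &gt; 0).collect()
}
</code></pre></div></div>

<p>For instance, if objects 0 and 1 point at each other, object 2 (held on the stack) points at 0, and objects 3 and 4 form a cycle nothing else points to, only object 2 comes back as a root. Objects 3 and 4 are exactly the kind of cycle a plain <code class="language-plaintext highlighter-rouge">Rc</code> would leak.</p>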

<p>A crucial observation here is that if you <em>only focus on potential garbage</em>, you can shift your definition of “root” a bit: when looking for cycles you don’t need to look for references from the stack; you can be satisfied with references from <em>any part of the heap you know for a fact is reachable from things which are not potential garbage</em>.</p>

<p>A neat property of cycle collectors is that while mark and sweep tracing GCs have their performance scale by the size of the heap as a whole, cycle collectors scale by the size of <em>the actual garbage you have</em> <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">7</a></sup>. There are of course other tradeoffs: deallocation is often cheaper or “free” in tracing GCs (amortizing those costs by doing it during the sweep phase) whereas cycle collectors have the constant allocator traffic involved in cleaning up objects when refcounts reach zero.</p>

<p>The way <a href="https://github.com/fitzgen/bacon-rajan-cc">bacon-rajan-cc</a> works is that every time a reference count is decremented, the object is added to a list of “potential cycle roots”, unless the reference count is decremented to 0 (in which case the object is immediately cleaned up, just like <code class="language-plaintext highlighter-rouge">Rc</code>). When collecting, it traces through everything reachable from this list, decrementing refcounts for every reference it follows (a “trial deletion” of the internal references). It then traverses these objects <em>again</em>: any object whose refcount is still above zero must be referenced from outside this subgraph, so it and everything reachable from it have their refcounts reincremented to restore the original counts, while objects left at refcount 0 are only kept alive by cycles among themselves and get cleaned up. This basically treats any element not reachable from this “potential cycle root” list as “not garbage”, and doesn’t bother to visit it.</p>
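<p>A simplified, synchronous sketch of this kind of trial deletion is below. It uses a hypothetical index-based heap and marks objects as freed instead of deallocating them; the three phases (mark gray, scan, collect white) follow the synchronous algorithm described in the Bacon-Rajan paper rather than <code class="language-plaintext highlighter-rouge">bacon-rajan-cc</code>’s actual code, and the paper’s concurrent machinery and buffering details are omitted. Objects are only freed after the scan phase has restored the counts of everything still reachable from outside the candidate subgraph.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Colors used by the synchronous algorithm in the Bacon-Rajan paper.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Color {
    Black, // in use, or not examined
    Gray,  // visited during trial deletion
    White, // garbage, pending collection
}

/// Hypothetical index-based heap object with a real reference count.
pub struct Node {
    pub refcount: usize,
    pub edges: Vec&lt;usize&gt;,
    pub color: Color,
    pub freed: bool,
}

/// Phase 1 (trial deletion): remove the counts contributed by every
/// internal reference reachable from `i`.
fn mark_gray(heap: &amp;mut [Node], i: usize) {
    if heap[i].color == Color::Gray {
        return;
    }
    heap[i].color = Color::Gray;
    // Cloning the edge list sidesteps borrowing `heap` twice.
    for t in heap[i].edges.clone() {
        heap[t].refcount -= 1;
        mark_gray(heap, t);
    }
}

/// Phase 2: a nonzero count means an outside reference, so the object is
/// live; restore it and everything it reaches. Zero counts turn white.
fn scan(heap: &amp;mut [Node], i: usize) {
    if heap[i].color != Color::Gray {
        return;
    }
    if heap[i].refcount &gt; 0 {
        scan_black(heap, i);
    } else {
        heap[i].color = Color::White;
        for t in heap[i].edges.clone() {
            scan(heap, t);
        }
    }
}

fn scan_black(heap: &amp;mut [Node], i: usize) {
    heap[i].color = Color::Black;
    for t in heap[i].edges.clone() {
        heap[t].refcount += 1; // undo the trial decrement
        if heap[t].color != Color::Black {
            scan_black(heap, t);
        }
    }
}

/// Phase 3: whatever is still white is only kept alive by cycles.
fn collect_white(heap: &amp;mut [Node], i: usize) {
    if heap[i].color == Color::White {
        heap[i].color = Color::Black;
        for t in heap[i].edges.clone() {
            collect_white(heap, t);
        }
        heap[i].freed = true; // stand-in for actually deallocating
    }
}

/// Runs the three phases over the buffered "potential cycle roots".
pub fn collect_cycles(heap: &amp;mut [Node], candidates: &amp;[usize]) {
    for &amp;c in candidates { mark_gray(heap, c); }
    for &amp;c in candidates { scan(heap, c); }
    for &amp;c in candidates { collect_white(heap, c); }
}
</code></pre></div></div>

<p>Take objects 0 and 1 pointing at each other with object 0 also held from the stack, plus an unreferenced pair 2 and 3 pointing at each other. Then <code class="language-plaintext highlighter-rouge">collect_cycles(&amp;mut heap, &amp;[1, 2])</code> frees only 2 and 3. Object 1 drops to a count of 0 during trial deletion, yet is restored because the live object 0 still reaches it, which is why nothing can be freed before the scan phase is done.</p>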

<p>Cycle collectors require tighter control over the garbage collection algorithm, and have differing performance characteristics, so they may not necessarily be suitable for all use cases for GC integration in Rust, but it’s definitely worth considering!</p>

<h2 id="cell-gc">cell-gc</h2>

<p><a href="https://twitter.com/jorendorff/">Jason Orendorff</a>’s <a href="https://github.com/jorendorff/cell-gc">cell-gc</a> crate is interesting: it has a concept of “heap sessions”. Here’s a modified example from the readme:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">cell_gc</span><span class="p">::</span><span class="n">Heap</span><span class="p">;</span>

<span class="c1">// implements IntoHeap, and also generates an IntListRef type and accessors</span>
<span class="nd">#[derive(cell_gc_derive::IntoHeap)]</span>
<span class="k">struct</span> <span class="n">IntList</span><span class="o">&lt;</span><span class="nv">'h</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">head</span><span class="p">:</span> <span class="nb">i64</span><span class="p">,</span>
    <span class="n">tail</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">IntListRef</span><span class="o">&lt;</span><span class="nv">'h</span><span class="o">&gt;&gt;</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Create a heap (you'll only do this once in your whole program)</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">heap</span> <span class="o">=</span> <span class="nn">Heap</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>

    <span class="n">heap</span><span class="nf">.enter</span><span class="p">(|</span><span class="n">hs</span><span class="p">|</span> <span class="p">{</span>
        <span class="c1">// Allocate an object (returns an IntListRef)</span>
        <span class="k">let</span> <span class="n">obj1</span> <span class="o">=</span> <span class="n">hs</span><span class="nf">.alloc</span><span class="p">(</span><span class="n">IntList</span> <span class="p">{</span> <span class="n">head</span><span class="p">:</span> <span class="mi">17</span><span class="p">,</span> <span class="n">tail</span><span class="p">:</span> <span class="nb">None</span> <span class="p">});</span>
        <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">obj1</span><span class="nf">.head</span><span class="p">(),</span> <span class="mi">17</span><span class="p">);</span>
        <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">obj1</span><span class="nf">.tail</span><span class="p">(),</span> <span class="nb">None</span><span class="p">);</span>

        <span class="c1">// Allocate another object</span>
        <span class="k">let</span> <span class="n">obj2</span> <span class="o">=</span> <span class="n">hs</span><span class="nf">.alloc</span><span class="p">(</span><span class="n">IntList</span> <span class="p">{</span> <span class="n">head</span><span class="p">:</span> <span class="mi">33</span><span class="p">,</span> <span class="n">tail</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">obj1</span><span class="p">)</span> <span class="p">});</span>
        <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">obj2</span><span class="nf">.head</span><span class="p">(),</span> <span class="mi">33</span><span class="p">);</span>
        <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">obj2</span><span class="nf">.tail</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.head</span><span class="p">(),</span> <span class="mi">17</span><span class="p">);</span>

        <span class="c1">// mutate `tail`</span>
        <span class="n">obj2</span><span class="nf">.set_tail</span><span class="p">(</span><span class="nb">None</span><span class="p">);</span>
    <span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>

<p>All mutation goes through autogenerated accessors, so the crate has a little more control over traffic through the GC. These accessors help track roots via a scheme similar to what <a href="https://docs.rs/gc/"><code class="language-plaintext highlighter-rouge">gc</code></a> does, where there’s an <code class="language-plaintext highlighter-rouge">IntoHeap</code> trait used for modifying root refcounts when a reference is put into and taken out of the heap via accessors.</p>
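<p>A minimal sketch of that sort of root bookkeeping, assuming a hypothetical heap that keeps one root count per object slot: a <code class="language-plaintext highlighter-rouge">Root</code> handle raises the count while it lives on the stack and lowers it on <code class="language-plaintext highlighter-rouge">Drop</code>, and moving it into the heap turns it into an unrooted <code class="language-plaintext highlighter-rouge">Member</code>. These names are illustrative rather than <code class="language-plaintext highlighter-rouge">cell-gc</code>’s or <code class="language-plaintext highlighter-rouge">gc</code>’s API, and the collector itself is left out; the point is only that every crossing of the stack/heap boundary runs code that adjusts the counts.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical shared heap metadata: one root count per object slot.
pub type RootCounts = Rc&lt;RefCell&lt;Vec&lt;usize&gt;&gt;&gt;;

/// Stack-held handle: keeps its object's root count raised while alive.
pub struct Root {
    counts: RootCounts,
    index: usize,
}

/// Heap-held handle: does not root its target (a tracer would find it).
pub struct Member {
    pub index: usize,
}

impl Root {
    pub fn new(counts: &amp;RootCounts, index: usize) -&gt; Root {
        counts.borrow_mut()[index] += 1;
        Root { counts: Rc::clone(counts), index }
    }

    /// Putting a reference *into* the heap: `self` is consumed, and its
    /// `Drop` runs at the end of this call, which unroots it.
    pub fn into_heap(self) -&gt; Member {
        Member { index: self.index }
    }
}

impl Member {
    /// Taking a reference *out of* the heap roots it again.
    pub fn to_root(&amp;self, counts: &amp;RootCounts) -&gt; Root {
        Root::new(counts, self.index)
    }
}

impl Drop for Root {
    fn drop(&amp;mut self) {
        self.counts.borrow_mut()[self.index] -= 1;
    }
}
</code></pre></div></div>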

<p>Heap sessions allow for the heap to be moved around, even sent to other threads, and their lifetime prevents heap objects from being mixed between sessions. This uses a concept called <em>generativity</em>; you can read more about generativity in <em><a href="https://raw.githubusercontent.com/Gankra/thesis/master/thesis.pdf">“You Can’t Spell Trust Without Rust”</a></em> ch 6.3, by <a href="https://github.com/Gankra">Aria Beingessner</a>, or by looking at the <a href="https://github.com/bluss/indexing"><code class="language-plaintext highlighter-rouge">indexing</code></a> crate.</p>
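<p>The core of generativity is handing a closure a fresh, invariant lifetime that the caller cannot name. Here is a minimal sketch of the general pattern, with hypothetical <code class="language-plaintext highlighter-rouge">Session</code> and <code class="language-plaintext highlighter-rouge">Handle</code> types standing in for heap sessions and their objects; it is not <code class="language-plaintext highlighter-rouge">cell-gc</code>’s actual code.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::marker::PhantomData;

/// `fn(&amp;'id ()) -&gt; &amp;'id ()` makes `'id` invariant, so the compiler can
/// never shrink or grow one brand until it matches another.
type Brand&lt;'id&gt; = PhantomData&lt;fn(&amp;'id ()) -&gt; &amp;'id ()&gt;;

/// Hypothetical heap session, branded with `'id`.
pub struct Session&lt;'id&gt; {
    values: Vec&lt;i64&gt;,
    _brand: Brand&lt;'id&gt;,
}

/// Hypothetical handle, only usable with the session that created it.
#[derive(Clone, Copy)]
pub struct Handle&lt;'id&gt; {
    index: usize,
    _brand: Brand&lt;'id&gt;,
}

impl&lt;'id&gt; Session&lt;'id&gt; {
    pub fn alloc(&amp;mut self, v: i64) -&gt; Handle&lt;'id&gt; {
        self.values.push(v);
        Handle { index: self.values.len() - 1, _brand: PhantomData }
    }

    /// A handle from another session has a different `'id` and fails to
    /// type-check, so indexing with a foreign handle is ruled out statically.
    pub fn get(&amp;self, h: Handle&lt;'id&gt;) -&gt; i64 {
        self.values[h.index]
    }
}

/// The closure must work for *every* `'id`, so each call gets a fresh brand
/// and `R` cannot mention it (handles cannot escape the session).
pub fn with_session&lt;R&gt;(f: impl for&lt;'id&gt; FnOnce(Session&lt;'id&gt;) -&gt; R) -&gt; R {
    f(Session { values: Vec::new(), _brand: PhantomData })
}
</code></pre></div></div>

<p>Mixing sessions, as in <code class="language-plaintext highlighter-rouge">with_session(|mut s1| with_session(|s2| s2.get(s1.alloc(1))))</code>, fails to compile, because the two closures receive distinct <code class="language-plaintext highlighter-rouge">'id</code> lifetimes and invariance stops the compiler from unifying them.</p>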

<h2 id="interlude-the-similarities-between-async-and-gcs">Interlude: The similarities between <code class="language-plaintext highlighter-rouge">async</code> and GCs</h2>

<p>The next two examples use machinery from Rust’s <code class="language-plaintext highlighter-rouge">async</code> functionality despite having nothing to do with async I/O, and I think it’s important to talk about why that makes sense. I’ve <a href="https://twitter.com/ManishEarth/status/1073651552768819200">tweeted about this before</a>: <a href="https://github.com/kyren">Catherine West</a> and I figured this out when we were talking about <a href="https://github.com/kyren/gc-arena">her GC idea</a> based on <code class="language-plaintext highlighter-rouge">async</code>.</p>

<p>You can see some of this correspondence in Go: Go is a language that has both garbage collection and async I/O, and both of these use the same “safepoints” for yielding to the garbage collector or the scheduler. In Go, the compiler needs to automatically insert code that checks the “pulse” of the heap every now and then, and potentially runs garbage collection. It also needs to automatically insert code that can tell the scheduler “hey now is a safe time to interrupt me if a different goroutine wishes to run”. These are very similar in principle – they’re both essentially places where the compiler is inserting “it is okay to interrupt me now” checks, sometimes called “interruption points” or “yield points”.</p>

<p>Now, Rust’s compiler does not automatically insert interruption points. However, the design of <code class="language-plaintext highlighter-rouge">async</code> in Rust is essentially a way of adding <em>explicit</em> interruption points to Rust. <code class="language-plaintext highlighter-rouge">foo().await</code> in Rust is a way of running <code class="language-plaintext highlighter-rouge">foo()</code> while accepting that the scheduler <em>may</em> interrupt the code at that point. The design of <a href="https://doc.rust-lang.org/nightly/std/future/trait.Future.html"><code class="language-plaintext highlighter-rouge">Future</code></a> and <a href="https://doc.rust-lang.org/nightly/std/pin/struct.Pin.html"><code class="language-plaintext highlighter-rouge">Pin&lt;P&gt;</code></a> comes out of making this safe and pleasant to work with.</p>
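<p>To make the correspondence concrete, here is a small sketch using only the standard library: a hypothetical <code class="language-plaintext highlighter-rouge">safepoint()</code> future that is <code class="language-plaintext highlighter-rouge">Pending</code> the first time it is polled, and a toy driver that calls a stand-in “collect” callback whenever the task yields. None of this comes from a real GC crate; it only illustrates that each <code class="language-plaintext highlighter-rouge">.await</code> on a pending future hands control back to whoever is polling, which is exactly the hook a collector needs.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Pending on the first poll, ready on the second: an explicit
/// interruption point.
pub struct Safepoint {
    yielded: bool,
}

pub fn safepoint() -&gt; Safepoint {
    Safepoint { yielded: false }
}

impl Future for Safepoint {
    type Output = ();
    fn poll(mut self: Pin&lt;&amp;mut Self&gt;, _cx: &amp;mut Context&lt;'_&gt;) -&gt; Poll&lt;()&gt; {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

/// A waker that does nothing; the driver below simply polls in a loop.
fn noop_waker() -&gt; Waker {
    unsafe fn clone(_: *const ()) -&gt; RawWaker {
        RawWaker::new(std::ptr::null(), &amp;VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions touch the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &amp;VTABLE)) }
}

/// Toy driver: polls `task` to completion, calling `collect` at every yield.
pub fn run_with_gc&lt;F: Future&gt;(task: F, mut collect: impl FnMut()) -&gt; F::Output {
    let mut task = Box::pin(task);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&amp;waker);
    loop {
        match task.as_mut().poll(&amp;mut cx) {
            Poll::Ready(out) =&gt; return out,
            // Control is back in the driver: the task is parked at an
            // `.await`, not part-way through some heap operation.
            Poll::Pending =&gt; collect(),
        }
    }
}
</code></pre></div></div>

<p>Running an <code class="language-plaintext highlighter-rouge">async</code> block that awaits <code class="language-plaintext highlighter-rouge">safepoint()</code> twice calls <code class="language-plaintext highlighter-rouge">collect</code> exactly twice, once per interruption point.</p>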

<p>As we shall see, this same machinery can be used for creating safe interruption points for GCs in Rust.</p>

<h2 id="shifgrethor">Shifgrethor</h2>

<p><a href="https://github.com/withoutboats/shifgrethor">shifgrethor</a> is an experiment by <a href="https://github.com/withoutboats/">Saoirse</a> to try and build a GC that uses <a href="https://doc.rust-lang.org/nightly/std/pin/struct.Pin.html"><code class="language-plaintext highlighter-rouge">Pin&lt;P&gt;</code></a> for managing roots. They’ve written extensively on the design of <a href="https://github.com/withoutboats/shifgrethor">shifgrethor</a> <a href="https://without.boats/tags/shifgrethor/">on their blog</a>. In particular, the <a href="https://without.boats/blog/shifgrethor-iii/">post on rooting</a> goes through how rooting works.</p>

<p>The basic design is that there’s a <code class="language-plaintext highlighter-rouge">Root&lt;'root&gt;</code> type that contains a <code class="language-plaintext highlighter-rouge">Pin&lt;P&gt;</code>, which can be <em>immovably</em> tied to a stack frame using the same idea behind <code class="language-plaintext highlighter-rouge">pin-utils</code>’ <a href="https://docs.rs/pin-utils/0.1.0/pin_utils/macro.pin_mut.html"><code class="language-plaintext highlighter-rouge">pin_mut!()</code> macro</a>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">letroot!</span><span class="p">(</span><span class="n">root</span><span class="p">);</span>
<span class="k">let</span> <span class="n">gc</span><span class="p">:</span> <span class="nb">Gc</span><span class="o">&lt;</span><span class="nv">'root</span><span class="p">,</span> <span class="n">Foo</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">root</span><span class="nf">.gc</span><span class="p">(</span><span class="nn">Foo</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
</code></pre></div></div>

<p>The fact that <code class="language-plaintext highlighter-rouge">root</code> is immovable allows for it to be treated as a true marker for the <em>stack frame</em> over anything else. The list of rooted types can be neatly stored in an ordered stack-like vector in the GC implementation, popping when individual roots go out of scope.</p>
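<p>The stack discipline itself can be sketched without <code class="language-plaintext highlighter-rouge">Pin</code>. In this hypothetical sketch (not shifgrethor’s implementation), a <code class="language-plaintext highlighter-rouge">RootStack</code> records rooted object IDs in creation order and each <code class="language-plaintext highlighter-rouge">RootGuard</code> pops its own entry on <code class="language-plaintext highlighter-rouge">Drop</code>. Because these guards are ordinary movable values, LIFO release is only checked at runtime here; shifgrethor instead relies on immovable, pinned roots to keep each root tied to its stack frame.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::cell::RefCell;

/// Hypothetical collector-side list of roots, in the order they were created.
#[derive(Default)]
pub struct RootStack {
    roots: RefCell&lt;Vec&lt;usize&gt;&gt;,
}

/// Guard living in a stack frame; pops its root when the frame ends.
pub struct RootGuard&lt;'s&gt; {
    stack: &amp;'s RootStack,
    depth: usize,
}

impl RootStack {
    pub fn root(&amp;self, object_id: usize) -&gt; RootGuard&lt;'_&gt; {
        let mut roots = self.roots.borrow_mut();
        roots.push(object_id);
        RootGuard { stack: self, depth: roots.len() }
    }

    /// What a collector would scan: every currently live root.
    pub fn live_roots(&amp;self) -&gt; Vec&lt;usize&gt; {
        self.roots.borrow().clone()
    }
}

impl Drop for RootGuard&lt;'_&gt; {
    fn drop(&amp;mut self) {
        let mut roots = self.stack.roots.borrow_mut();
        // Locals are dropped in reverse declaration order, so this guard's
        // entry should be on top; moving guards around would break that.
        assert_eq!(roots.len(), self.depth, "roots must be released in LIFO order");
        roots.pop();
    }
}
</code></pre></div></div>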

<p>If you wish to return a rooted object from a function, the function needs to accept a <code class="language-plaintext highlighter-rouge">Root&lt;'root&gt;</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">new</span><span class="o">&lt;</span><span class="nv">'root</span><span class="o">&gt;</span><span class="p">(</span><span class="n">root</span><span class="p">:</span> <span class="n">Root</span><span class="o">&lt;</span><span class="nv">'root</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Gc</span><span class="o">&lt;</span><span class="nv">'root</span><span class="p">,</span> <span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">root</span><span class="nf">.gc</span><span class="p">(</span><span class="k">Self</span> <span class="p">{</span>
        <span class="c1">// ...</span>
    <span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>

<p>All GC’d types have a <code class="language-plaintext highlighter-rouge">'root</code> lifetime of the root they trace back to, and are declared with a custom derive:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(GC)]</span>
<span class="k">struct</span> <span class="n">Foo</span><span class="o">&lt;</span><span class="nv">'root</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nd">#[gc]</span> <span class="n">bar</span><span class="p">:</span> <span class="n">GcStore</span><span class="o">&lt;</span><span class="nv">'root</span><span class="p">,</span> <span class="n">Bar</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">GcStore</code> is a way to have fields use the rooting of their parent. Normally, if you wanted to put <code class="language-plaintext highlighter-rouge">Gc&lt;'root2, Bar&lt;'root2&gt;&gt;</code> inside <code class="language-plaintext highlighter-rouge">Foo&lt;'root1&gt;</code>, you would not be able to, because the lifetimes derive from different roots. <code class="language-plaintext highlighter-rouge">GcStore</code>, along with autogenerated accessors from <code class="language-plaintext highlighter-rouge">#[derive(GC)]</code>, will set <code class="language-plaintext highlighter-rouge">Bar</code>’s lifetime to be the same as <code class="language-plaintext highlighter-rouge">Foo</code>’s when you attempt to stick it inside <code class="language-plaintext highlighter-rouge">Foo</code>.</p>

<p>This design is somewhat similar to that of Servo, where there’s a pair of types, one that lets us refer to GC types on the stack, and one that lets GC types refer to each other on the heap, but it uses <code class="language-plaintext highlighter-rouge">Pin&lt;P&gt;</code> instead of a lint to enforce this safely, which is way nicer. <code class="language-plaintext highlighter-rouge">Root&lt;'root&gt;</code> and <code class="language-plaintext highlighter-rouge">GcStore</code> do a bunch of lifetime tweaking that’s reminiscent of Josephine’s rooting system; however, there’s no need for an <code class="language-plaintext highlighter-rouge">&amp;mut JsContext</code> type that needs to be passed around everywhere.</p>

<h2 id="gc-arena">gc-arena</h2>

<p><a href="https://github.com/kyren/gc-arena"><code class="language-plaintext highlighter-rouge">gc-arena</code></a> is <a href="https://github.com/kyren">Catherine West</a>’s experimental GC design for her Lua VM, <a href="https://github.com/kyren/luster"><code class="language-plaintext highlighter-rouge">luster</code></a>.</p>

<p>The <code class="language-plaintext highlighter-rouge">gc-arena</code> crate forces all GC-manipulating code to go within <code class="language-plaintext highlighter-rouge">arena.mutate()</code> calls, between which garbage collection may occur.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Collect)]</span>
<span class="nd">#[collect(no_drop)]</span>
<span class="k">struct</span> <span class="n">TestRoot</span><span class="o">&lt;</span><span class="nv">'gc</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">number</span><span class="p">:</span> <span class="nb">Gc</span><span class="o">&lt;</span><span class="nv">'gc</span><span class="p">,</span> <span class="nb">i32</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">many_numbers</span><span class="p">:</span> <span class="n">GcCell</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">Gc</span><span class="o">&lt;</span><span class="nv">'gc</span><span class="p">,</span> <span class="nb">i32</span><span class="o">&gt;&gt;&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="nd">make_arena!</span><span class="p">(</span><span class="n">TestArena</span><span class="p">,</span> <span class="n">TestRoot</span><span class="p">);</span>

<span class="k">let</span> <span class="k">mut</span> <span class="n">arena</span> <span class="o">=</span> <span class="nn">TestArena</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">ArenaParameters</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span> <span class="p">|</span><span class="n">mc</span><span class="p">|</span> <span class="n">TestRoot</span> <span class="p">{</span>
    <span class="n">number</span><span class="p">:</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">allocate</span><span class="p">(</span><span class="n">mc</span><span class="p">,</span> <span class="mi">42</span><span class="p">),</span>
    <span class="n">many_numbers</span><span class="p">:</span> <span class="nn">GcCell</span><span class="p">::</span><span class="nf">allocate</span><span class="p">(</span><span class="n">mc</span><span class="p">,</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">()),</span>
<span class="p">});</span>

<span class="n">arena</span><span class="nf">.mutate</span><span class="p">(|</span><span class="n">mc</span><span class="p">,</span> <span class="n">root</span><span class="p">|</span> <span class="p">{</span>
    <span class="nd">assert_eq!</span><span class="p">(</span><span class="o">*</span><span class="p">((</span><span class="o">*</span><span class="n">root</span><span class="p">)</span><span class="py">.number</span><span class="p">),</span> <span class="mi">42</span><span class="p">);</span>
    <span class="n">root</span><span class="py">.many_numbers</span><span class="nf">.write</span><span class="p">(</span><span class="n">mc</span><span class="p">)</span><span class="nf">.push</span><span class="p">(</span><span class="nn">Gc</span><span class="p">::</span><span class="nf">allocate</span><span class="p">(</span><span class="n">mc</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="p">});</span>
</code></pre></div></div>

<p>Mutation is done with <code class="language-plaintext highlighter-rouge">GcCell</code>, basically a fancier version of <code class="language-plaintext highlighter-rouge">Gc&lt;RefCell&lt;T&gt;&gt;</code>. All GC operations require a <code class="language-plaintext highlighter-rouge">MutationContext</code> (<code class="language-plaintext highlighter-rouge">mc</code>), which is only available within <code class="language-plaintext highlighter-rouge">arena.mutate()</code>.</p>

<p>Only the arena root may survive between <code class="language-plaintext highlighter-rouge">mutate()</code> calls, and garbage collection does not happen during <code class="language-plaintext highlighter-rouge">.mutate()</code>, so rooting is easy – just follow the arena root. This crate allows for multiple GCs to coexist with separate heaps, and, similarly to <a href="https://github.com/jorendorff/cell-gc">cell-gc</a>, it uses generativity to enforce that the heaps do not get mixed.</p>
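<p>Why rooting becomes trivial can be seen in a toy version. In this hypothetical sketch (not gc-arena’s API), objects live in a vector and refer to each other by index; <code class="language-plaintext highlighter-rouge">mutate</code> runs a closure with mutable access to the arena and only afterwards runs a mark-and-sweep traced from the single arena root. Since no collection happens while the closure runs, nothing on its stack ever needs to be found. Unlike gc-arena, which uses branded lifetimes to keep <code class="language-plaintext highlighter-rouge">Gc</code> pointers from escaping <code class="language-plaintext highlighter-rouge">mutate()</code>, this sketch enforces nothing, so an index kept across calls may point at a freed slot.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Hypothetical object: a value plus outgoing references (by index).
pub struct Node {
    pub value: i32,
    pub edges: Vec&lt;usize&gt;,
}

pub struct Arena {
    /// `None` marks a slot freed by the collector.
    pub slots: Vec&lt;Option&lt;Node&gt;&gt;,
    /// The only thing that survives between `mutate` calls.
    pub root: usize,
}

impl Arena {
    pub fn new(root_value: i32) -&gt; Arena {
        let root = Node { value: root_value, edges: Vec::new() };
        Arena { slots: vec![Some(root)], root: 0 }
    }

    pub fn alloc(&amp;mut self, value: i32) -&gt; usize {
        self.slots.push(Some(Node { value, edges: Vec::new() }));
        self.slots.len() - 1
    }

    /// Run user code, then collect: GC only ever happens *between* calls.
    pub fn mutate&lt;R&gt;(&amp;mut self, f: impl FnOnce(&amp;mut Arena) -&gt; R) -&gt; R {
        let result = f(self);
        self.collect();
        result
    }

    /// Mark everything reachable from `root`, then sweep the rest.
    fn collect(&amp;mut self) {
        let mut marked = vec![false; self.slots.len()];
        let mut worklist = vec![self.root];
        while let Some(i) = worklist.pop() {
            if marked[i] {
                continue;
            }
            marked[i] = true;
            if let Some(node) = &amp;self.slots[i] {
                worklist.extend(node.edges.iter().copied());
            }
        }
        for (slot, live) in self.slots.iter_mut().zip(marked) {
            if !live {
                *slot = None;
            }
        }
    }

    pub fn live_count(&amp;self) -&gt; usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }
}
</code></pre></div></div>

<p>If a <code class="language-plaintext highlighter-rouge">mutate</code> call allocates two objects but only links one of them from the root, the other is swept as soon as the call returns.</p>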

<p>So far this is mostly like other arena-based systems, but with a GC.</p>

<p>The <em>really cool</em> part of the design is the <code class="language-plaintext highlighter-rouge">gc-sequence</code> crate, which essentially builds a <code class="language-plaintext highlighter-rouge">Future</code>-like API (using a <code class="language-plaintext highlighter-rouge">Sequence</code> trait) on top of <code class="language-plaintext highlighter-rouge">gc-arena</code> that can potentially make this very pleasant to use. Here’s a modified example from a test:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Collect)]</span>
<span class="nd">#[collect(no_drop)]</span>
<span class="k">struct</span> <span class="n">TestRoot</span><span class="o">&lt;</span><span class="nv">'gc</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">test</span><span class="p">:</span> <span class="nb">Gc</span><span class="o">&lt;</span><span class="nv">'gc</span><span class="p">,</span> <span class="nb">i32</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="nd">make_sequencable_arena!</span><span class="p">(</span><span class="n">test_sequencer</span><span class="p">,</span> <span class="n">TestRoot</span><span class="p">);</span>
<span class="k">use</span> <span class="nn">test_sequencer</span><span class="p">::</span><span class="n">Arena</span> <span class="k">as</span> <span class="n">TestArena</span><span class="p">;</span>

<span class="k">let</span> <span class="n">arena</span> <span class="o">=</span> <span class="nn">TestArena</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">ArenaParameters</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span> <span class="p">|</span><span class="n">mc</span><span class="p">|</span> <span class="n">TestRoot</span> <span class="p">{</span>
    <span class="n">test</span><span class="p">:</span> <span class="nn">Gc</span><span class="p">::</span><span class="nf">allocate</span><span class="p">(</span><span class="n">mc</span><span class="p">,</span> <span class="mi">42</span><span class="p">),</span>
<span class="p">});</span>

<span class="k">let</span> <span class="k">mut</span> <span class="n">sequence</span> <span class="o">=</span> <span class="n">arena</span><span class="nf">.sequence</span><span class="p">(|</span><span class="n">root</span><span class="p">|</span> <span class="p">{</span>
    <span class="nn">sequence</span><span class="p">::</span><span class="nf">from_fn_with</span><span class="p">(</span><span class="n">root</span><span class="py">.test</span><span class="p">,</span> <span class="p">|</span><span class="n">_</span><span class="p">,</span> <span class="n">test</span><span class="p">|</span> <span class="p">{</span>
        <span class="k">if</span> <span class="o">*</span><span class="n">test</span> <span class="o">==</span> <span class="mi">42</span> <span class="p">{</span>
            <span class="nf">Ok</span><span class="p">(</span><span class="o">*</span><span class="n">test</span> <span class="o">+</span> <span class="mi">10</span><span class="p">)</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="nf">Err</span><span class="p">(</span><span class="s">"will not be generated"</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">})</span>
    <span class="nf">.and_then</span><span class="p">(|</span><span class="n">_</span><span class="p">,</span> <span class="n">r</span><span class="p">|</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">r</span> <span class="o">+</span> <span class="mi">12</span><span class="p">))</span>
    <span class="nf">.and_chain</span><span class="p">(|</span><span class="n">_</span><span class="p">,</span> <span class="n">r</span><span class="p">|</span> <span class="nf">Ok</span><span class="p">(</span><span class="nn">sequence</span><span class="p">::</span><span class="nf">ok</span><span class="p">(</span><span class="n">r</span> <span class="o">-</span> <span class="mi">10</span><span class="p">)))</span>
    <span class="nf">.then</span><span class="p">(|</span><span class="n">_</span><span class="p">,</span> <span class="n">res</span><span class="p">|</span> <span class="n">res</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"should not be error"</span><span class="p">))</span>
    <span class="nf">.chain</span><span class="p">(|</span><span class="n">_</span><span class="p">,</span> <span class="n">r</span><span class="p">|</span> <span class="nn">sequence</span><span class="p">::</span><span class="nf">done</span><span class="p">(</span><span class="n">r</span> <span class="o">+</span> <span class="mi">10</span><span class="p">))</span>
    <span class="nf">.map</span><span class="p">(|</span><span class="n">r</span><span class="p">|</span> <span class="nn">sequence</span><span class="p">::</span><span class="nf">done</span><span class="p">(</span><span class="n">r</span> <span class="o">-</span> <span class="mi">60</span><span class="p">))</span>
    <span class="nf">.flatten</span><span class="p">()</span>
    <span class="nf">.boxed</span><span class="p">()</span>
<span class="p">});</span>

<span class="k">loop</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">sequence</span><span class="nf">.step</span><span class="p">()</span> <span class="p">{</span>
        <span class="nf">Ok</span><span class="p">((</span><span class="n">_</span><span class="p">,</span> <span class="n">output</span><span class="p">))</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">output</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
            <span class="k">return</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="nf">Err</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">sequence</span> <span class="o">=</span> <span class="n">s</span><span class="p">,</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is <em>very</em> similar to chained callback futures code, and if it could use the <code class="language-plaintext highlighter-rouge">Future</code> trait, it would be able to use <code class="language-plaintext highlighter-rouge">async</code> to convert this callback-heavy code into sequential code with interrupt points using <code class="language-plaintext highlighter-rouge">await</code>. There were design constraints making <code class="language-plaintext highlighter-rouge">Future</code> not workable for this use case, though if Rust ever gets generators this would work well, and it’s quite possible that another GC with a similar design could be written, using <code class="language-plaintext highlighter-rouge">async</code>/<code class="language-plaintext highlighter-rouge">await</code> and <code class="language-plaintext highlighter-rouge">Future</code>.</p>

<p>Essentially, this paints a picture of an entire space of Rust GC design where GC mutations are performed using <code class="language-plaintext highlighter-rouge">await</code> (or <code class="language-plaintext highlighter-rouge">yield</code> if we ever get generators), and garbage collection can occur during those yield points, in a way that’s highly reminiscent of Go’s design.</p>

<h2 id="moving-forward">Moving forward</h2>

<p>As is hopefully obvious, the space of safe GC design in Rust is quite rich and has a lot of interesting ideas. I’m really excited to see what folks come up with here!</p>

<p>If you’re interested in reading more about GCs in general, <em><a href="https://courses.cs.washington.edu/courses/cse590p/05au/p50-bacon.pdf">“A Unified Theory of Garbage Collection”</a></em> by Bacon et al. and the <a href="http://gchandbook.org/">GC Handbook</a> are great reads.</p>

<p><em>Thanks to <a href="https://mermaid.industries/">Andi McClure</a>, <a href="https://twitter.com/jorendorff/">Jason Orendorff</a>, <a href="https://fitzgeraldnick.com/">Nick Fitzgerald</a>, and <a href="https://twitter.com/kneecaw/">Nika Layzell</a> for providing feedback on drafts of this blog post</em></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:0" role="doc-endnote">
      <p>I’m also going to completely ignore the field of <em>conservative</em> stack-scanning tracing GCs where you figure out your roots by looking at all the stack memory and considering anything with a remotely heap-object-like bit pattern to be a root. These are interesting, but can’t really be made 100% safe in the way Rust wants them to be unless you scan the heap as well. <a href="#fnref:0" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1" role="doc-endnote">
      <p>Which currently does not have support for concurrent garbage collection, but it could be added. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Some JNI-using APIs are also forced to have <a href="https://developer.android.com/ndk/reference/group/bitmap#androidbitmap_lockpixels">explicit rooting APIs</a> to give access to things like raw buffers. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>In general, finalizers in GCs are hard to implement soundly in any language, not just Rust, but Rust can sometimes be a bit more annoying about it. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Spoiler: This is actually possible in Rust, and we’ll get into it further in this post! <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>Such hybrid approaches are common in high performance GCs; <em><a href="https://courses.cs.washington.edu/courses/cse590p/05au/p50-bacon.pdf">“A Unified Theory of Garbage Collection”</a></em> by Bacon et al. covers a lot of the breadth of these approaches. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>Firefox’s DOM actually uses a mark &amp; sweep tracing GC <em>mixed with</em> a cycle collector for this reason. The DOM types themselves are cycle collected, but JavaScript objects are managed by the SpiderMonkey GC. Since some DOM types may contain references to arbitrary JS types (e.g. ones that store callbacks), there’s a fair amount of work required to break cycles manually in some cases, but it has performance benefits since the vast majority of DOM objects either never become garbage or become garbage by having a couple of non-cycle-participating references get released. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Down a Rusty Rabbit Hole]]></title>
    <link href="http://manishearth.github.io/blog/2018/04/12/down-a-rusty-rabbit-hole/"/>
    <updated>2018-04-12T00:00:00+00:00</updated>
    <id>http://manishearth.github.io/blog/2018/04/12/down-a-rusty-rabbit-hole</id>
    <content type="html"><![CDATA[<p>Last week I fell down a rather interesting rabbit hole in Rust, which was basically
me discovering a series of quirks of the Rust compiler/language, each one leading to the
next when I asked “why?”.</p>

<p>It started when someone asked why autogenerated <code class="language-plaintext highlighter-rouge">Debug</code> impls use argument names like <code class="language-plaintext highlighter-rouge">__arg_0</code>
which start with a double underscore.</p>

<p>This happened to be <a href="https://github.com/rust-lang/rust/pull/32294">my fault</a>. The reason <a href="https://github.com/rust-lang/rust/pull/32251#issuecomment-197481726">we used a double underscore</a> was that
while a single underscore tells rustc not to warn about a possibly-unused variable, there’s an
off-by-default clippy lint that warns about variables that start with a single underscore but are
used anyway, and that lint can be silenced with a double underscore. Now, the correct fix here is to
make the lint ignore derive/macro output (which I believe we did as well), but at the time we needed
to add an underscore anyway, so a double underscore didn’t seem worse.</p>
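<p>As a small illustration of these conventions (a sketch, not code from the PR), the snippet below marks which bindings each lint cares about. The behavior of clippy’s pedantic <code class="language-plaintext highlighter-rouge">used_underscore_binding</code> lint, including its exemption for double underscores, is stated here as an assumption based on the description above.</p>

```rust
// Sketch of the underscore conventions discussed above.
// Assumption: clippy's pedantic (off-by-default) `used_underscore_binding`
// lint flags *uses* of `_`-prefixed bindings but skips `__`-prefixed ones.
fn underscore_demo() -> i32 {
    // Leading `_` and never read: rustc's `unused_variables` lint stays quiet.
    let _unused = 1;
    // Leading `_` but read below: rustc is fine with this, but
    // `clippy::used_underscore_binding` would warn about the use.
    let _used = 2;
    // Leading `__`: neither rustc nor that clippy lint complains.
    let __used = 3;
    _used + __used
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">underscore_demo()</code> returns 5; the interesting part is which lines would produce warnings, not the result.</p>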

<p>Except of course, this double underscore appears in the docs. Oops.</p>

<p>Ideally, the rustc derive infrastructure would have a way of specifying the argument names to use,
so that we could at least have descriptive names here, but that’s a bit more work (I’m willing to
mentor this work, though!). So I thought I’d fix this by at least removing the double underscore and
making the unused lint ignore <code class="language-plaintext highlighter-rouge">#[derive()]</code> output.</p>

<p>While going through the code to look for underscores I also discovered a hygiene issue. The following code
throws a bunch of very weird type errors:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">__cmp</span><span class="p">:</span> <span class="nb">u8</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

<span class="nd">#[derive(PartialOrd,</span> <span class="nd">PartialEq)]</span>
<span class="k">pub</span> <span class="k">enum</span> <span class="n">Foo</span> <span class="p">{</span>
    <span class="nf">A</span><span class="p">(</span><span class="nb">u8</span><span class="p">),</span> <span class="nf">B</span><span class="p">(</span><span class="nb">u8</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>(<a href="https://play.rust-lang.org/?gist=2352b6a2192f38caba70bc2b1fa889e7&amp;version=stable">playpen</a>)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0308]: mismatched types
 --&gt; src/main.rs:6:7
  |
6 |     A(u8), B(u8)
  |       ^^^ expected enum `std::option::Option`, found u8
  |
  = note: expected type `std::option::Option&lt;std::cmp::Ordering&gt;`
             found type `u8`
.....
</code></pre></div></div>

<p>This is because the generated code for <code class="language-plaintext highlighter-rouge">PartialOrd</code> contains the following:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">match</span> <span class="n">foo</span><span class="nf">.partial_cmp</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bar</span><span class="p">)</span> <span class="p">{</span>
    <span class="nf">Some</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Equal</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="o">.....</span><span class="p">,</span>
    <span class="n">__cmp</span> <span class="k">=&gt;</span> <span class="n">__cmp</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">__cmp</code> can be either a fresh binding (a catch-all pattern that matches anything) or a pattern that matches against a constant
named <code class="language-plaintext highlighter-rouge">__cmp</code>. When such a constant is in scope, the identifier resolves to the constant, causing
type errors.</p>

<p>One way to fix this is to bind <code class="language-plaintext highlighter-rouge">foo.partial_cmp(&amp;bar)</code> to some temporary variable <code class="language-plaintext highlighter-rouge">x</code> and use that directly in
a <code class="language-plaintext highlighter-rouge">_ =&gt; x</code> branch.</p>
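<p>Here is a minimal hand-written sketch of that shape for a hypothetical two-field tuple struct <code class="language-plaintext highlighter-rouge">Pair</code> (this is not the compiler’s actual derive output). The catch-all arm is <code class="language-plaintext highlighter-rouge">_</code>, which never resolves to a constant, so a constant named <code class="language-plaintext highlighter-rouge">__cmp</code> in scope is harmless. The temporary <code class="language-plaintext highlighter-rouge">first</code> is still an ordinary identifier pattern, though, so a constant named <code class="language-plaintext highlighter-rouge">first</code> would clash with it.</p>

```rust
use std::cmp::Ordering;

// Hypothetical constant with the same name as the old catch-all binding.
#[allow(dead_code, non_upper_case_globals)]
const __cmp: u8 = 1;

#[derive(PartialEq, Debug)]
struct Pair(u8, u8);

// Lexicographic comparison using "bind to a temporary, then `_ => temp`"
// instead of a `__cmp => __cmp` arm.
impl PartialOrd for Pair {
    fn partial_cmp(&self, other: &Pair) -> Option<Ordering> {
        let first = self.0.partial_cmp(&other.0);
        match first {
            // First fields tie: fall through to comparing the second field.
            Some(Ordering::Equal) => self.1.partial_cmp(&other.1),
            // `_` is never a constant pattern, so `__cmp` cannot interfere.
            _ => first,
        }
    }
}
```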

<p>I thought I could be clever and try <code class="language-plaintext highlighter-rouge">cmp @ _ =&gt; cmp</code> instead. <code class="language-plaintext highlighter-rouge">match</code> supports syntax where you can
do <code class="language-plaintext highlighter-rouge">foo @ &lt;pattern&gt;</code>, where <code class="language-plaintext highlighter-rouge">foo</code> is bound to the entire matched value. The <code class="language-plaintext highlighter-rouge">cmp</code> here is unambiguously
a binding; it cannot be a pattern. So no conflicting with the <code class="language-plaintext highlighter-rouge">const</code>, problem solved!</p>
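<p>For reference, here is the <code class="language-plaintext highlighter-rouge">@</code> syntax on its own, in a hypothetical <code class="language-plaintext highlighter-rouge">describe</code> function: the name on the left is bound to whatever value the pattern on the right matched.</p>

```rust
// `name @ pattern` binds `name` to the whole value that matched `pattern`.
fn describe(n: u8) -> String {
    match n {
        digit @ 0..=9 => format!("{} is a single digit", digit),
        // The shape considered for the derive fix: `_` matches anything,
        // and `other` is bound to the matched value.
        other @ _ => format!("{} has several digits", other),
    }
}
```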

<p>So I made <a href="https://github.com/rust-lang/rust/pull/49676">a PR for both removing the underscores and also fixing this</a>. The change for <code class="language-plaintext highlighter-rouge">__cmp</code>
is no longer in that PR, but you can find it <a href="https://github.com/Manishearth/rust/commit/partial-cmp-hygiene">here</a>.</p>

<p>Except I hit a problem. With that PR, the following still breaks:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">cmp</span><span class="p">:</span> <span class="nb">u8</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

<span class="nd">#[derive(PartialOrd,</span> <span class="nd">PartialEq)]</span>
<span class="k">pub</span> <span class="k">enum</span> <span class="n">Foo</span> <span class="p">{</span>
    <span class="nf">A</span><span class="p">(</span><span class="nb">u8</span><span class="p">),</span> <span class="nf">B</span><span class="p">(</span><span class="nb">u8</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>throwing a slightly cryptic error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0530]: match bindings cannot shadow constants
 --&gt; test.rs:9:7
  |
4 | pub const cmp: u8 = 1;
  | ---------------------- a constant `cmp` is defined here
...
9 |     B(u8)
  |       ^^^ cannot be named the same as a constant
</code></pre></div></div>

<p>You can see a reduced version of this error in the following code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">cmp</span><span class="p">:</span> <span class="nb">u8</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">match</span> <span class="mi">1</span> <span class="p">{</span>
        <span class="n">cmp</span> <span class="o">@</span> <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">()</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>(<a href="https://play.rust-lang.org/?gist=feebbc048b47c286d5720b9926c6925e&amp;version=stable">playpen</a>)</p>

<p>Huh. Wat. Why? <code class="language-plaintext highlighter-rouge">cmp @ _</code> seems to be pretty unambiguous, what’s wrong with it shadowing a constant?</p>

<p>Turns out bindings cannot shadow constants at all, for a <a href="https://github.com/rust-lang/rust/issues/33118#issuecomment-233962221">rather subtle reason</a>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">A</span><span class="p">:</span> <span class="nb">u8</span> <span class="o">=</span> <span class="o">...</span><span class="p">;</span> <span class="c1">// A_const</span>
<span class="k">let</span> <span class="n">A</span> <span class="o">@</span> <span class="n">_</span> <span class="o">=</span> <span class="o">...</span><span class="p">;</span> <span class="c1">// A_let</span>
<span class="k">match</span> <span class="o">..</span> <span class="p">{</span>
    <span class="n">A</span> <span class="k">=&gt;</span> <span class="o">...</span><span class="p">;</span> <span class="c1">// A_match</span>
<span class="p">}</span>
</code></pre></div></div>

<p>What happens here is that constants and variables occupy the same namespace. So <code class="language-plaintext highlighter-rouge">A_let</code> shadows
<code class="language-plaintext highlighter-rouge">A_const</code> here, and when we attempt to <code class="language-plaintext highlighter-rouge">match</code>, <code class="language-plaintext highlighter-rouge">A_match</code> is resolved to <code class="language-plaintext highlighter-rouge">A_let</code> and rejected (since
you can’t match against a variable), and <code class="language-plaintext highlighter-rouge">A_match</code> falls back to resolving as a fresh binding
pattern, instead of resolving to a pattern that matches against <code class="language-plaintext highlighter-rouge">A_const</code>.</p>

<p>This is kinda weird, so we disallow shadowing constants with variables. This is rarely a problem
because, by convention, variables are lowercase and constants are uppercase. We could <em>technically</em> allow this
language-wise, but it’s hard on the implementation (and irrelevant in practice), so we don’t.</p>
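<p>The resolution rule itself is easy to see in isolation. In this sketch (<code class="language-plaintext highlighter-rouge">LIMIT</code>, <code class="language-plaintext highlighter-rouge">classify</code> and <code class="language-plaintext highlighter-rouge">echo</code> are made-up names), the same kind of bare identifier pattern behaves completely differently depending on whether a constant with that name is in scope:</p>

```rust
const LIMIT: u8 = 10;

// An identifier pattern naming a constant in scope is a constant pattern:
// this arm only matches values equal to LIMIT.
fn classify(x: u8) -> &'static str {
    match x {
        LIMIT => "at the limit",
        _ => "not at the limit",
    }
}

// No constant called `limit` is in scope, so this identifier pattern is a
// fresh binding that matches every value.
fn echo(x: u8) -> u8 {
    match x {
        limit => limit,
    }
}
```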

<hr />

<p>So I dropped that fix. The temporary local variable approach is broken as well, since
you can also give a constant the same name as the local variable and get a clash (so again, you
need the underscores to avoid surprises).</p>

<p>But then I realized that we had an issue with removing the underscores from <code class="language-plaintext highlighter-rouge">__arg_0</code> as well.</p>

<p>The following code is also broken:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">__arg_0</span><span class="p">:</span> <span class="nb">u8</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

<span class="nd">#[derive(Debug)]</span>
<span class="k">struct</span> <span class="nf">Foo</span><span class="p">(</span><span class="nb">u8</span><span class="p">);</span>
</code></pre></div></div>

<p>(<a href="https://play.rust-lang.org/?gist=6e10fd8de1123c6f6f695c891e879f70&amp;version=stable">playpen</a>)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0308]: mismatched types
 --&gt; src/main.rs:3:10
  |
3 | #[derive(Debug)]
  |          ^^^^^ expected mutable reference, found u8
  |
  = note: expected type `&amp;mut std::fmt::Formatter&lt;'_&gt;`
             found type `u8`
</code></pre></div></div>
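<p>To see why the error talks about a <code class="language-plaintext highlighter-rouge">Formatter</code>, here is a rough hand-written approximation of a derived <code class="language-plaintext highlighter-rouge">Debug</code> impl whose formatter parameter is named <code class="language-plaintext highlighter-rouge">__arg_0</code>. The parameter name is inferred from the error above and is an assumption; this is not the compiler’s literal expansion. With a constant named <code class="language-plaintext highlighter-rouge">__arg_0</code> in scope, that parameter pattern would resolve to the constant and fail to type-check.</p>

```rust
use std::fmt;

struct Foo(u8);

// Approximation of a derived `Debug` impl. The formatter parameter is
// named `__arg_0` (assumed, based on the error message above).
impl fmt::Debug for Foo {
    fn fmt(&self, __arg_0: &mut fmt::Formatter<'_>) -> fmt::Result {
        __arg_0.debug_tuple("Foo").field(&self.0).finish()
    }
}
```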

<p>You can see a reduced version of this error in the following code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">__arg_0</span><span class="p">:</span> <span class="nb">u8</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">foo</span><span class="p">(</span><span class="n">__arg_0</span><span class="p">:</span> <span class="nb">bool</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0308]: mismatched types
 --&gt; src/main.rs:3:8
  |
3 | fn foo(__arg_0: bool) {}
  |        ^^^^^^^ expected bool, found u8
</code></pre></div></div>

<p>(<a href="https://play.rust-lang.org/?gist=2cf2c8b3520d5b343de1b76f80ea3fe7&amp;version=stable">playpen</a>)</p>

<p>This breakage is not an issue with the current code because of the double underscores: there’s a
very low chance someone will create a constant that is both lowercase and starts with a double
underscore. But it becomes a problem once I remove the underscores, since that chance shoots up.</p>

<p>Anyway, this failure is even weirder. Why are we attempting to match against the constant in the
first place? <code class="language-plaintext highlighter-rouge">fn</code> argument patterns<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> must be irrefutable, i.e. every possible value of the type must match
the pattern. For example, <code class="language-plaintext highlighter-rouge">fn foo(Some(foo): Option&lt;u8&gt;) {}</code> will fail to compile with
“refutable pattern in function argument: <code class="language-plaintext highlighter-rouge">None</code> not covered”.</p>
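<p>For contrast, both of these hypothetical functions destructure their arguments with irrefutable patterns, which compiles fine; it is only a refutable pattern like <code class="language-plaintext highlighter-rouge">Some(v): Option&lt;u8&gt;</code> that gets rejected.</p>

```rust
struct Point {
    x: i32,
    y: i32,
}

// Function parameters are patterns, so they can destructure, as long as the
// pattern is irrefutable (it matches every value of the parameter's type).
fn sum((a, b): (u8, u8)) -> u16 {
    a as u16 + b as u16
}

fn manhattan(Point { x, y }: Point) -> i32 {
    x.abs() + y.abs()
}
```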

<p>There’s no point trying to match against constants here, because even if we find a constant, it will be rejected
later. Instead, we can unambiguously resolve identifiers as new bindings, yes?</p>

<p>Right?</p>

<p>Firm in my belief, <a href="https://github.com/rust-lang/rust/issues/49680">I filed an issue</a>.</p>

<p>I was wrong, it’s <a href="https://github.com/rust-lang/rust/issues/49680#issuecomment-379029404">not going to always be rejected later</a>. With zero-sized types this
can totally still work:</p>

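<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Current rustc needs these derives to use a constant of type S as a pattern</span>
<span class="nd">#[derive(PartialEq,</span> <span class="nd">Eq)]</span>
<span class="k">struct</span> <span class="n">S</span><span class="p">;</span>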

<span class="k">const</span> <span class="n">C</span><span class="p">:</span> <span class="n">S</span> <span class="o">=</span> <span class="n">S</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">C</span> <span class="o">=</span> <span class="n">S</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Here, because <code class="language-plaintext highlighter-rouge">S</code> has only one possible value, matching against a constant of that type is still irrefutable.</p>

<p>I argued that this doesn’t matter: since the type has a single value, it doesn’t matter whether we resolve to
a new binding or to the constant; the value and semantics are the same.</p>

<p>This is true.</p>

<p>Except.</p>

<p><a href="https://github.com/rust-lang/rust/issues/49680#issuecomment-379032842">Except for when destructors come in</a>.</p>

<p>It was at this point that my table found itself in the perplexing state of being upside-down.</p>

<p>This is still really fine; zero-sized constants with destructors are a pretty rare thing in Rust,
and I don’t really see folks <em>relying</em> on this behavior.</p>
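<p>To make the destructor difference concrete, here is a sketch (all names hypothetical) that logs when a value of a zero-sized type with a <code class="language-plaintext highlighter-rouge">Drop</code> impl is destroyed. When <code class="language-plaintext highlighter-rouge">let C = S;</code> resolves to the constant, nothing is bound and the value is dropped at the end of the statement; with a real binding, it lives until the end of the scope. The <code class="language-plaintext highlighter-rouge">#[derive(PartialEq, Eq)]</code> is there because current rustc requires it for a constant of type <code class="language-plaintext highlighter-rouge">S</code> to be usable as a pattern.</p>

```rust
use std::cell::RefCell;

thread_local! {
    // Records events so the drop order can be inspected afterwards.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

// Returns the recorded events and clears the log.
fn take_log() -> Vec<&'static str> {
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}

#[derive(PartialEq, Eq)]
struct S;

impl Drop for S {
    fn drop(&mut self) {
        log("drop S");
    }
}

const C: S = S;

fn const_pattern() {
    // `C` names a constant, so nothing is bound: the `S` value is only a
    // temporary and is dropped at the end of this statement.
    let C = S;
    log("end of scope");
}

fn fresh_binding() {
    // `_c` is a real binding, so the value lives until the end of the scope.
    let _c = S;
    log("end of scope");
}
```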

<p>However, I later realized that this entire detour was pointless, because even if we fix this, we end up
with a way for bindings to shadow constants. Which … which we already established isn’t allowed by the
compiler until we fix some bugs.</p>

<p>Damn.</p>

<hr />

<p>The <em>actual</em> fix to the macro stuff is to use hygienic generated variable names, which the current
infrastructure supports. I plan to make a PR for this eventually.</p>
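<p>As an analogy only (this is <code class="language-plaintext highlighter-rouge">macro_rules!</code>, not the builtin derive machinery), here is what hygiene buys you for local variables. The macro’s own <code class="language-plaintext highlighter-rouge">tmp</code> and the caller’s <code class="language-plaintext highlighter-rouge">tmp</code> are distinct, so the hypothetical <code class="language-plaintext highlighter-rouge">hygiene_demo</code> returns 101 rather than 200. Note that <code class="language-plaintext highlighter-rouge">macro_rules!</code> hygiene covers local variables and labels, not items such as constants.</p>

```rust
// The `tmp` declared inside the macro lives in the macro's own syntax
// context, so it cannot capture or shadow the caller's `tmp`.
macro_rules! add_to_hundred {
    ($e:expr) => {{
        let tmp = 100;
        tmp + $e
    }};
}

fn hygiene_demo() -> i32 {
    let tmp = 1;
    // `$e` is the caller's `tmp` (1), not the macro's `tmp` (100).
    add_to_hundred!(tmp)
}
```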

<p>But it was a very interesting dive into the nuances of pattern matching in Rust.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Yes, function arguments in Rust are patterns. You can totally do things like <code class="language-plaintext highlighter-rouge">(a, b): (u8, u8)</code> in function arguments (like you can do in <code class="language-plaintext highlighter-rouge">let</code>). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  
</feed>
