Guide to Nova's garbage collector

Published by
Aapo Alasuutari

We recently merged a large and important pull request that added a lifetime to our JavaScript Value type. As a result, the borrow checker is now a lot noisier and up-in-your-business about how you should write code inside the engine, as well as how embedders interface with it.

This is not a trivial system, so this post is intended as a step-by-step tutorial into understanding the garbage collector, how it uses the borrow checker, how it needs to be handled, and what sort of errors does it give you and why. If you're interested in contributing to Nova, this will be very useful reading. If you're only interested in seeing what the fuss is all about, then you might only want to read the first one or two chapters.

The Theory

Nova uses an exact or precise tracing garbage collector. A tracing garbage collector is one where the garbage collector algorithm follows references between items to find new items, and marks the items it has seen. Once it no longer finds new items, the algorithm then releases all unmarked ones. An exact or precise tracing garbage collector is one that is guaranteed to only mark known live items.

The difference between an exact and a conservative garbage collector (which traces all possible live items) is, generally, in that a conservative garbage collector will look for any sequences of bytes that look like a potential reference to a garbage collected item and traces all references found this way: It will never mistakenly free any still-referenced data but it may hold onto data that isn't actually referenced anymore. An exact garbage collector will not search for "potential" references like this, instead it uses some compile-time guarantee to only look for items in statically known places: It will never mistakenly free still-referenced data (assuming that all references are properly located in the statically known places) and it will always free all unreferenced data.

Nova's garbage collector uses lists of "roots" in the Agent and starts tracing from these lists. What this effectively means is that if a JavaScript Value type is being used by code and is eg. saved to a local variable in a function when garbage collection gets triggered, the garbage collector will not see this local variable. This results in a use-after-free.

// Value in a local variable; it refers to an object's data on the heap.
let value: Value<'_> = OrdinaryObject::create_empty_object(agent).into_value();
// Garbage collector runs; it does not see any references to the object.
agent.gc();
// The object has been free'd now, this is use-after-free!
println!(value.str_repr(agent));

Nova uses Rust's borrow checker to guard against this kind of use-after-free issues. Unfortunately, the method of doing this is by necessity somewhat manual and does not quite resemble your run-of-the-mill Rust fight with the borrow checker. This is kung-fu with borrow checker!

GcScope

Garbage collected values in a tracing garbage collector system do not have a "single owner" in the traditional Rust sense: You cannot point to a garbage collected value in memory and say "this value owns its own memory", and much less can you point to a reference between two values and say "the referee owns the referrent". Garbage collected systems allow for multiple referees but none of these references imply ownership over the memory. All the memory is owned by the garbage collector, communally if you will. In Nova's case this means that the Agent owns all value's heap data.

Garbage collected systems also do not generally subscribe to Rust's "exclusive or multiple shared references" paradigm of write access. In JavaScript's case, all data is always up for mutation by anyone who can access it. This means that when we ascribe a lifetime to Value, that lifetime cannot be bound to the borrow on Agent: It is perfectly valid for mutations to happen on the Value's data while we still hold the Value reference, and we are fully allowed to read and write to its data before and after said mutations.

The "validity" of a value is thus not based on possible mutations on it, but instead on garbage collection. A garbage collectable value stays valid to use without limit until garbage collection happens. For this purpose Nova has the GcScope zero-sized type (ZST) that gets passed through call graphs where garbage collection may happen, and its sibling NoGcScope which is passed into call graphs where garbage collection cannot happen. The GcScope is needed to trigger garbage collection, and it cannot be cloned or copied but it can be "reborrowed" to pass a temporary copy down into deeper call graphs: While this temporary copy lives, the original is "inactive" and cannot be used.

NoGcScope is likewise created from GcScope through reborrowing, but it does not stop the original from being used to create other NoGcScopes. In Rust parlance, to create a temporary GcScope copy the original must be borrowed exclusively, while NoGcScope can be created through a shared borrow.

// Note: Lifetime generics omitted for simplicity.
impl GcScope {
    fn reborrow(&mut self) -> GcScope;
    fn nogc(&self) -> NoGcScope;
}

All values bind their lifetime to the GcScope through the use of a NoGcScope. Because a GcScope is required to trigger garbage collection and a GcScope can only be passed to a function call either by value or by using the reborrow function which requires exclusive access to self, it means that any time that a GcScope gets passed passed to a function call it invalidates all shared borrows on it, which invalidates all NoGcScopes, and that then invalidates all values that have bound their lifetime to the GcScope through the use of a NoGcScope.

With this, we have a system where all garbage collected values bound to the GcScope are automatically invalidated when garbage collection may be triggered (because we call a function that may trigger garbage collection). Unfortunately, this is not the end: We haven't yet defined what "bound values" are, and that is an important and problematic thing to define.

Bindable values

Bindable values in Nova are handles to heap data that carry a Rust lifetime. The most common bindable value type is Value, but many others exist besides. (Some examples would be Number, String [not the standard library one], Symbol, and Object.) Normally, in Rust you see lifetimes mostly in references. For reasons explained above, Nova cannot use the Agent reference to derive the lifetime for bindable values. Nova also cannot quite use normal Rust lifetime rules directly due to Rust not allowing aliasing. For this reason, bindable values implement the following trait:

unsafe trait Bindable {
    type Of<'a>;

    fn unbind(self) -> Self::Of<'static>;
    fn bind<'a>(self, gc: NoGcScope<'a, '_>) -> Self::Of<'a>;
}

The Of type is expected to (and once possible to check, be required to) be equal to the Self type. For example, this is how Value's implementation begins:

unsafe impl Bindable for Value<'_> {
    type Of<'a> = Value<'a>;
}

What this says, effectively, is that for any Value, regardless of what lifetime it has, it has a function unbind that takes self (by value, not by reference) and returns a Value<'static>, and a bind function that again takes takes self and a NoGcScope<'a, '_> and returns a Value<'a>. You can perhaps see where this is going: "Bound values" mean values that carry the 'a lifetime from a NoGcScope<'a, '_>, either from having had bind(gc.nogc()) called on them, or from being returned from a function that takes a NoGcScope<'a, '_> and returns the value with the 'a lifetime.

It is also possible to get "exclusively bound values"; these are created when a bound value is returned with the 'a lifetime from a function that takes GcScope<'a, '_>, ie. when a bound value is returned from a function that can trigger garbage collection. Due to certain details of Rust's borrow checker rules related to internal mutation, the returned value will keep the GcScope from being reused. We'll see how to deal with this situation later.

The Practice

Now that we have seen the GcScope, NoGcScope, and bound values, we are ready to start using them in practice. First, let's do a quick review of these three principal actors:

  1. GcScope<'a, '_>: A zero-sized type that is passed to functions that can trigger garbage collection. Only one such type is ever active at a time, and no NoGcScopes can be active at the same time.
  2. NoGcScope<'a, '_>: A zero-sized type that is passed to functions that cannot trigger garbage collection. Any number of such types can be active at the same time, but no GcScope can be active at the same time.
  3. Bound values: Handles to garbage collected values that have been bound to a NoGcScope or GcScope, implicitly or explicitly.

With this, we're ready to start our praxis.

Returning bound values from NoGcScope functions

The simplest function that deals with bound values we can imagine in the engine is one that cannot trigger garbage collection, and returns a bound value. An example of this would be Value::from_string which takes a Rust std::string::String by value, moves it to the Agent heap, and returns a Value handle to it.

pub fn from_string<'a>(
    agent: &mut Agent,
    string: std::string::String,
    gc: NoGcScope<'a, '_>
) -> Value<'a>;

There is nothing much to say about these sorts of functions: The only thing of note is that it is important for the 'a lifetime to be defined and used in NoGcScope<'a, '_>. The following would be incorrect:

// *Incorrect* usage!
pub fn from_string_wrong(
    agent: &mut Agent,
    string: std::string::String,
    gc: NoGcScope
) -> Value; // Wrong! Lifetime not bound to NoGcScope!

With this sort of definition, the returned Value would be bound to the implied 'a lifetime of &'a mut Agent, not to the NoGcScope lifetime. This would not be a "bound value". This would also block the entire Agent from being used while the returned Value still exists. Even if the blocking were to be fixed (by making the function take &Agent instead), it would still mean that the handle would be invalidated if the Agent is mutated. This isn't correct: We want the Value to be invalidated when garbage collection may be triggered, which can happen entirely unconnected from heap mutation.

Returning bound values from GcScope functions

Currently, the entire Nova project contains no functions that can trigger garbage collection, return bound values, and take no parameters. Such a function would look like this:

pub fn silly_example<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a>;

Again, it is important for the 'a lifetime to be defined and used in GcScope<'a, '_>. It is also worth it to remember that the returned Value<'a> will keep all GcScope inactive while it still exists. Again, we'll come back to this issue soon, but first let's take a look at a simpler case.

Accepting returned bound values from NoGcScope functions

Now that we've learned what functions returning bound values look like on the outside, let's try calling one in a function body. Since we're starting with a function that takes NoGcScope, this will be easy:

fn example(
    agent: &mut Agent,
    gc: GcScope
) {
    let string = Value::from_string(agent, "foo".into(), gc.nogc());
    let string2 = Value::from_string(agent, "bar".into(), gc.nogc());
    println!("{:?} {:?}", string, string2);
    // ... presumably more work here ...
}

There is nothing particularly odd here, except maybe for the gc.nogc() calls. Our example function takes no bound value parameters (since we don't know how to do that yet), and merely allocates two strings onto the heap, keeping handles to them, before printing the two handles.

The gc.nogc() calls are needed to create a NoGcScope from our GcScope while still keeping it accessible for later calls. The exact lifetime tricks that happen when gc.nogc() is called and the Value is returned bound to it could be annotated thusly:

let gc: GcScope<'gc, 'scope>;
let gc_ref: &'short GcScope<'gc, 'scope> = &gc;
let nogc: NoGcScope<'short, 'scope>;
let string: Value<'short>;

The resulting NoGcScope<'short, 'scope> thus says that it keeps the gc_ref borrow alive, and that borrow observes the GcScope. If someone were to take exclusive access to GcScope, then the gc_ref borrow would be forced to end, invalidating the NoGcScope. Likewise, the returned string bound values are bound to the 'short lifetime and they are invalidated if exclusive access to GcScope is taken.

Accepting returned bound values from GcScope functions

It is time to look at the issue I've been mentioning above. Let us call our silly_example from above twice and try to print the two results:

let first = silly_example(agent, gc.reborrow());
let second = silly_example(agent, gc.reborrow());
println!("{:?} {:?}", first, second);

This will not compile. The error will say that gc is borrowed as exclusive at the first gc.reborrow() call site, is then again borrowed as exclusive at the second gc.reborrow() call site, and the first exclusive borrow is later reused on at the print line. This is our first useful error from out bound values:

Both the calls to silly_example can trigger garbage collection. If the second call does trigger garbage collection, then the first Value will be use-after-free. The compile error here is absolutely on point: This is an error.

But okay, what if we were calling Value::from_string instead?

let first = silly_example(agent, gc.reborrow());
let second = Value::from_string(agent, "foo".into(), gc.nogc());
println!("{:?} {:?}", first, second);

This still will not compile! The error message says that gc.reborrow() borrows gc as exclusive, and gc.nogc() then borrows it as shared but the exclusive borrow is then reused when first is printed. From a garbage collector perspective this makes no sense: The first Value returned from silly_example will not be invalidated by a Value::from_string call, so why is this happening?

The reason is buried deep into Rust's internal mutation types, and we'll skip why it is so: We'll just accept that it needs to be this way for a good reason and we'll live with it. But, it is a problem for us: Putting this into problem into garbage collector terms, what Rust here is telling us is that while first lives, garbage collection is still happening and trying to access the GcScope at the same time would be equivalent to allowing two garbage collections to happen at the same time. That sounds dangerous and it makes sense we're stopped from doing that, but we know that wouldn't be the case: Garbage collection has potentially started and finished inside the silly_example call if it did.

A Value is just a handle, an integer of some kind that the Agent can use to find the associated heap data, and importantly the Agent does not trust a Value to contain correct data: All heap data access is always checked. Hence, we can extract the integer data from a Value and wrap it into a new Value with a different lifetime without sacrificing memory safety. Now, the perfect thing would be if we could simply call bind(gc.nogc()) on our first, but this will not compile because first keeps gc inactive. What we need to do is add an unbind call first:

let first = silly_example(agent, gc.reborrow())
    .unbind()
    .bind(gc.nogc());

This will now compile, despite looking a little ugly. (Maybe more than a little.) The unbind call releases the gc from the gc.reborrow()'s temporary exclusive borrow, after which we can immediately call bind(gc.nogc()) on it to bind it back to the GcScope but this time with a shared borrow backing it.

This is our first problem and solution: When returning a strongly bound value, GcScope will remain inactive. To fix this, chain .unbind().bind(gc.nogc()) to the call. Note: There is no runtime cost for doing these calls, though there likely is a small compile time cost.

Passing bound values to NoGcScope functions

This is going to be easy: Passing bound values as parameters to NoGcScope functions works just like you would expect from normal Rust:

let first = Value::from_string(agent, "3".into(), gc.nogc());
let result = to_number_primitive(agent, first, gc.nogc());

Because the first value isn't invalidated by the gc.nogc() call, passing it into the call is perfectly okay from Rust's perspective.

Passing bound values to GcScope functions

This is not going to be quite so easy, but you probably guessed that already. When gc.reborrow() is called, all bound values are invalidated immediately and trying to use them afterwards becomes an error:

let first = silly_example(agent, gc.reborrow()).unbind().bind(gc.nogc());
let child_gc = gc.reborrow(); // <-- Conceptually, Rust thinks garbage collection happens here.
let result = to_number(agent, first, child_gc); // Error: `first` is now use-after-free!

Even if we call the gc.reborrow() "within" the to_number call, the first value will become invalidated "after-the-fact" and the call itself now becomes invalid:

let first = silly_example(agent, gc.reborrow()).unbind().bind(gc.nogc());
let result = to_number(agent, first, gc.reborrow()); // Error! Trying to pass exclusive and shared reference together.

This is again an aliasing error and the borrow checker will not stand for this. So, what do we do? Conceptually, we again know that calling gc.reborrow() doesn't trigger garbage collection but something inside to_number may do so. Hence, passing first as a parameter is perfectly legal here, especially since it (again) does not put memory safety at risk. So, we can use the unbind method to release first from the GcScope borrow before it is passed to the method:

let first = silly_example(agent, gc.reborrow()).unbind().bind(gc.nogc());
let result = to_number(agent, first.unbind(), gc.reborrow()); // No error.

This is our second problem and solution: When calling a method that takes GcScope, we cannot pass bound values into that same call. To fix this, we must call .unbind() on the parameter bound values. To be safe, this should only be done at the call site and not before. The following would be incorrect:

// *Incorrect* usage!
let first = silly_example(agent, gc.reborrow()).unbind();
let result = to_number(agent, first, gc.reborrow()); // No error.
let other_result = to_number(agent, first, gc.reborrow()); // Uh oh, no error! `first` is now use-after-free!

This is also the reason why GcScope is always the last argument.

Taking bound values as parameters in NoGcScope functions

There is nothing particularly complicated here, again. These functions work in all ways very much like normal Rust functions:

fn my_method(
    agent: &mut Agent,
    value: Value,
    gc: NoGcScope
) {
    // ... do your thing ...
}

Because no garbage collection can happen within this function scope, the value is guaranteed to stay valid until the end of the call.

Taking bound values as parameters in GcScope functions

Once again, functions that may trigger garbage collection are the problem child. When we receive a parameter value: Value in a call, by normal Rust lifetime rules the Value<'_>'s contained lifetime is guaranteed to be valid until the end of this call. In our case, we do not want that to be the case; the lifetime should be bound to the GcScope but there is no way to make Rust perform this binding automatically. We must thus perform it manually: You can think of this as the mirror of the .unbind() call we needed to perform when calling a GcScope function earlier.

fn my_method(
    agent: &mut Agent,
    value: Value,
    gc: GcScope
) {
    let value = value.bind(gc.nogc());
    // ... do your thing ...
}

All bindable values should be bound in this way at the beginning of every function that takes GcScope: This is the most important thing in Nova's garbage collector by far. If this rule is not upheld, then getting use-after-free in the engine is trivial or even guaranteed. If this rule is upheld, then getting any further use-after-free becomes nearly impossible.

Luckily, this is fairly easy conceptually: You need only to bind your parameters and you're good to go. Unfortunately, we know that "just doing the right thing" is not quite that easy (see null pointer checks). For that reason, we plan on implementing a custom lint to check this at build time.

Holding bindable values across GcScope function calls

One final, important thing is left to discuss: We've seen how bindable values are formed and passed around, and how they become invalidated whenever a garbage collector triggering function call is made. This is great because it makes sure that we don't use these values after they may have been moved or free'd. But what if we need to use a bound value after a function call? For example, what do we do in this sort of situation:

fn example(
    agent: &mut Agent,
    arg0: Value,
    arg1: Value,
    gc: GcScope
) {
    let arg0 = arg0.bind(gc.nogc());
    let arg1 = arg1.bind(gc.nogc());

    let start_index = to_length_or_infinity(agent, arg0.unbind(), gc.reborrow())?;
    let end_index = to_length_or_infinity(agent, arg1.unbind(), gc.reborrow())?; // Error: arg1 is use-after-free
}

This will not compile, and it is a perfectly valid error: if garbage collection is triggered by the first to_length_or_infinity call, then arg1's data would be moved or removed. Continuing to use it would be well and truly mistaken.

But, we cannot just not use arg1: we need both start_index and end_index so we need to somehow make sure that arg1 can be used and stays valid across the first to_length_or_infinity call. The answer to how to do that is, emphatically, not let arg1 = arg1.unbind();. That would be a big, big mistake as it just makes the code compile but would not work correctly.

What we need to do, instead, is to "scope" or "root" arg1. (The terms are interchangeable in this context.) What this does is to write the arg1 Value onto the Agent heap, into a location that Agent guarantees will not be changed by the garbage collector, and returns a handle to it. This is a second level handle, if you will: Our original Value is a handle to some data on the heap, and rooting it moves that onto the heap and returns a handle. This second level handle has a different type, Scoped and is defined automatically for all bindable, rootable values.

trait Scopable: Rootable + Bindable
where
    for<'a> Self::Of<'a>: Rootable + Bindable,
{
    fn scope<'scope>(
        self,
        agent: &mut Agent,
        gc: NoGcScope<'_, 'scope>,
    ) -> Scoped<'scope, Self::Of<'static>> {
        Scoped::new(agent, self.unbind(), gc)
    }
}

For our Value, this would simplify to the following scope function implementation:

impl Value<'_> {
    fn scope<'scope>(
        self,
        agent: &mut Agent,
        gc: NoGcScope<'_, 'scope>,
    ) -> Scoped<'scope, Value<'static>> {
        Scoped::new(agent, self.unbind(), gc)
    }
}

Note that we're now using the second lifetime of the NoGcScope, this is the "scope" lifetime which works like your usual Rust lifetimes: a type carrying the scope lifetime is guaranteed to be valid for the whole function call. From a validity standpoint, this is because the Agent removes these scope-rooted handles from the heap only at outside of user-controlled code. Garbage collection does not remove them, so they do not need to be bound to the garbage collector lifetime.

Now that we know of scoping, this is how we would use it:

fn example(
    agent: &mut Agent,
    arg0: Value,
    arg1: Value,
    gc: GcScope
) {
    let arg0 = arg0.bind(gc.nogc());
    // We scope the argument value. The resulting scoped value is guaranteed to
    // be valid until the end of this call.
    // Note that we could bind and then scope the value, but that would not
    // make any difference to the resulting code or its correctness.
    let arg1: Scoped<'_, Value<'static>> = arg1.scope(agent, gc.nogc());

    let start_index = to_length_or_infinity(agent, arg0.unbind(), gc.reborrow())?;
    // To get our bindable value back out form a scoped value, we can use a get
    // method on it. This returns a `Value<'static>`, which we can leave
    // unbound because we're passing it as a parameter immediately.
    let end_index = to_length_or_infinity(agent, arg1.get(agent), gc.reborrow())?;
}

With this, we've successfully held arg1 across a GcScope function call, ensuring that its data will not be removed by the garbage collector (it will still be moved, but on the heap the scoped value's data will be updated to point to the new location).

The Mastery

Now we know how to work with bindable values at function interfaces, and we know how to create scoped values and use them to keep bindable values from being invalidated by the garbage collector when we need them to stay. All that remains is the all important question of "how do I actually use these things?!" True mastery can only come from practice, from trying and failing, and trying again until the tools break in your hands. I've never known how to create masters, but I've seen the tools break in my hands a few times by now, so I can hopefully give you a few tricks.

These will be in no particular order, snippets of code that pose challenges and the answers therein. Koans, if you will.

Short-circuit returning bound values from GcScope functions

If your function calls a function and immediately returns the result, you'll find that using gc.reborrow() or gc.nogc() will not work.

fn my_method<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    to_number(agent, some_value, gc.reborrow()) // Error: Returning value bound to local variable.
}

fn my_method_2<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    Value::from_string(agent, "foo".into(), gc.nogc()) // Error: Returning value bound to local variable.
}

The reason for this is that our gc.reborrow() and gc.nogc() are not "true reborrows", their contained 'a lifetime is always guaranteed to be shorter than the source GcScope<'a, '_> is, but our return type requires the lifetime to be equal to 'a. (This isn't exactly the reason, actually, but it's technically the same thing. The real reason is the temporary borrow. Whatever. Deal with it.)

There are two ways to fix this. The first, preferred one, is to consume the GcScope variable to get an equal 'a lifetime. This can be done by passing the gc directly (for GcScope functions) or by using the gc.into_nogc() method:

fn my_method<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    to_number(agent, some_value, gc) // No error
}

fn my_method_2<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    Value::from_string(agent, "foo".into(), gc.into_nogc()) // No error
}

Note that both of these actions invalidate all bound values; for the gc.into_nogc() method this is quite unfortunate as we actually know that it signifies the end to all possibility of garbage collector triggering in this call, and that henceforth all bound values are strictly guaranteed to stay valid until the end of the call. If you are passing bound values into the call together with the result of gc.into_nogc() then you can use .unbind() on the parameters at the call site the same way as you'd normally use for a gc.reborrow() call.

The second way to handle this issue is by simply calling .unbind() at the return site: This is likewise perfectly fine, and perfectly safe. Especially if you are passing bound values into a NoGcScope call and returning the result then this may sometimes be the cleaner option:

fn my_method<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    to_number(agent, some_value, gc.reborrow())
        .unbind() // No error
}

fn my_method_2<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    Value::from_string(agent, "foo".into(), gc.nogc())
        .unbind() // No error
}

The question is: How to ensure that a bound value lives long enough to be returned? The answer is, by consuming the GcScope. Or when under duress, by unbinding the value at the site of return.

Splitting a function's tail or branch into a NoGcScope scope

Oftentimes, functions will call into GcScope functions at the start but later move to operating purely with NoGcScope functions. In this case you might want to use gc.into_nogc() after the last gc.reborrow() call to ensure that the rest of the work does indeed happen without any chance of garbage collection. Unfortunately, when calling gc.into_nogc() any existing bound values get invalidated and cannot be used in the "tail" of the function.

// Some bound string values we've in the scope after parameter massaging.
let s: String<'_>;
let fill_string: String<'_>;

// NoGcScope tail of the function starts.
let gc = gc.into_nogc();

do_work(agent, s, fill_string, gc); // Error: s and fill_string are bound to GcScope which was consumed.

Here it is, as the only exception to the rule, okay to temporarily unbind bound values to a local variable and rebind them to pass them into the tail:

// NoGcScope tail of the function starts.
let (gc, s, fill_string) = {
    let s = s.unbind();
    let fill_string = fill_string.unbind();
    let gc = gc.into_nogc();
    (gc, s.bind(gc), fill_string.bind(gc))
};

Or alternatively:

// NoGcScope tail of the function starts.
let s = s.unbind();
let fill_string = fill_string.unbind();
let gc = gc.into_nogc();
let s = s.bind(gc);
let fill_string = fill_string.bind(gc);

Note that it is important to make sure that the new bound variable names (created after gc.into_nogc()) must shadow the unbound variable names for safety.

Avoiding scoping on fast paths

Scoping of variables is purposefully made to be cheap and in general it shouldn't be something you need to particularly worry about in your code. As an example, stack-only data is never put into the heap during scoping but is kept unchanged on the stack. Still, it is extra work that the engine must do and in many cases it is purely unnecessary.

For instance, a builtin method that takes a string and two number parameters will usually be defined as calling ToString and ToNumber methods on the parameters. These can methods call arbitrary JavaScript through a toValue function defined on an object passed in as a parameter, and hence they can trigger garbage collection. The builtin method must be prepared for this possibility, so scoping would normally be necessary.

fn builtin_method<'a>(
    agent: &mut Agent,
    _this_value: Value,
    arguments: ArgumentsList,
    gc: GcScope<'a, '_>
) -> JsResult<Value<'a>> {
    let a = arguments.get(0).bind(gc.nogc());
    let b = arguments.get(1).scope(agent, gc.nogc());
    let c = arguments.get(2).scope(agent, gc.nogc());
    let a = to_string(agent, a.unbind(), gc.reborrow())?.unbind().scope(agent, gc.nogc());
    let b = to_number(agent, b.get(agent), gc.reborrow())?.unbind().scope(agent, gc.nogc());
    let c = to_number(agent, c.get(agent), gc.reborrow())?.unbind().bind(gc.nogc());
    let a = a.get(agent).bind(gc.nogc());
    let b = b.get(agent).bind(gc.nogc());
}

At the end here we have bound values a, b, and c on the stack and everything is fine. There are two unfortunate parts here though: First, we've called scope four times to scope three values and second, the usual case is going to be that these parameters were already of the correct type. In that case these methods are guaranteed to not call into JavaScript, and are actually guaranteed to return the values as-is. None of the methods are likely to have any chance of triggering garbage collection.

We have two main ways to avoid the mostly-unnecessary scoping:

  1. Conditional fast paths for correctly typed calls.
  2. Try method variants.

Additionally, the Scoped type has some unsafe methods that can be used to cut down on the amount of work that scoping needs to perform.

Conditional fast paths

A conditional fast path is one where the code checks its preferred types ahead of time, and avoids all potentially garbage collection triggering method calls if the types match. The simplest and most targeted fast path for the above example method would be as follows:

let a = arguments.get(0).bind(gc.nogc());
let b = arguments.get(1).bind(gc.nogc());
let c = arguments.get(2).bind(gc.nogc());
if let (Ok(a), Ok(b), Ok(c)) = (String::try_from(a), Number::try_from(b), Number::try_from(c)) {
    // Fast path without conversions, scoping.
} else {
    // Slow path.
}

This will only work for the exactly correct call of the method, so it is maximally strict and (probably) maximally optimal. A more permissive but somewhat less optimal fast path can, in this case, be created based on the knowledge that primitive values will never cause a call into JavaScript in the ToString and ToNumber methods. Nova includes special variants of these methods for calling with primitive values:

let a = arguments.get(0).bind(gc.nogc());
let b = arguments.get(1).bind(gc.nogc());
let c = arguments.get(2).bind(gc.nogc());
if let (Ok(a), Ok(b), Ok(c)) = (Primitive::try_from(a), Primitive::try_from(b), Primitive::try_from(c)) {
    // Fast path without scoping.
    let a = to_string_primitive(agent, a, gc.nogc());
    let b = to_number_primitive(agent, b, gc.nogc());
    let c = to_number_primitive(agent, c, gc.nogc());
} else {
    // Slow path.
}

This will allow a wider range of ways to call the method to enter the fast path but at the cost of still having to call conversion methods. There is no hard and fast rule for which one should be preferred: This will likely be an ongoing matter of personal preference and performance measuring.

Try method variants

Similarly to the primitive conversion variants, Nova also includes "try variants" of various methods. These methods return a TryResult (which is just an alias of ControlFlow) which tells if the method finished successfully and the execution of the caller can continue (TryResult::Continue is then the returned variant), or if the method encountered a need to call into JavaScript and thus had to break out early (in which case TryResult::Break is returned).

These methods can be used to avoid splitting a method into a slow and fast path, at the cost of splitting smaller parts of the method into try and fallback paths. Doing so generally requires quite a bit of dancing around with mutable local variables, and adding mutable local optional scoped values to the method:

let a = arguments.get(0).bind(gc.nogc());
let b = arguments.get(1).bind(gc.nogc());
let c = arguments.get(2).bind(gc.nogc());
if let (
    TryResult::Continue(a),
    TryResult::Continue(b),
    TryResult::Continue(c)
) = (
    try_to_string(agent, a, gc.nogc()),
    try_to_number(agent, b, gc.nogc()),
    try_to_number(agent, c, gc.nogc())
) {
    // Try succeeded, continue on the fast path.
} else {
    // Slow path.
}

Avoiding excessive scoping

When a slow path is encountered, and we need to perform scoping then the best case scenario is that we only need the scoped value for that singular slow path and never afterwards. This is often not the case, however: A value will need to be conditionally scoped to pass through multiple slow paths. In this case, we'd do not want to re-scope a value over and over again. For example, this kind of code would make no sense:

let value: Value;
let value = value.scope(agent, gc.nogc());
gc.reborrow(); // Some GcScope calls.
let value = value.get(agent).bind(gc.nogc());
gc.nogc(); // Some NoGc calls.
let value = value.scope(agent, gc.nogc());
gc.reborrow(); // Some GcScope calls.
let value = value.get(agent).bind(gc.nogc());

This would scope the same value twice, which means that the value is stored on the heap in duplicate. Luckily we can avoid this.

When in doubt, scope

The simplest choice is to simply scope once and keep calling .get(agent) on the scoped value to get a bindable value for use as a parameter and so on. You may want to name the scoped value as scoped_value, and name the bound value as value. This way you can use the bound value as a parameter for a GcScope call while also scoping it.

let value: Value;
let scoped_value = value.scope(agent, gc.nogc());
some_call(agent, value.unbind(), gc.reborrow());
other_call(agent, scoped_value.get(agent), gc.reborrow());
let value = scoped_value.get(agent).bind(gc.nogc());

This is a perfectly valid way to handle the question of "how do I deal with Value invalidation on gc.reborrow()". It may not be the absolute peak of optimal performance, but it will get you to where you need to be the fastest.

Optional scoped value

When using conditional fast paths, you probably want to avoid repeatedly scoping the same values on each slow path. (It is also perfectly valid that slow paths are allowed to be slow and you don't need to worry about re-scoping values there. I'll allow it.) The way you can do this is to use an Optional<Scoped<T>>.

let mut value: Value;
let mut scoped_value = None;
let result = if let TryResult::Continue(result) = try_some_call(agent, value, gc.nogc()) {
    result?
} else {
    scoped_value = Some(value.scope(agent, gc.nogc()));
    let result = some_call(agent, value.unbind(), gc.reborrow())?;
    value = scoped_value.as_ref().unwrap().get(agent).bind(gc.nogc());
    result
};

// other work ...
let other_value: Value;
let result = if let TryResult::Continue(result) = try_other_call(agent, other_value, gc.nogc()) {
    result?
} else {
    if scoped_value.is_none() {
        scoped_value = Some(value.scope(agent, gc.nogc()));
    }
    let result = some_call(agent, other_value.unbind(), gc.reborrow())?;
    value = scoped_value.as_ref().unwrap().get(agent).bind(gc.nogc());
    result
};

This is pretty ugly, but it does the job: scoped_value remains None for the entire duration of the call if we always pass through fast paths and only costs us the stack space. On slow paths we scope our needed value if a previous slow path didn't already do so, and re-read value from the heap afterwards.

Changing values

In loops especially, it may happen that a particular value needs to be scoped for a short period of time and then is no longer needed. In loops specifically, another value will then naturally appear on the next loop iteration and needs to again be scoped for the same short period of time.

In these cases, you'd want to avoid allocating new value data onto the heap in each loop iteration. For this purpose, Scoped has the unsafe fn replace function. To use this function, you'll want to prepare a Scoped outside the loop and call the replace function on it when it is time to scope you value within the loop. Preparing the Scoped value outside the loop may be done normally using value.scope(agent) but for Scoped<Value> specifically, you can also use Value::Undefined.scope_static(). This prepares only the stack data for the Scoped and enables calling the replace function later.

let scoped_value = Value::Undefined.scope_static();
loop {
    let v: Value;
    // SAFETY: scoped_value is never shared.
    unsafe { scoped_value.replace(agent, v.unbind()) };
    gc.reborrow();
    let value = scoped_value.get(agent).bind(gc.nogc());
}

The replace function is marked unsafe not because it can cause undefined behaviour (it currently cannot) but because it can break the JavaScript virtual machine's internal logic. If a scoped value is cloned and then passed as a parameter to a call that calls replace on the clone, then get on the original will return a (JavaScript-wise) different value than was originally scoped. Thus, calling the function on any Scoped that was received as a parameter should generally be avoided unless you're very sure that it is not a clone, or that no other caller will call get on the value anymore.

Note: The API is not guaranteed to never perform new heap allocations for scoping. If the API is called on the same scoped value repeatedly, with every other call scoping a stack-only value (like a small integer or string) and every other call scoping a heap value (like an object), then the stack-only value calls will forget about the scoped heap allocation without explicitly releasing it, causing the heap value calls to need to allocate a new one.

Changing types

Sometimes a value needs to be scoped but is then later checked to be of a different type or gets converted to a different type with the original value never used afterwards, and needs to be scoped again. To keep the change, you'd want to reuse the previous scoped value but it's of a different type and won't work.

In these cases, you can use the unsafe fn replace_self to change both the value and the type of what is being stored at the same time. The safety condition for this function is the same as unsafe fn replace above: The scoped value should not be a clone passed in as a parameter or if it is, you must be very sure that it's not going to be used by other callers. Or, you must be very sure that the actual value that is being stored by replace_self isn't actually functionally different. For example, converting to another bindable type via TryFrom kind of type conversions does not change the value but only narrows the type through a check. In these cases, calling replace_self on a cloned Scoped does not pose any threat to the JavaScript virtual machine's internal logic.

let value: Value;
let scoped_value = value.scope(agent, gc.nogc());
gc.reborrow();
let Ok(object) = Object::try_from(scoped_value.get(agent).bind(gc.nogc())) else {
    // Check failed.
    return;
}
// SAFETY: Value isn't being changed, only type gets narrowed. Even clone is fine.
let scoped_object = unsafe { scoped_value.clone().replace_self(agent, object.unbind()) };

You can of course also use this API to change entirely unrelated scoped values to others, but in that case the safety requirements need to be fulfilled.

let value: Value;
let scoped_value = value.scope(agent, gc.nogc());
gc.reborrow();
let object: Object; // Not connected with value.
// SAFETY: scoped_value is never cloned and not received as parameter, ie. not
// shared. We can take over its scoped slot without anyone seeing it.
let scoped_object = unsafe { scoped_value.replace_self(agent, object.unbind()) };

The End

We're at the end, for now. But this is a guide and a guide is no good if it doesn't get updated. So, hopefully more will be added later. So long.