Guide to Nova's garbage collector
We recently merged a large and important pull request that added a lifetime to
our JavaScript Value type. As a result, the borrow checker is now a lot noisier
and up-in-your-business about how you should write code inside the engine, as
well as how embedders interface with it.
This is not a trivial system, so this post is intended as a step-by-step tutorial on understanding the garbage collector: how it uses the borrow checker, how it needs to be handled, and what sort of errors it gives you and why. If you're interested in contributing to Nova, this will be very useful reading. If you're only interested in seeing what the fuss is all about, then you might only want to read the first one or two chapters.
The Theory
Nova uses an exact or precise tracing garbage collector. A tracing garbage collector is one where the garbage collector algorithm follows references between items to find new items, and marks the items it has seen. Once it no longer finds new items, the algorithm then releases all unmarked ones. An exact or precise tracing garbage collector is one that is guaranteed to only mark known live items.
The difference between an exact and a conservative garbage collector (which traces all possible live items) is, generally, that a conservative garbage collector looks for any sequence of bytes that looks like a potential reference to a garbage collected item and traces all references found this way: It will never mistakenly free any still-referenced data but it may hold onto data that isn't actually referenced anymore. An exact garbage collector does not search for "potential" references like this; instead it relies on some compile-time guarantee to only look for items in statically known places: It will never mistakenly free still-referenced data (assuming that all references are properly located in the statically known places) and it will always free all unreferenced data.
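The mark and release steps described above can be sketched with a toy heap. This is only an illustration under stated assumptions, not Nova's implementation: ToyHeap, alloc, add_root, is_live and collect are hypothetical names, objects are identified by plain indices, and the root list stands in for the "statically known places" an exact collector starts from.

```rust
/// A toy heap: each slot holds an object's outgoing references, or `None`
/// once the object has been freed. (Hypothetical type, for illustration.)
pub struct ToyHeap {
    objects: Vec<Option<Vec<usize>>>,
    roots: Vec<usize>,
}

impl ToyHeap {
    pub fn new() -> Self {
        ToyHeap { objects: Vec::new(), roots: Vec::new() }
    }

    /// Allocates an object that references the given objects; returns its index.
    pub fn alloc(&mut self, refs: Vec<usize>) -> usize {
        self.objects.push(Some(refs));
        self.objects.len() - 1
    }

    /// Registers an object in the statically known root list.
    pub fn add_root(&mut self, index: usize) {
        self.roots.push(index);
    }

    pub fn is_live(&self, index: usize) -> bool {
        self.objects[index].is_some()
    }

    /// Mark: follow references starting from the roots.
    /// Sweep: free every object that was not marked.
    /// Returns the number of freed objects.
    pub fn collect(&mut self) -> usize {
        let mut marked = vec![false; self.objects.len()];
        let mut worklist = self.roots.clone();
        while let Some(index) = worklist.pop() {
            if marked[index] {
                continue;
            }
            marked[index] = true;
            if let Some(refs) = &self.objects[index] {
                worklist.extend(refs.iter().copied());
            }
        }
        let mut freed = 0;
        for (index, slot) in self.objects.iter_mut().enumerate() {
            if !marked[index] && slot.is_some() {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

An object that is only known through an index held in a local variable is not in the root list, so this collector frees it even though the caller still holds the index.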
Nova's garbage collector uses lists of "roots" in the Agent and starts tracing
from these lists. What this effectively means is that if a JavaScript Value is
being used by code and is, e.g., saved to a local variable in a function when
garbage collection gets triggered, the garbage collector will not see this local
variable. This results in a use-after-free.
// Value in a local variable; it refers to an object's data on the heap.
let value: Value<'_> = OrdinaryObject::create_empty_object(agent).into_value();
// Garbage collector runs; it does not see any references to the object.
agent.gc();
// The object has been free'd now, this is use-after-free!
println!("{}", value.str_repr(agent));
Nova uses Rust's borrow checker to guard against this kind of use-after-free issue. Unfortunately, the method of doing this is by necessity somewhat manual and does not quite resemble your run-of-the-mill Rust fight with the borrow checker. This is kung-fu with the borrow checker!
GcScope
Garbage collected values in a tracing garbage collector system do not have a
"single owner" in the traditional Rust sense: You cannot point to a garbage
collected value in memory and say "this value owns its own memory", and much
less can you point to a reference between two values and say "the referrer owns
the referent". Garbage collected systems allow for multiple referrers but none
of these references imply ownership over the memory. All the memory is owned by
the garbage collector, communally if you will. In Nova's case this means that
the Agent owns all values' heap data.
Garbage collected systems also do not generally subscribe to Rust's "exclusive
or multiple shared references" paradigm of write access. In JavaScript's case,
all data is always up for mutation by anyone who can access it. This means that
when we ascribe a lifetime to Value, that lifetime cannot be bound to the borrow
on Agent: It is perfectly valid for mutations to happen on the Value's data
while we still hold the Value reference, and we are fully allowed to read and
write to its data before and after said mutations.
The "validity" of a value is thus not based on possible mutations on it, but
instead on garbage collection. A garbage collectable value stays valid to use
without limit until garbage collection happens. For this purpose Nova has the
GcScope zero-sized type (ZST) that gets passed through call graphs where garbage
collection may happen, and its sibling NoGcScope which is passed into call
graphs where garbage collection cannot happen. The GcScope is needed to trigger
garbage collection, and it cannot be cloned or copied but it can be "reborrowed"
to pass a temporary copy down into deeper call graphs: While this temporary copy
lives, the original is "inactive" and cannot be used.
NoGcScope is likewise created from GcScope through reborrowing, but it does not
stop the original from being used to create other NoGcScopes. In Rust parlance,
to create a temporary GcScope copy the original must be borrowed exclusively,
while NoGcScope can be created through a shared borrow.
// Note: Lifetime generics omitted for simplicity.
impl GcScope {
    fn reborrow(&mut self) -> GcScope;
    fn nogc(&self) -> NoGcScope;
}
All values bind their lifetime to the GcScope through the use of a NoGcScope.
Because a GcScope is required to trigger garbage collection and a GcScope can
only be passed to a function call either by value or by using the reborrow
function which requires exclusive access to self, it means that any time that a
GcScope gets passed to a function call it invalidates all shared borrows on it,
which invalidates all NoGcScopes, and that then invalidates all values that have
bound their lifetime to the GcScope through the use of a NoGcScope.
With this, we have a system where all garbage collected values bound to the
GcScope are automatically invalidated when garbage collection may be triggered
(because we call a function that may trigger garbage collection). Unfortunately,
this is not the end: We haven't yet defined what "bound values" are, and that is
an important and problematic thing to define.
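To make the reborrowing rules concrete, here is a minimal sketch of the two scope types as lifetime-only tokens. Everything in it is an assumption made for illustration rather than Nova's real definitions: the PhantomData-based layout, root_scope, make_value, may_collect and the toy Value handle are hypothetical. It only aims to show that values tied to gc.nogc() can coexist, and that handing gc.reborrow() to a function ends their usability.

```rust
use std::marker::PhantomData;

// Toy stand-ins (hypothetical, not Nova's real definitions): zero-sized
// tokens that carry only lifetimes.
pub struct GcScope<'a, 'b>(PhantomData<&'a mut ()>, PhantomData<&'b ()>);

#[derive(Clone, Copy)]
pub struct NoGcScope<'a, 'b>(PhantomData<&'a ()>, PhantomData<&'b ()>);

impl<'a, 'b> GcScope<'a, 'b> {
    /// Exclusive borrow: while the returned copy lives, `self` is inactive.
    pub fn reborrow(&mut self) -> GcScope<'_, 'b> {
        GcScope(PhantomData, PhantomData)
    }

    /// Shared borrow: any number of these can coexist.
    pub fn nogc(&self) -> NoGcScope<'_, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
}

/// Hypothetical entry point that creates the outermost scope.
pub fn root_scope() -> GcScope<'static, 'static> {
    GcScope(PhantomData, PhantomData)
}

/// Toy handle: an index plus the lifetime it is bound to.
#[derive(Clone, Copy, Debug)]
pub struct Value<'a>(pub u32, PhantomData<&'a ()>);

/// Stand-in for a function that cannot trigger garbage collection.
pub fn make_value<'a>(index: u32, _gc: NoGcScope<'a, '_>) -> Value<'a> {
    Value(index, PhantomData)
}

/// Stand-in for a function that may trigger garbage collection.
pub fn may_collect(_gc: GcScope<'_, '_>) {}

pub fn demo() -> (u32, u32) {
    let mut gc = root_scope();
    // Two shared borrows of `gc` are alive at the same time.
    let a = make_value(1, gc.nogc());
    let b = make_value(2, gc.nogc());
    let seen = (a.0, b.0);
    // The exclusive borrow ends the shared ones that `a` and `b` rely on.
    may_collect(gc.reborrow());
    // Uncommenting the next line makes the `gc.reborrow()` call above fail
    // to compile, because `a` would still hold a shared borrow of `gc`.
    // println!("{:?}", a);
    seen
}
```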
Bindable values
Bindable values in Nova are handles to heap data that carry a Rust lifetime. The
most common bindable value type is Value, but many others exist besides. (Some
examples would be Number, String [not the standard library one], Symbol, and
Object.) Normally, in Rust you see lifetimes mostly in references. For reasons
explained above, Nova cannot use the Agent reference to derive the lifetime for
bindable values. Nova also cannot quite use normal Rust lifetime rules directly
due to Rust not allowing aliasing. For this reason, bindable values implement
the following trait:
unsafe trait Bindable {
    type Of<'a>;
    fn unbind(self) -> Self::Of<'static>;
    fn bind<'a>(self, gc: NoGcScope<'a, '_>) -> Self::Of<'a>;
}
The Of type is expected to be (and, once it is possible to check this, will be
required to be) equal to the Self type. For example, this is how Value's
implementation begins:
unsafe impl Bindable for Value<'_> {
    type Of<'a> = Value<'a>;
    // ...
}
What this says, effectively, is that for any Value, regardless of what lifetime
it has, it has a function unbind that takes self (by value, not by reference)
and returns a Value<'static>, and a bind function that again takes self and a
NoGcScope<'a, '_> and returns a Value<'a>. You can perhaps see where this is
going: "Bound values" mean values that carry the 'a lifetime from a
NoGcScope<'a, '_>, either from having had bind(gc.nogc()) called on them, or
from being returned from a function that takes a NoGcScope<'a, '_> and returns
the value with the 'a lifetime.
It is also possible to get "exclusively bound values"; these are created when a
bound value is returned with the 'a lifetime from a function that takes
GcScope<'a, '_>, i.e. when a bound value is returned from a function that can
trigger garbage collection. Due to certain details of Rust's borrow checker
rules related to internal mutation, the returned value will keep the GcScope
from being reused. We'll see how to deal with this situation later.
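As a rough illustration of the trait above, the sketch below implements it for a toy handle. The trait definition is the one quoted in the text; everything else is an assumption: the scope tokens are modelled with PhantomData, root_scope is a hypothetical constructor, and this Value is just an index with a lifetime marker. The point is only that unbind and bind change the lifetime and nothing else.

```rust
use std::marker::PhantomData;

// Toy scope tokens (hypothetical stand-ins, not Nova's real definitions).
pub struct GcScope<'a, 'b>(PhantomData<&'a mut ()>, PhantomData<&'b ()>);

#[derive(Clone, Copy)]
pub struct NoGcScope<'a, 'b>(PhantomData<&'a ()>, PhantomData<&'b ()>);

impl<'a, 'b> GcScope<'a, 'b> {
    pub fn nogc(&self) -> NoGcScope<'_, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
}

/// Hypothetical entry point that creates the outermost scope.
pub fn root_scope() -> GcScope<'static, 'static> {
    GcScope(PhantomData, PhantomData)
}

/// The trait as quoted in the text.
pub unsafe trait Bindable {
    type Of<'a>;
    fn unbind(self) -> Self::Of<'static>;
    fn bind<'a>(self, gc: NoGcScope<'a, '_>) -> Self::Of<'a>;
}

/// Toy handle: an index plus the lifetime it is bound to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value<'a>(pub u32, PhantomData<&'a ()>);

// SAFETY (toy model): `Of<'a>` is `Value<'a>`, and both methods keep the
// index untouched and only change the lifetime marker.
unsafe impl Bindable for Value<'_> {
    type Of<'a> = Value<'a>;

    fn unbind(self) -> Self::Of<'static> {
        Value(self.0, PhantomData)
    }

    fn bind<'a>(self, _gc: NoGcScope<'a, '_>) -> Self::Of<'a> {
        Value(self.0, PhantomData)
    }
}

pub fn demo() -> u32 {
    let gc = root_scope();
    let unbound: Value<'static> = Value(7, PhantomData);
    // Bound: the lifetime now comes from the shared borrow behind `gc.nogc()`.
    let bound = unbound.bind(gc.nogc());
    // Unbound again: the handle is the same, only the lifetime changed.
    let unbound_again: Value<'static> = bound.unbind();
    unbound_again.0
}
```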
The Practice
Now that we have seen the GcScope, NoGcScope, and bound values, we are ready to
start using them in practice. First, let's do a quick review of these three
principal actors:
- GcScope<'a, '_>: A zero-sized type that is passed to functions that can trigger garbage collection. Only one such type is ever active at a time, and no NoGcScopes can be active at the same time.
- NoGcScope<'a, '_>: A zero-sized type that is passed to functions that cannot trigger garbage collection. Any number of such types can be active at the same time, but no GcScope can be active at the same time.
- Bound values: Handles to garbage collected values that have been bound to a NoGcScope or GcScope, implicitly or explicitly.
With this, we're ready to start our praxis.
Returning bound values from NoGcScope functions
The simplest function that deals with bound values we can imagine in the engine
is one that cannot trigger garbage collection, and returns a bound value. An
example of this would be Value::from_string which takes a Rust
std::string::String by value, moves it to the Agent heap, and returns a Value
handle to it.
pub fn from_string<'a>(
    agent: &mut Agent,
    string: std::string::String,
    gc: NoGcScope<'a, '_>
) -> Value<'a>;
There is nothing much to say about these sorts of functions: The only thing of
note is that it is important for the 'a lifetime to be defined and used in
NoGcScope<'a, '_>. The following would be incorrect:
// *Incorrect* usage!
pub fn from_string_wrong(
    agent: &mut Agent,
    string: std::string::String,
    gc: NoGcScope
) -> Value; // Wrong! Lifetime not bound to NoGcScope!
With this sort of definition, the returned Value would be bound to the implied
'a lifetime of &'a mut Agent, not to the NoGcScope lifetime. This would not be a
"bound value". This would also block the entire Agent from being used while the
returned Value still exists. Even if the blocking were to be fixed (by making
the function take &Agent instead), it would still mean that the handle would be
invalidated if the Agent is mutated. This isn't correct: We want the Value to be
invalidated when garbage collection may be triggered, which can happen entirely
unconnected from heap mutation.
Returning bound values from GcScope functions
Currently, the entire Nova project contains no functions that can trigger garbage collection, return bound values, and take no parameters. Such a function would look like this:
pub fn silly_example<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a>;
Again, it is important for the 'a lifetime to be defined and used in
GcScope<'a, '_>. It is also worth remembering that the returned Value<'a> will
keep the GcScope inactive while it still exists. Again, we'll come back to this
issue soon, but first let's take a look at a simpler case.
Accepting returned bound values from NoGcScope functions
Now that we've learned what functions returning bound values look like on the
outside, let's try calling one in a function body. Since we're starting with a
function that takes NoGcScope, this will be easy:
fn example(
    agent: &mut Agent,
    gc: GcScope
) {
    let string = Value::from_string(agent, "foo".into(), gc.nogc());
    let string2 = Value::from_string(agent, "bar".into(), gc.nogc());
    println!("{:?} {:?}", string, string2);
    // ... presumably more work here ...
}
There is nothing particularly odd here, except maybe for the gc.nogc() calls.
Our example function takes no bound value parameters (since we don't know how to
do that yet), and merely allocates two strings onto the heap, keeping handles to
them, before printing the two handles.
The gc.nogc() calls are needed to create a NoGcScope from our GcScope while
still keeping it accessible for later calls. The exact lifetime tricks that
happen when gc.nogc() is called and the Value is returned bound to it could be
annotated thusly:
let gc: GcScope<'gc, 'scope>;
let gc_ref: &'short GcScope<'gc, 'scope> = &gc;
let nogc: NoGcScope<'short, 'scope>;
let string: Value<'short>;
The resulting NoGcScope<'short, 'scope> thus says that it keeps the gc_ref
borrow alive, and that borrow observes the GcScope. If someone were to take
exclusive access to GcScope, then the gc_ref borrow would be forced to end,
invalidating the NoGcScope. Likewise, the returned string bound value is bound
to the 'short lifetime and is invalidated if exclusive access to GcScope is
taken.
Accepting returned bound values from GcScope functions
It is time to look at the issue I've been mentioning above. Let us call our
silly_example from above twice and try to print the two results:
let first = silly_example(agent, gc.reborrow());
let second = silly_example(agent, gc.reborrow());
println!("{:?} {:?}", first, second);
This will not compile. The error will say that gc is borrowed as exclusive at
the first gc.reborrow() call site, is then again borrowed as exclusive at the
second gc.reborrow() call site, and the first exclusive borrow is later used at
the print line. This is our first useful error from our bound values: Both the
calls to silly_example can trigger garbage collection. If the second call does
trigger garbage collection, then the first Value will be use-after-free. The
compile error here is absolutely on point: This is an error.
But okay, what if we were calling Value::from_string instead?
let first = silly_example(agent, gc.reborrow());
let second = Value::from_string(agent, "foo".into(), gc.nogc());
println!("{:?} {:?}", first, second);
This still will not compile! The error message says that gc.reborrow() borrows
gc as exclusive, and gc.nogc() then borrows it as shared but the exclusive
borrow is then reused when first is printed. From a garbage collector
perspective this makes no sense: The first Value returned from silly_example
will not be invalidated by a Value::from_string call, so why is this happening?
The reason is buried deep in Rust's internal mutation types, and we'll skip why
it is so: We'll just accept that it needs to be this way for a good reason and
we'll live with it. But, it is a problem for us: Putting this problem into
garbage collector terms, what Rust here is telling us is that while first lives,
garbage collection is still happening and trying to access the GcScope at the
same time would be equivalent to allowing two garbage collections to happen at
the same time. That sounds dangerous and it makes sense we're stopped from doing
that, but we know that wouldn't be the case: If garbage collection happened at
all, it both started and finished inside the silly_example call.
A Value is just a handle, an integer of some kind that the Agent can use to find
the associated heap data, and importantly the Agent does not trust a Value to
contain correct data: All heap data access is always checked. Hence, we can
extract the integer data from a Value and wrap it into a new Value with a
different lifetime without sacrificing memory safety. Now, the perfect thing
would be if we could simply call bind(gc.nogc()) on our first, but this will not
compile because first keeps gc inactive. What we need to do is add an unbind
call first:
let first = silly_example(agent, gc.reborrow())
    .unbind()
    .bind(gc.nogc());
This will now compile, despite looking a little ugly. (Maybe more than a
little.) The unbind call releases the gc from the gc.reborrow()'s temporary
exclusive borrow, after which we can immediately call bind(gc.nogc()) on it to
bind it back to the GcScope but this time with a shared borrow backing it.
This is our first problem and solution: When a function returns an exclusively
bound value, GcScope will remain inactive. To fix this, chain
.unbind().bind(gc.nogc()) to the call. Note: There is no runtime cost for doing
these calls, though there likely is a small compile time cost.
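The .unbind().bind(gc.nogc()) chain can be tried out in a small self-contained model. This is a sketch under loud assumptions, not engine code: the scope tokens are PhantomData-only stand-ins, Agent is reduced to a vector of strings, unbind and bind are inherent methods on a toy Value instead of the Bindable trait, and root_scope plus the bodies of from_string and silly_example are made up for the example.

```rust
use std::marker::PhantomData;

// Toy scope tokens (hypothetical stand-ins, not Nova's real definitions).
pub struct GcScope<'a, 'b>(PhantomData<&'a mut ()>, PhantomData<&'b ()>);

#[derive(Clone, Copy)]
pub struct NoGcScope<'a, 'b>(PhantomData<&'a ()>, PhantomData<&'b ()>);

impl<'a, 'b> GcScope<'a, 'b> {
    pub fn reborrow(&mut self) -> GcScope<'_, 'b> {
        GcScope(PhantomData, PhantomData)
    }
    pub fn nogc(&self) -> NoGcScope<'_, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
    pub fn into_nogc(self) -> NoGcScope<'a, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
}

pub fn root_scope() -> GcScope<'static, 'static> {
    GcScope(PhantomData, PhantomData)
}

/// Toy agent: the "heap" is just a vector of strings.
pub struct Agent {
    heap: Vec<String>,
}

/// Toy handle: a heap index plus the lifetime it is bound to.
#[derive(Clone, Copy, Debug)]
pub struct Value<'a>(pub usize, PhantomData<&'a ()>);

impl Value<'_> {
    pub fn unbind(self) -> Value<'static> {
        Value(self.0, PhantomData)
    }
    pub fn bind<'a>(self, _gc: NoGcScope<'a, '_>) -> Value<'a> {
        Value(self.0, PhantomData)
    }
}

pub fn from_string<'a>(agent: &mut Agent, s: &str, _gc: NoGcScope<'a, '_>) -> Value<'a> {
    agent.heap.push(s.to_owned());
    Value(agent.heap.len() - 1, PhantomData)
}

/// Takes `GcScope`, so its result is exclusively bound.
pub fn silly_example<'a>(agent: &mut Agent, gc: GcScope<'a, '_>) -> Value<'a> {
    from_string(agent, "silly", gc.into_nogc())
}

pub fn demo() -> (usize, usize) {
    let mut agent = Agent { heap: Vec::new() };
    let mut gc = root_scope();
    // `unbind` lets the exclusive borrow end; `bind` re-attaches the value
    // through a shared borrow, so `gc` stays usable.
    let first = silly_example(&mut agent, gc.reborrow())
        .unbind()
        .bind(gc.nogc());
    let second = from_string(&mut agent, "foo", gc.nogc());
    (first.0, second.0)
}
```

Removing the .unbind().bind(gc.nogc()) chain from demo should reproduce the kind of borrow error described above, since first would then keep the exclusive borrow of gc alive across the gc.nogc() call.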
Passing bound values to NoGcScope functions
This is going to be easy: Passing bound values as parameters to NoGcScope
functions works just like you would expect from normal Rust:
let first = Value::from_string(agent, "3".into(), gc.nogc());
let result = to_number_primitive(agent, first, gc.nogc());
Because the first value isn't invalidated by the gc.nogc() call, passing it into
the call is perfectly okay from Rust's perspective.
Passing bound values to GcScope functions
This is not going to be quite so easy, but you probably guessed that already.
When gc.reborrow() is called, all bound values are invalidated immediately and
trying to use them afterwards becomes an error:
let first = silly_example(agent, gc.reborrow()).unbind().bind(gc.nogc());
let child_gc = gc.reborrow(); // <-- Conceptually, Rust thinks garbage collection happens here.
let result = to_number(agent, first, child_gc); // Error: `first` is now use-after-free!
Even if we call the gc.reborrow() "within" the to_number call, the first value
will become invalidated "after-the-fact" and the call itself now becomes
invalid:
let first = silly_example(agent, gc.reborrow()).unbind().bind(gc.nogc());
let result = to_number(agent, first, gc.reborrow()); // Error! Trying to pass exclusive and shared reference together.
This is again an aliasing error and the borrow checker will not stand for this.
So, what do we do? Conceptually, we again know that calling gc.reborrow()
doesn't trigger garbage collection but something inside to_number may do so.
Hence, passing first as a parameter is perfectly legal here, especially since it
(again) does not put memory safety at risk. So, we can use the unbind method to
release first from the GcScope borrow before it is passed to the method:
let first = silly_example(agent, gc.reborrow()).unbind().bind(gc.nogc());
let result = to_number(agent, first.unbind(), gc.reborrow()); // No error.
This is our second problem and solution: When calling a method that takes
GcScope, we cannot pass bound values into that same call. To fix this, we must
call .unbind() on the parameter bound values. To be safe, this should only be
done at the call site and not before. The following would be incorrect:
// *Incorrect* usage!
let first = silly_example(agent, gc.reborrow()).unbind();
let result = to_number(agent, first, gc.reborrow()); // No error.
let other_result = to_number(agent, first, gc.reborrow()); // Uh oh, no error! `first` is now use-after-free!
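This is also the reason why GcScope is always the last argument: Rust evaluates
call arguments from left to right, so the .unbind() calls on the earlier
parameters run before gc.reborrow() takes its exclusive borrow.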
Taking bound values as parameters in NoGcScope functions
There is nothing particularly complicated here, again. These functions work in all ways very much like normal Rust functions:
fn my_method(
    agent: &mut Agent,
    value: Value,
    gc: NoGcScope
) {
    // ... do your thing ...
}
Because no garbage collection can happen within this function scope, the value
is guaranteed to stay valid until the end of the call.
Taking bound values as parameters in GcScope functions
Once again, functions that may trigger garbage collection are the problem child.
When we receive a parameter value: Value in a call, by normal Rust lifetime
rules the Value<'_>'s contained lifetime is guaranteed to be valid until the end
of this call. In our case, we do not want that to be the case; the lifetime
should be bound to the GcScope but there is no way to make Rust perform this
binding automatically. We must thus perform it manually: You can think of this
as the mirror of the .unbind() call we needed to perform when calling a GcScope
function earlier.
fn my_method(
    agent: &mut Agent,
    value: Value,
    gc: GcScope
) {
    let value = value.bind(gc.nogc());
    // ... do your thing ...
}
All bindable values should be bound in this way at the beginning of every
function that takes GcScope: This is the most important thing in Nova's garbage
collector by far. If this rule is not upheld, then getting use-after-free in the
engine is trivial or even guaranteed. If this rule is upheld, then getting any
further use-after-free becomes nearly impossible.
Luckily, this is fairly easy conceptually: You need only to bind your parameters and you're good to go. Unfortunately, we know that "just doing the right thing" is not quite that easy (see null pointer checks). For that reason, we plan on implementing a custom lint to check this at build time.
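The two halves of this convention, unbinding at the call site and rebinding at the top of the callee, can be seen together in a toy model. All names below are assumptions for illustration only: the scope tokens and Value are PhantomData stand-ins, unbind and bind are inherent methods, and root_scope, may_collect and string_length are hypothetical functions invented for the example.

```rust
use std::marker::PhantomData;

// Toy scope tokens (hypothetical stand-ins, not Nova's real definitions).
pub struct GcScope<'a, 'b>(PhantomData<&'a mut ()>, PhantomData<&'b ()>);

#[derive(Clone, Copy)]
pub struct NoGcScope<'a, 'b>(PhantomData<&'a ()>, PhantomData<&'b ()>);

impl<'a, 'b> GcScope<'a, 'b> {
    pub fn reborrow(&mut self) -> GcScope<'_, 'b> {
        GcScope(PhantomData, PhantomData)
    }
    pub fn nogc(&self) -> NoGcScope<'_, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
}

pub fn root_scope() -> GcScope<'static, 'static> {
    GcScope(PhantomData, PhantomData)
}

/// Toy agent: the "heap" is just a vector of strings.
pub struct Agent {
    heap: Vec<String>,
}

/// Toy handle: a heap index plus the lifetime it is bound to.
#[derive(Clone, Copy, Debug)]
pub struct Value<'a>(pub usize, PhantomData<&'a ()>);

impl Value<'_> {
    pub fn unbind(self) -> Value<'static> {
        Value(self.0, PhantomData)
    }
    pub fn bind<'a>(self, _gc: NoGcScope<'a, '_>) -> Value<'a> {
        Value(self.0, PhantomData)
    }
}

pub fn from_string<'a>(agent: &mut Agent, s: &str, _gc: NoGcScope<'a, '_>) -> Value<'a> {
    agent.heap.push(s.to_owned());
    Value(agent.heap.len() - 1, PhantomData)
}

/// Stand-in for any call that may trigger garbage collection.
pub fn may_collect(_agent: &mut Agent, _gc: GcScope<'_, '_>) {}

/// Callee: it takes `GcScope`, so the first thing it does is bind `value`.
pub fn string_length(agent: &mut Agent, value: Value<'_>, mut gc: GcScope<'_, '_>) -> usize {
    let value = value.bind(gc.nogc());
    let len = agent.heap[value.0].len();
    // From here on the borrow checker rejects any further use of `value`.
    may_collect(agent, gc.reborrow());
    len
}

/// Caller: unbinds the argument at the call site, not before.
pub fn demo() -> usize {
    let mut agent = Agent { heap: Vec::new() };
    let mut gc = root_scope();
    let s = from_string(&mut agent, "hello", gc.nogc());
    string_length(&mut agent, s.unbind(), gc.reborrow())
}
```

If string_length skipped the bind call, its value parameter would stay usable after may_collect without any complaint from the compiler, which is exactly the mistake the rule above is meant to prevent.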
Holding bindable values across GcScope function calls
One final, important thing is left to discuss: We've seen how bindable values are formed and passed around, and how they become invalidated whenever a garbage collector triggering function call is made. This is great because it makes sure that we don't use these values after they may have been moved or free'd. But what if we need to use a bound value after a function call? For example, what do we do in this sort of situation:
fn example(
    agent: &mut Agent,
    arg0: Value,
    arg1: Value,
    mut gc: GcScope
) {
    let arg0 = arg0.bind(gc.nogc());
    let arg1 = arg1.bind(gc.nogc());
    let start_index = to_length_or_infinity(agent, arg0.unbind(), gc.reborrow())?;
    let end_index = to_length_or_infinity(agent, arg1.unbind(), gc.reborrow())?; // Error: arg1 is use-after-free
}
This will not compile, and it is a perfectly valid error: if garbage collection
is triggered by the first to_length_or_infinity call, then arg1's data would be
moved or removed. Continuing to use it would be well and truly mistaken. But, we
cannot just not use arg1: we need both start_index and end_index so we need to
somehow make sure that arg1 can be used and stays valid across the first
to_length_or_infinity call. The answer to how to do that is, emphatically, not
let arg1 = arg1.unbind();. That would be a big, big mistake as it just makes the
code compile but would not work correctly.
What we need to do, instead, is to "scope" or "root" arg1. (The terms are
interchangeable in this context.) What this does is to write the arg1 Value onto
the Agent heap, into a location that Agent guarantees will not be changed by the
garbage collector, and returns a handle to it. This is a second level handle, if
you will: Our original Value is a handle to some data on the heap, and rooting
it moves that handle onto the heap and returns a handle to it. This second level
handle has a different type, Scoped, and is defined automatically for all
bindable, rootable values.
trait Scopable: Rootable + Bindable
where
    for<'a> Self::Of<'a>: Rootable + Bindable,
{
    fn scope<'scope>(
        self,
        agent: &mut Agent,
        gc: NoGcScope<'_, 'scope>,
    ) -> Scoped<'scope, Self::Of<'static>> {
        Scoped::new(agent, self.unbind(), gc)
    }
}
For our Value, this would simplify to the following scope function
implementation:
impl Value<'_> {
    fn scope<'scope>(
        self,
        agent: &mut Agent,
        gc: NoGcScope<'_, 'scope>,
    ) -> Scoped<'scope, Value<'static>> {
        Scoped::new(agent, self.unbind(), gc)
    }
}
Note that we're now using the second lifetime of the NoGcScope; this is the
"scope" lifetime which works like your usual Rust lifetimes: a type carrying the
scope lifetime is guaranteed to be valid for the whole function call. From a
validity standpoint, this is because the Agent removes these scope-rooted
handles from the heap only outside of user-controlled code. Garbage collection
does not remove them, so they do not need to be bound to the garbage collector
lifetime.
Now that we know of scoping, this is how we would use it:
fn example(
    agent: &mut Agent,
    arg0: Value,
    arg1: Value,
    mut gc: GcScope
) {
    let arg0 = arg0.bind(gc.nogc());
    // We scope the argument value. The resulting scoped value is guaranteed to
    // be valid until the end of this call.
    // Note that we could bind and then scope the value, but that would not
    // make any difference to the resulting code or its correctness.
    let arg1: Scoped<'_, Value<'static>> = arg1.scope(agent, gc.nogc());
    let start_index = to_length_or_infinity(agent, arg0.unbind(), gc.reborrow())?;
    // To get our bindable value back out from a scoped value, we can use a get
    // method on it. This returns a `Value<'static>`, which we can leave
    // unbound because we're passing it as a parameter immediately.
    let end_index = to_length_or_infinity(agent, arg1.get(agent), gc.reborrow())?;
}
With this, we've successfully held arg1 across a GcScope function call, ensuring
that its data will not be removed by the garbage collector (it will still be
moved, but on the heap the scoped value's data will be updated to point to the
new location).
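The "second level handle" idea, including the detail that a moved value's scoped slot is updated, can be modelled without any lifetimes at all. The following is a toy sketch and not how Nova stores scoped values: Agent, Handle, Scoped, alloc, scope, get, read and gc are hypothetical names, the heap is a vector of strings, and the collector is a deliberately naive compacting one that keeps only what the scoped slots point at.

```rust
/// First level handle: an index into the toy heap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(pub usize);

/// Second level handle: an index into the agent's list of scoped slots.
#[derive(Clone, Copy, Debug)]
pub struct Scoped(usize);

/// Toy agent (hypothetical): a heap plus the slots that root scoped values.
pub struct Agent {
    heap: Vec<String>,
    scoped: Vec<usize>,
}

impl Agent {
    pub fn new() -> Self {
        Agent { heap: Vec::new(), scoped: Vec::new() }
    }

    pub fn alloc(&mut self, s: &str) -> Handle {
        self.heap.push(s.to_owned());
        Handle(self.heap.len() - 1)
    }

    /// Roots `handle` by writing it into a slot the collector keeps up to date.
    pub fn scope(&mut self, handle: Handle) -> Scoped {
        self.scoped.push(handle.0);
        Scoped(self.scoped.len() - 1)
    }

    /// Reads the current first level handle back out of the slot.
    pub fn get(&self, scoped: Scoped) -> Handle {
        Handle(self.scoped[scoped.0])
    }

    pub fn read(&self, handle: Handle) -> &str {
        &self.heap[handle.0]
    }

    /// Naive compacting collection: only data reachable from scoped slots
    /// survives, it is moved to the front, and the slots are updated.
    pub fn gc(&mut self) {
        let mut new_heap = Vec::new();
        let mut forwarding: Vec<Option<usize>> = vec![None; self.heap.len()];
        for slot in self.scoped.iter_mut() {
            let old = *slot;
            let new = match forwarding[old] {
                Some(index) => index,
                None => {
                    new_heap.push(std::mem::take(&mut self.heap[old]));
                    let index = new_heap.len() - 1;
                    forwarding[old] = Some(index);
                    index
                }
            };
            *slot = new;
        }
        self.heap = new_heap;
    }
}
```

In this model a plain Handle kept in a local variable goes stale after gc(), while the Scoped handle keeps resolving to the data's new location.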
The Mastery
Now we know how to work with bindable values at function interfaces, and we know how to create scoped values and use them to keep bindable values from being invalidated by the garbage collector when we need them to stay. All that remains is the all important question of "how do I actually use these things?!" True mastery can only come from practice, from trying and failing, and trying again until the tools break in your hands. I've never known how to create masters, but I've seen the tools break in my hands a few times by now, so I can hopefully give you a few tricks.
These will be in no particular order, snippets of code that pose challenges and the answers therein. Koans, if you will.
Short-circuit returning bound values from GcScope functions
If your function calls a function and immediately returns the result, you'll
find that using gc.reborrow() or gc.nogc() will not work.
fn my_method<'a>(
    agent: &mut Agent,
    mut gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    to_number(agent, some_value, gc.reborrow()) // Error: Returning value bound to local variable.
}
fn my_method_2<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    Value::from_string(agent, "foo".into(), gc.nogc()) // Error: Returning value bound to local variable.
}
The reason for this is that our gc.reborrow() and gc.nogc() are not "true
reborrows": the lifetime they carry is always guaranteed to be shorter than the
'a of the source GcScope<'a, '_>, but our return type requires the lifetime to
be equal to 'a. (This isn't exactly the reason, actually, but it's technically
the same thing. The real reason is the temporary borrow. Whatever. Deal with
it.)
There are two ways to fix this. The first, preferred one, is to consume the
GcScope variable to get an equal 'a lifetime. This can be done by passing the gc
directly (for GcScope functions) or by using the gc.into_nogc() method:
fn my_method<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    to_number(agent, some_value, gc) // No error
}
fn my_method_2<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    Value::from_string(agent, "foo".into(), gc.into_nogc()) // No error
}
Note that both of these actions invalidate all bound values; for the
gc.into_nogc() method this is quite unfortunate as we actually know that it
signifies the end to all possibility of garbage collector triggering in this
call, and that henceforth all bound values are strictly guaranteed to stay valid
until the end of the call. If you are passing bound values into the call
together with the result of gc.into_nogc() then you can use .unbind() on the
parameters at the call site the same way as you'd normally use for a
gc.reborrow() call.
The second way to handle this issue is by simply calling .unbind() at the return
site: This is likewise perfectly fine, and perfectly safe. Especially if you are
passing bound values into a NoGcScope call and returning the result then this
may sometimes be the cleaner option:
fn my_method<'a>(
    agent: &mut Agent,
    mut gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    to_number(agent, some_value, gc.reborrow())
        .unbind() // No error
}
fn my_method_2<'a>(
    agent: &mut Agent,
    gc: GcScope<'a, '_>
) -> Value<'a> {
    // ...
    Value::from_string(agent, "foo".into(), gc.nogc())
        .unbind() // No error
}
The question is: How to ensure that a bound value lives long enough to be
returned? The answer is, by consuming the GcScope. Or when under duress, by
unbinding the value at the site of return.
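Both fixes can be checked against a small model. As before, this is a sketch built on assumptions rather than the engine's code: the PhantomData scope tokens, the vector-backed Agent, the inherent unbind method, root_scope, and the function names forward_nogc, forward_gc and unbind_at_return are all hypothetical.

```rust
use std::marker::PhantomData;

// Toy scope tokens (hypothetical stand-ins, not Nova's real definitions).
pub struct GcScope<'a, 'b>(PhantomData<&'a mut ()>, PhantomData<&'b ()>);

#[derive(Clone, Copy)]
pub struct NoGcScope<'a, 'b>(PhantomData<&'a ()>, PhantomData<&'b ()>);

impl<'a, 'b> GcScope<'a, 'b> {
    pub fn reborrow(&mut self) -> GcScope<'_, 'b> {
        GcScope(PhantomData, PhantomData)
    }
    /// Consumes the scope, so the result keeps the full `'a` lifetime.
    pub fn into_nogc(self) -> NoGcScope<'a, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
}

pub fn root_scope() -> GcScope<'static, 'static> {
    GcScope(PhantomData, PhantomData)
}

/// Toy agent: the "heap" is just a vector of strings.
pub struct Agent {
    heap: Vec<String>,
}

/// Toy handle: a heap index plus the lifetime it is bound to.
#[derive(Clone, Copy, Debug)]
pub struct Value<'a>(pub usize, PhantomData<&'a ()>);

impl Value<'_> {
    pub fn unbind(self) -> Value<'static> {
        Value(self.0, PhantomData)
    }
}

pub fn from_string<'a>(agent: &mut Agent, s: &str, _gc: NoGcScope<'a, '_>) -> Value<'a> {
    agent.heap.push(s.to_owned());
    Value(agent.heap.len() - 1, PhantomData)
}

/// Fix 1a: consume the scope with `into_nogc` for a `NoGcScope` callee.
pub fn forward_nogc<'a>(agent: &mut Agent, gc: GcScope<'a, '_>) -> Value<'a> {
    from_string(agent, "foo", gc.into_nogc())
}

/// Fix 1b: pass the scope by value to a `GcScope` callee.
pub fn forward_gc<'a>(agent: &mut Agent, gc: GcScope<'a, '_>) -> Value<'a> {
    forward_nogc(agent, gc)
}

/// Fix 2: reborrow, then unbind at the return site.
pub fn unbind_at_return<'a>(agent: &mut Agent, mut gc: GcScope<'a, '_>) -> Value<'a> {
    forward_nogc(agent, gc.reborrow()).unbind()
}

pub fn demo() -> (usize, usize, usize) {
    let mut agent = Agent { heap: Vec::new() };
    let mut gc = root_scope();
    let a = forward_nogc(&mut agent, gc.reborrow()).0;
    let b = forward_gc(&mut agent, gc.reborrow()).0;
    let c = unbind_at_return(&mut agent, gc.reborrow()).0;
    (a, b, c)
}
```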
Splitting a function's tail or branch into a NoGcScope scope
Oftentimes, functions will call into GcScope functions at the start but later
move to operating purely with NoGcScope functions. In this case you might want
to use gc.into_nogc() after the last gc.reborrow() call to ensure that the rest
of the work does indeed happen without any chance of garbage collection.
Unfortunately, when calling gc.into_nogc() any existing bound values get
invalidated and cannot be used in the "tail" of the function.
// Some bound string values we have in scope after parameter massaging.
let s: String<'_>;
let fill_string: String<'_>;
// NoGcScope tail of the function starts.
let gc = gc.into_nogc();
do_work(agent, s, fill_string, gc); // Error: s and fill_string are bound to GcScope which was consumed.
Here it is, as the only exception to the rule, okay to temporarily unbind bound values to a local variable and rebind them to pass them into the tail:
// NoGcScope tail of the function starts.
let (gc, s, fill_string) = {
    let s = s.unbind();
    let fill_string = fill_string.unbind();
    let gc = gc.into_nogc();
    (gc, s.bind(gc), fill_string.bind(gc))
};
Or alternatively:
// NoGcScope tail of the function starts.
let s = s.unbind();
let fill_string = fill_string.unbind();
let gc = gc.into_nogc();
let s = s.bind(gc);
let fill_string = fill_string.bind(gc);
Note that it is important for safety that the new bound variables (created after
gc.into_nogc()) shadow the unbound variables, so that the unbound ones can no
longer be used by accident.
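The unbind, into_nogc, rebind sequence can be exercised end to end in a toy model. This is a sketch with made-up pieces: the PhantomData scope tokens, the vector-backed Agent, the inherent unbind and bind methods, root_scope, and the concat and pad functions are all hypothetical and only mimic the shape of the code above.

```rust
use std::marker::PhantomData;

// Toy scope tokens (hypothetical stand-ins, not Nova's real definitions).
pub struct GcScope<'a, 'b>(PhantomData<&'a mut ()>, PhantomData<&'b ()>);

#[derive(Clone, Copy)]
pub struct NoGcScope<'a, 'b>(PhantomData<&'a ()>, PhantomData<&'b ()>);

impl<'a, 'b> GcScope<'a, 'b> {
    pub fn reborrow(&mut self) -> GcScope<'_, 'b> {
        GcScope(PhantomData, PhantomData)
    }
    pub fn nogc(&self) -> NoGcScope<'_, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
    pub fn into_nogc(self) -> NoGcScope<'a, 'b> {
        NoGcScope(PhantomData, PhantomData)
    }
}

pub fn root_scope() -> GcScope<'static, 'static> {
    GcScope(PhantomData, PhantomData)
}

/// Toy agent: the "heap" is just a vector of strings.
pub struct Agent {
    heap: Vec<String>,
}

/// Toy handle: a heap index plus the lifetime it is bound to.
#[derive(Clone, Copy, Debug)]
pub struct Value<'a>(pub usize, PhantomData<&'a ()>);

impl Value<'_> {
    pub fn unbind(self) -> Value<'static> {
        Value(self.0, PhantomData)
    }
    pub fn bind<'a>(self, _gc: NoGcScope<'a, '_>) -> Value<'a> {
        Value(self.0, PhantomData)
    }
}

pub fn from_string<'a>(agent: &mut Agent, s: &str, _gc: NoGcScope<'a, '_>) -> Value<'a> {
    agent.heap.push(s.to_owned());
    Value(agent.heap.len() - 1, PhantomData)
}

/// The `NoGcScope` "tail": it takes bound values and returns a bound value.
pub fn concat<'a>(
    agent: &mut Agent,
    a: Value<'_>,
    b: Value<'_>,
    gc: NoGcScope<'a, '_>,
) -> Value<'a> {
    let joined = format!("{}{}", agent.heap[a.0], agent.heap[b.0]);
    from_string(agent, &joined, gc)
}

pub fn pad<'a>(agent: &mut Agent, gc: GcScope<'a, '_>) -> Value<'a> {
    let s = from_string(agent, "ab", gc.nogc());
    let fill_string = from_string(agent, "--", gc.nogc());
    // NoGcScope tail of the function starts: unbind, consume, rebind.
    let s = s.unbind();
    let fill_string = fill_string.unbind();
    let gc = gc.into_nogc();
    // The rebound names shadow the unbound ones on purpose.
    let s = s.bind(gc);
    let fill_string = fill_string.bind(gc);
    concat(agent, s, fill_string, gc)
}

pub fn demo() -> String {
    let mut agent = Agent { heap: Vec::new() };
    let mut gc = root_scope();
    let result = pad(&mut agent, gc.reborrow());
    agent.heap[result.0].clone()
}
```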
Avoiding scoping on fast paths
Scoping of variables is purposefully made to be cheap and in general it shouldn't be something you need to particularly worry about in your code. As an example, stack-only data is never put into the heap during scoping but is kept unchanged on the stack. Still, it is extra work that the engine must do and in many cases it is purely unnecessary.
For instance, a builtin method that takes a string and two number parameters
will usually be defined as calling ToString and ToNumber methods on the
parameters. These methods can call arbitrary JavaScript through a valueOf or
toString function defined on an object passed in as a parameter, and hence they
can trigger garbage collection. The builtin method must be prepared for this
possibility, so scoping would normally be necessary.
fn builtin_method<'a>(
    agent: &mut Agent,
    _this_value: Value,
    arguments: ArgumentsList,
    mut gc: GcScope<'a, '_>
) -> JsResult<Value<'a>> {
    let a = arguments.get(0).bind(gc.nogc());
    let b = arguments.get(1).scope(agent, gc.nogc());
    let c = arguments.get(2).scope(agent, gc.nogc());
    let a = to_string(agent, a.unbind(), gc.reborrow())?.unbind().scope(agent, gc.nogc());
    let b = to_number(agent, b.get(agent), gc.reborrow())?.unbind().scope(agent, gc.nogc());
    let c = to_number(agent, c.get(agent), gc.reborrow())?.unbind().bind(gc.nogc());
    let a = a.get(agent).bind(gc.nogc());
    let b = b.get(agent).bind(gc.nogc());
    // ...
}
At the end here we have bound values a, b, and c on the stack and everything is
fine. There are two unfortunate parts here though: First, we've called scope
four times to scope three values and second, the usual case is going to be that
these parameters were already of the correct type. In that case these methods
are guaranteed to not call into JavaScript, and are actually guaranteed to
return the values as-is. In the usual case, then, none of the methods has any
chance of triggering garbage collection.
We have two main ways to avoid the mostly-unnecessary scoping:
- Conditional fast paths for correctly typed calls.
- Try method variants.
Additionally, the Scoped type has some unsafe methods that can be used to cut
down on the amount of work that scoping needs to perform.
Conditional fast paths
A conditional fast path is one where the code checks its preferred types ahead of time, and avoids all potentially garbage collection triggering method calls if the types match. The simplest and most targeted fast path for the above example method would be as follows:
let a = arguments.get(0).bind(gc.nogc());
let b = arguments.get(1).bind(gc.nogc());
let c = arguments.get(2).bind(gc.nogc());
if let (Ok(a), Ok(b), Ok(c)) = (String::try_from(a), Number::try_from(b), Number::try_from(c)) {
    // Fast path without conversions, scoping.
} else {
    // Slow path.
}
This will only work for the exactly correct call of the method, so it is
maximally strict and (probably) maximally optimal. A more permissive but
somewhat less optimal fast path can, in this case, be created based on the
knowledge that primitive values will never cause a call into JavaScript in the
ToString and ToNumber methods. Nova includes special variants of these methods
for calling with primitive values:
let a = arguments.get(0).bind(gc.nogc());
let b = arguments.get(1).bind(gc.nogc());
let c = arguments.get(2).bind(gc.nogc());
if let (Ok(a), Ok(b), Ok(c)) = (
    Primitive::try_from(a),
    Primitive::try_from(b),
    Primitive::try_from(c),
) {
    // Fast path without scoping.
    let a = to_string_primitive(agent, a, gc.nogc());
    let b = to_number_primitive(agent, b, gc.nogc());
    let c = to_number_primitive(agent, c, gc.nogc());
} else {
    // Slow path.
}
This will allow a wider range of ways to call the method to enter the fast path but at the cost of still having to call conversion methods. There is no hard and fast rule for which one should be preferred: This will likely be an ongoing matter of personal preference and performance measuring.
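To make the strict fast path idea concrete outside of the engine, here is a minimal, self-contained sketch. ToyValue, ToyNumber, Path, add and slow_to_number are all hypothetical names invented for this illustration; they are not Nova's types, and no garbage collector is involved. The sketch only shows the shape of the check: narrow every argument with TryFrom first, and fall back to the converting path if any narrowing fails.

```rust
use std::convert::TryFrom;

/// Toy stand-in for a JavaScript value. Hypothetical; not Nova's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Undefined,
    Number(f64),
    Text(String),
}

/// Narrowed type that can only be built from `ToyValue::Number`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyNumber(f64);

impl TryFrom<&ToyValue> for ToyNumber {
    type Error = ();

    fn try_from(value: &ToyValue) -> Result<Self, Self::Error> {
        match value {
            ToyValue::Number(n) => Ok(ToyNumber(*n)),
            _ => Err(()),
        }
    }
}

/// Records which path a call took so that callers can observe the dispatch.
#[derive(Debug, PartialEq)]
enum Path {
    Fast,
    Slow,
}

/// Stand-in for a conversion that, in a real engine, might run user code.
fn slow_to_number(value: &ToyValue) -> f64 {
    match value {
        ToyValue::Undefined => f64::NAN,
        ToyValue::Number(n) => *n,
        ToyValue::Text(s) => s.trim().parse::<f64>().unwrap_or(f64::NAN),
    }
}

/// Adds two values, taking the fast path only if both are already numbers.
fn add(a: &ToyValue, b: &ToyValue) -> (Path, f64) {
    if let (Ok(a), Ok(b)) = (ToyNumber::try_from(a), ToyNumber::try_from(b)) {
        // Fast path: no conversions, so nothing here could trigger GC.
        (Path::Fast, a.0 + b.0)
    } else {
        // Slow path: conversions are needed.
        (Path::Slow, slow_to_number(a) + slow_to_number(b))
    }
}
```

Calling add with two ToyValue::Number arguments takes the fast path, while passing a ToyValue::Text for either argument falls through to the slow path. In the engine, the slow path is where scoping would be needed.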
Try method variants
Similarly to the primitive conversion variants, Nova also includes "try
variants" of various methods. These methods return a TryResult (which is just an
alias of ControlFlow) which tells if the method finished successfully and the
execution of the caller can continue (TryResult::Continue is then the returned
variant), or if the method encountered a need to call into JavaScript and thus
had to break out early (in which case TryResult::Break is returned).
These methods can be used to avoid splitting a method into a slow and fast path, at the cost of splitting smaller parts of the method into try and fallback paths. Doing so generally requires quite a bit of dancing around with mutable local variables, and adding mutable local optional scoped values to the method:
let a = arguments.get(0).bind(gc.nogc());
let b = arguments.get(1).bind(gc.nogc());
let c = arguments.get(2).bind(gc.nogc());
if let (
    TryResult::Continue(a),
    TryResult::Continue(b),
    TryResult::Continue(c),
) = (
    try_to_string(agent, a, gc.nogc()),
    try_to_number(agent, b, gc.nogc()),
    try_to_number(agent, c, gc.nogc()),
) {
    // Try succeeded, continue on the fast path.
} else {
    // Slow path.
}
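The try-then-fallback shape can also be sketched without the engine. The following is a toy model under stated assumptions: the TryResult alias below uses () as its break type, which is a guess (the text above only says it is an alias of ControlFlow), and ToyValue, try_to_number, to_number and to_number_with_fallback are hypothetical stand-ins, not Nova's functions. A counter takes the place of "a call that may trigger garbage collection" so that the effect is observable.

```rust
use std::ops::ControlFlow;

/// Assumed shape of the alias; the break type `()` is a guess.
type TryResult<T> = ControlFlow<(), T>;

/// Toy stand-in for a JavaScript value. Hypothetical; not Nova's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Number(f64),
    Text(String),
    Object(u32),
}

/// Try variant: finishes for primitives, breaks where user code would run.
fn try_to_number(value: &ToyValue) -> TryResult<f64> {
    match value {
        ToyValue::Number(n) => TryResult::Continue(*n),
        ToyValue::Text(s) => {
            TryResult::Continue(s.trim().parse::<f64>().unwrap_or(f64::NAN))
        }
        ToyValue::Object(_) => TryResult::Break(()),
    }
}

/// Full variant: counts as a call that could run JavaScript and trigger GC.
fn to_number(value: &ToyValue, gc_capable_calls: &mut u32) -> f64 {
    *gc_capable_calls += 1;
    match try_to_number(value) {
        TryResult::Continue(n) => n,
        // Pretend a user-defined conversion ran and produced NaN.
        TryResult::Break(()) => f64::NAN,
    }
}

/// Tries the cheap variant first and only then falls back to the full one.
fn to_number_with_fallback(value: &ToyValue, gc_capable_calls: &mut u32) -> f64 {
    if let TryResult::Continue(n) = try_to_number(value) {
        n
    } else {
        // Fallback path: in the engine, this is where scoping would happen.
        to_number(value, gc_capable_calls)
    }
}
```

For primitives the counter stays at zero, which corresponds to never needing a GcScope; only an object argument reaches the fallback.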
Avoiding excessive scoping
When a slow path is encountered and we need to perform scoping, the best case scenario is that we only need the scoped value for that single slow path and never afterwards. This is often not the case, however: a value will need to be conditionally scoped to pass through multiple slow paths. In this case, we do not want to re-scope a value over and over again. For example, this kind of code would make no sense:
let value: Value;
let value = value.scope(agent, gc.nogc());
gc.reborrow(); // Some GcScope calls.
let value = value.get(agent).bind(gc.nogc());
gc.nogc(); // Some NoGc calls.
let value = value.scope(agent, gc.nogc());
gc.reborrow(); // Some GcScope calls.
let value = value.get(agent).bind(gc.nogc());
This would scope the same value twice, which means that the value is stored on the heap in duplicate. Luckily we can avoid this.
When in doubt, scope
The simplest choice is to simply scope once and keep calling .get(agent) on the
scoped value to get a bindable value for use as a parameter and so on. You may
want to name the scoped value scoped_value, and name the bound value value. This
way you can use the bound value as a parameter for a GcScope call while also
scoping it.
let value: Value;
let scoped_value = value.scope(agent, gc.nogc());
some_call(agent, value.unbind(), gc.reborrow());
other_call(agent, scoped_value.get(agent), gc.reborrow());
let value = scoped_value.get(agent).bind(gc.nogc());
This is a perfectly valid way to handle the question of "how do I deal with
Value invalidation on gc.reborrow()". It may not be the absolute peak of optimal
performance, but it will get you to where you need to be the fastest.
Optional scoped value
When using conditional fast paths, you probably want to avoid repeatedly scoping
the same values on each slow path. (It is also perfectly valid that slow paths
are allowed to be slow and you don't need to worry about re-scoping values
there. I'll allow it.) The way you can do this is to use an Option<Scoped<T>>.
let mut value: Value;
let mut scoped_value = None;
let result = if let TryResult::Continue(result) = try_some_call(agent, value, gc.nogc()) {
    result?
} else {
    scoped_value = Some(value.scope(agent, gc.nogc()));
    let result = some_call(agent, value.unbind(), gc.reborrow())?;
    value = scoped_value.as_ref().unwrap().get(agent).bind(gc.nogc());
    result
};
// other work ...
let other_value: Value;
let result = if let TryResult::Continue(result) = try_other_call(agent, other_value, gc.nogc()) {
    result?
} else {
    if scoped_value.is_none() {
        scoped_value = Some(value.scope(agent, gc.nogc()));
    }
    let result = other_call(agent, other_value.unbind(), gc.reborrow())?;
    value = scoped_value.as_ref().unwrap().get(agent).bind(gc.nogc());
    result
};
This is pretty ugly, but it does the job: scoped_value remains None for the
entire duration of the call if we always pass through fast paths and only costs
us the stack space. On slow paths we scope our needed value if a previous slow
path didn't already do so, and re-read value from the heap afterwards.
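The "scope lazily, and at most once" bookkeeping can be modelled in isolation. This is a toy sketch, not Nova's implementation: ToyAgent, ToyScoped and run_slow_paths are hypothetical names, the root list is a plain Vec, and values are plain integers. It uses Option::get_or_insert_with from the standard library to express "scope if a previous slow path didn't already do so" in one line.

```rust
/// Toy root list standing in for the heap storage behind scoped values.
/// Hypothetical model; Nova's real `Agent` and `Scoped` work differently.
struct ToyAgent {
    roots: Vec<i64>,
}

/// Handle to one entry in `ToyAgent::roots`.
#[derive(Clone, Copy)]
struct ToyScoped(usize);

impl ToyAgent {
    /// Stores `value` in a fresh root slot and returns a handle to it.
    fn scope(&mut self, value: i64) -> ToyScoped {
        self.roots.push(value);
        ToyScoped(self.roots.len() - 1)
    }

    /// Reads the value back through the handle.
    fn get(&self, scoped: ToyScoped) -> i64 {
        self.roots[scoped.0]
    }
}

/// Passes `value` through `slow_path_count` slow paths, scoping it lazily.
fn run_slow_paths(agent: &mut ToyAgent, value: i64, slow_path_count: usize) -> i64 {
    let mut value = value;
    let mut scoped: Option<ToyScoped> = None;
    for _ in 0..slow_path_count {
        // Scope only if no earlier slow path has already done so.
        let slot = *scoped.get_or_insert_with(|| agent.scope(value));
        // ... a call that may trigger garbage collection would go here ...
        // Re-read the value from the root list afterwards.
        value = agent.get(slot);
    }
    value
}
```

With zero slow paths the root list stays empty, and with any number of slow paths it holds exactly one entry, which is the property the Option<Scoped<T>> pattern is after.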
Changing values
In loops especially, it may happen that a particular value needs to be scoped for a short period of time and then is no longer needed. In loops specifically, another value will then naturally appear on the next loop iteration and needs to again be scoped for the same short period of time.
In these cases, you'd want to avoid allocating new value data onto the heap in
each loop iteration. For this purpose, Scoped has the unsafe fn replace
function. To use this function, you'll want to prepare a Scoped outside the loop
and call the replace function on it when it is time to scope your value within
the loop. Preparing the Scoped value outside the loop may be done normally using
value.scope(agent, gc.nogc()), but for Scoped<Value> specifically, you can also
use Value::Undefined.scope_static(). This prepares only the stack data for the
Scoped and enables calling the replace function later.
let scoped_value = Value::Undefined.scope_static();
loop {
    let v: Value;
    // SAFETY: scoped_value is never shared.
    unsafe { scoped_value.replace(agent, v.unbind()) };
    gc.reborrow();
    let value = scoped_value.get(agent).bind(gc.nogc());
}
The replace function is marked unsafe not because it can cause undefined
behaviour (it currently cannot) but because it can break the JavaScript virtual
machine's internal logic. If a scoped value is cloned and then passed as a
parameter to a call that calls replace on the clone, then get on the original
will return a (JavaScript-wise) different value than was originally scoped.
Thus, calling the function on any Scoped that was received as a parameter should
generally be avoided unless you're very sure that it is not a clone, or that no
other caller will call get on the value anymore.
Note: The API is not guaranteed to never perform new heap allocations for scoping. If the API is called on the same scoped value repeatedly, alternating between scoping a stack-only value (like a small integer or string) and scoping a heap value (like an object), then each stack-only call forgets the scoped heap allocation without explicitly releasing it, so each heap value call needs to allocate a new one.
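Both the slot reuse and the clone hazard can be seen in a small toy model. Everything here is hypothetical and simplified: ToyAgent, ToyScoped, empty, sum_with_reused_slot and observe_clone_hazard are invented for the sketch, the toy replace is a safe function, and the stack-only versus heap value distinction from the note above is not modelled at all.

```rust
/// Toy root list standing in for the heap storage behind scoped values.
/// Hypothetical model; Nova's real `Agent` and `Scoped` work differently.
struct ToyAgent {
    roots: Vec<i64>,
}

/// Handle that may or may not own a root slot yet.
#[derive(Clone)]
struct ToyScoped {
    slot: Option<usize>,
}

impl ToyScoped {
    /// Stack-only handle without a slot, loosely mirroring `scope_static`.
    fn empty() -> Self {
        ToyScoped { slot: None }
    }

    /// Overwrites the held value, allocating a slot only on first use.
    fn replace(&mut self, agent: &mut ToyAgent, value: i64) {
        match self.slot {
            Some(index) => agent.roots[index] = value,
            None => {
                agent.roots.push(value);
                self.slot = Some(agent.roots.len() - 1);
            }
        }
    }

    /// Reads the held value, if a slot has been allocated.
    fn get(&self, agent: &ToyAgent) -> Option<i64> {
        self.slot.map(|index| agent.roots[index])
    }
}

/// Scopes each input for one iteration while reusing a single slot.
fn sum_with_reused_slot(agent: &mut ToyAgent, inputs: &[i64]) -> i64 {
    let mut scoped = ToyScoped::empty();
    let mut total = 0;
    for &input in inputs {
        scoped.replace(agent, input);
        // ... a call that may trigger garbage collection would go here ...
        total += scoped.get(agent).expect("slot was filled by replace");
    }
    total
}

/// Shows why replacing through a clone is hazardous: the original sees it.
fn observe_clone_hazard(agent: &mut ToyAgent) -> Option<i64> {
    let mut original = ToyScoped::empty();
    original.replace(agent, 1);
    let mut copy = original.clone();
    copy.replace(agent, 2);
    // The original handle now reads the value written through the clone.
    original.get(agent)
}
```

Summing any number of inputs leaves exactly one entry in the toy root list, and observe_clone_hazard returns the value written through the clone rather than the one originally scoped.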
Changing types
Sometimes a value needs to be scoped, but is later checked to be of a different type or converted to a different type, with the original value never used afterwards, and then needs to be scoped again. To avoid a new allocation, you'd want to reuse the previous scoped value, but it is of a different type and won't work.
In these cases, you can use the unsafe fn replace_self to change both the value
and the type of what is being stored at the same time. The safety condition for
this function is the same as unsafe fn replace above: The scoped value should
not be a clone passed in as a parameter or if it is, you must be very sure that
it's not going to be used by other callers. Or, you must be very sure that the
actual value that is being stored by replace_self isn't actually functionally
different. For example, converting to another bindable type via TryFrom kind of
type conversions does not change the value but only narrows the type through a
check. In these cases, calling replace_self on a cloned Scoped does not pose any
threat to the JavaScript virtual machine's internal logic.
let value: Value;
let scoped_value = value.scope(agent, gc.nogc());
gc.reborrow();
let Ok(object) = Object::try_from(scoped_value.get(agent).bind(gc.nogc())) else {
    // Check failed.
    return;
};
// SAFETY: Value isn't being changed, only type gets narrowed. Even clone is fine.
let scoped_object = unsafe { scoped_value.clone().replace_self(agent, object.unbind()) };
You can of course also use this API to change entirely unrelated scoped values to others, but in that case the safety requirements need to be fulfilled.
let value: Value;
let scoped_value = value.scope(agent, gc.nogc());
gc.reborrow();
let object: Object; // Not connected with value.
// SAFETY: scoped_value is never cloned and not received as parameter, i.e. not
// shared. We can take over its scoped slot without anyone seeing it.
let scoped_object = unsafe { scoped_value.replace_self(agent, object.unbind()) };
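For intuition on how a handle can change its type while keeping its slot, here is a toy model. It is only a sketch under assumptions: ToyAgent, ToyValue, ToyObject, the Rootable trait, ToyScoped and narrow_to_object are all hypothetical, and the toy replace_self is a safe function that consumes the handle, so the sharing hazard described above does not arise here. The type parameter is tracked with PhantomData from the standard library.

```rust
use std::marker::PhantomData;

/// Toy stand-in for a JavaScript value. Hypothetical; not Nova's `Value`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyValue {
    Number(f64),
    Object(u32),
}

/// Narrowed type that only ever wraps an object id.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyObject(u32);

/// Types that can be stored in, and read back from, a root slot.
trait Rootable: Sized {
    fn to_value(self) -> ToyValue;
    fn from_value(value: ToyValue) -> Option<Self>;
}

impl Rootable for ToyValue {
    fn to_value(self) -> ToyValue {
        self
    }
    fn from_value(value: ToyValue) -> Option<Self> {
        Some(value)
    }
}

impl Rootable for ToyObject {
    fn to_value(self) -> ToyValue {
        ToyValue::Object(self.0)
    }
    fn from_value(value: ToyValue) -> Option<Self> {
        match value {
            ToyValue::Object(id) => Some(ToyObject(id)),
            ToyValue::Number(_) => None,
        }
    }
}

/// Toy root list. Hypothetical; Nova's real `Agent` works differently.
struct ToyAgent {
    roots: Vec<ToyValue>,
}

/// Typed handle to a root slot; `T` records what the slot holds.
struct ToyScoped<T> {
    index: usize,
    marker: PhantomData<T>,
}

impl<T: Rootable> ToyScoped<T> {
    /// Allocates a new slot holding `value`.
    fn new(agent: &mut ToyAgent, value: T) -> Self {
        agent.roots.push(value.to_value());
        ToyScoped { index: agent.roots.len() - 1, marker: PhantomData }
    }

    /// Reads the slot back as a `T`.
    fn get(&self, agent: &ToyAgent) -> T {
        T::from_value(agent.roots[self.index]).expect("slot holds a T")
    }

    /// Consumes the handle, stores `value` in the same slot and retypes it.
    fn replace_self<U: Rootable>(self, agent: &mut ToyAgent, value: U) -> ToyScoped<U> {
        agent.roots[self.index] = value.to_value();
        ToyScoped { index: self.index, marker: PhantomData }
    }
}

/// Narrows a scoped value to a scoped object without allocating a new slot.
fn narrow_to_object(
    agent: &mut ToyAgent,
    scoped: ToyScoped<ToyValue>,
) -> Result<ToyScoped<ToyObject>, ToyScoped<ToyValue>> {
    match scoped.get(agent) {
        ToyValue::Object(id) => Ok(scoped.replace_self(agent, ToyObject(id))),
        ToyValue::Number(_) => Err(scoped),
    }
}
```

Narrowing a scoped object value yields a ToyScoped<ToyObject> that reads back the same object, and the root list still has a single entry for it. A failed check hands the original handle back unchanged.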
The End
We're at the end, for now. But this is a guide and a guide is no good if it doesn't get updated. So, hopefully more will be added later. So long.