Sugar, spice, and everything nice... Part 1

Published by
Aapo Alasuutari

In the "Why build a JavaScript engine?" blog post I mentioned that "we do have an idea, a new spin on the ECMAScript specification." It is time to talk about that idea, or actually the first idea of many. This is the idea of the "backing object".

What makes an object?

The ECMAScript specification defines two main types of objects: ordinary objects and exotic objects. An ordinary object is defined as an "object that has the default behaviour for the essential internal methods that must be supported by all objects". Any object that is not an ordinary object, meaning that it has non-default behaviour for one or more of its internal methods, is thus an exotic object.

These definitions may not tell you much if you've never delved into the ECMAScript specification too deeply. When you run your average JavaScript code, you deal in both ordinary and exotic objects. Here are some examples:

const obj = {}; // an ordinary object
const func = () => {}; // an ordinary object
const func2 = func.bind(null); // an exotic object
const arr = []; // an exotic object
const map = new Map(); // an ordinary object
const ab = new ArrayBuffer(); // an ordinary object
const ta = new Uint8Array(); // an exotic object
const dv = new DataView(ab); // an ordinary object

This seems to be rather confusing: Why is Uint8Array an exotic object but the ArrayBuffer "within" it is ordinary? And, if an ArrayBuffer is "ordinary" then how can it contain the memory buffer for typed arrays to use but {} cannot? To solve this mystery, we must introduce the concept of "internal slots". All ordinary objects have [[Prototype]] and [[Extensible]] slots. The first defines what the object's current prototype is (accessible using Object.getPrototypeOf(obj)), and the second defines if the object accepts new properties (accessible using Object.isExtensible(obj)).
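The two ordinary-object slots can be observed, and changed, from plain JavaScript. The following is a small sketch of that; the variable names are only illustrative.

```javascript
// A plain object: both ordinary-object slots are visible through reflection.
const plain = {};
const protoOfPlain = Object.getPrototypeOf(plain); // reads [[Prototype]]
const plainIsExtensible = Object.isExtensible(plain); // reads [[Extensible]]

console.log(protoOfPlain === Object.prototype); // true
console.log(plainIsExtensible); // true

// Both slots can also be changed from JavaScript.
const sealedOff = Object.preventExtensions({}); // sets [[Extensible]] to false
const bare = Object.setPrototypeOf({}, null); // sets [[Prototype]] to null

console.log(Object.isExtensible(sealedOff)); // false
console.log(Object.getPrototypeOf(bare)); // null
```

Nothing comparable exists for the extra slots discussed next: there is no reflection API that hands you the memory behind an ArrayBuffer or the entries of a Map as a slot value.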

Some objects have additional internal slots. Let's take a look at the above examples but now mark those extra internal slots as well.

const obj = {}; // an ordinary object
const func = () => {}; // an ordinary object, extra internal slots
const func2 = func.bind(null); // an exotic object, extra internal slots
const arr = []; // an exotic object, extra internal slots
const map = new Map(); // an ordinary object, extra internal slots
const ab = new ArrayBuffer(); // an ordinary object, extra internal slots
const ta = new Uint8Array(); // an exotic object, extra internal slots
const dv = new DataView(ab); // an ordinary object, extra internal slots

This starts to make more intuitive sense: Plain JavaScript objects are ordinary objects with no extra internal slots. All other objects are either exotic or have some extra internal slots; we can lump these together as "sort of exotic objects" if you will. For these "sort of exotic objects", their extra internal slots are what gives them their awesome powers: It's their sugar, spice, and everything nice!

What's that objéct?

We now understand that plain objects are separated from all more "interesting" types of objects at the specification level by their differing internal methods (remember, exotic objects have non-default internal method behaviours) and/or internal slots.

With this basic idea in mind, let's play a little guessing game: You are an ECMAScript engine, and your job is to evaluate the first few steps of the following expressions. Try to think, what is it that you are looking for in each object: Is it an object property you need, or are you looking for something that is hidden from JavaScript code (that is, an internal slot)?

Silhouette #1

Let's start off small, just a function call:

func();

Now what do you, the engine, do first? Yes, correct! First you need to check that func is callable. At a JavaScript programmer level this could be done with typeof func === "function" but the engine has an internal way to check: It is the presence of the [[Call]] internal method on the object.

The next step would be to go find the function's source code and start executing that. (There are some setup steps but let's not get bogged down in the details.) Again from a JavaScript programmer's perspective you might know that you can get a function's source text using func.toString() but that is of course not what an engine does directly, and the toString function is not a property on the func object but on the Function.prototype object. That function must somehow be accessing the necessary information through the this parameter of the toString call, that is from the func object.

That information cannot be accessed from the func object within JavaScript. Thus we must conclude that the data must be held in some internal slot. Functions usually only have length and name properties, and even those can be deleted without affecting the functionality of the function object. Calling a function clearly does not rely on the object features of the function.
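To see this from the JavaScript side, here is a short sketch that strips an arrow function of its own properties and then calls it anyway. The function `add` is just an example.

```javascript
const add = (a, b) => a + b;

console.log(add.length, add.name); // 2 add

// Both properties are configurable, so deleting them succeeds.
const deletedLength = delete add.length;
const deletedName = delete add.name;
console.log(Object.getOwnPropertyNames(add)); // []

// The function is still callable and still knows its own source text.
console.log(typeof add); // function
console.log(add(2, 3)); // 5
console.log(Function.prototype.toString.call(add)); // (a, b) => a + b
```

With no own properties left, everything that still works here has to come from somewhere JavaScript code cannot reach directly.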

Silhouette #2

We'll continue with simple things, this time an ArrayBuffer being used as a parameter for a Uint8Array construction:

new Uint8Array(ab);

What do you, as an engine, do first? Yes, you check that the parameter is indeed an ArrayBuffer (let's ignore other parameter types). But how? You could check the prototype of ab but that is not guaranteed to be anything you expect: You can set the prototype of an ArrayBuffer to null and it will still work as a parameter to a TypedArray constructor as normal. If you look at ab's property descriptors using Object.getOwnPropertyDescriptors(ab) you will find there are none.

Yet an engine still knows an ArrayBuffer to be an ArrayBuffer: This must again be based on internal slots. What makes an ArrayBuffer an ArrayBuffer has nothing to do with its prototype or properties, but only with its internal slots.
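The claims above are easy to check. This sketch removes the prototype of an ArrayBuffer and still uses it to construct a Uint8Array; the names `buffer` and `view` are illustrative.

```javascript
const buffer = new ArrayBuffer(8);

// An ArrayBuffer has no own properties; byteLength is a getter on the prototype.
console.log(Reflect.ownKeys(buffer).length); // 0

// Remove the prototype entirely.
Object.setPrototypeOf(buffer, null);
console.log(buffer.byteLength); // undefined

// The TypedArray constructor still recognises it as an ArrayBuffer.
const view = new Uint8Array(buffer);
view[0] = 42;
console.log(view.length); // 8
console.log(view.buffer === buffer); // true

// The borrowed getter also still works, because it reads an internal slot of `this`.
const byteLengthGetter = Object.getOwnPropertyDescriptor(
  ArrayBuffer.prototype,
  "byteLength"
).get;
console.log(byteLengthGetter.call(buffer)); // 8
```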

Silhouette #3

Now it's time to rumble! Let's look at indexing into an Array:

arr[0];

Now, we're finally on traditional footing here. This is object property access plain and simple, right? You could have arr be an Array or a plain object and the code and the behaviour would be one and the same, right? From a specification standpoint you would be exactly correct, but: From a usage standpoint, the purpose of an Array is to act as a linear, access-by-index collection of values. An object's purpose is to act as something less defined, an access-by-name collection of values.

Most, if not all, production engines make no distinction between an object and an Array (except for the specification-mandated handling of the length property). Both can hold indexed properties and named properties, and both store properties in the same way: Indexed properties go into an "elements" storage, and named properties go into a "properties" storage.

But now consider the indexing into an Array and forget that objects already have an "elements" storage in V8 and SpiderMonkey. Accessing the 0'th index in an Array does not depend on the properties of the Array, nor does it depend on the prototype of the Array (unless there is no 0'th entry). The length of an Array also tells us immediately if we are within the possible range of entries, or if we are accessing outside the bounds of the Array. Again we find that from a concrete usage standpoint, an Array does not depend on its object features.
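A short sketch of the behaviour described above: index access looks the same on an Array and on a plain object, the length handling is where they differ, and the prototype only matters when there is no own entry. All names are illustrative.

```javascript
const list = ["a", "b"];
const lookalike = { 0: "a", 1: "b", length: 2 };

// Index access behaves the same on both.
console.log(list[0] === lookalike[0]); // true

// The specification-mandated difference: an Array's length tracks indexed writes.
list[5] = "f";
lookalike[5] = "f";
console.log(list.length); // 6
console.log(lookalike.length); // 2

// The prototype is consulted only when the Array has no own entry at that index.
const shadowed = ["own"];
Object.setPrototypeOf(shadowed, { 0: "inherited", 1: "inherited" });
console.log(shadowed[0]); // own
console.log(shadowed[1]); // inherited
console.log(shadowed.length); // 1
```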

What did we learn, reader?

We took a stroll through the underbrush that is objects in ECMAScript and how they are normally used. Now that we are on the other side, it may look like a bit of a clusterfuck. But here's what I want you to take out of this: Ordinary, plain objects are ordinary and plain. Everything beyond that is first and foremost defined by its internal slots, and sometimes by its internal methods or common usage.

This finally brings me to the concept of a "backing object". An ArrayBuffer is first and foremost the byte buffer it carries within it. The ECMAScript specification would have you believe that your ArrayBuffer should look like this:

struct ArrayBuffer {
    /// [[Prototype]]
    prototype: ObjectOrNull,
    /// [[Extensible]]
    extensible: bool,
    /// Object property storage
    properties: Properties,
    /// [[ArrayBufferData]]
    data: ArrayBufferData,
    /// [[ArrayBufferByteLength]]
    byte_length: usize,
    /// [[ArrayBufferDetachKey]]
    detach_key: Any,
}

You can then clean this up by putting the "common parts" into an ObjectBase struct that you share between all special object types. This seems entirely reasonable but there's a problem: The first half of the ArrayBuffer is meaningless fluff! The prototype is not needed for the ArrayBuffer to function! Its extensibility is basically of no concern! And assigning properties is likewise very rarely if ever done!

What we've done here is waste a ton of good memory for something that ought to have been one of the more efficient and performant building blocks in the ECMAScript specification. As a concrete example, an empty ArrayBuffer in Node.js takes 80 bytes (with pointer compression it's probably quite a bit less). If we assume that the data is a raw pointer and that the detach_key is likewise one, the actual ArrayBuffer parts take up 24 bytes on a 64-bit machine, that is three 8-byte fields (32 bytes if we account for growable ArrayBuffers, which V8 does). More than half the object size is taken up by things that are never used.

What can we do to improve this? Let's get rid of the fluff! Here is what we'll do: We rip out the ObjectBase and put it on the side somewhere. Then, we use some pointer-tagging magic to get a two-variant pointer crammed into a single pointer-sized slot. These variants will be the ObjectBase pointer variant and the Realm pointer variant. Our ArrayBuffer struct then becomes this:

struct ArrayBuffer {
    /// "[[BackingObject]]"
    backing_object: ObjectBaseOrRealm,
    /// [[ArrayBufferData]]
    data: ArrayBufferData,
    /// [[ArrayBufferByteLength]]
    byte_length: usize,
    /// [[ArrayBufferDetachKey]]
    detach_key: Any,
}

Suddenly that looks a lot nicer. The backing_object is now a tagged pointer that either points to an ObjectBase struct, or it points to an ECMAScript Realm. The size of the entire struct is now only 32 bytes. It grows to 40 if we support growable buffers with the same struct.

Now for how this works. Initially, when a new ArrayBuffer is created, the backing_object points to the Realm that created it. Later, if a property is assigned into the ArrayBuffer object or its prototype is changed from the default %ArrayBuffer.prototype% (of the pointed-to Realm), a new ObjectBase is allocated (in the proper Realm) and the backing_object pointer is set to point to that struct.

But: No one does that, for the most part. And if they do, they are likely using those additional properties or the changed prototype only rarely. They are probably using the ArrayBuffer's object features because it is convenient to be able to treat it as an object in addition to it being an ArrayBuffer, not purely for the object features themselves. Those extra properties are still a secondary concern to the actual ArrayBuffer usage; if they were not, a plain object would have been used instead.
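The lazy allocation described above can be modelled in a few lines. This is only a toy model written in JavaScript, not engine code: the classes Realm, ObjectBase and ArrayBufferRecord and all of their methods are hypothetical stand-ins, and an instanceof check takes the place of the pointer tag.

```javascript
// Toy stand-ins for engine-internal structures; every name here is hypothetical.
class Realm {
  constructor(name) {
    this.name = name;
    this.arrayBufferPrototype = { realm: name }; // stands in for %ArrayBuffer.prototype%
  }
}

class ObjectBase {
  constructor(prototype) {
    this.prototype = prototype; // [[Prototype]]
    this.extensible = true; // [[Extensible]]
    this.properties = new Map(); // object property storage
  }
}

class ArrayBufferRecord {
  constructor(realm, byteLength) {
    this.backingObject = realm; // starts out pointing at the creating Realm
    this.byteLength = byteLength; // [[ArrayBufferByteLength]]
  }

  hasBackingObject() {
    return this.backingObject instanceof ObjectBase;
  }

  getPrototype() {
    return this.hasBackingObject()
      ? this.backingObject.prototype
      : this.backingObject.arrayBufferPrototype;
  }

  // Allocate the ObjectBase on first use, seeded with the Realm's default prototype.
  ensureBackingObject() {
    if (!this.hasBackingObject()) {
      this.backingObject = new ObjectBase(this.backingObject.arrayBufferPrototype);
    }
    return this.backingObject;
  }

  setPrototype(prototype) {
    if (prototype === this.getPrototype()) return; // no change, so no allocation
    this.ensureBackingObject().prototype = prototype;
  }

  setOwnProperty(key, value) {
    this.ensureBackingObject().properties.set(key, value);
  }

  getOwnProperty(key) {
    return this.hasBackingObject() ? this.backingObject.properties.get(key) : undefined;
  }
}

const realm = new Realm("main");
const record = new ArrayBufferRecord(realm, 16);
console.log(record.hasBackingObject()); // false

record.setPrototype(realm.arrayBufferPrototype); // same as the default
console.log(record.hasBackingObject()); // false

record.setOwnProperty("label", "scratch"); // first property triggers allocation
console.log(record.hasBackingObject()); // true
console.log(record.getOwnProperty("label")); // scratch
console.log(record.getPrototype() === realm.arrayBufferPrototype); // true
```

The common path, where nobody touches the object features, never allocates an ObjectBase at all.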

So, this is what we do: For every exotic object and for every object with additional internal slots, we replace the ordinary object internal slots and property storage with an ObjectBaseOrRealm pointer. Let's take a look at some examples.

Here's what an Array looks like after this transformation:

struct Array {
    /// "[[BackingObject]]"
    backing_object: ObjectBaseOrRealm,
    /// Pointer to the elements backing store
    elements: Elements,
    /// Length of the Array
    length: u32,
}

And this is what a TypedArray like Uint8Array looks like:

struct TypedArray {
    /// "[[BackingObject]]"
    backing_object: ObjectBaseOrRealm,
    /// [[ViewedArrayBuffer]]
    viewed_array_buffer: ArrayBufferPointer,
    /// [[TypedArrayName]]
    typed_array_name: StringPointer,
    /// [[ContentType]]
    content_type: TypedArrayContentType,
    /// [[ByteLength]]
    byte_length: usize,
    /// [[ByteOffset]]
    byte_offset: usize,
    /// [[ArrayLength]]
    array_length: usize,
}

You probably get the point: The object features of specialized objects disappear. A function only carries in it those parts that it must for function calling to work. All else is delegated to the backing object. The end result is a slim engine where most of the time your JavaScript objects are only the things you need them to be, and nothing more.