Reducing the pressure on the garbage collector by using the F12 developer bar of Internet Explorer 11

As you may know, I’m working on a 3D engine for WebGL (Babylon.js) in my spare time. A 3D engine is a place where matrices, vectors and quaternions live. And there may be tons of them!

Please note that everything done here applies to Internet Explorer 11 and Windows 8.1 apps developed with HTML5/JavaScript.

Removing unnecessary instantiations

For instance, let’s have a look at this scene:

Using the F12 developer bar, you can launch a profiler to analyze where your page spends its time. The profiler has a Start/Stop button to capture a period of time, after which it gives you this screen:

The drawElements function is the function that actually renders the objects, so it is no surprise to see it first. The second one (multiply) multiplies two matrices. This function is used a lot: for each object, you have to compute the matrix required to draw it, the matrix to compute the position of every texture, and so on.

multiply is used more than 12000 times during a period of 2 seconds!
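To put that number in perspective, here is a rough back-of-the-envelope calculation. The 60 frames per second figure is an assumption made for illustration; it was not measured in the capture.

```javascript
// Figures from the profiler capture: multiply is called more than
// 12000 times, so 12000 is used here as a lower bound.
var calls = 12000;
var captureSeconds = 2;

// Assumption: the scene renders at 60 frames per second.
var assumedFps = 60;

var callsPerSecond = calls / captureSeconds;     // 6000
var callsPerFrame = callsPerSecond / assumedFps; // 100

console.log(callsPerSecond + " matrices allocated per second");
console.log(callsPerFrame + " matrices allocated per frame at " + assumedFps + " fps");
```

Since each call creates one new matrix, that is at least 6000 short-lived objects per second for this single function.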

Here is the code for this function:

BABYLON.Matrix.prototype.multiply = function (other) {
    var result = new BABYLON.Matrix();

    result.m[0] = this.m[0] * other.m[0] + this.m[1] * other.m[4] + this.m[2] * other.m[8] + this.m[3] * other.m[12];
    result.m[1] = this.m[0] * other.m[1] + this.m[1] * other.m[5] + this.m[2] * other.m[9] + this.m[3] * other.m[13];
    result.m[2] = this.m[0] * other.m[2] + this.m[1] * other.m[6] + this.m[2] * other.m[10] + this.m[3] * other.m[14];
    result.m[3] = this.m[0] * other.m[3] + this.m[1] * other.m[7] + this.m[2] * other.m[11] + this.m[3] * other.m[15];

    result.m[4] = this.m[4] * other.m[0] + this.m[5] * other.m[4] + this.m[6] * other.m[8] + this.m[7] * other.m[12];
    result.m[5] = this.m[4] * other.m[1] + this.m[5] * other.m[5] + this.m[6] * other.m[9] + this.m[7] * other.m[13];
    result.m[6] = this.m[4] * other.m[2] + this.m[5] * other.m[6] + this.m[6] * other.m[10] + this.m[7] * other.m[14];
    result.m[7] = this.m[4] * other.m[3] + this.m[5] * other.m[7] + this.m[6] * other.m[11] + this.m[7] * other.m[15];

    result.m[8] = this.m[8] * other.m[0] + this.m[9] * other.m[4] + this.m[10] * other.m[8] + this.m[11] * other.m[12];
    result.m[9] = this.m[8] * other.m[1] + this.m[9] * other.m[5] + this.m[10] * other.m[9] + this.m[11] * other.m[13];
    result.m[10] = this.m[8] * other.m[2] + this.m[9] * other.m[6] + this.m[10] * other.m[10] + this.m[11] * other.m[14];
    result.m[11] = this.m[8] * other.m[3] + this.m[9] * other.m[7] + this.m[10] * other.m[11] + this.m[11] * other.m[15];

    result.m[12] = this.m[12] * other.m[0] + this.m[13] * other.m[4] + this.m[14] * other.m[8] + this.m[15] * other.m[12];
    result.m[13] = this.m[12] * other.m[1] + this.m[13] * other.m[5] + this.m[14] * other.m[9] + this.m[15] * other.m[13];
    result.m[14] = this.m[12] * other.m[2] + this.m[13] * other.m[6] + this.m[14] * other.m[10] + this.m[15] * other.m[14];
    result.m[15] = this.m[12] * other.m[3] + this.m[13] * other.m[7] + this.m[14] * other.m[11] + this.m[15] * other.m[15];

    return result;
};

It is a bit brute-force, but there is nothing complex about it.

Things get crazy when you use the F12 developer bar to track the responsiveness of your page:

As you can see, the garbage collector (orange bars) is called very often! This is not a good thing because it can lead to visual glitches when the rendering of your frames is interrupted.

On the same screen, you can also have more details:

This capture shows an important thing: the garbage collector runs on a background thread (12516), which is really good because it frees time for the render thread. You can see that the garbage collector runs simultaneously with the animation frame callback (babylon.js uses requestAnimationFrame to render each frame).

Even if the garbage collector of IE11 runs on a background thread, we have to reduce the memory pressure. This is because our code may run on low-end hardware where extra threads are not available, or in a browser that does not have a background garbage collector.

So as much as you can, avoid instantiation (the new BABYLON.Matrix() here) in code that is called very often. Prefer reusing objects instead of creating new ones. The updated multiply function can then be:

BABYLON.Matrix.prototype.multiplyToRef = function (other, result) {
    // result is a matrix that receives the product; it must not be this or other
    result.m[0] = this.m[0] * other.m[0] + this.m[1] * other.m[4] + this.m[2] * other.m[8] + this.m[3] * other.m[12];
    result.m[1] = this.m[0] * other.m[1] + this.m[1] * other.m[5] + this.m[2] * other.m[9] + this.m[3] * other.m[13];
    result.m[2] = this.m[0] * other.m[2] + this.m[1] * other.m[6] + this.m[2] * other.m[10] + this.m[3] * other.m[14];
    result.m[3] = this.m[0] * other.m[3] + this.m[1] * other.m[7] + this.m[2] * other.m[11] + this.m[3] * other.m[15];

    result.m[4] = this.m[4] * other.m[0] + this.m[5] * other.m[4] + this.m[6] * other.m[8] + this.m[7] * other.m[12];
    result.m[5] = this.m[4] * other.m[1] + this.m[5] * other.m[5] + this.m[6] * other.m[9] + this.m[7] * other.m[13];
    result.m[6] = this.m[4] * other.m[2] + this.m[5] * other.m[6] + this.m[6] * other.m[10] + this.m[7] * other.m[14];
    result.m[7] = this.m[4] * other.m[3] + this.m[5] * other.m[7] + this.m[6] * other.m[11] + this.m[7] * other.m[15];

    result.m[8] = this.m[8] * other.m[0] + this.m[9] * other.m[4] + this.m[10] * other.m[8] + this.m[11] * other.m[12];
    result.m[9] = this.m[8] * other.m[1] + this.m[9] * other.m[5] + this.m[10] * other.m[9] + this.m[11] * other.m[13];
    result.m[10] = this.m[8] * other.m[2] + this.m[9] * other.m[6] + this.m[10] * other.m[10] + this.m[11] * other.m[14];
    result.m[11] = this.m[8] * other.m[3] + this.m[9] * other.m[7] + this.m[10] * other.m[11] + this.m[11] * other.m[15];

    result.m[12] = this.m[12] * other.m[0] + this.m[13] * other.m[4] + this.m[14] * other.m[8] + this.m[15] * other.m[12];
    result.m[13] = this.m[12] * other.m[1] + this.m[13] * other.m[5] + this.m[14] * other.m[9] + this.m[15] * other.m[13];
    result.m[14] = this.m[12] * other.m[2] + this.m[13] * other.m[6] + this.m[14] * other.m[10] + this.m[15] * other.m[14];
    result.m[15] = this.m[12] * other.m[3] + this.m[13] * other.m[7] + this.m[14] * other.m[11] + this.m[15] * other.m[15];
};

Almost the same thing, but without the instantiation! And removing 6000 instantiations per second can be a great optimization!

The point here is that you have to create a storage matrix for each operation (the matrix is created once inside the constructor and reused each time the multiply operation is needed).
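Here is a minimal sketch of that pattern. The Matrix and Mesh types below are hypothetical stand-ins written for this example, not the real babylon.js classes, and the allocation counter exists only to make the effect visible.

```javascript
// Hypothetical 4x4 matrix type (16 values, stored row by row), starts as identity.
function Matrix() {
    Matrix.allocations++;
    this.m = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1];
}
Matrix.allocations = 0; // counts constructor calls

// Writes this * other into result. result must not be this or other.
Matrix.prototype.multiplyToRef = function (other, result) {
    for (var row = 0; row < 4; row++) {
        for (var col = 0; col < 4; col++) {
            var sum = 0;
            for (var k = 0; k < 4; k++) {
                sum += this.m[row * 4 + k] * other.m[k * 4 + col];
            }
            result.m[row * 4 + col] = sum;
        }
    }
};

// Hypothetical mesh: the storage matrix is created once, in the constructor.
function Mesh(parentWorld) {
    this._localMatrix = new Matrix();
    this._parentWorld = parentWorld;
    this._worldMatrix = new Matrix(); // reused by every computeWorldMatrix call
}

Mesh.prototype.computeWorldMatrix = function () {
    this._localMatrix.multiplyToRef(this._parentWorld, this._worldMatrix);
    return this._worldMatrix;
};

var parentWorld = new Matrix();
parentWorld.m[12] = 5;
var mesh = new Mesh(parentWorld);
mesh._localMatrix.m[12] = 2;

// Simulate 1000 frames and check how many matrices were created meanwhile.
var before = Matrix.allocations;
for (var frame = 0; frame < 1000; frame++) {
    mesh.computeWorldMatrix();
}
var allocatedDuringFrames = Matrix.allocations - before;
console.log("Matrices allocated during 1000 frames: " + allocatedDuringFrames);
```

The price is that the caller owns the result: the returned matrix is overwritten by the next call, so anything that needs to keep the value must copy it.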

After doing the same thing for every function working with matrices, vectors, colors and quaternions, the responsiveness graph of babylon.js is far better:

Great, isn’t it? Obviously the garbage collector will still be called, but less frequently.

GC Friendly array object

I also found another way to reduce the memory pressure. During a frame’s render, I have to use a lot of arrays to determine active objects, active shaders, active particles, etc.

So basically at the beginning of every frame, I was using this code:

this._activeMeshes = [];

But even if the code is simple, it has a big impact on memory: a new array is allocated every frame and the previous one becomes garbage. That’s why I decided to create a new kind of array that is able to reuse the initially allocated space:

// Garbage collector friendly array
BABYLON.Tools.GCFriendlyArray = function (capacity) {
    this.data = new Array(capacity);
    this.length = 0;
};

BABYLON.Tools.GCFriendlyArray.prototype.push = function (value) {
    // Grow the backing array when it is full
    if (this.length >= this.data.length) {
        this.data.length *= 2;
    }
    this.data[this.length++] = value;
};

BABYLON.Tools.GCFriendlyArray.prototype.reset = function () {
    // Keep the backing array, only forget how many slots are in use
    this.length = 0;
};

BABYLON.Tools.GCFriendlyArray.prototype.indexOf = function (value) {
    var position = this.data.indexOf(value);

    // Ignore stale values left beyond the current length
    if (position >= this.length) {
        return -1;
    }

    return position;
};

With this small piece of code, you can have an array that can be reset in order to reuse its memory. You can create it with an estimated size and just call reset() to, well, reset it.

Using our new array is like using a standard array: you have a length property and a push function. The only difference is when you want to access data, because you have to use myArray.data:

for (var subIndex = 0; subIndex < activeMeshes.length; subIndex++) {
    activeMeshes.data[subIndex].render();
}
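To show how this fits into a render loop, here is a small self-contained sketch. ReusableArray is a hypothetical copy of the array described above, and the meshes, isVisible and renderCount names are made up for the example.

```javascript
// Hypothetical reusable array with the same shape as the one described above.
function ReusableArray(capacity) {
    this.data = new Array(capacity);
    this.length = 0;
}

ReusableArray.prototype.push = function (value) {
    if (this.length >= this.data.length) {
        this.data.length *= 2;
    }
    this.data[this.length++] = value;
};

ReusableArray.prototype.reset = function () {
    this.length = 0;
};

// Made-up scene content; isVisible stands in for the real visibility test.
var meshes = [
    { name: "ground", isVisible: true, renderCount: 0 },
    { name: "hidden", isVisible: false, renderCount: 0 },
    { name: "sphere", isVisible: true, renderCount: 0 }
];

var activeMeshes = new ReusableArray(16); // created once, outside the loop
var backingArray = activeMeshes.data;     // kept only to check reuse later

function renderFrame() {
    activeMeshes.reset(); // instead of: activeMeshes = [];

    for (var i = 0; i < meshes.length; i++) {
        if (meshes[i].isVisible) {
            activeMeshes.push(meshes[i]);
        }
    }

    for (var j = 0; j < activeMeshes.length; j++) {
        activeMeshes.data[j].renderCount++; // stands in for render()
    }
}

for (var frame = 0; frame < 3; frame++) {
    renderFrame();
}

console.log("Same backing array after 3 frames: " + (activeMeshes.data === backingArray));
```

As long as the estimated capacity is large enough, the backing array never changes and no array is created per frame.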

Some additional notes

Please note that the optimizations described here were useful in my case because I was looking for performance and did not care much about memory consumption. Indeed, I had to use a lot of memory to create the required cached objects.

For instance, the GC friendly arrays have to reserve a lot of memory that will not necessarily be used. The tradeoff between memory and performance must be taken seriously.
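One concrete cost of this tradeoff: because reset() only sets length back to 0, the slots of the backing array still reference whatever was pushed earlier, so those objects stay alive until they are overwritten. The sketch below illustrates this with a hypothetical ReusableArray (same shape as the array above) and a hypothetical release() helper; neither name comes from babylon.js.

```javascript
// Hypothetical reusable array with the same shape as the one described above.
function ReusableArray(capacity) {
    this.data = new Array(capacity);
    this.length = 0;
}

ReusableArray.prototype.push = function (value) {
    if (this.length >= this.data.length) {
        this.data.length *= 2;
    }
    this.data[this.length++] = value;
};

ReusableArray.prototype.reset = function () {
    this.length = 0;
};

// Hypothetical helper: clears every slot so old values can be collected.
ReusableArray.prototype.release = function () {
    for (var i = 0; i < this.data.length; i++) {
        this.data[i] = undefined;
    }
    this.length = 0;
};

var list = new ReusableArray(1024);
list.push({ name: "mesh A" });
list.push({ name: "mesh B" });

var reserved = list.data.length; // slots reserved up front
var used = list.length;          // slots actually in use

list.reset();
var staleAfterReset = list.data[0] !== undefined;   // old object still referenced

list.release();
var staleAfterRelease = list.data[0] !== undefined; // reference dropped

console.log("Reserved " + reserved + " slots, used " + used);
console.log("Stale reference after reset: " + staleAfterReset);
console.log("Stale reference after release: " + staleAfterRelease);
```

Clearing the slots costs a loop over the whole backing array, so it is probably something to do occasionally (for example when a scene is unloaded) rather than every frame.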

Going further

Please find here some great links about the F12 developer bar of IE delivered during Build 2013: