Why JavaScript needs cross-heap collection

Official Construct Team Post
Ashley
  • 21 Jan, 2018

Web Workers allow a page to create multiple JavaScript contexts which run in parallel to each other. This is great for performance and an important way to avoid heavy work janking the page. The memory for each context is usually referred to as a heap. To make Web Workers easier to use, there needs to be a way for garbage collection (GC) to work across heaps. This blog post will outline how I came to this conclusion.

Moving code to Web Workers

In an ideal world, we could conveniently move all of a page's JavaScript code into a Web Worker. JavaScript execution itself would then never jank the page, and the code would be far more amenable to running on multiple CPU cores, which is important for peak performance since almost all modern devices have two or more cores. This is much like the separation between the UI thread and logic thread that other app frameworks use.

However, moving code to a worker is difficult. Workers have significantly fewer capabilities: they can't access the DOM (e.g. to create or modify HTML elements), they can't listen to DOM events (e.g. getting touch/pointer events), and many APIs are missing entirely (e.g. Web Audio, video playback, media streams, WebRTC and more). This means most of the time you can't simply copy-and-paste code to a worker: something will be unavailable and break. You can work around it by posting messages between the worker and the main thread, but that adds a lot of complexity. So the status quo is that pages still run a lot of their JavaScript code on the main thread, with all the performance pitfalls that entails.

Multi-process is not always multi-threaded

Additionally, an important but lesser-known point is that some modern browsers like Chrome are multi-process, but not truly multi-threaded. Normally JavaScript runs on the "main thread" (the thread with access to the DOM). To save on resources, most browsers limit how many processes they can use. A simplified example might be sharing 4 processes between all open tabs, so if you have 20 tabs open, each process manages 5 tabs. However, within each process there may only be one main thread! This means you have multiple tabs all vying for the same main thread, and they can all jank each other by hogging it. Background tabs can be suspended, but this problem also applies to iframes and new windows on the current tab.

Using Web Workers also avoids this problem and allows pages and frames to all run in parallel, with no risk of being held up by other pages or frames.

Using a library to fill the gaps

JavaScript provides a very interesting kind of object called a Proxy. This essentially allows you to intercept normal object operations with callback functions. For example, you can get a callback whenever someone sets or gets a property on the Proxy, and implement the behavior with custom logic.
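As a minimal sketch of what that interception looks like, the snippet below wraps a plain object in a Proxy whose get and set traps record each access before passing it through. The names `target`, `observed` and `log` are made up for illustration.

```javascript
// Records every property read and write made through the Proxy.
const log = [];
const target = { width: 100 };

const observed = new Proxy(target, {
  // Called whenever a property is read through the Proxy.
  get(obj, prop, receiver) {
    log.push(`get ${String(prop)}`);
    return Reflect.get(obj, prop, receiver);
  },
  // Called whenever a property is assigned through the Proxy.
  set(obj, prop, value, receiver) {
    log.push(`set ${String(prop)} = ${value}`);
    return Reflect.set(obj, prop, value, receiver);
  },
});

observed.width = 200;      // triggers the set trap
const w = observed.width;  // triggers the get trap
console.log(w, log);
```

Here the traps only log and forward, but the same hooks could just as well do something else entirely, such as sending the operation to another context.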

It's possible to use Proxies to automatically redirect accesses to another context. For example, Proxies in a Web Worker can create a 'document' object and redirect all calls to the real document on the main thread. Now you can use the full DOM APIs in a worker! Alternatively it can go the other way round: Proxies on the main thread can redirect calls to a Worker, more conveniently offloading heavy processing to a Worker without having to manage a message-passing system.

This is not just a theory: I wrote an experimental library called via.js that uses this approach to give workers full access to DOM APIs. It has real working examples and in many cases you can use identical code in both the main thread and worker. Google also developed Comlink that lets you create and use objects in a worker, more or less just by prefixing calls with the 'await' keyword.

Unfortunately, both libraries currently have a fatal flaw.

The inherent memory leak

The way these libraries work is by creating pairs of objects: a local Proxy, and a remote-controlled object. These objects are in different JavaScript contexts. The Proxy forwards all calls to the real object by posting messages and getting responses back.

The remote context has to remember the association. For example via.js sends messages like "call function on object ID 3", and has to find the real object with that ID. In the case of Comlink, it gives every object pair its own MessageChannel.

JavaScript is garbage-collected so when you're done with a Proxy, all references to it are dropped. At some point the garbage collector notices it has no references and deletes its memory. Unfortunately this does not clean up the remote-controlled object. In the case of via.js there is still an entry in the ID map, and in Comlink the MessageChannel still references the object, so it's held in memory.
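The pairing and the leak can be sketched in a few lines. This is not via.js's actual code: the names (`idToObject`, `registerRemote`, `handleMessage`, `makeProxy`) and the message shapes are hypothetical, and both "contexts" live in one script with messages delivered synchronously, purely so the sketch runs on its own. A real library would use postMessage and return Promises.

```javascript
// "Remote" side: maps object IDs to the real objects (hypothetical names).
const idToObject = new Map();
let nextId = 0;

function registerRemote(obj) {
  const id = nextId++;
  idToObject.set(id, obj);
  return id;
}

// Handles a message such as { type: "call", id: 0, method: "push", args: ["a"] }.
function handleMessage(msg) {
  const obj = idToObject.get(msg.id);
  if (msg.type === "call") return obj[msg.method](...msg.args);
  if (msg.type === "release") return idToObject.delete(msg.id);
}

// "Local" side: a Proxy whose method calls are turned into messages.
// Delivery is a direct function call here; a real library would post the
// message to the other context instead.
function makeProxy(id) {
  return new Proxy({}, {
    get(_target, method) {
      return (...args) => handleMessage({ type: "call", id, method, args });
    },
  });
}

const realArray = [];
const id = registerRemote(realArray);
let proxy = makeProxy(id);
proxy.push("a");
proxy.push("b");

proxy = null;                  // the Proxy is now unreachable and can be collected...
console.log(idToObject.size);  // 1: the map still holds the real array

handleMessage({ type: "release", id });  // only an explicit message frees it
console.log(idToObject.size);  // 0
```

Nothing tells the remote side that the Proxy has gone away, so without the explicit "release" message the map entry, and the object it references, stays alive indefinitely.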

It's currently impossible to work around this. Requiring lots of 'free' calls to be added to code defeats the goal of easily moving code to a worker, and is highly error-prone anyway. Also, garbage collection in JavaScript is not observable. This means you can't get a callback when a Proxy is collected and explicitly send a message to clean it up on the other side as well. Browser engineers are very reluctant to add features that allow this, since it can become a compatibility problem as code ends up depending on the precise timing of GC.

Cross-heap collection

If we don't want to allow pages to observe GC, then there's only one other option: the JS engine itself must know that the Proxy references the real object, even though they're in different contexts. Then, when all references to the Proxy are dropped, the GC knows that the corresponding object in a different heap is also unreferenced and can be collected. Hence, cross-heap collection.

Unfortunately, I'm told engines aren't currently architected for this and it would be very difficult to add. The lowest-hanging fruit is probably to fix Comlink's approach. This only requires one special case in the GC: if both ends of a MessageChannel are unreferenced, then allow the MessageChannel to be collected. This appears to be the absolute minimal feature necessary to properly collect both objects. The downside is that creating a MessageChannel per object likely adds a significant performance overhead, which may discourage moving code to a worker. It would be great to add a low-overhead alternative to ensure worker code is still performant.

I came up with another idea based on a special object called WeakKey, which allows cross-heap collection without making GC observable. This allows library code to explicitly indicate the cross-heap relationship between objects. However this would require architectural upgrades to JavaScript engines.

Finally, we could bite the bullet and make GC observable, so we can send explicit "object was collected" messages. The WeakRef proposal would make this possible, but as mentioned, many browser engineers are sceptical.
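To illustrate the observable-GC route, here is a rough sketch assuming an engine that implements the FinalizationRegistry API from the WeakRef proposal. Everything else is hypothetical: `sendToRemote` stands in for a real postMessage call, and the message shapes are made up. The engine does not guarantee when, or even whether, the cleanup callback runs, which is exactly the timing dependence browser engineers worry about.

```javascript
// Hypothetical stand-in for port.postMessage(); records messages for the demo.
const sent = [];
function sendToRemote(msg) {
  sent.push(msg);
}

// Runs some time after a registered Proxy has been collected.
// The engine does not guarantee when, or even whether, this is called.
function onProxyCollected(id) {
  sendToRemote({ type: "release", id });
}

const registry = new FinalizationRegistry(onProxyCollected);

function makeTrackedProxy(id) {
  const proxy = new Proxy({}, {
    get(_target, method) {
      return (...args) => sendToRemote({ type: "call", id, method, args });
    },
  });
  // The held value is only the ID, so the registry does not keep the Proxy alive.
  registry.register(proxy, id);
  return proxy;
}

let proxy = makeTrackedProxy(3);
proxy.appendChild("node");
proxy = null;  // after a later GC, the engine may call onProxyCollected(3)

// For illustration only: invoke the cleanup by hand to show the message
// the remote side would receive.
onProxyCollected(3);
console.log(sent);
```

The remote side would respond to the "release" message by dropping its own reference, for example by deleting the entry for that ID from its map.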

In the meantime, the only good solution is still to roll your own message posting system. In a lot of cases this is pretty inconvenient, so it's a shame we can't yet build a library to make this easier.
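For a sense of what rolling your own involves, below is a minimal request/response sketch. All of the names (`createLoopbackPair`, `serveCommands`, `createRequester`, the "sum" command) are hypothetical. The loopback pair is a stand-in for a real Worker or MessagePort so the sketch runs in a single context; it delivers messages synchronously, whereas real postMessage delivery is asynchronous.

```javascript
// Stand-in for a Worker/MessagePort pair so the sketch runs in one context.
function createLoopbackPair() {
  const a = { onmessage: null, postMessage: (data) => b.onmessage && b.onmessage({ data }) };
  const b = { onmessage: null, postMessage: (data) => a.onmessage && a.onmessage({ data }) };
  return [a, b];
}

// Worker side: runs the named command and posts the result back,
// tagged with the request ID it arrived with.
function serveCommands(port, handlers) {
  port.onmessage = (event) => {
    const { requestId, command, payload } = event.data;
    port.postMessage({ requestId, result: handlers[command](payload) });
  };
}

// Main-thread side: matches each reply to the Promise that is waiting for it.
function createRequester(port) {
  const pending = new Map();  // requestId -> resolve function
  let nextRequestId = 0;
  port.onmessage = (event) => {
    const { requestId, result } = event.data;
    pending.get(requestId)(result);
    pending.delete(requestId);
  };
  const request = (command, payload) =>
    new Promise((resolve) => {
      const requestId = nextRequestId++;
      pending.set(requestId, resolve);
      port.postMessage({ requestId, command, payload });
    });
  return { request, pending };
}

const [mainPort, workerPort] = createLoopbackPair();
serveCommands(workerPort, {
  sum: (numbers) => numbers.reduce((total, n) => total + n, 0),
});
const { request, pending } = createRequester(mainPort);

request("sum", [1, 2, 3]).then((total) => console.log(total));  // prints 6
```

Even this toy version needs ID bookkeeping on both sides, and every new operation needs its own command and handler, which is the inconvenience a Proxy-based library would hide.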

Conclusion

Libraries could make it far easier to move code to a worker, which would bring great performance benefits and match the standard architecture of many existing UI frameworks. Unfortunately, memory management for these libraries is currently an unsolved problem. There are a variety of possible solutions, but they have various trade-offs or involve a lot of work, so this could take a while to solve. Meanwhile we'll have to keep running code on the main thread, or manage our own message-posting systems.

I filed crbug.com/798855 to follow up on cross-heap collection with MessageChannel. As far as I know other browsers aren't working on this yet, so more bugs need to be filed!
