Quantum Flow Engineering Newsletter #1

A while ago a number of engineers including myself started to look into a performance project that turned into Quantum Flow.  The focus of the project is finding and prioritising the issues across the entire browser so we will need help from many of you to get them fixed.  I’m planning to write regular updates about the project and highlight the focus areas and the ongoing work.  In this first email I’m going to start by giving some background about how we started and where we are now.

Quantum Flow is a performance task force focusing on eliminating performance cliffs in the browser that aren’t part of other Quantum projects.  Project Quantum’s overall focus is to deliver a high performance browser engine, and we are making some great progress on the four main sub-projects that are attacking large portions of the rendering pipeline.  That still leaves us with various performance issues elsewhere in the browser which users may hit, and we have to fix all such issues to ensure that the ultimate result is a next generation browser (and browser engine) we all can be proud of.

A good way to think about how Quantum Flow fits with the rest of the Quantum projects is to imagine it as the foundation that the other projects build upon.  For example, if a bad bug somewhere in the browser causes jank for a few hundred milliseconds, the experience will still be janky despite all of the benefit that we obtain from cooperatively scheduling JS on the page with Quantum DOM, resolving styles in parallel on all of your CPU cores with Quantum CSS, and rasterizing the page directly on the GPU with Quantum Render[0].  We want to ensure that we remove these types of roadblocks that would prevent the rest of Quantum from shining.

The above description may feel a little bit vague, and a little bit too broad, so let me try to explain how the need for Quantum Flow became apparent.  Around the beginning of this year a number of us gathered for a work week in Taipei with the goal of measuring and improving the performance of Firefox on a few large websites that we knew we had performance problems on.  Initially we were only focusing on Google Suite[1], and we started by profiling some of the test cases run by the Hasal framework[2].

We had a bit of a difficult time finding actionable issues that we could improve since these websites are massive and it can be extremely difficult to find out why the overall time of some particular interaction is different when comparing different browsers head to head.  Also, we started seeing some performance issues on those websites that were coming from parts of the browser that were a bit surprising.  For example, Chris Pearce found out that on Google Docs, the content process can be blocked on the parent process for a synchronous IPC message to initialize spell checking[3] even though Google Docs doesn’t use the browser’s spell checking facilities!

Following the breadcrumbs, we started to wonder what else we could learn by profiling more usage scenarios in the browser.  As you may expect, we have found a fair number of performance issues in various parts of the browser.  That’s hardly a surprise given the size and complexity of the code base, but we have also learned a lot about the adverse impact of some of the issues at play here.  These findings have uncovered larger problem areas that we decided we need to address as part of an initiative that we call Quantum Flow.

I’m planning to focus on one important class of performance issues in this first email, mostly because it’s probably the most prevalent of the issues we have been looking at so far: synchronous IPC messages from the content process to the parent process.  We currently have a high number[4] of these types of messages.  Of course, not all of these messages are equally costly, so we have gathered telemetry[5] on them.  We have a tracker bug[6] to track fixing them all.

Some people here may remember the impact of synchronous I/O on the performance of Firefox a few years ago, or you may have had to deal with such performance issues in other applications.  Based on my experience measuring synchronous IPC, I now sometimes miss synchronous I/O.  🙂  I have seen synchronous IPC calls that amount to *seconds* of pause time on the content process’s main thread.  To some extent, with e10s, we hide some of the pauses that happen on the content process.  For example, APZ[7] allows you to scroll even when the content process main thread is busy, and we can force-paint on tab switch when the content process is busy running JS[8].  But eventually some user out there is going to want to interact with the page, and that’s when the input events are going to be handled with a noticeable lag, and the browser is going to behave sluggishly.
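To make the input lag concrete, here is a minimal sketch of a single main thread running tasks back to back.  It is a toy model, not Firefox code: the Task type, the inputDelayMs function and the durations used with it are all made up for illustration, and it assumes that an input event is handled only once the task that is running when it arrives has finished.

```typescript
// Hypothetical model of one task on a content process main thread.
interface Task {
  name: string;
  // Time the main thread is occupied, including any wait on a sync IPC reply.
  durationMs: number;
}

// Tasks run back to back starting at t = 0.  Returns how long an input event
// arriving at `inputArrivalMs` waits before the main thread is free again.
function inputDelayMs(tasks: Task[], inputArrivalMs: number): number {
  let clock = 0;
  for (const task of tasks) {
    const end = clock + task.durationMs;
    if (inputArrivalMs < end) {
      // The event arrived while this task was running (or blocked).
      return end - inputArrivalMs;
    }
    clock = end;
  }
  // The main thread was idle when the event arrived.
  return 0;
}

// Assumed numbers: a task that blocks for 1200 ms on a synchronous reply...
const withSyncIpc: Task[] = [
  { name: "script", durationMs: 5 },
  { name: "blocked on sync IPC", durationMs: 1200 },
  { name: "layout", durationMs: 10 },
];

// ...versus the same work where sending the request takes 1 ms and the reply
// is handled later without blocking.
const withAsyncIpc: Task[] = [
  { name: "script", durationMs: 5 },
  { name: "send async request", durationMs: 1 },
  { name: "layout", durationMs: 10 },
];

console.log(inputDelayMs(withSyncIpc, 100));
console.log(inputDelayMs(withAsyncIpc, 100));
```

In this model, a click that arrives 100 ms in waits for the entire blocked task in the first scenario, while in the second scenario the main thread is already idle by then.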

To give a couple of examples of the really painful performance penalty that we are paying as a result of these synchronous IPC messages, consider the document.cookie API[9] and the window.screen API[10].  Both of these are pretty old APIs in the Web platform which are used by millions of web pages, and we implement them by pausing the content process’s main thread and sending a request to the parent process before we return to the JS code running on the page.  This means that a loop somewhere on a page that accesses document.cookie, for example, can potentially run for several seconds, even as the page is sitting in the background.  In one exceptionally bad scenario that I personally hit on Nightly, an ad iframe in a background page was querying the cookies at a high frequency, and the workload coming from just that one page was effectively making one of my two content processes unusable, as the main thread was almost always busy waiting on the parent process.
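The cost of such a polling loop can be sketched by counting round trips.  The classes below are hypothetical stand-ins, not Firefox internals: ParentProcessStore models a value owned by the parent process where every read is one blocking round trip, and ContentSideCache models one general way of avoiding repeated round trips by keeping a copy on the content side and updating it when the parent pushes a change.  This is only an illustration of the trade-off, not a description of how the bugs above are or will be fixed.

```typescript
// Hypothetical stand-in for a value owned by the parent process.
class ParentProcessStore {
  roundTrips = 0;
  private value = "session=abc";

  // Models a blocking request: each call counts as one sync round trip.
  readSync(): string {
    this.roundTrips++;
    return this.value;
  }

  // Models the parent changing the value and pushing an async notification.
  write(next: string, onChange: (value: string) => void): void {
    this.value = next;
    onChange(next);
  }
}

// Hypothetical content-side copy that pays for at most one round trip.
class ContentSideCache {
  private store: ParentProcessStore;
  private cached: string | null = null;

  constructor(store: ParentProcessStore) {
    this.store = store;
  }

  get(): string {
    if (this.cached === null) {
      this.cached = this.store.readSync();
    }
    return this.cached;
  }

  // Called when the parent notifies us of a change.
  update(next: string): void {
    this.cached = next;
  }
}

// A page polling the value 1000 times, with and without the cache.
const uncachedStore = new ParentProcessStore();
for (let i = 0; i < 1000; i++) {
  uncachedStore.readSync();
}

const cachedStore = new ParentProcessStore();
const cache = new ContentSideCache(cachedStore);
for (let i = 0; i < 1000; i++) {
  cache.get();
}

console.log(uncachedStore.roundTrips, cachedStore.roundTrips);
```

Under these assumptions the uncached loop pays for a blocking round trip on every read, while the cached loop pays for only the first one.  The price of the cache is that the parent has to notify the content side whenever the value changes.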

I’ll stop here, but I’m planning to send these updates regularly, about once a week or so.  In each newsletter, I’ll tell a short story about a performance aspect of the browser that we have been looking at as part of the Quantum Flow project, talk a little about the current focus areas, and include a short section to appreciate the help of all of the engineers who contributed to the Quantum Flow project in the week since the last newsletter.  The performance story in next week’s issue will be about page navigations.

Please let me know if you have any ideas for making the format more useful (preferably off-list).  Making a fast browser is a really important goal for us this year, and I hope you find this newsletter informative and helpful for that goal.  We can always use your help in this project.  Please get in touch with us on #flow on IRC and Bugzilla[11] if you’re wondering how you can help!

[0] Of course, I’m only focusing on performance here.  Quantum Compositor’s benefits are obviously orthogonal to other performance pitfalls.

[1] That is, Google Docs, Google Sheets and Google Slides.

[2] https://github.com/Mozilla-TWQA/Hasal

[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1330912

[4] The full list is available here: https://hg.mozilla.org/mozilla-central/file/tip/ipc/ipdl/sync-messages.ini

[5] The current probe is called IPC_SYNC_LATENCY_MS, but it may soon be renamed in bug 1337073.

[6] https://bugzilla.mozilla.org/show_bug.cgi?id=SyncIPC

[7] Asynchronous panning and zooming, which basically means not blocking on the content process when scrolling, and “checkerboarding” (that is, showing a blank area temporarily) if the content process can’t paint quickly enough.

[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1279086

[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1331680

[10] https://bugzilla.mozilla.org/show_bug.cgi?id=1194751

[11] https://bugzilla.mozilla.org/show_bug.cgi?id=QuantumFlow
