Quantum Flow Engineering Newsletter #4

As promised (with a day of delay), here is an update on what happened in the last two weeks on making Firefox faster as part of the Quantum Flow project.

Last week we had a big work week at the Mozilla Toronto office. Many members of the various teams attended, and the week was packed with planning around the performance issues that have been identified in each area so far, and what we are planning to do in each area for Firefox 57 and beyond. I tried to attend as many of the discussions as I could, but many of them were happening concurrently, so I'm sure a lot of details are going to be missing. Still, here is a very high-level summary of some of the plans that were discussed.

DOM. In the DOM team there are several plans and projects under way which will hopefully bring various performance improvements to the browser. Probably the largest one is the upcoming plan for cooperative scheduling of tasks, which will allow us to interrupt currently executing JavaScript on background pages in order to service tasks belonging to foreground pages. You may have seen patches landing as part of a large effort to label all of our runnables. This is needed so that we can identify how to schedule tasks cooperatively. We are also planning to soon do some work on throttling timeouts running in background pages more aggressively. More details will be announced about all of these projects very soon. Furthermore, we are working on smaller-scale performance improvements in various parts of the DOM module as new performance issues are discovered through various benchmarks.
JavaScript. In the JavaScript team there have been several streams of work aimed at improving different aspects of our JS execution. Jan de Mooij and colleagues have been running the CacheIR project for a while as an attempt to share our inline caches (ICs) between the baseline and Ion JIT layers. This helps with unifying the cases that can be optimized in these JIT layers and has been showing meaningful improvements both on real web pages and benchmarks such as Speedometer. They have also been looking at various opportunistic optimizations that help with performance issues we have identified through profiling. Another line of investigation in the JS team for a while has been looking into this bug. We have some evidence to suggest that our JIT generated code isn't very efficient in terms of CPU instruction cache usage, but so far that investigation hasn't resulted in anything conclusive. Another extensive discussion topic was GC scheduling. Right now our GC (and cycle collection) scheduling is pretty uncoordinated between SpiderMonkey and Gecko, and this can result in pathological cases. For example, SpiderMonkey sometimes doesn't know that a specific time is an unfortunate time to run a long-running GC, and Gecko doesn't have a good way to ask SpiderMonkey to stop an ongoing GC if it detects that now would be a good time to do something else. We're going to start improving this situation by coordinating the scheduling between these two parts of the browser. This is one of those architectural changes that can also have a pretty big impact in the longer term as we find more ways to leverage better coordination. Another topic that was discussed was improving the performance of our XRay wrappers, which provide chrome JS code access to content JS objects. This is important for some front-end code, and also for the performance of some Web Extensions.
Layout. In the Layout team, we are focusing on improving our reflow performance. One challenge that we have in this area is finding which reflow issues are the important ones. We have done some profiling and measurement and identified some issues so far, and we can definitely find more, but it's very hard to know how much optimization is enough, which issues are the important ones, and whether we even know about the important problems. The nature of the reflow algorithm makes it really difficult to get great data about this problem without doing a lot of investigation and analysis work. We talked about some ideas on what we can do to improve our workflows, but nobody seemed to have any million dollar ideas. So in the absence of that, we won't be waiting for the perfect data to arrive and we'll start acting on what we know about for now. Through looking at many reflow profiles, we have also developed some “intuitions” about the kinds of expensive things that typically show up in layout profiles, which we are working on improving. There are also some really bad performance cliffs that we need to try to eliminate.
Graphics. In the Graphics team, we are planning to make some performance improvements to display list construction by retaining display lists and incrementally updating them instead of reconstructing them every time. This is an extremely nice optimization to have, since in my experience display list construction is the bottleneck in many of the cases where we suffer from expensive paints, and it seems like we have telemetry data that confirms this. The Graphics team is also looking into doing some optimizations around frame layer building and display list building based on measurements highlighting places where things could be improved.
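
To make the "retain and incrementally update" idea from the Graphics item a bit more concrete, here is a toy sketch in Python. It is not how Gecko builds display lists; the class `RetainedDisplayList`, the `build_item` callback, and the frame names are all hypothetical and exist only for illustration. The point is simply that an incremental update only pays the build cost for frames that changed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class RetainedDisplayList:
    """Toy retained display list: one cached item per frame id (hypothetical)."""
    build_item: Callable[[str], str]            # stand-in for an expensive per-frame builder
    items: Dict[str, str] = field(default_factory=dict)
    builds: int = 0                             # how many times build_item has run

    def rebuild_all(self, frame_ids: Iterable[str]) -> None:
        """Non-retained approach: discard everything and rebuild every item."""
        self.items = {}
        for frame_id in frame_ids:
            self._build(frame_id)

    def update(self, frame_ids: Iterable[str], dirty: Iterable[str]) -> None:
        """Retained approach: rebuild only dirty or new frames, drop removed ones."""
        frame_ids = list(frame_ids)
        dirty_set = set(dirty)
        live = set(frame_ids)
        for frame_id in list(self.items):
            if frame_id not in live:
                del self.items[frame_id]        # frame went away
        for frame_id in frame_ids:
            if frame_id in dirty_set or frame_id not in self.items:
                self._build(frame_id)

    def as_list(self, frame_ids: Iterable[str]) -> List[str]:
        """Return the cached items in paint order."""
        return [self.items[frame_id] for frame_id in frame_ids]

    def _build(self, frame_id: str) -> None:
        self.items[frame_id] = self.build_item(frame_id)
        self.builds += 1


frames = ["header", "sidebar", "content"]
dl = RetainedDisplayList(build_item=lambda f: f"paint({f})")
dl.rebuild_all(frames)                  # first paint: every item is built
dl.update(frames, dirty=["content"])    # later paint: only the dirty frame is rebuilt
print(dl.builds, dl.as_list(frames))
```

In this sketch the second paint costs one build instead of three. A real implementation also has to deal with invalidation, ordering, and items that depend on each other, which is where most of the difficulty lies.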

Another thing that happened last week was that I gave a talk on Friday as an introduction on how to use the Gecko Profiler to find performance issues in Firefox. During the week a few people had expressed interest in sitting down with me and looking over my shoulder as I use the profiler to analyze some performance problems, and due to the lack of time to sit down with people 1:1 we decided to do a recorded talk. This was decided a few minutes before the talk happened. :-) So, I didn't really have anything prepared in advance, which was both good and bad, since the talk is basically me live profiling Firefox, and it shows how you can start from scratch and go all the way to a bug report. The recording is here: <https://air.mozilla.org/gecko-profiler-introduction/>. If you're interested in learning how to use the profiler and/or how to read and analyze a profile, you may want to check it out! BTW if you ever felt like you could use documentation and/or training material on how to use a profiler (or how to do that more effectively), please feel free to contact me privately with ideas on what you would like to learn. We haven't been so great on the documentation front, and many people have been pointing this out to me, and I hear and acknowledge your feedback! I'd like to try to improve this situation, and your feedback on what you'd like to learn about will help prioritize what's more important.

In hindsight, looking back at the work week, I wish we had more people from the front-end teams attend as well. The planning of the work week happened a couple of months or so earlier, but due to the nature of this project, the measurements lead us to various areas of the code base, and now we have a fair number of issues in the front-end code, so I think perhaps the work week was focused a bit too much on Gecko. But on the flip side, I also found the breadth of topics to cover in one week a bit too much, so perhaps adding more people and more meetings to the agenda wouldn't have been a net benefit. A lesson to learn for the next time!

On the bug fixing front, we had a great couple of weeks. There is a long list of really great work that happened, and a list of really amazing individuals to be recognized and appreciated for their contributions. As always I know I'll be missing a few people, so apologies in advance for that!

Mike Conley made some improvements to our perceived session restore performance by making us smarter about when we update the UI.
Mike Conley also instrumented our tab closing times so that we can have some telemetry data on some huge tab closing improvements that he is planning to work on.
Mike Conley also fixed a recent tab switching performance regression.
And as if none of this was enough, Mike also removed a sync IPC message used when restoring a session!
Markus Stange made sure (really!) that the compositor process on Windows shows up in the profiles captured by the Gecko Profiler.
Shih-Chiang Chien implemented retargeting of Necko data delivery notifications to background threads for the content process. This is an important optimization that avoids a round trip to the main thread for things like feeding data to the HTML parser, which runs off the main thread, when loading a web page. We had this optimization before e10s and it's nice to have it again for the content process now!
Bill McCloskey removed an old expiring telemetry probe which wasn't needed any more and was slow. I don't think anybody is going to miss this old probe!
Florian Quèze improved the performance of the ctrl+tab switcher code.
Dão Gottwald and Jared Wein switched our tab throbbers to use CSS animations. This is super nice since if the main thread of the parent process janks during page load, now the animation of the tab throbber will run on the compositor and can proceed smoothly. Also thanks to Jonathan Watt and Daniel Holbert for providing some SVG help on that bug.
Gijs Kruitbosch massively improved the performance of importing data from Chrome. The importance of changes like this can't be overstated: this is precisely when you don't want to turn potential users away from using Firefox. :-)
David Major found and fixed a quadratic algorithm for insertion of generated content in Gecko.
Felipe Gomes created a tool to help find code that reads the same preferences over and over again which could benefit from being ported to use a preference observer.
Tooru Fujisawa optimized the creation of really short strings in SpiderMonkey. As it turns out, really short strings are super common on the Web. Some further investigation is also ongoing.
Jan de Mooij improved the performance of atomizing strings in SpiderMonkey.
Chris Pearce moved the media cache away from sync IPC.
Kris Maglione optimized some of the core Add-on SDK modules. And some of the code thereabouts as well.
Kartikaya Gupta added some telemetry measures for compositor frame throughput during scrolling/animations.
David Teller reduced the allocation overhead of the performance monitoring code in SpiderMonkey.
Nika Layzell added some telemetry on sync IPC triggered from JavaScript.
Nika Layzell also removed the sync IPC that the permission manager used for its initialization, improving our page navigation speed with multi-e10s, while winning some privacy benefits by reducing the set of permissions that the content process knows about to only those of the web pages the content process has loaded.
Kan-Ru Chen removed all of the sync IPCs used by the screen manager API. This was one of the biggest sync IPC problems that we had and one of the largest sources of pauses of the content process main thread, so it's great to see it finally fixed!
Nicholas Hurley switched a couple of usages of UUIDs in Necko to integers. It turns out that generating UUIDs and converting them to strings in hot code paths can be expensive and should be avoided at all costs!
David Anderson improved some of the sync IPC situation with the compositor. The overall issue is difficult to address and it's great to see the low-hanging fruit being fixed here!
Greg Tatum and Julien Wajsberg improved the profiler UI by adding a context menu that assists in copying information out of the UI, making it unnecessary to have to use devtools to delve into the DOM to copy out information from there! :-)
The Firefox Screenshots team were very responsive to feedback about assessing any performance issues with the upcoming Firefox Screenshots feature.
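
One pattern mentioned in the list above, replacing repeated preference reads with a preference observer, can be sketched in a few lines. This is a minimal illustration in Python under stated assumptions, not Gecko's preferences API: `PrefService`, `CachedPref`, and the pref name `ui.animations.enabled` are all hypothetical. The idea is that hot code reads a cached value, and the cache is refreshed only when the preference actually changes.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PrefService:
    """Hypothetical stand-in for a preference store; not Gecko's real API."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._observers: Dict[str, List[Callable[[str, object], None]]] = defaultdict(list)
        self.reads = 0  # counts lookups, standing in for the cost of each read

    def get(self, name: str) -> object:
        self.reads += 1
        return self._values[name]

    def set(self, name: str, value: object) -> None:
        self._values[name] = value
        for callback in self._observers[name]:
            callback(name, value)  # notify observers of the new value

    def add_observer(self, name: str, callback: Callable[[str, object], None]) -> None:
        self._observers[name].append(callback)


class CachedPref:
    """Reads the pref once, then relies on an observer to stay current."""

    def __init__(self, prefs: PrefService, name: str) -> None:
        self.value = prefs.get(name)
        prefs.add_observer(name, self._on_change)

    def _on_change(self, _name: str, value: object) -> None:
        self.value = value


prefs = PrefService()
prefs.set("ui.animations.enabled", True)    # hypothetical pref name
cached = CachedPref(prefs, "ui.animations.enabled")

for _ in range(1000):
    if cached.value:    # hot path: plain attribute read, no pref lookup
        pass

prefs.set("ui.animations.enabled", False)   # the observer updates the cache
print(prefs.reads, cached.value)
```

Here the hot loop runs a thousand times but the preference store is consulted only once, and the cached value still reflects the later change.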