Quantum Flow Engineering Newsletter #25

The Quantum Flow project started as a cross-functional effort to study and fix the most serious performance issues of Firefox affecting real-world browsing use cases for the Firefox 57 release. Thanks to the hard work of everyone who helped us along the way, we believe that we have managed to fix a significant portion of the issues discovered in the seven months or so that this project has run, and that we have achieved the performance goals that we had set for ourselves.

A Short Retrospective

Looking back at the past months, the Quantum Flow project went through three different stages in terms of the type of issues we focused on (even though these stages weren't consecutive in time). In the first stage, we focused most of our energy on gathering information about performance issues and planning work that would fit in the scope of Firefox 57. A lot of time was spent doing profiling and measurements to gather evidence around the most problematic areas of the code needing attention. We also spent time thinking about what parts of our plans may or may not finish in time for Firefox 57, and thought about which issues were the most urgent to fix, and which ones could be deprioritized to future releases. During this time, we held a work week to consult various platform teams for help in their areas of expertise.

After the scope of all the necessary work became clearer, we knew that we needed help from other teams to oversee some aspects of the ongoing work independently. We started to work with the Firefox front-end team to ask for their help on reducing synchronous layout/style flushes and scheduled timers in the UI code. That relationship grew with their effort on reducing the browser start-up time (which has been a great success so far!) and eventually, with the Photon Performance project in place, the amount of effort on the Quantum Flow side to keep track of the UI performance was greatly reduced, which was a huge help. Another successful example here was working with the layout team to improve reflow performance. During our measurements we had seen a lot of reflow performance issues, which were not easy for us to diagnose and analyze. The layout team was really busy with a lot of ongoing work all along, but they did a great job of keeping track of the issues we reported to them and fixing the ones which were important. They also didn't stop there: a lot of great work that benefits reflow performance happened in these past few months based on other independent investigations.

The last stage, and perhaps the longest running one, was a cross-functional effort at fixing the bugs we had discovered and prioritized, which I'm sure you are all familiar with by now. There were some challenges to overcome here. Perhaps the most obvious was the sheer number of bugs at hand. I remember triage meetings with 50+ bugs to go through, and we had to be careful not to miscategorize something or miss something important. The other challenge was the amount of time we had and the number of people who could help with fixing the bugs. The number of person-hours of help we could get from each team depended on the existing workload of the team and their other priorities, so sometimes it was hard to predict how much help we could count on in a given area, especially in the earlier stages. As more people got involved, we needed to do more communication to make sure things kept moving forward, and nobody was blocked on something where we could give help. Because of the number of bugs on our plate, we also always needed more help! We continually thought about new ways of seeking help from more people and teams. Due to the short amount of time that we had before our final deadline for the development of Firefox 57, we had to experiment with ideas to deal with these issues quickly, see what worked, abandon what didn't, and rinse and repeat.

As of this writing, we have triaged 895 bugs in total. We used a three-tier priority scheme, and out of these, 277 P1 bugs, 38 P2 bugs and 54 P3 bugs were fixed (as in, marked FIXED, not counting other resolutions such as WONTFIX, DUPLICATE, etc.). We mass-moved all remaining open P1 bugs to P2 (because the definition of P1 was bugs that we wanted to fix for the 57 release), and we are now left with 141 open P2 bugs and 133 open P3 bugs.
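To put those numbers side by side, here is a small Python sketch of the arithmetic. It uses only the counts quoted above. The variable names are mine, and the "remainder" line assumes that every triaged bug not counted as FIXED or still open falls into some other bucket (another resolution, or triaged without a P1-P3 priority), which the numbers above don't break down further.

```python
# Bug counts quoted in the paragraph above.
triaged_total = 895
fixed = {"P1": 277, "P2": 38, "P3": 54}
still_open = {"P2": 141, "P3": 133}  # open P1s were mass-moved to P2

fixed_total = sum(fixed.values())
open_total = sum(still_open.values())

# Assumption: whatever is neither FIXED nor open was resolved some other
# way (WONTFIX, DUPLICATE, ...) or triaged without a P1-P3 priority.
remainder = triaged_total - fixed_total - open_total

fixed_share = 100 * fixed_total / triaged_total

print(f"fixed: {fixed_total}, open: {open_total}, remainder: {remainder}")
print(f"share of triaged bugs marked FIXED: {fixed_share:.1f}%")
```

In other words, 369 of the triaged bugs were marked FIXED and 274 prioritized bugs are still open.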

The Future of Quantum Flow

From the beginning of this project, it was clear that we were going to discover more issues than we would have time to fix for Firefox 57, so the fact that we have so many open bugs is a good thing. We tried to be intentional in what we focused on first for Firefox 57, but we didn't expect to stop there.

Now we have a large pool of existing bugs, some in progress, some in need of owners to pick them up. But we also need more people to measure the performance of the browser in their areas of expertise, and plan future work to improve the existing issues. Going forward, we would like to start experimenting with a different structure to continue the performance work that Quantum Flow started. Instead of having a small team doing triage and prioritization in a centralized fashion, we would like to distribute this across the existing teams at Mozilla, to allow them to integrate the performance work with their existing development work (if they don't already!) and triage and fix these issues at their own pace.

So, instead of using the [qf] status whiteboard tag, we will use the existing perf keyword in Bugzilla, and when triaging, teams will use the normal Bugzilla Priority fields to assign priority to the performance bugs. The existing set of open QF bugs will be moved to use the perf keyword as well to unify how we handle these bugs.
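As a rough illustration of what this could look like for a team doing its own triage, here is a minimal Python sketch that builds a Bugzilla REST search URL for open bugs carrying the perf keyword at a given priority. This is only a sketch under assumptions: the helper name `perf_bug_query` is hypothetical, the query parameter names (`keywords`, `resolution`, `priority`, `component`, `include_fields`) are my reading of the Bugzilla REST search API, and the component name in the usage line is just an example. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Bugzilla REST search endpoint.
BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def perf_bug_query(priority, component=None):
    """Build a search URL for open perf-keyword bugs (hypothetical helper)."""
    params = [
        ("keywords", "perf"),     # the existing perf keyword, not [qf]
        ("resolution", "---"),    # assumed to select unresolved bugs
        ("priority", priority),   # the normal Bugzilla Priority field
        ("include_fields", "id,summary,priority"),
    ]
    if component is not None:
        params.append(("component", component))
    return f"{BUGZILLA_REST}?{urlencode(params)}"


# Example: open P2 perf bugs in a (hypothetical) team's component.
print(perf_bug_query("P2", component="Layout"))
```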

Long term, we view performance as an aspect of quality very much like stability and security. This means both running periodic projects to improve on the solid foundation that we have built so far, and continually monitoring and measuring the performance of the various aspects of the browser to make sure regressions are caught and fixed in time, and new features are developed with performance in mind from early on. And to do so, we need to continue our investment in the tooling around performance: perfherder, AWFY, Profiler, arewesmoothyet, and perhaps more tools in the future.

Continuing the QF project into the future is an area of active planning. I expect more details about the next steps for this project will be shared as the plans continue to shape up.

On QF Newsletters

The idea of starting to write an ongoing newsletter about the Quantum Flow project was first suggested to me by Bill McCloskey. Our first goal was to find a way to increase the visibility into what the QF project is, and highlight people's contributions to the effort, because we felt that, given the speed at which the project was spun up and its vast scope, it might be difficult for a lot of people to understand in detail what actually happens under the hood.

Over time, I tried to shed light on the most important aspects of the work that happened in the project, and document the history of the technical work that happened throughout the past seven months or so. At its lowest level, Quantum Flow consisted of many performance bug fixes all over the place. I think it was important to highlight that fact, but also explain to some extent how and why some classes of bugs were investigated in depth. But it's also possible to miss the forest when looking at the trees, so at times I also tried to highlight some of the ongoing higher level efforts in addition to listing the individual fixes landing each week.

Another super important goal of mine was to give credit where it's due. What we have accomplished for Quantum Flow (and Firefox 57) so far couldn't have been accomplished without the help of many wonderful people in the community. I often think that we need to take more opportunities to thank people for the hard work that they are doing, and I took these newsletters as my opportunity to do just that!

Also, I have always wanted to read newsletters like this about what happens in projects and teams that I'm not actively involved in myself. I'm extremely happy that there are now a number of newsletters that people have started writing recently, and I look forward to seeing even more of them! Writing about what we work on is important: it helps share knowledge across Mozilla and keeps people informed and engaged. I urge more people to write about what they do, and I'll try to continue to do so myself.

Before closing, it is time to give credit where it's due one last time in this series. Firstly, I would like to thank Jan de Mooij, Mike Conley, and Florian Quèze who helped me with collecting the credits section at the end of the newsletters in the past few months! Also, I would like to thank the following people who helped land the last performance improvements for Firefox 57:

Andrew McCreight enabled the single compartment for all JSMs! This reduces the memory overhead incurred by each JSM, and also sidesteps the “compartment crossing” performance penalty when JSMs access objects in one another. In our ts_paint performance test (time to first window paint, in ms), the drop-off around September 16 was caused by this work!
Jim Chen enabled the single-compartment JSMs for Android! This brought some nice speed and memory usage wins on Android, similar to the desktop wins above.
Olli Pettay fixed a recent regression from one of my patches which was caused by creating nsContentList objects too eagerly.