Quantum Flow Engineering Newsletter #25

The Quantum Flow project started as a cross-functional effort to study and fix the most serious performance issues of Firefox affecting real world browsing use cases for the Firefox 57 release.  Thanks to the hard work of everyone who helped us along the way, we believe that we have managed to fix a significant portion of the issues discovered in the past seven months or so that this project has run and have managed to achieve the performance goals that we had set for ourselves.

A Short Retrospective

Looking back at the past months, the Quantum Flow project went through three different stages in terms of the type of issues we focused on (even though these stages weren’t consecutive in time).  In the first stage, we focused most of our energy on gathering information about performance issues and planning work that would fit in the scope of Firefox 57.  A lot of time was spent doing profiling and measurements to gather evidence around the most problematic areas of the code needing attention.  We also spent time thinking about what parts of our plans may or may not finish in time for Firefox 57, and thought about which issues were the most urgent to fix, and which ones could be deprioritized to future releases.  During this time we also held a work week to consult various platform teams in their areas of expertise.

Once the scope of the necessary work became clearer, we knew that we needed help from other teams to oversee some aspects of the ongoing work independently.  We started to work with the Firefox front-end team to ask for their help on reducing synchronous layout/style flushes and scheduled timers in the UI code.  That relationship grew with their effort on reducing the browser start-up time (which has been a great success so far!) and eventually, with the Photon Performance project in place, the amount of effort on the Quantum Flow side to keep track of the UI performance was greatly reduced, which was a huge help.  Another successful example here was working with the layout team to improve reflow performance.  During our measurements we had seen a lot of reflow performance issues, which were not easy for us to diagnose and analyze.  The layout team was really busy with a lot of ongoing work all along, but they did a great job at keeping track of the issues we reported to them and fixing the ones that were important.  They also didn’t stop there: in these past few months, a lot of great work that benefits reflow performance happened based on other independent investigations.

The last stage, and perhaps the longest-running one, was a cross-functional effort at fixing the bugs we had discovered and prioritized, which I’m sure you are all familiar with by now.  There were some challenges to overcome here.  Perhaps the most obvious was the sheer number of bugs at hand.  I remember triage meetings with 50+ bugs to go through, and we had to be careful not to miscategorize something or miss something important.  The other challenge was the amount of time we had and the number of people who could help with fixing the bugs.  The number of person-hours of help we could get from each team depended on the existing workload of the team and their other priorities, so sometimes it was hard to predict how much help we could count on in a given area, especially in the earlier stages.  As more people got involved, we needed to do more communication to make sure things kept moving forward, and nobody was blocked on something where we could give help.  Because of the number of bugs on our plate, we also always needed more help!  We continually thought about new ways of seeking help from more people and teams.  Due to the short amount of time that we had before our final deadline for the development of Firefox 57, we had to experiment with ideas to deal with these issues quickly, see what worked, abandon what didn’t, and rinse and repeat.

As of this writing, we have triaged 895 bugs in total.  We used a three-tier priority scheme, and out of these, 277 P1 bugs, 38 P2 bugs and 54 P3 bugs were fixed (as in, marked FIXED, not counting other resolutions such as WONTFIX, DUPLICATE, etc.).  We mass-moved all remaining open P1 bugs to P2 (because the definition of P1 was bugs that we want to fix for the 57 release), and we are now left with 141 open P2 bugs and 133 open P3 bugs.
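For readers who want to check how these numbers add up, here is a small tally using only the figures quoted above.  The variable names are made up for illustration, and reading the remainder as "closed with some other resolution or otherwise not counted" is an assumption, since the post does not break that number down.

```python
# Figures quoted in the paragraph above; the names are illustrative only.
triaged = 895
fixed = {"P1": 277, "P2": 38, "P3": 54}
still_open = {"P2": 141, "P3": 133}  # after open P1 bugs were moved to P2

total_fixed = sum(fixed.values())
total_open = sum(still_open.values())

# Assumption: whatever is left was resolved some other way (WONTFIX,
# DUPLICATE, etc.) or is otherwise not covered by the two lists above.
remainder = triaged - total_fixed - total_open

print(f"fixed: {total_fixed}, open: {total_open}, other: {remainder}")
```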

The Future of Quantum Flow

From the beginning of this project, it was clear that we were going to discover more issues than we would have time to fix for Firefox 57, so the fact that we have so many open bugs is a good thing.  We tried to be intentional in what we focused on first for Firefox 57, but we didn’t expect to stop there.

Now we have a large pool of existing bugs, some in progress, some in need of owners to pick them up.  But we also need more people to measure the performance of the browser in their areas of expertise, and plan future work to address the existing issues.  Going forward, we would like to start experimenting with a different structure to continue the performance work that Quantum Flow started.  Instead of having a small team doing triage and prioritization in a centralized fashion, we would like to distribute this across the existing teams at Mozilla, to allow them to integrate the performance work with their existing development work (if they don’t already!) and triage and fix these issues at their own pace.

So, instead of using the [qf] status whiteboard tag, we will use the existing perf keyword in Bugzilla, and when triaging, teams will use the normal Bugzilla Priority fields to assign priority to the performance bugs.  The existing set of open QF bugs will be moved to use the perf keyword as well to unify how we handle these bugs.
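As a rough sketch of what finding these bugs could look like for a team, the snippet below builds a search URL for the Bugzilla REST API that lists open bugs carrying the perf keyword at a given priority.  The helper name perf_bug_query and the choice of fields are hypothetical, and the parameter names (keywords, priority, resolution, include_fields) are my understanding of the REST search interface rather than anything prescribed by this project, so please check them against the Bugzilla documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL of the bug search endpoint on bugzilla.mozilla.org.
BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def perf_bug_query(priority, include_fields=("id", "summary", "priority")):
    """Build a search URL for open perf bugs at one priority (hypothetical helper)."""
    params = {
        "keywords": "perf",      # the existing perf keyword
        "priority": priority,    # normal Bugzilla Priority field, e.g. "P2"
        "resolution": "---",     # assumption: "---" selects unresolved bugs
        "include_fields": ",".join(include_fields),
    }
    return f"{BUGZILLA_REST}?{urlencode(params)}"


# Only builds the URL; fetching it is left to the reader's HTTP client of choice.
url = perf_bug_query("P2")
print(url)
```

Building the URL separately from fetching it keeps the example free of network access, and makes it easy to paste the result into a browser to see what a given team's triage queue would contain.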

Long term, we view performance as an aspect of quality very much like stability and security.  This means both running periodic projects to improve on the solid foundation that we have built so far, and continually monitoring and measuring the performance of the various aspects of the browser to make sure regressions are caught and fixed in time, and new features are developed with performance in mind from early on.  And to do so, we need to continue our investment in the tooling around performance: perfherder, AWFY, the Profiler, arewesmoothyet, and perhaps more tools in the future.

Continuing the QF project into the future is an area of active planning.  I expect more details about the next steps for this project will be shared as the plans continue to shape up.

On QF Newsletters

The idea of starting to write an ongoing newsletter about the Quantum Flow project was first suggested to me by Bill McCloskey.  Our first goal was to find a way to increase the visibility into what the QF project is, and highlight people’s contributions to the effort, because we felt like due to the speed at which the project was spun up and the vast scope of the project, it may be difficult for a lot of people to understand in detail what actually happens under the hood.

Over time, I tried to shed light on the most important aspects of the work that happened in the project, and document the history of the technical work that happened throughout the past seven months or so.  At its lowest level, Quantum Flow consisted of many performance bug fixes all over the place.  I think it was important to highlight that fact, but also explain to some extent how and why some classes of bugs were investigated in-depth.  But it’s also possible to not see the forest when looking at the trees, so at times I also tried to highlight some of the ongoing higher level efforts in addition to listing the individual fixes landing each week.

Another super important goal of mine was to give credit where it’s due.  What we have accomplished for Quantum Flow (and Firefox 57) so far couldn’t have been accomplished without the help of many wonderful people in the community.  I often think that we need to take more opportunities to thank people for the hard work that they are doing, and I took these newsletters as my opportunity to do just that!

Also, I have always wanted to read newsletters like this about what happens in projects and teams that I’m not actively involved in myself.  I’m extremely happy that there are now a number of newsletters that people have started writing recently, and I look forward to seeing even more of them!  Writing about what we work on is important: it helps share knowledge across Mozilla and keeps people informed and engaged.  I urge more people to write about what they do, and I’ll try to continue to do so myself.

Before closing, it is time to give credit where it’s due one last time in this series.  Firstly, I would like to thank Jan de Mooij, Mike Conley, and Florian Quèze who helped me with collecting the credits section at the end of the newsletters in the past few months!  I would also like to thank the following people who helped land the last performance improvements for Firefox 57:

15 comments on “Quantum Flow Engineering Newsletter #25”
  1. Mike Conley says:

    Thank you so much for your work on these newsletters, Ehsan! Great stuff.

  2. sfleiter says:

    Thanks for being part of these massive improvements and writing about them.
    It was a pleasure to read your newsletters and the performance improvements are just awesome.
    Keep up the great work!

  3. sebastianz83 says:

    I also want to say thank you, Ehsan! Your newsletters around QF were always interesting to read and it’s great to see those performance improvements all over the place. I’m really looking forward to the release of 57!
    For everyone involved in this project, a big thank you and keep up the great work!

  4. IdiotFour says:

    Hello!
    Not long ago I posted a bunch of profiles on your blog that showed frequent scrolling fps drops. You filed this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1396856

    Unfortunately, I still come across performance regressions after this bug landed so Kris Maglione told me to file a new bug with STR. Here it is: https://bugzilla.mozilla.org/show_bug.cgi?id=1400534

    The problem is that I filed this bug 6 days ago and Kris still hasn’t left a comment so I don’t know if he reproduced the problem successfully and if he is planning to start working on this bug. Could you ask Kris what he thinks about this bug?

    • ehsan says:

      I pinged him about it. 6 days is a short amount of time to wait, I’m sure Kris has a lot on his plate. Please be patient. 🙂

  5. Anonymous says:

    Thanks Ehsan for this series, I’ve really enjoyed following the work at Quantum Flow!

    What are the next goals at bringing Servo components into Gecko? I know Stylo is basically done and WebRenderer in the works, but what comes next?

    • ehsan says:

      WebRender is I think the next big part of Servo to be integrated into Gecko. I’m not sure what the next step is, unclear at this point. Don’t know if anyone has figured it out yet. 🙂

  6. Share with us the thinking for retiring the current centralized triage process, which has been super-effective leading up to 57.
    The efficiency of a centralized system is hard to beat, in syncing various aspects of the browser. What kind of systems are you guys considering to maintain the effectiveness in a decentralized system?

    • ehsan says:

      I don’t think we have done any specific research around whether a centralized or decentralized system for managing this kind of project would be more efficient, so without such research I think we can’t assume one specific answer. 🙂

      That being said, I agree that our centralized process leading up to 57 was effective, but it was also designed around a deadline, and also around the restrictions of having to do lots of work for very specific purposes in a short amount of time. Going forward, those circumstances won’t hold any longer, so we’ll be working in a different environment, where our shipping browser is much faster than what we had a year or so ago, and we have existing ongoing programs on improving the performance even more. Also, a centralized process does have downsides as well, such as taking up a lot of the time of people whose time is very valuable and we need to be careful about where it makes sense to invest that time and attention into.

      As I mentioned in my post, not all of the details of how the performance work will proceed going forward are clear yet, but we will have regular check-ins by people who have driven all of the great work done so far about the current state of things, and if the new process doesn’t work out well, we will readjust as we always do. I’m not too worried about getting the new process perfect from the first iteration. Just like software, software development processes can also be buggy and need to evolve as we learn about their problems, but it doesn’t mean we shouldn’t try new things. 🙂

  7. Mark Bokil says:

    I am really enjoying the developer edition of Firefox. The Photon UI is very fast and looks great. Nice work on quantum.

  8. Montese3 says:

    One thing that I noticed while trying out Firefox Nightly for the past couple months is that the CPU usage is sometimes noticeably higher in some sites than with past Firefox releases (I’m thinking mainly of Firefox 54-55). Is there some way to debug this to help you find regressions and solve this issue? Thanks in advance and congrats on all the work!

    • ehsan says:

      Sure, you can use the Gecko Profiler to record a profile of what’s happening under the hood when Firefox is displaying a web page; this helps show the cause of issues like high CPU usage. If there are specific websites where you have noticed this regression, another helpful tool is mozregression, which allows you to find out what commit to Firefox first introduced the regression you are observing (but it could be hard to track down issues like CPU usage unless the difference is quite large).

      Please note that higher CPU usage alone may not be a bad thing unless it’s also accompanied by some other usability issues (such as jank, scrolling choppiness, etc.). Otherwise, if software manages to make better use of system resources, sometimes that may manifest as more CPU usage (e.g. if we manage to keep all CPU cores busy where we didn’t use to efficiently consume all available cores before). At any rate, if you suspect something is a regression, once you find out some information about it, please don’t hesitate to file a bug!

  9. voracity says:

    Thanks Ehsan! I’ve really enjoyed these newsletters in my weekly Planet Moz catch up, and it’s great to see so many other (technical) Mozilla newsletters around now.

    Also, thanks for the help with the perf issue I had brought up in a comment on this blog. I wasn’t expecting much could be done about it, but it was dealt with very quickly and very well.

6 Pings/Trackbacks for "Quantum Flow Engineering Newsletter #25"
  1. […] Servo parallel browser engine project. Additionally, the “Quantum Flow” team tracked down and fixed 369 performance bugs in Firefox, with a special focus on responsiveness and UI interactions. Lastly, the “Quantum […]