Quantum Flow Engineering Newsletter #17

Next week, Nightly will switch to the 57 branch, beginning the development cycle of what will be the last train leaving the station towards Firefox 57. About five months ago I started writing the first of these newsletters, which of course was well after the Quantum Flow project itself got started. This is probably a good time to look back at how far we have come.

During this time, many small and medium-sized performance projects were started. Some are finished (some are even shipping!) and some are still ongoing, but the rate of progress has been quite astonishing. I have tried to cover as many of these as possible in some detail in the newsletter, and they are only one part of the overall performance work under way. Here is a list of some of these projects, with a brief status report for each:

We had many reports of performance issues, for example during page loads, that were caused or exacerbated by long-running synchronous IPC messages from the content process to the chrome process. We started a focused effort to address this problem, first by restricting the addition of new synchronous messages and then by gradually removing the existing ones that telemetry data showed to be slow. I've reported on the status of this work periodically, and it will probably continue for some time, but at this point we are well on our way to solving most of the severe cases that affect our users by Firefox 57.
The most important performance issues are the ones that affect real users. In the past we had built infrastructure called Background Hang Reports, which reports backtraces from the hangs that users experience through telemetry so that we can diagnose and fix them, but this setup hadn't survived the passage of time. We stood up some Python scripts to process the data coming in through the telemetry servers so that we could start getting actionable data, while also starting to build an awesome new UI for it. Many thanks to Nika Layzell and Doug Thayer for their great work on this so far. Through this data, we have [found and fixed](https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20sw%3A%22[bhr]%22) a number of bugs. Discovering and fixing bugs from this data has gone more slowly than I would have liked, because we haven't had enough engineers to look through the data and extract actionable bugs from it; the process is still manual, slow, and time-consuming.
We've kept up a rigorous process of continually measuring and tracking the browser's performance across various workloads, with the goal of identifying the most severe performance issues and eliminating them where possible. To get help from the broad group of engineers and contributors, it's important to communicate which issues we consider most critical, so we created an active bug triage process to identify the most important bugs; you have all probably heard all about it by now. :-) This may sound like a lot of process, but looking back after several months at the rate of performance fixes that have landed as a result, I think it has been fairly effective. There is also a lot about this triage process that we could have done better, such as prioritizing bugs more consistently and communicating more clearly about the criteria we used. But in many cases, time pressure and the sheer number of bugs to deal with forced our hand anyway.
A few projects grew into independent parallel efforts. For example, patterns started to emerge from some of the initial bugs we had filed against various UI components in the front-end code, and these seemed to warrant mini-projects of their own. One such pattern was synchronous layout and style flushes triggered by various front-end code (and sometimes by Gecko code invoked from the front end); another was front-end code, such as timers, running at random times. Chasing down issues like these is now part of the Photon Performance project and is being actively worked on, and the difference is quite noticeable in the performance of various parts of the UI. Another example is reflow performance. We had seen expensive reflows in many profiles, and even though we didn't have much concrete information about the sources of the problems, we reached out to the Layout team and asked for help. That resulted in an effort to improve reflow performance, which is still actively continuing. Thanks to both the front-end and layout teams for leading these efforts!
We have continued to improve the Gecko Profiler, and it is really impressive how far this effort has come. On the platform side, the profiler has gone from unusable to 100% reliable on Win64. The UI keeps improving and is very effective both for developers and for users who want to help us by submitting profiles. We have also worked on some documentation on profiling with it. Thanks to Markus Stange, David Major and Nick Nethercote, who work on the back end, and to Greg Tatum and Julien Wajsberg, who work on the tool's front end.
About two months ago, we started actively focusing on improving our performance on the Speedometer 2 benchmark. Ben Kelly tweeted this week about our progress on this benchmark so far, and his tweets are worth quoting: “Firefox Nightly has gotten ~38% faster at speedometer v2 in the last 2 months.” and “Firefox Nightly 56 is ~80% faster at speedometer v2 compared to currently released Firefox 54.” This is great! But we aren't done yet.
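The Background Hang Reports item above mentions Python scripts that turn raw hang data into something actionable. As a rough, hypothetical illustration of that kind of processing (not the actual scripts, and not the real BHR telemetry schema), the sketch below groups hang reports by identical stack and ranks the stacks by total hang time. The report shape (`stack` and `duration_ms` keys), the helper `top_hangs`, and the frame names in the sample data are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Stack = Tuple[str, ...]


def top_hangs(reports: List[dict], limit: int = 10) -> List[Tuple[Stack, int, int]]:
    """Group hang reports by identical stack and rank them by total hang time.

    Each report is assumed (hypothetically) to look like
    {"stack": [outermost frame, ..., innermost frame], "duration_ms": int}.
    Returns (stack, hang_count, total_ms) tuples, worst first.
    """
    counts: Dict[Stack, int] = defaultdict(int)
    totals: Dict[Stack, int] = defaultdict(int)
    for report in reports:
        # A tuple is hashable, so the whole stack can serve as the grouping key.
        stack = tuple(report["stack"])
        counts[stack] += 1
        totals[stack] += int(report["duration_ms"])
    # Worst offenders first: most total hang time, then most occurrences.
    ranked = sorted(totals, key=lambda s: (-totals[s], -counts[s]))
    return [(s, counts[s], totals[s]) for s in ranked[:limit]]


# Made-up sample data; the frame names are placeholders, not real Gecko frames.
reports = [
    {"stack": ["main", "refresh_tick", "style_flush"], "duration_ms": 300},
    {"stack": ["main", "sync_ipc_wait"], "duration_ms": 120},
    {"stack": ["main", "refresh_tick", "style_flush"], "duration_ms": 250},
]

for stack, count, total in top_hangs(reports):
    print(f"{total} ms over {count} hang(s): {' -> '.join(stack)}")
# Prints:
# 550 ms over 2 hang(s): main -> refresh_tick -> style_flush
# 120 ms over 1 hang(s): main -> sync_ipc_wait
```

Ranking by total time rather than by raw count is only one possible choice; it favors the stacks that cost users the most wall-clock time overall, even if they occur less often.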

In other performance-related news this week, Stylo, our new parallel CSS engine written in Rust, is ready for testing on the Nightly channel. I've been using it on my main browser profile for a while, and so far I really like what I'm seeing. I have visited some pages where restyling cost typically shows up in profiles, added “StyleThread” to the list of threads in the Gecko Profiler settings, and watched the parallel restyles happen in the profiles; it's quite interesting. It's especially striking to see magic such as Rust rayon threads calling into C++ code on a background thread without any kind of synchronization, with the safety of this ensured through static analysis. We are indeed living in the future! It is very exciting that, thanks to the superpowers of Rust, we are probably going to be the first browser engine to ship a parallel CSS engine, putting this technology into the hands of millions of people.

And now it's time once again for the most important part of these newsletters: taking a moment to say thanks to those who helped make Firefox faster last week. Apologies to anyone I've forgotten to name here.

Thomas Nguyen made SafeBrowsing initialize in an idle callback once per session, rather than once per window.
Samael Wang got rid of the PBrowser::Msg_GetDPI sync IPC message!
Kris Maglione optimized XPCOMUtils.generateQI() and Components.utils.import() for the cached module case.
Florian Quèze made most scripts in browser.xul load lazily.
Jan de Mooij removed BytecodeAnalysis from IonBuilder. This analysis showed up in some profiles, and its results could be obtained through other mechanisms. He also optimized Function.prototype.toString().
Henry Chang further reduced IPC message sizes, this time for small messages, following the initial IPC message size optimizations he landed last week.
Chris H-C refactored TelemetryHistogram's internal storage, fixing some inefficiencies in the previous code and improving memory usage, among other benefits.
Alexander Poirot made DevTools initialization lazy, which gave us some session restore wins on Talos. He also replaced some WebIDE code that was using the often-slow Add-on SDK.
Jon Coppeard fixed a bug where we could queue up many GCs in a row during off-main-thread parsing.
Ming-Chou Shih added a high priority queue for input events.
Evelyn Hung implemented speculative connections to the selected search engine from the location bar (the feature is currently disabled behind a pref).
Jessica Jong optimized the NodeList object returned from getElementsByName() to skip DOM elements that don’t have a name attribute. She also replaced the virtual function nsGenericHTMLFormElement::IsDisabled(), which used to show up regularly in profiles, with a boolean flag.
Kannan Vijayan added a CacheIR stub for Array.prototype.push(), which lets the baseline JIT optimize this operation. This continues his optimization work driven by data from his rdtsc-based instrumented measurements of the top JS builtins used in Speedometer and general browsing.
Ryan Hunt enabled asynchronous keyboard scrolling on all desktop platforms on Nightly.
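Ming-Chou Shih's change above adds a high priority queue for input events. As a rough illustration of the general idea only (this is not Gecko's event loop, and the class and method names are hypothetical), here is a minimal Python sketch of a dispatcher that always runs pending input events before ordinary tasks, so that a backlog of normal work cannot delay a keypress or a click.

```python
from collections import deque
from typing import Callable, Deque, List

Task = Callable[[], None]


class TwoLevelEventQueue:
    """Hypothetical dispatcher: input events jump ahead of normal tasks."""

    def __init__(self) -> None:
        self._input: Deque[Task] = deque()
        self._normal: Deque[Task] = deque()

    def post_input(self, task: Task) -> None:
        self._input.append(task)

    def post_normal(self, task: Task) -> None:
        self._normal.append(task)

    def run_one(self) -> bool:
        """Run the next task, preferring input; return False if both queues are empty."""
        if self._input:
            task = self._input.popleft()
        elif self._normal:
            task = self._normal.popleft()
        else:
            return False
        task()
        return True

    def run_until_idle(self) -> None:
        while self.run_one():
            pass


log: List[str] = []
queue = TwoLevelEventQueue()
queue.post_normal(lambda: log.append("timer"))
queue.post_normal(lambda: log.append("network"))
queue.post_input(lambda: log.append("keypress"))  # posted last, but runs first
queue.run_until_idle()
print(log)  # ['keypress', 'timer', 'network']
```

Strict priority like this is the simplest possible policy; a real scheduler would likely also need some safeguard so that a steady stream of input events cannot starve normal tasks indefinitely.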