Quantum Flow Engineering Newsletter #22

With around three weeks left in the development cycle for Firefox 57, everyone seems to be busy getting the last fixes in to shape up this long-awaited release. On the Quantum Flow project, we have kept up with triaging incoming bug reports tagged [qf]. As we get closer to the beta uplift date, the realistic window for fixing bugs is narrowing, so the bar for prioritizing incoming bug reports as [qf:p1] keeps getting higher. This matches the overall shift in focus over the past few weeks towards getting all the ongoing work targeting Firefox 57 under control, to make sure we deliver as much of what we planned for this release as possible.

This past week we made more progress on optimizing the performance of Firefox for the Speedometer V2 benchmark. Besides many of the usual optimizations, which you can read about in the acknowledgement section of the newsletter, one noteworthy item was David Major's investigation into adding this benchmark to the set of pages we load to train the PGO profile used for Windows builds. This allowed the MSVC code generator to use the profile information to produce better optimized code, and bought us a few benchmark score points. Earlier, similar attempts hadn't really improved performance, and it's unclear whether this change will stick or get backed out due to PGO-specific crashes or the like, but in the meantime we aren't holding off on landing other Speedometer improvements either! At the time of this writing, the Firefox Health Dashboard puts our benchmark score on Nightly within 4.07% of Chrome.

Another piece of news worth mentioning about Speedometer: Speedometer tests with Stylo were recently enabled on AWFY. As can be seen on the reference hardware score page, Stylo builds are now a bit faster than normal Gecko builds when running Speedometer. This is the result of hard work by many people on the Stylo team, and I'd like to take a moment to thank them, and especially to call out Bobby Holley, who helped make sure that we have a great performance story here.

In other performance-related news, this past week the first implementation of cooperative preemptive scheduling of web page JavaScript, more commonly known as Quantum DOM, landed. The design document describes some of the background, which may be helpful if you need to understand the details of what the new world looks like. For now, this feature is disabled by default while work continues to iron out the remaining issues.

The Quantum DOM project has been a massive overhaul of our codebase. A huge part of it has been the “labeling” project that Bevis Tseng has been tirelessly leading for many months now. The basic idea behind this part of the project is to give each runnable a name and to indicate which tab or document the runnable is associated with (I'm simplifying a bit; please see the wiki page for more details). For the performance story section of this newsletter, Bill McCloskey had a great suggestion: highlight how this project ended up uncovering some unexpected performance issues in Firefox!
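To make the idea of labeling a bit more concrete, here is a toy model in Python. It is not how Gecko implements labeling (the real code is C++, and the wiki page has the actual details): `TabGroup`, `LabeledRunnable` and `LabeledQueue` are hypothetical names, and preferring the foreground tab is only an assumed example of what a scheduler could do once every runnable carries a name and a tab association.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TabGroup:
    """Hypothetical stand-in for the tab or document a runnable belongs to."""
    tab_id: int


@dataclass(eq=False)  # compare runnables by identity, not by field values
class LabeledRunnable:
    name: str                  # label, e.g. for telemetry and debugging
    group: Optional[TabGroup]  # None: not associated with any tab
    run: Callable[[], None]


class LabeledQueue:
    """Toy event queue: FIFO, but it can prefer the foreground tab's runnables."""

    def __init__(self) -> None:
        self._pending: deque = deque()

    def dispatch(self, runnable: LabeledRunnable) -> None:
        self._pending.append(runnable)

    def run_next(self, foreground: Optional[TabGroup]) -> Optional[str]:
        # Only possible because of the label: pick the oldest runnable that
        # belongs to the foreground tab, if any.
        chosen = next((r for r in self._pending
                       if foreground is not None and r.group == foreground), None)
        if chosen is None:
            if not self._pending:
                return None
            chosen = self._pending[0]  # fall back to plain FIFO order
        self._pending.remove(chosen)
        chosen.run()
        return chosen.name


# Usage: the background runnable was dispatched first, but the foreground one runs first.
log = []
queue = LabeledQueue()
background, foreground = TabGroup(1), TabGroup(2)
queue.dispatch(LabeledRunnable("BackgroundTimer", background, lambda: log.append("bg")))
queue.dispatch(LabeledRunnable("ForegroundPaint", foreground, lambda: log.append("fg")))
print(queue.run_next(foreground))  # ForegroundPaint
print(queue.run_next(foreground))  # BackgroundTimer
```

Without the group label, the queue above could only run things in dispatch order; the label is what lets it tell the two runnables apart.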

Bevis has a telemetry analysis that counts how many runnables of each type run (to view the interesting part, please scroll down to the “full runnable list” section). This analysis has been used to prioritize which runnables to work on next for labeling purposes. But because the list shows the relative frequencies of runnables, it has also turned up several surprises in where some runnables rank, and these have uncovered performance issues that would otherwise be very difficult to detect and diagnose. Here are a few examples (thanks to Bill for enumerating them!):

We used to send the DidComposite notification to every tab, regardless of whether it was in the foreground or background. We tried to fix this once, but that fix only addressed a related issue involving multiple windows. The real fix landed later.
We used to have a “startup refresh driver” which was supposed to run for only a few milliseconds during startup. However, it was showing up as #33 on the list of runnables. We found that it was never stopped once it had been started, so if the startup refresh driver ever started running, it would keep running for the rest of that browsing session and get to the top of the list. Unfortunately, while this runnable disappeared for a while after that bug was fixed, it is now back and we're not sure why.
We found that MediaStreamGraphStableStateRunnable is #20 on this list, which was surprising since this runnable is only supposed to be used for WebRTC and WebAudio, neither of which is an extremely popular feature on the Web. Randell Jesup found a bug that causes the runnable to keep being dispatched after a WebRTC or WebAudio session is over.
We run the runnable for the Intersection Observer feature very often. We tried to reduce how often it runs once, but that doesn't seem to have helped much: it still shows up quite high on the list, at #6.
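The ranking idea behind this analysis can be sketched in a few lines of Python. This is only an assumed model of what such a report computes, not the actual telemetry pipeline: `rank_runnables` and `flag_suspicious` are hypothetical helpers, and the runnable names in the usage lines are made up rather than taken from the real data.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_runnables(samples: Iterable[str]) -> List[Tuple[int, str, float]]:
    """Rank runnable names by how often they ran.

    `samples` is assumed to hold one runnable name per executed runnable.
    Returns (rank, name, share of all runnables), most frequent first.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    # Most frequent first; ties are broken by name so the order is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, name, count / total)
            for rank, (name, count) in enumerate(ordered, start=1)]


def flag_suspicious(ranking: List[Tuple[int, str, float]],
                    expected_rare: Set[str], top_n: int) -> List[str]:
    """Return runnables we expect to be rare that still rank in the top N."""
    return [name for rank, name, _ in ranking
            if rank <= top_n and name in expected_rare]


# Usage with made-up data: a runnable meant to run only at startup ranks #2.
samples = ["PaintTick"] * 5 + ["StartupOnlyTick"] * 3 + ["NetworkRead"] * 2
ranking = rank_runnables(samples)
print(ranking[0][:2])                                          # (1, 'PaintTick')
print(flag_suspicious(ranking, {"StartupOnlyTick"}, top_n=2))  # ['StartupOnlyTick']
```

The interesting signal is not the absolute count but the mismatch between where a runnable ranks and how often we expect it to run at all.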

I encourage people to look at the telemetry analysis to see if they can spot a runnable with a familiar name which appears too high on the list. It's very likely that there are other performance bugs lurking in our codebase which this tool can help uncover.

Now, please allow me to take a moment to acknowledge the hard work of everyone who helped make Firefox faster this past week. I hope I'm not forgetting any names!

Mike Conley made it so that background tabs are “warmed up” when the mouse cursor hovers over them. This should improve the perceived performance of tab switching.
Samael Wang made it so that focus changes don’t cause sync IPC messages for IME, which should help reduce UI jank.
Masayuki Nakano optimized retrieving the selection from nsTextEditorState and optimized nsTextEditorState::GetValue(). He also made Selection cache a Range object in some cases instead of recreating it. In addition, he made nsContentEventHandler not use nsRange objects for representing two DOM points, as these objects can have a high performance cost. Last but not least, he made sure EditorBase::EndPlaceholderTransaction() doesn’t retrieve an nsCaret object needlessly. These are all part of the continued effort to reduce the cost of setting the value attribute of input text controls.
Kris Maglione used the subscript loader to load the browser.xul scripts instead of the script tag.
Wei-Cheng Pan added a new telemetry probe for measuring startup input latency based on the user’s notion of when a session becomes interactive (calculated from when the first tab is displayed after startup).
Bill McCloskey landed the initial implementation of cooperative preemptive scheduling of web page JS in the same content process (aka Quantum DOM scheduling). This implementation is currently intended for testing and is disabled by default.
David Major, with the help of Gregory Szorc and Ryan VanderMeulen, added the Speedometer V2 benchmark to our Windows PGO profile training set, which improved our benchmark score by helping the compiler generate better optimized code.
Nicholas Hurley ensured that the network predictor only parses URIs when needed.
Nathan Froyd eliminated a virtual call when querying whether an XPCOM-style weak pointer is alive by making nsQueryReferent not be an nsCOMPtr_helper. He also manually devirtualized nsIWeakReference::QueryReferent() which was the second virtual call that we used to incur when checking such weak pointers.
Thom Chiovoloni optimized the Quote() function, which was a bottleneck in JSON.stringify.
Samael Wang removed the PBrowser::NotifyIMEFocus synchronous IPC message.
Jan de Mooij finished his mini-project of removing the getProperty/setProperty hooks from SpiderMonkey ClassOps. This entailed fixing the consumers that relied on these hooks (tracked in the dependencies of the bug), and it removes some of the overhead of invoking getters/setters, which previously had to account for these hooks. He also optimized property addition even further.
Olli Pettay optimized HasRTLChars() usage. He also made the two-byte character variant of nsTextFragment use an nsStringBuffer so that its string data can be obtained without copying memory.
Till Schneidereit helped finish André Bargull’s original patch to move parts of Object.getOwnPropertyDescriptor and Object.defineProperty to self-hosted code in order to improve their performance.
Bas Schouten cached the reference frame when building a display list to make GetScrolledRect() faster, and switched to estimating the scale of frames based on their PresShell resolution.
Robert Wood fixed capturing profiles from Talos runs requested using trychooser syntax.
Shane Caraveo removed the WebRequest:ShouldLoad synchronous IPC message.