Quantum Flow Engineering Newsletter #12

It has been a few weeks since I last gave an update on our progress in reducing the number of slow synchronous IPC messages that we send across our processes. This isn't because there hasn't been a lot to talk about. Quite the contrary: so much great work has happened here that for a while I decided it might be better to highlight other ongoing work instead. But now, as the development cycle of Firefox 55 comes to a close, it's time to have another look at where we stand on this issue.

I've prepared a new Sync IPC Analysis for today, including data from both JS and C++ initiated sync IPCs. The first bit of unfortunate news is that the historical data in the spreadsheet is lost, because the server hosting the data had a few hiccups and Google Spreadsheets seems to really not like that. The second bit of unfortunate news is that our hope that disabling the non-multiprocess compatible add-ons by default in Nightly would reduce some of the noise in this data doesn't seem to have panned out. The data still shows a lot of synchronous IPC triggered from JS as before, and the lion's share of it is messages that are clearly coming from add-ons, judging from their names. My guess about why is that Nightly users have probably turned these add-ons back on manually. So we will have to live with the noise in the data for now (this is unfortunately an issue that we have to struggle with when dealing with a lot of telemetry data; here is another recent example that wasted some time and energy).

This time I won't give out a percentage-based breakdown. Now that many of these bugs have been fixed, the impact of really commonly occurring IPC messages, such as the one we have for document.cookie, makes the earlier method of exploring the data pointless (you can explore the pie chart to get a quick sense of why; I'll just say that this message alone is now 55% of the chart, and together with the second one it forms 75% of the data). This is a great problem to have, of course: it means that we're now starting to get to the “long tail” part of this issue.

The current top offenders, besides the bug mentioned above (on which great progress is still being made, BTW!), are add-on/browser CPOW messages, two graphics initialization messages that we send at content process startup, NotifyIMEFocus, which is in the process of being fixed, and window.open(), which I've spent weeks on but have yet to fix all of our tests for in order to land my fixes (I've also temporarily put it aside to work on something that isn't this bug for a little while!). Besides those, if you look at the dependency list of the tracker bug, there are many other bugs that are very close to being fixed. Firefox 55 is going to be much better from this perspective, and I hope future releases will improve on that!

The other effort that is moving ahead quite fast is optimizing for Speedometer V2. See the chart of our progress on AreWeFastYet.com:

Last week, our score on this chart was about 84. Now we are at about 91. Not bad for a week's worth of work! If you're curious to follow along, see our tracker bug. Also, Speedometer is a very JS-heavy benchmark, so a lot of the bugs that are filed and fixed for it happen inside SpiderMonkey, and watching the SpiderMonkey-specific tracker bug is probably a good idea as well.

It's time for a short performance story! This one is about technical debt. I've looked at many performance bugs over the past few months of the Quantum Flow project, and in many cases the solution has turned out to be just deleting the slow code. That's it! It turns out that as a large code base ages, a lot of code ends up no longer serving any purpose, but nobody discovers this because it's impractical to scrutinize every single line of code. Some of this unnecessary code is bound to have severe performance issues, and when it does, your software ends up carrying that cruft for years! Here are a few examples: a function call taking 2.7 seconds on a cold startup to do something that became unnecessary once we dropped support for Windows XP and Vista, some migration code that was doing synchronous IO during every startup to migrate users of Firefox 34 and older to a newer version, and an outdated telemetry probe that turned out to not be in use any more, scheduling many unnecessary timers and causing unneeded jank.

I've been thinking about what to do about these issues. The first step is to fix them, which is what we are busy doing now, but finding these issues typically requires some work, and it would be nice if we had a systematic way of dealing with some of them. For example, wouldn't it be nice if we had a MINIMUM_WINDOWS macro that controlled all Windows-specific code in the tree? In the case of my earlier example, perhaps the original code would have checked that macro against the minimum version (7 or higher), and when we bumped MINIMUM_WINDOWS up to 7 along with bumping our release requirements, such code would have turned itself into preprocessor waste (hurray!). Of course, the hard part is finding all the code that needs to abide by this macro, and the harder part is enforcing this consistently going forward! Some of the other issues aren't possible to deal with this way, so we need to work on getting better at detecting them. I'm not sure how yet, but it's definitely some food for thought!
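To make the MINIMUM_WINDOWS idea a bit more concrete, here is a minimal sketch of how such a guard could look. Nothing like this exists in the tree as far as this newsletter says: the macro names, the StartupSteps() function and the step names are all hypothetical, and the version constants simply borrow the encoding the Windows SDK uses for _WIN32_WINNT (0x0501 for XP, 0x0600 for Vista, 0x0601 for Windows 7).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical version constants, using the same encoding as _WIN32_WINNT.
#define WINDOWS_XP 0x0501
#define WINDOWS_VISTA 0x0600
#define WINDOWS_7 0x0601

// Hypothetical build-wide setting: the oldest Windows version still supported.
// Bumping this single value is what retires the guarded code below.
#ifndef MINIMUM_WINDOWS
#define MINIMUM_WINDOWS WINDOWS_7
#endif

// Returns the names of the steps that would run at startup (illustration only).
std::vector<std::string> StartupSteps() {
  std::vector<std::string> steps;
#if MINIMUM_WINDOWS < WINDOWS_7
  // Only compiled while XP or Vista is still a supported target. Once
  // MINIMUM_WINDOWS reaches WINDOWS_7, the preprocessor drops this branch.
  steps.push_back("legacy-xp-vista-workaround");
#endif
  steps.push_back("normal-startup");
  return steps;
}
```

With MINIMUM_WINDOWS at its default of WINDOWS_7, StartupSteps() only contains "normal-startup"; building with -DMINIMUM_WINDOWS=0x0501 would bring the legacy step back. The sketch says nothing about the hard parts mentioned above, namely finding every place that needs such a guard and keeping the convention enforced.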

I'll stop here, and move on to acknowledge the great work of all of you who helped make Firefox faster this past week! As per usual, apologies to those who I'm forgetting to mention here:

Mike Conley added more reflow tests for common user interactions for windows and tabs, and for the new Photon app menu.
Jonathan Watt has been swapping out all uses of a CSS filter on SVG icons to use the new context paint ability instead, which should be much cheaper to calculate and paint.
André Bargull added an optimization to avoid atomizing empty function names when retrieving the unresolved name of a bound function. He also reduced the number of array allocations when invoking bound functions with many arguments. He also avoided duplicate property lookups when setting a new property.
Kyle Machulis got rid of the PContent::FindPlugins sync IPC message!
Kris Maglione improved the speed of the extension policy service. He also improved the performance of using structured clone to transfer data for WebExtension APIs. Additionally he reduced the overhead of content script and extension page matching by rewriting them in C++.
Florian Quèze removed the messageWakeupService which was unused but still loaded during startup, and moved a few other module imports off of app-startup.
Dimi Lee removed a timer that would fire every second for up to 10 minutes after attempting to set Firefox as the default browser. This should help free up the main thread to do much more important things.
Boris Zbarsky improved the performance of Object.getOwnPropertyNames(window).
Andrea Marchesini landed infrastructure that should allow background tabs to have lower process priority. As support for each platform lands, this should help with resource contention when running with many tabs and many content processes.
Panos Astithas made it so that we mute tabs immediately upon closing them. This should help improve perceived tab closing time.
Mats Palmgren reduced the number of hashtable lookups that we do when inserting via nsBaseHashtable::GetOrInsert by half! He also reduced hashtable lookups in ImageLoader::DropRequestsForFrames by a lot! He did the same thing in nsCounterManager::DestroyNodesFor. And in nsContainerFrame::SafelyDestroyFrameListProp. He also merged ClearAllUndisplayedContentIn and ClearAllDisplayContentsIn to avoid doing duplicated work! All of this is part of his effort to make destroying frames cheaper. This is work that we currently do synchronously in some situations such as an innerHTML setter.
Miko Mynttinen reduced the number of hashtable lookups that we do when deleting a DisplayItemData object by half!
Jan de Mooij optimized Array.prototype.unshift! He also optimized @@toStringTag and @@toPrimitive property lookups!
Gary Chen moved the detection of system proxy auto-config settings to happen in a background thread. We had gathered evidence through the Background Hang Reporter that this can cause hangs on some Windows systems when the network connection drops.
Morris Tseng fixed a recent regression in hit testing performance on large tables.
Ting-Yu Chou removed one of the calls to XrayTraits::getExpandoObject() from js::Proxy::get()! He also fixed the order of the boolean conditions in js::CheckedUnwrap so that the flag is checked before doing some expensive work, not after! These are part of his investigations into the performance of access from chrome privileged JS code to content objects.
Masayuki Nakano fixed a hang when setting innerHTML in rich text editors.
Nihanth Subramanya ensured that captive portal detection happens after startup has completed.
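
Several of the items above are about halving the number of hashtable lookups on a "get or insert" path. As a rough illustration of the general idea only, here is a sketch using std::unordered_map from the C++ standard library. It is not Mozilla's nsBaseHashtable and does not reflect the actual patches; the Counts alias and both function names are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Counts = std::unordered_map<std::string, int>;

// Two lookups when the key is missing: find() probes the table once, and
// emplace() has to probe it again to locate the insertion point.
int& GetOrInsertTwoLookups(Counts& table, const std::string& key) {
  auto it = table.find(key);
  if (it == table.end()) {
    it = table.emplace(key, 0).first;
  }
  return it->second;
}

// One lookup: operator[] locates the key once and value-initializes a new
// entry (0 for int) only if the key is not already present.
int& GetOrInsertOneLookup(Counts& table, const std::string& key) {
  return table[key];
}
```

Both functions return a reference to the same entry for a given key, so callers cannot tell them apart; the second form simply hashes and probes once instead of twice when the key is missing.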