Another week full of performance-related updates went by quickly, and I'd like to share a few of them.
We're almost in mid-April, about 3 weeks after I shared my first update on our progress battling our sync IPC issues. I have prepared a second Sync IPC Report for 2017-04-13. For those who looked at the previous report, this one is in the same spreadsheet, with the data next to the previous report for easy comparison. We have made a lot of great progress fixing some of the really bad synchronous IPC issues in the past few weeks, and even though telemetry data is laggy, we are starting to see this reflected in the data coming in through telemetry! Here is a human-readable summary of where we are now:
- PCookieService::Msg_GetCookieString is still at the top of the list, now taking a whopping 45% piece of the pie chart! I don't think there is any reason to believe that this has gotten particularly worse; it's just that we're starting to get better at not doing synchronous IPC, so this is standing out even more now! But its days are numbered. :-)
- PContent::Msg_RpcMessage and PBrowser::Msg_RpcMessage at 19%. We still need to get better data about the sync IPC triggered from JS, which shows up in this data under one of these buckets.
- PJavaScript::Msg_Get at 5% (CPOW overhead) could be caused by add-ons that aren't e10s compatible.
- PAPZCTreeManager::Msg_ReceiveMouseInputEvent. This one (and a few other smaller APZ-related ones) tends to have really low mean values but super high count values, which is why they tend to show up high on this list, but they aren't necessarily too terrible compared to the rest of our sync IPC issues.
- PVRManager::Msg_GetSensorState also has relatively low mean values, but could be slightly worse.
- PJavaScript::Msg_CallOrConstruct, more CPOW overhead.
- PContent::Msg_SyncMessage, more JS triggered sync IPC.
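To make the point about low mean values and high counts concrete, here is a minimal sketch. It assumes the list is ordered by something like total time, i.e. count multiplied by mean, which is my reading of the summary and not a description of how the report is actually computed. The message names and numbers are made up for illustration and do not come from the telemetry data.

```python
from dataclasses import dataclass


@dataclass
class SyncIpcStat:
    """One row of a hypothetical sync IPC summary."""
    name: str
    count: int       # how many times the message was sent
    mean_ms: float   # average time spent blocked per message

    @property
    def total_ms(self) -> float:
        # Assumed ranking metric: total blocked time.
        return self.count * self.mean_ms


def rank_by_total(stats):
    """Return (name, share of total time) pairs, biggest share first."""
    grand_total = sum(s.total_ms for s in stats)
    ranked = sorted(stats, key=lambda s: s.total_ms, reverse=True)
    return [(s.name, s.total_ms / grand_total) for s in ranked]


# Made-up numbers: a cheap message sent very often versus a slow, rare one.
stats = [
    SyncIpcStat("rare_but_slow", count=2_000, mean_ms=20.0),
    SyncIpcStat("frequent_but_cheap", count=1_000_000, mean_ms=0.05),
]
ranking = rank_by_total(stats)

for name, share in ranking:
    print(f"{name}: {share:.1%} of total sync IPC time")
```

With these made-up numbers the cheap message ends up first, even though each individual call is 400 times faster than the slow one. That is why both the mean and the count are worth looking at before deciding how bad an entry really is.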
A few items further down on the list are either being worked on or recently fixed as well. I expect this to keep improving over the next few weeks. It is really great to see this progress, thanks to everyone who has worked on fixing these issues, helping with the diagnoses, code reviews, etc.
We have also been working hard at triaging performance-related bug reports. In order to keep an eye on the bug-by-bug status of the project, you can use the Bugzilla queries on the wiki. As of this moment, we have triaged 160 bugs as [qf:p1] (which means these are the performance-related bugs we believe should be fixed now for the Firefox 57 release). Of these bugs, 92 are unassigned right now. If you see a bug on this list in your area of expertise which you think you can help with, please consider picking it up. We really appreciate your help. Please remember that not every bug on this list is complicated to fix, and there's everything from major architectural changes to simple one-liner fixes up for grabs. :-)
Another really nice effort that is starting to unfold, and that I'm super excited about, is the new Photon performance project, which is a focused effort on front-end performance. This includes everything from engineering the new UI with things like animations running on the compositor in mind from the get-go, to being laser focused on guaranteeing good performance on key UI interactions such as tab opening and closing, to lots of focused measurements and fixes to the browser front-end.
The performance story of this week is about how measurement tools can distort our vision. This one isn't much of a story; it's more of a lesson that I seem to be learning over and over again these days. You may have heard of the measurement problem, which basically amounts to the fact that you always change what you measure. Markus and I were recently talking about the cost of style flushes for browser.xul that I had seen in my profiles and how they could sometimes be expensive, and we noticed that this might be due to the overhead the profiler incurs in order to show information about the cause of the restyle in the profile UI. He has since fixed the issue. I think the reason why I didn't catch this in my own profiling was that I have gotten so used to seeing expensive reflows and restyles that sometimes I accept that as a fact of life and don't look under the hood closely enough. Lesson learned!
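Here is a toy sketch of that effect, written in Python and not taken from Gecko. The `restyle` function is a hypothetical stand-in for a style flush, and capturing a stack trace stands in for whatever bookkeeping a profiler does to remember the cause of a restyle. The only point is that the same work gets measured twice, once with and once without the extra bookkeeping.

```python
import time
import traceback


def restyle(elements, causes=None):
    """Toy stand-in for a style flush.

    If `causes` is a list, record a stack trace per element, the way a
    profiler might remember what triggered each restyle.
    """
    total = 0
    for element in elements:
        if causes is not None:
            # Instrumentation only: this does not change the result,
            # but it does add work to the thing being measured.
            causes.append(traceback.extract_stack())
        total += len(element)
    return total


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


elements = ["tab"] * 2000

plain_result, plain_seconds = timed(restyle, elements)

causes = []
profiled_result, profiled_seconds = timed(restyle, elements, causes)

print(f"without cause tracking: {plain_seconds * 1000:.2f} ms")
print(f"with cause tracking:    {profiled_seconds * 1000:.2f} ms")
```

Both runs compute the same result, so any difference between the two timings is the cost of observing. The exact numbers will vary from machine to machine, which is sort of the point: it's worth checking how much of a hot spot belongs to the tool before blaming the code.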
We have a bug tracking these types of issues, so if you know of something similar please create a dependency. If you also profile Firefox regularly using the Gecko Profiler, adding yourself to the CC list of that bug may not be a bad idea.
Now it's time to acknowledge those who have helped make Firefox faster in the past week. I will probably forget a few people here, apologies for any unintended omissions!
- Tim Taubert made a couple of fixes to the performance of the SessionCookies.jsm code, which, IINM, runs periodically on the UI thread and during startup. Well done!
- Henry Chang continued his work on improving url-classifier performance.
- Dana Keeler disabled an expensive telemetry probe which could slow down HTTPS page loads by up to 2 times in certain cases.
- Jonathan Kew added some caching to gfxFontShaper::GetRoundOffsetsToPixels().
- Olli Pettay enabled high priority vsync events in the parent process. This seems to have finally stuck. Olli had been trying to get this landed for months, and our unit tests disagreed. :-)
- Kyle Machulis restricted the plugin finding/initialization code to Flash/PDF, lowering the overhead of the expensive MIME type lookups we used to do.
- JW Wang fixed a bug where we were blocking the main thread, for up to 8 seconds according to telemetry data, on file I/O done on a background thread. Readers of this newsletter should be familiar with this anti-pattern by now…
- Gerald Squelart fixed a graphics initialization synchronous IPC issue. This could be a navigation performance issue with multi-e10s.
- Evelyn Hung lowered the cost of size calculation for the height of the squiggly line we draw for misspelled words during spell checking.
- Marco Bonardo removed a sync reflow which caused jank when opening the awesomebar panel.
- Dão Gottwald also removed a sync reflow loop which caused jank when resizing windows with pinned tabs.
- Masatoshi Kimura added a cache for the preference access in nsIWidget::DefaultScaleOverride().
- Jan de Mooij further improved the external string cache by moving it into the JS engine. He also improved our SetProp failure rate on sites like Google Spreadsheets.
- Kartikaya Gupta removed a synchronous IPC from the compositor.
- Makoto Kato avoided using synchronous IPC to set the spellchecker dictionary language.
- Neil Deakin removed a sync reflow that happened when closing a window.
- Mike Conley started to create an add-on to help front-end engineers find sync reflow issues in the browser UI.
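A couple of the fixes above, such as the cache for the preference access, follow a pattern that is easy to sketch: look the value up once, keep it around, and only look it up again when it changes. The following is a minimal Python illustration of that idea, not the actual Gecko code. `PrefStore`, `CachedPref` and the preference name are all hypothetical.

```python
class PrefStore:
    """Hypothetical preference store standing in for a slow lookup."""

    def __init__(self):
        self._values = {}
        self._observers = {}
        self.lookups = 0  # counts how often the slow path is taken

    def get(self, name, default=None):
        self.lookups += 1
        return self._values.get(name, default)

    def set(self, name, value):
        self._values[name] = value
        # Tell everyone watching this preference that it changed.
        for callback in self._observers.get(name, []):
            callback()

    def add_observer(self, name, callback):
        self._observers.setdefault(name, []).append(callback)


class CachedPref:
    """Caches one preference and refreshes it only after a change."""

    _MISSING = object()

    def __init__(self, store, name, default=None):
        self._store = store
        self._name = name
        self._default = default
        self._cached = self._MISSING
        store.add_observer(name, self._invalidate)

    def _invalidate(self):
        self._cached = self._MISSING

    def get(self):
        if self._cached is self._MISSING:
            self._cached = self._store.get(self._name, self._default)
        return self._cached


store = PrefStore()
store.set("example.scale_override", 1.5)
scale = CachedPref(store, "example.scale_override", default=-1.0)

# A hot code path can now ask for the value as often as it likes.
values = [scale.get() for _ in range(1000)]
lookups_after_hot_loop = store.lookups

# Changing the preference invalidates the cache exactly once.
store.set("example.scale_override", 2.0)
updated_value = scale.get()
```

In this sketch, a thousand reads cost a single slow lookup, and a change to the preference costs one more. The observer is what keeps the cached value from going stale, which is the part that is easy to forget when adding a cache like this.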
Until next week, happy hacking!