Quantum Flow Engineering Newsletter #4

As promised (with a day's delay), here is an update on what happened in the last two weeks on making Firefox faster as part of the Quantum Flow project.
Last week we had a big work week at the Mozilla Toronto office.  Many members of the various teams attended, and the week was packed with planning around the performance issues that have been identified in each area so far, and what we plan to do in each area for Firefox 57 and beyond.  I tried to attend as many of the discussions as I could, but of course many of them were happening concurrently, so I'm sure a lot of details are going to be missing.  Still, here is a super high-level summary of some of the plans that were discussed.
  • DOM.  In the DOM team there are several plans and projects under way which will hopefully bring various performance improvements to the browser.  Probably the largest one is the upcoming plan for cooperative scheduling of tasks, which will allow us to interrupt currently executing JavaScript on background pages in order to service tasks belonging to foreground pages.  You may have seen patches landing as part of a large effort to label all of our runnables; this is needed so that we can tell which page each task belongs to and schedule it accordingly.  We are also planning to soon throttle timeouts running in background pages more aggressively.  More details about all of these projects will be announced very soon.  Furthermore, we are working on smaller-scale performance improvements in various parts of the DOM module as new performance issues are discovered through various benchmarks.
  • JavaScript.  In the JavaScript team there have been several streams of work on various aspects of our JS execution.  Jan de Mooij and colleagues have been running the CacheIR project for a while as an attempt to share our inline caches (ICs) between the baseline and Ion JIT layers.  This helps unify the cases that can be optimized in these JIT tiers and has been showing meaningful improvements both on real web pages and on benchmarks such as Speedometer.  They have also been looking at opportunistic optimizations that address performance issues we have identified through profiling.  Another line of investigation in the JS team for a while has been looking into this bug.  We have some evidence to suggest that our JIT-generated code isn't very efficient in terms of CPU instruction cache usage, but so far that investigation hasn't produced anything conclusive.  Another extensive discussion topic was GC scheduling.  Right now our GC (and cycle collection) scheduling is poorly coordinated between SpiderMonkey and Gecko, and this can result in pathological cases: for example, SpiderMonkey sometimes doesn't know that a given moment is an unfortunate time to run a long GC, and Gecko doesn't have a good way to ask SpiderMonkey to stop an ongoing GC if it detects that now would be a good time to do something else.  We're going to start improving this situation by coordinating the scheduling between these two parts of the browser.  This is one of those architectural changes that can also have a pretty big impact in the longer term as we find more ways to leverage better coordination.  Another topic that was discussed was improving the performance of our XRay wrappers, which give chrome JS code access to content JS objects.  This is important for some front-end code, and also for the performance of some WebExtensions.
  • Layout.  In the Layout team, we are focusing on improving our reflow performance.  One challenge we have in this area is figuring out which reflow issues are the important ones.  We have done some profiling and measurement and have identified some issues so far, and we can definitely find more, but it's very hard to know how much optimization is enough, which issues matter most, and whether we even know about the important problems.  The nature of the reflow algorithm makes it really difficult to get great data on this without a lot of investigation and analysis work.  We talked about some ideas on how to improve our workflows, but nobody seemed to have any million-dollar ideas, so rather than waiting for the perfect data to arrive, we'll start acting on what we know for now.  Through looking at many reflow profiles, we have also developed some intuitions about the patterns of expensive operations that typically show up in layout profiles, which we are working on improving.  There are also some really bad performance cliffs that we need to try to eliminate.
  • Graphics.  In the Graphics team, we are planning to improve display list construction performance by retaining display lists and incrementally updating them, instead of reconstructing them on every paint.  This is an extremely nice optimization to have, since in my experience display list construction is the bottleneck in many of the cases where we suffer from expensive paints, and we have telemetry data that seems to confirm this.  The graphics team is also looking into optimizations around frame layer building and display list building, based on measurements highlighting places where things could be improved.
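To make the cooperative-scheduling idea in the DOM item above a bit more concrete, here is a minimal C++ sketch of a scheduler that uses runnable labels to service foreground-page tasks before background-page ones.  All names here (LabeledRunnable, CooperativeScheduler, and so on) are illustrative assumptions for this sketch, not Gecko's actual API.

```cpp
// Sketch: each runnable carries a label saying which page it belongs
// to, so the scheduler can drain foreground work before background work.
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct LabeledRunnable {
  std::string docGroup;        // label: which page this task belongs to
  bool foreground;             // is that page currently in the foreground?
  std::function<void()> run;   // the task itself
};

class CooperativeScheduler {
 public:
  void Dispatch(LabeledRunnable r) {
    (r.foreground ? mForeground : mBackground).push(std::move(r));
  }

  // Run one task; background tasks only run when no foreground
  // work is pending.
  void RunOne() {
    auto& q = !mForeground.empty() ? mForeground : mBackground;
    if (q.empty()) return;
    q.front().run();
    q.pop();
  }

  bool Idle() const { return mForeground.empty() && mBackground.empty(); }

 private:
  std::queue<LabeledRunnable> mForeground;
  std::queue<LabeledRunnable> mBackground;
};
```

The point of the labeling work is exactly the property this toy shows: even if a background task was dispatched first, a later foreground task gets serviced ahead of it.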
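The inline-cache sharing mentioned in the JavaScript item can be illustrated with a toy property-lookup IC: the cache remembers the last object shape it saw, and on a hit returns the cached slot directly, skipping the generic lookup.  This is only a hedged sketch of the general IC technique; CacheIR's real design is considerably more elaborate, and all the names below are made up for illustration.

```cpp
// Toy inline cache for property reads: monomorphic, keyed on the
// object's "shape" (its layout descriptor).
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Shape {
  int id;                                 // unique id for this layout
  std::map<std::string, int> slotOf;      // property name -> slot index
};

struct Object {
  const Shape* shape;
  std::vector<double> slots;
};

class GetPropIC {
 public:
  double Get(const Object& obj, const std::string& name) {
    if (obj.shape->id == mCachedShapeId && name == mCachedName) {
      ++hits;                             // fast path: use cached slot
      return obj.slots[mCachedSlot];
    }
    ++misses;                             // slow path: generic lookup,
    int slot = obj.shape->slotOf.at(name);
    mCachedShapeId = obj.shape->id;       // then attach a new cache entry
    mCachedName = name;
    mCachedSlot = slot;
    return obj.slots[slot];
  }

  int hits = 0, misses = 0;

 private:
  int mCachedShapeId = -1;
  std::string mCachedName;
  int mCachedSlot = -1;
};
```

Because the cache is keyed on shape rather than on a specific object, every object with the same layout takes the fast path, which is why sharing the same caches across JIT tiers pays off.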
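The retained display list idea from the Graphics item can be sketched as a list that, on each paint, rebuilds only items that are new or whose frames were marked dirty, instead of reconstructing everything.  The names and structure below are purely illustrative assumptions; the real work has to merge partial lists into the retained one and handle invalidation far more carefully.

```cpp
// Sketch: a display list retained across paints, updated incrementally
// based on a set of dirty frames.
#include <cassert>
#include <map>
#include <set>
#include <string>

struct DisplayItem {
  std::string payload;   // what this item would draw
  int buildCount = 0;    // how many times it has been (re)built
};

class RetainedDisplayList {
 public:
  // frames: frameId -> current content; dirtyFrames: frames that changed.
  void Paint(const std::map<int, std::string>& frames,
             const std::set<int>& dirtyFrames) {
    for (const auto& [id, content] : frames) {
      auto it = mItems.find(id);
      if (it == mItems.end() || dirtyFrames.count(id)) {
        // (Re)build only new or dirty items; clean items are kept as-is.
        DisplayItem& item = mItems[id];
        item.payload = content;
        ++item.buildCount;
        ++rebuilds;
      }
    }
    // Drop retained items whose frames went away.
    for (auto it = mItems.begin(); it != mItems.end();) {
      if (frames.count(it->first)) ++it;
      else it = mItems.erase(it);
    }
  }

  int rebuilds = 0;       // total item builds across all paints

 private:
  std::map<int, DisplayItem> mItems;
};
```

On a page where only a small region changes per frame, the rebuild count grows with the size of the change rather than the size of the page, which is the win this optimization is after.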
Another thing that happened last week was that on Friday I gave a talk introducing how to use the Gecko Profiler to find performance issues in Firefox.  During the week a few people had expressed interest in sitting down with me and looking over my shoulder as I use the profiler to analyze some performance problems, and since there wasn't time to sit down with everyone 1:1, we decided to do a recorded talk instead.  This was decided a few minutes before the talk happened.  🙂  So I didn't really have anything prepared in advance, which was both good and bad: the talk is basically me live-profiling Firefox, and it shows how you can start from scratch and go all the way to a bug report.  The recording is here: <https://air.mozilla.org/gecko-profiler-introduction/>; if you're interested in learning how to use the profiler and/or how to read and analyze a profile, you may want to check it out!  BTW, if you've ever felt like you could use documentation and/or training material on how to use a profiler (or how to do so more effectively), please feel free to contact me privately with ideas on what you would like to learn.  We haven't been so great on the documentation front, and many people have pointed this out to me; I hear and acknowledge your feedback!  I'd like to try to improve this situation, and your input on what you'd like to learn about will help prioritize what's most important.
In hindsight, looking back at the work week, I wish more people from the front-end teams had also attended.  The planning for the work week happened a couple of months or so earlier, but due to the nature of this project the measurements lead us to different areas of the code base, and by now we have a fair number of issues in front-end code, so I think the work week was perhaps a bit too focused on Gecko.  On the flip side, I also found the breadth of topics to cover in one week a bit too much, so perhaps adding more people and more meetings to the agenda wouldn't have been a net benefit.  A lesson to learn for next time!
On the bug fixing front, we had a great couple of weeks: there is a long list of really great work that happened, and a list of really amazing individuals to be recognized and appreciated for their contributions.  As always, I know I'll be missing a few people, so apologies in advance for that!