Quantum Flow Engineering Newsletter #21

We're now about mid-way through the Firefox 57 development cycle. The progress of Quantum Flow bugs has been steady: at the time of this writing we have 65 open [qf:p1] bugs and 283 fixed bugs. More bugs are still constantly being flagged for triage. I haven't spoken much about the triage process lately because it has been working as usual, and its output should be fairly visible to everyone through our dashboard.

On the Speedometer front, if you are watching the main tracking bugs, the addition of new dependencies every once in a while should be an indication that we are still profiling the benchmark, looking for more speedup opportunities. Finding these new opportunities has become more and more difficult as we have fixed more and more of the existing performance issues, which is exactly what you would expect when working on improving the performance of Firefox on such a benchmark workload. Of course, we still have ongoing work in the existing dependency tree (which is quite massive at this point), so more improvements should hopefully arrive as we keep landing more fixes on this front.

I realize that I have been quite inconsistent in having a performance story section in these newsletters, and I hope the readers will forgive me for that! :-) But these past couple of weeks, Jan de Mooij's continued effort on removing getProperty/setProperty JSClass hooks from SpiderMonkey made me want to write a few sentences about some useful lessons we have learned from performance measurements, which can hopefully be used in the future when designing new components/subsystems.

Oftentimes when we are thinking about how to design software, we can think of many extension points at various levels which consumers of the code can plug into in order to customize behavior. But many such extension points come at a runtime cost: we may need to consume some additional memory to store more state, we may need to branch on some conditions, we may need to perform some more indirect/virtual calls, etc. The problem is that each such cost is usually extremely small, so it can easily go unnoticed. But the same thing can happen in many places, and over time performance issues like this tend to creep in and hide in corners. Of course, when these extension points are added there are usually good reasons for creating them, but it may be a good idea to ask questions like “Is this mechanism too high level of a solution for this specific problem?”, “Is the runtime cost paid for this over the years to come justified to solve the issue at hand?”, “Could this issue be solved by adding an extension point in a more specialized place where the added cost would only affect a subset of the consumers?”, etc. The reality of software engineering is that in a lot of cases we need to trade off a generic, extensible architecture against efficient code, so if you end up choosing extensibility, it's a good idea to make sure you have kept the performance aspects in mind. It's even better if you document the performance concerns!
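To make the trade-off a bit more concrete, here is a minimal sketch in Python. All of the names in it (GenericObject, PlainObject, HookedObject, hook_checks) are hypothetical and made up for illustration; this is not how SpiderMonkey's JSClass hooks are implemented. It only contrasts a generic extension point, whose check is paid on every property read of every object, with a more specialized one that only the objects needing customization pay for. The counter exists purely to make the otherwise invisible per-access cost visible.

```python
from typing import Callable, Dict, Optional

# Hypothetical hook signature: (object, property name) -> value.
Hook = Callable[["GenericObject", str], object]


class GenericObject:
    """Generic extension point: every object carries an optional hook."""

    def __init__(self, get_hook: Optional[Hook] = None) -> None:
        self.slots: Dict[str, object] = {}
        self.get_hook = get_hook   # extra state stored on every object
        self.hook_checks = 0       # illustration only: counts the checks

    def get(self, name: str) -> object:
        self.hook_checks += 1      # the branch that every read pays for
        if self.get_hook is not None:
            return self.get_hook(self, name)   # indirect call
        return self.slots.get(name)


class PlainObject:
    """Specialized design: the common case has no hook and no check."""

    def __init__(self) -> None:
        self.slots: Dict[str, object] = {}

    def get(self, name: str) -> object:
        return self.slots.get(name)


class HookedObject(PlainObject):
    """Only the objects that need customization pay for it."""

    def __init__(self, get_hook: Callable[["HookedObject", str], object]) -> None:
        super().__init__()
        self.get_hook = get_hook

    def get(self, name: str) -> object:
        return self.get_hook(self, name)


generic = GenericObject()
generic.slots["x"] = 1
for _ in range(1000):
    generic.get("x")
print(generic.hook_checks)   # 1000 checks, none of which found a hook

plain = PlainObject()
plain.slots["x"] = 1
hooked = HookedObject(lambda obj, name: name.upper())
print(plain.get("x"), hooked.get("x"))
```

In a dynamic language like Python the difference is mostly conceptual, but in native code the per-object state, the branch and the indirect call are exactly the kinds of small costs described above.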

And since we touched on this, now may be a good time to call out another pattern which I have seen come up in some of the performance issues we have been looking into in the past few months: the “death by a thousand cuts” performance problems. In my experience, many of the performance issues that we need to deal with turn out, when profiled, to be caused by only a few really badly performing parts of the code, or at least are due to a few underlying causes. But we also have no shortage of the other kind of performance issues, which are honestly much more difficult to deal with. In that scenario, you look at a profile from the badly performing case, you narrow it down to the section of the profile which demonstrates the issue, and no matter how hard you squint, there are no major issues to be fixed. Rather, the profile shows many individual issues, each contributing a tiny portion of the time spent during the workload. These performance issues are much harder to analyze (since there are typically many ways to approach them and it's unclear where a good place to start is) and they take much longer to result in measurable improvements, as you need to fix quite a few issues before you can measure the resulting improvement. For a good example of this, please look at the saga of optimizing setting the value property of input elements. This project has been going on for a few months now, and during this time the workload has been made faster by more than an order of magnitude. Still, if you look at each of the individual patches that have landed, they look like micro-optimizations, and that's for a good reason: they are. But overall they add up to significant improvements.
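A bit of arithmetic shows why the flat kind of profile is so slow to pay off. The sketch below uses entirely made-up numbers (they are not measurements from any Firefox profile): one workload where a single hotspot accounts for 80% of the time, and another where the same 80% is spread evenly over 50 small hotspots. The speedup helper is a hypothetical function that applies the usual Amdahl-style formula.

```python
from typing import List


def speedup(fractions: List[float], factor: float, fixed_count: int) -> float:
    """Overall speedup after making the first `fixed_count` hotspots
    `factor` times faster. `fractions` holds each hotspot's share of the
    total running time; the rest of the time is assumed to be unchanged."""
    fixed = sum(fractions[:fixed_count])
    untouched = 1.0 - fixed
    return 1.0 / (untouched + fixed / factor)


# Hypothetical profiles: both spend 80% of the time in "fixable" code.
concentrated = [0.80]        # one big hotspot
flat = [0.016] * 50          # fifty tiny ones, 1.6% each

one_big_fix = speedup(concentrated, factor=2, fixed_count=1)
one_small_fix = speedup(flat, factor=2, fixed_count=1)
all_small_fixes = speedup(flat, factor=2, fixed_count=50)

print(f"one fix, concentrated profile: {one_big_fix:.3f}x")    # about 1.667x
print(f"one fix, flat profile:         {one_small_fix:.3f}x")  # about 1.008x
print(f"fifty fixes, flat profile:     {all_small_fixes:.3f}x")  # about 1.667x
```

With these assumed numbers, a single patch on the flat profile moves the needle by less than 1%, which is easily lost in measurement noise, and it takes all fifty patches to match what one patch achieves on the concentrated profile.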

Before closing, it is worth mentioning that the ongoing performance work isn't suddenly going to stop with the release of Firefox 57! In fact, we have large performance projects which are going to be ready after Firefox 57, and that is a good thing, since I view Firefox 57 not as an ultimate performance goal, but as a solid performance foundation for us to start building upon. A great example is the Quantum Render project, which has been going on for a long time now and aims to integrate the WebRender component of Servo into Firefox. This project now has an exciting newsletter, and the first two issues are out! Please take a moment to check it out.

And now it is time to take a moment to thank those whose contributions helped make Firefox faster last week. As usual I hope I'm not forgetting any names!

Evelyn Hung made it so that when we tell the content process to load a URI, we first initiate a speculative connection in the parent process so that we don’t have to wait for the content process to request network access for the connection to be set up. In a similar vein, Evelyn also made us initiate a speculative connection upon the mousedown event on an awesomebar result entry to start setting up the network connection even faster when using the mouse to pick an awesomebar result.
Paolo Amadini made FileUtils.getFile() not do main-thread IO for the common case. You may remember some fixes to callers of this function were mentioned in the previous newsletters to avoid this pattern of main-thread IO, and this fix will hopefully address this issue for most of the remaining callers. Paolo also got rid of some reflows we were doing when opening up the AwesomeBar panel.
prasanthp96 deferred reading several preferences to reduce their impact on startup performance.
Botond Ballo enabled support for asynchronous autoscrolling on the Nightly channel.
André Bargull inlined IsCallable when called from MIRType::Value.
Olli Pettay added a nursery to the cycle collector purple buffer in order to speed up AddRef/Release calls to cycle collectible objects on the main thread. He also added a faster variant of TextEditor::GetDocumentIsEmpty() in order to speed up setting the value property of input elements.
Ming-Chou Shih enabled coalescing mousemove events to once per refresh cycle. This feature helps performance by dispatching fewer mousemove events on pages which have expensive mousemove handlers, and it recently shipped in Chrome. It is currently disabled behind a preference for testing.
Kris Maglione cached some extension manifest data in the startup cache. He also converted FrameLoader bindings to WebIDL for improved performance of the JS code going through these bindings to access the underlying C++ code. He also made the WebExtension schema normalization code faster. Last but not least, Kris added a UI for notifying the user about long running WebExtension content scripts and providing the option to stop them, similar to the existing UI we have for long running content scripts. While this isn't strictly a performance improvement in itself, it is worth mentioning here because it allows the user to interrupt a badly behaving WebExtension content script that is causing performance issues.
Masayuki Nakano optimized TextEditRules::CollapseSelectionToTrailingBRIfNeeded().
Jessica Jong made sure we skip the potentially expensive pattern matching code when validating input elements if the element has no pattern attribute set.
Jan de Mooij devirtualized MNode::kind() and MDefinition::op().
Bao Quan moved _saveStateAsync to the idle event queue.
John Dai ensured that we avoid processing the custom element reactions stack when web components are disabled.
Makoto Kato enabled lazy frame construction in editable regions of HTML documents. The original lazy frame construction optimization was enabled in 2010 and shipped in Firefox 4 but it never covered editable sections of HTML documents such as contents of input and textarea textboxes as well as contenteditable elements until now.
Doug Thayer made it so that we avoid some main thread IO when registering MIME-type handlers on start-up.
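
As an aside on the mousemove coalescing item in the list above, the idea can be sketched in a few lines of Python. This is only a simulation under assumed conditions (a 60 Hz refresh rate, and hypothetical MouseMove and coalesce names); it is not the Gecko implementation, which lives in the event dispatching code rather than operating on a list of events after the fact.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MouseMove:
    time_ms: float
    x: int
    y: int


def coalesce(events: List[MouseMove], frame_ms: float = 1000 / 60) -> List[MouseMove]:
    """Keep only the last mousemove that arrives within each refresh
    interval (assumed here to be 1/60 of a second)."""
    latest: Dict[int, MouseMove] = {}
    for event in events:
        frame = int(event.time_ms // frame_ms)
        latest[frame] = event   # a later event in the same frame wins
    return [latest[frame] for frame in sorted(latest)]


# Six raw events: four land in the first frame, two in the second.
raw = [MouseMove(t, x, 0) for x, t in enumerate([0, 4, 8, 12, 20, 24])]
dispatched = coalesce(raw)
print(len(raw), "->", len(dispatched))      # 6 -> 2
print([e.time_ms for e in dispatched])      # [12, 24]
```

A page with an expensive mousemove handler would run that handler twice here instead of six times, while still seeing the most recent pointer position for each frame.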