Quantum Flow Engineering Newsletter #6

I would like to share some updates about some of the ongoing performance-related work.
We have started looking at the native stack traces that are submitted through telemetry from the Background Hang Reports for hangs that take more than 8 seconds.  (We have been hoping to reduce this threshold to 256ms for a while now, and the road has been bumpy, but this should land really soon now!)  Michael Layzell put together a telemetry analysis job that creates a symbolicated version of this data here: https://people-mozilla.org/~mlayzell/bhr/.  For example, this is the latest generated report.  The grouping of this data is unfortunate: the data is collected based on the profiler pseudo-stack labels, which are captured after 128ms, while the native stack is only captured later (if the hang continues for 8 seconds), so the pseudo-stack and the native stack may or may not correspond.  This grouping also doesn’t help with going through the list of native stacks and triaging them effectively.  Work is under way to create a nice dashboard out of this data, but in the meantime this is an area where we could really use all of the help that we can get.  If you have some time, it would be really nice if you could take a look at this data, see if you can make sense of some of these call stacks, and file some useful bug reports based on them.  If you do end up filing bugs, these are super important bugs to work on, so please make sure you add “[qf]” to the status whiteboard so that we can track the bug.
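To illustrate the kind of regrouping that would make triage easier, here is a minimal sketch in Python that groups hang records by their leading native frames instead of by pseudo-stack. The record layout (`native_stack`, `duration_ms`), the function name, and the frame names in the sample are all assumptions made up for illustration; they are not the format the telemetry job actually produces.

```python
from collections import defaultdict


def group_by_native_stack(hangs, top_frames=3):
    """Group hang records by their first `top_frames` native frames.

    Each record is a dict with hypothetical keys (not the real report format):
      "native_stack": list of symbolicated frame names, innermost first,
                      or None if no native stack was captured
      "duration_ms":  hang duration in milliseconds
    Returns a list of (frames, stats) pairs, largest total hang time first.
    """
    groups = defaultdict(lambda: {"count": 0, "total_ms": 0})
    for hang in hangs:
        stack = hang.get("native_stack")
        if not stack:
            # The hang ended before a native stack was captured; skip it.
            continue
        key = tuple(stack[:top_frames])
        groups[key]["count"] += 1
        groups[key]["total_ms"] += hang["duration_ms"]
    return sorted(groups.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)


# Made-up sample records; the frame names are placeholders.
sample_hangs = [
    {"native_stack": ["frame_a", "frame_b", "frame_c"], "duration_ms": 9000},
    {"native_stack": ["frame_a", "frame_b", "frame_c", "frame_d"], "duration_ms": 10000},
    {"native_stack": None, "duration_ms": 300},
    {"native_stack": ["frame_x", "frame_y"], "duration_ms": 8500},
]

for frames, stats in group_by_native_stack(sample_hangs):
    print(" <- ".join(frames), stats)
```

Keying on only the first few frames is a deliberate simplification: stacks that differ deep in the call chain still land in the same bucket, which keeps the list short enough to triage by hand.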
Another item worth highlighting is Mike Conley’s Oh No! Reflow! add-on.  Don’t let the simple web page behind this link deceive you; this add-on is really awesome!  It generates a beep every time a long-running reflow happens in the browser UI (which, of course, you can turn off when you aren’t hunting for bugs!), it logs the sync reflows that happened alongside the JS call stack of the code that triggered them, and it also gives you a single link that allows you to quickly file a bug with all of the right info pre-filled!  In fact, you can see the list of bugs already filed through this add-on!
Another issue that I want to bring up is the [qf:p1] bugs.  As you may have noticed, there are a lot of them.  🙂  It is possible that some of these bugs aren’t important to work on, for example because they only affect edge cases that a very small subset of users run into, and that wasn’t obvious when the bug was triaged.  In some other cases it may turn out that fixing the bug requires a massive amount of work that is unreasonable to do in the time we have, or that the right people for it are doing more important work and can’t be interrupted, and so on.  Whatever the issue is, whether the bug was mis-triaged or can’t be fixed, please make sure to raise it on the bug!  In general, the earlier these issues are uncovered the better, because everyone can focus their time on more important work.  I wanted to make sure that this wasn’t lost in all of the rush around our communication for Quantum Flow; my apologies if this hasn’t been clear before.
On to the acknowledgement section.  I hope I’m not forgetting to mention anyone’s name here!
2 comments on “Quantum Flow Engineering Newsletter #6”
  1. Are you guys also working on updating Talos and having more modern perf infra/tests to find regressions later?

    • ehsan says:

      Yes, though that is a longer-term focus. We are also thinking about other possible ways of detecting performance regressions that we may not currently be using, and hopefully our experience in the next few months will teach us some useful lessons. We also need to get better at learning from past mistakes (see for example this bug).
