Data about people’s contribution to the Mozilla code base

Tonight I was talking to Josh and he mentioned how he’s interested in getting data on people’s recent contributions to different parts of the Mozilla code base.  He basically wanted to get a list of people who have contributed patches touching a given directory within a time frame.  I told him that I can simply get this data out of git, and he took me up on that challenge.  So we sat down and figured out how to get to this information.

As an example, let’s say we’re interested in the list of people who have submitted patches to the JS engine (the js/ directory) within the first 6 months of this year.  To get this information, you first need a clone of the mozilla-central git mirror, if you don’t already have one.  Then you can run the following command to get the data you’re looking for out of git:

$ git log --format='%an <%ae>' --since=2012-01-01 --until=2012-07-01 --all --no-merges -- js/ | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
 199 David Anderson <danderson--at--mozilla.com>
 155 Jan de Mooij <jdemooij--at--mozilla.com>
 133 Jeff Walden <jwalden--at--mit.edu>
 115 Luke Wagner <luke--at--mozilla.com>
 114 Bill McCloskey <wmccloskey--at--mozilla.com>
 108 Nicholas Nethercote <nnethercote--at--mozilla.com>
 105 Marty Rosenberg <mrosenberg--at--mozilla.com>
 104 Bobby Holley <bobbyholley--at--gmail.com>
 103 Nicolas Pierron <nicolas.b.pierron--at--mozilla.com>
  78 Terrence Cole <terrence--at--mozilla.com>

The interesting bits here are the following:

  • --format='%an <%ae>' tells git how it should format the information about each commit.  Since we’re only interested in authors, we only need the name (%an) and the email address (%ae).  You can see the git log man page for a full list of supported formatting tokens.
  • --since and --until allow you to specify date ranges.  You can specify only one of them if you’re interested in queries such as “the past 6 months” or “the first year of the project”.
  • --all tells git log to start from every ref in the repository (all branches, remote-tracking branches, and tags) rather than only from the HEAD of your current branch.  Without it, git only reports commits which are reachable from HEAD, so patches which only exist on other branches are left out.
  • --no-merges tells git to ignore merge commits.  If you’re only interested in genuine patches, merge commits should be ignored since they don’t really reflect a useful contribution.  See below for a case where you might actually be interested in the merge commits.
  • The sed trickery is only there because I wanted to make the lives of spammers a bit harder.
  • Figuring out the rest of the shell trickery will be left as an exercise for the reader.
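
For readers who would rather skip the exercise, here is a minimal sketch of the counting part of the pipeline, run against made-up sample lines instead of real git output.  The helper name count_authors and the example.com addresses are hypothetical and only there for illustration.  The remaining stage, grep -v ^commit, drops any line starting with “commit”; with a custom --format git log should not print such lines, so it is most likely just a harmless safety net.

```shell
# count_authors: hypothetical helper name. Reads one "Name <email>" line per
# commit on stdin and prints the ten most frequent lines with their counts.
count_authors() {
  sort |        # bring identical lines next to each other
    uniq -c |   # collapse each run into "<count> <line>"
    sort -rn |  # order numerically by count, largest first
    head        # keep only the first ten lines
}

# Made-up sample input standing in for the output of git log --format=...
printf '%s\n' \
  'Alice <alice--at--example.com>' \
  'Bob <bob--at--example.com>' \
  'Alice <alice--at--example.com>' |
  count_authors
```

With this sample input, Alice is listed first with a count of 2, followed by Bob with a count of 1.  The sort before uniq -c matters because uniq only collapses lines which are adjacent.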

Here are a number of other interesting queries that I ran locally.

The list of our most active merge vikings (or, people who’ve done the most merges):

$ git log --format='%an <%ae>' --all --merges | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
 483 Robert Sayre <sayrer--at--gmail.com>
 368 David Anderson <danderson--at--mozilla.com>
 278 Andreas Gal <gal--at--mozilla.com>
 275 Ehsan Akhgari <ehsan--at--mozilla.com>
 242 Ryan VanderMeulen <ryanvm--at--gmail.com>
 213 Marco Bonardo <mbonardo--at--mozilla.com>
 186 Ed Morley <bmo--at--edmorley.co.uk>
 184 Ed Morley <emorley--at--mozilla.com>
 153 Tim Taubert <tim.taubert--at--gmx.de>
 138 Benjamin Smedberg <benjamin--at--smedbergs.us>

Well, some of these people have done a lot of merges, but whatever!
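
Note that Ed Morley shows up twice in the list above, because his commits carry two different email addresses and the pipeline counts each distinct “Name <email>” line separately.  Here is a minimal sketch of one way around that, assuming it is acceptable to identify people by name alone: strip the address before counting.  The helper name strip_email and the sample lines are made up for illustration.  git also has a .mailmap mechanism, with %aN and %aE as the mailmap-aware counterparts of %an and %ae, which is probably the more robust fix for a repository that maintains a mailmap file.

```shell
# strip_email: hypothetical helper name. Removes a trailing " <...>" part so
# that commits made under different addresses are counted under one name.
strip_email() {
  sed 's/ <[^<>]*>$//'
}

# Made-up sample input: the same person with two addresses, plus one other.
printf '%s\n' \
  'Pat Example <pat--at--example.com>' \
  'Pat Example <pat--at--example.org>' \
  'Sam Sample <sam--at--example.com>' |
  strip_email | sort | uniq -c | sort -rn | head
```

With this sample input, Pat Example is counted once with a total of 2, ahead of Sam Sample with 1.  The obvious downside is that two different people who share a name would be merged as well.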

The list of the people who have backed out the most patches:

$ git log --format='%an <%ae>' --all --grep='[Bb]acked *out' --grep='[Bb]ack *out' | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
 356 Ehsan Akhgari <ehsan--at--mozilla.com>
 238 Dão Gottwald <dao--at--mozilla.com>
 217 Phil Ringnalda <philringnalda--at--gmail.com>
 199 L. David Baron <dbaron--at--dbaron.org>
 174 Robert O'Callahan <robert--at--ocallahan.org>
 171 Matt Brubeck <mbrubeck--at--mozilla.com>
 160 Ed Morley <emorley--at--mozilla.com>
 157 Shawn Wilsher <sdwilsh--at--shawnwilsher.com>
 155 Marco Bonardo <mbonardo--at--mozilla.com>
 128 David Anderson <danderson--at--mozilla.com>
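
When more than one --grep is given, git log selects commits whose message matches any of the patterns, so the command above catches several spellings of “backed out”.  To get a feel for what the two patterns match, here is a small sketch which tries them against a few made-up commit subjects using plain grep.  It assumes that grep and git log --grep interpret these patterns the same way, which should hold since both default to basic regular expressions.  The helper name matches_backout is hypothetical.

```shell
# matches_backout: hypothetical helper name. Keeps the lines matching either
# of the two patterns from the git log command (multiple -e means "or").
matches_backout() {
  grep -e '[Bb]acked *out' -e '[Bb]ack *out'
}

# Made-up commit subjects; only the first three should be printed.
printf '%s\n' \
  'Backed out changeset 0123456789ab for test failures' \
  'backout of the previous patch' \
  'Back out the last landing' \
  'Fix a crash in the layout code' |
  matches_backout
```

The patterns are not anchored, so they can also match unrelated messages which happen to contain text such as “back out” in the middle of a sentence.  The counts above are therefore best treated as approximate.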

The list of the people who are most conscious about our RelEng infra resource consumption:

$ git log --format='%an <%ae>' --all --grep=DONTBUILD | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
  76 Matthew Noorenberghe <mozilla--at--noorenberghe.ca>
  73 Jonathan Griffin <jgriffin--at--mozilla.com>
  55 Ehsan Akhgari <ehsan--at--mozilla.com>
  33 Daniel Holbert <dholbert--at--cs.stanford.edu>
  31 Serge Gautherie <sgautherie.bz--at--free.fr>
  26 Jed Parsons <jparsons--at--mozilla.com>
  24 Blake Kaplan <mrbkap--at--gmail.com>
  21 ffxbld <none--at--none>
  21 Axel Hecht <l10n--at--mozilla.com>
  19 Philipp von Weitershausen <philipp--at--weitershausen.de>

The list of the people who have pushed the most patches while the tree was closed:

$ git log --format='%an <%ae>' --all --grep='CLOSED TREE' | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
 135 ffxbld <none--at--none>
  57 Phil Ringnalda <philringnalda--at--gmail.com>
  49 Ehsan Akhgari <ehsan--at--mozilla.com>
  30 Ed Morley <emorley--at--mozilla.com>
  26 Benjamin Smedberg <benjamin--at--smedbergs.us>
  25 Matt Brubeck <mbrubeck--at--mozilla.com>
  25 Marco Bonardo <mbonardo--at--mozilla.com>
  24 Boris Zbarsky <bzbarsky--at--mit.edu>
  20 seabld <none--at--none>
  20 Jim Mathies <jmathies--at--mozilla.com>

The list of people who have authored the most patches in 2012:

$ git log --format='%an <%ae>' --since=2012-01-01 --all --no-merges | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
 590 Ehsan Akhgari <ehsan--at--mozilla.com>
 553 Ms2ger <ms2ger--at--gmail.com>
 428 Kartikaya Gupta <kgupta--at--mozilla.com>
 404 Bobby Holley <bobbyholley--at--gmail.com>
 363 Mike Hommey <mh+mozilla--at--glandium.org>
 348 Justin Lebar <justin.lebar--at--gmail.com>
 329 Aryeh Gregor <ayg--at--aryeh.name>
 326 Robert O'Callahan <robert--at--ocallahan.org>
 291 Boris Zbarsky <bzbarsky--at--mit.edu>
 283 Nicholas Nethercote <nnethercote--at--mozilla.com>

The list of people who have authored the most patches of all time (yes, including the old CVS days!):

$ git log --format='%an <%ae>' --all --no-merges | sed 's/@/--at--/' | grep -v ^commit | sort | uniq -c | sort -rn | head
3840 bzbarsky%mit.edu <bzbarsky%mit.edu>
3636 sspitzer%netscape.com <sspitzer%netscape.com>
3246 alecf%netscape.com <alecf%netscape.com>
2604 hyatt%netscape.com <hyatt%netscape.com>
2556 mscott%netscape.com <mscott%netscape.com>
2534 scott%scott-macgregor.org <scott%scott-macgregor.org>
2286 waterson%netscape.com <waterson%netscape.com>
2267 timeless%mozdev.org <timeless%mozdev.org>
2236 mkaply%us.ibm.com <mkaply%us.ibm.com>
2140 pinkerton%netscape.com <pinkerton%netscape.com>

One last note: Mercurial supports similar functionality through the churn extension, but the last time I tried it (which was a while ago) it was too slow to be useful on a repository the size of mozilla-central.
