Wednesday, November 19, 2003

(for sports/stats fans only)

I have only now had the chance to review an extensive e-mail sent to me by the wunderkind "Rob" Babycakes Clark. Rob pursued the question I posed in my earlier "Any Given Sunday" weblog post and its follow-ups, namely: how does the variability in outcomes of professional football games compare with that of baseball games, and is football just as variable in its outcomes as baseball? If it were, there would be no real grounds for the common quasi-statistical argument for why a baseball regular season contains ten times as many games as a football regular season (namely, that talent wins out more reliably in football games). The result of Rob's painstaking analysis is, basically, no: baseball is still more unpredictable. Among other things, he compared the spread (standard deviation) of NFL regular-season records over the last five years with the spread of MLB records after 16 games. The spread in NFL records is higher.
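The whole comparison rests on one statistic: the standard deviation of winning percentages across a league's teams. Here is a minimal Python sketch of that calculation, assuming records are (wins, losses) or (wins, losses, ties) tuples. The helper names and the two four-team "seasons" are made up for illustration (they are not Rob's data), counting a tie as half a win is an assumed convention, and reading a "5-year average" as a plain mean of the per-season spreads is also an assumption. `statistics.stdev` divides by n - 1, the same convention Stata's `summarize` uses.

```python
import statistics

def winning_pct(wins: int, losses: int, ties: int = 0) -> float:
    """Winning percentage; counting a tie as half a win is an assumed convention."""
    return (wins + 0.5 * ties) / (wins + losses + ties)

def spread_of_records(records: list[tuple[int, ...]]) -> float:
    """Sample standard deviation (n - 1 denominator) of winning pct across teams."""
    return statistics.stdev([winning_pct(*r) for r in records])

def average_spread(seasons: list[list[tuple[int, ...]]]) -> float:
    """Plain mean of the per-season spreads (an assumed reading of a '5-year average')."""
    return statistics.mean([spread_of_records(s) for s in seasons])

# Hypothetical (wins, losses) records for four made-up teams; not real data.
season_a = [(13, 3), (9, 7), (7, 9), (3, 13)]
season_b = [(8, 8), (8, 8), (10, 6), (6, 10)]
print(round(spread_of_records(season_a), 4))  # 0.2602
print(round(average_spread([season_a, season_b]), 4))  # 0.1811
```

A league where records bunch up near .500 produces a small spread; one where good teams reliably beat bad ones produces a large spread.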

Here, you can review part of his e-mail to me and see for yourself the razor-sharp cogs of his mind at work:
```
you suggest that we should see what the
distribution of wins/losses is across baseball teams after 16 games.
however, you also caution that baseball teams play fewer different
opponents during their first 16 games than football teams.  so this may
actually exaggerate the standard deviation.  thus, another option is to
look at the distribution of winning percentages in baseball through May.
since the average series is three games, baseball teams play about 16
series through the first 1/3 of the season, or about 50+/- games, which is
about how many games baseball teams play through May.  so i looked at both
of these for MLB's 2002 season.  below is the output from STATA:

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   mlb200216 |        30    .4979167  [.1561278]      .1875      .8125
  mlb2002may |        30    .4993397  [.0958173]   .3461538         .7

   nflmeansd |         1  [.1932825]           .   .1932825   .1932825
   mlbmeansd |         1  [.0785974]           .   .0785974   .0785974
   nbameansd |         1  [.1517753]           .   .1517753   .1517753

from this, we see that the standard deviation for MLB after 16 games
(mlb200216) is still less than football's *average* standard deviation
over a 5-year period (nflmeansd), and only slightly higher than
basketball's *average* (nbameansd).  i would have guessed the MLB 16-game
standard deviation would be higher than the season average for all three
sports, especially when considering that in baseball you only play about 5
different opponents in the first 16 games.  also, we see that after the
month of May (mlb2002may), about 50-55 games, the standard deviation for
baseball approaches the baseball 5-year *average* (mlbmeansd) and is well
below football's or even basketball's, even though basketball's season
has about 30 more games.
```
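Rob's expectation that a 16-game MLB spread would exceed every full-season average is a sample-size intuition: fewer games leave more room for luck. One rough way to put numbers on that, which is my assumption and not part of Rob's analysis, is a coin-flip baseline: if every game were an independent 50/50 toss, winning percentages would still spread out with a standard deviation of sqrt(p(1 - p)/n) over n games. The helper name `chance_sd` is hypothetical; the game counts are 16 (an NFL season, or MLB after 16 games), about 52 (MLB through May, per the e-mail), 82 (an NBA season) and 162 (an MLB season).

```python
import math

def chance_sd(games: int, p: float = 0.5) -> float:
    """SD of a team's winning pct if every game were an independent coin flip (binomial model)."""
    return math.sqrt(p * (1 - p) / games)

# Game counts: 16 games, MLB through May (approx.), an NBA season, a full MLB season.
for label, games in [("16 games", 16), ("MLB through May", 52),
                     ("NBA season", 82), ("MLB season", 162)]:
    print(f"{label:>16}: chance SD = {chance_sd(games):.4f}")
```

Under this toy model, luck alone produces a spread of 0.125 after 16 games but only about 0.039 over a full MLB season. Real games are not independent coin flips (home field, unbalanced schedules), so this is only a rough floor to hold up against the observed figures in the output above.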