jeremy freese's weblog: (for sports/stats fans only)

Wednesday, November 19, 2003

(for sports/stats fans only)

I have only now had the chance to review an extensive e-mail sent to me by the wunderkind "Rob" Babycakes Clark. Rob pursued the question that I posed in my earlier "Any Given Sunday" weblog post and its follow-ups: how does the variability in outcomes of professional football games compare with that of baseball games, and is football just as variable in its outcomes as baseball? If it were, there wouldn't be any real grounds behind the common quasi-statistical argument that people make for a baseball regular season containing ten times as many games as a football regular season (namely, that talent wins out more reliably in football games). The result of Rob's painstaking analysis is, basically, no: baseball is still more unpredictable. For this, among other things, he compared the variance among NFL regular-season records for the last five years with the variance among MLB records after 16 games. The variance in NFL records is higher.

Here, you can review part of his e-mail to me and see for yourself the razor-sharp cogs of his mind at work:

you suggest that we should see what the distribution of wins/losses is
across baseball teams after 16 games.  however, you also caution that
baseball teams play fewer different opponents during their first 16 games
than football teams.  so this may actually exaggerate the standard
deviation.  thus, another option is to look at the distribution of winning
percentages in baseball through May.  since the average series is three
games, baseball teams play about 16 series through the first 1/3 of the
season, or about 50+/- games, which is about how many games baseball teams
play through May.  so i looked at both of these for MLB's 2002 season.
below is the output from STATA:



    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   mlb200216 |        30    .4979167   [.1561278]     .1875      .8125
  mlb2002may |        30    .4993397   [.0958173]  .3461538         .7

   nflmeansd |         1  [.1932825]            .  .1932825   .1932825
   mlbmeansd |         1  [.0785974]            .  .0785974   .0785974
   nbameansd |         1  [.1517753]            .  .1517753   .1517753



from this, we see that the standard deviation for MLB after 16 games
(mlb200216) is still less than football's *average* standard deviation
over a 5-year period (nflmeansd), and only slightly higher than
basketball's *average* (nbameansd).  i would have guessed the MLB 16-game
standard deviation would be higher than the season average for all three
sports, especially when considering that in baseball you only play about 5
different opponents in the first 16 games.  also, we see that after the
month of May (mlb2002may), about 50-55 games, the standard deviation for
baseball approaches the baseball 5-year *average* (mlbmeansd) and is well
lower than that of football or even basketball, which still has about 30
more games.
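
One way to put these standard deviations on a common footing is to compare each with what pure luck would produce. If every game were an independent fair coin flip, the standard deviation of winning percentage across teams after n games would be sqrt(0.25/n), which is 0.125 for n = 16. The sketch below is not part of Rob's analysis. It only takes the figures from the STATA output above and divides each by that chance baseline. The game counts are assumptions: 50 for "through May" (the e-mail says about 50-55), and full regular seasons of 16, 162, and 82 games for the NFL, MLB, and NBA averages. The names chance_sd and ratio_to_chance are made up for illustration.

```python
import math


def chance_sd(games: int) -> float:
    """SD of winning percentage across teams if every game were an
    independent fair coin flip (binomial with p = 0.5)."""
    return math.sqrt(0.25 / games)


def ratio_to_chance(sd: float, games: int) -> float:
    """Observed SD divided by the coin-flip SD for the same number of games.
    A ratio near 1 means the records look like luck; larger ratios mean the
    spread in records exceeds what luck alone would give."""
    return sd / chance_sd(games)


# (observed SD from the STATA output, assumed number of games played)
observed = {
    "mlb200216": (0.1561278, 16),
    "mlb2002may": (0.0958173, 50),   # assumption: e-mail says about 50-55 games
    "nflmeansd": (0.1932825, 16),    # assumption: full 16-game NFL season
    "mlbmeansd": (0.0785974, 162),   # assumption: full 162-game MLB season
    "nbameansd": (0.1517753, 82),    # assumption: full 82-game NBA season
}

for name, (sd, games) in observed.items():
    print(
        f"{name:>10}  games={games:>3}  observed={sd:.4f}  "
        f"chance={chance_sd(games):.4f}  ratio={ratio_to_chance(sd, games):.2f}"
    )
```

Under these assumptions, the 16-game NFL figure sits further above the coin-flip baseline than the 16-game MLB figure does, which is the same direction as Rob's conclusion. The coin-flip model ignores home-field advantage and the scheduling issue raised in the e-mail, so the ratios are a rough guide rather than a test.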
