Here, you can review part of his e-mail to me and see for yourself the razor-sharp cogs of his mind at work:
you suggest that we should see what the
distribution of wins/losses are across baseball teams after 16 games.
however, you also caution that baseball teams play fewer different
opponents during their first 16 games than football teams. so this may
actually exaggerate the standard deviation. thus, another option is to
look at the distribution of winning percentages in baseball through May.
since the average series is three games, baseball teams play about 16
series through the first 1/3 of the season, or about 50+/- games, which is
about how many games baseball teams play through May. so i looked at both
of these for MLB's 2002 season. below is the output from STATA:
    Variable |   Obs         Mean    Std. Dev.         Min         Max
-------------+----------------------------------------------------------
   mlb200216 |    30     .4979167   [.1561278]       .1875       .8125
  mlb2002may |    30     .4993397   [.0958173]    .3461538          .7
   nflmeansd |     1   [.1932825]           .     .1932825    .1932825
   mlbmeansd |     1   [.0785974]           .     .0785974    .0785974
   nbameansd |     1   [.1517753]           .     .1517753    .1517753
from this, we see that the standard deviation for MLB after 16 games
(mlb200216) is still less than football's *average* standard deviation
over a 5-year period (nflmeansd), and only slightly higher than
basketball's *average* (nbameansd). i would have guessed the MLB 16-game
standard deviation would be higher than the season average for all three
sports, especially when considering that in baseball you only play about 5
different opponents in the first 16 games. also, we see that after the
month of May (mlb2002may), about 50-55 games, the standard deviation for
baseball approaches the baseball 5-year *average* (mlbmeansd) and is well
lower than football or even basketball which still has about 30 more
games.
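For anyone who wants to check the comparisons in the e-mail, here is a rough Python sketch that simply re-uses the numbers from the STATA output. The dictionary keys mirror the STATA variable names; the `ratio` helper and the 162-game season length are my own additions for illustration and are not part of the e-mail.

```python
# Standard deviations of winning percentage, copied from the STATA output.
sd = {
    "mlb200216": 0.1561278,   # MLB 2002, after 16 games
    "mlb2002may": 0.0958173,  # MLB 2002, through May
    "nflmeansd": 0.1932825,   # NFL, 5-year average
    "mlbmeansd": 0.0785974,   # MLB, 5-year average
    "nbameansd": 0.1517753,   # NBA, 5-year average
}


def ratio(a, b):
    """Hypothetical helper: how large sd[a] is relative to sd[b]."""
    return sd[a] / sd[b]


# The "16 series of about three games" arithmetic from the e-mail.
games_in_16_series = 16 * 3

# Assumption (not stated in the e-mail): a 162-game MLB regular season.
third_of_season = 162 / 3

if __name__ == "__main__":
    print("MLB after 16 games vs. NFL average:", round(ratio("mlb200216", "nflmeansd"), 3))
    print("MLB after 16 games vs. NBA average:", round(ratio("mlb200216", "nbameansd"), 3))
    print("MLB through May vs. MLB average:", round(ratio("mlb2002may", "mlbmeansd"), 3))
    print("Games in 16 three-game series:", games_in_16_series)
    print("One third of a 162-game season:", third_of_season)
```

A ratio below 1 means the first figure is the smaller of the two, so the first line should come out under 1 and the second just over 1, matching what the e-mail describes.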