Saturday, January 17, 2004

(stats, boring) crimes of quantification, continued

I'm reading some papers today to try to decide if I want to add them to the syllabus for my graduate methods course. The one I just finished reading was an empirical study where the number of observations was relatively small and the explanatory variables and outcomes were binary (i.e., yes or no). As is standard practice, the author used logistic regression. For one of the key variables in one model, the logit coefficient was just over 3, but was nonsignificant. Because it was nonsignificant, the author concluded that the analysis showed there was no effect of the key variable on the outcome.

This is a longstanding pet peeve of mine. The issue is that, regardless of what the p-value from the significance test is, a logit coefficient of 3 is huge. Say that, in an election, 82% of men voted for the Republican candidate, but only 18% of women did. That is the kind of difference in percentages that corresponds to a logit coefficient of 3.* If the results of a logistic regression model indicate that an explanatory variable is associated with something like a 64 percentage point difference in the outcome, and yet the result is still not "statistically significant" by conventional standards, what it reveals is not that there is "no association" between the explanatory variable and outcome, but that the data really aren't adequate for the purposes to which they are being put.
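For anyone who wants to check the arithmetic, here is a minimal Python sketch. The first part converts the 82% versus 18% example into a logit coefficient. The second part uses a made-up 2x2 table of ten observations (the counts are purely hypothetical, not from the paper I was reading) to show how a coefficient near 3 can still miss the conventional 1.96 cutoff when the sample is tiny. It assumes a model with a single binary predictor, where the logit coefficient is just the log odds ratio of the table and its Wald standard error is the square root of the summed reciprocal cell counts.

```python
import math

def logit(p):
    """Log odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1 - p))

def logit_coefficient(p1, p0):
    """Difference in log odds between two groups; with one binary
    predictor this is what the logistic regression coefficient equals."""
    return logit(p1) - logit(p0)

def wald_test_2x2(a, b, c, d):
    """Log odds ratio, Wald standard error, and z statistic for a 2x2 table.

    a, b = yes/no counts in group 1; c, d = yes/no counts in group 0.
    All four cells must be nonzero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se, log_or / se

# The election example: 82% of men vs. 18% of women.
coef = logit_coefficient(0.82, 0.18)
print(f"logit coefficient for 82% vs 18%: {coef:.2f}")

# Hypothetical tiny sample: 4 of 5 "yes" in one group, 1 of 5 in the other.
log_or, se, z = wald_test_2x2(4, 1, 1, 4)
print(f"log odds ratio = {log_or:.2f}, SE = {se:.2f}, z = {z:.2f}")
print("significant at .05 (two-sided)?", abs(z) > 1.96)
```

The first figure comes out at about 3.03. In the hypothetical table the coefficient is about 2.77 with a z of about 1.75, so an 80% versus 20% split is "nonsignificant" simply because there are only ten cases.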

* I looked this up on this handout I made on the topic three years ago, when I was teaching categorical data analysis, which also goes to show how long this issue has been a pet peeve.
