Wednesday, March 28, 2007

nothing up my sleeve (p < .05)

My ability to figure out magic tricks increased greatly the day I realized that magic tricks are optimal in an important sense. A trick is basically as impressive as it can be without giving away its secret. When the magician saws the woman in half, the parts you can see are all you can see. If the trick would be even more impressive if the woman wiggled her toes but she doesn't wiggle her toes, there's a reason there's no toe-wiggling.

It's distressing the extent to which I have been able to export this epiphany to the evaluation of quantitative social science.

I was recently talking to a colleague who does not do quantitative research about a paper we had both, in very different contexts, read. The paper addressed its substantive question using Approach A. It could have used Approach B instead. Approach A is good enough for the standards of where the paper was published, but Approach B would be the approach preferred by more quantitatively-discerning types. The paper acknowledged the existence of Approach B but made substantive and statistical arguments for why Approach A was superior. In talking to my colleague, I explained that these arguments were really not very good arguments, and that, indeed, people who understand the technical issues are not going to be very persuaded by results from Approach A, because you really need results from Approach B to be able to assert the conclusions of this paper with any real confidence.

The thing, though, was that I went on to take for granted that analysis using Approach B wouldn't yield statistically significant (i.e., publishable) results. My colleague asked how I could be so certain of this, since no results from Approach B were reported in the paper.

I replied that, if Approach B would have yielded the same results as Approach A, the author would have announced this fact to assuage the concerns of people like me. Especially because the author clearly understands how to do Approach B and it would have only taken, say, five minutes to check. So when I saw that the paper contained these weak arguments for the superiority of Approach A over Approach B, and made no mention of what the results would have looked like using Approach B, I read this as basically equivalent to the paper containing a giant invisible footnote that said "We tried Approach B and it doesn't work."

I hate this.


Dan Myers said...

I think you're dead right 90% of the time. One exception though is when the editor deletes the "I tried B and it worked out the same" footnotes because they aren't "necessary." This has happened to me at least twice that I can remember. I suppose I could've taken the high road and insisted or threatened to yank the paper--yeah right.

Anonymous said...

Well, maybe. But I would urge more caution before impugning the motives of the paper's author and, I might add, accusing him/her of an ethical violation. How confident are you that (a) Approach B is unambiguously superior to Approach A -- and what does that mean, anyway, in what I am inferring is the use of quasiexperimental data to make causal inferences -- and (b) the author does not legitimately believe that Approach A is preferable in this instance? Sure, *you* think that Approach B is superior, and you indicate that "more quantitatively-discerning types" do as well. But I'd be more convinced of the author's slipperiness if there were evidence that in other papers s/he had argued that Approach B was superior, but in this paper argued for the superiority of Approach A.

Anonymous said...

Ah, but this is a game of 'Gotcha!'. So, whatever.

a very public sociologist said...

Good to see symptomatic readings are alive and well in the field of quantitative research!

Anonymous said...

I am in complete agreement with Dan Myers with the possible exception of the 90% estimate, since I have had very similar experiences. I report a nice full set of robustness checks and the editors take them out and then later on someone says to me, hey, I read your paper - it was OK but why the heck didn't you look to see about this or that?

And the answer is almost always I did look, an earlier version of the paper said that I did [even at times reported the results in a separate table], and I was 'encouraged' to remove such a waste of space.

So maybe the 90% is high.


Brayden said...

I think the opposite pattern exists in many organizational journals (e.g. ASQ), where the editors/reviewers go to great lengths to ensure that every little alternative hypothesis is ruled out or that different modeling approaches are tested. In an average Strategic Management Journal paper, for example, about half or more of the results section deals with various robustness checks.

Dan Myers said...

As I think about it more, I think Chip is right. 90% is high, especially because there is a new variant of this going around now that I forgot about: "Why don't you put all of that extra stuff on a web page?" I'm not saying that's the wrong thing to do, but that kind of logic is further constricting our notions of what's appropriate for the journal proper and, eventually, the integrity of the journal, I fear. Most soc journals, at least, do not have the capacity to claim they will have even semi-permanent web space.

Steve M said...

Even if Jeremy is correct, the scholar probably did a disservice to the field by not revealing Model B. As Andrew Gelman has argued very persuasively, the social sciences have gone horribly astray by using hypothesis tests so rigidly. If Model A had a p-value of .04 and Model B had one of .06, then that is corroboration that the result is the same under both methods, with only a trivial difference in p-values that can be attributed to many things (or, as Gelman would say, the difference between significant and non-significant is not itself significant). I think this is fairly common in sociology, as lots of people seem to suppress corroborative findings because they are not quite as good as the 'best' model that they choose to report. (Blalock actually wrote about this phenomenon in the 1960s and asked that journals do a better job of encouraging a full reporting of such models.) Now that we all have websites, we have no excuse anymore!
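[A numeric sketch of Gelman's point, with made-up numbers: suppose Approach A's estimate just clears p < .05 and Approach B's just misses it. The two estimates can be nearly indistinguishable from each other even though one is "significant" and the other is not. The estimates and standard errors below are hypothetical, chosen only for illustration, and the difference test assumes the two estimates are independent.]

```python
from math import sqrt, erf

def two_sided_p(z):
    """Two-sided p-value for a standard-normal z statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical estimates and standard errors:
# "Approach A" just clears p < .05; "Approach B" just misses it.
est_a, se_a = 2.06, 1.0
est_b, se_b = 1.88, 1.0

p_a = two_sided_p(est_a / se_a)   # about .04 -- "significant"
p_b = two_sided_p(est_b / se_b)   # about .06 -- "not significant"

# Test whether the two estimates differ from *each other*
# (assuming independence, so the variances add).
z_diff = (est_a - est_b) / sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(z_diff)      # nowhere near significant

print(round(p_a, 3), round(p_b, 3), round(p_diff, 3))
```

One result falls on each side of the .05 line, but the test of their difference is wildly non-significant: the two approaches agree, and reporting only the "significant" one conceals that agreement.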

Anonymous said...

It seems like journals encourage this kind of behavior by only publishing things that have significant and very strong results. There should be an incentive for people to publish negative results as well, no?