Monday, March 27, 2006

the revolution may not be televised, but it will be peer-reviewed

Okay, so say you are in sociology and your reaction to my proposal about code archiving is something like (a) you think it is a good idea but that (b) the rest of sociology will never go for it. Hey, consider this: you are part of sociology. As such, you can help contribute to positive disciplinary change.

Here's how: If you review a paper for a major sociology journal that you think deserves to be published (or to be published after adequate revision), you could copy and paste this at the end of the COMMENTS FOR AUTHOR section of your review:
I think the results of this paper are sufficiently interesting and provocative that I can imagine others wanting to verify the results or to build on them in future work. For this reason, I strongly encourage the author to deposit code and other information relevant for reproducing the results in a permanent online archive at the time of publication, and perhaps even to indicate that this has been done in a footnote to the paper (e.g., "Code used in the analyses presented in this paper is available in the ICPSR Publications Related archive"). Such archiving at the time of publication is becoming standard in economics, and I see no reason why this work is less important or less worthy of serious anticipatory attention to replication than what economists do.
Okay, you might think the last sentence is a little much. But, that aside, there is presumably nothing insincere in your saying at least the rest of this. In other words, presumably in thinking that the article should be published in a major sociology journal you do think the article is worthwhile enough that someone could care about whether its results can be verified or care otherwise about the decisions the analyst made. If you don't think this about the paper, why do you think it deserves prime sociology journal space?*

And in the COMMENTS FOR EDITOR section, copy and paste this:
See paragraph at the end about code and/or data sharing. I urge you to underscore this point in your letter.
As Margaret Mead once said, "Never doubt that a small group of thoughtful, persistent reviewers can change the prevailing methodological practices of a discipline. Indeed, it is the only thing that ever has."

* I mean, if your opinion about the worthwhileness of sociological research is so low that you would recommend publication in a major journal of findings that you can't imagine anyone actually caring much about, why are you in sociology? Unless you are old and stuck and can't do anything else, but, if you are that alienated/disillusioned/disenchanted, why are you still reviewing papers? Seriously, life is too short, do something else with yours.

16 comments:

Rebekah Ravenscroft-Scott said...

first of all, i love that this post is dated "Monday, March 27, 2006" and I'm commenting on it on Sunday, March 26.

secondly, what does this say about qualitative work? how can we get qualitative researchers to provide anything that will allow other researchers to replicate their findings? does replication not apply to qualitative work?

just asking,
rebekah

jeremy said...

A few months ago, I put a policy on my timestamping practices in my sidebar.

As for qualitative work, note that I'm dodging even the thorny issues that characterize efforts to encourage greater data-sharing in quantitative social science. These issues are even more sensitive in qualitative social science, and, indeed, IRBs often provide strictures that hamstring qualitative social scientists from engaging in data-sharing they might wish to do. With the idea of sharing code, I don't know if there is really much that could be considered equivalent to code (a set of computer operations that transform 'raw data' into 'findings') in much qualitative work.
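To make concrete what "a set of computer operations that transform 'raw data' into 'findings'" might look like as a depositable artifact, here is a minimal hypothetical sketch (the filename, variable names, and exclusion rule are invented for illustration, not taken from any actual paper):

```python
# Hypothetical replication script: every step from raw data to the
# reported estimate is recorded, so a reader can rerun it verbatim.
import csv
import statistics

def estimate_mean_income(path):
    """Read the raw file, apply the stated exclusion, return the estimate."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Analytic decision made explicit in code: drop respondents under 25.
    incomes = [float(r["income"]) for r in rows if int(r["age"]) >= 25]
    return statistics.mean(incomes)
```

The point of archiving such a script is precisely that the analyst's decisions (here, the age cutoff) are visible and rerunnable, rather than buried in prose.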

Rebekah Ravenscroft-Scott said...

thanks, on the first point, i didn't notice your policy as it is way at the bottom of the bar. my soul was not, in any way, tortured by your timestamp, in case you were wondering. :)

on the second point, indeed, there is not much comparable to code in qualitative work so i worry that your worthy efforts will only serve to further distance qualitative work from the "science" of sociology. it would be an entirely unintended consequence, i'm sure but also a sad one.

brady said...

As for replicating qualitative work:

On the one hand, you could go about this by re-analyzing particular cases (which seems to be what y'all are talking about). Fair 'nuff. This is probably most important in regard to "big deal" findings.

On the other hand, something that I think speaks to there being some fairly rigorous qualitative methodology out there is that, by and large, qualitative researchers tend to replicate each other's findings quite a bit without trying.

Sure, there's quibbling about the specifics of certain (sometimes very important) points, but by and large I'd say that most studies find that 90% of what was previously reported still holds, given the differences you might expect from changes in setting, particular cases, etc. That may not be the point of the write-up, and the author may spend very little time discussing it, but when you step back and dissect the text/article in a very close reading and ignore what the author says is most "interesting", you start to see that the majority of what is being reported jibes pretty closely with the rest of the relevant literature.

Now, this could all be caused by significant theoretical biases or assumptions, too. But that's a whole 'nuther muddle.

my two pfennigs . . .

Kieran said...

seeing as this thread continues the previous one, a random question: is your officemate who likes Sweave Jake Bowers?

jeremy said...

Yes, Jake is my officemate. He's amused that I am going to be trying to bring sociology onto the replication train.

Anonymous said...

Thanks for the sample paragraph. I'll make sure to use it (or something like it) in my next review (due 4/5/06). :-)

-sr

Anonymous said...

A few comments, starting with the obligatory yes, it's a good idea, but...

1) Despite all their talk, economists have been pretty bad at this, too. I urge you to try to find the promised place on the AER website, for example, where data is available. The existence of an editorial policy alone seems to have no effect. Demography has the same data-sharing policy, and the ASA Code of Ethics Section 13.05, which should apply to ASR among other things, also has such a policy. And, as evidenced by the Rothstein-Hoxby throwdown, people can get nasty when you demand their data based on threats about editorial policies. What we need is to establish a norm that replication is an important practice that improves the quality of work in the field. Have you ever tried to get a replication article published? Good luck. Many social scientists seem implicitly not to want anyone to talk about their work. They'd rather send it out to some vacuum, add a line to their CV, and move on.

2) I'm not sure that code alone does anything to encourage/support replication if it does not come with data. And then, of course, we're back to the problem of proprietary/confidential data. I had reason recently to go through the last 5 years of ASR and AJS, looking for articles that used public data, and it's a shockingly small number. I'm currently writing a paper based on confidential data collected by the government, and the paper provides a footnote with detailed instructions on how to acquire the data, but it's a long process that requires some degree of clout (a grad student probably couldn't get it, for example). I know you're aware of this problem, but this in my mind is still maybe the biggest obstacle to widespread replication. I don't think very many people will download and look at code if there's no way to actually run it on the data.

So...how do we establish a norm? It seems like this is something that takes local action. Some economists and some sociologists already post their data and code on their websites. The incentive to do this is that it potentially generates more interest in their work--people find it easier to build on the work and are more likely to engage with it, cite it, etc. Everybody wins, in theory.

I'm glad this is provoking so much interest. Perhaps sociology won't be as hostile to it after all...

-dks

jeremy said...

(dks? I don't know if I'm supposed to recognize those initials, but I don't.)

AER has links to datasets alongside its articles. See this page for an example. I haven't followed these links yet to look at what exactly they contain.

My worry about the data-sharing issue is that most sociological articles use data that the author does not have rights to distribute, and that this, along with other proprietary-interest matters, can become a showstopper for the entire replication effort (this is my reading of the replication debate in political science in the 1990s).

jeremy said...

Addendum: The AER web posting of datasets appears to have started in September 2004, so that's where the clock starts for how far sociology ends up being behind on this issue.

Brayden said...

You're right that proprietary rights could create some problems. I use S&P's COMPUSTAT for a lot of my research, which you can't share with anyone at an institution that doesn't subscribe to the same data. While I'm pretty sure I could share my Stata coding of the exported data, I'm not sure how I would specify the exporting syntax itself.
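One way to read this point: the recoding step can be archived even when the proprietary export can't. A hedged sketch of a shareable recode that assumes the reader has exported the same data under their own subscription (the variable names and the recode rule here are invented for illustration, not COMPUSTAT's actual layout):

```python
# Shareable recoding step: assumes the reader has already exported
# the proprietary file themselves; only the transformation is archived.
def recode(rows):
    """Apply the paper's (hypothetical) recodes to already-exported records."""
    out = []
    for r in rows:
        out.append({
            "firm_id": r["gvkey"],                 # rename the firm identifier
            "large_firm": float(r["at"]) > 1000.0, # example recode: assets cutoff
        })
    return out
```

Sharing this much at least documents every decision made after the export, even if the export itself can't travel.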

Also, I use a relational database to aggregate data from different sources. I suppose you could export the SQL syntax to show how these data were pulled together, but I think it would make the replication process a little trickier.
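Exporting the SQL is probably less tricky than it sounds: the merge syntax itself is small and shareable even when the underlying tables aren't. A minimal sketch (table and column names invented for illustration), using SQLite so it is self-contained:

```python
import sqlite3

# Invented stand-ins for two proprietary source tables; the archived
# artifact is the join syntax, which a reader with access could rerun.
MERGE_SQL = """
SELECT f.firm_id, f.year, f.assets, r.rating
FROM financials AS f
JOIN ratings AS r
  ON r.firm_id = f.firm_id AND r.year = f.year
"""

def build_analysis_file(conn):
    """Run the archived merge syntax against whatever database the reader has."""
    return conn.execute(MERGE_SQL).fetchall()
```

The reader supplies the connection to their own copy of the sources; the deposited query documents exactly how the analysis file was assembled.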

Anonymous said...

Jeremy,

Still puzzling over this one. Of course, if you are inclined to replicate, you are already free to do so, and empowered to do so by the norms of the profession. I just don't get it.

jeremy said...

Anon 7:56am: Have you ever tried to reproduce results from an article? Because this is something I've done several times, with varying success.

jeremy said...

Anon 7:56: I mean, I guess a first question would be: do you understand why economics (which takes itself to be doing important and provocative research) might not see it as enough that people who want to verify results are "free to do so, and empowered to do so by the norms of the profession"? Or do you think they are irrational for thinking that published results should be supported by (online) published documentation of the code that produced those results?

Anonymous said...

Jeremy:

I mean, I guess a first question would be: do you understand why economics (which takes itself to be doing important and provocative research) might not see it as enough that people who want to verify results are "free to do so, and empowered to do so by the norms of the profession"?

Anon 7:56 sez: Lack of social capital/trust? Econ is a boys' game over who's got the bigger...um...formal proof and da mad skills. It is a hyper-competitive, testosterone-drenched enterprise that is all about the game of "gotcha." Enough?

Jeremy:

Or do you think they are irrational for thinking that published results should be supported by (online) published documentation of the code that produced those results?

Anon 7:56 sez: Nah, not irrational, but, again, you could just ask. BTW, I'm all for replication, but I'm also for maintaining professional norms and standards like, you know, a profession. There is nothing about economics as a profession, other than the mad props they get from the decision-makers, that I envy or would care to import into sociology. And, no, they don't get the influence because they are more science-y, so don't go there.

Anonymous said...

I'm getting to this conversation a little late. I'm all for sharing code, but I do wonder how much it will actually further social-scientific research. I suspect it might instead lead to excessive discussion (I might even call it quibbling) about methodological choices, which, ultimately, are just choices, since statistics can only try to approximate life.