Tuesday, April 18, 2006

sharing

I have a circulable draft of my paper arguing that sociologists should be doing more to share code and data at the time of publication. If you read it, I'd certainly be interested in comments.

I just got an e-mail from someone I had sent it to asking "Is it OK to share your member?" I think the person must have been simultaneously thinking "memo" and "paper", but I'm not sure. In any case, I wrote back that it was okay to share my paper. Remarkably, perhaps, I was able to restrain myself from adding that it was definitely not OK to share my member with anyone.

Anyway, I received more information on the Great Leap Forward by economics in its expectations regarding the information researchers will provide at the time of publication. A key moment was apparently a 2003 paper by McCullough and Vinod that included an effort to replicate the five papers that used nonlinear models in a single 1999 issue of AER. Authors of one of the papers cooperated with the investigators' queries. Of the other four, McCullough and Vinod report:
Two authors provided neither data nor code: in one case the author said he had already lost all the files; in another case, the author initially said it would be "next semester" before he would have time to honor our request, after which he ceased replying to our phone calls, e-mails, and letters. A third author, after several months and numerous requests, finally supplied us with six diskettes containing over 400 files--and no README file... A fourth author provided us with numerous datafiles that would not run with his code. We exchanged several e-mails with the author as we attempted to ascertain how to use the data with the code. Initially, the author responded promptly, but soon the amount of time between our question and his response grew. Finally, the author informed us that we were taking too much of his time.

12 comments:

jeremy said...

I post a link to a paper I've written, and all I get from the world is comment spam. Ah, academia.

Anonymous said...

i'm intrigued by the paper and the larger proposition, but i'm still formulating responses to the critical "is it OK to share your member?" question. i'm also distracted, trying to discern why your spammer treats "audis" as plural but "lamborghini's" as possessive. i'm no language maven, but one would hope for a little consistency.

Anonymous said...

What is this strange field? Do taxpayers fund it, or is it self-sustaining?

jeremy said...

If taxpayers want to be irritated at a field that studies human affairs, in terms of the relationship between the amount of government money that it receives and the extent to which it is public about its data and materials (or even, in some areas, its measures), they should be irritated first at psychology.

tina said...

Jeremy,

I really like the paper. I feel strange giving feedback on a blog, though I can't for the life of me say why, so I will do it anyway.

I especially like the comparison between the social and individualistic approaches to this issue. Very mature of you not to note the irony that sociology has the individualistic policy--it shines through anyway.

One improvement I'd like to suggest: it strikes me that the tone of the paper is a bit too defensive. It really gives you the sense that this is a crazy idea from a sociology radical. Though I agree that speaking to anticipated criticisms is a good idea, I think that there is more to be said about the positive benefits of code publishing. Perhaps an example or two of a case in which a replication revealed more, added to our knowledge, spared debate, etc. (I can think of one example, but I'm sure there are lots). And perhaps another example of the Internet making replications super timely. I'm racking my brain trying to remember a conversation over at Crooked Timber a couple years back where someone made some claims and shared his data, and a flurry of re-analyses made for a really interesting and smart conversation. I'm sure Kieran would remember.

So, I don't think you should just footnote the various sorts of re-analysis that code sharing makes possible. I think you should pull that out and present it as something Really Useful to sociology. You are faced with the unusual problem that the biggest arguments against adopting this are quickly dismissed, as you do successfully. After that, I suggest you push in a more positive direction.

Other than that minor concern, I think the paper is great.

One last thing, Gary King co-wrote my Most Favorite Qualitative Methods Book ever. I just lent my copy out to yet another grad student. He rocks. And so does your data sharing idea.

tina said...

that would be sparked debate, not spared debate...

Anonymous said...

Without getting irritated, what is this [...] field and who or what subsidizes it?

Anonymous said...

I agree with Tina. No point encouraging all those people who keep saying, "Oh, well, I think it's a good idea but of course no one else will ever go for it." Anyway, I thought it was good that you emphasized the need to be explicit that data are not available (and why) in cases where they can't be shared; I thought that pretty much answered any last objections anyone could have, a point you made well.

Anonymous said...

anon 1:11:

what the [...] are you talkig about?

Anonymous said...

talkig -- otherwise known as 'talking'

Anonymous said...

I would echo Tina in noting the uncomfortable / defensive tone of the paper. Something is off. You should come out swinging, but the impression is one of special pleading. The paper doesn’t “set em up and knock em down” in the way a paper like this should. In terms of suggestions:

Page 3 (mis)characterizes the obligation of sociologists as “ethical” and “individual.” Actually, it is a professional responsibility. The document you are citing is a code of professional ethics, written by a committee on professional ethics, not a code of personal or individual ethics.

While I am sympathetic to your scheme, the implicit assumption that my responsibility to assist others in replication extends beyond a circle of “responsible researchers” to anyone, anywhere with a copy of SAS or SPSS on their desktop is more than a little frightening. Sure, that is not what you are specifically proposing, but it is surely implied.
Riddle me this one Jeremy: I park the files from my latest addition to the literature at the Freese Data Barn. Next fall, I get 100 inquiries from students in intro regression courses around the world on how to make my program “go.” Do I have to answer?

jeremy said...

(I like this idea of getting anonymous comments on a paper prior to sending it officially out. Score another point for the possibilities of the Internet for intellectual work.)

The points about tone are well-taken and something I'm working on in the revision-in-progress. I don't agree on the point that the way sociology handles the issue makes it an individualistic matter and a matter of ethics (which is different from saying that it's a matter of 'individual ethics'). In any event, if one's individual ethics on this matter diverge from the official professional ethics about verification, it seems like one is submitting oneself to the collective ethics when one agrees to publish work in a sociological journal.

Thing is, if sociology embarks on public depositing of code, it's not anything new. So, it's weird to imagine problems arising (like getting 100 student e-mails with inane software questions) when they haven't arisen elsewhere.

I will admit that I am generally uncomfortable with the idea that a researcher gets to make the call on who they share things with. Psychology uses the language "competent professionals" rather than "responsible researchers," which may be more to your liking and even less to mine.

Riddle you this, btw: Say I'm outside your area, and the only reason I want to try to replicate your work is that I'm doing a study of the extent to which work in sociological journals can be replicated. You, for whatever reason, think my doing this substantively-indifferent study is a dumb exercise. Are you obliged to share your stuff with me?