Saturday, July 01, 2006

a brief note to my friends who use spss

If you are only generating cross-tabs and frequency distributions, fine. Or simple graphs and you don't especially want to invest any time in having them look nice. Otherwise, stop. You know the phrase "friends don't let friends drive drunk"? Well, it's not like we're at some party where I can take away your SPSS keys and call you a Stata cab. But if we were, I would. In any case, the important point is not that you should be using Stata. If you can figure out R, that's at least as good. And SAS is fine enough. I like Stata because it makes doing data analysis like playing one of those all-text adventure games from the 80s. But that's not the point. The point is that SPSS is toxic. It's like it is designed to encourage people to make mistakes and forever do their work in wildly inefficient ways. It's the intellectual equivalent of a car that was built with some kind of funhouse-optical-illusion-trick-glass for the windshield. Stop, now.

As a separate and more advanced matter, regardless of the statistical package you are using, if the process of writing a quantitative paper for you involves looking at printed output and retyping numbers from that output into a table (or paying someone to retype the numbers for you), you should really get away from that. It's much better to take the extra time to figure out a way to get your stats package to present the numbers in a way that makes them easy to paste into a table. You may have to put the little stars for statistical significance in by hand, but that should be it.
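
To make that concrete, here is a minimal sketch in Python (not tied to any particular stats package) of turning estimates into tab-separated text that pastes cleanly into a spreadsheet or word-processor table. The function names, the star cutoffs (.05, .01, .001), and the numbers are all assumptions for illustration, not output from a real model. It also happens to add the stars for you.

```python
def stars(p):
    """Return significance stars for a p-value (assumed cutoffs: .001, .01, .05)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def results_to_tsv(rows, digits=3):
    """Format (name, coef, se, p) tuples as tab-separated lines with a header row."""
    lines = ["Variable\tCoef.\tSE"]
    for name, coef, se, p in rows:
        lines.append(f"{name}\t{coef:.{digits}f}{stars(p)}\t{se:.{digits}f}")
    return "\n".join(lines)


# Made-up estimates, for illustration only.
example = [
    ("education", 0.4213, 0.0512, 0.0004),
    ("age", -0.0131, 0.0092, 0.154),
]

print(results_to_tsv(example))
```

Because the columns are separated by tabs, pasting the printed text into a spreadsheet should drop each value into its own cell, and rerunning the script after a model change regenerates the whole table.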

You know I only say this because I love you.

Update: This video contains an interview with one of the originators of SPSS, which he starts to talk about around 3 minutes into it.

15 comments:

Anonymous said...

I bet most of your readers already know this, especially those using Stata, but mktab was a godsend for helping me get output into tables quickly. It even does the stars for you.

jeremy said...

I use -estout-, which is complicated, but once you get it to make one table you like, the code is easily adapted to other tables. -estout- is insanely flexible (which is mainly why the syntax ends up complicated).

Anonymous said...

For the R/LaTeX perspective, here's the obligatory pointer to my brief note on "choosing workflow applications". Hey, if I can "figure out R", believe me anyone can.

Anonymous said...

can you post some funny stuff?

jeremy said...

Anon: Shush and watch Chad Vader again.

Kieran: The note looks great. The point about figuring out your platform in grad school is especially poignant to certain of us. Then again, a postdoc would also be a great platform for software retooling, if not used for internet and novel-reading addictions.

Anonymous said...

Jeremy loves me!

Tom Bozzo said...

I dunno about the export-results-into-table thing. Call me old-fashioned, but actually having to pore over the results with my eyeballs and look for things that are amiss before they go into a report has its uses.

SPSS, on the other hand, really doesn't.

jeremy said...

Tom: I find it easier to look over results if I have them in a table, which I suppose is good or else I should just be presenting output. A main advantage to automatically generated tables is that they are easy to regenerate if you decide to make a small change. Although that raises the danger of your point, as one can regenerate the numbers without appreciating whether a seemingly small change actually made a difference for results.

Anonymous said...

I can't believe I didn't get credited in this post! Maybe you were trying to protect me. It's okay, I'm happy to go public on this.

I second Jim's note about mktab, absolutely love it. The note about the asterisk gives away our affiliation, I think. Some fields don't bother with that, after all, it's somewhat subjective and if you know what the numbers mean (yeah, I know, I guess that's the big iffy), it's not as though it's that hard to figure out what would deserve a star in a star-studded world.

Anonymous said...

Eszter, I can't remember where I first heard about mktab, but it was definitely your Stata Goodies Page that helped me figure out how to use it. Big kudos on that.

Anonymous said...

Glad it helped, Jim. I wouldn't be surprised if I had learnt about it from Conrad.

This is what you get, Jeremy, when you don't acknowledge someone in your post. We take over the thread!

Unknown said...

My sole complaint about Stata is that it isn't bundled well for teaching. "Big" Stata is too expensive for undergraduates, especially non-majors who aren't likely to ever need it again. Small Stata's 1000 case limit makes downloading and analyzing subsets of, say, the GSS a needless hassle. SPSS' student version is the same price as Small Stata, but if it has a case limit, none of my students have encountered it.

My second sole complaint is that the authors of the Stata manuals haven't figured out that "data" is a plural noun.

Janet said...

I agree with you about SPSS and I'm trying to encourage my grad students to be critical about their software choices and introduce them to alternatives, such as Stata. (Kieran: Your document on workflow applications was helpful.)

However, the institutional and budgetary constraints are real. I had one colleague tell me that I should be careful about converting all of the grad students to Stata because they would likely later be employed at schools that can only afford SPSS. While I disagree with this perspective, it highlights some of the institutional baggage that comes with software, computing, and overall IT decisions.

jeremy said...

Janet: If budgetary constraints are a dominant issue, all the stronger argument for R (free!), except that (Kieran is being modest here) it is hard to use, especially for beginners.

I don't really see SPSS as a big bargain given all their add-ons and (if still year-by-year) the way they do their licenses.

Anonymous said...

Wow. When I first saw this topic title I thought it was "a brief note to my friends who use spas." Maybe when everyone is feeling tense after talking stats, we can explore this new topic. - T.