jeremy freese's weblog: (a stats thing, boring and trivial, so much so that i don't really know why i'm posting it)

Monday, January 05, 2004

Today's late afternoon distraction has been some items from the 1998 General Social Survey. Respondents were told "Women are more likely than men to take care of children. I'm going to read several reasons why this might be so. Please tell me how important you think each reason is: very important, important, somewhat important, or not at all important."

The five explanations given were:

1. Women are biologically better-suited to care for children.
2. Women are taught from childhood how to care for children.
3. The way society is set up, women don't have much choice.
4. Men have more freedom to do other things.
5. It is God's will that women care for children.

One might expect that some of these explanations would be inversely correlated with one another: for example, we might think that people who think explanation #1 is "very important" would be more likely to regard explanation #3 as not important. We might likewise expect negative correlations for items #3 and #4. However, when we look at the correlation matrix for these variables, this is not what we observe.

(obs=1322)

             |  fekids1  fekids2  fekids3  fekids4  fekids5
-------------+---------------------------------------------
     fekids1 |   1.0000
     fekids2 |   0.5270   1.0000
     fekids3 |   0.1866   0.2415   1.0000
     fekids4 |   0.1536   0.2101   0.4708   1.0000
     fekids5 |   0.4501   0.3857   0.2611   0.2216   1.0000

All of the correlations are positive. People who rated any one of these explanations as "very important" were more likely to rate any other explanation as also "very important." Instead of observing some positive and some negative correlations, we end up treating a correlation as if it were negative just because it is less positive than the others (say, the 0.15 between items #1 and #4 next to the 0.53 between items #1 and #2).

This happens all the time with survey data and is cause for at least mild despair. The reason for it is that different people use the answer categories differently. Some people think all of the explanations provided are at least "important," while other people don't find any of them anything more than "somewhat important." Another way of saying this is that even though respondents were given four answer categories, many of them didn't use all four.
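To see how this can push every correlation upward, here is a minimal simulation sketch in Python. Nothing in it comes from the GSS: the function names, the sample size, and the choice to model answers as continuous scores rather than four categories are all assumptions made for illustration. Two items are built so that the underlying opinions point in opposite directions, and then each simulated respondent gets a personal shift (a "response style") added to both answers.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n, style_sd, seed=0):
    """Correlation between two simulated items (hypothetical data).

    The items load on the same opinion with opposite signs, so they are
    negatively related underneath. `style_sd` controls how much respondents
    differ in their overall tendency to rate everything high or low.
    """
    rng = random.Random(seed)
    item_a, item_b = [], []
    for _ in range(n):
        opinion = rng.gauss(0, 1)       # what the person actually thinks
        style = rng.gauss(0, style_sd)  # personal shift applied to every answer
        item_a.append(opinion + rng.gauss(0, 1) + style)
        item_b.append(-opinion + rng.gauss(0, 1) + style)
    return pearson(item_a, item_b)


no_style = simulate(5000, style_sd=0.0)
with_style = simulate(5000, style_sd=2.0)
print(f"without response style: {no_style:+.2f}")
print(f"with response style:    {with_style:+.2f}")
```

Under these assumptions the first correlation comes out negative and the second positive, even though the opinions underneath are identical in both runs. The only thing that changed is how differently people use the scale.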

See what happens to the correlation matrix when we take out all the people who gave the same rating to all five items:

(obs=1233)

             |  fekids1  fekids2  fekids3  fekids4  fekids5
-------------+---------------------------------------------
     fekids1 |   1.0000
     fekids2 |   0.4685   1.0000
     fekids3 |   0.0916   0.1481   1.0000
     fekids4 |   0.0582   0.1170   0.4055   1.0000
     fekids5 |   0.3990   0.3256   0.1740   0.1271   1.0000
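The filter behind this second matrix is simple to state in code. The following is a sketch in Python, not the commands that produced the output above; the toy ratings and the 1-to-4 coding (1 = very important, 4 = not at all important) are assumptions for illustration, and the name `varies` is made up.

```python
# Hypothetical ratings: one tuple per respondent, one value per item
# (fekids1..fekids5), coded 1 = very important ... 4 = not at all important.
ratings = [
    (1, 1, 1, 1, 1),  # same answer to everything
    (1, 2, 3, 3, 1),
    (2, 2, 2, 2, 2),  # same answer to everything
    (4, 3, 1, 2, 4),
    (1, 1, 2, 2, 1),
]


def varies(row):
    """True if the respondent gave at least two different ratings."""
    return len(set(row)) > 1


kept = [row for row in ratings if varies(row)]
print(len(ratings), "->", len(kept))  # prints "5 -> 3"
```

With real data, respondents missing any of the five items would need to be dropped first so that the comparison is made on complete cases.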

The correlations are still all positive, but they are somewhat smaller than before. Ultimately, there weren't that many people who gave the same answer to all five: only 89 of the 1,322 respondents (1322 - 1233). Let's also eliminate people who used only two answer categories for all five items--in other words, let's get rid of the effect of there being some people who answered everything in terms of "very important"/"important" and other people who answered everything in terms of "important"/"somewhat important". Here:

(obs=707)

             |  fekids1  fekids2  fekids3  fekids4  fekids5
-------------+---------------------------------------------
     fekids1 |   1.0000
     fekids2 |   0.3230   1.0000
     fekids3 |  -0.1201  -0.0706   1.0000
     fekids4 |  -0.1915  -0.1426   0.2307   1.0000
     fekids5 |   0.2436   0.1338  -0.0389  -0.0934   1.0000

There, now we have negative correlations for pairs of items that we might have thought should have negative correlations.
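It helps to see how much of the sample each restriction costs: going by the observation counts, 1322 - 1233 = 89 people used one category and 1233 - 707 = 526 used exactly two. A rough Python sketch of that bookkeeping follows; the ratings are invented and `categories_used` is a hypothetical helper, so read it as an illustration of the idea rather than the procedure actually run here.

```python
from collections import Counter

# Hypothetical ratings: one tuple per respondent across the five items,
# coded 1..4 (an assumed coding).
ratings = [
    (1, 1, 1, 1, 1),  # one category
    (1, 2, 1, 2, 2),  # two categories
    (2, 3, 3, 2, 3),  # two categories
    (1, 2, 3, 3, 1),  # three categories
    (4, 3, 1, 2, 4),  # four categories
    (2, 2, 1, 3, 2),  # three categories
]


def categories_used(row):
    """Number of distinct answer categories a respondent used."""
    return len(set(row))


# Tabulate respondents by how many categories they used.
usage = Counter(categories_used(row) for row in ratings)
for k in sorted(usage):
    print(f"{k} categories: {usage[k]} respondents")

# Keep only those who used three or more categories.
kept = [row for row in ratings if categories_used(row) >= 3]
print(len(ratings), "->", len(kept))  # prints "6 -> 3"
```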

Of course, we can't throw people away just because they only used two answer categories. What to do? There is an underutilized solution, but, alas, I don't have time to type it out here. Plus, I wouldn't just want to hand it to the many statistical spies who monitor this weblog.
