Tuesday, June 16, 2009

Failing Grade on Significant Figures

This analysis is second-hand (I have not read the journal article); it is based entirely on the article in today's Inside Higher Ed titled "Failing Grade on Alcohol".

According to the press release, the study came up with an allegedly shocking result

Using figures from government databases and national surveys on alcohol use, researchers at the National Institute on Alcohol Abuse and Alcoholism (NIAAA) found that drinking-related accidental deaths among 18- to 24-year-old students have been creeping upward -- from 1,440 in 1998 to 1,825 in 2005.

that was published in the "Journal of Studies on Alcohol and Drugs".

What is wrong with this? Everything, if IHE described the study accurately.

The IHE reporter says the authors did the following:
Researchers from the National Institute on Alcohol Abuse and Alcoholism multiplied the number of 18- to 24-year-olds in the United States, as reported by the Census Bureau, by the estimated percentage of deaths among 18- to 24-year-olds that were alcohol-related, as provided by 331 medical examiner studies. That number was multiplied by 30 percent, since three-tenths of 18- to 24-year-olds are in college.

Do you see the mistakes?

The biggest error is that "three tenths" has only one significant figure, although it might have had two before they wrote that sentence. That means their values should be stated as 1400 and 1800 (appropriate in either case because many people think that a leading 1 does not convey a full significant figure). Their choice conveys a hugely false sense of precision, as if they KNOW it was 1825 students, rather than 1824 or 1826, that died - particularly when it is quite likely that they don't know whether it was 1700 or 1900 that died as a result of alcohol use in that year.
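For anyone who wants to see the rounding spelled out, here is a minimal Python sketch. The helper name round_sig is my own invention for illustration, and I am assuming two significant figures is the most the inputs could support, as argued above.

```python
from math import floor, log10

def round_sig(x, n):
    """Round x to n significant figures (hypothetical helper, for illustration)."""
    if x == 0:
        return 0
    # Position of the leading digit decides how many places to keep.
    digits = n - 1 - floor(log10(abs(x)))
    return round(x, digits)

# The two values quoted in the press release, cut back to two significant figures.
reported = {1998: 1440, 2005: 1825}
for year, deaths in reported.items():
    print(year, deaths, "->", round_sig(deaths, 2))
```

Taken all the way down to the single significant figure that "three-tenths" really carries, the same helper would give 1000 and 2000, which shows how little the inputs actually pin down.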

The important point is that they don't know whether the people who died were actually students.

But there are several other errors inherent in this approach.

Are there equal percentages of men and women in college within this age group? The data I showed the other day suggest they aren't. Are men and women equally likely to show up in those mortality statistics? Unlikely, given what I see in our newspaper.

There is also an unstated uncertainty in using a small sample of coroner data. (How many counties have a college in their coroner district?) What was the standard deviation in the mean value they used? Could it be that they are comparing 1400 +/- 200 with 1800 +/- 200 to get an increase of 400 +/- 400? Or worse?
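To make that comparison concrete, here is a rough sketch of how the uncertainty in the increase could be worked out. The +/- 200 figures are my hypothetical guesses from the paragraph above, not anything reported by the study, and the function name is made up for this example. Adding the uncertainties directly gives the worst case; if the two estimates were independent, the usual rule is to add them in quadrature.

```python
from math import sqrt

def difference_with_uncertainty(a, sigma_a, b, sigma_b):
    """Return (b - a, quadrature uncertainty, worst-case uncertainty)."""
    diff = b - a
    # Independent errors: add in quadrature.
    quadrature = sqrt(sigma_a**2 + sigma_b**2)
    # Fully correlated or worst case: add directly.
    worst_case = sigma_a + sigma_b
    return diff, quadrature, worst_case

# Hypothetical uncertainties of 200 on each rounded value.
diff, quad, worst = difference_with_uncertainty(1400, 200, 1800, 200)
print(f"increase = {diff} +/- {quad:.0f} (quadrature) or +/- {worst} (worst case)")
```

Either way, with uncertainties of that size the increase is only between one and one and a half standard deviations away from zero.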

Finally, all of this might be dwarfed by the use of a highly subjective term like "alcohol related". That box can get checked on a traffic incident report if one of the drivers reports having had a drink recently, regardless of whether there is a legal finding of intoxication or even fault for the crash. That means a totally sober student could have been counted if s/he was killed in a car accident with someone who had had a few drinks with dinner.

That doesn't even count concerns such as those of one person quoted in the article, who pointed out that many college students live within walking distance (or a short drive) of the parties they go to, in contrast to others in the same age group.

At least the abstract of the article said the "aim of this study was to estimate" [emphasis added] the mortality from alcohol use in the 18-24 age group, but that didn't stop them from giving a 4 sig fig value for a 1 sig fig result. This is particularly pathetic when the lead author believes "that it would have been better to have data on every injury death, but maintains that from the information he used, the results were a conservative estimate of the number of deaths" according to the author of the IHE article.

That means he knows that both the 1400 and the 1800 values were very uncertain, and that the true value of the first might even be bigger than that of the second. This doesn't stop him from quoting "exact" numbers and omitting any estimate of the uncertainty in the calculated values. Surely, having risen to the position of Division Director at a national institute, he knows how to use a stat package to estimate the uncertainty? Oh, right. I almost forgot about this true story. Maybe there is a chance that he used the raw numbers from his calculation without a hint of a clue of how to estimate the uncertainty in the number of deaths from the uncertainties in the coroner data and in the fraction attending college.
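It does not take a stat package. Here is a minimal sketch of the standard rule for a product of independent quantities, where the relative uncertainties add in quadrature. Every input below is a hypothetical placeholder of mine (6000 +/- 600 alcohol-related deaths in the whole age group, and a college fraction of 0.30 +/- 0.05 to reflect one significant figure); none of these numbers comes from the study.

```python
from math import sqrt

def product_with_uncertainty(x, sigma_x, y, sigma_y):
    """Return x*y and its uncertainty, assuming x and y are independent."""
    value = x * y
    # Relative uncertainties of a product add in quadrature.
    relative = sqrt((sigma_x / x) ** 2 + (sigma_y / y) ** 2)
    return value, value * relative

# HYPOTHETICAL inputs, chosen only to illustrate the method.
deaths_in_age_group, sigma_deaths = 6000, 600
college_fraction, sigma_fraction = 0.30, 0.05

students, sigma_students = product_with_uncertainty(
    deaths_in_age_group, sigma_deaths, college_fraction, sigma_fraction
)
print(f"student deaths = {students:.0f} +/- {sigma_students:.0f}")
```

With placeholder inputs like these the uncertainty comes out at several hundred deaths, the same size as the "increase" being reported.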

1 comment:

Anonymous said...

just goes to show that 5 out of 4 people do not understand stats.