And it's fucking annoying.
There are so many issues with how this is done that I don't know where to start - so I'll use some examples.
From the Daily Mail, "Women really DO fancy rich men more as scientists find a bigger salary adds two to his ‘out-of-ten’ rating (but it’s looks not wealth that matters the other way round)"
From Yahoo (which is really just copy/pasting a Reuters article): "Leaving the house linked to longevity in older adults"
The Daily Mail article tries to summarize the results of a study, along with its author's discussion of it in The Times (I can't read the original Times piece myself because of a paywall). The problem is that only at the very end of the article - and in very vague terms - does it even mention the rest of the literature on the topic. They say,
"Although several studies in the past have considered whether women care about the salary of a partner, there has been no study comparing men and women in this area until now."
I can't begin to emphasize how bullshit this reference is. For one, what other studies are these? What were their results? Did they support the results of this study, or run contrary to it? Were their methodologies superior or inferior to this one's?
In most areas of social science, you can't draw a wide, sweeping conclusion from the results of only one study - studies often have small, unrepresentative samples (e.g. college students at the university the researcher works at), a variety of uncontrolled factors, and other study-, subject-, and field-specific concerns. Yet absolutely none of this is referred to in the article. Instead, an all-encompassing conclusion is drawn from a single study's results.
This leads me to my second issue - there's no mention of the methodology. Sometimes these articles do give a brief overview of the methodology - the Yahoo article I'm going to talk about momentarily does just that. The problem is, even when these sorts of flashy news articles do include it, the average Joe has no way to assess whether the methodology is solid or whether there's some grave flaw in it. Not to mention, they really only describe the experimental methodology, not the statistical one - which analytical methods were used, which specific instruments (say, which self-report measure of self-esteem), how the data were presented, and so on. This leads me to my third issue.
These sorts of reports never report the results accurately. They may say, as in the Daily Mail article, that a '10x larger salary adds two points to men's attractiveness rating,' but this really isn't helpful, since it tells us nothing about the relevant statistics. After all, the practical importance of this rests on the specific scale used - would the additional two points amount to essentially nothing, as they would with, say, SAT scores, or would they matter a great deal, as with the Ten Item Personality Measure? The headline does mention an 'out-of-ten' rating, but the article tells us nothing about the scale itself. So let's talk about that.
The whole ordeal refers to a study by Guanlin Wang and colleagues (Speakman apparently isn't even the study's lead author) in the journal "Evolution and Human Behavior." The scale they used is one they developed for the study, running from 1 to 9 (not 1 to 10 as the Daily Mail said). They don't note the problems with this: such a scale hasn't been normed on any population, and in fact little information is given on how attractiveness was actually gauged on it, beyond the fact that subjects were shown a series of images to rate. For another, the study's methodology essentially consisted of presenting a series of cards in two rounds - the first without income printed at the bottom of each card, and the second using the same cards with income added. This is already problematic, since prior exposure to the same cards may bias the second round of ratings - it would have been far better to use a different set of cards to minimize potential bias.
There's another thing I find questionable about the study: for converting the rank order of the cards into scores, they, for some odd reason, used different equations. For women, the equation 1 + (n - 1) * 0.4 was used, and for men the equation 1 + (n - 1) * 4/7, where n is, as the authors themselves say, "the rank order of the image from the least attractive to the most attractive." No rationale is given for this differential use, which makes the study extremely questionable to me.
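To make the numbers concrete, here's a quick sketch of what those two formulas do to a rank position. The function names and example ranks are my own; this just reproduces the arithmetic quoted above, not any code from the study.

```python
# Reproducing the two rank-to-score formulas quoted above - purely illustrative.
# n is "the rank order of the image from the least attractive to the most
# attractive", as the authors put it.

def rank_score_women(n: int) -> float:
    # formula the paper reportedly used for women: 1 + (n - 1) * 0.4
    return 1 + (n - 1) * 0.4

def rank_score_men(n: int) -> float:
    # formula the paper reportedly used for men: 1 + (n - 1) * 4/7
    return 1 + (n - 1) * 4 / 7

for n in (1, 5, 10, 15):
    print(n, round(rank_score_women(n), 2), round(rank_score_men(n), 2))

# Same rank, different score: rank 10 maps to 4.6 under one formula and about
# 6.14 under the other. The women's formula reaches the scale maximum of 9 at
# rank 21, the men's at rank 15 - neither mapping is explained in the paper.
```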
Another issue I have is the interpretation of the results. For a brief rundown of what I'm talking about (I'll go into more detail later on this blog): statistical significance is judged with something called a p value, which is the probability of obtaining results at least as extreme as the observed ones, assuming the null hypothesis is true. The null hypothesis is the hypothesis being tested against - it typically assumes there's no difference between the comparison groups, no effect of the variable being tested, and so on. Roughly speaking, a p value of 0.05 means there's a 1 in 20 chance of seeing results like these if only chance were at work, a p value of 0.01 means a 1 in 100 chance, and so on. While a p value of 0.05 or less is conventionally treated as statistically significant (i.e. the results would be very unlikely under the null hypothesis), that cutoff is arbitrarily chosen, and some recommend using a stricter threshold of 0.001 instead - a 1 in 1,000 chance under the null - before treating a result as strong enough to reject the null hypothesis.
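To make that concrete, here's a minimal, hypothetical sketch of how a p value comes out of a simple test and gets compared against those cutoffs. The data below are invented on the spot and have nothing to do with the study itself.

```python
# A made-up example of computing a p value and checking it against the usual
# cutoffs - the numbers are invented and are not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings_without_income = rng.normal(loc=5.0, scale=1.5, size=30)
ratings_with_income = rng.normal(loc=5.4, scale=1.5, size=30)

t_stat, p_value = stats.ttest_ind(ratings_with_income, ratings_without_income)
print(f"p = {p_value:.4f}")
print("passes the conventional 0.05 cutoff?", p_value < 0.05)
print("passes the stricter 0.001 cutoff?   ", p_value < 0.001)
```

Run it with different seeds and you'll see how easily a p value drifts above or below a given cutoff - part of why treating any single threshold as a bright line is risky.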
However, within the study, the keen-eyed reader will pretty quickly see that none of the results pass this stricter criterion - they're all only below 0.01 or 0.05, not 0.001 - meaning that, by that standard, they can't be distinguished from chance, and thus it can't be concluded that an effect was found.
There's even more to note beyond my own criticisms, as the authors themselves acknowledge important limitations. Specifically, they point out that they only used young faces, not a wide variety of them; that subjects weren't allowed to rate faces as equally attractive (though they note a separate study of theirs found no real difference between ranking and rating, so this isn't likely to be an issue); that sexual orientation wasn't assessed, which may have biased the results; that this kind of salary information is rarely available in day-to-day situations, which limits any practical interpretation; and that, due to the specific way they assessed attractiveness and salaries, the effects may be limited - such that people at certain levels of attractiveness may, in a sense, 'transcend' levels of the economic hierarchy. They also point out that, compared to the other countries' samples, the US sample was small, meaning the differences between the US and those countries could very well be an artifact of its small sample size.
From all this, it's pretty clear that the study can't support what the Daily Mail made of it - the results don't pass a strict significance threshold, there's an abundance of methodological issues, and so on. Yet was any of this even hinted at in the article? No. Not a single fucking word about limitations. It may be that the Times piece they referenced covered all this - I can't view it, so I don't know. But even if it did, the fact that the Daily Mail decided to act as if the study's results were clear as day and robust in spite of all this just shows how deep the problem runs.
The Yahoo article, on the other hand, is a bit better, as mentioned before - but it's still shit. The statistical methods aren't referred to, there's no information on other studies, no information on the actual data behind the results (at least the Daily Mail gave some idea of that), and nothing on any possible limitations of the study at hand. So let's take a look at the study ourselves, why not!
The study, by Jeremy Jacobs and colleagues, published in the Journal of the American Geriatrics Society, is fairly solid in its methodology - it uses well-validated instruments such as the Mini-Mental State Examination, the Brief Symptom Inventory, and so on. It reports hazard ratios, which express the relative risk of an outcome over the follow-up period - a hazard ratio of 1.50, for example, corresponds to a 50% higher risk of that outcome. The results, however, were nonsignificant in the final, most fully adjusted model. Beyond that I have no criticisms of the study - it's generally a well-designed one. While the results were misinterpreted, that's understandable given the widespread misinterpretation of p values I mentioned above.
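For anyone curious what a hazard ratio looks like in practice, here's a minimal sketch using the lifelines library on simulated data. The variable names, the assumed 0.6 "protective" effect, and everything else here are my own invention for illustration - this is not the model Jacobs and colleagues actually fit.

```python
# A toy Cox proportional hazards model on simulated survival data - purely
# illustrative, not the study's analysis.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 200
leaves_daily = rng.integers(0, 2, size=n)      # hypothetical exposure: goes out daily (1) or not (0)
baseline_rate = 0.10                           # assumed baseline yearly death rate
true_hr = 0.6                                  # assumed protective hazard ratio for the exposure
rate = baseline_rate * np.where(leaves_daily == 1, true_hr, 1.0)
time_to_death = rng.exponential(1.0 / rate)
follow_up = 10.0                               # censor everyone still alive at 10 years
observed = time_to_death <= follow_up
duration = np.minimum(time_to_death, follow_up)

df = pd.DataFrame({
    "duration": duration,
    "died": observed.astype(int),
    "leaves_daily": leaves_daily,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="died")
cph.print_summary()  # exp(coef) is the hazard ratio; below 1 means lower risk
```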
Nonetheless, none of that information was present in the original Yahoo article, which just showcases my point - it treats one study as if it were conclusive rather than surveying the literature and detailing the specific methodological strengths and weaknesses of the studies in it. Someone will probably say that's a lot to ask of journalists - but that's exactly the problem, because if science isn't going to be reported to the general public accurately, what's the point at all? It just gives the illusion of a grand, overarching, and truthful conclusion arising from one fucking study, rather than a tedious but rewarding and informative overview of the relevant literature. All it does is misinform the public rather than inform them, leaving them with weak and inaccurate information.
So journalists, unless you aim to do a detailed overview of the relevant literature and methodology of studies on a given topic, please stop your bullshit. Thanks.