p values – not a lost cause

There is an interesting piece in the news section of Nature this week about the problems of using p values from statistical tests to determine significance.
http://www.nature.com/news/scientific-method-statistical-errors-1.14700?WT.mc_id=PIN_NatureNews
The underlying message of the piece is that p values are unreliable. The heading of the figure that forms the major argument in the text states: “A P value measures whether an observed result can be attributed to chance. But it cannot answer a researcher’s real question: what are the odds that a hypothesis is correct? Those odds depend on how strong the result was and, most importantly, on how plausible the hypothesis is in the first place.”
However, this argument appears to relate to p-values calculated from a test such as a chi-squared or Fisher’s exact, and only these tests, as I’ll explain. Such a test may be something along these lines. You set up a classic choice experiment with a fish and some bait. The fish swims through a tube and reaches a branch in the tube (left or right). The food is in the left branch. You run the experiment 100 times and the fish turns left 73 times. Is this down to chance?
If only chance operated, then in theory, the fish has a 50:50 chance of going either way each time (ignore memory or any other preferences for now). In practice, it is unlikely to go left exactly 50 times and right exactly 50 times. If the left:right ratio was 49:51, it would be a good fit. So would 48:52, but would 73:27? You can test this statistically, and it will tell you if these results are down to chance or not. For this kind of test, the argument in the article holds – see figure here:
http://www.pinterest.com/pin/449093394064037711/
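For anyone who wants to try the fish example, here is a minimal sketch of an exact two-sided binomial test in Python. This is only one reasonable choice for this kind of data (a chi-squared goodness-of-fit test would be another), the function name `binomial_two_sided_p` is made up for illustration, and the 50:50 null is the assumption discussed above. It adds up the probabilities of every left:right split that is no more likely than the one observed, assuming the fish really does turn left half the time.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p value under the null probability p_null."""
    # Probability of each possible number of left turns if only chance operated.
    probs = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Sum every outcome that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# The three left:right splits mentioned in the text, out of 100 runs.
for left in (49, 48, 73):
    print(f"{left}:{100 - left} -> p = {binomial_two_sided_p(left, 100):.2g}")
```

Run on the splits above, 49:51 and 48:52 should give large p values and 73:27 a very small one.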
This is a weak test, with no replication. There is a whole range of reasons the fish might turn left more which have nothing to do with the hypothesis of sensing food. However, even if you overcome the shortcomings of the method (the scientist’s job, not the statistician’s), it is easy to see how a different fish might respond differently (unlike in most tests, replication isn’t really ingrained in these in the same way). If we got a p value < 0.05, however, it would tell us that we are 95% certain that this fish turned left more than would be expected by our guess of what should happen by chance (i.e. chance may not actually be 50:50, even if we think it is). In this sense, the test is robust.
In addition to this, most statistical tests are different. Most are based around linear models (ANOVAs or regressions). The purpose of an ANOVA is to tell us if there is a difference in sample means between categories (are there more snails at site 1 than at site 2 or site 3?). If we count every snail at the site, then we have the true population, and there’s no need for a test. However, if we sample, and work out the number per quadrat (the replicate), then we are only estimating the population mean with a sample mean (the mean is the average number per quadrat).
All an ANOVA does is to work out if having a mean for each category gives a better fit to the data than an overall mean for all categories. If so, we can conclude there is a ‘significant difference in the means’. So a p < 0.05 for an ANOVA means we are 95% sure that at least one of the category means is different from the others. Given you have taken a representative sample of sufficient size, this is robust and repeatable (although you wouldn’t necessarily expect to get the exact same p-value each time and, with 95% confidence, consideration of type 1 and 2 errors is of course relevant).
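The "one mean per category versus one overall mean" idea can be sketched in a few lines of Python. The snail counts per quadrat below are invented purely for illustration, and all the function names are hypothetical. The F statistic is the usual one-way ANOVA ratio, but to keep the sketch free of extra libraries the p value comes from a permutation test (shuffling the site labels) rather than from the F distribution that a standard ANOVA would use.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sum_of_squares(values, centre):
    return sum((v - centre) ** 2 for v in values)


def f_statistic(groups):
    """One-way ANOVA F: scatter explained by group means vs. scatter left over."""
    all_values = [v for g in groups for v in g]
    # Fit 1: a single overall mean for every quadrat.
    ss_total = sum_of_squares(all_values, mean(all_values))
    # Fit 2: a separate mean for each site (assumes some scatter remains within sites).
    ss_within = sum(sum_of_squares(g, mean(g)) for g in groups)
    ss_between = ss_total - ss_within
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p(groups, n_permutations=2000, seed=1):
    """Share of label shuffles giving an F at least as large as the observed one."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical snails per quadrat at three sites (made-up numbers).
sites = [[12, 15, 11, 14, 13], [22, 25, 21, 24, 23], [13, 12, 14, 11, 15]]
print(f"F = {f_statistic(sites):.2f}, permutation p = {permutation_p(sites):.4f}")
```

With these made-up counts, site 2 is clearly different, so the per-site means fit far better than the overall mean and the p value is small.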
Equally, a regression is very similar to an ANOVA. A p < 0.05 means that if you draw a line of best fit through the points, it is a better fit to the data than a horizontal line running through the mean of the samples.
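The same comparison for a regression can be sketched as follows, assuming ordinary least squares with a single predictor. The data (snail counts against shore height) are invented for illustration and the function names are hypothetical. The sketch compares the scatter around the best-fit line with the scatter around a horizontal line through the mean and reports r²; the p value itself would come from an F test on those same two sums of squares, which is not computed here.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Proportion of scatter around the mean that the fitted line removes."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Scatter around a horizontal line through the mean of the samples.
    ss_flat = sum((y - mean_y) ** 2 for y in ys)
    # Scatter around the line of best fit.
    ss_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_line / ss_flat


# Hypothetical data: shore height (m) and snails per quadrat (made-up numbers).
shore_height = [1, 2, 3, 4, 5, 6]
snails = [3, 5, 4, 8, 9, 11]
slope, intercept = fit_line(shore_height, snails)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"r squared = {r_squared(shore_height, snails):.2f}")
```

An r² near 1 means the sloped line removes almost all of the scatter left by the horizontal line; an r² near 0 means it removes almost none.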
For these tests the assumption that: “A p value measures whether an observed result can be attributed to chance” is not the case. It measures if there is a better way of fitting a model to the data than just looking at how samples cluster around the mean.
There are some other issues in the article, such as the lack of association between significance and effect size in tests such as ANOVA, which do need addressing (although a nice graph of the results helps in this case). The r² in a regression also gives a good measure of fit, normally providing more meaning than a p value. Additional information is good, but this doesn’t mean the p value is bad.
However, overall, I simply don’t buy the argument that p values are unreliable. Knowing what a statistics test does is important. Interpreting results and hypotheses in the knowledge of what a stats test does is important. But p values are very useful.
As scientists, we normally want to know if an effect can be attributed to chance. Surely we all know that this doesn’t prove our hypothesis? This just sounds like the classic rehash of not understanding cause and effect, which if you’re not easily offended is best described here:

However, removing p values removes the ability to test if something occurs by chance. I’ve recently been working with GLMMs, and the fact that there is no reliable p value is an issue for me. I can’t tell if what I have found is likely to be a real effect or not. I’m sure those with much more understanding of GLMMs can tell me how to proceed. But let’s be honest, I’m lost, and I’m sure anyone who is a non-statistician reading my paper would be as well. With a bit of common sense, they shouldn’t be lost if there are p-values.
For science to progress, we need to have a greater understanding of what statistics do, but we also need to make sure that these statistics are easy to use and easy to understand for the average scientist, even if not for the general public (but wouldn’t it be nice if we could explain it to the public too?). The p value is fit for purpose; we just need to understand what it tells us. Losing it would put scientific understanding back a long way.


Regina Nuzzo (2014). Statistical Errors. Nature, 506, 150-152. DOI: 10.1038/506150a

Posted in Uncategorized | Comments Off on p values – not a lost cause

Small snails and evolution of a theory

I’ve heard Richard Dawkins give an example of ‘how science works’ several times over the past ten years or so (admittedly on telly or the radio). He illustrates how evidence changes people’s beliefs in science with a story something along these lines (apologies if this isn’t 100% correct):

At the end of a research seminar at Oxford, a distinguished professor, who has always held a fundamentally opposite viewpoint to the speaker, gets up, shakes the hand of the presenter, and says that his view has now changed, based on the evidence that has just been provided.

It’s a great story about how science should work. If it’s true, however, it is probably the only time it has ever happened. In reality, most scientists seem very stuck in their own beliefs about how various things work (evolutionary units of selection – as a current example). In truth, most undergraduate students are pretty good at seeing through these polarised arguments. I’ve been asked many times after lectures presenting two opposing sides to a theory “but isn’t it a bit of both?”, and the answer is normally ‘yes’.

In my own little research world, there has been considerable debate about aggregation in intertidal snails. It’s obvious (to me at least) that aggregations prevent desiccation stress once the tide has gone out. This photo really proves the point. However, it’s been difficult to collect data to prove this.

[Photo: IMG_2776]

Recently, I (along with co-authors) wrote the ‘rant’ below. It spells out why you can’t measure the benefits of aggregation, as we are measuring the wrong thing (it also contains a nice analogy involving beer, a well-known Plymouth pub and a park bench). We need to measure the ‘rate’ of desiccation, not water content.

http://onlinelibrary.wiley.com/doi/10.1111/j.1439-0485.2012.00513.x/abstract

However, I am happy to be proved wrong, and I think this even more recent paper proves that while I wasn’t wrong, there is a more correct explanation. In fact, I’m happy to say that our explanation is around 30% of the answer, and this is most likely the remaining 70%. Snails in aggregations are able to keep their operculum open, and hence continue to breathe, for longer, because they don’t face a lower rate of desiccation.

http://link.springer.com/article/10.1007%2Fs00227-012-2164-6#page-1

So, a clear benefit of aggregation, a good explanation of why the water content of the snails in aggregations isn’t higher, and a scientist admitting that their theory has been outclassed by another. Pretty much a perfect outcome there.


Radiation, really big squids and cause and effect

You may have heard the hype around the Fukushima radiation issues in the US at the moment, with a good summary of what many people think given here:

http://www.infowars.com/government-media-cover-up-fukushima-radiation-wave-hitting-us/

It’s clear that Fukushima was a disaster, especially for those living close by in Japan, and there are some severe environmental issues in this area, which will be very long lasting. However, a number of people believe that the effects of this radiation reach far beyond Japan and manifest themselves in some pretty weird ways, even though there is little scientific evidence to support this.

A few weeks ago, there was a report of a giant squid washed up in California, initiated on this website:

http://www.lightlybraisedturnip.com/giant-squid-in-california/

I have to admit, I really like this story for a number of reasons. Firstly, people believed it. Have a look at some of the other stories on a site called ‘lightly braised turnip’ and see why it may not be true… Secondly, I like this as it really does raise some important points about how people react to the unknown. We know very little about the sea, especially the deep sea, and there are some pretty big squid in the sea. However, I don’t see the potential of these squid to demolish any beachfront blocks of flats in the near future. Thirdly, anyone believing this has a minor issue understanding the effect radiation has on mutations (largely random, so why gigantism would occur across species is rather unclear).

Finally though, this story, and all the other things attributed to ‘radiation’, illustrate an important point. It is easy to make the ‘effect’ fit the ‘cause’. What I mean is this: lots of odd things happen, and lots of highly improbable things happen. The fact that lots of highly improbable things happen isn’t unusual; it’s because a huge number of things happen. Far more things happen which don’t seem odd than do seem odd. We, as humans, focus on the odd things, and try to work out why they occur.
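To put a rough number on "improbable things happen because lots of things happen", here is a small Python sketch. It assumes every opportunity is independent and has the same tiny probability, which is a simplification, and the function name and the one-in-a-million figure are made up for illustration.

```python
def chance_of_at_least_one(p_single: float, opportunities: int) -> float:
    """Probability that an event of probability p_single happens at least once."""
    # Easier to work out the chance it never happens, then subtract from one.
    return 1 - (1 - p_single) ** opportunities


# A hypothetical "one in a million" event given more and more chances to occur.
for n in (1, 1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} opportunities -> {chance_of_at_least_one(1e-6, n):.4f}")
```

Given a million chances, a one-in-a-million event is more likely than not to turn up at least once, and given ten million chances it is close to certain.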

If I told you odd things happen on a Friday (and you believed me, of course), then you would simply put anything odd that happened on a Friday down to it being Friday. There are probably no more odd things happening on a Friday than on any other day, but now they seem related, because they have a cause.

The same with the radiation in the Pacific. Giant oarfish – two of them, Siamese whales. Seems odd, must be the radiation. However, tests for radiation levels don’t indicate a cause for concern.

A simple lesson in probability there, but I’ve also learned two things:

1) Ramblings like this have no place in scientific journals (see here)

2) Perhaps I need that proofreader (see here)


Proofreading

I get to review a lot of scientific papers which aren’t always written in the best English. In fairness, it isn’t easy to write in a language which isn’t your own, especially given some of the oddities of English grammar, and there is no way that I could write anything in another language. However, this spam email from a proofreading company made me laugh. Most (but not all) of it is technically correct, but it reads very oddly. Any prizes for finding the mistake?

[Image: Capture]


What’s this all about?

It’s still almost the start of the year, and a good time to start new resolutions.

The resolution wasn’t strictly to start a blog, but was to provide a way of disseminating marine ecology a bit more. Sometimes it might be my work and observations, but there’s far more interesting stuff than just what I do (which often involves very small snails…)

It’s also a chance to stop publishing so many papers. Not to stop altogether, which is never a good career move… but to concentrate on what’s more important, and to use this as an outlet for some more minor results.

That’s about it for now. I need to do a bit of research into some of this nuclear fallout and gigantism stuff that’s almost certainly nonsense, but big in the US at the moment.


Rick Stafford’s Blog

Just set up this site, as of the 14th Jan 2014. More to come very soon.
