{"id":23,"date":"2014-02-13T11:23:02","date_gmt":"2014-02-13T11:23:02","guid":{"rendered":"http:\/\/blog.rickstafford.com\/?p=23"},"modified":"2014-02-13T11:35:02","modified_gmt":"2014-02-13T11:35:02","slug":"p-values-not-a-lost-cause","status":"publish","type":"post","link":"http:\/\/blog.rickstafford.com\/?p=23","title":{"rendered":"p values &#8211; not a lost cause"},"content":{"rendered":"<p>An interesting piece in the news section of Nature this week about the problems of statistical tests using p values to determine significance.<br \/>\n<a href=\"http:\/\/www.nature.com\/news\/scientific-method-statistical-errors-1.14700?WT.mc_id=PIN_NatureNews\">http:\/\/www.nature.com\/news\/scientific-method-statistical-errors-1.14700?WT.mc_id=PIN_NatureNews<\/a><br \/>\nThe underlying nature of the piece is that p values are unreliable. The headlines of the figure forming the major argument in the text states: \u201cA P value measures whether an observed result can be attributed to chance. But it cannot answer a researcher\u2019s real question: what are the odds that a hypothesis is correct? Those odds depend on how strong the result was and, most importantly, on how plausible the hypothesis is in the \u00acfirst place.\u201d<br \/>\nHowever, this argument appears to relate to p-values calculated from a test such as a chi-squared or Fisher\u2019s exact, and only these tests, as I\u2019ll explain. Such a test may be something along these lines. You set up a classic choice experiment with a fish and some bait. The fish swims through a tube and reaches a branch in the tube (left or right). The food is in the left branch. You run the experiment 100 times and the fish turns left 73 times. Is this down to chance?<br \/>\nIf only chance operated, then in theory, the fish has a 50:50 chance of going either way each time (ignore memory or any other preferences for now). In practice, it is unlikely to go left exactly 50 times and right exactly 50 times. 
If the left:right ratio was 49:51, it would be a good fit. So would 48:52, but would 73:27? You can test this statistically, and the test will tell you whether these results are down to chance or not. For this kind of test, the argument in the article holds \u2013 see figure here:<br \/>\n<a href=\"http:\/\/www.pinterest.com\/pin\/449093394064037711\/\">http:\/\/www.pinterest.com\/pin\/449093394064037711\/<\/a><br \/>\nThis is a weak test, with no replication. There is a whole range of reasons the fish might turn left more, which have nothing to do with the hypothesis of sensing food. However, even if you overcome the shortcomings of the method (the scientist\u2019s job, not the statistician\u2019s), it is easy to see how a different fish might respond differently (unlike most tests, replication isn\u2019t really built into these in the same way). If we got a p value &lt; 0.05, however, it would tell us that, if our guess of what should happen by chance were right, a result this extreme would occur less than 5% of the time \u2013 good evidence that this fish turned left more often than our guess of chance would predict (i.e. chance may not actually be 50:50, even if we think it is). In this sense, the test is robust.<br \/>\nIn addition to this, most statistical tests are different. Most are based around linear models (ANOVAs or regressions). The purpose of an ANOVA is to tell us if there is a difference in sample means between categories (are there more snails at site 1 than at site 2 or site 3?). If we count every snail at each site, then we have the true population, and there\u2019s no need for a test. However, if we sample and work out the number per quadrat (the replicate), then we are only estimating the population mean with a sample mean (the mean is the average number per quadrat).<br \/>\nAll an ANOVA does is work out whether having a mean for each category gives a better fit to the data than an overall mean for all categories. If so, we can conclude there is a \u2018significant difference in the means\u2019. 
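This model-comparison view of ANOVA can be sketched in a few lines (the snail counts per quadrat below are made up purely for illustration):

```python
# Hypothetical snail counts per quadrat at three sites (made-up numbers).
sites = {
    "site1": [12, 15, 11, 14, 13],
    "site2": [8, 9, 7, 10, 8],
    "site3": [13, 12, 14, 11, 15],
}

values = [v for vs in sites.values() for v in vs]
n, k = len(values), len(sites)
grand_mean = sum(values) / n

# Fit 1: one overall mean for all categories.
ss_total = sum((v - grand_mean) ** 2 for v in values)

# Fit 2: a separate mean for each site.
ss_within = sum(
    sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in sites.values()
)

# ANOVA asks whether the per-site means fit the data enough better
# (relative to the scatter within sites) to matter.
f_stat = ((ss_total - ss_within) / (k - 1)) / (ss_within / (n - k))
print(f_stat)  # a large F means the per-site means fit much better
```

With scipy installed, `scipy.stats.f.sf(f_stat, k - 1, n - k)` would convert this F statistic into the familiar p value; here F is large, so p would be small.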
So a p &lt; 0.05 for an ANOVA means that, if all the category means were really the same, differences this large would arise by chance less than 5% of the time \u2013 so we conclude that at least one of the category means differs from the others. Given you have taken a representative sample of sufficient size, this is robust and repeatable (although you wouldn\u2019t necessarily expect to get exactly the same p-value each time, and consideration of type 1 and type 2 errors is of course relevant).<br \/>\nEqually, a regression is very similar to an ANOVA. A p &lt; 0.05 means that if you draw a line of best fit through the points, it is a better fit to the data than a horizontal line running through the mean of the samples.<br \/>\nFor these tests, the claim that \u201cA P value measures whether an observed result can be attributed to chance\u201d is not the whole story. The p value measures whether a fitted model describes the data better than simply looking at how samples cluster around the overall mean.<br \/>\nThere are some other issues in the article, such as the lack of association between significance and effect size in tests such as ANOVA, which do need addressing (although a nice graph of the results helps in this case). The r\u00b2 of a regression also gives a good measure of fit, normally providing more meaning than a p value. Additional information is good, but this doesn\u2019t mean the p value is bad.<br \/>\nHowever, overall, I simply don\u2019t buy the argument that p values are unreliable. Knowing what a statistical test does is important. Interpreting results and hypotheses in the knowledge of what a stats test does is important. But p values are very useful.<br \/>\nAs scientists, we normally want to know if an effect can be attributed to chance. Surely we all know that this doesn\u2019t prove our hypothesis? 
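The regression version of the same idea \u2013 comparing the fitted line against a horizontal line through the mean, with r\u00b2 as the measure of fit \u2013 can be sketched the same way (the numbers are again made up for illustration):

```python
# Hypothetical (x, y) data with an upward trend (made-up numbers).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept for the line of best fit.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Fit of a horizontal line through the mean vs the fitted line.
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - ss_resid / ss_total
print(slope, r_squared)
```

Here the fitted line explains almost all of the variation (r\u00b2 close to 1), so the horizontal line through the mean is clearly a worse description of the data; a small p value for the slope would say the same thing in probability terms.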
Mistaking statistical significance for proof of a hypothesis just sounds like the classic rehash of not understanding cause and effect, which, if you\u2019re not easily offended, is best described here:<br \/>\n<iframe loading=\"lazy\" width=\"640\" height=\"480\" src=\"http:\/\/www.youtube.com\/embed\/pQjqxayxwt4?feature=oembed\" frameborder=\"0\" allowfullscreen><\/iframe><br \/>\nHowever, removing p values removes the ability to test whether something could have occurred by chance. I\u2019ve recently been working with GLMMs, and the fact that there is no reliable p value is an issue for me. I can\u2019t tell if what I have found is likely to be a real effect or not. I\u2019m sure those with much more understanding of GLMMs can tell me how to proceed. But let\u2019s be honest, I\u2019m lost, and I\u2019m sure anyone who is a non-statistician reading my paper would be as well. With a bit of common sense, they shouldn\u2019t be lost if there are p-values.<br \/>\nFor science to progress, we need a greater understanding of what statistics do, but we also need to make sure that these statistics are easy to use and easy to understand for the average scientist, even if not for the general public (but wouldn\u2019t it be nice if we could explain it to the public too?). The p value is fit for purpose; we just need to understand what it tells us. 
Losing it would put scientific understanding back a long way.<\/p>\n<p><span style=\"float: left; padding: 5px;\"><a href=\"http:\/\/www.researchblogging.org\"><img alt=\"ResearchBlogging.org\" src=\"http:\/\/www.researchblogging.org\/public\/citation_icons\/rb2_large_gray.png\" style=\"border:0;\"\/><\/a><\/span><\/p>\n<p><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&#038;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&#038;rft.jtitle=Nature&#038;rft_id=info%3Adoi%2F10.1038%2F506150a&#038;rfr_id=info%3Asid%2Fresearchblogging.org&#038;rft.atitle=Statistical+Errors&#038;rft.issn=&#038;rft.date=2014&#038;rft.volume=506&#038;rft.issue=&#038;rft.spage=150&#038;rft.epage=152&#038;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fnews%2Fscientific-method-statistical-errors-1.14700%3FWT.mc_id%3DPIN_NatureNews&#038;rft.au=Regina+Nuzzo&#038;rfe_dat=bpr3.included=1;bpr3.tags=Research+%2F+Scholarship%2CEcology+%2F+Conservation\">Regina Nuzzo (2014). Statistical Errors <span style=\"font-style: italic;\">Nature, 506<\/span>, 150-152 DOI: <a rev=\"review\" href=\"http:\/\/dx.doi.org\/10.1038\/506150a\">10.1038\/506150a<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>An interesting piece in the news section of Nature this week about the problems of statistical tests using p values to determine significance. http:\/\/www.nature.com\/news\/scientific-method-statistical-errors-1.14700?WT.mc_id=PIN_NatureNews The underlying nature of the piece is that p values are unreliable. 
The headlines of the &hellip; <a href=\"http:\/\/blog.rickstafford.com\/?p=23\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=\/wp\/v2\/posts\/23"}],"collection":[{"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=23"}],"version-history":[{"count":5,"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=\/wp\/v2\/posts\/23\/revisions"}],"predecessor-version":[{"id":29,"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=\/wp\/v2\/posts\/23\/revisions\/29"}],"wp:attachment":[{"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=23"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=23"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.rickstafford.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=23"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}