A small set of observations with a few extreme values, plus a subjective split of the data set into two subsets fitted separately with linear regression models, resulted in very clear-cut conclusions and striking figures. However, none of this is solid evidence, or evidence at all, supporting the paper’s conclusions. This series of articles not only discusses the problems in the paper but, more importantly, traces the review process that allowed it to be published in Nature.

A new analysis of the data appears in an article at “Ask a Swiss”, but it is still based on model fitting. The authors detect a significant change in slope, but we still do not have confidence bands available.

In most situations pseudo-random numbers produced by computer software (“random” number generators) are good enough, as long as we are careful when choosing the seed for the generator. Sometimes it can even be an advantage to be able to reproduce a sequence of pseudo-random numbers by setting the seed value. Frequently, the seed is obtained from the computer's clock, e.g. using the seconds or milliseconds digits of the current time. The result is still not truly random, as truly random numbers cannot be generated by any deterministic process. True random numbers can only be generated by a random physical process.
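As a minimal sketch of the reproducibility point, here is a small Python example using only the standard library. The helper name `draw`, the seed value 42 and the range 1 to 100 are my own choices for illustration, and the clock-based seed is just one possible way of doing what is described above.

```python
import random
import time


def draw(seed, n=5):
    """Return n pseudo-random integers in 1..100 from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch the global state
    return [rng.randint(1, 100) for _ in range(n)]


# Setting the same seed reproduces exactly the same sequence.
first = draw(42)
second = draw(42)
print(first == second)  # prints True

# Seeding from the clock (milliseconds since the epoch) gives a sequence
# that changes from run to run, but it is still deterministic given the seed.
clock_seed = time.time_ns() // 1_000_000
print(clock_seed, draw(clock_seed))
```

Storing the clock-derived seed together with the results keeps even such a run reproducible later on.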

The site random.org is a service that provides true random numbers for free (at least below a quota). The R package random provides an interface to this service.

I mentioned earlier that a high-ranking journal in psychology called “Basic and Applied Social Psychology” has banned the use of P-values. Today I came across some additional material on this question. First of all, the controversial editorial in which the decision was announced.

A paper, published in this journal, giving guidelines on the best way of presenting results without the use of P-values. The paper by Geoff Cumming, titled “The New Statistics: Why and How”, makes a good argument for using confidence intervals and other descriptive statistics in place of P-values.

He also has a series of videos on YouTube, of which the three linked below are related to the use (and misuse) of P-values. For my taste, he does not make a clear enough distinction between the problem inherent to P-values (that they discard a lot of information to reach a true/false decision) and the problems due to the misuse and misinterpretation of tests of significance. He does mention the difference, but you need to keep your eyes and ears open to get this out of his presentations.

In addition, a blog and a podcast of a round table complete the discussion of this issue, giving a somewhat wider account of the controversy surrounding the use of P-values.