Statistics for Experimental Biologists

Is the information below useful? All of Chapter 3 is devoted to the question of pseudoreplication ("What is N?").




Pseudoreplication and how to avoid it


The term pseudoreplication was coined by Hurlbert (1984), who defined it as "...a particular combination of experimental design (or sampling) and statistical analysis which is inappropriate for testing the hypothesis of interest." Pseudoreplication typically occurs when individual observations or data points are inappropriately treated as independent replicates. Observations may not be independent if (1) repeated measurements are taken on the same subject, (2) the data have a hierarchical structure (e.g. in cell culture experiments), (3) observations are correlated in time, or (4) observations are correlated in space.

Pseudoreplication is discussed at length in Lazic (2010), which is available from BMC Neuroscience, so the details won't be repeated here. To illustrate what pseudoreplication is, consider an example from Shipley (2000). Suppose you want to test whether people with blue eyes have longer hair than people with green eyes. You take 20 hairs from the head of one blue-eyed person and 20 hairs from one green-eyed person, and perform an independent-samples t-test with a total sample size of n = 40. The problem is that 20 hairs from one person do not provide 20 independent pieces of information, as a single hair from each of 20 different people would. With only one person per group, the test effectively compares two individuals rather than two eye colours, so the p-value associated with the t-test is meaningless for the question being asked.
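The consequence of treating hairs as replicates can be checked by simulation. The Python sketch below is a hypothetical illustration, not taken from Shipley (2000). It assumes that each person has their own typical hair length (person-to-person standard deviation `sd_between`), that individual hairs vary around it (`sd_within`), that the made-up values are a mean of 30 and standard deviations of 5 in arbitrary length units, and that eye colour has no true effect. It compares a pooled-variance t-test with n = 40 under Shipley's design (20 hairs from one person per group) against a design using one hair from each of 20 different people per group. Because the null hypothesis is true, a valid test should reject about 5% of the time; the two-sided 5% critical value of t with 38 degrees of freedom (2.024) is hard-coded to avoid a SciPy dependency.

```python
import random
import statistics

# Two-sided 5% critical value of Student's t with 38 df (n = 40, two groups).
# Hard-coded to avoid a SciPy dependency.
T_CRIT_DF38 = 2.024


def pooled_t(x, y):
    """Student's two-sample t statistic using a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def hairs_from_one_person(rng, n_hairs, mean_length, sd_between, sd_within):
    """Draw a person-specific mean length, then n_hairs hair lengths around it."""
    person_mean = rng.gauss(mean_length, sd_between)
    return [rng.gauss(person_mean, sd_within) for _ in range(n_hairs)]


def false_positive_rate(design, n_sims=2000, mean_length=30.0,
                        sd_between=5.0, sd_within=5.0, seed=1):
    """Fraction of simulated studies with |t| above the critical value when
    eye colour has NO effect (so every rejection is a false positive).
    All parameter values are hypothetical."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        if design == "pseudoreplicated":
            # Shipley's design: 20 hairs from ONE person in each group.
            blue = hairs_from_one_person(rng, 20, mean_length, sd_between, sd_within)
            green = hairs_from_one_person(rng, 20, mean_length, sd_between, sd_within)
        elif design == "independent":
            # One hair from each of 20 different people in each group.
            blue = [hairs_from_one_person(rng, 1, mean_length, sd_between, sd_within)[0]
                    for _ in range(20)]
            green = [hairs_from_one_person(rng, 1, mean_length, sd_between, sd_within)[0]
                     for _ in range(20)]
        else:
            raise ValueError(f"unknown design: {design!r}")
        if abs(pooled_t(blue, green)) > T_CRIT_DF38:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("pseudoreplicated:", false_positive_rate("pseudoreplicated"))
    print("independent:     ", false_positive_rate("independent"))
```

Under these assumptions the pseudoreplicated design rejects the true null hypothesis in well over half of the simulated studies, whereas the independent design stays close to 5%. How large the inflation is depends on `sd_between` relative to `sd_within`; setting `sd_between` to 0 (all people identical) removes it.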

Schank & Koehnle (2009) recently argued that "pseudoreplication is a pseudoproblem". Despite their provocative title, they agree with many of Hurlbert's points. Hurlbert outlined three types of pseudoreplication and offered several definitions; Schank & Koehnle mainly objected to Hurlbert's suggestion that observations close in space or time are, by their very nature, statistically dependent. They argue that whether observations are dependent is an empirical question, and recommend analysing such data with hierarchical/mixed models, which is the same recommendation made in Lazic (2010). Pseudoreplication is certainly a problem in laboratory experiments, and it is difficult to detect in manuscripts because experimental details and statistical methods are often inadequately reported.
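Fitting a hierarchical/mixed model requires a statistics package, but the core idea (the person, not the hair, is the replicate) can be sketched with the Python standard library. The block below is a minimal sketch of a simpler substitute, not the mixed-model analysis recommended above: it averages the hairs within each person and compares person means with a t-test, a summary-measures approach. For a balanced design like this one it gives the same test of the group effect as a classical nested analysis of variance; with unbalanced data or covariates a proper mixed model is preferable. The design (10 people per eye colour, 5 hairs each), the means and standard deviations, and all function names are hypothetical, and the hard-coded critical values of t only cover this default design.

```python
import random
import statistics

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
# Hard-coded (no SciPy) for the default design only:
# 10 people x 5 hairs per group -> 100 hairs (df = 98) or 20 people (df = 18).
T_CRIT = {18: 2.101, 98: 1.984}


def pooled_t(x, y):
    """Student's two-sample t statistic (pooled variance) and its df."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    se = (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se, df


def simulate_study(rng, n_people=10, n_hairs=5, mean_length=30.0,
                   sd_between=5.0, sd_within=5.0):
    """Nested data {group: [[hairs of person 1], [hairs of person 2], ...]}
    with no true difference between the eye-colour groups."""
    data = {}
    for group in ("blue", "green"):
        people = []
        for _ in range(n_people):
            person_mean = rng.gauss(mean_length, sd_between)
            people.append([rng.gauss(person_mean, sd_within) for _ in range(n_hairs)])
        data[group] = people
    return data


def rejects(x, y):
    """True if the two-sided 5% t-test rejects equal means."""
    t, df = pooled_t(x, y)
    return abs(t) > T_CRIT[df]


def naive_test(data):
    """Pseudoreplicated analysis: every hair treated as an independent replicate."""
    blue = [h for person in data["blue"] for h in person]
    green = [h for person in data["green"] for h in person]
    return rejects(blue, green)


def person_means_test(data):
    """One summary value (the mean) per person, so n = number of people."""
    blue = [statistics.mean(person) for person in data["blue"]]
    green = [statistics.mean(person) for person in data["green"]]
    return rejects(blue, green)


def rejection_rates(n_sims=2000, seed=2):
    """False-positive rates of both analyses on the same simulated datasets."""
    rng = random.Random(seed)
    naive = per_person = 0
    for _ in range(n_sims):
        data = simulate_study(rng)
        naive += naive_test(data)
        per_person += person_means_test(data)
    return naive / n_sims, per_person / n_sims


if __name__ == "__main__":
    naive, per_person = rejection_rates()
    print(f"hairs as replicates: {naive:.3f}  person means: {per_person:.3f}")
```

With these made-up settings the naive analysis rejects the true null hypothesis far more often than 5% of the time, while the person-means analysis stays close to the nominal rate, because its sample size is the number of independent experimental units.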



References


Hurlbert SH (1984). Pseudoreplication and the design of ecological field experiments. Ecol Monogr 54(2):187–211.

Lazic SE (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neurosci 11:5.

Schank JC, Koehnle TJ (2009). Pseudoreplication is a pseudoproblem. J Comp Psychol 123(4):421–433.

Shipley B (2000). Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference. Cambridge University Press, Cambridge.