Statistics for Experimental Biologists

Why biologists should use R

There are numerous statistical software packages available, each with its own advantages and disadvantages. People tend to use the one they were first taught, the one everyone else in the lab uses, or worse, MS Excel! R is a powerful open-source statistical package that is used by many statisticians and is becoming increasingly popular in other fields. The main drawback of R is that it takes some time to learn (it is euphemistically described as "expert friendly"). This may put some people off, but if you are a younger scientist, it is certainly worth spending the time to learn professional statistics software, because you will be using it for the rest of your career.

Advantages of R:

Free
One of the biggest benefits of R is that it is completely free and can be downloaded from www.r-project.org. R runs on all major operating systems (Windows, Mac, Linux, Unix, etc.), and there are many free manuals, reference sheets, mailing lists, blogs, and a wiki to help you get started.
Excellent graphics
R can produce an enormous variety of production-quality graphical output in all of the standard formats (PDF, PS, EPS, JPEG, PNG, TIFF, etc.). It is also possible to create specialised graphs from scratch.
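As a minimal sketch of writing the same graph to more than one format, the example below uses the base R graphics devices pdf() and png(). The data are simulated purely for illustration, and the file names and the helper draw_plot() are assumptions made for this example.

```r
# Simulated dose-response data, for illustration only (not real measurements)
set.seed(1)
dose <- rep(c(0, 5, 10), each = 10)
response <- 2 + 0.3 * dose + rnorm(length(dose))

# Hypothetical helper so that every device receives the same plot
draw_plot <- function() {
  plot(dose, response, xlab = "Dose", ylab = "Response", pch = 16)
}

# Output files are written to a temporary directory
out_pdf <- file.path(tempdir(), "dose_response.pdf")
out_png <- file.path(tempdir(), "dose_response.png")

# Vector output (PDF)
pdf(out_pdf, width = 5, height = 4)
draw_plot()
dev.off()

# Bitmap output (PNG), only if this R installation supports it
if (capabilities("png")) {
  png(out_png, width = 500, height = 400)
  draw_plot()
  dev.off()
}
```

Each device is opened, drawn to, and closed with dev.off(), so the same plotting code can be reused for any of the supported formats.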
Can import files from other statistical programs
If you currently use other statistical software such as Minitab, S, SAS, SPSS, or Stata, data files from these programs can be imported into R, allowing for an easy transition from your current favourite program. In addition, other file types such as tab-delimited text, CSV, and even Excel spreadsheets can easily be imported.
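A short sketch of importing CSV and tab-delimited text with base R is shown below. The file contents and column names are made up for illustration, and the files are created in a temporary location so that the example is self-contained. For files from other statistical programs, the foreign package that ships with R provides functions such as read.spss() and read.dta().

```r
# Create a small CSV file (made-up values, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("animal,group,weight_g",
             "1,control,21.4",
             "2,control,22.1",
             "3,treated,19.8",
             "4,treated,20.3"),
           csv_path)

# Import the CSV file into a data frame
mice <- read.csv(csv_path)
str(mice)

# Write the same data as tab-delimited text, then import it again
tab_path <- tempfile(fileext = ".txt")
write.table(mice, tab_path, sep = "\t", row.names = FALSE, quote = FALSE)
mice_tab <- read.delim(tab_path)
```

Both read.csv() and read.delim() return an ordinary data frame, so the rest of an analysis does not depend on the format the data arrived in.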
Regular releases
A new version is released about once a year, with patch releases in between, which means that any bugs are quickly fixed.
Large user community
R has a respected group of core developers who maintain and upgrade the basic R installation, but anyone can contribute add-on packages which provide additional functionality, such as specialised statistical tests or graphical functions. There are thousands of such packages available. In addition, there are online user groups where you can post questions and others will answer them (but please search the archive to make sure someone has not already asked the same question and had it answered).
GUIs available
R has a command line interface, which some new users may find a bit intimidating. However, if you prefer a more intuitive point-and-click graphical user interface (GUI), these are available as well. R Commander (the Rcmdr package) is highly recommended, because once you've selected an analysis from the menu, it prints the commands so that you can see and learn how to use them. In addition, R Commander's interface is similar to other statistical programs. More advanced users, or those planning on writing lengthy functions, can also run R from Emacs.
Learn to program
R is not just a statistical package, but a programming environment, allowing users complete control over all aspects of data analysis. Learning R means you learn the basics of computer programming, which is something that every scientist should know. Writing programs or scripts allows you to automate repetitive tasks, which will reduce errors and free up time for more important things.
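To illustrate automating a repetitive task, the sketch below applies one function to several data files instead of processing each by hand. The file names, the column names (well, od), and the helper summarise_plate() are all hypothetical, and the numbers are made up for illustration.

```r
# Create three small example files (made-up numbers, for illustration only)
data_dir <- file.path(tempdir(), "plates")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  plate <- data.frame(well = 1:4, od = c(0.1, 0.2, 0.3, 0.4) * i)
  write.csv(plate, file.path(data_dir, paste0("plate", i, ".csv")),
            row.names = FALSE)
}

# Hypothetical helper: read one file and return a one-row summary
summarise_plate <- function(path) {
  plate <- read.csv(path)
  data.frame(file = basename(path),
             n = nrow(plate),
             mean_od = mean(plate$od))
}

# Apply the same steps to every matching file and combine the results
files <- list.files(data_dir, pattern = "^plate[0-9]+\\.csv$",
                    full.names = TRUE)
results <- do.call(rbind, lapply(files, summarise_plate))
print(results)
```

Because the steps are written down once in a function, adding a fourth or a fortieth file requires no extra manual work, and every file is guaranteed to be processed in the same way.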
Speak the language of your bioinformatics/statistics colleagues
Modern biomedical research often involves collaboration between individuals with different scientific backgrounds and expertise. Knowing the basics of a tool that is commonly used by the bioinformatics, computational biology, and statistics communities will allow you to communicate better with your collaborators and to share information and data more easily.
Respect
All the cool kids are doing it.