t-test
Problem
You want to test whether two samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.
Solution
Sample data
We will use the built-in sleep data set.
    sleep
    #  extra group ID
    #    0.7     1  1
    #   -1.6     1  2
    #   -0.2     1  3
    #   -1.2     1  4
    #   -0.1     1  5
    #    3.4     1  6
    #    3.7     1  7
    #    0.8     1  8
    #    0.0     1  9
    #    2.0     1 10
    #    1.9     2  1
    #    0.8     2  2
    #    1.1     2  3
    #    0.1     2  4
    #   -0.1     2  5
    #    4.4     2  6
    #    5.5     2  7
    #    1.6     2  8
    #    4.6     2  9
    #    3.4     2 10
Sometimes it is useful to work with wide-formatted data, so we'll make a wide version of the sleep data.
    sleep.wide <- data.frame(ID=1:10, group1=sleep$extra[1:10], group2=sleep$extra[11:20])
    #  ID group1 group2
    #   1    0.7    1.9
    #   2   -1.6    0.8
    #   3   -0.2    1.1
    #   4   -1.2    0.1
    #   5   -0.1   -0.1
    #   6    3.4    4.4
    #   7    3.7    5.5
    #   8    0.8    1.6
    #   9    0.0    4.6
    #  10    2.0    3.4
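The wide table above is built by slicing rows 1:10 and 11:20, which only works because sleep is sorted by group. As a rough sketch of an alternative that matches rows on ID instead of position, base R's reshape() can do the conversion. The name sleep.wide2 is made up for this example, and the measurement columns come out as extra.1 and extra.2 rather than group1 and group2.

```r
# Convert long to wide by matching on ID rather than on row position.
# sleep.wide2 is a hypothetical name used only for this illustration.
sleep.wide2 <- reshape(sleep,
                       idvar     = "ID",     # one output row per subject
                       timevar   = "group",  # one output column per group
                       direction = "wide")

# The measurement columns are named after the original column plus the group
names(sleep.wide2)
```

Because the match is done on ID, this approach should still give correct pairs if the rows of the long data are not sorted by group.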
Comparing two groups: independent two-sample t-test
Suppose the two groups are independently sampled; we'll ignore the ID variable in this section.
The t.test function can operate on long-structured data like sleep, where one column (extra) records the measurement and the other column (group) specifies the grouping; or it can operate on two separate vectors.
    # Welch t-test
    # These two commands have the same effect.
    t.test(extra ~ group, sleep)
    t.test(sleep.wide$group1, sleep.wide$group2)
    #  Welch Two Sample t-test
    #
    # data:  extra by group
    # t = -1.8608, df = 17.776, p-value = 0.07939
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -3.3654832  0.2054832
    # sample estimates:
    # mean in group 1 mean in group 2
    #            0.75            2.33
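To make the Welch statistic and its fractional degrees of freedom less of a black box, here is a sketch that recomputes them by hand using the standard Welch-Satterthwaite formula. The variable names (x, y, se2_x, and so on) are chosen for this illustration only.

```r
# Split the measurements by group
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```

These values should agree with the t, df, and p-value that t.test reports for the same data.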
By default, t.test does not assume equal variances; instead of Student's t-test, it uses the Welch t-test. Note that in the Welch t-test, df=17.776 rather than a whole number, because of the adjustment for unequal variances. To use Student's t-test, set var.equal=TRUE.
    # Student t-test
    # These two commands have the same effect.
    t.test(extra ~ group, sleep, var.equal=TRUE)
    t.test(sleep.wide$group1, sleep.wide$group2, var.equal=TRUE)
    #  Two Sample t-test
    #
    # data:  extra by group
    # t = -1.8608, df = 18, p-value = 0.07919
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -3.363874  0.203874
    # sample estimates:
    # mean in group 1 mean in group 2
    #            0.75            2.33
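For comparison, Student's version pools the two sample variances and uses n1 + n2 - 2 degrees of freedom, which is where df = 18 comes from. The following is a minimal sketch of that calculation; the variable names are illustrative only.

```r
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Student t statistic and its (integer) degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_student <- n1 + n2 - 2

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df_student)

c(t = t_stat, df = df_student, p = p_value)
```

With equal group sizes, as here, the t statistic is the same as in the Welch test; only the degrees of freedom, and therefore the p-value and confidence interval, differ.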
Paired-sample t-test
You can also compare paired data, using a paired-sample t-test. You might have observations before and after a treatment, or of two matched subjects with different treatments.
Again, the t.test function can be used on a data frame with a grouping variable, or on two vectors. It relies on the relative position to determine the pairing. If you are using long-format data with a grouping variable, the first row with group=1 is paired with the first row with group=2. It is important to make sure that the data is sorted and that there are no missing observations; otherwise the pairing can be thrown off.
    # These two ways of doing it have the same effect.
    # 1. Use long-format data with grouping variable
    # 2. Use two vectors, in this case from wide-format data frame
    # Note: R 4.4.0 and later reject paired=TRUE in the formula interface,
    # so on those versions use the two-vector form (2).
    t.test(extra ~ group, sleep, paired=TRUE)
    t.test(sleep.wide$group1, sleep.wide$group2, paired=TRUE)
    #  Paired t-test
    #
    # data:  extra by group
    # t = -4.0621, df = 9, p-value = 0.002833
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -2.4598858 -0.7001142
    # sample estimates:
    # mean of the differences
    #                   -1.58
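Since the pairing depends on row position, one defensive option is to build the pairs explicitly by ID before running the test. The sketch below shuffles the rows of sleep to mimic unsorted data and then re-aligns them with merge(). The names shuffled, g1, g2, and pairs are hypothetical, and subjects missing from either group would simply be dropped by the merge.

```r
# Shuffle the rows to mimic long-format data that is not sorted
set.seed(1)
shuffled <- sleep[sample(nrow(sleep)), ]

# Split by group, keeping the subject ID alongside each measurement
g1 <- shuffled[shuffled$group == 1, c("ID", "extra")]
g2 <- shuffled[shuffled$group == 2, c("ID", "extra")]

# Align the two groups on ID; only subjects present in both are kept
pairs <- merge(g1, g2, by = "ID", suffixes = c(".g1", ".g2"))

# Paired test on the explicitly aligned vectors
res <- t.test(pairs$extra.g1, pairs$extra.g2, paired = TRUE)
res
```

Because the vectors are aligned on ID, the result does not depend on the row order of the input.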
The paired t-test is equivalent to testing whether the difference between each pair of observations has a population mean of 0. (See below for comparing a single group to a population mean.)
    # var.equal has no effect in a one-sample test; it is kept here only for symmetry.
    t.test(sleep.wide$group1 - sleep.wide$group2, mu=0, var.equal=TRUE)
    #  One Sample t-test
    #
    # data:  sleep.wide$group1 - sleep.wide$group2
    # t = -4.0621, df = 9, p-value = 0.002833
    # alternative hypothesis: true mean is not equal to 0
    # 95 percent confidence interval:
    #  -2.4598858 -0.7001142
    # sample estimates:
    # mean of x
    #     -1.58
Comparing a group against an expected population mean: one-sample t-test
Suppose that you want to test whether the data in column extra is drawn from a population whose true mean is 0. In this case, the group and ID columns are ignored.
    t.test(sleep$extra, mu=0)
    #  One Sample t-test
    #
    # data:  sleep$extra
    # t = 3.413, df = 19, p-value = 0.002918
    # alternative hypothesis: true mean is not equal to 0
    # 95 percent confidence interval:
    #  0.5955845 2.4844155
    # sample estimates:
    # mean of x
    #      1.54
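The printed report is convenient for reading, but the values can also be pulled out of the returned object for use in later code. t.test returns a list of class "htest"; the sketch below extracts a few of its components. The variable names res and res_greater are illustrative only.

```r
res <- t.test(sleep$extra, mu = 0)

res$statistic   # t statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # confidence interval (the level is stored as an attribute)
res$estimate    # sample mean

# A one-sided test can be requested with the alternative argument
res_greater <- t.test(sleep$extra, mu = 0, alternative = "greater")
res_greater$p.value
```

The same components are available for the two-sample and paired tests shown earlier.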
To visualize the groups, see ../../Graphs/Plotting distributions (ggplot2), ../../Graphs/Histogram and density plot, and ../../Graphs/Box plot.