par(mfrow = c(1, 2))   # show two diagnostic plots side by side
plot(model, which = 1) # residuals vs fitted values
plot(model, which = 2) # normal Q-Q plot of the residuals
16 ANOVA
The model for a factorial experiment with two treatment factors was:
\[ Y_{ijk} = \mu + A_i + C_j + (AC)_{ij} + e_{ijk} \]
If we move \(\mu\) to the left-hand side of the equation, we get:
\[ Y_{ijk} - \mu = A_i + C_j + (AC)_{ij} + e_{ijk} \]
Now, each of the terms on the RHS is a deviation from a mean.
- The main effects are deviations from the overall mean \(\mu\),
- The interaction effects are deviations from \(\mu + A_i + C_j\),
- The error terms are deviations from the treatment means \(\mu + A_i + C_j + (AC)_{ij}\).
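Replacing each term by its sample estimate, we obtain the corresponding identity for the observed deviations:
\[ Y_{ijk} - \bar{Y}_{...} = (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{.j.} - \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}) + (Y_{ijk} - \bar{Y}_{ij.}) \]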
We can square and sum the corresponding observed deviations and obtain sums of squares. For a balanced factorial experiment, the total sum of squares on the LHS can be split into four parts, corresponding to:
- Main effects of factor A,
- Main effects of factor C,
- Interaction effects between A and C,
- Error.
\[ SS_{total} = SS_A + SS_C + SS_{AC} + SS_E \]
The degrees of freedom for these sums of squares are:
\[ abn - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1) + ab(n - 1) \]
where \(a\) and \(b\) are the numbers of levels of factors A and C, and \(n\) is the number of replicates per treatment combination. The degrees of freedom on the right-hand side add up to the total degrees of freedom. Once again, we summarise all this in a table.
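To see the decomposition in action, here is a minimal R sketch; the data frame `d`, with factors `A` and `C` and numeric response `Y`, is hypothetical, and the design is assumed to be balanced.
ybar  <- mean(d$Y)                          # grand mean
yA    <- ave(d$Y, d$A)                      # factor A level means, per observation
yC    <- ave(d$Y, d$C)                      # factor C level means, per observation
yAC   <- ave(d$Y, d$A, d$C)                 # treatment (cell) means, per observation
SS_A  <- sum((yA - ybar)^2)                 # main effects of A
SS_C  <- sum((yC - ybar)^2)                 # main effects of C
SS_AC <- sum((yAC - yA - yC + ybar)^2)      # interaction
SS_E  <- sum((d$Y - yAC)^2)                 # error
SS_T  <- sum((d$Y - ybar)^2)                # total
all.equal(SS_T, SS_A + SS_C + SS_AC + SS_E) # TRUE for a balanced design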
ANOVA Table
The following table summarizes the partitioning of variation:
Source | SS | df | MS | F |
---|---|---|---|---|
A Main Effects | \(SS_A = nb \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2\) | \((a - 1)\) | \(MS_A = \frac{SS_A}{a - 1}\) | \(\frac{MS_A}{MS_E}\) |
C Main Effects | \(SS_C = na \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2\) | \((b - 1)\) | \(MS_C = \frac{SS_C}{b - 1}\) | \(\frac{MS_C}{MS_E}\) |
AC Interactions | \(SS_{AC} = n \sum_{ij} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2\) | \((a - 1)(b - 1)\) | \(MS_{AC} = \frac{SS_{AC}}{(a - 1)(b - 1)}\) | \(\frac{MS_{AC}}{MS_E}\) |
Error | \(SS_E = \sum_{ijk} (Y_{ijk} - \bar{Y}_{ij.})^2\) | \(ab(n - 1)\) | \(MS_E = \frac{SS_E}{ab(n - 1)}\) | - |
Total | \(SS_{total} = \sum_{ijk} (Y_{ijk} - \bar{Y}_{...})^2\) | \(abn - 1\) | - | - |
There are three F-tests in this ANOVA table, one for each of the following null hypotheses:
- \(H_{AC} : (AC)_{ij} = 0\) for all \(i\) and \(j\) (Factors A and C do not interact)
- \(H_A : A_i = 0, \quad i = 1, \dots, a\) (Factor A has no main effects)
- \(H_C : C_j = 0, \quad j = 1, \dots, b\) (Factor C has no main effects)
The alternative hypothesis is, in each case, that at least one of the parameters considered is non-zero.
While discussing interactions, we saw that sometimes, with strong interaction effects, the main effects of a factor may disappear (be close to zero). But this does not mean that the factor has no effect. On the contrary, it has an effect on the response; the effects just differ over the levels of the other factor and may average out.
Therefore, we usually start by testing the interaction effects. If there is evidence for the presence of interactions, we have to examine the main effects with this in mind, i.e., be careful with the interpretation of the main effects. Some people say that it becomes meaningless to test for main effects if there is evidence of interactions. However, this depends on what we want to know. The main effects still tell us whether or not the average response changes with changing levels of the factor.
The F-ratio always has the mean square for error in the denominator. As before, it is a ratio of two variance estimates, and in each case, it can be seen as a signal-to-noise ratio: how large are the effects relative to the experimental error variance?
16.1 Back to the example
Before we inspect the ANOVA table for the working example, we need to check the assumptions about the errors after model fitting. We do this by inspecting the residuals.
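For reference, the `model` object analysed in this section could have been fitted along these lines; the response name `Score` and the data frame name `lectures` are hypothetical, but the factor names match the output below.
model <- aov(Score ~ Speed * Content.Type, data = lectures) # hypothetical names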
There are no clear violations: in the first plot, the residuals appear to be centered around zero and the spread is reasonably equal across groups. The second plot is a Q-Q plot of the residuals, which shows nothing worrisome. Remember that we can also plot a histogram of the residuals to check normality.
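For instance, along the lines of:
hist(residuals(model), main = "Histogram of residuals", xlab = "Residual")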
For the independence assumption, we construct the dot chart once again but with the residuals.
dotchart(model$residuals) # note the different way of extracting residuals!
The y-axis is messy, but we can ignore that: it shows the index of each of the 200 observations, which is why the labels overlap so much. The residuals look homogeneous: there are no systematic patterns or trends in the plot.
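If the overlapping labels bother you, one (purely cosmetic) option is to drop the observation names before plotting:
dotchart(unname(model$residuals)) # same plot, without the index labels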
Let’s see what the ANOVA table looks like for our working example.
summary(model)
Df Sum Sq Mean Sq F value Pr(>F)
Speed 1 933 933.1 4.201 0.041724 *
Content.Type 1 2708 2708.5 12.195 0.000592 ***
Speed:Content.Type 1 177 176.7 0.796 0.373485
Residuals 196 43532 222.1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Verify that the degrees of freedom are what you expected: with \(a = b = 2\) levels and \(n = 50\) replicates per treatment, they are \(1\), \(1\), \(1\), and \(ab(n - 1) = 196\). First, we look at the interaction. The F-value is quite small, which leads to a large p-value of 0.37. This means that we have no evidence against the null hypothesis that the factors do not interact. There is some evidence for a main effect of Speed, but, as indicated by the much smaller p-value, there is much stronger evidence for a main effect of lecture modality (Content.Type).
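As a quick sanity check, we can reconstruct the F-test for Speed by hand from the entries in the table above.
F_speed <- 933.1 / 222.1                             # MS_Speed / MS_E
pf(F_speed, df1 = 1, df2 = 196, lower.tail = FALSE)  # p-value, approx. 0.042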