15 Model for Factorial Experiments
Assume we have a continuous response variable for which a normal distribution is reasonable, no blocking factors, and a factorial experiment with two treatment factors. Then the following model is plausible:
\[ Y_{ijk} = \mu + A_i + C_j + (AC)_{ij} + e_{ijk} \]
where \(Y_{ijk}\) is the \(k^{th}\) observation on the \((ij)^{th}\) treatment combination and
\[ \begin{aligned} \mu &= \text{general/overall mean} \\ A_i &= \text{main effect of the } i^{th} \text{ level of A} \\ C_j &= \text{main effect of the } j^{th} \text{ level of C} \\ (AC)_{ij} &= \text{interaction between the }i^{th}\text{ level of A and the }j^{th}\text{ level of C} \\ e_{ijk} &= \text{random error with } e_{ijk} \sim N(0, \sigma^2) \\ \end{aligned} \]
Note that \((AC)\) is a single symbol; it does not mean the interaction is the product of the two main effects.
\(\mu + A_i + C_j + (AC)_{ij}\) is the structural part of the model, which describes the mean or expected response for treatment \(ij\), i.e. at the \(i^{th}\) level of factor A and the \(j^{th}\) level of factor C. Depending on the estimates of the effects, each treatment will have a different estimated mean response. For every level of A there is a main effect, the \(A_i\)'s. For every level of factor C there is a main effect, \(C_j\). For every combination of A and C levels there is an interaction effect, \((AC)_{ij}\). So the model implies that each treatment mean is made up of an overall mean, two main effects and an interaction term.
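To make this concrete, here is a small sketch in R with made-up effect values (not taken from any real experiment) that builds the table of treatment means implied by the model for a 2 x 3 factorial:

# Hypothetical effect values for a 2 x 3 factorial (2 levels of A, 3 levels of C)
mu <- 50                       # overall mean
A  <- c( 4, -4)                # main effects of A (sum to zero)
C  <- c(-3,  0,  3)            # main effects of C (sum to zero)
AC <- rbind(c( 1, -2,  1),     # interaction effects (rows and columns sum to zero)
            c(-1,  2, -1))

# Expected response for treatment ij: mu + A_i + C_j + (AC)_ij
cell.means <- outer(A, C, "+") + mu + AC
cell.means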
15.1 Replication
Replication is crucial in any experiment! Without replication, we cannot estimate the experimental error variance (\(\sigma^2\)), which is essential for assessing variability and conducting hypothesis tests.
If we only have one observation per treatment, that observation becomes the treatment mean. Since we cannot compute deviations from the treatment mean, there is no estimate of error variance. This means that while we can technically estimate the model parameters, the model itself is practically useless—we cannot perform hypothesis tests without an estimate of error variance. And if we can’t test anything, what’s the point?
In factorial experiments, the situation gets even worse when we don’t replicate treatments. Specifically, we can’t calculate the interaction effect. In general, the interaction effect is calculated as the difference between the treatment mean and the sum of the main effects and the overall mean:
\[(AC)_{ij} = \bar{Y}_{ij} - (\mu + A_i + C_j) \]
Now consider the first plot in the series below. There is only one observation in the hypothetical treatment \(i = 1\), \(j = 3\). That means the treatment mean \(\bar{Y}_{13}\) is just that single observation, and we cannot calculate any deviations from it, which is how we would normally estimate the error variance. An error term is always calculated as the deviation of an observation from the next closest mean, and with only one observation per treatment the next closest mean is the sum of the main effects: \(\mu + A_1 + C_3\). But wait, that means the error term is also the interaction term, since the treatment mean and the observation are the same? Yup! Now you see the problem. With no replication, the error term and the interaction effect are confounded.
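A quick way to see this in R is to simulate a tiny unreplicated two-factor experiment (the data below are made up purely for illustration) and look at the residual degrees of freedom:

set.seed(1)
# One observation per treatment combination: a 2 x 2 factorial with no replication
toy <- expand.grid(A = factor(1:2), C = factor(1:2))
toy$Y <- rnorm(4, mean = 50, sd = 5)

fit <- aov(Y ~ A * C, data = toy)
df.residual(fit)   # 0: no degrees of freedom left for estimating sigma^2,
                   # so no F-tests are possible and the interaction cannot be separated from error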
Have a look at the second plot. Now we have five observations within the treatment. We can calculate the 5 error terms:
\[r_{13k} = Y_{13k} - \bar{Y}_{13}\]
and we can calculate the interaction effect:
\[ (\hat{AC})_{13} = \bar{Y}_{13} - (\hat{\mu} + \hat{A}_1 + \hat{C}_3)\]
They are no longer the same thing; they are separable! The interaction effect for this treatment is quite big if you look at the difference visually. For the last plot, there are also five observations, but now the deviation of the treatment mean from the sum of the main effects is almost zero; it is just due to random variation. The interaction effect is too small to detect statistically.
15.2 Parameter estimation
The maximum likelihood/least squares estimates are found by minimizing the error or residual sum of squares:
\[ S = \sum_{ijk} \left(Y_{ijk} - \mu - A_i - C_j - (AC)_{ij}\right)^2 \]
Differentiating with respect to each parameter, setting the derivatives to zero and solving (subject to the usual sum-to-zero constraints) gives the least squares estimates:
\[ \hat{\mu} = \bar{Y}_{...} \]
\[ \hat{A}_i = \bar{Y}_{i..} - \bar{Y}_{...}, \quad i = 1, \dots, a \]
\[ \hat{C}_j = \bar{Y}_{.j.} - \bar{Y}_{...}, \quad j = 1, \dots, c \]
\[ (\hat{AC})_{ij} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}, \quad i = 1, \dots, a, \quad j = 1, \dots,c \]
The main effects are as before, except that they now refer to differences between row or column means (Figure 6.3) and the overall mean. The interaction effects are estimated as the differences between the treatment means and the sum of the estimated overall mean and main effects.
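As a sanity check, the sketch below computes these estimates directly from the grand, row, column and cell means of a simulated balanced data set and compares them with what aov() and model.tables() report; the data are made up purely for illustration:

set.seed(42)
# Simulated 2 x 2 factorial with 10 replicates per treatment (illustration only)
dat <- expand.grid(rep = 1:10, A = factor(1:2), C = factor(1:2))
dat$Y <- rnorm(nrow(dat), mean = 50, sd = 5)

grand  <- mean(dat$Y)                              # overall mean
A.hat  <- tapply(dat$Y, dat$A, mean) - grand       # row means minus overall mean
C.hat  <- tapply(dat$Y, dat$C, mean) - grand       # column means minus overall mean
cells  <- tapply(dat$Y, list(dat$A, dat$C), mean)  # treatment (cell) means
AC.hat <- cells - outer(A.hat, C.hat, "+") - grand # interaction estimates

AC.hat
model.tables(aov(Y ~ A * C, data = dat), type = "effects")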
15.3 Back to the example
Let’s fit this model to the data of the playback speed and lecture modality experiment. This time, we have access to the actual data collected! Let’s explore the data and check our assumptions. The assumptions are the same as in a one-way ANOVA: normality of errors, equal population variances, independent errors and no outliers.
As always we start by reading in our data, checking that it has been read in correctly and looking at some descriptive statistics.
data <- read.csv("Datasets/Exp2DataPlayback.csv")
head(data)
Participant.ID Condition Speed Content.Type Accuracy
1 945445adf5 1x Audio-Visual 2 Audio-Visual 42
2 23afb88ef3 1x Audio 1 Audio-Only 56
3 1bc24e0480 1x Audio 1 Audio-Only 62
4 4fbdbd41a5 1x Audio 1 Audio-Only 44
5 442adf227a 1x Audio 1 Audio-Only 56
6 3ca9d09e2e 1x Audio-Visual 2 Audio-Visual 48
The data set contains 5 columns:
Participant.ID – a unique identification code for each participant in the study.
Condition – the experimental condition or treatment, which combines playback speed (1x or 2x) and content type (Audio-Only or Audio-Visual).
Speed – a numeric column that explicitly represents the playback speed, with 1 for normal speed (1x) and 2 for double speed (2x).
Content.Type – specifies whether the participant received Audio-Only or Audio-Visual content.
Accuracy – the participant’s performance score, representing comprehension accuracy.
summary(data)
Participant.ID Condition Speed Content.Type
Length:200 Length:200 Min. :1.0 Length:200
Class :character Class :character 1st Qu.:1.0 Class :character
Mode :character Mode :character Median :1.5 Mode :character
Mean :1.5
3rd Qu.:2.0
Max. :2.0
Accuracy
Min. :14.00
1st Qu.:42.00
Median :54.00
Mean :52.78
3rd Qu.:64.00
Max. :90.00
From the summary, you should notice a few things:
– All the columns are read in as character values except Speed and Accuracy. We need to convert the relevant columns to factors if we want to use them in our analysis.
– Speed and Accuracy are read in as numeric values. This makes sense for Accuracy but not for Speed! Speed is a categorical variable with levels 1x and 2x, so we need to correct this.
data$Condition <- factor(data$Condition)
data$Content.Type <- factor(data$Content.Type)
data$Speed <- factor(data$Speed)
summary(data)
Participant.ID Condition Speed Content.Type
Length:200 1x Audio :50 1:100 Audio-Only :100
Class :character 1x Audio-Visual:50 2:100 Audio-Visual:100
Mode :character 2x Audio :50
2x Audio-Visual:50
Accuracy
Min. :14.00
1st Qu.:42.00
Median :54.00
Mean :52.78
3rd Qu.:64.00
Max. :90.00
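Before moving on, a quick cross-tabulation of the two factors confirms the replication per treatment combination (a small check using the factors we just created):

# Number of observations per Speed x Content.Type combination
table(data$Speed, data$Content.Type)   # 50 observations in every cell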
Great, now we can see that each treatment was replicated 50 times as we expected. To check our assumptions we start as always by plotting the response against treatments.
boxplot(Accuracy ~ Condition, data = data,
ylab = "", main = "", las = 1)
# we could also have specified the first argument as Accuracy ~ Content.Type * Speed
stripchart(Accuracy ~ Condition, data = data, vertical = TRUE, add = TRUE, method = "jitter")
There are no clear signs of deviation from normality, the box-plots look relatively symmetric. We could plot Q-Q plots for each treatment as well. Let’s do that for two of the treatments.
par(mfrow=c(1,2)) # splitting the plotting window into 1 row with 2 columns
qqnorm(data$Accuracy[data$Condition == "1x Audio"], pch = 4, col = "blue", main = "1x Audio")
qqline(data$Accuracy[data$Condition == "1x Audio"], col = "red")
qqnorm(data$Accuracy[data$Condition == "1x Audio-Visual"], pch = 4, col = "blue", main = "1x Audio-Visual")
qqline(data$Accuracy[data$Condition == "1x Audio-Visual"], col = "red")
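If you want to check all four treatments at once, a short loop over the levels of Condition does the job (a sketch using the same data as above):

# One Q-Q plot per treatment, looping over the levels of Condition
par(mfrow = c(2, 2))
for (trt in levels(data$Condition)) {
  y <- data$Accuracy[data$Condition == trt]
  qqnorm(y, pch = 4, col = "blue", main = trt)
  qqline(y, col = "red")
}
par(mfrow = c(1, 1))   # reset the plotting layout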
No worrisome patterns! Next, the box-plots also suggest that there are no outliers, and there are no clear indications that the assumption of equal population variances is unreasonable. Let’s have a look at the standard deviations.
sort(tapply(data$Accuracy, data$Condition, sd))
1x Audio-Visual 1x Audio 2x Audio 2x Audio-Visual
13.71690 14.01084 14.93064 16.76386
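Rather than reading the ratio off by eye, you can compute it directly from the same table of standard deviations:

sds <- tapply(data$Accuracy, data$Condition, sd)
max(sds) / min(sds)   # roughly 1.22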
The ratio of the largest to the smallest is roughly 1.22, which is much smaller than five. Lastly, we need to check the assumption of independence. We start by assuming that the order in which the data appear in the data set is the order in which the measurements were taken, and we construct a Cleveland dot chart.
dotchart(data$Accuracy, ylab = "Order of observation", xlab = "Post treatment test score")
No clear patterns in the measurements, so no reason to suspect any dependence between successive measurements. The students were randomly assigned to each group and there are no other reasons to believe that independence was violated based on the description of the experiment.
With all the assumptions checked, we can move onto fitting the linear model to our data and inspecting the model estimates. Here is the model equation:
\[ \text{Accuracy}_{ijk} = \mu + \text{Speed}_i + \text{Content.Type}_j + \text{(Speed:Content.Type)}_{ij} + e_{ijk} \]
where,
\[ \begin{aligned} i &= 1,2, \quad j = 1,2, \quad k = 1, \dots, 50\\ e_{ijk} &= \text{random error with } e_{ijk} \sim N(0, \sigma^2) \\ \end{aligned} \]
In R, we fit the model like this:
model <- aov(Accuracy ~ Speed + Content.Type + Speed:Content.Type, data = data)
model.tables(model, type = "means", se = TRUE)
Tables of means
Grand mean
52.78
Speed
Speed
1 2
54.94 50.62
Content.Type
Content.Type
Audio-Only Audio-Visual
49.10 56.46
Speed:Content.Type
Content.Type
Speed Audio-Only Audio-Visual
1 50.32 59.56
2 47.88 53.36
Standard errors for differences of means
Speed Content.Type Speed:Content.Type
2.108 2.108 2.981
replic. 100 100 50
R allows a convenient shorthand for this type of model. Instead of typing out all three terms, you can shorten the right-hand side of the formula to Speed*Content.Type. The * operator indicates to R that we want main effects and interaction effects. Try it yourself to confirm that you get the same result:
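For example, the following fit is equivalent to the one above (stored here as model2, an arbitrary name, so that the original model object is not overwritten):

# Shorthand: Speed * Content.Type expands to Speed + Content.Type + Speed:Content.Type
model2 <- aov(Accuracy ~ Speed * Content.Type, data = data)
model.tables(model2, type = "means", se = TRUE)   # identical to the output above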
We extract the treatment means as before. The grand mean is shown first. With a factorial treatment structure, we get the mean for each level of the treatment factors as well as the treatment means. In the output above, we see the means for the 1x and 2x speeds, followed by the means for the levels of content type. Lastly, the treatment means are presented in a 2 by 2 matrix: the treatment “1x Audio-Only” had a mean accuracy of 50.32, “2x Audio-Only” had a mean of 47.88, and so on. Finally, the standard errors for differences between means are given.
We can also extract the estimated effects as before.
model.tables(model, type = "effects", se = TRUE)
Tables of effects
Speed
Speed
1 2
2.16 -2.16
Content.Type
Content.Type
Audio-Only Audio-Visual
-3.68 3.68
Speed:Content.Type
Content.Type
Speed Audio-Only Audio-Visual
1 -0.94 0.94
2 0.94 -0.94
Standard errors of effects
Speed Content.Type Speed:Content.Type
1.490 1.490 2.108
replic. 100 100 50
First we get the main effects for Speed and Content.Type. Then we get the interaction effects and standard errors. Let’s check that we understand how these interaction effects are calculated. Remember:
\[(AC)_{ij} = \bar{Y}_{ij} - (\mu + A_i + C_j) \]
So for treatment \(i = 1\) and \(j=1\), the equation becomes:
\[(\hat{AC})_{11} = \bar{Y}_{11} - (\hat{\mu} + \hat{A}_1 + \hat{C}_1) \]
Going by the rows and columns of the matrix returned by R, treatment \(i = 1\), \(j = 1\) is “1x Audio-Only”. Substituting the estimated values:
\[ \begin{aligned} (\hat{AC})_{11} &= 50.32 - (52.78 + 2.16 - 3.68) \\ &= -0.94 \end{aligned} \]
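You can verify this arithmetic directly in R, plugging in the rounded values reported by model.tables() above:

# By hand, using the rounded values from the model.tables() output
cell.mean   <- 50.32    # mean of the "1x Audio-Only" treatment
grand.mean  <- 52.78
speed.eff   <- 2.16     # estimated main effect of 1x speed
content.eff <- -3.68    # estimated main effect of Audio-Only content

cell.mean - (grand.mean + speed.eff + content.eff)   # -0.94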
This matches the value R reports. Now we want to ask: is there evidence for an interaction effect? To answer this, we need to construct the ANOVA table.