So far, we have examined completely randomised designs, where the assignment of experimental units to treatments was completely unrestricted. With complete randomisation, all other variables that might affect the response (the environment that we can never fully control) are, on average, equal in all treatment groups. This allows us to be confident that differences in group means are due to the treatments.
However, there is often important variation in additional variables that we are not directly interested in. If we can group our experimental units with respect to these variables to make them more similar, we achieve a more powerful design. This is the idea of blocking.
If blocks are used effectively, we can separate variability due to treatments, blocks, and errors, reducing unexplained variability. That is, variability between blocks can be estimated and removed from the residual error. Essentially, we compare treatments over more similar experimental units than in a completely randomised design. With reduced error variance, our test becomes more powerful.
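In analysis-of-variance terms, this is a sketch of the usual sums-of-squares decomposition (the notation here is standard but not defined elsewhere in the text):

\[
SS_{\text{Total}} = SS_{\text{Treatment}} + SS_{\text{Block}} + SS_{\text{Error}}.
\]

Whatever variability the blocks soak up is removed from \(SS_{\text{Error}}\), so the estimated error variance is smaller than it would be under complete randomisation whenever the blocks genuinely differ.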
Blocking is also useful when we want to demonstrate that treatment differences hold over a wider range of conditions. For example, in the social media multitasking example, the experiment was conducted on first year students. Strictly speaking, the results then only apply to first year students and extrapolation to students in different years of their degree is limited. Alternatively, we could choose students from first, second and third year (for example) and apply one replicate of each treatment within year. In this case, year of study would be the blocking factor.
More generally, we often want to show that our results hold for different species, age groups, or biological sexes. In such cases, we could use species, age, or sex as blocks. While blocks are typically used to control for variation in variables we are not directly interested in, sometimes these factors may also be of interest in their own right.
When is a factor a treatment, and when is it a block?
A good way to distinguish between them is by asking whether we can manipulate the factor and randomly assign experimental units to its levels.
We generally cannot manipulate the age or sex of an individual, but we can manipulate, for example, the food they receive. So, age and sex are blocking factors, whereas food type is a treatment.
We can manipulate the level of social media multitasking, but we cannot manipulate the year of study of students. So, level of social media multitasking is a treatment, while year of study is a block.
Although we can always estimate differences between blocks, we need to be much more cautious when inferring causality from block-level differences or from any factor that we cannot randomise (as is the case in observational studies).
Suppose we study whether different teaching methods (interactive vs. traditional) affect student performance, with the experiment conducted in both public and private schools.
If private school students perform better, we cannot conclude that school type caused the difference, because of potential confounders such as socioeconomic background or teacher quality.
Even though we control for school type, observed differences may be due to these external factors rather than the school type itself, so we cannot be sure that the observed differences are really due to school type.
Sometimes, however, blocking variables can also be randomised. Suppose a study is testing two medications (A vs. B) for blood pressure, and the experiments are conducted in two labs (Lab 1 and Lab 2).
Patients could have been randomly assigned to labs, but if logistical constraints prevent this, lab is used as a block. Since we only care about medication effects, lab differences are treated as a nuisance variable. The difference of real interest is between the medications: we are not interested in block effects on the response, only in treatment effects. Blocking factors are used to control for known sources of variation that might obscure the treatment effect.
Any variable that might affect the response besides the treatment factor should be considered for blocking. Common blocking factors include:

- location (e.g. city, region, facility or lab),
- time (e.g. day, week or season),
- characteristics of the experimental units (e.g. age, sex, species or year of study), and
- batches of material or equipment.
For example, if we are testing the effectiveness of a new advertising campaign, it would be useful to block by city or region to control for differences in local economies, purchasing behavior, or media consumption. Similarly, if an experiment measures the impact of dynamic pricing on sales, it is good practice to replicate the price changes across multiple days, blocking for daily or weekly variations in consumer spending habits. This way, the time blocks account for these differences rather than inflating the error variance.
Likewise, if we are studying the effect of sports training programs on player performance, and athletes train in different facilities with varying equipment, we could assign a block to each training center to ensure that facility-related differences are accounted for. This prevents training location from being mistaken as a treatment effect, allowing a clearer evaluation of the actual program’s impact.
The key takeaway is that reducing error variance increases the power of the experiment. Thoughtful blocking design helps achieve this by accounting for known sources of variation.
There are a few different types of randomised block designs depending on the availability of experimental units and the size of the blocks. Here we will consider the best-case scenario, where blocks are big enough to contain an equal number of experimental units such that each treatment occurs exactly once within a block. If we have a single treatment factor with \(a\) levels, then we have \(a\) experimental units per block. This design is said to be balanced: each block is the same with respect to treatments. In balanced block designs, the treatment and block effects can be completely separated (they are independent). This greatly simplifies the interpretation of results.
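As a sketch in model notation (the symbols below are standard, but they are my own labels rather than definitions from the text), the additive model for a balanced block design with \(a\) treatments and \(b\) blocks is

\[
y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \dots, a, \quad j = 1, \dots, b,
\]

where \(y_{ij}\) is the response of the unit receiving treatment \(i\) in block \(j\), \(\mu\) is the overall mean, \(\tau_i\) is the treatment effect, \(\beta_j\) is the block effect, and \(\varepsilon_{ij}\) is random error. Because every treatment appears exactly once in every block, the estimates of the \(\tau_i\) do not change when the \(\beta_j\) are added to or dropped from the model.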
As in CRD, randomisation is still a crucial component of the design. The difference is that now \(a\) treatments are assigned randomly to the \(a\) experimental units within a block, i.e. randomisation is not complete over ALL experimental units but restricted within each block. Within each block, the experimental units are equally likely to receive any of the \(a\) treatments. You can see this as CRD within each block!
Let’s see how we could randomise treatments within blocks using R. Imagine we had four treatments (A, B, C and D). We randomise the treatments to the units within one block like this:
```r
units <- 1:4
rbind(sample(units, 4), c("A", "B", "C", "D"))
```

```
     [,1] [,2] [,3] [,4]
[1,] "3"  "4"  "2"  "1"
[2,] "A"  "B"  "C"  "D"
```
The third unit receives treatment A, the fourth receives B, and so on. We then repeat this for every block.
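For example, here is a minimal sketch of repeating the randomisation over several blocks (the number of blocks, the object names and the call to `set.seed()` are illustrative only):

```r
set.seed(2024)                 # only to make this allocation reproducible
treatments <- c("A", "B", "C", "D")
n_blocks <- 10

# Each column is an independent random ordering of the four treatments,
# i.e. one complete randomisation per block
alloc <- replicate(n_blocks, sample(treatments, 4))
rownames(alloc) <- paste0("Unit", 1:4)
colnames(alloc) <- paste0("Block", 1:n_blocks)
alloc
```

Reading down a column gives the treatment received by each unit of that block.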
The Pygmalion effect is a psychological phenomenon that suggests when people are held to high expectations, they tend to perform better. This applies to, for example, teachers and students, managers and employees or coaches and athletes. It is named after a mythological king of Cyprus, Pygmalion, who fell in love with a sculpture he created of his ideal woman.
This article explains the concept nicely and also briefly discusses the study we will use later. https://thedecisionlab.com/biases/the-pygmalion-effect
Many experiments have found results that support this type of self-fulfilling prophecy. Typically, they involve putting someone in charge of a group of people and privately telling the leader that a few of these people are exceptional (although these people were in fact randomly selected). Later, the performance of the group is measured; if the Pygmalion effect is present, the individuals who were marked as exceptional should have performed better.
Back in 1990, one researcher in this field noticed that experiments like these might involve something called interpersonal contrasts. When some individuals are singled out for high expectations, others might feel neglected. This could skew the results: the singled-out individuals might look good simply because the rest performed poorly. The researcher wanted to conduct an experiment that tested the Pygmalion effect without interpersonal contrasts.
They achieved this by applying the high expectations to an entire group rather than to selected individuals within a group. Let’s have a look at the exact experiment!
A platoon is a military unit typically consisting of 30 to 50 soldiers, led by a platoon leader (usually a lieutenant). Several platoons form a company, which is a larger military unit consisting of three to five platoons, commanded by a company leader (usually a captain).
First things first! We need to determine the design so we can use the appropriate analysis. The researcher was interested in determining the effect of the Pygmalion effect on performance. This indicates to us that there is a single treatment factor: whether or not the Pygmalion effect was applied (we’ll call this the Pygmalion Treatment). The response is some measure of performance. The text gives four possible responses: the four areas in which performance was tested. We’ll start with the first one as our response. So far we know:

- Treatment factor: Pygmalion Treatment, with two levels (Pygmalion and Control).
- Response: a performance score (the first of the four areas tested).
The name of the treatment factor is not always obvious. It is usually something that describes the collection of similar treatments created in response to some research hypothesis or what has been manipulated. In biological or ecological studies, it can be quite clear. For example, if we had treatments high, medium and low rainfall, “Rainfall” is the variable we manipulated.
Also, as before, I’ve modified the example slightly. In the original study, there were three platoons per company, with two serving as controls, and one company only had two platoons. We have simplified the design so that it is balanced.
Now, the treatments were randomly assigned to platoons within a company. This gives away two things: (1) the experimental unit is an entire platoon, and (2) treatments were assigned randomly within a company, i.e. a block! The researchers were not interested in the effect of company on performance but merely wanted to account for possible differences between platoons in different companies. So here Company is the blocking variable and the 10 companies are the blocks. Finally, on what was the response measured? The soldiers! They are the observational units. The paper doesn’t state how many soldiers were in each platoon, and it doesn’t really matter, since the scores have to be combined to give one measurement per experimental unit.
Here is the final summary of the design:

- Treatment factor: Pygmalion Treatment (Pygmalion vs. Control)
- Blocking factor: Company, with 10 companies as blocks
- Experimental units: the platoons, two per company
- Observational units: the individual soldiers
- Response: the combined performance score per platoon
You will typically have access to the data as well to help you identify some aspects of the design. There are two ways the data might be represented, in long format:
Company | Treat | Score |
---|---|---|
C1 | Pygmalion | 80.0 |
C1 | Control | 63.2 |
C2 | Pygmalion | 83.9 |
C2 | Control | 63.1 |
C3 | Pygmalion | 68.2 |
C3 | Control | 76.2 |
C4 | Pygmalion | 76.5 |
C4 | Control | 59.5 |
C5 | Pygmalion | 87.8 |
C5 | Control | 73.9 |
C6 | Pygmalion | 89.8 |
C6 | Control | 78.9 |
C7 | Pygmalion | 76.1 |
C7 | Control | 60.6 |
C8 | Pygmalion | 71.5 |
C8 | Control | 67.8 |
C9 | Pygmalion | 69.5 |
C9 | Control | 72.3 |
C10 | Pygmalion | 83.7 |
C10 | Control | 63.7 |
or in wide format:
Company | Pygmalion | Control |
---|---|---|
C1 | 80.0 | 63.2 |
C2 | 83.9 | 63.1 |
C3 | 68.2 | 76.2 |
C4 | 76.5 | 59.5 |
C5 | 87.8 | 73.9 |
C6 | 89.8 | 78.9 |
C7 | 76.1 | 60.6 |
C8 | 71.5 | 67.8 |
C9 | 69.5 | 72.3 |
C10 | 83.7 | 63.7 |
The long format represents each observation as a separate row, with treatments recorded in a single column. This format is useful for statistical modeling and visualization since it keeps data structured for comparisons across treatments. You might have noticed that so far all the data sets we have used when fitting models with `aov` have been in long format. You will struggle to fit the model with a data set in wide format!
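For instance, here is a minimal sketch of fitting the block design with `aov`, assuming the long-format table above has been entered as a data frame called `pygmalion` (the object and column names are just for illustration):

```r
# Long-format data, copied from the table above
pygmalion <- data.frame(
  Company = factor(rep(paste0("C", 1:10), each = 2)),
  Treat   = factor(rep(c("Pygmalion", "Control"), times = 10)),
  Score   = c(80.0, 63.2, 83.9, 63.1, 68.2, 76.2, 76.5, 59.5, 87.8, 73.9,
              89.8, 78.9, 76.1, 60.6, 71.5, 67.8, 69.5, 72.3, 83.7, 63.7)
)

# Both the block factor and the treatment factor enter the formula
fit <- aov(Score ~ Company + Treat, data = pygmalion)
summary(fit)
```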
The wide format organizes data so that each unit (e.g., a company) appears in a single row, with treatments as separate columns. This format is often preferred for paired comparisons and summary tables.
Both formats contain the same information but serve different purposes depending on the type of analysis being performed.
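To illustrate that the two formats are interchangeable, here is a sketch of reshaping wide to long using the tidyr package (assumed to be available; the object names are hypothetical):

```r
library(tidyr)

# Wide format: one row per company, treatments as separate columns
# (only the first three companies shown, for brevity)
pyg_wide <- data.frame(
  Company   = c("C1", "C2", "C3"),
  Pygmalion = c(80.0, 83.9, 68.2),
  Control   = c(63.2, 63.1, 76.2)
)

# Stack the two treatment columns into a Treat/Score pair of columns
pyg_long <- pivot_longer(pyg_wide,
                         cols      = c(Pygmalion, Control),
                         names_to  = "Treat",
                         values_to = "Score")
pyg_long
```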