Statistical Experiments
There are two types of data collection: observational studies and experiments.
Variables in Experiments
- Response variable (or dependent variable) - the outcome of interest, measured on each subject or entry participating in the study
- Explanatory variable (or predictor, or independent variable) - a variable that we think may help to explain the value of the response variable.
Experiment
- Experiment - a study in which the researcher manipulates the explanatory variable to see its effect on the response
- so the researcher creates the data rather than only observing it
Correlation and Causation
- experiments make it possible to show a causal relationship between the variables; an observational study on its own generally shows only an association
Example
- Suppose we run an observational sunscreen study and collect some data
- We see that the more sunscreen people use, the more likely they are to have skin cancer
- Does sunscreen cause the cancer?
- we cannot say so here, because the study is observational
- e.g. in this case we do not observe sun exposure, which is correlated with both sunscreen use and skin cancer (a confounding variable)
- but with a randomized experiment, we can check whether there is a causal relationship
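A small simulation can make the sunscreen argument concrete. This is a sketch under made-up assumptions: in the toy model below, sun exposure is a uniform number between 0 and 1, cancer risk depends only on sun exposure, and sunscreen has no effect at all. When people choose their own sunscreen use (observational), sunscreen and cancer still come out positively correlated; when sunscreen is assigned by coin flip (randomized), the correlation disappears. The functions `pearson` and `simulate` and all probabilities are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n, randomized, seed=0):
    """Return (sunscreen_use, cancer) lists for n hypothetical people.

    Toy model (assumption): sunscreen has NO causal effect on cancer;
    only sun exposure, the confounder, raises cancer risk.
    """
    rng = random.Random(seed)
    sunscreen, cancer = [], []
    for _ in range(n):
        sun = rng.random()  # hypothetical sun exposure in [0, 1)
        if randomized:
            # experiment: the researcher assigns sunscreen by coin flip
            uses = 1 if rng.random() < 0.5 else 0
        else:
            # observational: people who spend more time in the sun use more sunscreen
            uses = 1 if rng.random() < sun else 0
        # cancer risk depends only on sun exposure
        has_cancer = 1 if rng.random() < 0.3 * sun else 0
        sunscreen.append(uses)
        cancer.append(has_cancer)
    return sunscreen, cancer

obs_r = pearson(*simulate(20_000, randomized=False))
rand_r = pearson(*simulate(20_000, randomized=True))
print(f"observational correlation: {obs_r:.3f}")
print(f"randomized correlation:    {rand_r:.3f}")
```

In the observational run the positive correlation comes entirely from the confounder; randomizing sunscreen breaks its link to sun exposure, so only chance correlation remains.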
Randomized Experiments
- individuals are assigned to groups
- researchers assign treatments to the groups
- typically the assignment is done at random, which is why these are called "randomized experiments"
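A minimal sketch of random assignment, assuming the subjects are just a list of IDs and the groups should be of (near-)equal size: shuffle the list, then deal the subjects out to the treatments in turn. `assign_at_random` and the patient IDs are hypothetical names used for illustration.

```python
import random

def assign_at_random(subjects, treatments, seed=None):
    """Randomly assign subjects to treatment groups of (near-)equal size.

    Shuffles a copy of the subjects, then deals them out round-robin,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

patients = [f"patient_{i}" for i in range(10)]  # hypothetical subject IDs
groups = assign_at_random(patients, ["treatment", "control"], seed=42)
for name, members in groups.items():
    print(name, members)
```

Passing a fixed `seed` makes the allocation reproducible, which is useful for documenting how the groups were formed.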
Principles of Experimental Design
- Controlling
- Randomization
- Replication
- Blocking
Controlling
- We want to see if there's any causal relationship between the variables
- so control, as far as possible, any other differences between the groups
- to make sure nothing else interferes with the experiment (no Confounding Variables)
- e.g. sun exposure in the sunscreen example
Example
- specify that the pill must be taken with exactly a 200 ml glass of water
- not with a sip or with 1 liter
Randomization
- Assign cases to treatment groups at random
- this accounts for variation that the researcher cannot control
- it tends to even out uncontrolled differences between the groups and prevents accidental Bias
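To see why randomization keeps uncontrolled differences even, the sketch below uses a made-up characteristic (age) that the researcher does not control. A systematic assignment, here putting the younger half into one group, produces a large age gap between the groups; a random split leaves only a small gap due to chance. The ages and the seed are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)
# hypothetical uncontrolled characteristic: ages of 1,000 subjects
ages = [rng.randint(18, 80) for _ in range(1000)]

# systematic (non-random) assignment: younger half -> control, older half -> treatment
by_age = sorted(ages)
systematic_gap = mean(by_age[500:]) - mean(by_age[:500])

# random assignment: shuffle a copy, then split it in half
shuffled = ages[:]
rng.shuffle(shuffled)
random_gap = mean(shuffled[500:]) - mean(shuffled[:500])

print(f"mean age gap, systematic assignment: {systematic_gap:.1f} years")
print(f"mean age gap, random assignment:     {random_gap:.1f} years")
```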
Replication
- Make sure the experiment can be run again and its findings replicated
- within a single study, replicate by observing enough cases that the effect can be estimated accurately
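The value of replication can be illustrated by rerunning a simulated experiment many times. In this sketch, which assumes normally distributed outcomes with a true treatment effect of 1.0 (all numbers and the `run_experiment` helper are hypothetical), each replication yields an estimated effect; the estimates centre on the true effect, and their spread shrinks as the number of cases per group grows.

```python
import random
from statistics import mean, stdev

def run_experiment(n_per_group, true_effect, rng):
    """Simulate one two-group experiment and return the estimated effect.

    Assumed (hypothetical) outcome model: control ~ Normal(10, 2),
    treatment ~ Normal(10 + true_effect, 2).
    """
    control = [rng.gauss(10, 2) for _ in range(n_per_group)]
    treated = [rng.gauss(10 + true_effect, 2) for _ in range(n_per_group)]
    # estimated effect = difference in group means
    return sum(treated) / n_per_group - sum(control) / n_per_group

rng = random.Random(7)
centre, spread = {}, {}
for n in (10, 100, 1000):
    # replicate the whole experiment 200 times at this sample size
    estimates = [run_experiment(n, true_effect=1.0, rng=rng) for _ in range(200)]
    centre[n], spread[n] = mean(estimates), stdev(estimates)
    print(f"n = {n:4d} per group: mean estimate {centre[n]:.2f}, SD of estimates {spread[n]:.2f}")
```

Under these assumptions the spread of the estimates falls roughly in proportion to 1/√n, so small experiments can give quite different answers each time they are replicated.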
Blocking
- Researchers sometimes suspect that variables other than the treatment influence the response
- in such cases, first group individuals into blocks based on that variable, then randomize within each block
- this ensures that each treatment group receives a balanced share of patients from every block
Example
- first divide patients into low-risk, mid-risk and high-risk groups
- then randomize within each risk group
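A minimal sketch of blocked randomization, assuming each patient's risk group is already known: patients are grouped by block, shuffled within the block, and dealt out to the treatments in turn, so every block contributes equally to each treatment group (up to one patient when a block's size is not a multiple of the number of treatments). `block_randomize` and the 12 patients are hypothetical.

```python
import random

def block_randomize(patients, block_of, treatments, seed=None):
    """Randomize patients to treatments separately within each block.

    patients:   list of patient IDs
    block_of:   dict mapping patient ID -> block label (e.g. risk group)
    treatments: list of treatment labels
    Returns a dict mapping patient ID -> assigned treatment.
    """
    rng = random.Random(seed)
    blocks = {}
    for p in patients:
        blocks.setdefault(block_of[p], []).append(p)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for i, p in enumerate(members):
            # deal out in turn so each treatment gets an equal share of the block
            assignment[p] = treatments[i % len(treatments)]
    return assignment

# hypothetical data: 12 patients, 4 in each risk group
risk = {f"p{i}": ["low", "mid", "high"][i % 3] for i in range(12)}
assignment = block_randomize(list(risk), risk, ["treatment", "control"], seed=3)

for level in ("low", "mid", "high"):
    in_block = [p for p in risk if risk[p] == level]
    n_treated = sum(assignment[p] == "treatment" for p in in_block)
    print(f"{level:>4} risk: {n_treated} treatment, {len(in_block) - n_treated} control")
```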
Reducing Bias
To reduce bias in human experiments, split the patients into two groups:
- treatment group - receives the medicine
- control group - receives a placebo
Double-Blind Setup
- but if a doctor knows that a patient is receiving a placebo, this may affect how the doctor treats or assesses that patient, and such an effect is difficult to quantify
- which is why both the patients and the doctors are kept unaware of which treatment each patient receives
- this is called a double-blind setup
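One way to implement blinding in practice is to hide the assignment behind opaque codes. This is a hypothetical scheme for illustration, not a procedure described in these notes: a third party (for example, a pharmacist preparing identical-looking kits) keeps the key mapping each kit code to medicine or placebo, while patients and doctors see only the kit code. `blinded_allocation` and the patient names are made up.

```python
import random

def blinded_allocation(patients, seed=None):
    """Sketch of allocation for a double-blind trial (hypothetical scheme).

    Returns (kit_for_patient, key):
      kit_for_patient: what patients and doctors see, an opaque kit code per patient
      key: kit code -> "medicine" or "placebo", held only by a third party
           until the trial is unblinded
    """
    rng = random.Random(seed)
    shuffled = list(patients)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    codes = rng.sample(range(1000, 10000), len(shuffled))  # unique 4-digit codes
    kit_for_patient, key = {}, {}
    for i, (patient, code) in enumerate(zip(shuffled, codes)):
        kit = f"KIT-{code}"
        kit_for_patient[patient] = kit
        key[kit] = "medicine" if i < half else "placebo"
    return kit_for_patient, key

kits, key = blinded_allocation(["Ann", "Ben", "Cho", "Dev"], seed=11)
print(kits)  # safe to show patients and doctors: contains no treatment information
```

Because the codes are random, the kit label itself reveals nothing about the treatment; only whoever holds `key` can unblind the results.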
Sources