Statistical Experiments

There are two types of data collection: observational studies and experiments.

Variables in Experiments

  • Response variable (or dependent variable) - the outcome of interest, measured on each subject or entry participating in the study
  • Explanatory variable (or predictor, or independent variable) - a variable that we think may help to explain the value of the response variable.


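  • Observational study - the researcher only observes and records the data, without interfering with how the data arise
  • Experiment - the researcher manipulates the explanatory variable to see its effect on the response variable, so the researcher effectively creates the data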

Correlation and Causation

  • With experiments, it is possible to establish a causal relationship between the variables; an observational study alone can generally show only an association (correlation)


  • Example: suppose we run an observational study on sunscreen use and collect some data
  • We see that the more sunscreen people use, the more likely they are to have skin cancer
  • Does sunscreen cause the cancer?
  • We cannot conclude that, because the study is observational
  • In this case we do not observe sun exposure, which is correlated with both sunscreen use and skin cancer (a confounding variable)
  • A randomized experiment, however, could show whether there is a causal relationship
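
The sunscreen example can be sketched as a small simulation. This is a minimal illustration, not data from a real study: the function name `cancer_rates` and all probabilities are hypothetical. In the made-up model, high sun exposure makes people both more likely to use sunscreen and more likely to get skin cancer, while sunscreen itself has no effect. The observational version still shows more cancer among sunscreen users; when sunscreen use is assigned by a coin flip, the gap disappears.

```python
import random


def cancer_rates(n, randomized, seed=0):
    """Return (cancer rate among sunscreen users, cancer rate among non-users).

    Hypothetical model: high sun exposure raises both sunscreen use and cancer
    risk; sunscreen itself has NO effect on cancer in this model.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}   # uses_sunscreen -> number of cancer cases
    totals = {True: 0, False: 0}  # uses_sunscreen -> number of people
    for _ in range(n):
        high_sun = rng.random() < 0.5  # the confounder, never recorded
        if randomized:
            # Experiment: a coin flip decides sunscreen use, ignoring sun exposure.
            uses_sunscreen = rng.random() < 0.5
        else:
            # Observational study: people often in the sun choose sunscreen more.
            uses_sunscreen = rng.random() < (0.8 if high_sun else 0.2)
        # Cancer risk depends only on sun exposure.
        cancer = rng.random() < (0.30 if high_sun else 0.05)
        cases[uses_sunscreen] += cancer
        totals[uses_sunscreen] += 1
    return cases[True] / totals[True], cases[False] / totals[False]


obs_users, obs_non_users = cancer_rates(10_000, randomized=False)
rct_users, rct_non_users = cancer_rates(10_000, randomized=True)
print(f"observational: users {obs_users:.3f} vs non-users {obs_non_users:.3f}")
print(f"randomized:    users {rct_users:.3f} vs non-users {rct_non_users:.3f}")
```

With these made-up probabilities, the observational rates come out near 0.25 vs 0.10, while both randomized rates are near 0.175.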

Randomized Experiments

  • Individuals are assigned to groups
  • The researcher assigns a treatment to each group
  • Typically, the assignment is done at random, which is why these are called "randomized experiments"
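
Random assignment to two groups can be sketched as "shuffle, then split in half". This is a minimal sketch assuming just two groups of (nearly) equal size. The function name `assign_groups` and the patient labels are hypothetical.

```python
import random


def assign_groups(individuals, seed=None):
    """Randomly split individuals into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(individuals)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)         # random order removes any systematic pattern
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


groups = assign_groups([f"patient_{i}" for i in range(10)], seed=1)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Passing a fixed `seed` makes the assignment reproducible, which helps when the allocation has to be documented or checked later.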

Principles of Experimental Design

  • Controlling
  • Randomization
  • Replication
  • Blocking


Controlling

  • We want to see whether there is a causal relationship between the explanatory and response variables
  • so researchers do their best to control any other differences between the groups
    • to make sure nothing else interferes with the experiment (no confounding variables)
    • e.g. the sun exposure in the sunscreen example above


  • e.g. if the treatment is a pill, specify that it must be taken with exactly a 200 ml glass of water
  • not with a sip, and not with 1 liter


Randomization

  • Assign cases to treatment groups at random
  • This accounts for variation that the researcher cannot control
  • It evens out uncontrolled differences between the groups and prevents accidental bias
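
A quick way to see how randomization evens out uncontrolled differences is to track one such variable. In this minimal sketch, age is a hypothetical uncontrolled variable with made-up values. A non-random split (here, the youngest 500 subjects in one group) produces very unbalanced groups, while a random split gives groups with nearly the same mean age.

```python
import random
from statistics import mean

rng = random.Random(7)
# Hypothetical uncontrolled variable: ages of 1,000 subjects.
ages = [rng.uniform(20, 80) for _ in range(1000)]

# Non-random split: e.g. the youngest 500 subjects end up in one group.
by_age = sorted(ages)
biased_gap = mean(by_age[500:]) - mean(by_age[:500])

# Random split: shuffle, then cut the list in half.
shuffled = ages[:]
rng.shuffle(shuffled)
random_gap = mean(shuffled[500:]) - mean(shuffled[:500])

print(f"mean age gap, non-random split: {biased_gap:.1f} years")
print(f"mean age gap, random split:     {random_gap:.1f} years")
```

The non-random gap is around 30 years here, while the random gap is typically within a year or two; the same balancing happens for variables nobody measured.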


Replication

  • Make sure the experiment can be run again and that its findings can be replicated


Blocking

  • Researchers sometimes suspect that variables other than the treatment may influence the response
  • In such a case, first group individuals into blocks based on those variables, then randomize within each block
  • This ensures that each treatment group contains a similar number of patients from every block


  • e.g. first divide patients into low-risk, mid-risk and high-risk blocks
  • then randomize patients to the treatment groups within each risk block
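
Blocked randomization as described above can be sketched by grouping patients by risk level and then shuffling and splitting each block separately. This is a minimal sketch assuming two arms and a single blocking variable; the function name `block_randomize` and the patient data are hypothetical.

```python
import random
from collections import defaultdict


def block_randomize(patients, seed=None):
    """Assign treatment/control at random, separately within each block.

    `patients` is a list of (patient_id, risk_level) pairs; the risk level
    is the blocking variable.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for patient_id, risk in patients:
        blocks[risk].append(patient_id)

    assignment = {}
    for ids in blocks.values():
        rng.shuffle(ids)  # randomize only within this block
        half = len(ids) // 2
        for pid in ids[:half]:
            assignment[pid] = "treatment"
        for pid in ids[half:]:
            assignment[pid] = "control"
    return assignment


# Hypothetical patients: 6 low-risk, 4 mid-risk and 2 high-risk.
risks = ["low"] * 6 + ["mid"] * 4 + ["high"] * 2
patients = list(enumerate(risks))  # (patient_id, risk_level) pairs
assignment = block_randomize(patients, seed=3)
for level in ("low", "mid", "high"):
    arms = [assignment[pid] for pid, risk in patients if risk == level]
    print(level, arms.count("treatment"), "treatment /", arms.count("control"), "control")
```

Whatever the seed, this prints `low 3 treatment / 3 control`, `mid 2 treatment / 2 control` and `high 1 treatment / 1 control`: only which patients land in each arm is random. In this sketch, a block with an odd number of patients gives the control group one extra patient.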

Reducing Bias

To reduce bias in experiments on humans, split the patients into two groups:

  • treatment group - receives the medicine
  • control group - receives a placebo (an inactive pill that looks like the medicine)

Double-Blind Setup

  • However, if a doctor knows that a patient is going to receive a placebo, this may affect how the doctor treats that patient (e.g. emotionally), and such an effect is difficult to quantify
  • This is why both the patients and the doctors are kept unaware of which treatment each patient receives
  • This is called a double-blind setup
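
One way to implement blinding in practice is to hide each patient's treatment behind an opaque kit code. This is a minimal sketch assuming a third party prepares identical-looking kits and keeps the key until the trial ends. The function name `make_blinded_allocation` and the `KIT-` label format are hypothetical. The codes are drawn at random so that their order does not reveal which arm a kit belongs to.

```python
import random


def make_blinded_allocation(patient_ids, seed=None):
    """Return (kit_labels, key) for a hypothetical double-blind trial.

    kit_labels: patient_id -> opaque kit code; the only thing doctors and
                patients see (medicine and placebo kits look identical).
    key:        kit code -> "medicine" or "placebo"; kept by a third party
                and opened only after the trial ends.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    n_medicine = len(ids) // 2
    arms = ["medicine"] * n_medicine + ["placebo"] * (len(ids) - n_medicine)
    rng.shuffle(arms)  # random assignment of patients to arms
    # Random, unordered codes, so a code reveals nothing about its arm.
    codes = rng.sample(range(1000, 10000), len(ids))
    kit_labels, key = {}, {}
    for pid, arm, code in zip(ids, arms, codes):
        label = f"KIT-{code}"
        kit_labels[pid] = label
        key[label] = arm
    return kit_labels, key


kit_labels, key = make_blinded_allocation([f"patient_{i}" for i in range(6)], seed=5)
print(kit_labels)  # what doctors and patients see
# Unblinding, done only when the results are analysed:
unblinded = {pid: key[label] for pid, label in kit_labels.items()}
```

Keeping `key` separate from `kit_labels` is what makes the setup double-blind: nobody who interacts with patients needs access to it.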