Observation Studies

There are two types of Data Collection: Observation Studies and Statistical Experiments

In Observation Studies we observe existing characteristics of a subset of individuals in a population

  • typically done via surveys, by following somebody over time, etc.
  • this method doesn't directly interfere with how the data arise (in contrast to Statistical Experiments)

The goal is to

  • draw conclusions about the population or
  • find differences between 2 or more groups or
  • find out about the relationships between variables


Types of Observation Studies:

  • Prospective Study
    • collect the data as an event unfolds
  • Retrospective Study
    • use the data of some event that already took place

Finding Relationships

Types of variables:

  • outcome - the variables we are interested in
  • explanatory - the variables used to analyze and explain the outcome

Types of Relationships

The relationship between the explanatory variables and the outcome can be one of the following:

  • independent: there is no association between the variables
  • association: the variables are dependent, but the association alone doesn't tell what kind of relationship it is
    • causation: changes in the explanatory variables cause the outcome to change
    • reverse causation: changes in the outcome cause the explanatory variables to change
    • coincidence: just pure chance
    • common cause: some other variable causes both the explanatory variables and the outcome to change (see also Confounding Variables)
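
To illustrate why an observed association doesn't reveal its kind, here is a minimal sketch in Python. The data are synthetic and the names (`pearson`, `x`, `y`) are made up for this illustration: `y` is generated from `x`, so the true relationship is causation, yet the correlation coefficient is the same whichever variable we treat as explanatory, so it cannot tell causation from reverse causation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed data-generating process: x causes y (y = 2x + noise).
x = [rng.gauss(0, 1) for _ in range(10_000)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]

r_xy = pearson(x, y)  # treat x as explanatory, y as outcome
r_yx = pearson(y, x)  # swap the roles

# Both values coincide: the data alone don't show the direction.
print(r_xy, r_yx)
```

Only knowledge of how the data were produced (here, the code that builds `y` from `x`) tells us which way the causation goes.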

Correlation and Causation

  • with this type of study it is possible to find an association between the variables
  • but it's not possible to show causation - we need to run a controlled Statistical Experiment for that
  • beware of Confounding Variables: a variable we didn't record may be responsible for the association


Example: sunscreen and skin cancer

  • Suppose we ran a sunscreen study and collected some data
  • We saw that the more sunscreen people use, the higher their chance of having skin cancer
  • Does sunscreen cause cancer?
  • We cannot say, because the study is observational - we didn't run a controlled Statistical Experiment to make sure no other variable caused the effect
  • e.g. in this case we didn't record exposure to the sun - it's correlated with both sunscreen use and skin cancer
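
The sunscreen example can be sketched as a small simulation. Everything here is an assumption made for illustration (the probabilities, the function names `simulate` and `pearson`, the 0.45-0.55 exposure band), not real data: sun exposure drives both sunscreen use and cancer risk, while sunscreen itself has no effect on cancer at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=100_000, seed=0):
    """Return (sun, sunscreen, cancer) lists for n simulated people."""
    rng = random.Random(seed)
    sun, sunscreen, cancer = [], [], []
    for _ in range(n):
        exposure = rng.random()  # the confounder, in [0, 1)
        # Assumption: more sun exposure -> more likely to use sunscreen.
        uses = int(rng.random() < 0.1 + 0.8 * exposure)
        # Assumption: cancer risk depends on sun exposure only, NOT on sunscreen.
        sick = int(rng.random() < 0.02 + 0.4 * exposure)
        sun.append(exposure)
        sunscreen.append(uses)
        cancer.append(sick)
    return sun, sunscreen, cancer


sun, sunscreen, cancer = simulate()

# What the observational study sees: sun exposure was not recorded.
overall = pearson(sunscreen, cancer)

# What we see among people with similar sun exposure.
band = [(u, c) for s, u, c in zip(sun, sunscreen, cancer) if 0.45 <= s < 0.55]
within = pearson([u for u, _ in band], [c for _, c in band])

print("sunscreen vs cancer, everyone:          ", round(overall, 3))
print("sunscreen vs cancer, similar sun levels:", round(within, 3))
```

Under these assumptions the overall correlation is clearly positive even though sunscreen does nothing to cancer in the simulation, and it shrinks towards zero once we compare only people with similar sun exposure. In a real observational study the confounder may not have been recorded, so this kind of check is not always available.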