Observational Studies

Observational Studies

There are two types of Data Collection: Observational Studies and Statistical Experiments

In Observational Studies we observe existing characteristics of a subset of individuals in a population

  • typically done via surveys, by following somebody over time, etc.
  • this method doesn’t directly interfere with how the data arise (in contrast to Statistical Experiments)

The goal is to

  • draw conclusions about the population or
  • find differences between two or more groups or
  • find out about the relationships between variables

Types

  • Prospective Study
    • collect the data as an event unfolds
  • Retrospective Study
    • use the data of some event that already took place

Finding Relationships

Types of variables:

  • outcome - the variables we are interested in
  • explanatory - the variables that are used to analyze and explain the outcome

Types of Relationships

The relationship between an explanatory variable and the outcome can be one of the following:

  • independent: there is no association between the variables
  • association: the variables are dependent, but it’s not clear what kind of relationship there is
    • causation: changes in the explanatory variable cause the outcome to change
    • reverse causation: changes in the outcome cause the explanatory variable to change
    • coincidence: the association is due to pure chance
    • common cause: some other variable causes both the explanatory variable and the outcome to change (see also Confounding Variables)

Correlation and Causation

  • with this type of study it is possible to find an association between the variables
  • but it’s not possible to show causation: that requires a controlled Statistical Experiment
  • beware of Confounding Variables

Example

  • Suppose we run a sunscreen study and collect some data
  • We see that the more sunscreen people use, the more likely they are to have skin cancer
  • Does sunscreen cause cancer?
  • We cannot say so here because the study is observational: we didn’t run a controlled Statistical Experiment to rule out other variables that might have caused it
  • E.g. in this case we don’t observe sun exposure, which is correlated with both sunscreen use and cancer
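
The sunscreen example can be illustrated with a small simulation. This is a minimal sketch on made-up data, not a real study: the function names (simulate, raw_difference, stratified_difference), the sample size and all probabilities are assumptions chosen for illustration. In the simulated world sunscreen has no effect on cancer at all; only sun exposure raises the risk, and people who spend more time in the sun are also more likely to use sunscreen.

```python
import random


def simulate(n=100_000, seed=0):
    """Generate hypothetical (sun, uses_sunscreen, has_cancer) records."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        sun = rng.random()  # sun exposure in [0, 1), the confounder
        # Assumption: more sun exposure makes sunscreen use more likely.
        uses_sunscreen = sun + rng.gauss(0, 0.3) > 0.5
        # Assumption: cancer risk depends on sun exposure only,
        # sunscreen plays no role in this simulated world.
        has_cancer = rng.random() < 0.05 + 0.30 * sun
        people.append((sun, uses_sunscreen, has_cancer))
    return people


def cancer_rate(rows):
    """Fraction of records with cancer."""
    return sum(r[2] for r in rows) / len(rows)


def raw_difference(people):
    """Cancer rate of sunscreen users minus that of non-users."""
    users = [p for p in people if p[1]]
    non_users = [p for p in people if not p[1]]
    return cancer_rate(users) - cancer_rate(non_users)


def stratified_difference(people, n_bins=10):
    """Same difference, computed within bins of similar sun exposure
    and averaged with weights proportional to bin size."""
    total, counted = 0.0, 0
    for b in range(n_bins):
        stratum = [p for p in people
                   if min(int(p[0] * n_bins), n_bins - 1) == b]
        users = [p for p in stratum if p[1]]
        non_users = [p for p in stratum if not p[1]]
        if not users or not non_users:
            continue  # no comparison possible in this bin
        diff = cancer_rate(users) - cancer_rate(non_users)
        total += len(stratum) * diff
        counted += len(stratum)
    return total / counted


people = simulate()
print("raw difference:       ", round(raw_difference(people), 3))
print("stratified difference:", round(stratified_difference(people), 3))
```

Under these assumptions the raw comparison shows a clearly higher cancer rate among sunscreen users, while the comparison within groups of similar sun exposure is close to zero. In a real observational study the confounder may not be recorded at all, so this kind of adjustment is not always possible.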
