# ML Wiki

## Statistical Experiments

There are two types of data collection: observational studies and experiments

### Variables in Experiments

• Response variable (or dependent variable) - the outcome of interest, measured on each subject or entry participating in the study
• Explanatory variable (or predictor, or independent variable) - a variable that we think may help to explain the value of the response variable.

### Experiment

• Experiment - when a researcher manipulates the explanatory variable to see the effect on the response.
• so the researcher creates the data rather than just observing it

### Correlation and Causation

• with this type of study, it is possible to show a causal relationship between the variables

Example

• Suppose we run a sunscreen study and collect some data
• We see that the more sunscreen people use, the higher their chance of having skin cancer
• does sunscreen cause cancer?
• we cannot say so here because the study is observational
• e.g. in this case we don't observe the exposure to sun - it's correlated with both the sunscreen and the cancer variables
• but if we do a randomized experiment, we can see whether there's any causal relationship
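The sunscreen example can be illustrated with a small simulation. This is only a sketch under a made-up toy model: the function name `simulate` and all probabilities are assumptions chosen for illustration, and sunscreen is assumed to have no effect on cancer at all, so any difference between the groups comes from the hidden sun exposure.

```python
import random

def simulate(n, randomized, seed=0):
    """Return (cancer rate among sunscreen users, cancer rate among non-users)."""
    rng = random.Random(seed)
    users = [0, 0]      # [cancer cases, group size] for sunscreen users
    non_users = [0, 0]  # [cancer cases, group size] for non-users
    for _ in range(n):
        high_sun = rng.random() < 0.5  # hidden confounder: exposure to sun
        if randomized:
            # Experiment: the researcher assigns sunscreen by a coin flip.
            uses_sunscreen = rng.random() < 0.5
        else:
            # Observational study: people with high exposure choose sunscreen more often.
            uses_sunscreen = rng.random() < (0.8 if high_sun else 0.2)
        # Assumed toy model: only sun exposure drives the risk, sunscreen does nothing.
        cancer = rng.random() < (0.30 if high_sun else 0.05)
        group = users if uses_sunscreen else non_users
        group[0] += cancer
        group[1] += 1
    return users[0] / users[1], non_users[0] / non_users[1]

obs_users, obs_non_users = simulate(100_000, randomized=False)
exp_users, exp_non_users = simulate(100_000, randomized=True)
print("observational:", obs_users, obs_non_users)
print("randomized:   ", exp_users, exp_non_users)
```

Under these assumed probabilities, the observational data should show a clearly higher cancer rate among sunscreen users (about 0.25 versus 0.10 in expectation), while with random assignment both rates should be close to 0.175, because sun exposure is then balanced between the groups.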

## Randomized Experiments

In a randomized experiment:

• individuals are assigned to groups
• the researcher assigns treatments to the groups
• typically the assignment is done at random - which is why they are called "Randomized Experiments"

### Principles of Experimental Design

• Controlling
• Randomization
• Replication
• Blocking

### Controlling

• We want to see if there's any causal relationship between the variables
• so we do our best to control any other differences between the groups
• to make sure there's nothing else that might interfere with the experiment (no Confounding Variables)
• e.g. the exposure to sun in the previous example

Example

• specify that a pill must be taken with a glass of exactly 200 ml of water
• not with a sip or with 1 liter

### Randomization

• Assign cases to treatment groups at random
• This accounts for variation that cannot be controlled by the researcher
• it evens out uncontrolled differences between the groups and prevents accidental Bias from being introduced
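Random assignment can be sketched in a few lines. The helper name `assign_at_random` and the group labels are hypothetical, used here only for illustration: subjects are shuffled and then dealt round-robin into the groups, so group sizes differ by at most one.

```python
import random

def assign_at_random(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects and deal them round-robin into the groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Ten subjects identified by the numbers 1..10; the seed only makes the run repeatable.
assignment = assign_at_random(range(1, 11), seed=42)
print(assignment)
```

Because the shuffle ignores every property of the subjects, no characteristic (known or unknown) can systematically end up in one group.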

### Replication

• Make sure the experiment can be run again and the findings can be replicated

### Blocking

• Researchers may sometimes suspect that some variables other than the treatment influence the response
• in such a case, group individuals into blocks based on these variables and then randomize within the blocks
• this ensures that each treatment group gets an equal number of patients from each block

Example

• first divide patients into low-risk, mid-risk and high-risk groups
• then randomize within each risk group
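The two steps above (divide into blocks, then randomize within each block) can be sketched as follows. The function name `block_randomize`, the patient identifiers and the risk labels are made up for illustration.

```python
import random
from collections import defaultdict

def block_randomize(risk_by_patient, groups=("treatment", "control"), seed=None):
    """Group patients into blocks by risk level, then randomize within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for patient, risk in risk_by_patient.items():
        blocks[risk].append(patient)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order inside the block
        for i, patient in enumerate(members):
            assignment[patient] = groups[i % len(groups)]
    return assignment

# Twelve hypothetical patients, four in each risk block.
risk_by_patient = {
    f"p{i}": risk for i, risk in enumerate(["low", "mid", "high"] * 4, start=1)
}
assignment = block_randomize(risk_by_patient, seed=1)
print(assignment)
```

With four patients per block and two groups, every risk level contributes exactly two patients to the treatment group and two to the control group, whichever way the shuffle falls.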

## Reducing Bias

To reduce bias in experiments on humans, split the patients into two groups:

• treatment group - receives the medicine
• control group - receives a placebo

### Double-Blind Setup

• but if a doctor knows that a patient is going to receive a placebo, it may affect the doctor emotionally - and this effect is difficult to quantify
• which is why both the patients and the doctors are kept uninformed of which type of medicine each patient receives
• this is called a double-blind setup
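One way to picture a double-blind setup is to hand out opaque kit codes and keep the code-to-treatment key with a third party. The sketch below is a hypothetical illustration, not a description of how real trials manage blinding: the function `blind_allocation` and the `KIT-` code format are invented for this example.

```python
import random

def blind_allocation(patient_ids, seed=None):
    """Return (blinded, key): what doctors and patients see, and the secret key."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    # The first half of the shuffled list gets the medicine, the rest the placebo.
    arms = {pid: ("medicine" if i < half else "placebo") for i, pid in enumerate(ids)}
    # Unique random codes, so a code reveals nothing about the treatment.
    codes = rng.sample(range(1000, 10000), len(ids))
    kit = {pid: f"KIT-{code}" for pid, code in zip(ids, codes)}
    # Visible to doctors and patients: patient -> kit code, in patient order.
    blinded = {pid: kit[pid] for pid in sorted(kit)}
    # Kept by a third party until the study ends: kit code -> actual treatment.
    key = {kit[pid]: arms[pid] for pid in ids}
    return blinded, key

blinded, key = blind_allocation(range(1, 9), seed=7)
print(blinded)
```

Only `blinded` is shared during the study, so neither the doctor nor the patient can tell from a kit code whether it contains the medicine or the placebo.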