# ML Wiki

## Simulation For Proportions

Sometimes Statistical Inference can be done without a theoretical model, using brute force instead: we generate the data ourselves.

Consider the test for proportions.

## One-Sample Test

The setup is the same as in the One-Sample test based on the normal approximation:

• we have a sample and want to check whether the true proportion $p$ agrees with some hypothesized value $p_0$
• i.e. we want to check whether the data we observed are consistent with this hypothesis

Test

• $H_0: p = p_0$
• $H_A: p \ne p_0$ or $H_A: p < p_0$ or $H_A: p > p_0$
• $p$ - the true proportion, $p_0$ - the null value

But instead of using a theoretical model,

• we generate the null distribution ourselves
• and then see how unusual the observed value is with respect to the generated null distribution
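The two steps above can be sketched in R as a small helper. This is only a minimal sketch under stated assumptions: the name `sim.prop.test` and its arguments (`x` observed successes, `n` sample size, `p0` null value, `m` number of simulated samples, `alternative`) are hypothetical and introduced here for illustration. Doubling the smaller tail for the two-sided case is one common convention, not the only one.

```r
# Hypothetical helper (not part of base R): simulation-based
# one-sample test for a proportion.
#   x  - observed number of successes
#   n  - sample size
#   p0 - null value of the proportion
#   m  - number of simulated samples
sim.prop.test = function(x, n, p0, m=10000,
                         alternative=c("less", "greater", "two.sided")) {
  alternative = match.arg(alternative)

  # null distribution: success counts in m samples of size n drawn under H0
  x.sim = rbinom(n=m, size=n, prob=p0)

  # share of simulated samples at least as extreme as the observed one
  p.lower = mean(x.sim <= x)
  p.upper = mean(x.sim >= x)

  p.val = switch(alternative,
    less = p.lower,
    greater = p.upper,
    # assumption: two-sided p-value as twice the smaller tail, capped at 1
    two.sided = min(1, 2 * min(p.lower, p.upper)))

  list(p.hat=x / n, p.value=p.val, null.dist=x.sim / n)
}

# illustrative values only, not taken from the text
set.seed(1)
res = sim.prop.test(x=12, n=50, p0=0.30, alternative="less")
res$p.value
```

Comparing counts (`x.sim <= x`) rather than rounded proportions avoids accidentally excluding the observed outcome from the tail.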

### Example

Consider the following example:

• a medical consultant helps patients
• he claims that with his help the ratio of complications is lower than usual
• (i.e. lower than 0.10)
• is it true?

We want to test a hypothesis:

• $H_0: p_A = 0.10$ - ratio of complications without a specialist
• $H_A: p_A < 0.10$ - specialist helps, the complications ratio is lower than usual
• note that we can't really check the claim of a causal effect, because the data come from an Observational Study - to really check the claim we would need to conduct a Statistical Experiment

Observed data:

• 3 complications in 62 cases
• $\hat{p} = 3/62 \approx 0.048$
• is it only due to chance?

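Normal Model

• the normal approximation is usually considered reliable only when at least 10 complications and 10 non-complications are expected under $H_0$
• here we expect only $62 \cdot 0.10 = 6.2$ complications, so we should not rely on it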

What can we do?

• There is still a way to evaluate the $p$-value under $H_0: p_A = 0.10$ - via simulation
• Simulate many draws from the population and build a Sampling Distribution (under $H_0$)
• then compute the probability of observing a $\hat{p}$ at least as small as ours in this distribution

Test

• Assume that the help of the specialist changes nothing
• i.e. 10% of cases will still have complications
• under this assumption we simulate 62 clients

Simulation

• repeat many times (e.g. 5-10k) to build a Sampling Distribution
• draw a sample from the Binomial Distribution with $p=0.10$ and $n=62$
• calculate $\hat{p}_\text{sim}$ from this sample
• draw a histogram of the simulated proportions
• and shade the bars that support $H_A$ - the ones with $\hat{p}_\text{sim} \le \hat{p}$, i.e. at most 3 complications out of 62
• the shaded area represents the $p$-value - the probability of observing a $\hat{p}$ this small or smaller due to chance alone
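In symbols, the shaded area is the fraction of simulated samples that are at least as extreme as the observed one. Writing $m$ for the number of simulated samples and $\hat{p}_\text{sim}^{(i)}$ for the proportion in the $i$-th one (notation introduced here for illustration), the simulated $p$-value for $H_A: p < p_0$ is

$$p\text{-value} \approx \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\left[\hat{p}_\text{sim}^{(i)} \le \hat{p}\right]$$

For $H_A: p > p_0$ the inequality is reversed.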

This is the histogram of the Sampling Distribution we obtained:

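Of the 10k draws, 487 fell below 0.048

• these are the samples with at most 2 complications, because the observed $3/62 \approx 0.0484$ is itself slightly above the rounded value 0.048
• a $p$-value must also count outcomes equal to the observed one, i.e. all samples with at most 3 complications
• under the Binomial Distribution with $n=62$ and $p=0.10$ this probability is about 0.12, not $487/10000 = 0.0487$
• since $0.12 > 0.05$, we fail to reject $H_0$: the data do not provide convincing evidence that the complications ratio is lower with the consultant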

R code:

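```r
n = 62      # number of patients in each simulated sample
p = 0.10    # complication rate under H0
m = 10000   # number of simulated samples

set.seed(31313)
# number of complications in each simulated sample, and the proportions
x.sim = rbinom(n=m, size=n, prob=p)
samp.dist = x.sim / n

# observed data, kept as a count: comparing against the rounded 0.048
# would skip the observed outcome 3/62 and count only 0-2 complications
x.obs = 3
p.hat = x.obs / n

sum(x.sim <= x.obs)
p.val = mean(x.sim <= x.obs)
p.val

# relative frequency of each simulated outcome, labelled by its proportion
tbl = table(x.sim) / m
k = as.integer(names(tbl))
names(tbl) = round(k / n, digits=3)

# shade the bars at or below the observed proportion
cl = ifelse(k <= x.obs, 'black', 'grey')

barplot(tbl, col=cl, las=2)
```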
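As a cross-check on the simulation, the same tail probability can be computed exactly from the Binomial Distribution, with no random draws. The sketch below uses base R's `pbinom` and `binom.test`; the variable names are illustrative.

```r
n = 62      # number of patients
x = 3       # observed complications
p0 = 0.10   # complication rate under H0

# exact probability of at most x complications under H0
p.exact = pbinom(x, size=n, prob=p0)
p.exact

# the same one-sided p-value from the exact binomial test
bt = binom.test(x, n, p=p0, alternative="less")
bt$p.value

# probability of strictly fewer than x complications, for comparison
p.strict = pbinom(x - 1, size=n, prob=p0)
p.strict
```

A simulated $p$-value should land close to `p.exact`, and the gap should shrink as the number of draws grows. `p.strict` shows how much the answer changes when the observed outcome itself is left out of the tail.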