Simulation For Proportions

r simulations statistical-tests statistics

Sometimes Statistical Inference can be done without applying theoretical models, but instead with using brute force: generating the data ourselves.

Consider Proportions test

there are a set of assumptions that have to be met to use the Normal Approximation
what if one of them is not met, e.g. Success-Failure condition?
use Exact Binomial Proportion Tests - apply the Binomial Model directly
or simulate draws from the binomial model and obtain the Sampling Distribution (or the null distribution)

One-Sample Test

It’s the same as One-Sample test for the normal approximation models:

we have a sample and want to check if the true proportion parameter agree with some hypothetical parameter $p_0$
and then we want to check if the data we observed align with this hypothesis

Test

But instead of using some theoretical model,

Consider the following example:

medical consultant helps patients
he claims that with his help the ratio of complications is lower than usually
- (i.e. lower than 0.10)
is it true?

We want to test a hypothesis:

$H_0: p_A = 0.10$ - ratio of complications without a specialist
$H_A: p_A < 0.10$ - specialist helps, the complications ratio is lower than usual
note that we can’t really check the claim because we have Observational Studies - to really check the claim we need to conduct a Statistical Experiment

Observed data:

Normal Model

the Success-Failure condition is not met: $p_A \cdot 62 = 0.10 \approx 6.2 < 10$
- under $H_0$ we’d expect to see only 6.2 complications
thus cannot use Normal Approximation and perform a Binomial Proportion Test

What we can do?

There is still a way to evaluate the $p$-value for this $p_A = 0.10$ - via simulations
Simulate many draws from the population and build a Sampling Distribution (under $H_0$)
then compute the probability of observing such $\hat{p}$ in this distribution

Test

Simulation

repeat many times (e.g. 5-10k) to build a Sampling Distribution
- draw a sample from the Binomial Distribution with $p=0.10$ and $n=62$
- calculate $\hat{p}_\text{sim}$ from this sample
draw a histogram
and shade bars that support the $H_A$ - ones with $hat{p}_\text{sim} < 0.048$
the shaded area represents the $p$-value - the probability of observing such small $\hat{p}$ only due to chance

This is the histogram of the Sampling Distribution we obtained:

From 10k draws 487 turned out to be below $\hat{p}$

which means $p$-value is $487/10000 = 0.0487 < 0.05$
so we reject $H_0$ in favor of $H_A$ and conclude that there’s indeed some relation between the participation of the consultant and the complications ratio

R code:

```text only n = 62 p = 0.10 m = 10000

set.seed(31313) samp.dist = rbinom(n=m, size=n, prob=p) / n

p.hat = 0.048 sum(samp.dist <= p.hat) p.val = sum(samp.dist <= p.hat) / length(samp.dist) p.val

ac = cut(samp.dist, breaks=18) means = tapply(samp.dist, ac, mean) levels(ac) = round(means, digits=3)

tbl = table(ac) / length(samp.dist) tbl cl = rep(‘grey’, length(tbl)) cl[1:4] = ‘black’

barplot(tbl, col=cl, las=2) ```

✏️ Edit on GitHub