# ML Wiki

## Sample Size

When do we need to estimate the size of a sample?

## Confidence Intervals

### Binomial Proportion Confidence Intervals

How do we choose the best $n$?

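• we build a CI for a proportion $p$
• the margin of error is $\beta = z_{\alpha/2} \sqrt{p(1-p)/n}$
• we want the margin of error to be $\beta = 0.03$
• for a 95% CI, $\alpha = 0.05$, so $\alpha/2 = 0.025$ and $z_{0.025} \approx 1.96$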

We plug everything in and solve for $n$:

• $0.03 = 1.96 \sqrt{p(1-p)/n}$
• use $p = 0.5$ (the worst case, because $p(1-p)$ is largest at $p = 0.5$)
• $n = \left(\cfrac{1.96 \cdot 0.5}{0.03}\right)^2 \approx 1067$
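The step from the margin-of-error equation to $n$ is plain algebra. Written out with the same symbols as above (square both sides, then isolate $n$):

$$
\beta = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}
\quad\Longrightarrow\quad
\beta^2 = \frac{z_{\alpha/2}^2 \, p(1-p)}{n}
\quad\Longrightarrow\quad
n = \frac{z_{\alpha/2}^2 \, p(1-p)}{\beta^2}
$$

With $p = 0.5$ we have $\sqrt{p(1-p)} = 0.5$, which is where the factor $0.5$ inside the parentheses comes from.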

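What if we want a margin of error of 0.01?

• $0.01 = 1.96 \sqrt{0.5 \cdot 0.5 / n}$, so
• $n \approx 9604 \approx 9 \cdot 1067$ !

Reducing the margin of error by a factor of 3 requires a sample 9 times larger, because $n$ is proportional to $1/\beta^2$. Likewise, to cut the margin of error in half we need a sample 4 times larger!

• So lowering the margin of error is expensive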
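The inverse-square relationship between the margin of error and $n$ can be checked numerically. The following is a minimal Python sketch, not part of the original page; the function name `required_n` is hypothetical, and it returns the unrounded $n$ (in practice one would round up).

```python
def required_n(margin, z=1.96, p=0.5):
    """Unrounded n that solves margin = z * sqrt(p * (1 - p) / n)."""
    return z ** 2 * p * (1 - p) / margin ** 2


base = required_n(0.03)  # reference case: margin of error 0.03, 95% CI

# Halving the margin (0.015) and dividing it by three (0.01)
for margin in (0.03, 0.015, 0.01):
    n = required_n(margin)
    print(f"margin={margin}: n={n:.1f}, ratio to base={n / base:.2f}")
```

The ratios depend only on how much the margin shrinks, not on the chosen `z` or `p`, since both cancel out.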

What if we want a 99% CI instead of a 95% CI?

• $z_{\alpha/2} = z_{0.005} \approx 2.576$
• $0.03 = 2.576 \sqrt{0.5 \cdot 0.5 / n}$
• $n \approx 1843$

For a 90% CI, $z_{0.05} \approx 1.645$ and $n \approx 752$
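The effect of the confidence level can be reproduced with a short Python sketch that uses only the standard library. The function name `proportion_sample_size` is an assumption made for illustration; it rounds up to a whole observation, so its results can be one larger than the rounded figures quoted in the text.

```python
import math
from statistics import NormalDist


def proportion_sample_size(margin, confidence, p=0.5):
    """Smallest whole n whose margin of error does not exceed `margin`."""
    alpha = 1 - confidence
    # Upper alpha/2 quantile of the standard normal, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


for confidence in (0.90, 0.95, 0.99):
    n = proportion_sample_size(0.03, confidence)
    print(f"{confidence:.0%} CI: n = {n}")
```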

### R

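Example 1:

p = 0.5     # worst-case estimate of the proportion
ME = 0.03   # desired margin of error
# upper-tail quantile for a 95% CI (about 1.96)
z = qnorm(0.025, mean=0, sd=1, lower.tail=FALSE)
n = z^2 * p * (1 - p) / ME^2
ceiling(n)  # round up to a whole number of observations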


Example 2:

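• What sample size is needed to attain a margin of error of 0.5% for a 99% CI?

p = 0.5     # worst-case estimate
ME = 0.005  # margin of error of 0.5%
cl = 0.99   # confidence level
al = (1 - cl) / 2  # tail probability on each side

z = qnorm(al, mean=0, sd=1, lower.tail=FALSE)
n = z^2 * p * (1 - p) / ME^2
ceiling(n)  # round up to a whole number of observations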


## Controlling False Negatives

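Sample size controls Type II errors (false negatives): for a fixed significance level, a larger sample gives a smaller standard error and therefore fewer false negatives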

### $Z$ Statistics for Means

Suppose we want a 95% confidence interval for a mean:

• $Z = 1.96$
• $\text{ME}_{0.95} = Z \cdot \text{SE} = 1.96 \cfrac{\sigma}{\sqrt{n}}$
• we want $\text{ME} \leqslant 4$
• so $1.96 \cdot \cfrac{\sigma}{\sqrt{n}} \leqslant 4$, and we want to get $n$ from this inequality
• NOTE: this requires knowing $\sigma$; otherwise we should use the $T$ statistic instead of $Z$ and estimate $\sigma$ by the sample standard deviation $s$
• e.g. suppose we know that $\sigma$ for the whole country is 25; it may then be a reasonable estimate of $\sigma$ within a company

We get:

• $1.96 \cdot \cfrac{\sigma}{\sqrt{n}} \approx 1.96 \cdot \cfrac{25}{\sqrt{n}} \leqslant 4$
• $1.96 \cdot \cfrac{25}{4} \leqslant \sqrt{n}$
• $\left( 1.96 \cdot \cfrac{25}{4} \right)^2 \leqslant n$
• $150.06 \leqslant n$

$\Rightarrow$ we need $n \geqslant 151$ to have an ME of at most 4
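The same calculation can be sketched in Python using only the standard library. The function name `mean_sample_size` is hypothetical, and the sketch assumes $\sigma$ is known (the $Z$ case above), not estimated from the sample.

```python
import math
from statistics import NormalDist


def mean_sample_size(sigma, margin, confidence=0.95):
    """Smallest whole n with z * sigma / sqrt(n) <= margin, for known sigma."""
    # Two-sided critical value, e.g. about 1.96 for a 95% CI
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Country-wide sigma of 25 and a target margin of error of 4
print(mean_sample_size(sigma=25, margin=4))
```

Rounding up with `math.ceil` matters here: rounding the unrounded value down would leave the margin of error slightly above the target.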