Goal of Inferential Statistics: to draw conclusions about a whole population based on a sample
The sampling distribution represents the distribution of point estimates based on samples of fixed size from the same population
```r
set.seed(134)
rbinom(10, size=10, prob=0.5)
```
We get different results each time:
Trial | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Outcome | 4 | 6 | 7 | 4 | 5 | 3 | 4 | 6 | 3 | 6 |
Since we know that, theoretically, this random variable follows a Binomial distribution, we can model the sampling distribution directly:
```r
# probabilities for all possible counts 0..10 (0 successes is also a possible outcome)
d = dbinom(0:10, size=10, prob=0.5)
bp = barplot(d)
axis(side=1, at=bp, labels=0:10)
```
This sampling distribution underlies Binomial proportion confidence intervals and the Binomial proportion test
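As a quick sketch, base R provides both an exact and an approximate version of this interval and test: `binom.test` (exact Clopper-Pearson) and `prop.test` (normal approximation). The counts below (48 successes out of 100 trials) are made-up illustration data, not from the source:

```r
# hypothetical data: 48 successes in 100 trials, testing H0: p = 0.5
res = binom.test(x=48, n=100, p=0.5)
res$conf.int   # exact confidence interval for the proportion; contains 0.5
res$p.value    # two-sided p-value; well above 0.05, so no evidence against p = 0.5

# normal-approximation counterpart
prop.test(x=48, n=100, p=0.5)
```

With 48/100 the interval comfortably contains 0.5, so the null proportion is not rejected.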
```r
load(url('http://s3.amazonaws.com/assets.datacamp.com/course/dasi/ames.RData'))
area = ames$Gr.Liv.Area

# draw 5000 samples of size 50 and record each sample mean
sample_means50 = rep(NA, 5000)
for (i in 1:5000) {
  samp = sample(area, 50)
  sample_means50[i] = mean(samp)
}

hist(sample_means50, breaks=13, probability=TRUE, col='orange',
     xlab='point estimates of mean', main='Sampling distribution of mean')
```
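The spread of this sampling distribution is the standard error of the mean, which shrinks as sigma/sqrt(n). A minimal self-contained sketch, using a simulated population instead of the `ames` download (which may be unavailable), checks the empirical spread against the theoretical value:

```r
set.seed(42)
pop = rnorm(100000, mean=1500, sd=500)  # simulated stand-in for a population

# sampling distribution of the mean for samples of size 50
n = 50
sample_means = replicate(5000, mean(sample(pop, n)))

sd(sample_means)     # empirical standard error of the sample mean
sd(pop) / sqrt(n)    # theoretical standard error sigma/sqrt(n); the two agree closely
```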
Another example shows that the more data we have, the more accurate our point estimates become:
```r
library(openintro)
data(run10Samp)
time = run10Samp$time

# running mean over the first x observations, for x = 1..100
avg = sapply(X=1:100, FUN=function(x) { mean(time[1:x]) })

plot(x=1:100, y=avg, type='l', col='blue',
     ylab='running mean', xlab='sample size', bty='n')
abline(h=mean(time), lty=2, col='grey')
```
This illustrates that the larger the sample size, the better we can estimate the population parameter: the running mean settles around the true mean as the sample grows