07 Sampling using R
Margin of Error
The margin of error measures the maximum amount by which a sample result is expected to differ from the true population value, at a given confidence level.
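As a rough sketch, the margin of error for a sample mean is commonly computed as a critical value multiplied by the standard error of the mean. The helper name `margin_of_error`, the use of a t critical value, and the income figures below are assumptions made for illustration only.

```r
# Hypothetical helper: margin of error for a sample mean.
# Assumes a t-based critical value; the notes do not fix a formula.
margin_of_error <- function(x, conf_level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                                # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value
  t_crit * se
}

# Made-up sample of monthly incomes (in thousands)
incomes <- c(32, 45, 28, 51, 39, 42, 36, 48, 30, 44)

moe <- margin_of_error(incomes)
print(moe)
```

Raising `conf_level` (for example to 0.99) gives a larger margin of error for the same data.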
Confidence Interval
One of the main goals of statistics is to estimate population parameters from sample statistics.
Examples of population parameters:
Average income across India
Life expectancy across India
Support for incumbent party across the nation
Number of people who watched the Olympics on TV
Customer feedback on a given product
When you take a sample statistic, such as the sample mean or sample percentage, and add and subtract a margin of error, you get a confidence interval.
The width of the confidence interval depends on:
Sample size
Data variability in the sample
Confidence level; a 95% confidence level is a common starting point.
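The effect of these three factors can be sketched with base R's `t.test`, which returns a confidence interval for the mean in its `conf.int` component. The helper `ci_width` and the life expectancy values are hypothetical. Repeating the data four times is only a stand-in for a larger sample with similar spread.

```r
# Hypothetical helper: width of the t-based confidence interval for the mean
ci_width <- function(x, conf_level = 0.95) {
  ci <- t.test(x, conf.level = conf_level)$conf.int
  ci[2] - ci[1]
}

# Made-up life expectancy sample (years)
ages <- c(68, 72, 65, 70, 74, 69, 71, 66, 73, 67)

base_width  <- ci_width(ages)
# Same values repeated 4 times: mimics a larger sample with similar spread
larger_n    <- ci_width(rep(ages, 4))
# Deviations from the mean tripled: same sample size, more variability
more_spread <- ci_width(mean(ages) + 3 * (ages - mean(ages)))
# Same data, higher confidence level
higher_conf <- ci_width(ages, conf_level = 0.99)

print(c(base = base_width, larger_n = larger_n,
        more_spread = more_spread, higher_conf = higher_conf))
```

In this sketch the interval narrows with the larger sample and widens with more variability or a higher confidence level.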
Hypothesis Testing
A hypothesis test is a statistical procedure in which data are collected from a sample and measured against a claim about a population parameter.
For example, if a pizza delivery chain claims to deliver all pizzas within 30 minutes of placing the order, on average, you could test whether this claim is true by collecting a random sample of delivery times over a certain period and looking at the average delivery time for that sample. To make your decision, you must also take into account the amount by which your sample results can change from sample to sample (which is related to the margin of error).
A few hypothesis testing techniques:
t-tests: compare two population means.
Paired t-tests: compare before/after measurements taken on the same subjects.
Tests of claims made about proportions or means for one or more populations.
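The two t-test variants listed above can be sketched with base R's `t.test`. All data values and variable names here are made up for illustration. Note that R runs Welch's version of the two-sample test by default, which does not assume equal variances.

```r
# Made-up delivery times (minutes) at two hypothetical branches
branch_a <- c(28, 31, 27, 33, 29, 30, 26, 32)
branch_b <- c(34, 36, 31, 35, 38, 33, 37, 32)

# Two-sample t-test: are the two population means different?
two_sample <- t.test(branch_a, branch_b)

# Made-up before/after scores for the same 8 subjects
before <- c(72, 65, 80, 58, 77, 69, 74, 62)
after  <- c(75, 70, 82, 63, 78, 74, 79, 66)

# Paired t-test: is the mean of the (before - after) differences zero?
paired <- t.test(before, after, paired = TRUE)

print(two_sample$p.value)
print(paired$p.value)
```

A paired test is equivalent to a one-sample t-test on the per-subject differences, which is why it needs both measurements for every subject.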
The claim on trial is called the null hypothesis (H0), and the scenario in which the claim is untrue is called the alternative hypothesis (HA).
H0: mean pizza delivery time is <= 30 mins
HA: mean pizza delivery time is > 30 mins
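A minimal sketch of this test in R, assuming a made-up sample of delivery times: because the alternative hypothesis is that the mean exceeds 30 minutes, `t.test` is called with `alternative = "greater"` and `mu = 30`.

```r
# Made-up sample of delivery times in minutes
delivery_times <- c(29, 34, 31, 27, 36, 33, 30, 35, 32, 28, 37, 31)

# H0: mean <= 30, HA: mean > 30  ->  one-sided, one-sample t-test
result <- t.test(delivery_times, mu = 30, alternative = "greater")

print(result$statistic)  # t statistic
print(result$p.value)    # one-sided p-value
```

The sample mean alone does not settle the question; the test also weighs how much the mean could vary from sample to sample.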
As another example, suppose you want to verify whether a coin is unbiased.
H0: p(Head) = 0.5
HA: p(Head) != 0.5
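One way to test the coin hypothesis in R is the exact binomial test, `binom.test`, which is two-sided by default. The toss counts below are invented for illustration.

```r
# Made-up experiment: 100 tosses, 58 heads
n_tosses <- 100
n_heads  <- 58

# Exact binomial test of H0: p(Head) = 0.5 against a two-sided alternative
coin_test <- binom.test(n_heads, n_tosses, p = 0.5)

print(coin_test$p.value)   # probability of a result at least this extreme under H0
print(coin_test$conf.int)  # 95% confidence interval for p(Head)
```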
p-values:
The p-value is defined as the probability of obtaining a result as extreme as, or more extreme than, what was actually observed, assuming the null hypothesis is true. Being a probability, it ranges between 0 and 1.
Small p-value (p < cutoff): strong evidence against null hypothesis, so you reject the null hypothesis
Large p-value (p > cutoff): weak evidence against the null hypothesis, so you fail to reject it. This is not the same as proving the null hypothesis true.
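Marginal p-value (p close to the cutoff): the evidence could go either way, so report the p-value and let readers draw their own conclusions.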
Statisticians typically use 0.05 as the p-value cutoff (the significance level); 0.01 is used when stronger evidence is required.
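The definition and the decision rule can be sketched together as follows. The helper `decide`, its `alpha` argument (the cutoff), and the toss counts are hypothetical. The p-value is computed directly from its definition using the binomial distribution under H0.

```r
# Hypothetical helper: turn a p-value into a decision at a chosen cutoff (alpha)
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

# Made-up coin experiment: 100 tosses, 62 heads, H0: p(Head) = 0.5
n <- 100
heads <- 62

# One-sided p-value by definition: P(X >= 62) when H0 is true
p_upper <- pbinom(heads - 1, size = n, prob = 0.5, lower.tail = FALSE)
# The distribution is symmetric under H0, so the two-sided p-value doubles it
p_two_sided <- min(1, 2 * p_upper)

print(p_two_sided)
print(decide(p_two_sided, alpha = 0.05))
print(decide(p_two_sided, alpha = 0.01))
```

The same data can lead to different decisions at different cutoffs, so the cutoff should be chosen before looking at the results.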