07 Sampling using R

Margin of Error

The margin of error measures the maximum amount by which a sample result is expected to differ from the true population value, at a given confidence level.
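A minimal sketch of the margin of error for a sample mean, assuming a simple random sample, a 95% confidence level, and the normal-approximation formula MOE = z * s / sqrt(n). The function name `margin_of_error` is hypothetical, and the income data are simulated with `rnorm()`, not real figures.

```r
# Hypothetical helper: margin of error for a sample mean,
# using the normal approximation z * s / sqrt(n)
margin_of_error <- function(x, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # critical value, about 1.96 for 95%
  z * sd(x) / sqrt(length(x))
}

set.seed(42)                                      # reproducible simulated data
incomes <- rnorm(200, mean = 50000, sd = 12000)   # hypothetical sample of incomes
moe <- margin_of_error(incomes)
moe
```

The result is in the same units as the data (here, currency units of income).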

Confidence Interval

One of the main goals of statistics is to estimate population parameters from sample statistics.

Examples of population parameters:

  • Average income across India

  • Life expectancy across India

  • Support for the incumbent party across the nation

  • Number of people who watched the Olympics on TV

  • Feedback of customers for a given product

When you take a sample statistic, such as the sample mean or sample percentage, and add and subtract a margin of error, you get a confidence interval.
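A sketch of "statistic plus or minus margin of error" for a mean. It assumes the population standard deviation is unknown, so it uses a t critical value from `qt()` (a common choice, swapped in here instead of the z value), and checks the hand-built interval against base R's `t.test()`. The delivery-time data are simulated, not real.

```r
set.seed(1)
delivery <- rnorm(40, mean = 29, sd = 5)   # hypothetical sample of delivery times (minutes)

n    <- length(delivery)
xbar <- mean(delivery)                                    # sample statistic
moe  <- qt(0.975, df = n - 1) * sd(delivery) / sqrt(n)    # 95% margin of error
ci_manual <- c(lower = xbar - moe, upper = xbar + moe)    # statistic +/- MOE

# Base R computes the same 95% interval for a one-sample mean
ci_builtin <- t.test(delivery, conf.level = 0.95)$conf.int
ci_manual
ci_builtin
```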

The width of the confidence interval depends on:

  • Sample size

  • Data variability in the sample

  • Confidence level; generally speaking, a 95% confidence level is a good starting point.
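A small sketch of how each factor in the list above changes the interval, assuming a z-based interval for a mean whose width is 2 * z * s / sqrt(n). The function `ci_width` and the chosen values of `s` and `n` are illustrative assumptions.

```r
# Hypothetical helper: width of a z-based confidence interval for a mean
ci_width <- function(s, n, conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  2 * z * s / sqrt(n)
}

# Larger samples give narrower intervals (s = 10, 95% confidence)
widths_by_n <- sapply(c(25, 100, 400), function(n) ci_width(10, n, 0.95))

# More variable data gives wider intervals (n = 100, 95% confidence)
widths_by_s <- sapply(c(5, 10, 20), function(s) ci_width(s, 100, 0.95))

# Higher confidence gives wider intervals (s = 10, n = 100)
widths_by_level <- sapply(c(0.90, 0.95, 0.99), function(cl) ci_width(10, 100, cl))

round(widths_by_n, 2)      # 7.84 3.92 1.96
round(widths_by_level, 2)  # 3.29 3.92 5.15
```

Note that quadrupling the sample size only halves the width, because n enters through its square root.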

Hypothesis Testing

A hypothesis test is a statistical procedure in which data are collected from a sample and measured against a claim about a population parameter.

For example, if a pizza delivery chain claims that, on average, it delivers pizzas within 30 minutes of the order being placed, you could test this claim by collecting a random sample of delivery times over a certain period and looking at the average delivery time for that sample. To make your decision, you must also take into account how much your sample results can vary from sample to sample (which is related to the margin of error).

A few common hypothesis tests:

  • t-tests: compare a sample mean with a claimed value (one-sample) or compare two population means (two-sample)

  • Paired t-tests: compare before/after measurements taken on the same subjects

  • Tests of claims about proportions or means for one or more populations
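A sketch of these tests using base R's `t.test()` and `prop.test()`. All data are simulated or made up for illustration (two hypothetical stores, hypothetical before/after ratings, and a hypothetical count of 130 satisfied customers out of 200 against a claimed 60%). Note that `t.test()` runs the Welch version of the two-sample test by default (`var.equal = FALSE`).

```r
set.seed(7)

# Two-sample t-test: do two hypothetical stores have the same mean delivery time?
store_a <- rnorm(30, mean = 28, sd = 4)
store_b <- rnorm(30, mean = 31, sd = 4)
two_sample <- t.test(store_a, store_b)   # Welch two-sample t-test

# Paired t-test: the same hypothetical customers rated before and after a change
before <- rnorm(20, mean = 6, sd = 1)
after  <- before + rnorm(20, mean = 0.5, sd = 0.5)
paired <- t.test(after, before, paired = TRUE)

# Test of a claim about one proportion: 130 of 200 satisfied vs. a claimed 60%
one_prop <- prop.test(x = 130, n = 200, p = 0.60)

c(two_sample = two_sample$p.value,
  paired     = paired$p.value,
  one_prop   = one_prop$p.value)
```

A paired t-test is equivalent to a one-sample t-test on the differences `after - before`, which is why pairing matters: it removes the person-to-person variation.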

The claim on trial is called the null hypothesis (H0), and the scenario in which the claim is untrue is called the alternative hypothesis (HA).

H0: average pizza delivery time is <= 30 mins

HA: average pizza delivery time is > 30 mins
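A sketch of the pizza test in R, assuming delivery times are roughly normal so a one-sample t-test is reasonable. Because HA is "greater than 30", the test is one-sided (`alternative = "greater"`). The 50 delivery times are simulated, not real data.

```r
set.seed(30)
# Hypothetical sample of 50 delivery times in minutes (simulated)
delivery_times <- rnorm(50, mean = 31.5, sd = 6)

# H0: mean <= 30 vs HA: mean > 30, so an upper-tail test against mu = 30
res <- t.test(delivery_times, mu = 30, alternative = "greater")

res$statistic   # t = (sample mean - 30) / (s / sqrt(n))
res$p.value     # chance of a sample mean this large or larger if the true mean were 30
```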

As another example, suppose you want to check whether a coin is unbiased.

H0: p(Head) = 0.5

HA: p(Head) ≠ 0.5
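A sketch of the coin test with base R's exact `binom.test()`, assuming a hypothetical result of 60 heads in 100 tosses. The default `alternative = "two.sided"` matches an alternative of "the coin is biased in either direction".

```r
# Hypothetical data: 60 heads in 100 tosses
res <- binom.test(x = 60, n = 100, p = 0.5)

res$p.value    # roughly 0.057
res$conf.int   # 95% confidence interval for p(Head)
```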

p-values:

The p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming the null hypothesis is true. Being a probability, it ranges from 0 to 1.
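The definition can be checked directly by simulation: generate many experiments in a world where H0 is true and count how often the result is at least as extreme as the observed one. This sketch assumes the hypothetical coin result of 60 heads in 100 tosses and treats "extreme" as "at least as far from 50 heads, in either direction"; it then compares the simulated value with the exact `binom.test()` p-value.

```r
set.seed(123)
n_tosses <- 100
observed <- 60                      # hypothetical observed number of heads

# Simulate 100,000 experiments assuming H0 is true: p(Head) = 0.5
sims <- rbinom(100000, size = n_tosses, prob = 0.5)

# "Equal to or more extreme": at least as far from 50 as the observed result
extreme <- abs(sims - n_tosses / 2) >= abs(observed - n_tosses / 2)
p_sim <- mean(extreme)

p_exact <- binom.test(observed, n_tosses, p = 0.5)$p.value
c(simulated = p_sim, exact = p_exact)   # the two should be close
```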

Small p-value (p < cutoff): strong evidence against the null hypothesis, so you reject it.

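Large p-value (p > cutoff): weak evidence against the null hypothesis, so you fail to reject it. This does not prove the null hypothesis is true; the data simply do not contradict it strongly enough.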

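Marginal p-value (p very close to the cutoff): the result could go either way, so always report the actual p-value and let readers draw their own conclusions.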

Statisticians typically use 0.05 as the p-value cutoff (significance level); stricter applications use smaller cutoffs such as 0.01.
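A sketch of the decision rule as a tiny R function. The name `decide` is hypothetical, and the default cutoff of 0.05 is an assumption you can change with `alpha`. The p-value comes from a hypothetical coin result of 60 heads in 100 tosses; it sits close to 0.05, so the decision flips depending on the cutoff chosen.

```r
# Hypothetical helper: compare a p-value with a chosen cutoff (significance level)
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

p <- binom.test(60, 100, p = 0.5)$p.value   # roughly 0.057

decide(p, alpha = 0.05)   # "fail to reject H0"
decide(p, alpha = 0.10)   # "reject H0"
```

Because a p-value this near the cutoff is marginal, reporting the number itself is more informative than reporting only the decision.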