Nayland College - Mathematics
3.10 Inference
Dolphins video looking at identifying new sub species
Reminder of showing all components in statistics cycle (required for assessment)
Overview of using a given sample from an unknown population to make inferences about the population
Writing a Research Question - practice
3.10 Inference Video 1 - Getting Started by Jeremy Brocklehurst
Otago University video series Maui Dolphin subspecies activity
The key difference from AS 2.9 is that we do not know the population; we are only dealing with a sample.
Often in the media there are claims that two groups are different, for example: "Best Buy pies have less fat than Nayland canteen pies".
To test such a claim mathematically we need to investigate two populations (Best Buy pies and Nayland canteen pies) using the variable 'fat content'.
The fat content of all pies in the two groups (populations) cannot be measured, so we need to take a sample from each population. We also want each sample to be as small as possible, because the pies are destroyed in the testing process.
We take a sample from each group and measure the fat content of each pie. Even though these samples would vary each time, we can use the sample information to produce an estimate of the population median for each group. We cannot be exactly sure of the population median, but we can get a range or interval within which we can be reasonably sure the population median lies. This is called a confidence interval for the population median.
At Level 2 (AS 2.9) we used an approximate method to produce an informal confidence interval for the population median.
At Level 3 we will use a better method called bootstrapping (re-sampling) to produce an approximate 95% confidence interval for the population median. This process involves repeated re-sampling, with replacement, from the sample and does not require a sample size of over 30. Because the re-sampling is repeated many times (1000), we need computer software such as iNZight (free download).
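To make the re-sampling idea concrete, here is a minimal Python sketch of a percentile bootstrap interval for a median. It is not how iNZight does it internally; the function name, the fixed seed and the fat content values are all made up for illustration.

```python
import random
import statistics

def bootstrap_median_ci(sample, n_resamples=1000, seed=0):
    """Approximate 95% percentile bootstrap interval for the population median."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    medians = []
    for _ in range(n_resamples):
        # Re-sample WITH replacement, same size as the original sample
        resample = rng.choices(sample, k=len(sample))
        medians.append(statistics.median(resample))
    medians.sort()
    cut = n_resamples // 40  # 2.5% of the re-sample medians in each tail
    return medians[cut], medians[-cut - 1]

# Hypothetical fat content (grams) of 12 sampled pies, invented for this sketch
fat = [18.2, 21.5, 19.8, 24.1, 20.3, 22.7, 17.9, 23.4, 21.0, 19.5, 25.2, 20.8]

low, high = bootstrap_median_ci(fat)
print("sample median:", statistics.median(fat))
print("approximate 95% bootstrap interval:", low, "to", high)
```

The interval is read straight off the sorted re-sample medians: the middle 95% of them gives the lower and upper limits.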
Producing a confidence interval for the difference between two population medians is a way to mathematically make comparisons between two unknown population medians (an inference), and be able to test the claim above.
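The same re-sampling idea can be sketched for the difference between two medians. This is only an illustration under assumptions: both samples are invented, the function name is hypothetical, and each group is re-sampled separately before the difference of the two medians is recorded.

```python
import random
import statistics

def bootstrap_median_difference_ci(group_a, group_b, n_resamples=1000, seed=0):
    """Approximate 95% bootstrap interval for median(A) - median(B)."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_resamples):
        # Re-sample each group with replacement, keeping its own sample size
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        differences.append(
            statistics.median(resample_a) - statistics.median(resample_b)
        )
    differences.sort()
    cut = n_resamples // 40  # 2.5% in each tail
    return differences[cut], differences[-cut - 1]

# Hypothetical fat content (grams), invented for this sketch
canteen = [24.1, 26.3, 22.8, 27.5, 25.0, 23.9, 28.2, 26.7, 24.6, 25.8]
best_buy = [18.2, 21.5, 19.8, 20.3, 22.7, 17.9, 21.0, 19.5, 20.8, 22.1]

low, high = bootstrap_median_difference_ci(canteen, best_buy)
print("difference in medians (canteen - Best Buy) is likely between", low, "and", high)
if low > 0 or high < 0:
    print("Zero is not in the interval, so the medians appear to differ.")
else:
    print("Zero is in the interval, so we cannot claim the medians differ.")
```

If the whole interval lies on one side of zero, the sample evidence supports the claim that one population median is larger than the other.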
This process also follows the PPDAC statistical cycle.
Populations usually have unknown parameters (unknown mean, median, standard deviation, etc.) because the population is too large or difficult to measure, or because measuring the items requires their destruction or death.
Each time we sample from a population our sample varies, and we get different sample statistics (mean, median, etc.) and sample displays (box plots). We can try to use these to estimate the true population parameters, but sampling errors can be large (especially for small sample sizes).
We can use our sample to construct a confidence interval within which we can be reasonably sure the actual population mean (or median) will lie. Even though every sample taken will vary, and every confidence interval will be different, overall most (about 95%) of the confidence intervals will contain the true population mean (or median).
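The "most (95%) of the intervals contain the true value" idea can be checked with a small simulation. In this sketch we pretend to know the population, which we never do in practice. The population itself (10,000 values, centre 20, spread 3) and the sample size of 30 are assumptions chosen only for illustration.

```python
import random
import statistics

def bootstrap_median_ci(sample, rng, n_resamples=1000):
    """Approximate 95% percentile bootstrap interval for the median."""
    medians = sorted(
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    cut = n_resamples // 40  # 2.5% in each tail
    return medians[cut], medians[-cut - 1]

rng = random.Random(1)

# A made-up population whose true median we can calculate directly
population = [rng.gauss(20, 3) for _ in range(10_000)]
true_median = statistics.median(population)

n_samples = 200
hits = 0
for _ in range(n_samples):
    sample = rng.sample(population, 30)          # a new random sample each time
    low, high = bootstrap_median_ci(sample, rng)
    if low <= true_median <= high:               # did this interval capture it?
        hits += 1

coverage = hits / n_samples
print(f"{hits} of {n_samples} intervals contained the true median ({coverage:.0%})")
```

Each sample gives a different interval, but the proportion that capture the true median should come out reasonably close to 95%.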
A larger sample results in a narrower confidence interval.
The greater the sample size, the better our estimate of the population mean (or median).
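The effect of sample size on interval width can be seen in a rough simulation. The population is again invented. The sample sizes (10, 40, 160) and the choice to average over 20 samples per size are assumptions made only to keep the sketch short.

```python
import random
import statistics

def bootstrap_ci_width(sample, rng, n_resamples=1000):
    """Width of an approximate 95% bootstrap interval for the median."""
    medians = sorted(
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    cut = n_resamples // 40  # 2.5% in each tail
    return medians[-cut - 1] - medians[cut]

rng = random.Random(2)
population = [rng.gauss(20, 3) for _ in range(10_000)]  # made-up population

average_width = {}
for size in (10, 40, 160):
    # Average over 20 different samples so one unusual sample does not mislead us
    widths = [
        bootstrap_ci_width(rng.sample(population, size), rng)
        for _ in range(20)
    ]
    average_width[size] = statistics.mean(widths)
    print(f"sample size {size:3d}: average interval width {average_width[size]:.2f}")
```

The average width should shrink as the sample size grows, which is why a larger sample gives a more precise estimate.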
A confidence interval does NOT correct for poor sampling methods, biased samples, or samples that do not represent the population well or contain errors. If the sample data is poor, the resulting confidence interval will be poor.
To decide whether two population means (or medians) are different, we construct a confidence interval for each population. If the confidence intervals overlap then we cannot say the population means are different. If the confidence intervals do not overlap then we can be reasonably sure that the population means are different.
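The overlap rule can be written as a short check. This is just a sketch: the helper name and both intervals are hypothetical, standing in for intervals you would get from your own samples.

```python
def intervals_overlap(interval_a, interval_b):
    """Return True if two (low, high) intervals share at least one value."""
    low_a, high_a = interval_a
    low_b, high_b = interval_b
    return low_a <= high_b and low_b <= high_a

# Hypothetical confidence intervals for fat content (grams)
canteen_interval = (23.9, 26.7)
best_buy_interval = (19.5, 21.5)

if intervals_overlap(canteen_interval, best_buy_interval):
    print("The intervals overlap: we cannot say the populations differ.")
else:
    print("The intervals do not overlap: the populations appear to differ.")
```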