Nayland College - Mathematics

2.9 Inference Revision

Practice Example

Steps:
1) Check out the variables & population information
2) Upload the data set into iNZight and investigate
3) Decide on your variable & groups
4) Make a copy of the guided google doc write up and do your introduction
5) Use iNZight to make a random sample, create a boxplot & sample statistics
7) Discuss your boxplot and dot plot
8) Discuss sample variability
9) Use your sample statistics to calculate an informal confidence interval
10) Discuss your informal confidence interval
11) Write a conclusion

Then get some feedback and sort out what you don't know before the actual assessment

Auckland University Student Survey:
Data set, Variable Info
The data was collected in an internet-based survey of Stage 1 students at Auckland University in 2009, covering weights, shoes, work, drinking, etc.

Qualifications & Work Practice:
Data set, Variable Info
This is 2,000 records from Statistics New Zealand. Data sampled from this database can be treated as if it was sampled from “the New Zealand population aged between 25 and 64 years of age who participate in paid work” in 2003.

SURF Households Savings Survey
Data set, Variable Info
The 2001 HSS was a cross-sectional nationwide survey that collected information on the net worth (assets minus liabilities) of New Zealanders.

Practice Assessments

2.9 A (Word, 135 KB) | 2.9 B (Word, 291 KB) | 2.9 D (Word, 157 KB)

Annotated student responses with explanatory notes (link to NZQA)

NOTE: these NZQA exemplars have not been corrected for the latest information on the NZQA clarifications page

Inference (2.9A) - ASSESSMENT RESOURCE A
Inference (2.9B) - ASSESSMENT RESOURCE B

Read over, make notes and learn what is expected, but do not assume the grading shown is fully correct.

POSSIBLE WRITE UP

Problem/Question

Formed a Comparative Research Question

I wonder if the median number of hours worked per week of the male New Zealand workforce tends to be greater than the median hours worked per week of the female workforce in NZ.

Variables defined with units

Hours worked are obviously measured in hours

Population defined (and any assumptions)

Even though it is not specified, we assume that the data is collected from a range of people in the NZ workforce.

A prediction is made: ‘I think that...’

I think that males tend to work a greater number of hours a week than females.

Reason why making your prediction: Why ‘I think that...’

This is probably because females tend to have more time taken up by children so cannot work as many hours each week. More mothers may be involved in part time work.

Plan

Use iNZight to analyse and display the data and add informal confidence interval bars (save the display image).

Use the sample statistics to calculate the informal confidence interval.

Data

Did you have to ‘clean’ the data? (discuss)

I checked to see if there was any missing data or abnormal values (such as work time in minutes instead of hours).

Comment on the sampling method.

I took a simple random sample from each gender group in the population. Each member of the population was allocated a random number, then the data was sorted into gender groups and then by random number. I then took the first 30 males and the first 30 females as my sample. This ensured each member of a gender group had an equal chance of being selected for that group's sample.
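The sampling procedure described above could be sketched in Python as follows. This is only an illustration, not what iNZight does internally: the function name `sample_by_group`, the field names `gender` and `hours`, and the made-up population are all assumptions introduced for the example.

```python
import random

def sample_by_group(records, group_key, n, seed=None):
    """Give every record a random number, sort by it, then keep the
    first n records seen for each group (hypothetical helper)."""
    rng = random.Random(seed)
    # Allocate a random number to each member of the population.
    tagged = [(rng.random(), rec) for rec in records]
    # Sorting by the random number puts the records in random order.
    tagged.sort(key=lambda pair: pair[0])
    samples = {}
    for _, rec in tagged:
        chosen = samples.setdefault(rec[group_key], [])
        if len(chosen) < n:
            chosen.append(rec)
    return samples

# Made-up population, for illustration only (not the real data set).
population = [
    {"gender": "male" if i % 2 else "female", "hours": 10 + i % 50}
    for i in range(400)
]
samples = sample_by_group(population, "gender", n=30, seed=1)
print({group: len(rows) for group, rows in samples.items()})
```

Within each gender group every record is equally likely to land in the first 30 positions, which is what makes each group's sample random.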

Comment on the effect of sample size.

A sample of 30 values from each group (males and females) is the minimum number needed so that the sample is reasonably representative of the population.

Analysis

Group    minimum  LQ  median  UQ     maximum  IQR    mean   Sample size
female   2        15  36.5    48.75  70       33.75  31.13  30
male     6        20  32      40     70       20     30.63  30

Informal Confidence Interval for female workers is
36.5 +/- 1.5*33.75/sqrt(30) = 27.3 to 45.7 hours per week

Informal Confidence Interval for male workers is
32 +/- 1.5*20/sqrt(30) = 26.5 to 37.5 hours per week
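As a check on the arithmetic, the formula median +/- 1.5*IQR/sqrt(n) can be evaluated directly from the sample statistics in the table. The sketch below is a minimal illustration; the function name `informal_ci` is an assumption made for this example.

```python
import math

def informal_ci(median, iqr, n):
    """Informal confidence interval: median +/- 1.5 * IQR / sqrt(n)."""
    margin = 1.5 * iqr / math.sqrt(n)
    return median - margin, median + margin

# Sample statistics taken from the summary table.
female_ci = informal_ci(median=36.5, iqr=33.75, n=30)
male_ci = informal_ci(median=32, iqr=20, n=30)

print("female:", [round(x, 1) for x in female_ci])
print("male:  ", [round(x, 1) for x in male_ci])
```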

Discussion of sampling variability, including the variability of estimates (ESSENTIAL for Achieve)

If I repeated the sampling process, the sample (and its box plot and summary statistics) would be different, because each sample is a random selection from the population. Because my sample size is 30, the sample should represent the population reasonably well, so although different samples will differ from each other, they will generally be similar (with similar box plots and statistics).

Discussion of sample statistics comparing two samples.

From the sample - the mean hours worked per week for females (31.1 hr) is slightly greater than the male mean hours worked per week (30.6hr) although the difference is not great.

From the sample - the median hours worked per week for females (36.5 hr) is greater than the male median hours worked per week (32hr) indicating that females could work for longer per week than males

Discussion of each box plot/dot plot ‘outliers & extreme values’

There are no outliers or extreme data values from either sample group (male & female) as the maximum (& minimum) data values are within 1.5 box widths above (& below) the upper (& lower) quartile
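The "1.5 box widths" rule can be checked from the five-number summaries alone. The sketch below is a rough illustration using the values in the summary table; the function names are assumptions made for this example.

```python
def outlier_fences(lq, uq):
    """Return the lower and upper fences, 1.5 box widths beyond the quartiles."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

def has_outliers(minimum, maximum, lq, uq):
    """True if the minimum or maximum lies outside the fences."""
    low, high = outlier_fences(lq, uq)
    return minimum < low or maximum > high

# Five-number summaries from the summary table.
print("female fences:", outlier_fences(15, 48.75), has_outliers(2, 70, 15, 48.75))
print("male fences:  ", outlier_fences(20, 40), has_outliers(6, 70, 20, 40))
```

With these statistics the male maximum of 70 sits exactly on the upper fence, so it counts as within 1.5 box widths only because the comparison here is strict.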

Discussion of box plots ‘shift & overlap & middle 50%’

There is a lot of overlap between the male middle 50% box in the box plot and the female middle 50%. In fact, the male middle 50% is contained within the female upper and lower quartiles. This shows little difference in the number of hours worked per week for males and females.

There is little shift when comparing the two box plots.

The middle 50% of the female hours worked per week does seem more spread out than the male hours worked (interquartile range of 33.75 hr for females and 20 hr for males). This indicates greater variation in the number of hours worked by females, with some part time workers and some working long hours.

Discussion of each box plot/dot plot ‘distribution, groups, cluster, gaps, symmetry, skew’ and possible causes

There is a group of males who work 40 hr per week, which is not surprising as the traditional working week is 40 hr. There is also a reasonably even distribution from 10 to 40 hours for male workers, with a few working long hours each week (up to 70 hr).

The females seem more grouped, with a group working part time (2 to 10 hr per week) who may be working mothers.

There is another group of females who work about 20 hr per week. These could be mothers with kids in school, so they can work during school hours.

There is another group of females working 40 to 60 hr per week who are full time workers.

Discussion of what each informal confidence interval means (in context & referring back to the population and how sure you are)

From the informal confidence interval we can be reasonably sure that the median for the whole population of males is between 26.5 and 37.5 hours worked per week (for workers in NZ).

From the informal confidence interval we can be reasonably sure that the median for the whole population of females is between 27.3 and 45.7 hours worked per week (for workers in NZ).

Discussion of informal confidence interval overlap / or not (in context & referring back to the population and how sure you are)

Because these informal confidence intervals overlap we have no evidence that suggests the median hours worked per week for males is different from the median hours worked per week for females.
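Whether two intervals overlap can be tested mechanically: they overlap when each one starts before the other ends. The sketch below recomputes both intervals from the summary table and applies that test; the function names are assumptions made for this example.

```python
import math

def informal_ci(median, iqr, n):
    """Informal confidence interval: median +/- 1.5 * IQR / sqrt(n)."""
    margin = 1.5 * iqr / math.sqrt(n)
    return median - margin, median + margin

def intervals_overlap(a, b):
    """True if intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Medians, IQRs and sample sizes from the summary table.
female_ci = informal_ci(36.5, 33.75, 30)
male_ci = informal_ci(32, 20, 30)

if intervals_overlap(female_ci, male_ci):
    print("Intervals overlap: no evidence of a difference in population medians")
else:
    print("Intervals do not overlap: evidence that the population medians differ")
```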

Conclusion

I thought that males would tend to work for a greater number of hours a week than females. From my sample and analysis we cannot make this conclusion. There seems to be no measurable difference in the medians between the males and females, although there is some indication that females have more spread in the number of hours worked per week than males.

Discuss errors, bias, omissions, improvements, further research...

Sample Size
As my sample for each group was 30 people, the informal confidence interval was quite wide, and as a result I could not determine if there was a difference between them. If I repeated the sampling with a larger sample size, say 100 males and 100 females, then the informal confidence interval would be narrower for each population median estimate. This may then give a result where there is no overlap between the informal confidence intervals, and so we could be reasonably sure the two population medians would be different.
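The effect of sample size on the interval width follows from the sqrt(n) in the formula. The sketch below compares n = 30 with n = 100 under the assumption (made only for illustration) that a larger sample would have the same IQR as the current one; in practice the IQR and median would change with a new sample. The function name `ci_width` is hypothetical.

```python
import math

def ci_width(iqr, n):
    """Full width of the informal confidence interval: 2 * 1.5 * IQR / sqrt(n)."""
    return 2 * 1.5 * iqr / math.sqrt(n)

# IQRs from the summary table; assumed unchanged for the larger sample.
for group, iqr in (("female", 33.75), ("male", 20)):
    for n in (30, 100):
        print(f"{group}, n={n}: width {ci_width(iqr, n):.1f} hours")
```

Because the width is proportional to 1/sqrt(n), going from 30 to 100 in each group shrinks each interval to a little over half its current width.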

Sampling
There are other sampling methods that I could have used. The first one is cluster sampling. You sample different levels of groups, for example select a region to sample, then a school within that region. The problem with cluster sampling is that it may not represent the whole population if the clusters chosen happen to be similar to each other, and therefore leave something out. Systematic sampling is where you use a system to decide what you sample. For example, you decide to select every fourth student in a school when they are lined up. The disadvantage of this is that it is time consuming. Stratified sampling is where a population has subgroups that could easily be missed out, so we randomly sample from each group. The decision as to sampling method is more related to the issues with the population we were provided with (see below).
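The three alternative methods can be contrasted in a short Python sketch. Everything here is illustrative: the function names, the `school` field and the list of 100 made-up students are assumptions introduced for the example, not part of the data sets above.

```python
import random

def group_by(items, key):
    """Collect items into a dict of lists keyed by key(item)."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups

def systematic_sample(items, step, rng):
    """Take every step-th item, starting from a random position."""
    start = rng.randrange(step)
    return items[start::step]

def stratified_sample(items, key, n_per_group, rng):
    """Randomly sample n_per_group items from every subgroup."""
    groups = group_by(items, key)
    return {g: rng.sample(members, n_per_group) for g, members in groups.items()}

def cluster_sample(items, key, n_clusters, rng):
    """Randomly choose whole clusters and keep everyone in them."""
    groups = group_by(items, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [item for c in chosen for item in groups[c]]

# Made-up students spread over five hypothetical schools.
students = [{"id": i, "school": f"school_{i % 5}"} for i in range(100)]
rng = random.Random(0)

every_fourth = systematic_sample(students, 4, rng)
per_school = stratified_sample(students, lambda s: s["school"], 5, rng)
two_schools = cluster_sample(students, lambda s: s["school"], 2, rng)
print(len(every_fourth), len(per_school), len(two_schools))
```

The stratified sample contains every school, while the cluster sample contains only the schools that happened to be chosen, which is the representativeness risk noted above.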

Population
There is no indication how the population provided was obtained. The population was not all the adult workforce in New Zealand, so we can assume it was also a sample from the actual whole population. How was this sample obtained? Is it representative of the whole country? If there is a bias or the sample is not representative then my analysis will be biased as well.