Nayland College

Nayland College - Mathematics


3.9 Bivariate

Correlation Coefficient 'r' and The Linear Model


Correlation Coefficient

We can see by looking at the graph whether there is a strong or weak correlation between two variables, and whether that correlation is positive or negative. However, there is a mathematical way of working it out, and that is to calculate the correlation coefficient.

The correlation coefficient 'r' is a measure of the strength and direction of the linear association between two quantitative variables.

This is also known as Pearson's correlation coefficient (after Karl Pearson, 1857-1936). It is represented by the letter r and is a single number that ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation).

Correlation coefficients close to -1 or +1 indicate a strong correlation. Values close to 0 indicate a weak correlation, with 0 itself indicating no linear correlation at all.

(Java app) Regression by eye: a scatter plot is displayed and you can draw in regression lines by hand. You can then compare your lines to the least squares line of best fit. You can also try to guess the correlation coefficient, r. (link to www.ruf.rice.edu)

(Java app) Guess the correlation coefficient competition: four scatter plots and four correlation coefficients are shown, and your task is to match the coefficients to the plots. New plots can be generated and a running score is kept.

 

Line of best fit

If appropriate, a line of best fit can be drawn through the points on a scatter plot.

Linear regression is the process of fitting this line (least squares regression). Technology fits the line of best fit easily.
How to do this in iNZight or EXCEL
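
As a rough illustration of what the technology does when it fits the line, here is a minimal Python sketch of least squares regression. It borrows the Stats (x) and Calc (y) percentage marks from the table further down this page. The function name `fit_line` is made up for this example and is not part of iNZight or Excel.

```python
# Least squares line y = intercept + gradient * x, fitted from the usual sums.
# Data: the Stats (x) and Calc (y) percentage marks from the table on this page.
stats_marks = [72, 58, 85, 12, 34, 25]
calc_marks = [65, 52, 90, 8, 41, 28]


def fit_line(xs, ys):
    """Return (intercept, gradient) of the least squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient


intercept, gradient = fit_line(stats_marks, calc_marks)
print(f"Calc = {gradient:.3f} x Stats + {intercept:.3f}")

# Residual = actual y minus the y predicted by the line
residuals = [y - (intercept + gradient * x) for x, y in zip(stats_marks, calc_marks)]
print("sum of residuals:", round(sum(residuals), 6))
```

The residuals of a least squares line always add up to zero (apart from rounding), which is a quick check that the fit has been done correctly.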

Visually judge the fit of the line to the data (Discuss in context)

Discuss what the linear model represents and what the gradient indicates (Discuss in context)
eg:
"The regression model equation indicates that the energy content increases, on average, by 64.4 kJ for every 1 gram increase in fat.
The model predicts energy content (kJ) = 64.4 x fat content (g) + 545 kJ."
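
The model quoted above can be tried out with a few lines of Python. This is only a sketch: the function name `predict_energy_kj` is invented here, and the chocolate sundae figures (4.5 g fat, 1200 kJ) are taken from the McDonalds example on this page, assuming the model comes from the same data.

```python
def predict_energy_kj(fat_g):
    """Energy (kJ) predicted by the quoted model: 64.4 x fat (g) + 545."""
    return 64.4 * fat_g + 545


energy_10g = predict_energy_kj(10)
energy_11g = predict_energy_kj(11)

# The gradient is the change in predicted energy for one extra gram of fat
increase_per_gram = energy_11g - energy_10g

# Residual for the chocolate sundae: actual energy minus predicted energy
sundae_residual = 1200 - predict_energy_kj(4.5)

print("Predicted energy for 10 g of fat:", round(energy_10g, 1), "kJ")
print("Increase per extra gram of fat:", round(increase_per_gram, 1), "kJ")
print("Sundae residual:", round(sundae_residual, 1), "kJ")
```

A large residual like the sundae's is one way to see why a point might be described as a possible outlier.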

 

Starter 3

Class notes, Class notes
Class site


p53 Ex 'D' Drawing Scatter plots
p63 Ex 'H' Interpreting 'r'

McDonalds Example: Scatter plots

McDonalds Example: Regression

EXTENSION:
Find out about the Coefficient of Determination R², which is the percentage of the variation in 'y' that can be explained by the model (including non-linear models)
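
For a straight-line least squares model, R² is simply the square of the correlation coefficient r. A small Python sketch, using the two r values quoted in the McDonalds example on this page (the function name `percent_explained` is made up for illustration):

```python
def percent_explained(r):
    """Percentage of the variation in y explained by a linear model with correlation r."""
    return 100 * r ** 2


fat_percent = percent_explained(0.9456)    # energy versus fat content
carb_percent = percent_explained(0.6175)   # energy versus carbohydrate content

print(f"Fat model explains about {fat_percent:.1f}% of the variation in energy")
print(f"Carbohydrate model explains about {carb_percent:.1f}% of the variation in energy")
```

For non-linear models R² is not found by squaring r, so this shortcut only applies to the linear case.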

More on Correlation coefficient (link to jerrydallal.com)

Wikipedia information

link to jerrydallal.com with a series of scatterplots with r = 0.7

 

 

 

Quizlet on Scatterplots & Correlation

(extension) Calculation and graph of Residuals.

Class Exemplar:
Hawai'i Island Chain: Data csv, Information page |
Google Doc write up

Class Exemplar:
American New Cars 1993
Data csv, Information pdf |
Google Doc write up

Booklet pg6

 

Key Concepts: correlation coefficient 'r'

  1. Correlation Coefficient 'r' has no units

  2. It is only designed to measure a linear relationship (it is NOT appropriate for curved relationships/models)

  3. Scaling the data (e.g. changing the units) has no effect on 'r'

  4. The order of the data does not affect 'r'

  5. The order of the variables has no effect on 'r'

  6. Both variables must be quantitative

  7. Correlation coefficient is NOT resistant to outliers (see outliers)
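
Several of these key concepts can be checked numerically. The sketch below uses the Stats and Calc percentage marks from the table further down this page. The helper `pearson_r` and the extra "outlier" student (95% in Stats, 5% in Calc) are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


stats_marks = [72, 58, 85, 12, 34, 25]
calc_marks = [65, 52, 90, 8, 41, 28]

r = pearson_r(stats_marks, calc_marks)

# Concept 3: scaling (percentages rewritten as proportions) leaves r unchanged
r_scaled = pearson_r([x / 100 for x in stats_marks], calc_marks)

# Concept 4: listing the students in a different order leaves r unchanged
r_reordered = pearson_r(stats_marks[::-1], calc_marks[::-1])

# Concept 5: swapping which variable is x and which is y leaves r unchanged
r_swapped = pearson_r(calc_marks, stats_marks)

# Concept 7: one made-up unusual student changes r a lot
r_with_outlier = pearson_r(stats_marks + [95], calc_marks + [5])

print(round(r, 3), round(r_scaled, 3), round(r_reordered, 3), round(r_swapped, 3))
print("with outlier:", round(r_with_outlier, 3))
```

The first four values agree, while the single added point pulls r down sharply, which is what "not resistant to outliers" means in practice.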

 

Always plot the data and decide VISUALLY, before rushing into linear model and 'r' calculation!

 

 

No linear relationship, but there is a relationship!

Reasonable linear relationship, but there is a better non-linear relationship!
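
Both situations can be reproduced with made-up data. In this sketch y is exactly x squared, so the relationship is perfect but curved. The data sets and the helper `pearson_r` are hypothetical and are not the data behind the original graphs.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Case 1: a full U-shaped curve. There is a clear relationship, but no linear one.
xs_full = [-3, -2, -1, 0, 1, 2, 3]
ys_full = [x ** 2 for x in xs_full]
r_full_curve = pearson_r(xs_full, ys_full)

# Case 2: only the rising half of the same curve. r looks high,
# even though a quadratic model fits these points exactly.
xs_half = [0, 1, 2, 3, 4, 5, 6]
ys_half = [x ** 2 for x in xs_half]
r_half_curve = pearson_r(xs_half, ys_half)

print("full curve r:", round(r_full_curve, 3))
print("half curve r:", round(r_half_curve, 3))
```

A value of r near 0 does not rule out a relationship, and a value of r near 1 does not prove a straight line is the best model, which is why the plot should be checked first.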

Pearson product-moment correlation coefficient

Which is 'obviously' the same as....

This can be used to calculate 'r'
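
The formulas themselves are not reproduced in the text here. As a sketch, a standard pair of equivalent forms is given below, chosen to match the xy, x² and y² columns of the table that follows. The layout originally shown may have differed. Here n is the number of data pairs, and x̄ and ȳ are the means of x and y.

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}

r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)
                \left(n\sum y^2 - \left(\sum y\right)^2\right)}}
```

The second form only needs the column totals Σx, Σy, Σxy, Σx² and Σy², which is why the table is set out with those columns.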

eg. The old percentage assessment system

 

Student    Stats (x)   Calc (y)   xy    x²    y²
Bill       72%         65%
Ted        58%         52%
B Jelly    85%         90%
D Boot     12%         8%
D Mouse    34%         41%
Jim        25%         28%
Σ

 

 

 

 

 

 

Or it is much easier to use the EXCEL CORREL function
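
For comparison, here is a short Python sketch that fills in the column totals of the table and then calculates r from them, using the standard sums form of Pearson's formula. In Excel the equivalent would be something like `=CORREL(B2:B7, C2:C7)`, where the cell ranges are only an assumption about where the marks have been typed.

```python
from math import sqrt

# Stats (x) and Calc (y) percentage marks from the table
stats_marks = [72, 58, 85, 12, 34, 25]
calc_marks = [65, 52, 90, 8, 41, 28]

n = len(stats_marks)

# Column totals (the Σ row of the table)
sum_x = sum(stats_marks)
sum_y = sum(calc_marks)
sum_xy = sum(x * y for x, y in zip(stats_marks, calc_marks))
sum_x2 = sum(x ** 2 for x in stats_marks)
sum_y2 = sum(y ** 2 for y in calc_marks)

# r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("Σx =", sum_x, " Σy =", sum_y, " Σxy =", sum_xy)
print("Σx² =", sum_x2, " Σy² =", sum_y2)
print("r =", round(r, 3))
```

For these six students r comes out at about 0.98, a very strong positive correlation between the Stats and Calc marks.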

 

Activity: Construct a scatter plot of the data

Form an 'aim for an investigation' relating to the data

Use the 'correl' Excel function to find the correlation coefficient for the scatter plot (learn how to use the correlation function)

Correl Function Spreadsheet to check out (sigma Ex13.02 #3)

Describe the relationship between the variables including the 'r' value


Adding a trend line in iNZight

Make a scatterplot in iNZight

(Achieve) Add a linear trend line

To add the line of fit:

'Add to plot'
'Trend Curves'
'OK'
'Linear'

To add the equation of the line of fit:

 

(Merit) Add non-linear trend lines (find out more)

Comparing groups

 


 

McDonalds Example: Scatterplots

The scatterplot of energy content versus fat content indicates that the higher the fat content (explanatory variable), the greater the energy content of a product (dependent variable).

The correlation coefficient of 0.9456 indicates a very strong positive correlation between fat content (g) and energy content (kJ).

There is a reasonably even scatter of the data, with one possible outlier: the 'chocolate sundae' (4.5 g fat, 1200 kJ energy).

The scatterplot of energy content versus carbohydrate content indicates that the higher the carbohydrate content (explanatory variable), the greater the energy content of a product (dependent variable).

The correlation coefficient of 0.6175 indicates a moderate positive correlation between carbohydrate content (g) and energy content (kJ).

There is an uneven scatter of data: values above 40 g of carbohydrate are scattered more widely about the positive trend than those below.

There is a stronger relationship between fat content and energy content than between carbohydrate content and energy content.

This would indicate that fat content is a better predictor of energy content (as expected, because fat is a more concentrated form of energy).

 

Sigma practice


Ex 13.02 pg 257

Ex 13.03 pg 262

#1) Data Set
#2) Data Set
#3) Data Set
#4) Data Set
#6) Data Set
#7) Data Set

 

Sigma Ex 13.03
Q1 Ans
Q2 Ans
Q3 Ans
Q4 Ans
Q5 Ans
Q6 Ans
Q7 Ans

 
