AS 91581 Achievement Standard.


PPDAC cycle

AS 3.9 grade criteria
- Achieved (A): Investigate bivariate measurement data.
- Merit (M): Investigate bivariate measurement data, with justification.
- Excellence (E): Investigate bivariate measurement data, with statistical insight.
One of the principles: Grade distinctions should not be based on the candidate being required to acquire and retain more subject-specific knowledge.

Assessment is based on the PPDAC cycle: pose a question, plan, data, analyse, conclude, then back to the question.

The Statistical Enquiry Cycle (PPDAC)

PPDAC is vital

Using the statistical enquiry cycle to … investigate bivariate measurement data involves:
- posing an appropriate relationship question using a given multivariate data set
- selecting and using appropriate displays
- identifying features in data
- finding an appropriate model
- describing the nature and strength of the relationship and relating this to the context
- using the model to make a prediction
- communicating findings in a conclusion
(From Explanatory Note 3)

Posing relationship questions
- Possibly the most important component of the investigation.
- Time spent on this component can determine the overall quality of the investigation.
- This component provides an opportunity to show justification (M) and statistical insight (E).

Posing relationship questions
What makes a good relationship question?
- It is written as a question.
- It is written as a relationship question.
- It can be answered with the data available.
- The variables of interest are specified.
- It is a question whose answer is useful or interesting.
- The question is related to the purpose of the task.
- Think about the population of interest. Can the results be extended to a wider population?

Data

We need to understand the variables before we can make any progress. WHOA!

Consider the variables (using context)

Think about which variables could be related. JUSTIFY YOUR REASONING.

Developing question posing skills
- Pose several relationship questions (written with reasons/justifications).
- Possibly critique the questions.
- The precise meaning of some variables may need to be researched.

Different relationship questions
- Is there a relationship between variable 1 and variable 2 for Hector’s dolphins?
- What is the nature of the relationship between variable 1 and variable 2 for Hector’s dolphins?
- Can variable 1 be used to predict variable 2 for Hector’s dolphins?

Research about the situation helps develop ideas. We might like to ask ourselves why we would want to know about any relationships that might exist.

Reference to research

HINT: Groupings

A morphological study of skull and mandible features was undertaken to examine variation between the most genetically distinct population, occurring on the west coast of the North Island, and the populations around the South Island. Univariate and principal component analyses demonstrate that the North Island population can be differentiated from the southern populations on the basis of several skeletal characters.

Developing question posing skills
It is expected that you do some research:
- Improve your knowledge of the variables and the context.
- You may find some related studies that create potential for integration of statistical and contextual knowledge.

Developing question posing skills
- Draw some scatter plots to start to investigate your questions.
- Reduce, add to and/or prioritise your list of questions.
- Possibly critique the questions again.

Consider the variables (using context)

Appropriate displays
Which variable goes on the x-axis and which goes on the y-axis? It depends on the question and on the variables of interest.

Variables on axes
Question: “Is there a relationship between zygomatic width and rostrum length for Hector’s dolphins?”
This is a straightforward relationship question of the ‘correlation’ type. There is no difference in the roles of the two variables, so for this question it does not matter which variable goes on the x-axis and which goes on the y-axis.

Variables on axes
Question: “Is there a relationship between rostrum width at midlength and rostrum width at the base for Hector’s dolphins?”
“I think that the width at the base could help determine (or influence) the width at midlength.” So rostrum width at base (RWB) is the explanatory variable and rostrum width at midlength (RWM) is the response variable. The explanatory variable goes on the x-axis and the response variable on the y-axis.

Variables on axes
Question: “For Hector’s dolphins, can rostrum length be used to predict mandible length?”
This is a regression question. It is clear that we are interested in how mandible length responds to rostrum length, so rostrum length is the explanatory variable and mandible length is the response variable.

Experiments: If the data comes from an experiment then some variables would be classed as input variables and others as output variables. It would make sense to see how an input variable affects an output variable. The input variable would be the explanatory variable and the output variable would be the response variable.

The same approach applies whether the relationship is linear or non-linear. Using the statistical enquiry cycle to … investigate bivariate measurement data involves:
- posing an appropriate relationship question using a given multivariate data set
- selecting and using appropriate displays
- identifying features in data
- finding an appropriate model
- describing the nature and strength of the relationship and relating this to the context
- using the model to make a prediction
- communicating findings in a conclusion

Features, model, nature and strength
Generate the scatter plot of RWM versus RWB using iNZight (or Excel).

Features, model, nature and strength
- Generate the scatter plot.
- Let the data speak.
- Use your eyes (visual aspects).
- Write about what you see, not what you don’t see.
- DON’T fit a model yet.

Features, model, nature and strength
Template for features (but allow flexibility):
- Trend
- Association (nature)
- Strength (degree of scatter)
- Groupings/clusters
- Unusual observations
- Other (e.g., variation in scatter)
If there are groupings evident in the plot then it may be better to explore the groups separately at this stage.

Trend
Example: “From the scatter plot it appears that there is a linear trend between rostrum width at base and rostrum width at midlength. This is a reasonable expectation because two different measures on the same body part of an animal could be in proportion to each other.”
- The key point is whether the trend is linear or non-linear.
- Use descriptions of variables rather than variable names; this keeps it contextual.
- Refer to the graph.
- The last sentence of the example illustrates some reflection.

Association
Example: “The scatter plot also shows that as the rostrum width at base increases the rostrum width at midlength tends to increase.”
- A contextual description is preferable to one using technical terms. It is appropriate to use terms such as positive, negative or no association, but they are better used after the contextual description.
- “Tends to” is a good term to use.

Higher level considerations
Example: “This is to be expected because dolphins with small rostrums would tend to have small values for rostrum widths at base and midlength, and dolphins with large rostrums would tend to have large values for rostrum widths at base and midlength.”
- You should reflect on the nature of the relationship with respect to the context.
- At this stage you could acknowledge (if the data does not come from a randomised experiment) that you have found only a statistical relationship and that this does not necessarily imply a causal relationship between the variables.
- Alternatively, if the data comes from a suitable experiment, you could make a causation claim in your conclusion.
- You may acknowledge that other variables (which you must name), for example gender or age, would impact on the response variable, suggest how they might impact on it, and perhaps show these and compare.

Find a model
Example: “Because the trend is linear I will fit a linear model to the data. The line is a good model for the data because, for all values of rostrum width at base, the number of points above the line is about the same as the number below it.”
- You must state why you have selected this particular model.
- Don’t show the equation yet. There are still features in the data to comment on.
- A discussion of fit throughout the range of x-values is required.
- If the number of observations is small, that casts some doubt on the reliability of the model.
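The "points above versus points below" check can be sketched in code. This is only an illustration: the data values are invented, the function names `fit_line` and `above_below` are hypothetical, and iNZight or Excel would normally fit the line for you.

```python
# Hypothetical (RWB, RWM) measurements in mm, invented for illustration only.
points = [(70, 45), (72, 47), (74, 47), (76, 50), (78, 50), (80, 53), (82, 53), (84, 56)]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def above_below(points, slope, intercept):
    """Count points [above, below] the line in the lower and upper halves of the x-range."""
    xs = [x for x, _ in points]
    mid = (min(xs) + max(xs)) / 2
    counts = {"low x": [0, 0], "high x": [0, 0]}
    for x, y in points:
        half = "low x" if x <= mid else "high x"
        residual = y - (slope * x + intercept)
        if residual > 0:
            counts[half][0] += 1
        elif residual < 0:
            counts[half][1] += 1
    return counts


slope, intercept = fit_line(points)
print(above_below(points, slope, intercept))
```

Roughly equal counts in each part of the x-range support the claim of fit "throughout the range of x-values"; the visual check on the plot should still come first.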

Strength
Example: “The points on the graph are reasonably close to the fitted line so the relationship between rostrum width at midlength and rostrum width at base is reasonably strong. This is supported by the correlation coefficient r = ...”
- This must refer to visual aspects of the display: the degree of scatter about the trend or, equivalently, the closeness of the points to the trend.
- Appropriate descriptors are: strong, moderate or weak.
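For reference, the correlation coefficient that supports the visual description can be computed as below. This is a minimal sketch: the data values are invented and the function name `correlation` is hypothetical; iNZight reports r directly.

```python
from math import sqrt

# Hypothetical (RWB, RWM) measurements in mm, invented for illustration only.
points = [(70, 45), (72, 47), (74, 47), (76, 50), (78, 50), (80, 53), (82, 53), (84, 56)]


def correlation(points):
    """Return the Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sqrt(sxx * syy)


print(f"r = {correlation(points):.2f}")
```

The value of r backs up the description of strength; it does not replace the comment on the degree of scatter seen in the plot.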

Groupings
Example: “None are apparent from the scatter plot.”
If groupings are apparent, can a reason be found that explains them? Here none are apparent from the scatter plot, but the data set has an obvious grouping variable (Island).

iNZight

Groupings

Groupings
NI Hector’s dolphins now seem to have larger rostrum widths at base than SI ones. This is helpful for doing predictions. There are some common values of RWB for both groups of dolphins, which allows more comments to be made about the model fitted to all data points.
Many NI points are above the line. Why is that? It could be the effect on the fitted line of the unusual point (RWB = 86, RWM = 50). Use iNZight to look at NI and SI dolphins separately (Subset by Island).
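Subsetting by a grouping variable and fitting a line to each group can be sketched as follows. The rows are invented for illustration, and `fit_line` and `fit_by_group` are hypothetical helper names; in practice iNZight's subset option does this for you.

```python
from collections import defaultdict

# Hypothetical (island, RWB, RWM) rows in mm, invented for illustration only.
rows = [
    ("NI", 82, 58), ("NI", 84, 60), ("NI", 86, 60), ("NI", 88, 62),
    ("SI", 76, 49), ("SI", 78, 51), ("SI", 80, 51), ("SI", 84, 53),
]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_group(rows):
    """Fit a separate line for each group label; returns {group: (slope, intercept)}."""
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))
    return {group: fit_line(pts) for group, pts in groups.items()}


for group, (slope, intercept) in fit_by_group(rows).items():
    print(f"{group}: RWM = {slope:.2f} * RWB + {intercept:.2f}")
```

Comparing the per-group lines with the line fitted to all points shows how much the grouping variable matters for prediction.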

Groupings

This now provides an opportunity to comment on these models.

South Island dolphins: Model seems appropriate.

North Island: Only 13 data points – model may not be useful for reliable conclusions.

Points with lower RWB values tend to be below the line, and points in the middle tend to be above. A quadratic model could be tried, but the small number of data points is an issue.

Unusual points
Example: “One dolphin, one of those with a rostrum width at base of 86mm, had a smaller rostrum width at midlength compared to dolphins with the same, or similar, rostrum widths at base.”
- Refer to actual data points.
- If not discussed earlier, comment on the effect any unusual values would have on the model.
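One way to see the effect of an unusual value on the model is to fit the line with and without it. In this sketch only the unusual point (RWB = 86, RWM = 50) comes from the slides; every other data value is invented, and `fit_line` and `with_and_without` are hypothetical names.

```python
# Hypothetical (RWB, RWM) values in mm; only (86, 50) is taken from the slides.
points = [(78, 51), (80, 53), (82, 54), (84, 56), (86, 57), (86, 50)]
unusual = (86, 50)


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def with_and_without(points, unusual):
    """Return the fitted (slope, intercept) with the unusual point, then without it."""
    kept = [p for p in points if p != unusual]
    return fit_line(points), fit_line(kept)


fit_all, fit_kept = with_and_without(points, unusual)
print(f"with the point:    slope {fit_all[0]:.2f}, intercept {fit_all[1]:.2f}")
print(f"without the point: slope {fit_kept[0]:.2f}, intercept {fit_kept[1]:.2f}")
```

A noticeable change in slope or intercept is worth a comment in the report, together with a contextual reason for keeping or setting aside the point.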

Anything else: variation in scatter?
- Constant versus non-constant scatter relates to an assumption of linear regression, i.e. that the residuals have constant spread across the range of x-values.
- Variation in scatter is relevant when discussing precision of predictions.
- Variation in scatter is more cognitively demanding because it involves “distribution reasoning”.
- Refer to visual aspects of the display and keep it contextual. For example: as the values of x increase, the amount (or degree) of variation in y tends to increase (for fanning out).
- Don’t be distracted by scarcity of data.
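A rough numeric companion to the visual check for fanning out is to compare the spread of the residuals in the lower and upper halves of the x-range. This is a sketch under assumptions: the data and the model coefficients are invented, and `scatter_by_half` is a hypothetical name.

```python
from statistics import pstdev

# Hypothetical (x, y) data and fitted line, invented for illustration only.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (7, 15.5), (8, 14.0), (9, 20.0), (10, 18.5)]
slope, intercept = 2.0, 0.0


def scatter_by_half(points, slope, intercept):
    """Return the residual standard deviation for the lower and upper halves of the x-range."""
    xs = [x for x, _ in points]
    mid = (min(xs) + max(xs)) / 2
    low = [y - (slope * x + intercept) for x, y in points if x <= mid]
    high = [y - (slope * x + intercept) for x, y in points if x > mid]
    return pstdev(low), pstdev(high)


low_sd, high_sd = scatter_by_half(points, slope, intercept)
print(f"spread at low x: {low_sd:.2f}, spread at high x: {high_sd:.2f}")
```

A clearly larger spread at high x suggests that predictions there are less precise. With few data points in one half, treat the comparison with caution.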

Prediction

Try to use relevant values of x. In this case any RWB value from 81mm to 86mm is sensible.

Prediction
Linear trend (all points): RWM = 0.77 * RWB - 8.72
Summary for Island = 1: RWM = 0.48 * RWB + 19.19
Summary for Island = 2: RWM = 0.46 * RWB + 14.37

Use all three models with RWB = 85mm (all, NI only, SI only).
All points: RWM = 0.77 x 85 - 8.72 = 56.73
NI dolphins: RWM = 0.48 x 85 + 19.19 = 59.99
SI dolphins: RWM = 0.46 x 85 + 14.37 = 53.47

Interpret in context, with units.

Use sensible rounding. Using RWB = 85mm, the three predictions (56.73, 59.99 and 53.47) round to about 57mm, 60mm and 53mm.
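The three predictions can be reproduced with a few lines of code. The coefficients are those quoted in the iNZight output above; the dictionary `MODELS` and the function `predict_rwm` are names made up for this sketch, and rounding to the nearest millimetre is one sensible choice rather than a rule.

```python
# Coefficients (slope, intercept) as quoted in the iNZight output; RWB and RWM are in mm.
MODELS = {
    "All points": (0.77, -8.72),
    "NI dolphins": (0.48, 19.19),
    "SI dolphins": (0.46, 14.37),
}


def predict_rwm(rwb_mm, slope, intercept):
    """Predict rostrum width at midlength (mm) from rostrum width at base (mm)."""
    return slope * rwb_mm + intercept


for label, (slope, intercept) in MODELS.items():
    estimate = predict_rwm(85, slope, intercept)
    print(f"{label}: predicted RWM is about {round(estimate)} mm ({estimate:.2f} unrounded)")
```

Stating the rounded value with its units, for example "about 57mm", keeps the interpretation contextual.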

Dangers: predicting outside the range of observed x-values.
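A simple guard against this danger is to refuse to predict outside a stated range. In this sketch the function name `predict_within_range` is hypothetical, and the range 81mm to 86mm is used only because the slides call it sensible for this data set; substitute the observed range of your own x-values.

```python
def predict_within_range(x, slope, intercept, x_min, x_max):
    """Return slope * x + intercept, or raise ValueError if x is outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the range {x_min} to {x_max}")
    return slope * x + intercept


# The all-points model from the slides, restricted to RWB values of 81mm to 86mm.
print(predict_within_range(85, 0.77, -8.72, 81, 86))

try:
    predict_within_range(120, 0.77, -8.72, 81, 86)
except ValueError as error:
    print(f"Refused to extrapolate: {error}")
```

The model has only been checked against data inside the observed range, so there is no evidence the trend continues beyond it.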

How good is a prediction?
A prediction is an estimate, so I need to consider bias and precision.
- Bias: if a model is good, then a prediction is likely to be accurate.
- The number of data points used to form the model is relevant.
- Precision: relates to the degree of scatter.
- Don’t relate a prediction to observed y-values.

Statistical enquiry cycle (PPDAC)

Communicating findings in a conclusion
- Each component of the cycle must be communicated.
- The question(s) must be answered.

Summary
Basic principles:
- Each component
- Context
- Visual aspects
Higher level considerations:
- Justify
- Extend
- Reflect

Other issues (if time)
- The use of articles or reports to assist contextual understanding
- How to develop understanding of the effect of outliers on a model
- The place of residuals and residual plots
- Is there a place for transforming variables?