Published by Emily Marlene Harper. Modified over 7 years ago.
1
Statistical Inquiry with Bivariate Data
AS 91036
Bernard Frankpitt, Hagley College
2
History
Linear regression by the least squares algorithm
Minimize the mean square of the residuals (MMSE)
Pearson's correlation coefficient
Use regression models for estimation
Non-linear regression by (logarithmic) transformation
Coefficient of determination
Discuss the difference between causality and correlation
Discuss hidden variables
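The History list covers least-squares regression, Pearson's correlation coefficient and the coefficient of determination. As a reminder of what those calculations involve, here is a minimal Python sketch. The function names are illustrative, and the tiny data set in the usage note is made up.

```python
from statistics import mean

def least_squares(xs, ys):
    """Fit y = a + b*x by minimising the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return a, b

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def r_squared(xs, ys):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    a, b = least_squares(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

For example, with `xs = [1, 2, 3, 4, 5]` and `ys = [2, 4, 5, 4, 5]` the fitted line is y = 2.2 + 0.6x. For a straight-line fit like this one, R² equals r².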
3
What's New?
The framework of the "Statistical Inquiry Process"
The "Is there a statistical relationship?" questions
Clarity in the purpose of the statistical relationship in context
Regression is only one tool to describe a statistical relationship
The Level 1 standard does not mention regression
Greater emphasis on qualitative description of the relationship
Greater emphasis on justifying statements with qualitative evidence from sample distributions
4
Statistical Inquiry Process
Problem: Introduce the context, state the investigative question, and define the population.
Plan: Steps from population to data. How did you gather the data?
Data: Steps from data to sample distribution: graphs, tables and statistics.
Analysis: Gather evidence from the characteristics of your sample distributions.
Conclusion: The inference step. Use the evidence to justify your answer in the population context.
5
Let's Do an Assessment!
6
Assessing 91036 (Level 1 Bivariate Investigation)
You need to provide students with:
A context that they understand well (teaching required!)
An investigative relationship question
A source from which they can gather data, and in which they can demonstrate that they manage variation in their data source
Appropriate technology for displaying and re-categorizing data
The glossary in the senior curriculum guide, for deciphering the red statistical key-words in the standard
7
Example
Context: You manage a fleet of cars for your company, and you know the distance that each car has travelled. You want to use this information to estimate the value of your fleet.
Investigative question: What is the statistical relationship between the value of a car and the distance it has travelled?
Source for data: Use Trade Me as a data source.
Resources: Use iNZight-Lite to process your data, and write your report in Word.
8
Step 1: Look at the data
9
Step 1: Look at the data
I took a sample of 70 cars from Trade Me. I chose 7 makes, and 10 cars from each make. I sorted the cars in "latest listing" order and recorded the asking price, the stated kilometers and the maker's name for each car. Using the listing order improves the randomness of the sampling process by mixing up the prices and distances of the first 10 cars in the listing. I expect that listing order is independent of the price, age and model of the cars.
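The sampling rule above can be sketched in Python: sort by latest listing, then take the first 10 cars of each make. The record layout (make, listing date, asking price, km) and the function name `first_n_per_make` are hypothetical. The sketch assumes the listings have already been copied out of Trade Me by hand, so it does not call any Trade Me service.

```python
from collections import defaultdict

def first_n_per_make(listings, n=10):
    """Keep the first n listings of each make, newest listing first.

    Each listing is a hypothetical tuple (make, listed_on, price, km), where
    listed_on is an ISO date string such as "2018-03-14" so that it sorts correctly.
    """
    sample = defaultdict(list)
    # Newest listings first, mimicking the "latest listing" sort order.
    for listing in sorted(listings, key=lambda rec: rec[1], reverse=True):
        make = listing[0]
        if len(sample[make]) < n:
            sample[make].append(listing)
    return dict(sample)
```

Taking a fixed number per make keeps any one make from dominating the sample. The listing order, rather than the price, decides which cars are chosen.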
10
Step 1: Look at the data
11
Step 1: Look at the data
12
Step 1: Look at the data
I notice that the expensive cars are all Lamborghinis or cars that are practically new. I decide to take out the Lamborghinis, along with cars that have travelled less than 2000 km. I am not likely to have expensive cars in the company fleet, and it is the older cars that I am having difficulty pricing.
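In the presentation this filter is done in iNZight-Lite. For readers who work in code, here is the same rule as a Python sketch: drop the Lamborghinis and any car with less than 2000 km. The dict keys and the function name are hypothetical.

```python
def remove_unrepresentative(cars, excluded_make="Lamborghini", min_km=2000):
    """Drop cars of the excluded make and cars that are practically new.

    Each car is a hypothetical dict with "make", "price" and "km" keys.
    Cars with km >= min_km are kept; cars below min_km are removed.
    """
    return [
        car for car in cars
        if car["make"] != excluded_make and car["km"] >= min_km
    ]
```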
13
Step 1: Look at the data
14
Step 1: Look at the data
I notice that cars that have travelled more than km tend to be cheaper than cars that have travelled less than km. There is still too much variation in the prices to say much more about the relationship. A lot of the variation seems to come from the mix of models, so I will try plotting a separate scatter plot for each model. I would do better to look at one model at a time: the fleet manager is likely to own only a few models, so he can afford to investigate the relationship one model at a time.
15
Step 1: Look at the data
16
Step 1: Look at the data
I notice that separating the makes reduces the variation a lot. I will take a new sample of a single model of Toyota. I can justify this in the context because a firm is likely to use only one or two different models to cut down on operating costs. I use the same method to choose a sample of 74 Corollas.
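Here is a rough numerical check of the claim that separating the makes reduces the variation. It compares the overall standard deviation of price with the average standard deviation within each make. This is a simplified sketch: it ignores distance, whereas the slides judge the variation from scatter plots. The field names and the made-up numbers in the test are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def price_spread(cars):
    """Compare the overall spread of prices with the typical spread within a make.

    Each car is a hypothetical dict with "make" and "price" keys.
    Returns (overall population SD, mean of the within-make population SDs).
    """
    by_make = defaultdict(list)
    for car in cars:
        by_make[car["make"]].append(car["price"])
    overall = pstdev([car["price"] for car in cars])
    within = mean(pstdev(prices) for prices in by_make.values())
    return overall, within
```

If the within-make spread is much smaller than the overall spread, most of the price variation comes from differences between makes. That is the argument for studying one model at a time.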
17
Step 1: Look at the data
18
Step 2: Analyse the sample distribution
I notice that in this sample the joint distribution of price and distance shows a moderately strong statistical relationship between the two variables. The Corollas' prices tend to decrease as the distance that they have travelled increases. The relationship is non-linear. As the distance that a car has travelled increases from 0 to km, the typical price of a car decreases rapidly. For cars that have travelled more than km, the typical price does not depend much on the distance travelled. At any fixed distance between 0 and km, the distribution of car prices is skewed in the direction of lower prices.
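The Level 1 standard does not require regression, but the History slide lists non-linear regression by logarithmic transformation. As an illustration of that older approach, here is a sketch that fits price ≈ A·e^(b·km) by ordinary least squares on ln(price). The exponential form is an assumption, not something the slides fit. It keeps decreasing, while the Corolla sample levels off at higher distances, so it would only be a rough description.

```python
import math

def fit_exponential(km, price):
    """Fit price ~ A * exp(b * km) by least squares on ln(price).

    Taking logs turns the curve into a straight line, ln(price) = ln(A) + b*km,
    which is then fitted with the usual slope/intercept formulas.
    """
    logs = [math.log(p) for p in price]
    n = len(km)
    x_bar = sum(km) / n
    y_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(km, logs))
    sxx = sum((x - x_bar) ** 2 for x in km)
    b = sxy / sxx
    ln_a = y_bar - b * x_bar
    return math.exp(ln_a), b
```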
19
Step 2: Analyse the sample distribution
20
Step 2: Analyse the sample distribution
21
Step 2: Analyse the sample distribution
I have grouped the distance measurements into four intervals (in km): [0, 50000), [50000, 100000), [100000, 150000), and 150000 and over. I have plotted box plots of car prices for the cars with distances in each interval. Comparing the box plots, I see that the relationship between price and distance is strongest for distances between 0 and km: car prices halve over this interval. The relationship weakens between km and is very weak for distances greater than km.
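The grouping step behind these box plots can be sketched in Python: put each car into a distance interval, then compute a five-number summary of the prices in each interval. The interval edges follow the slide, and the function name is hypothetical. The quartiles here use Python's "inclusive" method, while iNZight-Lite may use a slightly different convention, so the numbers could differ a little.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import quantiles

# Lower edges (km) of the four distance intervals; the last one is open-ended.
BIN_EDGES_KM = [0, 50_000, 100_000, 150_000]

def five_number_summaries(cars):
    """Group (km, price) pairs by distance interval and summarise each group.

    Returns {interval index: (min, Q1, median, Q3, max)}. Each interval needs
    at least two cars for the quartile calculation.
    """
    groups = defaultdict(list)
    for km, price in cars:
        # bisect_right counts the edges <= km; subtracting 1 gives the interval index.
        groups[bisect_right(BIN_EDGES_KM, km) - 1].append(price)
    summaries = {}
    for index, prices in sorted(groups.items()):
        q1, q2, q3 = quantiles(prices, n=4, method="inclusive")
        summaries[index] = (min(prices), q1, q2, q3, max(prices))
    return summaries
```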
22
Step 2: Analyse the sample distribution
23
Step 2: Analyse the sample distribution
24
An Achieved Write-Up
25
Step 3: Write up report (Achieved)
Problem: A manager of a fleet of cars knows the distance that each car has travelled, and wants to estimate the value of the fleet. He needs to answer the question: "What is the relationship between the cost of a used car and the distance it has travelled?" My population is used cars in New Zealand.
(Works from a given relationship question)
26
Step 3: Write up report (Achieved)
Plan: I am using Trade Me as a source for the data to investigate this question. Trade Me has over used cars listed and is a very popular way to advertise and sell cars. To get a good picture of used car prices, I decide to choose 10 cars from each of 7 makers. I pick a variety of cheaper and more expensive cars. When I list the cars on Trade Me, I list them from the latest listing to the oldest listing. This is to make sure that the different prices are well mixed and the sampling method is more random. I record the asking price for each car and the distance that it has travelled on an Excel spreadsheet, and load the data into iNZight-Lite.
(Plans and conducts an investigation using bivariate data, gathers data, determines appropriate variables and manages sources of variation)
27
Step 3: Write up report (Achieved)
Data: I use iNZight-Lite to make a scatterplot of Price and Distance.
(Selects appropriate display)
28
Step 3: Write up report (Achieved)
Analysis: From the scatterplot you can see that there is a relationship between Price and Distance travelled. The cost of a car decreases as the distance that it has travelled increases. For distances less than km the prices are up to $. For distances greater than km the prices are below $50 000, and by the time the distance is km the price is less than $. There is not much change in the average price after km. The relationship is strong at low distances, but weak for distances above km.
(Determines appropriate measures to describe the relationship)
29
Step 3: Write up report (Achieved)
Conclusions: I conclude that there is a statistical relationship between price and distance travelled for all second-hand cars sold in New Zealand. If I could look at the prices of all cars in New Zealand, I would expect the average price to drop quickly until the car has gone km. After that the average price changes more slowly. For all distances over km, the average prices of cars at each distance are more or less the same.
(Communicates the relationship in the data in a conclusion)
30
An Excellence Write-Up
31
Step 3: Write up report (Excellence)
Problem: A manager of a fleet of cars knows the distance that each car has travelled, and wants to use this information to prepare amortization tables for the value of the fleet. He needs to answer the question: "What is the relationship between the price of a used car and the distance it has travelled?" I am going to assume that the fleet has only one model of car. This assumption makes the relationship between price and distance clearer, and is also a sensible way to purchase and manage cars for a fleet. My population is used Toyota Corollas in New Zealand.
32
Step 3: Write up report (Excellence)
Plan: I am using Trade Me as a source for the data to investigate this question. Trade Me has over used cars listed. The listed price of a car on Trade Me is not necessarily the same as its value. To get a good picture of used car prices, I chose 10 cars from each of 7 makers for a preliminary investigation. I picked a variety of makes, some cheap and some expensive. When I listed all the cars of a single make on Trade Me, I listed them in order from the newest listing to the oldest listing. I did this because I expect the order in which cars are listed to be independent of their price, model, age and distance travelled. This reduces the chance that a poor sampling method introduces bias into my sample estimates of average prices. I recorded the make of each car, its asking price and the distance that it has travelled on an Excel spreadsheet, and loaded the data into iNZight-Lite.
33
Step 3: Write up report (Excellence)
Data: I used iNZight-Lite to make a scatterplot of Price and Distance.
34
Step 3: Write up report (Excellence)
At first glance the scatter plot shows a relationship between Price and distance travelled, but a closer examination shows that the relationship is largely the result of a cluster of expensive, new Lamborghinis. I decided to filter the data to remove the Lamborghinis and cars that have travelled less than 2000 km. These cars are essentially new and their value is close to their original price. A manager is not so interested in valuing these cars as he is in older cars.
35
Step 3: Write up report (Excellence)
The filtered data still has a lot of variation that masks any relationship between price and distance. I decide to group the scatter plot by make.
36
Step 3: Write up report (Excellence)
When I plot each brand separately I see that the relationship between price and distance is clearer if I use a single brand. I choose a new sample of 76 Toyota Corollas
37
Step 3: Write up report (Excellence)
38
Step 3: Write up report (Excellence)
Choosing a single model removes the variation between makes that was masking the relationship. This justifies my choice in the Problem section to investigate only one model of car, the Toyota Corolla. There is still a lot of variation in price among cars with low distance travelled. In particular, there is one car with about km that costs less than $1000. This variation could be caused by the way people set reserve prices for their cars, but it could also be caused by the condition of the car or its age. I decided not to filter out these outliers. This sample distribution provided the evidence that I used to complete the report.
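The report keeps the low-priced outliers. If you want to check which points count as outliers before making that decision, one common rule flags prices lying more than 1.5 × IQR below the lower quartile or above the upper quartile. Many box-plot displays use this rule for their whiskers. The slides do not say which rule was used, so this is an assumption. The sketch only flags points; it does not remove them.

```python
from statistics import quantiles

def flag_outliers(prices, k=1.5):
    """Return the prices lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if p < low or p > high]
```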
39
Step 3: Write up report (Excellence)
Analysis: I notice from this scatter plot that the joint distribution of price and distance shows a moderately strong statistical relationship between the two variables. The Corollas' prices tend to decrease as the distance that they have travelled increases. The relationship is non-linear: the typical price decreases fast between 0 and km, and does not change much after km. At any fixed distance between 0 and km, the distribution of car prices is skewed in the direction of lower prices.
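To back up the statement about skew with a number, one rough check compares the mean and the median of the prices within a distance band. When a few cheap cars pull the mean below the median, the distribution is usually skewed towards lower prices. This is a rule of thumb, not a formal test, and the function name is hypothetical.

```python
from statistics import mean, median

def skew_direction(prices):
    """Rough skew check: compare the mean with the median.

    A mean below the median suggests a tail towards lower prices;
    a mean above the median suggests a tail towards higher prices.
    """
    m, med = mean(prices), median(prices)
    if m < med:
        return "lower"
    if m > med:
        return "higher"
    return "symmetric"
```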
40
Step 3: Write up report (Excellence)
41
Step 3: Write up report (Excellence)
I have grouped the distance measurements into four intervals (in km): [0, 50000), [50000, 100000), [100000, 150000), and 150000 and over. I have plotted box plots of car prices for the cars with distances in each interval. Comparing the box plots, I see that the relationship between price and distance is strongest for distances between 0 and km: car prices halve over this interval. The relationship weakens between and km, and is very weak for distances greater than km. The distribution of car prices does not change with increased distance when the distance is over km.
42
Step 3: Write up report (Excellence)
Conclusions: My analysis shows that there is a non-linear, decreasing relationship between price and distance travelled for Toyota Corolla cars in New Zealand. The relationship is strongest for new cars, which lose their value quickly during the first km of use. It is weakest for cars that have travelled more than km. The distribution of Toyota Corolla prices does not change much for distances between and km. The manager only needs to worry about amortization for cars that have travelled less than km. A manager who had more than one model of car in the fleet would have to work out a separate relationship for each model.
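The original Problem was to estimate the value of the fleet from the distances travelled. One simple way to turn the box-plot evidence into an estimate is to value each car at the median sample price for its distance interval, then add up those values. This is an assumption, not a method from the slides. The interval edges follow the box-plot slide. The medians in the usage example are made-up placeholders, not results from the Corolla sample.

```python
from bisect import bisect_right

# Lower edges (km) of the distance intervals used for the box plots.
BAND_EDGES_KM = [0, 50_000, 100_000, 150_000]

def estimate_fleet_value(fleet_km, band_medians):
    """Sum the median sample price of the interval each fleet car falls into.

    fleet_km: distances travelled by the fleet's cars.
    band_medians: one median price per interval, in the same order as BAND_EDGES_KM.
    """
    total = 0
    for km in fleet_km:
        band = bisect_right(BAND_EDGES_KM, km) - 1
        total += band_medians[band]
    return total
```

For example, `estimate_fleet_value([20000, 80000, 200000], [15000, 9000, 7000, 6500])` adds 15000, 9000 and 6500. The median is used rather than the mean because prices at low distances are skewed towards lower values, so a few very cheap cars pull the median down less than the mean. The estimate still ignores each car's condition and age.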
43
Thank you. Any questions?