How could data be used in an EPQ?

How could data be used in an EPQ? Nicholas Martindale 4th November 2017

Aims To help students feel confident in using data. To help students use and present data effectively.

Questions What is data? What questions can we ask with data? How can we use data to answer these questions?

1. What is data? A collection of facts or values which can be processed to provide information. Variables Observations

1. What is data? A collection of facts or values which can be processed to provide information. Numerical Variables (e.g. counts, percentages) Categorical Variables (e.g. names, groups)

2. What questions can we ask with data?
Summarising data:
- Representative value: What is a typical value?
- Spread: How much variation is in the data?
Presenting data:
- Composition: What’s in the data?
- Distribution: The shape of the data.
- Comparison: Differences between groups.
- Trend: Change over time.
- Relationship: How one thing depends on another.

3. Answering questions: summaries
Summarising data:
- Representative value: What is a typical value?
- Spread: How much variation is in the data?
Presenting data:
- Composition: What’s in the data?
- Distribution: The shape of the data.
- Comparison: Differences between groups.
- Trend: Change over time.
- Relationship: How one thing depends on another.

3. Answering questions: summaries Representative value: what is a typical value? A typical value can help us summarise a large amount of data with a single representative number, e.g. the mean number of students in primary schools is 271. There are different representative values you could choose from.

3. Answering questions: summaries Representative value: what is a typical value?
Measures of centre:
Mean
- Definition: Sum of values / Total number of values.
- Advantages: Very familiar. Uses all the data.
- Disadvantages: Very large/small values can distort the answer.
- Examples: (1 + 2 + 3) / 3 = 2; (1 + 2 + 33) / 3 = 12.
Median
- Definition: Middle value when in order.
- Advantages: Not affected by very large/small values.
- Disadvantages: Only depends on the middle values so may not be representative.
- Examples: 1, 2, 3: median = 2; 1, 2, 33: median = 2; 0, 0, 0, 0, 0, 0, 2, 9, 9, 9, 9, 9, 9: median = 2.
Mode
- Definition: Most common value.
- Advantages: The only average that can be used with non-numerical data.
- Disadvantages: There may be no mode. There may be more than one mode.
- Examples: 1, 2, 3: no mode; 1, 2, 2, 3: mode = 2.
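The three measures of centre can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the function name `summarise_centre` and the rule for reporting "no mode" (all distinct values equally common) are assumptions made here to reproduce the slide's examples, not part of the original presentation.

```python
from statistics import mean, median, multimode


def summarise_centre(values):
    """Return the mean, median and modes of a list of numbers."""
    modes = multimode(values)  # every value tied for most common
    # Assumption: if all the distinct values are equally common,
    # report no mode, as in the slide's "1, 2, 3: no mode" example.
    if len(modes) > 1 and len(modes) == len(set(values)):
        modes = []
    return {"mean": mean(values), "median": median(values), "modes": modes}


# The examples from the table above.
print(summarise_centre([1, 2, 3]))
print(summarise_centre([1, 2, 33]))
print(summarise_centre([1, 2, 2, 3]))
```

Changing the 3 to 33 moves the mean from 2 to 12 while the median stays at 2, which is the distortion the table describes.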

3. Answering questions: summaries Spread: how much variation is in the data? A measure of spread can help us summarise how much the data varies or how unequal it is. e.g. The range of number of students in primary schools is 1455 – 5 = 1450. There are different measures of spread you could choose from.

3. Answering questions: summaries Spread: how much variation is in the data?
Measures of spread:
Range
- Definition: Largest – Smallest.
- Advantages: Easy to calculate. Familiar to students.
- Disadvantages: Not very informative. Distorted by very large/small values. Does not use all the data.
Interquartile Range
- Definition: Upper Quartile – Lower Quartile.
- Advantages: Not distorted by extreme values.
Standard Deviation
- Definition: Roughly, the typical distance of each value from the overall mean (strictly, the square root of the mean squared distance from the mean).
- Advantages: Statistically sophisticated. Calculated in Excel/Google/R.
- Disadvantages: Unfamiliar to students. Inappropriate for skewed data.
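A short Python sketch of the three measures of spread follows. The data values are made up for illustration, the function name `summarise_spread` is hypothetical, and two conventions are assumed: quartiles use the "inclusive" method of `statistics.quantiles`, and the standard deviation is the population version (`pstdev`). Other tools may use slightly different conventions and give slightly different numbers.

```python
from statistics import pstdev, quantiles


def summarise_spread(values):
    """Return the range, interquartile range and standard deviation."""
    # Assumption: "inclusive" quartiles; other conventions exist.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": pstdev(values),  # population standard deviation
    }


# Hypothetical data, then the same data with one extreme value.
typical = [2, 4, 4, 4, 5, 5, 7, 9]
with_extreme = [2, 4, 4, 4, 5, 5, 7, 90]

print(summarise_spread(typical))
print(summarise_spread(with_extreme))
```

With the extreme value the range jumps from 7 to 88, while the interquartile range is unchanged at 1.5.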

3. Answering questions: advice on figures What advice do students need on the use of figures?

3. Answering questions: advice on figures
How to present figures:
- Think first about what you want to show, then choose a graph.
- Text size and font easy to read.
- Appropriate, informative title.
- Clearly labelled axes including units.
- Clearly labelled data (groups).
- Keep them uncluttered.
Where to put figures:
- Soon after they are referred to in the text.
How to refer to figures:
- All figures should have a reference number e.g. “Figure 2”
- All figures used should be referred to in the text e.g. “see Figure 2”

3. Answering questions: presenting data Which type of figure we use depends on the type of question we are trying to answer:
- Composition (What’s in the data?): bar chart for counts; pie chart for proportions.
- Distribution (What’s the shape of the data?): histogram; boxplot.
- Comparison (Differences between groups): side-by-side bar chart; side-by-side boxplot.
- Trend (Changes over time): line graph or multiple bar chart for counts; multiple boxplots for distributions.
- Relationship (How one thing depends on another): scatterplot.

3. Answering questions: composition What’s in the data? We might want to show counts or proportions. Counts: bar charts. Proportions: pie charts.

3. Answering questions: composition Counts: Bar Chart Include 0 on the y-axis so that you don’t mislead the reader.

3. Answering questions: composition What’s wrong here?

3. Answering questions: composition Make sure to include 0 on the y-axis so that you don’t mislead the reader.
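The effect of leaving 0 off the y-axis can be put into numbers. The sketch below compares how tall two bars are drawn relative to each other for different axis starting points; the function name `apparent_ratio` and the counts 95 and 100 are hypothetical, chosen only to illustrate the point.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn heights of bars for values b and a
    when the y-axis starts at axis_start instead of 0."""
    return (b - axis_start) / (a - axis_start)


# Hypothetical counts: 95 and 100.
print(apparent_ratio(95, 100))       # axis starts at 0
print(apparent_ratio(95, 100, 90))   # axis starts at 90
```

With the axis starting at 0 the second bar is drawn about 5% taller than the first, which matches the data. With the axis starting at 90 it is drawn twice as tall, even though the values differ by only 5.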

3. Answering questions: composition Proportions: Pie Chart. Include percentage labels (the example chart has sectors labelled 68.0%, 26.7%, 3.4% and 2.0%).
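Percentage labels for a pie chart can be worked out from the raw counts. This is a minimal sketch; the function name `percentage_labels` and the group names and counts are made up for illustration and are not the data behind the slide.

```python
def percentage_labels(counts):
    """Turn a dict of counts into percentage labels with one decimal place."""
    total = sum(counts.values())
    return {name: f"{100 * n / total:.1f}%" for name, n in counts.items()}


# Hypothetical counts for three groups.
labels = percentage_labels({"Type A": 30, "Type B": 15, "Type C": 5})
print(labels)
```

Because each label is rounded separately, the labels may not add up to exactly 100%. The labels on the slide add up to 100.1%, presumably for this reason.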

3. Answering questions: composition What’s wrong here?

3. Answering questions: composition The perspective in 3D distorts our perception of the relative sizes of the sectors. Avoid using 3D plots; they are often misleading.

3. Answering questions: distribution What’s the shape of the data? Boxplots. Histograms.

3. Answering questions: distribution Boxplot (PTR = pupil-teacher ratio)
- 50% of schools have a PTR below 19 and 50% have a PTR above 19 (median = 19).
- 25% of schools have a PTR of 17 or less (lower quartile = 17).
- 25% of schools have a PTR of 22 or more (upper quartile = 22).
- All schools except outliers have a PTR between 8 and 31 (range of whiskers).
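The numbers a boxplot displays can be computed directly. The sketch below is one way to do it, under two assumptions that the slides do not state: quartiles use the "inclusive" method of `statistics.quantiles`, and outliers are values more than 1.5 interquartile ranges beyond the quartiles (a common convention). The function name `boxplot_summary` is hypothetical and the data are made up, chosen so that the quartiles match the slide; they are not the real school data.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return quartiles, whisker ends and outliers for a boxplot."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical pupil-teacher ratios for nine schools.
print(boxplot_summary([8, 15, 17, 18, 19, 20, 22, 24, 45]))
```

For this made-up data the lower quartile, median and upper quartile are 17, 19 and 22, the whiskers run from 15 to 24, and the values 8 and 45 are flagged as outliers.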

3. Answering questions: distribution Histogram (same data as boxplot)
- Data is roughly symmetrical around 20.
- The mode PTR is about 20.
- There are very few schools with PTR less than 10 or greater than 30.
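A histogram is built by sorting values into equal-width bins and counting how many fall in each. The following is a rough sketch for whole-number data; the function name `histogram_counts`, the bin width of 5 and the data values are assumptions for illustration only.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical pupil-teacher ratios for nine schools.
counts = histogram_counts([8, 15, 17, 18, 19, 20, 22, 24, 45], bin_width=5)
for start, n in counts.items():
    # One '#' per school in the bin gives a quick text histogram.
    print(f"{start} to <{start + 5}: {'#' * n}")
```

The tallest bin marks the modal group. Empty bins are simply absent from the result, so a real plot would need to leave gaps for them.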

3. Answering questions: distribution Skewed Data: when the data is not symmetrical. Left Skew (long left tail). Right Skew (long right tail).

3. Answering questions: comparison How do groups in the data differ? Counts/Proportions: side-by-side bar charts. Distributions: side-by-side boxplots.

3. Answering questions: comparison Side-by-Side Bar Charts Side-by-Side Boxplots

3. Answering questions: trend How does the data change over time? We might want to show how counts, proportions or distributions change over time. Counts: line graphs, multiple bar charts. Proportions: stacked bar graphs. Distributions: multiple boxplots.

3. Answering questions: trend Counts over time: Line Graph

3. Answering questions: trend Proportions over time: Stacked Proportion Bar Chart Each bar represents all schools in a given year. The proportion of each type of school is represented by its height within the bar.
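The heights within each stacked bar are the proportions of each group in that year. A minimal sketch of that calculation follows; the function name `proportions_by_year`, the school types and the counts are all made up for illustration.

```python
def proportions_by_year(counts_by_year):
    """For each year, convert counts per group into proportions of that year's total."""
    result = {}
    for year, groups in counts_by_year.items():
        total = sum(groups.values())
        result[year] = {name: n / total for name, n in groups.items()}
    return result


# Hypothetical counts of two types of school in two years.
counts = {
    2015: {"Type A": 75, "Type B": 25},
    2016: {"Type A": 60, "Type B": 40},
}
print(proportions_by_year(counts))
```

Each year's proportions add up to 1, so every bar has the same total height and only the split within the bar changes.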

3. Answering questions: trend Distributions over time: Multiple Boxplots The median is increasing, so the typical school is growing larger over time. The interquartile range is increasing, so the difference in size between smaller and larger schools is increasing.

3. Answering questions: relationship Does the value of one variable depend on another? We’ve already seen cases where the value of a numerical variable depends on the value of a categorical variable, e.g.:
- % teachers over 50 depends on the type of school.
- Number of schools depends on the phase and the type of school.

3. Answering questions: relationship Does the value of one variable depend on another? When we want to check if one numerical variable depends on the value of another numerical variable, we check to see if there is a correlation between them. We use scatterplots to visually assess the relationship between two variables.

3. Answering questions: relationship Does the value of one variable depend on another? The strength of a correlation is measured by the “correlation coefficient”, which always lies between -1 and 1. It can be calculated easily in Excel or Google Sheets.
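For readers who want to see what the spreadsheet is doing, here is a sketch of the usual (Pearson) correlation coefficient in plain Python. It is assumed here that this is the coefficient the slides refer to, since it is what the `CORREL` function in Excel and Google Sheets calculates. The function name `pearson_r` and the data values are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical data: y rises with x, then y falls as x rises.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

A perfectly straight rising line gives a coefficient of 1 and a perfectly straight falling line gives -1. Real data usually falls somewhere in between.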

3. Answering questions: relationship Positive Correlation: correlation coefficient = 0.7. As the PTR increases, the % of pupils achieving 5 A*-C also increases.

3. Answering questions: relationship Negative Correlation: correlation coefficient = -0.8. As the % FSM (pupils eligible for free school meals) increases, the % of pupils achieving 5 A*-C decreases.

3. Answering questions: relationship No Correlation (or very little): correlation coefficient = 0.06. There doesn’t seem to be a relationship between the % male teachers and the % 5 A*-C.

Conclusion How to use data depends on the question being asked.
- Representative value (What’s a typical value?): mean, median, mode.
- Spread (How much variation is in the data?): range, interquartile range, standard deviation.
- Composition (What’s in the data?): bar chart for counts; pie chart for proportions.
- Distribution (What’s the shape of the data?): histogram; boxplot.
- Comparison (Differences between groups): side-by-side bar chart; side-by-side boxplot.
- Trend (Changes over time): line graph or multiple bar chart for counts; multiple boxplots for distributions.
- Relationship (How one thing depends on another): scatterplot.