1
How could data be used in an EPQ?
Nicholas Martindale 4th November 2017
2
Aims
- To help students feel confident in using data.
- To help students use and present data effectively.
3
Questions
1. What is data?
2. What questions can we ask with data?
3. How can we use data to answer these questions?
4
1. What is data?
A collection of facts or values which can be processed to provide information.
Figure labels: Variables, Observations.
5
1. What is data?
A collection of facts or values which can be processed to provide information.
- Numerical variables (e.g. counts, percentages)
- Categorical variables (e.g. names, groups)
6
2. What questions can we ask with data?
Summarising data:
- Representative value: What is a typical value?
- Spread: How much variation is in the data?
Presenting data:
- Composition: What’s in the data?
- Distribution: The shape of the data.
- Comparison: Differences between groups.
- Trend: Change over time.
- Relationship: How one thing depends on another.
8
3. Answering questions: summaries
Representative value: what is a typical value?
A typical value can help us summarise a large amount of data with a single representative number, e.g. the mean number of students in primary schools is 271.
There are different representative values you could choose from.
9
3. Answering questions: summaries
Representative value: what is a typical value?
Measures of centre:
Mean
- Definition: sum of values / total number of values.
- Advantages: very familiar; uses all the data.
- Disadvantages: very large or very small values can distort the answer.
- Examples: (1 + 2 + 3) / 3 = 2; (1 + 2 + 33) / 3 = 12.
Median
- Definition: middle value when the data is in order.
- Advantages: not affected by very large or very small values.
- Disadvantages: only depends on the middle values, so may not be representative.
- Examples: 1, 2, 3: median = 2; 1, 2, 33: median = 2; 0, 0, 0, 0, 0, 0, 2, 9, 9, 9, 9, 9, 9: median = 2.
Mode
- Definition: most common value.
- Advantages: the only average that can be used with non-numerical data.
- Disadvantages: there may be no mode; there may be more than one mode.
- Examples: 1, 2, 3: no mode; 1, 2, 2, 3: mode = 2.
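The three measures of centre can be checked on the small examples from the table. The sketch below uses Python's standard `statistics` module; the function name `summarise` and the choice to report "no mode" as an empty list are assumptions made for illustration, not part of the original slides.

```python
from statistics import mean, median, multimode

def summarise(values):
    """Return (mean, median, modes) for a list of numbers."""
    modes = multimode(values)
    # When every value occurs exactly once, multimode returns all of them.
    # The slide calls this case "no mode", so report an empty list instead.
    if len(modes) == len(values) > 1:
        modes = []
    return mean(values), median(values), modes

print(summarise([1, 2, 3]))     # small, evenly spread values
print(summarise([1, 2, 33]))    # one very large value pulls the mean up
print(summarise([1, 2, 2, 3]))  # the value 2 appears most often
```

Comparing the first two calls shows the point made in the table: the single large value changes the mean from 2 to 12, while the median stays at 2.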
10
3. Answering questions: summaries
Spread: how much variation is in the data?
A measure of spread can help us summarise how much the data varies or how unequal it is, e.g. the range of the number of students in primary schools is 1455 – 5 = 1450.
There are different measures of spread you could choose from.
11
3. Answering questions: summaries
Spread: how much variation is in the data?
Measures of spread:
Range
- Definition: largest value – smallest value.
- Advantages: easy to calculate; familiar to students.
- Disadvantages: not very informative; distorted by very large or very small values; does not use all the data.
Interquartile range
- Definition: upper quartile – lower quartile.
- Advantages: not distorted by extreme values.
Standard deviation
- Definition: roughly, the typical distance of each value from the overall mean (formally, the square root of the mean squared distance).
- Advantages: statistically sophisticated; can be calculated in Excel, Google Sheets or R.
- Disadvantages: unfamiliar to students; inappropriate for skewed data.
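A minimal sketch of the three measures of spread, using Python's standard `statistics` module. The two data sets are made up for illustration. Quartile conventions differ between tools, so the `"inclusive"` method used here is one assumption among several, and `pstdev` is the population standard deviation (spreadsheet functions often default to the sample version, which gives a slightly larger number).

```python
from statistics import pstdev, quantiles

def spread(values):
    """Return (range, interquartile range, standard deviation)."""
    data = sorted(values)
    value_range = data[-1] - data[0]
    # quantiles(..., n=4) returns the lower quartile, median and upper quartile.
    lower_q, _, upper_q = quantiles(data, n=4, method="inclusive")
    return value_range, upper_q - lower_q, pstdev(data)

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # made-up data
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # same data with one extreme value

print(spread(typical))
print(spread(with_outlier))
```

With these numbers the range and the standard deviation both jump when the extreme value is added, while the interquartile range is unchanged.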
13
3. Answering questions: advice on figures
What advice do students need on the use of figures?
14
3. Answering questions: advice on figures
How to present figures:
- Think first about what you want to show, then choose a graph.
- Text size and font easy to read.
- Appropriate, informative title.
- Clearly labelled axes including units.
- Clearly labelled data (groups).
- Keep them uncluttered.
Where to put figures:
- Soon after they are referred to in the text.
How to refer to figures:
- All figures should have a reference number, e.g. “Figure 2”.
- All figures used should be referred to in the text, e.g. “see Figure 2”.
15
3. Answering questions: presenting data
Which type of figure we use depends on the type of question we are trying to answer.
Type of question: recommended types of figure
- Composition (what’s in the data?): bar chart for counts; pie chart for proportions.
- Distribution (what’s the shape of the data?): histogram; boxplot.
- Comparison (differences between groups): side-by-side bar chart; side-by-side boxplot.
- Trend (changes over time): line graph or multiple bar chart for counts; multiple boxplots for distributions.
- Relationship (how one thing depends on another): scatterplot.
16
3. Answering questions: composition
What’s in the data? We might want to show counts or proportions.
- Counts: bar charts
- Proportions: pie charts
17
3. Answering questions: composition
Counts: Bar Chart
Include 0 on the y-axis so that you don’t mislead the reader.
18
3. Answering questions: composition
What’s wrong here?
19
3. Answering questions: composition
Make sure to include 0 on the y-axis so that you don’t mislead the reader.
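The effect of leaving 0 off the y-axis can be shown with simple arithmetic. The sketch below compares how tall two bars look relative to each other for a given axis starting point; the function name `apparent_ratio` and the values 95 and 100 are hypothetical, chosen only for illustration.

```python
def apparent_ratio(small, large, axis_start=0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    # A bar is drawn from axis_start up to its value, so its visible
    # height is the value minus the axis starting point.
    return (large - axis_start) / (small - axis_start)

print(apparent_ratio(95, 100))                 # y-axis starts at 0
print(apparent_ratio(95, 100, axis_start=90))  # y-axis starts at 90
```

With the axis starting at 0 the larger bar looks about 5% taller, which matches the data. With the axis starting at 90 it looks twice as tall, although the underlying values have not changed.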
20
3. Answering questions: composition
Proportions: Pie Chart
- Include percentage labels (the example chart labels its sectors 2.0%, 3.4%, 26.7% and 68.0%).
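Percentage labels for a pie chart can be worked out from the raw counts. This is a minimal sketch; the function name `percentage_labels` and the category counts are made up for illustration.

```python
def percentage_labels(counts):
    """Map each category to its share of the total, as a label like '25.0%'."""
    total = sum(counts.values())
    return {name: f"{100 * n / total:.1f}%" for name, n in counts.items()}

# Hypothetical counts for three categories.
counts = {"red": 5, "blue": 3, "green": 2}
print(percentage_labels(counts))
```

Because each label is rounded separately, the labels on a chart may add up to slightly more or less than 100%. The labels on the example slide, for instance, add up to 100.1%.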
21
3. Answering questions: composition
What’s wrong here?
22
3. Answering questions: composition
The perspective in 3D distorts our perception of the relative sizes of the sectors.
Avoid using 3D plots: they are often misleading.
23
3. Answering questions: distribution
What’s the shape of the data?
- Boxplots
- Histograms
24
3. Answering questions: distribution
Boxplot (PTR = pupil-teacher ratio)
- Half of schools have a PTR below 19 and half have a PTR above 19 (median = 19).
- 25% of schools have a PTR of 17 or less (lower quartile = 17).
- 25% of schools have a PTR of 22 or more (upper quartile = 22).
- All schools except outliers have a PTR between 8 and 31 (range of whiskers).
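The numbers a boxplot displays can be computed directly. The sketch below is one way to do it under stated assumptions: quartiles use the `"inclusive"` method of Python's `statistics.quantiles`, and outliers are taken to be points more than 1.5 times the interquartile range beyond the quartiles. That 1.5 rule is a common plotting convention, not something stated on the slide, and the data is made up.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Return the quartiles, whisker ends and outliers a boxplot would show."""
    data = sorted(values)
    lower_q, med, upper_q = quantiles(data, n=4, method="inclusive")
    iqr = upper_q - lower_q
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = lower_q - 1.5 * iqr, upper_q + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "lower_quartile": lower_q,
        "median": med,
        "upper_quartile": upper_q,
        "whiskers": (inside[0], inside[-1]),  # range excluding outliers
        "outliers": outliers,
    }

print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 90]))  # made-up data
```

Each entry corresponds to one part of the plot: the box spans the lower to upper quartile, the line inside it is the median, and the whiskers reach to the most extreme values that are not outliers.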
25
3. Answering questions: distribution
Histogram (same data as the boxplot)
- The data is roughly symmetrical around 20.
- The mode PTR is about 20 (labelled on the figure).
- There are very few schools with a PTR less than 10 or greater than 30.
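A histogram is built by grouping values into equal-width bins and counting how many fall in each. A minimal sketch follows; the function name `histogram_counts`, the bin width of 5 and the list of values are all made up for illustration and are not the data behind the slide.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical values, grouped into bins of width 5.
values = [8, 17, 18, 19, 19, 20, 21, 22, 23, 31]
counts = histogram_counts(values, 5)

for start, count in counts.items():
    # One '#' per value gives a rough text version of the histogram.
    print(f"{start:>3} to {start + 5:<3} {'#' * count}")
```

The tallest bars show where most of the data lies, and short bars at either end correspond to the few unusually small or large values.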
26
3. Answering questions: distribution
Skewed data: when the data is not symmetrical.
- Left skew: long left tail.
- Right skew: long right tail.
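One quick check for skew, offered here as a rule of thumb rather than a formal test, is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. The function name `skew_direction` and the example lists are hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Rough guide to skew, based on comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"  # a few large values pull the mean upwards
    if m < med:
        return "left"   # a few small values pull the mean downwards
    return "no clear skew"

print(skew_direction([1, 2, 3, 4, 20]))     # long right tail
print(skew_direction([1, 17, 18, 19, 20]))  # long left tail
print(skew_direction([1, 2, 3]))            # symmetrical
```

The rule can fail for unusual data sets, so it is worth confirming the shape with a histogram or boxplot.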
27
3. Answering questions: comparison
How do groups in the data differ?
- Counts/proportions: side-by-side bar charts
- Distributions: side-by-side boxplots
28
3. Answering questions: comparison
Side-by-Side Bar Charts Side-by-Side Boxplots
29
3. Answering questions: trend
How does the data change over time? We might want to show how counts, proportions or distributions change over time.
- Counts: line graphs, multiple bar charts
- Proportions: stacked bar graphs
- Distributions: multiple boxplots
30
3. Answering questions: trend
Counts over time: Line Graph
31
3. Answering questions: trend
Proportions over time: Stacked Proportion Bar Chart
Each bar represents all schools in a given year.
The proportion of each type of school is represented by its height within the bar.
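The heights inside each stacked bar are the proportions within that year, so every bar has the same total height. A minimal sketch of that calculation follows; the function name `proportions_by_year`, the school types and the counts are invented for illustration.

```python
def proportions_by_year(counts_by_year):
    """Convert counts per category into proportions within each year."""
    result = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        result[year] = {kind: n / total for kind, n in counts.items()}
    return result

# Hypothetical counts of two school types in two years.
counts_by_year = {
    2015: {"type A": 30, "type B": 70},
    2016: {"type A": 40, "type B": 60},
}
proportions = proportions_by_year(counts_by_year)

for year, shares in proportions.items():
    print(year, shares)
```

Because each year's proportions add up to 1, changes in the heights of the segments reflect changes in the mix of types rather than changes in the total number.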
32
3. Answering questions: trend
Distributions over time: Multiple Boxplots
- The median is increasing, so the typical school is growing larger over time.
- The interquartile range is increasing, so the difference in size between smaller and larger schools is increasing.
33
3. Answering questions: relationship
Does the value of one variable depend on another?
We’ve already seen cases where the value of a numerical variable depends on the value of a categorical variable, for example:
- % teachers over 50 depends on the type of school.
- Number of schools depends on the phase and the type of school.
34
3. Answering questions: relationship
Does the value of one variable depend on another?
When we want to check whether one numerical variable depends on the value of another numerical variable, we check to see if there is a correlation between them.
We use scatterplots to visually assess the relationship between two variables.
35
3. Answering questions: relationship
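Does the value of one variable depend on another?
The value of a correlation is defined as being between -1 and 1: values near 1 indicate a strong positive relationship, values near -1 a strong negative relationship, and values near 0 little or no linear relationship.
This “correlation coefficient” can be calculated easily in Excel or Google Sheets.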
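For readers who want to see what the spreadsheet is doing, here is a minimal sketch of the usual (Pearson) correlation coefficient, the quantity returned by the `CORREL` function in Excel and Google Sheets. The function name `correlation` and the example numbers are chosen for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list has no variation at all.
    return co_variation / sqrt(variation_x * variation_y)

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # y rises with x
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # y falls as x rises
```

Points lying exactly on a rising straight line give a value of 1, and points on a falling straight line give -1. Real data usually falls somewhere in between.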
36
3. Answering questions: relationship
Positive correlation (correlation coefficient = 0.7)
As the PTR increases, the % of pupils achieving 5 A*-C also increases.
37
3. Answering questions: relationship
Negative correlation (correlation coefficient = -0.8)
As the % FSM increases, the % of pupils achieving 5 A*-C decreases.
38
3. Answering questions: relationship
No correlation, or very little (correlation coefficient = 0.06)
There doesn’t seem to be a relationship between the % male teachers and the % of pupils achieving 5 A*-C.
39
Conclusion
How to use data depends on the question being asked.
Type of question: recommended use of data
- Representative value (what’s a typical value?): mean, median, mode.
- Spread (how much variation is in the data?): range, interquartile range, standard deviation.
- Composition (what’s in the data?): bar chart for counts; pie chart for proportions.
- Distribution (what’s the shape of the data?): histogram; boxplot.
- Comparison (differences between groups): side-by-side bar chart; side-by-side boxplot.
- Trend (changes over time): line graph or multiple bar chart for counts; multiple boxplots for distributions.
- Relationship (how one thing depends on another): scatterplot.