1
Edexcel: Large Data Set Activities
C Beale
2
Possible suggestions for use:
- Homework tasks
- End of unit assessments
- Lesson activities for use with ICT resources
- End of course revision activities
- Holiday project

Recommended ICT:
- Excel
- Geogebra

Tasks are designed to be completed within the file, with text and graphics inserted as required. Additional pages may be added wherever necessary.

Useful Documents: Data-set (double click to open)
3
Contents
1. Getting to know the data
2. Sampling
3. Grouped data
4. Outliers, spread and skew
5. Correlation and regression
6. Hypothesis testing
4
Task 1 – Getting to know the data
Types of Data
Look at the 14 different data-set variables and decide whether each is qualitative or quantitative and, if quantitative, whether it is discrete or continuous.

Variable | Qualitative (y/n) | Quantitative (d/c)
Mean Temp | |
Total Rainfall | |
Total Sunshine | |
Mean Wind-speed | |
Wind-speed Beaufort | |
Max Gust | |
Relative humidity | |
Total Cloud | |
Mean Visibility | |
Mean Pressure | |
Wind Direction | |
Wind Cardinal Direction | |
Max Gust Direction | |
Gust Cardinal Direction | |

Cleansing the Data
Read the information sheet to find out how the data was collected and recorded. For each variable, determine whether there may be any issues when analysing the data in its current format. Are there any gaps? What would be the best way to deal with these? How will these changes impact the reliability of the data?
Make any necessary changes to the data in your spreadsheet. Use track changes in case you wish to amend these at a later date.

Step 1: On the Review tab, select Track changes, then Highlight changes.
Step 2: Tick the "Track changes while editing" box, ensure the "Highlight changes on screen" box is ticked, then press OK.
Step 3: Any cell which you amend will be marked with a blue triangle. Hovering over the cell will show the changes made.

Daily Mean Temp
Daily Total Rainfall
Daily Total Sunshine
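The cleansing step can also be sketched outside the spreadsheet. The short Python example below is only an illustration: the markers "tr" (a trace amount) and "n/a" (a missing reading), and the choice to treat a trace as 0.0, are assumptions and should be checked against the information sheet. The function name clean_value and the sample column are hypothetical.

```python
def clean_value(raw, trace_value=0.0):
    """Convert one spreadsheet cell to a float, or None if it is a gap.

    The markers "tr" and "n/a" are assumed here; check the information
    sheet for how gaps and trace amounts are actually recorded.
    """
    text = str(raw).strip().lower()
    if text in {"", "n/a", "na"}:
        return None          # gap: leave it out of later calculations
    if text == "tr":
        return trace_value   # assumed stand-in for a trace amount
    return float(text)


# Hypothetical rainfall column, as it might be copied from the spreadsheet.
raw_column = ["0.4", "tr", "n/a", "12.6", ""]
cleaned = [clean_value(v) for v in raw_column]
usable = [v for v in cleaned if v is not None]
print(cleaned)
print(len(usable), "usable values out of", len(raw_column))
```

Replacing a trace with 0.0 and dropping gaps are both decisions that affect reliability, so they are worth recording alongside the tracked changes.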
5
Task 1 – Getting to know the data (Continued)
Daily Mean Wind Speed
Wind Speed (Beaufort)
Max. Gust
Daily Max. Relative Humidity
Daily Mean Total Cloud
Daily Mean Visibility
Daily Mean Pressure
Daily Mean Wind and Gust Direction
Wind and Gust Cardinal Directions
6
Task 2 – Sampling

Different Sampling Methods
Using your cleansed data, you wish to take a sample of size 50 of UK rainfall for May – Oct 1987 to find an estimate for the mean total UK rainfall for the period.
Taking your cleansed data, and treating the data set as the “population”, carry out the following types of sample, clearly describing your method:
- Simple random
- Systematic
- Stratified by Beaufort wind-speed conversion
- Quota
- Convenience
Using each of your samples of size 50, calculate the mean total rainfall.
Which of your samples do you think gave you the most accurate mean? Why?

How to perform a random sample in Excel

Simple Random Sample
Systematic Sample
Stratified by Beaufort wind-speed conversion
Quota Sample
Convenience Sample
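For anyone who wants to check their spreadsheet samples, three of the sampling methods can be sketched in Python. This is a minimal illustration, not the method the task prescribes: the population is randomly generated stand-in data, the Beaufort categories "Calm", "Light" and "Moderate" are assumed labels, and all function names are hypothetical.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Every item has an equal chance of selection, without replacement."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, n, seed=None):
    """Take every k-th item from a random start, where k = len(population) // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, n, stratum_of, seed=None):
    """Sample each stratum in proportion to its size.

    Rounding each stratum's share means the total can differ slightly from n.
    """
    rng = random.Random(seed)
    strata = {}
    for record in population:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample


# Stand-in population: (daily rainfall, Beaufort category) pairs, made up here.
_rng = random.Random(0)
population = [
    (round(_rng.uniform(0, 20), 1), _rng.choice(["Calm", "Light", "Moderate"]))
    for _ in range(184)
]

for name, sample in [
    ("simple random", simple_random_sample(population, 50, seed=1)),
    ("systematic", systematic_sample(population, 50, seed=1)),
    ("stratified", stratified_sample(population, 50, lambda r: r[1], seed=1)),
]:
    print(name, len(sample), statistics.mean(r[0] for r in sample))
```

Comparing each sample mean with the mean of the whole population gives one way to judge which method was most accurate on this occasion.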
7
Task 3 – Grouped Data

Ungrouped Data
Combine all the UK 2015 visibility data. Find Q1, Q2 and Q3 from your list of values. Calculate the mean and standard deviation.

Geogebra (you might find this useful):
Step 1: Open the Geogebra graphing calculator.
Step 2: Click the menu button at the top right. Select maths Calcs, Spreadsheet calc.
Step 3: Paste the data into one of the columns, click on the analysis button and select one variable.
Step 4: Click the show statistics button at the top right of the screen. Extract the required info from the statistics box and insert below.

Q1 | Q2 | Q3 | Q4 | Mean | S.D

Grouped Data
Group the data into 5 classes of uneven width. Produce a histogram. Use interpolation to estimate the quartiles from your grouped data. Find estimates for the mean and standard deviation.

A histogram can be created in Excel or Geogebra; however, since we have unequal class widths, we will need to use a scatter diagram with lines to plot the vertices of each of the bars.

Populate the spreadsheet with data for your class intervals: insert your lower and upper bounds for each of the 5 bars into the x co-ordinate column, and insert each bar’s frequency density into the corresponding two cells.
Excel: Highlight the data, then select Insert, Scatter with straight lines and markers.
Geogebra: Highlight the data and select two variable regression analysis. Click the settings cog and tick the line graph box.
Label your axes, titles etc. and insert the histogram into the presentation.
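The spreadsheet layout described for the histogram can be sketched in Python as a check on the frequency densities. The class boundaries and frequencies below are made up for illustration, and the function names are hypothetical; only the rule "frequency density = frequency ÷ class width" and the two-points-per-bar layout come from the task.

```python
def frequency_densities(bounds, frequencies):
    """Frequency density = frequency / class width, for each class."""
    return [f / (upper - lower) for (lower, upper), f in zip(bounds, frequencies)]


def bar_top_points(bounds, densities):
    """Two (x, y) points per bar: its lower and upper bound at the bar's height."""
    points = []
    for (lower, upper), density in zip(bounds, densities):
        points.append((lower, density))
        points.append((upper, density))
    return points


# Hypothetical classes of uneven width and their frequencies.
bounds = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 5000), (5000, 8000)]
frequencies = [50, 120, 200, 300, 150]

densities = frequency_densities(bounds, frequencies)
points = bar_top_points(bounds, densities)
for x, y in points:
    print(x, y)
```

Plotting these points as a scatter diagram joined by straight lines traces the tops of the bars and the vertical steps between neighbouring bars.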
8
Task 3 – Grouped Data

Step 1: To find estimates for the quartiles, mean and standard deviation, calculate the midpoint of each of your classes, then repeat as per ungrouped data.
Step 2: Click the menu button at the top right. Select maths Calcs, Spreadsheet calc.
Step 3: Paste the data into one of the columns, click on the analysis button and select one variable.
Step 4: Click the show statistics button at the top right of the screen. Extract the required info from the statistics box and insert below.

Q1 | Q2 | Q3 | Q4 | Mean | S.D

How accurate were your estimates compared to actuals? Why was this?
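The grouped estimates can be cross-checked with a short Python sketch. It assumes the usual conventions: class midpoints stand in for the data when estimating the mean and standard deviation (dividing by n, not n - 1), and quartiles are found by linear interpolation within the class that contains them. The classes and frequencies are made up, and the function names are hypothetical.

```python
import math


def grouped_mean_sd(bounds, frequencies):
    """Estimate mean and standard deviation using class midpoints."""
    n = sum(frequencies)
    midpoints = [(lower + upper) / 2 for lower, upper in bounds]
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    mean_of_squares = sum(f * x * x for f, x in zip(frequencies, midpoints)) / n
    return mean, math.sqrt(mean_of_squares - mean ** 2)


def interpolated_quantile(bounds, frequencies, p):
    """Estimate the value below which a proportion p of the data lies."""
    target = p * sum(frequencies)
    cumulative = 0
    for (lower, upper), f in zip(bounds, frequencies):
        if f > 0 and cumulative + f >= target:
            # Assume values are spread evenly across the class.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    return bounds[-1][1]


# Hypothetical grouped data.
bounds = [(0, 10), (10, 20), (20, 40)]
frequencies = [10, 20, 10]

mean, sd = grouped_mean_sd(bounds, frequencies)
q1, q2, q3 = (interpolated_quantile(bounds, frequencies, p) for p in (0.25, 0.5, 0.75))
print(mean, sd, q1, q2, q3)
```

Both estimates assume the data are spread evenly within each class, which is one reason they can differ from the values calculated from the ungrouped list.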
9
Task 4 – Spread, Outliers and Skew
Comparing Data
Select one location of your choice. Use Geogebra to produce box plots to compare 1987 vs 2015 for the following variables:
- Mean Temperature
- Visibility
- Max gust
Make comparisons and suggest possible explanations for any changes over time. Ensure outliers are dealt with appropriately. Discuss skew.

Step 1: Copy the data for each year into adjacent cells. Highlight both columns and select multiple variable analysis. Insert the box plots into the PowerPoint.
Step 2: Use the statistics button to find the quartiles. Label them on each box plot.

Which most closely fits a normal distribution?
To determine how closely a variable fits a normal distribution, highlight the column containing data for one year. Select one variable analysis. Using this menu, change the bar chart to a normal quantile plot. The closer the points fall to the line, the nearer to a normal distribution it is. You may wish to investigate how removing outliers affects this.

Mean Temperature
Visibility
Maximum Gust
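One common way to identify outliers, sketched here in Python, is to flag values more than 1.5 × IQR below Q1 or above Q3. The slides do not state which rule to use, so the 1.5 × IQR fences are an assumption, and quartile conventions differ between tools, so the quartiles here may not match Geogebra exactly. The data and function name are made up.

```python
import statistics


def find_outliers(values, multiplier=1.5):
    """Return values outside Q1 - m*IQR and Q3 + m*IQR (m = 1.5 is assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical maximum gust readings with one unusually large value.
gusts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
outliers = find_outliers(gusts)
without_outliers = [v for v in gusts if v not in outliers]
print(outliers)
print(statistics.mean(gusts), statistics.mean(without_outliers))
```

Comparing the mean with and without the flagged values shows how strongly a single extreme reading can pull the summary statistics.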
10
Task 5 – Correlation and Regression
Problem / Question
Identify a pair of variables you expect to have:
- Positive correlation
- Negative correlation
- No correlation
Produce a scatter graph for each, and find the equation of the regression line and the correlation coefficient. Discuss the relevance of each of these in the context of the variables. Insert your scatter diagrams and comments onto the page.

Step 1: Copy the data into adjacent cells, highlight both and select two variable regression analysis.
Step 2: To draw and find the equation of the regression line, change the regression model to linear.
Step 3: Use the statistics button to find the correlation coefficient (r).

Positive Correlation
Negative Correlation
No Correlation
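The values Geogebra reports can be checked by hand or with a short Python sketch. The example below uses the standard least-squares formulas for the regression line y = a + bx and the product moment correlation coefficient r; the paired data are made up and the function name is hypothetical.

```python
import math


def linear_regression(xs, ys):
    """Return (slope b, intercept a, correlation r) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r


# Hypothetical paired readings for two variables.
xs = [2.0, 4.5, 5.1, 7.3, 9.8, 11.2]
ys = [14.1, 12.0, 12.4, 9.9, 8.2, 6.5]
slope, intercept, r = linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}")
```

The sign of the slope matches the sign of r, and values of r close to 0 suggest the regression line has little value for prediction.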
11
Task 6 – Hypothesis Testing
Climate Change – rainfall data
Choose two locations, one UK, one international. Use the 1987 data to count how many days it rained during the period recorded. Divide by the number of days to find the relative frequency, and use this as an estimate for the probability of rain in your location. Count the number of days of rain in 2015 in each of your chosen locations. A change in probability of rainfall is evidence to suggest climate change. Investigate at the 10% significance level whether to be concerned about climate change in your chosen locations.

Hypothesis Test Location 1 - Rainfall
Hypothesis Test Location 2 - Rainfall

Climate Change – temperature data
Using your same locations, use the 1987 data to calculate an average temperature for each. Count how many days in 1987 exceeded the average temperature. Divide by the number of days in the period to find the relative frequency, and use this as an estimate for the probability. Count the number of days in 2015 where the temperature exceeded your average temperature for 1987. An increase in probability of exceeding the average temperature is evidence to suggest climate change. Investigate at the 10% significance level whether to be concerned about climate change in your chosen locations.

Hypothesis Test Location 1 - Temperature
Hypothesis Test Location 2 - Temperature
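Both tests can be sketched as binomial hypothesis tests in Python. This is one possible reading of the task, not a prescribed method: it assumes days are independent, treats the rainfall test as two-tailed (a "change", so 5% in each tail) and the temperature test as one-tailed (an "increase", so 10% in the upper tail). All counts below are made up and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def lower_tail(k, n, p):
    """P(X <= k)."""
    return sum(binomial_pmf(i, n, p) for i in range(0, k + 1))


def two_tailed_test(k, n, p0, alpha=0.10):
    """H0: p = p0 against H1: p != p0, with alpha split equally between tails."""
    tail = upper_tail(k, n, p0) if k > n * p0 else lower_tail(k, n, p0)
    return tail, tail < alpha / 2


def one_tailed_increase_test(k, n, p0, alpha=0.10):
    """H0: p = p0 against H1: p > p0."""
    tail = upper_tail(k, n, p0)
    return tail, tail < alpha


# Hypothetical counts: 80 rainy days out of 184 in 1987, 95 out of 184 in 2015.
p_rain_1987 = 80 / 184
print(two_tailed_test(95, 184, p_rain_1987))

# Hypothetical counts: 90 of 184 days above the 1987 average in 1987, 104 in 2015.
p_warm_1987 = 90 / 184
print(one_tailed_increase_test(104, 184, p_warm_1987))
```

Each function returns the tail probability and whether the null hypothesis would be rejected at the chosen level. Whether daily weather really behaves as independent trials is worth discussing when interpreting the result.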