1. Data Processing Sci Info Skills
Statistics – making sense of numbers
  Data Collection
  Data Organisation
  Data Interpretation
  Data Presentation
Types of statistics
  Descriptive – the processed data provides a summary of the observations/measurements, such as averages, variations and graphs
  Inferential – the processed data is used to make judgements or predictions, such as trends or indications of variation between different samples
Class Exercise 1.1
  the average December temperature in Sydney has increased by 1 °C in the last 50 years – Descriptive
  it is expected that the average December temperature in Sydney will increase by another 1 °C within 25 years – Inferential
  25% of people surveyed at a shopping centre indicated that they were aware of increasing temperatures in Sydney – Descriptive
  a survey has shown that 75% of Sydneysiders are ignorant of the changing climatic conditions in their city – Inferential
Class Exercise 1.2
Identify the sample and the population in the following.
(a) a bottle of water is taken from a dam to be tested
  Sample – the water in the bottle
  Population – all the water in the dam
(b) the frog population of a large wetland is checked by looking at two separate hectares
  Sample – the two hectares
  Population – the whole wetland
(c) the levels of lead in fallout around a smelter are assessed by testing a selection of properties
  Sample – the selected properties
  Population – the whole area
(d) people in a shopping centre are asked their opinions … to determine the level of awareness in the community
  Sample – the people asked
  Population – the community
Variables
  the characteristic being measured
  category – the result of the measurement is a “word”, e.g. yes (or no), truck, bird, sparrow, first (or second), etc.
  numerical – the measurement produces a number, which
    could be limited to certain values (e.g. whole numbers), or
    could take any value (e.g. the mass of an object)
Exercise 1.3
  lead levels in fallout – numerical (any value)
  types of birds observed – category
  numbers of birds observed in different locations – numerical (set values)
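The distinction between category and numerical variables can be sketched as a rough rule of thumb in code. This is only a heuristic under stated assumptions: the function name `classify_variable` is hypothetical, and it guesses from the recorded values alone, whereas the real decision depends on what is being measured (a mass recorded to the nearest gram still takes "any value").

```python
from numbers import Real

def classify_variable(values):
    """Hypothetical heuristic for the variable types above.

    Returns "category" if any observation is not a number (a "word"),
    "numerical - set values" if every observation is a whole number,
    otherwise "numerical - any value".
    """
    # bool is excluded because Python treats True/False as numbers.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "category"
    # Whole numbers only suggests counts, i.e. values limited to a set.
    if all(float(v).is_integer() for v in values):
        return "numerical - set values"
    return "numerical - any value"

print(classify_variable(["sparrow", "magpie", "sparrow"]))  # types of birds
print(classify_variable([3, 0, 12]))                        # numbers of birds
print(classify_variable([0.12, 0.31, 0.08]))                # lead levels
```

The sample values are made up for illustration.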
Presenting & organising data
  large quantities of raw data are not a useful way of presenting the results of tests
  the data need to be organised to show the results in a more compact form, using:
    tables
    graphs
    averages
    comparisons
Tabulating data
  organising data so that it can be evaluated more easily, generally in some sort of table
  category data is most usually grouped (tallied)
    the number of times each different category occurs is the recorded result
  tallying can also be used for numerical data, but only where the data take a limited set of fixed, known values and there is a large number of data points
  numerical (any value) data presents a problem
    it must be grouped into ranges
    information is lost, e.g. 0.1 and 4.9 both fit into the 0-5 range
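Tallying category data amounts to counting how often each category occurs. A minimal sketch in Python, assuming a made-up list of bird sightings, uses the standard library's `collections.Counter` to do the counting and prints a tally-mark chart:

```python
from collections import Counter

# Hypothetical category data: birds recorded during a survey.
sightings = ["sparrow", "magpie", "sparrow", "ibis", "magpie", "sparrow"]

# Counter records how many times each category occurs.
tally = Counter(sightings)

# most_common() lists the categories from highest to lowest frequency.
for bird, count in tally.most_common():
    print(f"{bird:<8} {'|' * count} {count}")
```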
Grouping numerical data
  identify the minimum and maximum values
  decide how many groups are appropriate for the size of the data set
  determine the groups, which should be equivalent ranges (for example, 0-5, 6-10, etc., but not 0-5, 6-20)
Class Exercise 1.4
  You have a data set of 100 pH measurements of river water, ranging from 5 to 9. What would be an appropriate way of grouping them?
  8 ranges of 0.5 pH units, e.g. 5.00-5.49, 5.50-5.99, etc.
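The grouping steps for Exercise 1.4 can be sketched as follows, assuming equal-width ranges that start at the minimum value. The function name `group_counts` and the pH readings are hypothetical. A reading of exactly 9.0 would otherwise fall outside the eighth range, so the sketch places the top boundary in the last group.

```python
def group_counts(values, low, width, n_groups):
    """Count values in n_groups equal-width ranges starting at low.

    Range i covers low + i*width up to (but not including) low + (i+1)*width.
    A value equal to the top boundary is put in the last range so it is not lost;
    values outside low .. top boundary are skipped.
    """
    top = low + n_groups * width
    counts = [0] * n_groups
    for v in values:
        i = int((v - low) // width)
        if v == top:
            i = n_groups - 1  # include the maximum, e.g. pH 9.0
        if 0 <= i < n_groups:
            counts[i] += 1
    return counts

ph = [5.2, 6.8, 7.1, 7.4, 6.9, 9.0, 5.6, 7.0]  # hypothetical readings
counts = group_counts(ph, low=5.0, width=0.5, n_groups=8)
for i, c in enumerate(counts):
    start = 5.0 + i * 0.5
    print(f"{start:.2f}-{start + 0.49:.2f}: {c}")
```

The width of 0.5 is exactly representable as a binary fraction, so the floor division behaves predictably here; for widths such as 0.1, rounding errors near a boundary could move a value into the neighbouring group.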
Frequencies
  the number of times a particular value or range occurs is its frequency
  the spread of data across the range of values is the distribution
    Is it evenly spread across the groups?
    Do certain groups have higher frequencies?
    Is there any pattern?
  frequency should be considered in relation to the total number of data values
  relative frequency – the frequency as a proportion (often a percentage) of the total data set
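Relative frequency is each frequency divided by the total number of data values. A minimal sketch, with a hypothetical helper `relative_frequencies` and made-up group counts, expresses it as a percentage:

```python
def relative_frequencies(counts):
    """Convert a dict of frequencies into percentages of the total."""
    total = sum(counts.values())
    return {group: 100 * c / total for group, c in counts.items()}

# Hypothetical grouped frequencies, e.g. from a tally.
freq = {"0-5": 6, "6-10": 12, "11-15": 2}
for group, pct in relative_frequencies(freq).items():
    print(f"{group}: {pct:.0f}%")
```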
Excel & tally charts
  manually tallying large data sets (counting how many times each value occurs) is boring, tiring and potentially inaccurate
  Excel has some functions which help:
    COUNT(cell range)
    COUNTIF(range, criterion)
    FREQUENCY(data range, group limits) – probably more trouble than it’s worth
COUNT( )
  returns the number of cells in a range that contain numbers
  ignores blank cells and non-numerical values
  Example: A1:A9 holds numbers (e.g. 10), blank cells and text entries such as n/a and ***; only the numbers are counted
  =COUNT(A1:A9)
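To make the counting rule concrete, here is a minimal Python sketch that mimics COUNT over a column of cells. The helper name `excel_count` is hypothetical, blank cells are represented as `None` (an assumption of this sketch), and the cell contents are made up.

```python
def excel_count(cells):
    """Mimic Excel's COUNT over a range: count cells holding numbers.

    Blank cells (None) and text such as "n/a" or "***" are ignored.
    bool is excluded explicitly because Python treats True/False as integers.
    """
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool))

# Hypothetical contents of A1:A9 (None = blank cell).
column_a = [10, None, 3, None, "n/a", "***", 8, None, 5]
print(excel_count(column_a))
```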
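COUNTIF( )
  returns the number of cells that meet a given criterion
  a criterion containing a comparison is written in quotation marks and can use =, >, <, >=, <= or <> (not equal to)
  Example: count the cells in A1:A9 (the same mix of numbers, blanks and text as above) that contain a value greater than 5
  =COUNTIF(A1:A9,">5")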
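The way COUNTIF applies a criterion can be sketched in Python. This is a simplified, hypothetical `countif` helper: it only handles numeric comparisons using =, >, <, >= and <=, and skips blanks and text, whereas Excel's function also accepts text criteria and "<>". The column contents are made up.

```python
import operator

# Two-character operators come first so ">=5" is not read as ">" followed by "=5".
_OPERATORS = [(">=", operator.ge), ("<=", operator.le),
              (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]

def countif(cells, criterion):
    """Hypothetical helper: count numeric cells meeting a criterion like ">5"."""
    op, number = operator.eq, criterion  # a bare value such as "10" means "equal to"
    for symbol, func in _OPERATORS:
        if criterion.startswith(symbol):
            op, number = func, criterion[len(symbol):]
            break
    threshold = float(number)
    # Only numbers are compared; blanks (None) and text are skipped.
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool)
               and op(c, threshold))

column_a = [10, None, 3, None, "n/a", "***", 8, None, 5]  # hypothetical A1:A9
print(countif(column_a, ">5"))  # the cells holding 10 and 8
```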
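FREQUENCY( , )
  tallies data into user-chosen groups, each group given by its upper limit
  entered as an array formula:
    highlight the group of cells where you want the frequencies to appear (one more cell than there are groups; the extra cell counts values above the last limit)
    type in the formula and then press the key combination CTRL+SHIFT+ENTER
    (in Microsoft 365 versions of Excel, pressing ENTER is enough and the results spill into the cells below)
  Example: data in A1:A9 (numbers such as 10; text such as n/a and *** is ignored), with the upper limits of the groups 0-5 and 6-10 (5 and 10) in B6:B7
  =FREQUENCY(A1:A9,B6:B7)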
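The grouping rule FREQUENCY follows (each group is defined by its upper limit, plus one extra group for anything above the last limit) can be sketched in Python with the standard library's `bisect_left`. The helper name `excel_frequency` is hypothetical, the sketch assumes the limits are sorted in ascending order, and the cell contents are made up.

```python
from bisect import bisect_left

def excel_frequency(cells, upper_limits):
    """Mimic Excel's FREQUENCY: tally numbers into groups defined by upper limits.

    upper_limits must be sorted ascending. Group 0 holds values <= upper_limits[0],
    group i holds values > upper_limits[i-1] and <= upper_limits[i], and one extra
    final group holds values above the last limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for c in cells:
        if isinstance(c, (int, float)) and not isinstance(c, bool):
            # bisect_left gives the index of the first limit >= c.
            counts[bisect_left(upper_limits, c)] += 1
    return counts

column_a = [10, None, 3, None, "n/a", "***", 8, None, 5]  # hypothetical A1:A9
print(excel_frequency(column_a, [5, 10]))  # groups 0-5, 6-10, above 10
```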
Two-way frequency tables
One sample set – two variables (koalas: sex of koala and general state of health)
                 Male    Female
  Healthy        45      28
  Ill            21      9
Two sample sets – one variable (plants in two types of parkland: origin of plant)
                 Urban   Undeveloped
  Native         37      65
  Introduced     51      20
  Not identified 12      15
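A two-way table is a tally over pairs of categories. The sketch below rebuilds the koala table from one (health, sex) record per animal, generated from the table's own counts rather than real raw data, and adds row, column and grand totals. The function name `two_way_table` is hypothetical. Dividing each cell by its column total gives relative frequencies, which is what makes groups of different sizes comparable.

```python
from collections import Counter

def two_way_table(records, row_labels, col_labels):
    """Tally (row, column) pairs into a two-way table with totals.

    records: one (row_category, column_category) pair per observation.
    Returns the table as a nested dict, the row totals, the column totals
    and the grand total.
    """
    tally = Counter(records)  # counts each (row, column) combination
    table = {r: {c: tally[(r, c)] for c in col_labels} for r in row_labels}
    row_totals = {r: sum(table[r].values()) for r in row_labels}
    col_totals = {c: sum(table[r][c] for r in row_labels) for c in col_labels}
    return table, row_totals, col_totals, sum(row_totals.values())

# One (health, sex) record per koala, recreated from the counts in the table.
records = ([("Healthy", "Male")] * 45 + [("Healthy", "Female")] * 28
           + [("Ill", "Male")] * 21 + [("Ill", "Female")] * 9)

table, row_totals, col_totals, grand_total = two_way_table(
    records, ["Healthy", "Ill"], ["Male", "Female"])

# Relative frequency of illness within each sex (ill / total for that sex).
for sex in ["Male", "Female"]:
    print(f"{sex}: {100 * table['Ill'][sex] / col_totals[sex]:.0f}% ill")
```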
The typical value
  represents all the data values with one or two numbers
    average – a single value that is typical of (central to) the data set
    variation – how much spread there is in the data set
  category variables
    the class with the highest frequency (mode)
    variation cannot be measured
  numerical variables
    mean – what we normally refer to as the average
    mode – the most common value (used for grouped data)
    median – the value in the middle when the data are arranged in order
    range – highest value minus lowest value
    standard deviation – calculated from the differences of all the data points from the mean
  the mean and standard deviation are normally used for scientific data
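These measures can be computed with Python's standard `statistics` module. A minimal sketch on a small made-up data set follows. Note that there are two versions of the standard deviation: `pstdev` treats the data as the whole population, while `stdev` treats them as a sample drawn from a larger population (compare the sample/population distinction in Class Exercise 1.2).

```python
import statistics

# Hypothetical numerical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)             # sum of values / number of values
median = statistics.median(data)         # average of the two middle values, 4 and 5
mode = statistics.mode(data)             # most common value
data_range = max(data) - min(data)       # highest value minus lowest value
sd_population = statistics.pstdev(data)  # data treated as the whole population
sd_sample = statistics.stdev(data)       # data treated as a sample

print(mean, median, mode, data_range, sd_population, round(sd_sample, 2))
```

In Excel, the corresponding functions are AVERAGE, MEDIAN, MODE.SNGL, MAX and MIN, and STDEV.S (sample) or STDEV.P (population).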
Assignment 1
  large amount of data
  simple formulas required
  all questions and directions are contained in the Excel spreadsheet