Question
• What are data and what do they mean to a scientist?
Dinner at the Urquhart House
• Brought to you by the Briggs Multiracial Alliance
• Sunday night
• All food provided (probably Chinese)
• Contact Mimi Reddy, for
Data, Statistics, and Spreadsheets
• What are data?
• What are statistics?
• What are spreadsheets?
• How can you analyze data with spreadsheets?
Data
• Data are pieces of information
• Data can be numbers, words, descriptions
• Data have UNITS
• The word "data" is PLURAL; "datum" is singular
• Data about Willoughby:
  – Age: 5 (years)
  – Height: 47 (inches)
  – Weight: 66 (pounds)
  – Eyes: Blue
  – Favorite word: Wrestle
  – Favorite letter: W
Types of Data
• Numbers – two types
  – Real #s – numbers that can have a fractional (decimal) part – e.g. a weight in lbs
  – Integers – whole numbers – e.g. 18 months
• Letters – called characters in programming
  – W is a character
• Words – called strings in programming
  – “No thanks” is a string; strings can be individual words or phrases
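As a rough illustration of these types in one programming language, here is a minimal Python sketch using Willoughby's data from the earlier slide. The variable names are made up for this example, and writing the weight as 66.0 (a decimal) is an assumption made only to show a real-number type. Python has no separate character type, so a single letter is stored as a string of length 1.

```python
# Willoughby's data from the slide, stored with Python's built-in types.
# Variable names are hypothetical, chosen only for this illustration.
age_years = 5               # int: a whole number
height_inches = 47          # int
weight_pounds = 66.0        # float: stands in for a real number (decimal form assumed)
favorite_letter = "W"       # a character; Python stores it as a length-1 string
favorite_word = "Wrestle"   # str: a single word
reply = "No thanks"         # str: a string can also be a phrase

# Print each value next to the name of its type.
for label, value in [
    ("age", age_years),
    ("height", height_inches),
    ("weight", weight_pounds),
    ("favorite letter", favorite_letter),
    ("favorite word", favorite_word),
    ("reply", reply),
]:
    print(f"{label}: {value!r} is a {type(value).__name__}")
```

Note that the units (years, inches, pounds) live only in the variable names here; the numbers themselves do not carry them.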
Statistics and Data
• Test Scores:
  – Jeff: 88
  – Mollie: 92
  – Marcie: 88
  – Dave: 47
  – Karim: 99
  – Willoughby: 42
  – Benjamin: 0
• What statistics can you calculate to describe these data?
  – Try to think of four things to describe the data
• Stop
Statistics
• Statistics are derived from the data
• Statistics are descriptions of data
• Statistics are meant to simplify the data
• Statistics can be misleading
Typical Statistics
• Sample Size - number of individuals measured = n
• Sum = $\sum x$
• Average or Mean = $\sum x / n$
• Median
  – Value of 50th percentile; half of values fall above, half below
• Maximum, Minimum, Range (Max-Min)
• Mode - most common value
• Standard deviation
• Variance ($SD^2$)
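The statistics listed above can be tried out on the test scores from the earlier slide. The following is a minimal sketch using Python's standard `statistics` module. It assumes the sample (n - 1) form of the standard deviation and variance, which the slides do not specify.

```python
import statistics

# Test scores from the "Statistics and Data" slide.
scores = {
    "Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
    "Karim": 99, "Willoughby": 42, "Benjamin": 0,
}
values = list(scores.values())

n = len(values)                      # sample size
total = sum(values)                  # sum
mean = total / n                     # average = sum / n
median = statistics.median(values)   # middle value of the sorted scores
mode = statistics.mode(values)       # most common value
maximum = max(values)
minimum = min(values)
value_range = maximum - minimum      # range = max - min
sd = statistics.stdev(values)        # sample standard deviation (n - 1 form, assumed)
variance = statistics.variance(values)  # sample variance, equal to sd squared

print(f"n = {n}, sum = {total}, mean = {mean:.1f}")
print(f"median = {median}, mode = {mode}")
print(f"max = {maximum}, min = {minimum}, range = {value_range}")
print(f"SD = {sd:.1f}, variance = {variance:.1f}")
```

With these scores the mean (about 65) sits well below the median (88) because a few very low scores pull it down, which is one way a single statistic can mislead.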
Analyze these data...
• Mean, max, min, range, median, mode
• Sample size (n)
• Sum = $\sum x$
• Mean = average = $\sum x / n$, denoted $\bar{x}$
• Median = halfway value
• Mode = most common value
Spreadsheets
• Spreadsheets are tables
• Spreadsheets allow calculations and manipulations of data
  – Calculations: mean, standard deviation
  – Manipulations: sort,
Make a data table:
• Fly 1, length 13.4 mm, velocity 27 kph, age 21 days
• Fly 2, length 9.4 mm, velocity 0 kph, age 220 days
• Fly 3, length 9.3 mm, velocity 44 kph, age 1 day
• Fly 4, length 13.4 mm, velocity 17 kph, age 32 days
• Fly 5, length 17.4 mm, velocity 33 kph, age 11 days
• How many columns?
• How many rows?
• Do the #s go down or across?
Data Table
Fly #   Length (mm)   Velocity (kph)   Age (days)
1       13.4          27               21
2       9.4           0                220
3       9.3           44               1
4       13.4          17               32
5       17.4          33               11
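The same table can be sketched in code as one record per row and one named field per column. This is only an illustration in Python: the field names (`length_mm` and so on) are invented for the example, and sorting by age stands in for the kind of manipulation a spreadsheet offers.

```python
# One dictionary per row (fly); one key per column. Field names are hypothetical.
flies = [
    {"fly": 1, "length_mm": 13.4, "velocity_kph": 27, "age_days": 21},
    {"fly": 2, "length_mm": 9.4, "velocity_kph": 0, "age_days": 220},
    {"fly": 3, "length_mm": 9.3, "velocity_kph": 44, "age_days": 1},
    {"fly": 4, "length_mm": 13.4, "velocity_kph": 17, "age_days": 32},
    {"fly": 5, "length_mm": 17.4, "velocity_kph": 33, "age_days": 11},
]

columns = list(flies[0].keys())
n_rows = len(flies)        # one row per fly
n_columns = len(columns)   # fly number plus three measurements

# A manipulation: sort the rows from youngest to oldest fly.
by_age = sorted(flies, key=lambda row: row["age_days"])

print(f"{n_rows} rows, {n_columns} columns: {columns}")
for row in by_age:
    print(row["fly"], row["length_mm"], row["velocity_kph"], row["age_days"])
```

Each measurement runs down a column, so every row describes one individual fly.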
Microsoft Excel
• Typical spreadsheet program
  – Lotus 1-2-3 was an early commercial spreadsheet (VisiCalc came first)
• Has controls similar to MS Word
• Now allows graphing (charts), but only in very restricted formats; it is hard to get exactly what you want
• Excel tables and graphs can be copied into MS Word
Friday’s Assignment
• We will work with Microsoft Excel to analyze some data
• Groups of two will submit one finished spreadsheet for the assignment
Graphs
• Many different types of graphs
  – Points
  – Lines
  – Bars
  – Pies
Point Graphs
• Called X-Y Scatter in MS Excel
• Plot points based on X and Y value
• Can fit a “REGRESSION LINE” to the data
  – Line that best fits the data
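One common meaning of "best fit" is the least-squares line, which minimizes the squared vertical distances between the points and the line. The sketch below fits such a line to the fly data from the earlier slide. Treating length as X and velocity as Y is an assumption made for illustration, and with only five points the fitted line should not be read as a real finding.

```python
# Fly data from the "Make a data table" slide.
# Assumption for this sketch: X = length (mm), Y = velocity (kph).
lengths_mm = [13.4, 9.4, 9.3, 13.4, 17.4]
velocities_kph = [27, 0, 44, 17, 33]

n = len(lengths_mm)
mean_x = sum(lengths_mm) / n
mean_y = sum(velocities_kph) / n

# Least-squares slope: how X and Y vary together, divided by the spread of X.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths_mm, velocities_kph))
sxx = sum((x - mean_x) ** 2 for x in lengths_mm)
slope = sxy / sxx

# The least-squares line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(f"velocity = {slope:.2f} * length + {intercept:.2f}")
```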
X-Y Scatter
Bar Graphs
• Categorize data into counts or percents
• Categories can be descriptive categories (Windows 98, Windows 2000, …)
• Can also be numeric categories
  – Height: 60-63, 63-66, etc. or just 61, 62, 63…
  – Count up number of people in each group
• Histograms are a particular type of bar graph
Bar Graph
Histogram
• X axis is categories
• Y axis is a number or proportion of observations in that category
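To make the counting step concrete, here is a minimal Python sketch that groups the test scores from the earlier slide into numeric categories and prints a text histogram. The bin width of 20 points is an arbitrary choice for this example, not something the slides specify.

```python
# Test scores from the "Statistics and Data" slide.
scores = [88, 92, 88, 47, 99, 42, 0]

bin_width = 20   # assumed category width: 0-19, 20-39, 40-59, 60-79, 80-99
n_bins = 5
counts = [0] * n_bins

for score in scores:
    # Integer division finds the category; a score of 100 would go in the last bin.
    index = min(score // bin_width, n_bins - 1)
    counts[index] += 1

# X axis: categories. Y axis: number of observations in each category.
for i, count in enumerate(counts):
    low = i * bin_width
    high = low + bin_width - 1
    print(f"{low:2d}-{high:2d} | {'#' * count} ({count})")
```

Dividing each count by the number of scores would give the proportion of observations in each category instead.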
Histogram Bar Graph
[Figure: histogram; axis label: Number of Crashes]
Regular Bar Graph vs. Histogram Bar Graph
Distributions
• Special type of histogram with continuous numeric scale at bottom
• Normal distribution is a key concept in statistics
• Skewed distribution is one that is unbalanced
Sample distribution histograms Danyoungyoo, Katanchalee, and Srichawla, Robert D. Duval, PS 400 Lecture,
The NORMAL Distribution
• A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN
• It is a balanced (symmetric), humped distribution
Distributions
• Skew is an imbalance in the distribution
Danyoungyoo, Katanchalee, and Srichawla,
Hypothesis Testing
• Statistical tests are how scientists decide whether data SUPPORT their hypothesis
• (Tests do NOT PROVE the hypothesis)
• Four major statistical tests: T-test, χ² (chi-square) test, Regression, ANOVA
Hypothesis
• Processor speed has an effect on the performance of the computer.
• Null Hypothesis
  – $H_0$: Processor speed has NO EFFECT on the performance of a computer.
Statistical Tests and Probability
• Statistical tests give a value
• That value can be related to a probability (P)
• P is the probability of getting data at least as extreme as yours if the NULL hypothesis were true
• If P < 0.05 (1/20), then you REJECT the NULL hypothesis
T-Test
• Compares the difference between two means
• Formula: $T = (\bar{x}_1 - \bar{x}_2) / SEM$
  – SEM is the Standard Error of the Mean: $SD / \sqrt{N}$
• T value: the difference between the means relative to the amount of spread in your data
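The T formula above can be sketched in a few lines of Python. Two things here are assumptions rather than part of the slides: the scores are made-up illustrative numbers for fast and slow processors, and the standard error of the difference is computed in the Welch form, $\sqrt{SD_1^2/N_1 + SD_2^2/N_2}$, which combines the SEM of both groups.

```python
import math
import statistics


def t_value(group1, group2):
    """Difference between two means divided by the standard error of that difference.

    Uses the Welch form of the standard error (an assumption for this sketch).
    """
    mean1 = statistics.mean(group1)
    mean2 = statistics.mean(group2)
    # statistics.variance is the sample variance (SD squared, n - 1 form).
    sem_squared_1 = statistics.variance(group1) / len(group1)
    sem_squared_2 = statistics.variance(group2) / len(group2)
    return (mean1 - mean2) / math.sqrt(sem_squared_1 + sem_squared_2)


# Made-up performance scores, for illustration only (not real measurements).
fast_processor = [72, 75, 78, 80, 85]
slow_processor = [60, 64, 66, 70, 75]

t = t_value(fast_processor, slow_processor)
print(f"T = {t:.2f}")
```

A larger spread within either group increases the denominator and shrinks T, so the same difference in means counts for less when the data are noisy.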
T-Values
• If T > 2.5 or 3.0, the difference is usually significant (this depends on your sample sizes)