LSP 121 Intro to Statistics and SPSS

Statistics
One of many definitions: the mathematics of collecting and analyzing data to draw conclusions and make predictions. It involves looking at quantified data and determining whether there are any patterns. Patterns, if they exist, help you predict.

Descriptive Statistics (some of these are used as predictors)
– Mean: the average
– Median: the middle score
– Percent rank: the position of a data point in a data set; more precisely, approximately what percent of the data is less than the data point
– Range: the difference between the maximum and minimum values in the data set
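
The four measures above can be sketched in a few lines of Python. This is only an illustration: the scores are made-up values, and `percent_rank` is a hypothetical helper that uses the simple "percent of data below the value" reading from the slide (spreadsheet and SPSS functions may use slightly different formulas).

```python
from statistics import mean, median


def percent_rank(data, value):
    """Percent of data points strictly less than `value` (one simple definition)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical exam scores, used only for illustration.
scores = [55, 62, 70, 70, 78, 81, 85, 90, 93, 96]

print("mean:", mean(scores))
print("median:", median(scores))
print("range:", max(scores) - min(scores))
print("percent rank of 85:", percent_rank(scores, 85))
```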

The mean or the median?
Advantages of the median are:
· If one of the extreme values changes, the median usually remains unaltered, whereas the mean can change substantially.
· If a set of numbers has a lop-sided pattern (for example, most of the scores are small, several are medium sized, and only one or two are high), then the median may again be more appropriate than the mean, as its value will be close to the majority of the numbers.
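
A small sketch of the first point, using made-up numbers: changing the single largest value leaves the median where it was but moves the mean a long way.

```python
from statistics import mean, median

# Hypothetical lop-sided data: mostly small values and one large one.
values = [2, 3, 3, 4, 5, 6, 40]
# The same data with the extreme value made ten times larger.
changed = [2, 3, 3, 4, 5, 6, 400]

print("median before/after:", median(values), median(changed))
print("mean before/after:", mean(values), round(mean(changed), 1))
```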

Descriptive Statistics
– Lower quartile (first quartile): the median of the data values in the lower half of a data set
– Middle quartile (second quartile): the overall median
– Upper quartile (third quartile): the median of the data values in the upper half of a data set
Quartiles may help in seeing the variation in a data set.

Quartiles
For example (bank waiting times):
[Table: Big Bank and Best Bank waiting times, with the lower quartile, median, and upper quartile marked]
– Big Bank range: 11.0 - 4.1 = 6.9
– Best Bank range: 7.8 - 6.6 = 1.2

Descriptive Statistics
The Five Number Summary consists of:
– The minimum value
– The lower quartile (first quartile)
– The median (second quartile)
– The upper quartile (third quartile)
– The maximum value
In SPSS, the first quartile is the 25th percentile, the second quartile is the 50th percentile, and the third quartile is the 75th percentile.
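
A minimal sketch of the five number summary, using the "median of each half" definition of the quartiles given earlier. Two assumptions: the data values are made up, and when the number of values is odd the overall median is left out of both halves. SPSS computes percentiles with its own formula, so its quartiles can differ slightly from this hand method.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles are the medians of the lower and upper halves of the sorted
    data. Assumes at least two data values.
    """
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = s[half + 1:] if n % 2 else s[half:]
    return (s[0], median(lower), median(s), median(upper), s[-1])


# Hypothetical data values.
print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))
```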

Standard Deviation
– Quartiles are OK for characterizing data, but standard deviation is preferred by statisticians
– It is a measure of how far data values are spread around the mean of a data set
– Don’t calculate it by hand; use SPSS

Standard Deviation
– A simple way to estimate the standard deviation is the range estimate rule: divide the range by 4
– Watch for outliers. These are values that are unusually high or unusually low.
– If a value is more than 2*STD above or below the mean, it could possibly be an outlier.
– Calculate: mean + 2*STD and mean - 2*STD

Look for outliers, how?
– Find the mean
– Find the standard deviation
– high = mean + 2 * STD
– low = mean - 2 * STD
e.g., mean = 124, STD = 32, then
high = mean + 2*32 = 124 + 64 = 188
low = mean - 2*32 = 124 - 64 = 60
Look for values > 188 and values < 60.
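
The same check can be sketched in Python. The helper names `limits` and `possible_outliers` and the sample data are assumptions made for illustration; the first call reuses the slide's numbers (mean = 124, STD = 32).

```python
from statistics import mean, stdev


def limits(mean_value, std):
    """Return (low, high) = (mean - 2*STD, mean + 2*STD)."""
    return mean_value - 2 * std, mean_value + 2 * std


def possible_outliers(data):
    """Values more than 2 sample standard deviations from the mean."""
    low, high = limits(mean(data), stdev(data))
    return [x for x in data if x < low or x > high]


print(limits(124, 32))  # the slide's example

# Hypothetical data with one suspiciously large value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
print(possible_outliers(data))
```

A value flagged this way is only a candidate; whether it is really an outlier is a judgment call about the data.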

Estimate Standard Deviation
Go back to the Big Bank / Best Bank example:
– Big Bank: range = 6.9
  – 6.9 / 4 = 1.7
  – Actual standard deviation is 1.96
– Best Bank: range = 1.2
  – 1.2 / 4 = 0.3
  – Actual standard deviation is 0.44
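
As a quick sketch, the range estimate rule is a one-line calculation. The function name is made up for illustration; the minimum and maximum values are the ones from the bank example.

```python
def range_rule_estimate(minimum, maximum):
    """Estimate the standard deviation as range / 4."""
    return (maximum - minimum) / 4


print("Big Bank:", round(range_rule_estimate(4.1, 11.0), 1))
print("Best Bank:", round(range_rule_estimate(6.6, 7.8), 1))
```

Both estimates come out somewhat below the actual standard deviations quoted above, which is a reminder that the rule gives only a rough figure.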

Normal ‘bell curve’: the numbers on the horizontal axis, from -4 to 4, are in units of standard deviations

normal curve with std

region of bell curve: +/- 1 std (2 * 34.13% = 68.3%)

region of bell curve: +/- 2 std (2*13.59%+2*34.13% = 95.4%)
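
The 68.3% and 95.4% figures can be checked with Python's standard library. This sketch relies on the standard result that, for a normal distribution, the share of values within k standard deviations of the mean is erf(k / sqrt(2)); the function name is made up for illustration.

```python
import math


def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"+/- {k} std: {percent_within(k):.1f}%")
```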

red: 2 std from the mean

Histograms
– A nice way to view a data set
– A histogram is a chart, similar to a dotplot, created by defining a set of bins and counting how many data points lie in each bin
– Bars are drawn with height proportional to the number of data points in each bin
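
The binning step can be sketched as follows. The scores, the bin width of 10, and the function name are all assumptions chosen for illustration; the text "bars" stand in for the drawn chart.

```python
def histogram_counts(data, start, width, bins):
    """Count how many data points fall in each of `bins` equal-width bins.

    Bin i covers start + i*width up to (but not including) start + (i+1)*width.
    Values outside all bins are ignored.
    """
    counts = [0] * bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts


# Hypothetical exam scores.
scores = [52, 58, 61, 67, 70, 71, 75, 79, 83, 88, 91, 95]
counts = histogram_counts(scores, start=50, width=10, bins=5)

for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count}")
```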

Example Histogram

Statistics and SPSS
– While Excel can do some basic statistics, it is not considered a serious statistics tool
– You really should use something like SPSS (Statistical Package for the Social Sciences)
– We will be using SPSS since DePaul has a site license for this application

Try this example
– Download the dataset Grades.xls from the QRC website (under older data) and start SPSS
– Import the Excel data into SPSS
– Change the variable names and set the data to numeric (not text)
– Click on Analyze -> Descriptive Statistics -> Frequencies

Example continued
– When importing data, if the numeric fields show as ‘$’, ‘%’, or ‘#’, then SPSS (PASW) will have difficulty converting them to numeric
– In most cases, SPSS will briefly display dollar signs, indicating that the conversion is taking place

Example continued
Using the grades for Exam 2, find the
– 5 number summary (minimum, 1st quartile, median, 3rd quartile, maximum)
– mean
– range, and
– standard deviation

SPSS results

Some interesting tools
– Random coin flipper
– Simulation of rolling pairs of dice: rk/javascript/dice2rol.htm
– Check for bell curve with dice: ngscience/flash/sumdice.html
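
If the linked simulations are unavailable, a rough stand-in can be written in a few lines of Python. This is an assumption-laden sketch, not a copy of those tools: the function name, the number of rolls, and the scale of the text bars are arbitrary choices.

```python
import random
from collections import Counter


def roll_pairs(rolls, seed=None):
    """Roll two dice `rolls` times and count how often each sum (2 to 12) occurs."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(rolls))


counts = roll_pairs(36000)
for total in range(2, 13):
    # One '#' per 200 rolls, so the bell-like shape is visible as text.
    print(f"{total:2d}: {'#' * (counts[total] // 200)}")
```

With enough rolls the bars should peak around 7 and fall off toward 2 and 12, which is the bell-like shape the slide asks you to look for.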

Pivot Tables/Crosstabs
Next topic: pivot tables and crosstabs

Pivot Tables
Suppose you have just performed a survey. One of the questions you ask is: what type of home computer connection do you have? Answers can be: none, dial-up, DSL, cable, other, not sure.

Pivot Tables
Here are some of your results (columns: Respondent ID, Cable Type). The Cable Type values are:
no, ds, cm, dk, du, du
where no = none; ds = DSL; cm = cable modem; du = dial-up; dk = don’t know; ot = other

Frequency Tables
SPSS can be used to count the occurrences of data, similar to a pivot table in Excel.
– Enter or import data into SPSS
– Use Analyze -> Descriptive Statistics -> Frequencies
– Select variables and move them from the left box to the right
– Uncheck Display Frequencies Table
– Run it
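
To see what such a count produces, here is a small Python sketch using the connection-type codes from the survey example. It only illustrates the idea of counting occurrences; it is not what SPSS runs internally.

```python
from collections import Counter

# Connection-type codes from the survey example above.
responses = ["no", "ds", "cm", "dk", "du", "du"]

frequencies = Counter(responses)
for code, count in sorted(frequencies.items()):
    print(f"{code}: {count}")
```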

Crosstabulations (Crosstabs)
Crosstabs are an extension of pivot tables. Suppose you have asked a number of students: How many schools did you apply to? You get results something like the following (in a spreadsheet):

Crosstabs
Respondent ID   Sex   Number of Schools
1               F     3
2               M     3
3               F     4
4               F     1
5               M     2
6               M     5
7               F     4
8               F     2
9               F     3
10              M     5
11              M     6
Download this from D2L, course practice files.

Crosstabs
– Now open the data in SPSS (import survey1.xls from class D2L)
– Then pull down the Analyze menu and click on Descriptive Statistics, then Crosstabs
– What variable do you want in the row? The column?
– When ready, click OK to perform the crosstab
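
For comparison, the crosstab can be sketched by hand in Python using the eleven survey rows listed earlier. Putting Number of Schools in the rows and Sex in the columns is just one possible layout, chosen here for illustration.

```python
from collections import Counter

# (respondent ID, sex, number of schools) from the survey data.
rows = [
    (1, "F", 3), (2, "M", 3), (3, "F", 4), (4, "F", 1),
    (5, "M", 2), (6, "M", 5), (7, "F", 4), (8, "F", 2),
    (9, "F", 3), (10, "M", 5), (11, "M", 6),
]

# Count respondents for each (sex, number of schools) combination.
crosstab = Counter((sex, schools) for _, sex, schools in rows)

sexes = sorted({sex for _, sex, _ in rows})
school_counts = sorted({schools for _, _, schools in rows})

print("Schools", *sexes, sep="\t")
for n in school_counts:
    print(n, *(crosstab[(sex, n)] for sex in sexes), sep="\t")
```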

Crosstabs in Access
– You can also perform cross-tabulations using Access (Microsoft’s database application)
– You need to create a crosstab query*
– In the Show Table dialog box, click the tab that lists the table whose data you want to work with
*A query is a tool for extracting information from your database.

Crosstabs in Access
Add the fields to the Field row in the design grid.
Note: Since we want to perform a crosstab query on ‘Sex’ and ‘Number of Schools’, bring the field ‘Sex’ down once and ‘Number of Schools’ down twice.

Crosstabs
– Click on the Query drop-down menu and select Crosstab Query
– In the Crosstab row under the Sex column, click on Column Heading
– In the Crosstab row under the first Number of Schools column, click on Row Heading
– In the Crosstab row under the second Number of Schools column, click on Value
– On this second Number of Schools column, click on Group By and select Count
– Run the query