Data Analysis and Statistical Software I Quarter: Winter 02/03


Data Analysis and Statistical Software I Quarter: Winter 02/03 Daniela Stan Raicu School of CTI, DePaul University 11/24/2018 Daniela Stan - CSC323

Outline
- Introduction
- Individuals and Variables
- Exploratory Data Analysis
- Describing Distributions with Graphs
- Describing Distributions with Numbers

Introduction
Data: numbers, measurements, facts
Where does the data come from? Examples:
- medical field
- automotive industry
- stock market & investment
- census bureau
- customer profiling
Statistics: the science of collecting, organizing, and interpreting data; the goal is to gain understanding from data.

Individuals and Variables
Individuals: the objects described by a set of data (individuals ~ cases ~ records)
Variable: any characteristic of an individual; it can take different values for different individuals (variable ~ attribute)
Observation ~ value of a variable
Types of variables:
- Categorical variables
- Quantitative variables

Individuals and Variables (cont.)
Example of individuals and variables:

Name             Age  Gender  Marital Status
John Smith       25   Male    Single
Joe Doe          32           Married
Phillip Roberts  21
Sarah Lazar      26   Female

The distribution of a variable gives:
- what values the variable takes;
- how often it takes these values (count, or percent/fraction).
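A minimal sketch of how the distribution of a categorical variable could be tabulated as counts and percents. The function name `distribution` and the sample values are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
from collections import Counter


def distribution(values):
    """Return {value: (count, percent)} for the observed values of one variable."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in counts.items()}


# Hypothetical observations of a "Marital Status" variable (illustration only).
status = ["Single", "Married", "Single", "Married", "Single"]

for value, (count, percent) in distribution(status).items():
    print(f"{value}: count={count}, percent={percent:.1f}")
```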

How Do We Describe the Variables?
Exploratory Data Analysis
- Single variable
  - Categorical variable: bar graphs, pie charts
  - Numerical variable: stemplots, histograms, five-number summary, standard deviation
- Two or more variables: scatterplots, correlation, regression

Bar Graphs and Pie Charts
The distribution of the highest level of education for people aged 25 to 34 years:

Education              Count (millions)  Percent
Less than high school        4.7           12.3
High school graduate        11.8           30.7
Some college                10.9           28.3
Bachelor’s degree            8.5           22.1
Advanced degree              2.5            6.6

Bar Graphs and Pie Charts (cont.)
- Pareto chart
- Pie charts require that you include all categories that make up a whole.
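A small sketch, using the education counts from the table above, of the check that a pie chart relies on: the category percents must account for the whole. The percents are recomputed here from the rounded counts, so some may differ from the slide's figures by 0.1; the slide's values were presumably computed from unrounded counts, which is an assumption.

```python
# Counts in millions, copied from the education table.
education = {
    "Less than high school": 4.7,
    "High school graduate": 11.8,
    "Some college": 10.9,
    "Bachelor's degree": 8.5,
    "Advanced degree": 2.5,
}

total = sum(education.values())
percents = {level: 100 * count / total for level, count in education.items()}

for level, pct in percents.items():
    print(f"{level:<22} {pct:5.1f}%")

# A pie chart is only meaningful if the categories make up the whole.
print(f"Total: {sum(percents.values()):.1f}%")
```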

Stemplots
Stemplot ~ stem-and-leaf plot
To make a stemplot:
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
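The three steps can be sketched in a few lines of Python. This is an illustrative helper, not part of the course material: the name `stemplot` is hypothetical, it assumes non-negative whole-number observations, and the sample values are made up.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)        # step 1: split into stem and leaf
        leaves[stem].append(leaf)
    rows = []
    # Step 2: stems from smallest to largest, including stems with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: leaves written to the right of the vertical line.
        row_leaves = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {row_leaves}")
    return rows


# Hypothetical observations (illustration only).
for row in stemplot([25, 32, 21, 26, 48]):
    print(row)
```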

Stemplots (cont.)
Example: Problem 1.24 (page 29), back-to-back stemplot
How do stemplots deal with large data sets?
- Splitting stems: one stem with leaves between 0 and 4, and one stem with leaves between 5 and 9
How do stemplots deal with observations that have many digits?
- Rounding

Stemplots (cont.)
Advantages of stemplots:
- They describe the shape of a distribution for small numbers of observations.
Disadvantages:
- They do not work well with large data sets, since they display every value of the variable.
- They divide the observations into groups (stems) determined by the number system rather than by judgment.

Histograms
A histogram breaks the range of values of a variable into intervals and displays the count or percent of the observations that fall into each interval.
- Count ~ frequency
- Percent ~ relative frequency
Example: Problem 1.34 (page 34)
Disadvantages:
- How many intervals? What width for the histogram intervals?
- The original data cannot be recovered.

Example: Weight Data

Weight Data: Stemplot (Stem & Leaf)
Key: 20|3 means 203 pounds (stems = 10’s, leaves = 1’s)

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Weight Data: Frequency Table
* Left endpoint is included in the group, right endpoint is not.

Weight Data: Histogram
[Figure: histogram of the weight data; horizontal axis "Weight", marked from 100 to 280 in steps of 20]
* Left endpoint is included in the group, right endpoint is not.
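A sketch of how the class counts behind such a histogram could be tallied. The 53 weights are read back from the stemplot of the weight data; the class width of 20 starting at 100 is an assumption taken from the axis marks (100, 120, ..., 280), and the helper name `frequency_table` is hypothetical.

```python
# Weights in pounds, read from the stem-and-leaf plot of the weight data.
weights = [
    100, 101, 106, 106, 110, 110, 119,
    120, 120, 123, 124, 125, 127, 128, 130, 130, 133, 135, 139,
    140, 148, 150, 150, 152, 155, 157,
    165, 165, 165, 170, 170, 170, 172, 175, 175,
    180, 180, 180, 180, 185, 185, 185, 186, 187, 192, 194, 195,
    203, 210, 212, 215, 220, 260,
]


def frequency_table(values, start, width, n_classes):
    """Count values per class [left, right): left endpoint included, right excluded."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Assumed classes: width 20, from 100 up to 280.
table = frequency_table(weights, start=100, width=20, n_classes=9)
for left, right, count in table:
    print(f"[{left}, {right}): {count:2d} {'*' * count}")
```

Note that a weight of exactly 120 lands in the class [120, 140), which is the endpoint convention stated in the footnote.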

Examining Distributions
In any graph of data, look for:
- The overall pattern of the distribution, described by:
  - Center ~ midpoint
  - Spread ~ range between the smallest and largest values ~ variability
  - Shape: (1) symmetric or skewed; (2) unimodal (one major peak/mode) or multimodal
- Deviations from the pattern: outliers

Examining Distributions (cont.)
The histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa test.
Shape?
- symmetric
- unimodal

Symmetric Histograms: Bell-Shaped

Symmetric Histograms: Mound-Shaped

Asymmetric Histograms: Skewed to the Left

Asymmetric Histograms: Skewed to the Right

Outliers
An outlier is an individual value that falls outside the overall pattern (outlier ~ extreme observation).
How to deal with outliers? Sources:
- equipment failure
- errors in recording data
- extraordinary occurrences

Time Plots
A time plot of a variable plots each observation against the time at which it was measured.

Time Plots (cont.)
Time series: data sets produced by measurements of a variable taken at regular intervals over time.
Time plots can reveal the main features of a time series, such as:
- Seasonal variation: a pattern that repeats itself at known regular intervals of time
- A trend: a persistent, long-term rise or fall

Describing Distributions with Numbers
Measuring center: the mean (mean ~ average value)
If the n observations are $x_1, x_2, \ldots, x_n$, their mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in more compact notation,
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Describing Distributions (cont.)
Measuring center: the median M is the number such that half the observations are smaller and the other half are larger (median ~ middle value).
To find the median M of a distribution:
1. Arrange all n observations in order of size, from smallest to largest.
2. If n is odd, M is the center observation in the ordered list; its location is (n+1)/2 from the bottom of the list.
3. If n is even, M is the mean of the two center observations in the ordered list; the location is again (n+1)/2 from the bottom of the list, which now falls halfway between the two center observations.
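The median rule can be written out directly. This is a minimal sketch with a hypothetical function name and made-up inputs; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Median by the rule above: sort, then take the center (or mean of the two centers)."""
    ordered = sorted(values)              # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Step 2: n odd, location (n+1)/2 counted from 1 is index mid counted from 0.
        return ordered[mid]
    # Step 3: n even, location (n+1)/2 lies between the two center observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data (illustration only).
print(median([5, 1, 3]))       # n = 3, location 2
print(median([4, 1, 3, 2]))    # n = 4, location 2.5
```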

Describing Distributions (cont.)
Example 1.13 (textbook, page 40)
Data: fuel economy (miles per gallon) for 2001 model two-seater cars

Describing Distributions (cont.)
Calculate the median:
1. Arrange the data in increasing order:
   13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30 68
2. Find the location of the median: (n+1)/2 = (19+1)/2 = 10
   The observation in the 10th position is 26, so M = 26.

Describing Distributions (cont.)
- How does the median change if we remove the last observation in the sorted list?
- How does the median change if the value of the last observation is changed to 680?
Calculate the mean:
- How does the mean change if we remove the outlier?
- How does the mean change if the value of the last observation is changed to 680?
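One way to explore these questions is to recompute both measures with Python's standard `statistics` module. The sketch below uses the 19 fuel-economy values from the sorted list in this example; the variable names are illustrative.

```python
from statistics import mean, median

# The 19 fuel-economy values (mpg) from the sorted list in the example.
mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]

without_last = mpg[:-1]          # remove the last (largest) observation
changed = mpg[:-1] + [680]       # change the last observation to 680

for label, data in [("original", mpg),
                    ("last removed", without_last),
                    ("last changed to 680", changed)]:
    print(f"{label:<20} median = {median(data):5.1f}   mean = {mean(data):6.2f}")
```

The median moves from 26 to 25 when the last observation is removed and stays at 26 when it becomes 680, while the mean moves from about 25.79 to about 23.44 and to 58 respectively.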

Describing Distributions (cont.)
Mean versus Median
1. The mean is sensitive to the influence of extreme observations/outliers and of skewed distributions.
2. A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are.
3. The mean is not a resistant measure of the center.
4. The median is a resistant measure of the center.

Median versus Average
A recent newspaper article in California said that the median price of single-family homes sold in the past year in the local area was $136,000 and the average price was $149,160.
- How do you think these values are computed?
- Which do you think is more useful to someone considering the purchase of a home, the median or the average?
From Seeing Through Statistics, 2nd Edition, by Jessica M. Utts.

Describing Distributions (cont.)
Measuring spread: the quartiles
The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
- The 50th percentile = median, M
- The 25th percentile = first quartile, Q1
- The 75th percentile = third quartile, Q3

Describing Distributions (cont.)
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M in the list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
Example 1.13: 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30
M = ?, Q1 = ?, Q3 = ?
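The quartile rule can be sketched as follows. The function names are hypothetical, and the data are the full 19 fuel-economy values from the median calculation, which is an assumption about which list to use (the list printed with this example is shorter). Library routines often use other percentile conventions and may give slightly different quartiles.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-each-half rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                  # left of the median's location
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # right of the median's location
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


# The 19 fuel-economy values (mpg) from the median calculation.
mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]
q1, m, q3 = quartiles(mpg)
print(f"Q1 = {q1}, M = {m}, Q3 = {q3}")
```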

Describing Distributions (cont.)
The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from the smallest to the largest. In symbols, the five-number summary is

Minimum  Q1  M  Q3  Maximum

A boxplot is a graph of the five-number summary:
- A central box spans the quartiles Q1 and Q3.
- A line in the box marks the median M.
- Lines extend from the box out to the smallest and largest observations.
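A compact sketch of the five-number summary, again on the 19 fuel-economy values from the earlier example and with the median-of-each-half rule for the quartiles. The function name `five_number_summary` is hypothetical; these five numbers are what a boxplot would draw.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum), quartiles as medians of each half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the median observation itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# The 19 fuel-economy values (mpg) from the earlier example.
mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]

minimum, q1, m, q3, maximum = five_number_summary(mpg)
print("Minimum  Q1  M  Q3  Maximum")
print(minimum, q1, m, q3, maximum)
```

In a boxplot of these data the box would span 21 to 28 with the median line at 26, and the lines would reach out to 13 and 68.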

Describing Distributions (cont.)
Example: numerical description of shopping data using SPSS

Recommended Problems
Chapter 1: Section 1.1
IPS web site: http://www.whfreeman.com/ips4e