Daniela Stan, PhD School of CTI, DePaul University

Presentation transcript:

Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03 Daniela Stan, PhD School of CTI, DePaul University 1/18/2019 Daniela Stan - CSC323

Outline
- Introduction
- Individuals and Variables
- Exploratory Data Analysis
- Describing Distributions with Graphs
- Describing Distributions with Numbers

Introduction
Data:
- numbers, measurements, facts
Where is the data coming from?
- medical field
- automotive industry
- stock market & investment
- census bureau
- customer profiling; examples
Statistics:
- the science of collecting, organizing, and interpreting data; the goal is to gain understanding from data.

Individuals and Variables
Individuals:
- the objects described by a set of data (individuals ~ cases ~ records)
Variable:
- any characteristic of an individual; it can take different values for different individuals (variable ~ attribute)
Observation ~ value of a variable
Types of Variables:
- Categorical Variables
- Quantitative Variables

Individuals and Variables (cont.)
Example of individuals and variables:

Name             Age  Gender  Marital Status
John Smith       25   Male    Single
Joe Doe          32           Married
Phillip Roberts  21
Sarah Lazar      26   Female

The distribution of a variable gives:
- what values the variable takes;
- how often it takes these values (as a count, percent, or fraction).

How Do We Describe the Variables?
Exploratory Data Analysis
Single variable:
- Categorical variable: bar graphs, pie charts
- Numerical variable: stemplots, histograms, five-number summary, standard deviation
Two or more variables:
- scatterplots, correlation, regression

Bar Graphs and Pie Charts
The distribution of the highest level of education for people aged 25 to 34 years:

Education              Count (millions)  Percent
Less than high school   4.7              12.3
High school graduate   11.8              30.7
Some college           10.9              28.3
Bachelor’s degree       8.5              22.1
Advanced degree         2.5               6.6
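The percent column of such a table can be recomputed from the counts. The sketch below is only an illustration: the dictionary name `counts` is hypothetical, and because the slide's counts are already rounded to one decimal, the recomputed percents may differ from the slide in the last digit.

```python
# Counts (in millions) copied from the education table on the slide.
counts = {
    "Less than high school": 4.7,
    "High school graduate": 11.8,
    "Some college": 10.9,
    "Bachelor's degree": 8.5,
    "Advanced degree": 2.5,
}

total = sum(counts.values())

# Percent = 100 * count / total; the percents of all categories sum to 100,
# which is why a pie chart must include every category of the whole.
percents = {level: 100 * count / total for level, count in counts.items()}

for level, pct in percents.items():
    print(f"{level:<24}{counts[level]:>6.1f}{pct:>8.1f}")
```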

Bar Graphs and Pie Charts (cont.)
- Pareto chart
- Pie charts require that you include all categories that make up a whole

Stemplots
Stemplot ~ stem-and-leaf plot
To make a stemplot:
1. Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
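The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `stemplot` is hypothetical, it assumes non-negative whole-number observations, and the sample values are the fuel-economy figures used later in these slides.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    # Step 1: split each observation into a stem (all but the last digit)
    # and a leaf (the last digit). Sorting first keeps leaves in order.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lowest, highest = min(leaves), max(leaves)
    rows = []
    # Step 2: list every stem from smallest to largest, including empty ones.
    for stem in range(lowest, highest + 1):
        # Step 3: write the leaves to the right of the vertical bar.
        row = f"{stem:>2} | " + "".join(str(leaf) for leaf in leaves[stem])
        rows.append(row.rstrip())
    return rows


mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]
for row in stemplot(mpg):
    print(row)
```

Empty stems are kept so that the gap between the bulk of the data and an extreme value stays visible.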

Stemplots (cont.)
Example: Problem 1.24 (page 29)
Back-to-back stemplot
How do stemplots deal with large data sets?
- Splitting stems: one stem with leaves between 0 and 4, and one stem with leaves between 5 and 9
How do stemplots deal with observations having many digits?
- Rounding

Stemplots (cont.)
Advantages of stemplots:
- Describe the shape of a distribution for small numbers of observations
Disadvantages:
- Don’t work well with large data sets, since they display the value of every observation
- Divide the observations into groups (stems) determined by the number system rather than by judgment

Histograms
A histogram breaks the range of values of a variable into intervals and displays the count or percent of the observations that fall into each interval.
- Count ~ frequency
- Percent ~ relative frequency
Example: Problem 1.34 (page 34)
Disadvantages:
- How many intervals? What width for the histogram intervals?
- The original data cannot be recovered
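The counting behind a histogram can be sketched as follows. This is an illustrative sketch only: the function name `histogram_counts` and its parameters `start` and `width` are hypothetical, the intervals are assumed to be equal-width and closed on the left, and the sample values are the fuel-economy figures used later in these slides.

```python
def histogram_counts(values, start, width):
    """Count the observations in each interval [start + k*width, start + (k+1)*width)."""
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for value in values:
        counts[int((value - start) // width)] += 1
    return counts


mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]

counts = histogram_counts(mpg, start=10, width=10)       # frequencies
percents = [100 * count / len(mpg) for count in counts]  # relative frequencies

for k, (count, pct) in enumerate(zip(counts, percents)):
    low = 10 + 10 * k
    print(f"{low}-{low + 9}: count={count:>2}  percent={pct:5.1f}")
```

Changing `width` changes the picture, which is the choice the slide lists as a disadvantage; the individual values cannot be read back from `counts`.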

Examining Distributions
In any graph of data, look for:
The overall pattern of a distribution, described by:
- Center ~ midpoint
- Spread ~ range between the smallest and largest value ~ variability
- Shape:
  1. Symmetric or skewed
  2. Unimodal (one major peak/mode) or multimodal
Deviations from the pattern:
- Outliers
Example

Examining Distributions (cont.)
The histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa test.
Shape?
- symmetric
- unimodal

Outliers
An outlier is an individual value that falls outside the overall pattern.
Outlier ~ extreme observation
How to deal with outliers?
Sources:
- outliers from equipment failure
- errors in recording data
- extraordinary occurrence
Applications

Time Plots
A time plot of a variable plots each observation against the time at which it was measured.

Time Plots (cont.)
Time series: data sets produced by measurements of a variable taken at regular intervals over time.
Time plots can reveal the main features of a time series, such as:
- Seasonal variation: a pattern that repeats itself at known regular intervals of time
- A trend: a persistent, long-term rise or fall

Describing Distributions with Numbers
Measuring center: the mean
mean ~ average value
If the n observations are x_1, x_2, …, x_n, their mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
or, in more compact notation,
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Describing Distributions (cont.)
Measuring center: the median
M is the number such that half the observations are smaller and the other half are larger; median ~ middle value
To find the median M of a distribution, follow the steps:
1. Arrange all n observations in order of size, from smallest to largest.
2. If n is odd, M is the center observation in the ordered list; the location is (n+1)/2 from the bottom of the list.
3. If n is even, M is the mean of the two center observations in the ordered list; the location is again (n+1)/2 from the bottom of the list.
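The steps above translate directly into code. The sketch below is illustrative; the function name `median` and the two short sample lists are made up for the example and do not come from the slides.

```python
def median(observations):
    """Median found by sorting and taking the center observation(s)."""
    ordered = sorted(observations)       # step 1: order the observations
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Step 2: n odd, location (n+1)/2 is index n//2 when counting from 0.
        return ordered[middle]
    # Step 3: n even, location (n+1)/2 falls halfway between two observations,
    # so average the two center values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 1, 3]))     # n = 3, location 2
print(median([4, 1, 3, 2]))  # n = 4, location 2.5
```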

Describing Distributions (cont.)
Example 1.13 (textbook, page 40)
Data: Fuel economy (miles per gallon) for 2001 model two-seater cars

Describing Distributions (cont.)
Calculate the median:
1. Arrange the data in increasing order: 13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30 68
2. Find the location of the median: (n+1)/2 = (19+1)/2 = 10
3. The observation in the 10th position is 26, so M = 26.

Describing Distributions (cont.)
- How does the median change if we remove the last observation in the sorted list?
- How does the median change if the value of the last observation is changed to 680?
Calculate the mean:
- How does the mean change if we remove the outlier?
- How does the mean change if the value of the last observation is changed to 680?
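One way to explore these questions is to compute both measures on the three versions of the data. The sketch below uses Python's standard `statistics` module; the variable names are hypothetical, and the values are the 19 fuel-economy observations listed on the previous slide.

```python
from statistics import mean, median

mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]

without_last = mpg[:-1]       # drop the largest observation (68)
inflated = mpg[:-1] + [680]   # change the largest observation to 680

for label, data in [("original", mpg),
                    ("68 removed", without_last),
                    ("68 -> 680", inflated)]:
    print(f"{label:<12} mean={mean(data):6.2f}  median={median(data):5.1f}")
```

On this data the median moves by at most one unit (26, 25, 26), while the mean moves from about 25.8 to about 23.4 and then to 58.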

Describing Distributions (cont.)
Mean versus Median
1. The mean is sensitive to the influence of extreme observations (outliers) and of skewed distributions.
2. A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are.
3. The mean is not a resistant measure of the center.
4. The median is a resistant measure of the center.

Describing Distributions (cont.)
Measuring spread: the quartiles
The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
- The 50th percentile = median, M
- The 25th percentile = first quartile, Q1
- The 75th percentile = third quartile, Q3
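The definition above does not say how to pick the value when p percent of n is not a whole number. The sketch below assumes the nearest-rank convention (take the smallest observation with at least p percent of the data at or below it); this convention and the function name `percentile` are assumptions for illustration, not something the slides specify, and other conventions can give slightly different answers. The data are the fuel-economy values from the earlier example.

```python
import math


def percentile(observations, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(observations)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based position
    return ordered[rank - 1]


mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]

for p in (25, 50, 75):
    print(f"{p}th percentile: {percentile(mpg, p)}")
```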

Describing Distributions (cont.)
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M in the list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
Example 1.13: 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30
M=?, Q1=?, Q3=?
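The quartile rule above can be sketched as follows. The helper names `middle` and `quartiles` are hypothetical. The sketch is run on the full 19-value fuel-economy list from the median example, which is an assumption made for the illustration: the list printed on this slide is shorter, so the numbers here are not offered as the slide's intended answers.

```python
def middle(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        return ordered[half]
    return (ordered[half - 1] + ordered[half]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using medians of the two halves."""
    ordered = sorted(observations)   # step 1
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # step 2: observations left of the median
    upper = ordered[n - half:]       # step 3: observations right of the median
    return middle(lower), middle(ordered), middle(upper)


mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]
q1, m, q3 = quartiles(mpg)
print(f"Q1={q1}, M={m}, Q3={q3}")
```

When n is odd the median itself is left out of both halves, which follows the "to the left of" and "to the right of" wording in the steps.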

Describing Distributions (cont.)
The Five-Number Summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from the smallest to the largest.
In symbols, the five-number summary is: Minimum Q1 M Q3 Maximum
A boxplot is a graph of the five-number summary:
- A central box spans the quartiles Q1 and Q3
- A line in the box marks the median M
- Lines extend from the box out to the smallest and largest observations
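A five-number summary can be assembled from the pieces defined on the preceding slides. The sketch below is a hedged illustration: the function name `five_number_summary` is hypothetical, and the data are again the 19 fuel-economy values from the median example.

```python
def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum)."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2

    def middle(values):
        # Median of an already sorted list.
        k = len(values) // 2
        if len(values) % 2 == 1:
            return values[k]
        return (values[k - 1] + values[k]) / 2

    return (ordered[0],                   # smallest observation
            middle(ordered[:half]),       # Q1: median of the lower half
            middle(ordered),              # M
            middle(ordered[n - half:]),   # Q3: median of the upper half
            ordered[-1])                  # largest observation


mpg = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26,
       26, 27, 27, 27, 28, 28, 30, 30, 68]
print(five_number_summary(mpg))
```

These five values are what a boxplot draws: the box runs from Q1 to Q3, the inner line sits at M, and the outer lines reach the minimum and maximum.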

Describing Distributions (cont.)
Example: Numerical description of shopping data using SPSS

Recommended Problems
Chapter 1: Section 1.1
IPS web site: http://www.whfreeman.com/ips4e