Unit 2: Some Basics


The whole vs. the part
population vs. sample
- means (avgs) and “std devs” [defined later] of these are denoted by different letters
description vs. inference
- mean of a population describes it (partly)
- mean of a sample gives a basis to infer (guess) mean of the population

Computers are ubiquitous in stats
Warning: We’ll compute with Excel because you can see the data and it’s available all over campus... but its stats results are unreliable (see next slide)
In the real world, use a statistical program (SAS, SPSS, ...)

Data Types
categorical vs. numerical (vs. ordinal, …)
- ex of categorical: red, blue, green
- ex of numerical: person’s height in inches
- (ex of ordinal: strongly agree, agree, neutral, ...)
[numerical: discrete vs. continuous]
Types of Statistics
graphical
numerical

Graphs for categorical data
pie charts: for fractions of total data in each category
bar (Excel: “column”) graphs: for counts or fractions of data in each category
(mode = most frequent value)

Example: Hair color at NYS Fair

Numerical variables: Graphs of frequencies I
dot plots
stem-and-leaf plots
Ex: 79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92
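
A stem-and-leaf plot for the example list can be sketched in a few lines of Python. This is only an illustration, not part of the course's Excel workflow; the function name `stem_and_leaf` and the choice to show empty stems are assumptions made here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted list of ones digits (leaves)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Assumption: keep empty stems so that gaps in the data stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: leaves.get(stem, []) for stem in stems}

scores = [79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92]
plot = stem_and_leaf(scores)
for stem, row in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

Turned on its side, the rows of leaves have the same shape as a dot plot or a histogram with bins of width 10.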

Numerical variables: Graphs of frequencies II
histogram
79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92
(Note similarity to dotplot and stem-and-leaf plot)
Excel’s “histograms” aren’t, unless bar widths are equal

More on Histograms
Frequencies are represented by areas, not heights
Vertical axis should have “density scale” (in % per horizontal unit, not frequency)
So (if widths differ), bar height is % / h.u.
Result: total area under histogram = 100%
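
To make the density scale concrete, here is a minimal sketch that turns counts into bar heights in % per horizontal unit. The bin edges (30, 60, 80, 90, 100) and the boundary convention are assumptions chosen for illustration; the data are the test scores used on the earlier slides.

```python
def density_heights(values, edges):
    """Return bar heights in % per horizontal unit for the given bin edges.

    Assumed convention: each bin includes its left edge and excludes its
    right edge, except the last bin, which includes both.
    """
    n = len(values)
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        count = sum(
            1 for v in values
            if left <= v < right or (is_last and v == right)
        )
        percent = 100 * count / n
        heights.append(percent / (right - left))  # % per horizontal unit
    return heights

scores = [79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92]
edges = [30, 60, 80, 90, 100]  # illustrative bins of unequal width
heights = density_heights(scores, edges)
widths = [b - a for a, b in zip(edges, edges[1:])]
total_area = sum(h * w for h, w in zip(heights, widths))
print(heights)
print(total_area)  # area under the histogram, in %
```

The widest bin (30 to 60) holds 20% of the scores but gets a low bar, because its area, not its height, carries the frequency.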

Density scale
Weekly salaries in a company
Vertical axis is in % per $200
Where is the high point?

Reading a histogram: Hrs slept by CU students (Data from questionnaire)
Hrs      Areas      % of total
1,2,3    3x1 = 3    3/20 = 15%
4,5      2x3 = 6    6/20 = 30%
6        1x4 = 4    4/20 = 20%
7        1x3 = 3    3/20 = 15%
8,9      2x2 = 4    4/20 = 20%
Sum = 20
So maybe 5% slept 1 hr, 5% 2 hr and 5% 3 hr; or maybe 0% slept 1 hr, 7% 2 hr, 8% 3 hr
1. Which given interval contains the most students?
2. Which 1-hr period contains the most students (i.e., is most “crowded”)?
3. About what % slept 8 hr?
4. About what % slept 3 or 4 hr?
5. If there were 240 surveyed, about how many slept 6-7 hr?
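
The area arithmetic in the table can be reproduced as follows. The widths and heights are the ones listed on the slide; treating "6-7 hr" as the two blocks for 6 and 7 hours together is one possible reading and is an assumption here.

```python
widths = [3, 2, 1, 1, 2]    # hours covered by each block: 1-3, 4-5, 6, 7, 8-9
heights = [1, 3, 4, 3, 2]   # block heights from the slide's table

areas = [w * h for w, h in zip(widths, heights)]
total = sum(areas)
percents = [100 * a / total for a in areas]

# Assumed reading of "6-7 hr": the blocks for 6 hr and 7 hr combined.
share_6_7 = percents[2] + percents[3]
estimate = 240 * share_6_7 / 100

print(areas, total)
print(percents)
print(share_6_7, estimate)
```

Note that the tallest block (6 hr) is not the block with the largest area (4-5 hr), which is the distinction behind questions 1 and 2.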

Drawing a histogram: Horses’ weights in kg
Columns: Wts (widths) | Counts | % | % / 10 kg
Interval widths, in order: (20) (20) (10) (10) (20) (20) (30)
Last row: width (30), count 30, 12%, 4% per 10 kg
Sum = 250

Frequencies by area?

We are given lists of numbers taken from a large number of residents of New York State. The numbers represent
1. the person’s height
2. the distance from the person’s home to the nearest airport
3. the distance from the person’s home to Syracuse
4. the size of the person’s savings
From the following histograms, choose the one that you think best approximates the true histogram of each list.

Remarks on histograms
If data is continuous, we need a convention about what to do with data on boundaries: Does it fall into the left or right interval?
If data is discrete, it’s natural to center bars on values (which Excel does even when it shouldn’t).

Possible traits of a number list
“skewed” right or left (the skew is the tail)
- incomes are usually skewed right
- test scores are often skewed left
- IQs are usually symmetric
outlier values (far from others; sometimes a rule defines one, but...)
unimodal vs. bimodal
- ex: heights of a group of men vs. hts of a group of men and women

Centers of numerical data
average [mean] (Greek letter µ, or the variable with an overbar, x̄): sum of data divided by number of numbers in list
- this is the “arithmetic” mean (others include geometric, harmonic)
- “center of gravity” of histogram
median: value that is greater than half of the data and less than the other half
- half area of histogram to each side
- more “resistant” to skew & outliers

Weekly salaries in a small factory: 30 workers
17 at $200, 5 at $400, 6 at $500, 1 at $2000, 1 at $4600
avg = $500, median = $200
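
The factory figures can be checked with Python's standard `statistics` module; this is just a verification sketch of the numbers on the slide.

```python
from statistics import mean, median

# 17 workers at $200, 5 at $400, 6 at $500, 1 at $2000, 1 at $4600
salaries = [200] * 17 + [400] * 5 + [500] * 6 + [2000] + [4600]

avg = mean(salaries)
med = median(salaries)
print(len(salaries), avg, med)
```

Two large salaries pull the average up to $500, while the median stays at the $200 that more than half the workers earn.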

Test for skew
median < avg: skewed to right
median > avg: skewed to left
- (These can be used as the def of skew left or right)
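
As a small sketch, the test can be written as a function. The name `skew_direction` and the "none" label for the case median = avg are assumptions made here; the two lists are the test scores and factory salaries from earlier slides.

```python
from statistics import mean, median

def skew_direction(values):
    """Apply the slide's rule: compare the median with the average."""
    avg, med = mean(values), median(values)
    if med < avg:
        return "right"
    if med > avg:
        return "left"
    return "none"  # assumed label when median and average coincide

scores = [79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92]
salaries = [200] * 17 + [400] * 5 + [500] * 6 + [2000] + [4600]

print(skew_direction(scores))
print(skew_direction(salaries))
```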

Pop Quiz
Is the average or the median likely to be greater for a list of
- heights of people (including children)?
- heights of adults?
- salaries?

Measures of spread
standard deviation σ (maybe with subscript of variable) = √[Σ(x - x̄)²/n]
- this is “population std dev”, SD in text, stdevp in Excel
- “sample std dev” s = √[Σ(x - x̄)²/(n-1)] is SD+ in text (much later), stdev in Excel
IQR (“inter-quartile range”): difference between third and first quartiles
- first quartile: 1/4 of data is less, 3/4 greater
- third quartile: [Guess!]
- (n-th percentile: n% of data is less, (100-n)% greater)
(range: max value - min value)
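
The two standard deviation formulas differ only in the divisor, which a short sketch makes visible. The helper names are assumptions made here. The quartile convention is also an assumption: `statistics.quantiles` with `method="inclusive"` is one of several conventions in use, chosen because it reproduces the Excel IQR values quoted later for 1, 2, 3, 3, 4, 5.

```python
import math
from statistics import pstdev, stdev, quantiles

def population_sd(values):
    """sqrt of the sum of squared deviations divided by n."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Same sum of squared deviations, divided by n - 1 instead of n."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def iqr(values):
    """Third quartile minus first quartile (inclusive convention assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

scores = [79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92]
print(population_sd(scores), pstdev(scores))  # the two should agree
print(sample_sd(scores), stdev(scores))       # the two should agree
print(iqr(scores), max(scores) - min(scores)) # IQR and range
```

The sample version is always slightly larger than the population version for the same list, since it divides by the smaller number.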

Rough estimates of σ
Steps: Estimate avg, estimate deviations of data values from avg, estimate avg of deviations (guess high)
Ex: 41, 48, 50, 50, 54, 57. Is σ closest to 0, 5, 10 or 20?
Ex: Is σ closest to 0, 3, 8 or 15?
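
For the first example, the rough steps can be compared with the exact value. This sketch computes the plain average of the absolute deviations (the quantity being estimated by eye) alongside the population σ; the variable names are illustrative.

```python
from statistics import mean, pstdev

values = [41, 48, 50, 50, 54, 57]
candidates = [0, 5, 10, 20]

avg = mean(values)
# Rough step: average size of the deviations from the average.
avg_abs_dev = mean(abs(x - avg) for x in values)
# Exact population standard deviation for comparison.
sigma = pstdev(values)
closest = min(candidates, key=lambda c: abs(c - sigma))

print(avg, avg_abs_dev, sigma, closest)
```

The average absolute deviation comes out below σ, which is why the slide says to guess high.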

Rule of Thumb (best for, but not limited to, bell-shaped data)
68% of values are within 1σ of mean
95% of values are within 2σ of mean
almost all values are within 3σ of mean

So values within one σ of the average are fairly common,...
while values around two σ away from average are mildly surprising,...
and values more than three σ away from the average are “a minor miracle”.
“z-score” (“std units”): z = (x - x̄) / σ
- the number of σ’s above average
- (if negative, below average)
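
A z-score is a one-line computation. The sketch below uses the list 41, 48, 50, 50, 54, 57 from the earlier slide, whose average is 50 and σ is 5; the function name is an assumption.

```python
from statistics import mean, pstdev

def z_score(x, avg, sigma):
    """Number of standard deviations that x lies above the average."""
    return (x - avg) / sigma

values = [41, 48, 50, 50, 54, 57]
avg, sigma = mean(values), pstdev(values)

z_high = z_score(57, avg, sigma)  # above average, so positive
z_low = z_score(41, avg, sigma)   # below average, so negative
print(z_high, z_low)
```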

Effect of outliers
1, 2, 3, 3, 4, 5
- avg = 3
- σ = √(10/6) ≈ 1.3
- median = 3
- IQR (Excel values) = 1.5
1, 2, 3, 3, 4, 100
- avg ≈ 18.8
- σ ≈ 36.3
- median = 3
- IQR (Excel values) = 1.5
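
The four summaries for both lists can be recomputed as a check. The `summary` helper is illustrative, and the quartiles use the "inclusive" method of `statistics.quantiles`, assumed here because it gives the same IQR of 1.5 as the Excel values on the slide.

```python
from statistics import mean, median, pstdev, quantiles

def summary(values):
    """Average, population SD, median and IQR of a list."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "avg": mean(values),
        "sd": pstdev(values),
        "median": median(values),
        "iqr": q3 - q1,
    }

without_outlier = summary([1, 2, 3, 3, 4, 5])
with_outlier = summary([1, 2, 3, 3, 4, 100])
for name in ("avg", "sd", "median", "iqr"):
    print(name, round(without_outlier[name], 1), round(with_outlier[name], 1))
```

Replacing 5 by 100 moves the average and σ a long way but leaves the median and IQR unchanged, which is what “resistant” means.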

Changing the list (I): “Linear” changes of variable (i.e., changing units of measure)
Basic list: 6, 9, 15
- µ = [6+9+15]/3 = 10
- σ = √[((6-10)²+(9-10)²+(15-10)²)/3] = √14
Add 5: 11, 14, 20
- µ = [11+14+20]/3 = 15
- σ = √[((11-15)²+(14-15)²+(20-15)²)/3] = √14
Multiply by 12: 72, 108, 180
- µ = [72+108+180]/3 = 120
- σ = √[((72-120)²+(108-120)²+(180-120)²)/3] = 12√14
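
The same pattern can be verified numerically: adding a constant shifts µ and leaves σ alone, while multiplying by a constant scales both. A minimal check on the slide's list:

```python
import math
from statistics import mean, pstdev

base = [6, 9, 15]
shifted = [x + 5 for x in base]   # add 5 to every value
scaled = [12 * x for x in base]   # multiply every value by 12

print(mean(base), pstdev(base))        # 10 and sqrt(14)
print(mean(shifted), pstdev(shifted))  # average moves by 5, SD unchanged
print(mean(scaled), pstdev(scaled))    # both multiplied by 12
```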

Changing the list (II)
Combining two lists
A: 25, 35
B: 40, 40, 50, 50
- µ_A = 30, σ_A = 5, µ_B = 45, σ_B = 5
A and B: 25, 35, 40, 40, 50, 50
- µ = 40, σ = 5√3 > 5
Adjoining copies of the average: 6, 9, 15, 10, 10, 10, 10
- µ = (6+9+15+4(10))/7 = 10
- σ = √[((6-10)²+(9-10)²+(15-10)²+4(10-10)²)/7] = √(3/7)∙√14 < √14
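
Both effects can be checked directly: merging two lists with different averages can raise σ above either list's own σ, and padding a list with copies of its average lowers σ. A short verification sketch using the slide's numbers:

```python
import math
from statistics import mean, pstdev

a = [25, 35]
b = [40, 40, 50, 50]
combined = a + b

original = [6, 9, 15]
padded = original + [10] * 4  # adjoin four copies of the average

print(pstdev(a), pstdev(b), pstdev(combined))  # 5, 5 and 5*sqrt(3)
print(mean(padded), pstdev(padded), pstdev(original))
```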

Box-&-whisker plots
Uses all of min, first quartile, median, third quartile, max
“Normal” just shows them all
“Modified” puts a limit on whisker length, based on the IQR
- 3rd quartile + 1.5 IQR = “(inner) fence”; whisker ends at last value before or on fence
- beyond the fence is an “outlier”; reject only “for cause” (?)
- beyond 3rd quartile + 3 IQR (“outer fence” or “bound”) is “extreme outlier”
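
The fence rules can be sketched as a function. Two assumptions are made here beyond the slide: the lower fences mirror the upper ones (1st quartile minus 1.5 IQR and minus 3 IQR), and quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several conventions. The data are the test scores from the earlier slides.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, whisker ends and outliers for a modified box plot."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: lower fences are the mirror image of the upper fences.
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        "quartiles": (q1, med, q3),
        "inner_fences": inner,
        # Whiskers end at the last values before or on the inner fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < inner[0] or x > inner[1]],
        "extreme": [x for x in data if x < outer[0] or x > outer[1]],
    }

scores = [79, 91, 59, 52, 94, 74, 75, 87, 67, 35, 91, 89, 96, 92, 92]
box = box_plot_summary(scores)
for key, value in box.items():
    print(key, value)
```

With these conventions the score of 35 falls below the lower inner fence, so the lower whisker stops at 52 and 35 is drawn as a separate point.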

Figure: “Normal” and “Modified” box-and-whisker plots