Chap 2 Introduction to Statistics


Chap 2 Introduction to Statistics: This chapter gives an overview of statistics, including histogram construction, measures of central tendency, and measures of dispersion.

INTRODUCTION TO STATISTICS
Statistics means deriving relevant information from data. The term has two meanings:
1. A collection of data, e.g. census figures, GDP, football results, accident counts, number of employees (male, female, by department, etc.).
2. The collection, tabulation, analysis, interpretation, and presentation of quantitative data, from which conclusions can be drawn about the sample or population studied and decisions made on quality.

INTRODUCTION TO STATISTICS
The use of statistics in quality deals with the second meaning (inductive statistics). Examples of the questions it answers: What can we learn from the data? What conclusions can be drawn? What do the data tell us about our process and product performance?

INTRODUCTION TO STATISTICS
Understanding the use of statistics is vital in business: to make decisions based on facts, to conduct business improvements, and to control and monitor process, product, or service performance. Applying statistics to real-life problems, such as quality problems, results in improved organizational performance.

Collection of data
Data are collected by direct observation or indirectly through written or verbal questions (market research, opinion polls). Direct observation covers measurement and visual checking, and the results are classified as variables or attributes.
Variables data: measurable quality characteristics.
Attributes data: characteristics that are not measured but classified as conforming or non-conforming.

Collection of data
Data are collected with a purpose: to find out process conditions and to support improvement.
Variables are quality characteristics that are measurable or countable.
CONTINUOUS: dimensions, weight, height, etc. (measured in meters, gallons, p.s.i., etc.).
DISCRETE: countable numbers that exhibit gaps (number of defective parts, number of defects per car; whole numbers 1, 2, 3, ..., 100).

Collection of data
Attributes are quality characteristics that are non-measurable, or that we choose not to measure. Examples: surface appearance, color; acceptable or non-acceptable; conforming or non-conforming. Attribute data are collected as discrete values.
Variables (e.g. weight of sugar) CAN be classified as attributes: a weight within limits counts as conforming, a weight outside limits counts as non-conforming.

Summarizing Data
Consider this data set on the number of daily billing errors: 1 3 5 4 2
Data in this form are meaningless, not effective, and difficult to use.

Data need to be summarized in one of two forms:
Graphical: frequency distributions, histograms, graphs, charts, diagrams.
Analytical: measures of central tendency, measures of dispersion.

Frequency Distribution (FD)
A frequency distribution summarizes how the data (observations) occur within each subdivision or group of observed values. It helps visualize the distribution of the data and shows how the total frequency is distributed. Two types:
Ungrouped data: a listing of the observed values.
Grouped data: observed values lumped together into cells.

FD - Ungrouped Data
1. Establish an array: arrange the values in ascending or descending order (column 1).
2. Tabulate the frequency: place a tally mark in column 2 for each occurrence.
3. Present in graphical form: histogram, relative frequency distribution.
Tally sheet columns: No. of errors, Tally mark, Frequency.

FD - Ungrouped data
Graphical representations: frequency histogram, relative frequency histogram, cumulative frequency histogram, relative cumulative frequency histogram.
[Figure: frequency histogram; x-axis: no. of errors, y-axis: frequency]

No. of errors   Freq   Relative freq   Cumulative freq   Rel cum freq
0               15     0.29            15                0.29
1               20     0.38            35                0.67
2               8      0.15            43                0.83
3               5      0.10            48                0.92
4               3      0.06            51                0.98
5               1      0.02            52                1.00
Total           52     1.00
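The relative and cumulative columns of the table follow mechanically from the frequencies. A minimal sketch, assuming the frequencies 15, 20, 8, 5, 3, 1 for 0 to 5 errors (the values for 4 and 5 errors are inferred from the cumulative column); the function name `frequency_table` is illustrative only:

```python
from itertools import accumulate

# Frequency of each "number of billing errors" value.
# Assumption: counts for 4 and 5 errors are inferred from the cumulative totals.
freq = {0: 15, 1: 20, 2: 8, 3: 5, 4: 3, 5: 1}


def frequency_table(freq):
    """Return rows of (value, freq, rel_freq, cum_freq, rel_cum_freq)."""
    total = sum(freq.values())
    values = sorted(freq)
    cumulative = list(accumulate(freq[v] for v in values))
    return [
        (v, freq[v], round(freq[v] / total, 2), c, round(c / total, 2))
        for v, c in zip(values, cumulative)
    ]


rows = frequency_table(freq)
for row in rows:
    print(row)
```

Each printed row corresponds to one line of the frequency table, so the same function can be reused for any ungrouped tally.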

Frequency Distribution For Grouped Data
Data from a continuous variable need grouping.
Step 1. Collect data and construct a tally sheet.
Make the tally, coded if necessary. When there are too many data values, group them into cells to simplify the presentation of the distribution. Too many cells distort the true picture; too few cells make the data too concentrated. The number of cells is a judgment by the analyst, reached by trial and error, generally 5 to 20 cells:
Fewer than 100 data values: use 5 to 9 cells.
100 to 500 data values: use 8 to 17 cells.
More than 500 data values: use 15 to 20 cells.

CELL NOMENCLATURE
[Figure: a cell labelled with its midpoint, upper boundary, and cell interval (i)]

2. Determine the range
R = XH - XL, where R = range, XH = highest value of the data, XL = lowest value of the data.
Example: if the highest number is 2.575 and the lowest number is 2.531, then R = 2.575 - 2.531 = 0.044.

3. Determine the cell interval
The cell interval (i) is the distance between adjacent cell midpoints. If possible, use odd interval values (e.g. 0.001, 0.07, 0.5, 3) so that the midpoint values have the same number of decimal places as the data values.
Use Sturges' rule as a starting point: i = R/(1 + 3.322 log n), where n is the number of data values and the logarithm is base 10.
Then proceed by trial and error with h = R/i, where h = number of cells or classes:
Assume i = 0.003: h = 0.044/0.003 ≈ 15 cells
Assume i = 0.005: h = 0.044/0.005 ≈ 9 cells
Assume i = 0.007: h = 0.044/0.007 ≈ 6 cells
A cell interval of 0.005 with 9 cells gives the best presentation of the data, following the guidelines in step 1.
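The trial-and-error step can be checked numerically. A rough sketch, assuming n = 110 observations (the total shown in the step 6 table) and rounding h to the nearest whole cell as the slide does; the function names are illustrative:

```python
import math


def sturges_interval(data_range, n):
    """Suggested cell interval i = R / (1 + 3.322 * log10(n))."""
    return data_range / (1 + 3.322 * math.log10(n))


def cell_count(data_range, interval):
    """Number of cells h = R / i, rounded to the nearest whole cell."""
    return round(data_range / interval)


R = 2.575 - 2.531  # range from step 2
n = 110            # assumption: total number of observations from step 6

suggested = sturges_interval(R, n)
trials = {i: cell_count(R, i) for i in (0.003, 0.005, 0.007)}

print(round(suggested, 4))  # suggested interval, before choosing an odd value
print(trials)               # candidate interval -> number of cells
```

Under this assumption Sturges' rule suggests an interval a little above 0.005, which is consistent with choosing i = 0.005 and 9 cells.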

4. Determine cell midpoints
MPL = XL + i/2 (do not round up) = 2.531 + 0.005/2 = 2.5335, taken as 2.533
The midpoint of the next cell is 2.533 + 0.005 = 2.538.
The 1st cell contains 5 different values (as do the other cells): 2.531, 2.532, 2.533, 2.534, 2.535.

5. Determine cell boundaries
Cell boundaries are the limit values of a cell (lower and upper). To avoid ambiguity when placing data into cells, boundary values carry one more decimal place or significant figure than the observed values:
Upper boundary: add 0.0005 to the highest value in the cell.
Lower boundary: subtract 0.0005 from the lowest value in the cell.
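Steps 4 and 5 can be sketched together. This example assumes the values from the running example (lowest value 2.531, interval 0.005, 9 cells, data recorded to 0.001) and uses `Decimal` to avoid binary floating-point noise. The midpoint is computed here as the average of the lowest and highest value in the cell, which gives the same 2.533 as the truncated formula above; `build_cells` is a hypothetical helper name.

```python
from decimal import Decimal


def build_cells(lowest, interval, count, unit):
    """Return (low value, high value, midpoint, lower boundary, upper boundary) per cell.

    `unit` is the measurement resolution; boundaries sit half a unit
    outside the lowest and highest values of each cell.
    """
    cells = []
    for k in range(count):
        low = lowest + k * interval
        high = low + interval - unit
        midpoint = (low + high) / 2
        cells.append((low, high, midpoint, low - unit / 2, high + unit / 2))
    return cells


cells = build_cells(Decimal("2.531"), Decimal("0.005"), 9, Decimal("0.001"))
for cell in cells:
    print(*cell)
```

The first printed cell runs from 2.531 to 2.535 with boundaries 2.5305 and 2.5355, so no observed value can fall exactly on a boundary.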

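6. Tabulate cell frequency
Post the number of observations in each cell to form the frequency distribution table.

Cell boundary    Cell MP   Freq.
2.531 - 2.535    2.533     6
2.536 - 2.540    2.538     8
2.541 - 2.545    2.543     12
2.546 - 2.550    2.548     13
2.551 - 2.555    2.553     20
2.556 - 2.560    2.558     19
2.561 - 2.565    2.563
2.566 - 2.570    2.568     11
2.571 - 2.575    2.573
Total                      110

The frequencies for cells 2.561 - 2.565 and 2.571 - 2.575 are left blank; together they account for the remaining 21 of the 110 observations.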

A frequency distribution gives a better view of the central value and of how the data are dispersed than the unorganized data sheet.
Histogram: describes the variation in a process. It is used to solve problems, determine process capability, compare with specifications, suggest the shape of the distribution, and indicate data discrepancies, e.g. gaps.

Characteristics Of Frequency Distribution
Symmetry, number of modes (one, two, or multiple), and peakedness of the data.
[Figure panels: symmetrical; skewed right; skewed left; bimodal; flatter (platykurtic); very peaked (leptokurtic)]

Characteristics of Frequency Distribution
A frequency distribution can give sufficient information to provide a basis for decision making. Distributions are compared with regard to shape, spread, and location.

Descriptive Statistics
Analytical methods allow comparison between data sets. The 2 main analytical methods for describing data are measures of central tendency and measures of dispersion.
A measure of central tendency of a distribution is a numerical value that describes the central position of the data. The 3 common measures are the mean, median, and mode.

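Measure of Central Tendency
Mean: the most common measure used. It answers questions such as: What is the middle value? What is the average number of rejects or errors, or the average dimension of a product?
Mean for ungrouped (unarranged) data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $\bar{x}$ (x bar) is the mean, $x_i$ is the ith observed value, and n is the number of observed values.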

Mean Example
A QA engineer inspects the tread depth (mm) of 5 tyre pieces. What is the mean tread depth?
x1 = 12.3, x2 = 12.5, x3 = 12.0, x4 = 13.0, x5 = 12.8
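The question can be answered directly from the five readings. A minimal sketch using Python's standard `statistics` module; the variable names are illustrative:

```python
from statistics import mean

# The five depth readings (mm) from the example.
depths_mm = [12.3, 12.5, 12.0, 13.0, 12.8]

# Mean = sum of the observed values divided by the number of values.
x_bar = mean(depths_mm)
print(round(x_bar, 2))
```

The sum of the readings is 62.6 mm, so the mean is 62.6/5 = 12.52 mm.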

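Mean - Grouped Data
When the data are already grouped in a frequency distribution:
$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n}$
where n = $\sum_{i=1}^{h} f_i$ = sum of the frequencies, $f_i$ = frequency in the ith cell, h = number of cells/classes, $x_i$ = midpoint of the ith cell.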

Mean - Grouped Data

Cell (i)   Class boundary   Mid point (xi)   Freq (fi)   fi·xi   Cumulative fi
1          1 - 20           10               2           20      2
2          21 - 40          30               10          300     12
3          41 - 60          50               20          1000    32
4          61 - 80          70               12          840     44
5          81 - 100         90               6           540     50
Totals                                       50          2700

Mean = 2700/50 = 54
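The grouped mean is a frequency-weighted average of the cell midpoints. A short sketch, assuming the midpoints 10, 30, 50, 70, 90 and frequencies 2, 10, 20, 12, 6 from the table (each frequency equals its fi·xi entry divided by the midpoint); `grouped_mean` is an illustrative name:

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs)


midpoints = [10, 30, 50, 70, 90]
freqs = [2, 10, 20, 12, 6]

x_bar = grouped_mean(midpoints, freqs)
print(x_bar)
```

This reproduces the slide's result, 2700/50 = 54.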

Weighted average
Tensile tests on an aluminium alloy were conducted with a different number of samples each time. The results are as follows:
1st test: x1 = 207 MPa, n = 5
2nd test: x2 = 203 MPa, n = 6
3rd test: x3 = 206 MPa, n = 3
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$, where $\bar{x}_w$ = weighted average, $w_i$ = weight of the ith average (here the number of samples), $x_i$ = the ith average.
Alternatively, use weights that sum to 1.00:
W1 = 5/(5+6+3) = 0.36, W2 = 6/(5+6+3) = 0.43, W3 = 3/(5+6+3) = 0.21, Total = 1.00
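The weighted average for the three tensile tests can be worked out as follows. A minimal sketch that takes the sample counts as weights; `weighted_average` is an illustrative helper name, not something from the slides:

```python
def weighted_average(values, weights):
    """Weighted average: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


test_means_mpa = [207, 203, 206]  # average of each tensile test
sample_counts = [5, 6, 3]         # number of samples in each test

x_w = weighted_average(test_means_mpa, sample_counts)

# Equivalent form with weights normalised to sum to 1.
normalised = [n / sum(sample_counts) for n in sample_counts]
x_w_normalised = sum(w * x for w, x in zip(normalised, test_means_mpa))

print(round(x_w, 1), [round(w, 2) for w in normalised])
```

Both forms give 2871/14, roughly 205.1 MPa, which sits closer to the 203 MPa test than a plain average of the three means would because that test had the most samples.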

Median - Ungrouped Data
Median: the value that divides the ordered observations into 2 equal parts. For ungrouped data there are 2 possibilities, depending on whether the total number of data values (N) is a) odd or b) even.
If N is odd, the median is the ((N+1)/2)th value. E.g. 3 4 5 6 8: (N+1)/2 = 6/2 = 3, so the median is the 3rd number, 5.
If N is even, the median is the average of the two middle values. E.g. 3 5 7 9: (5 + 7)/2 = 6.
NOTE: ORDER THE NUMBERS FIRST!
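The odd and even cases can be written as one small function. A sketch only; it sorts first, as the note requires, and the function name `median` is chosen for illustration (Python's `statistics.median` does the same job):

```python
def median(values):
    """Median of ungrouped data; the values are ordered first."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th ordered value.
        return ordered[mid]
    # Even N: average of the two middle ordered values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 4, 5, 6, 8]))  # odd case from the slide
print(median([9, 3, 7, 5]))     # even case, deliberately unordered
```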

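Median - Grouped Data
Find the cell/class containing the middle value and interpolate within that cell using
$Md = L_m + \left(\frac{n/2 - cf_m}{f_m}\right) i$
where Lm = lower boundary of the cell with the median, cfm = cumulative frequency of all cells below Lm, fm = frequency of the cell where the median occurs, i = cell interval, n = total number of observations.
Example (grouped data from the mean example, n = 50; the median cell is 41 - 60): Md = 40.5 + ((50/2 - 12)/20)(20) = 40.5 + 13 = 53.5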
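The interpolation can be sketched in code. This assumes the grouped data from the mean example (cells 1 - 20 through 81 - 100 with frequencies 2, 10, 20, 12, 6), so the lower cell boundaries are 0.5, 20.5, 40.5, 60.5, 80.5 and the cell interval is 20; `grouped_median` is an illustrative name:

```python
def grouped_median(lower_boundaries, freqs, interval):
    """Median of grouped data by interpolating inside the median cell."""
    n = sum(freqs)
    cum_below = 0  # cumulative frequency of all cells below the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cum_below + f >= n / 2:
            # This cell contains the middle observation.
            return lower + (n / 2 - cum_below) / f * interval
        cum_below += f
    raise ValueError("empty frequency table")


lower_boundaries = [0.5, 20.5, 40.5, 60.5, 80.5]
freqs = [2, 10, 20, 12, 6]

md = grouped_median(lower_boundaries, freqs, interval=20)
print(md)
```

With these inputs the median cell is the third one (cumulative frequency 32 first reaches n/2 = 25 there), giving 40.5 + 13 = 53.5.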

Measures of dispersion
A measure of dispersion describes how the data are spread out or scattered on each side of the central value. Both measures of central tendency and measures of dispersion are needed to describe data.
Example, exam results:
Class 1: avg. 60.0 marks, highest 95, lowest 25
Class 2: avg. 60.0 marks, highest 100, lowest 15

Measures of dispersion
Main types: range, standard deviation, and variance.
Range: the difference between the highest and lowest values, R = XH - XL.
Standard deviation.
Variance: the standard deviation squared.
A large value shows greater variability or spread.

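Standard deviation For Ungrouped Data
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where s = sample standard deviation, $x_i$ = observed value, $\bar{x}$ = average, n = number of observed values;
or use the equivalent computational form
$s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$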
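Both forms of the sample standard deviation give the same number, which is easy to confirm. A sketch that reuses the five depth readings from the mean example as illustrative data (an assumption; the slide gives no data set here) and checks the result against `statistics.stdev`:

```python
import math
from statistics import stdev


def stdev_definition(xs):
    """s = sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


def stdev_computational(xs):
    """s = sqrt((n * sum(x^2) - (sum(x))^2) / (n * (n - 1)))."""
    n = len(xs)
    return math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))


data = [12.3, 12.5, 12.0, 13.0, 12.8]  # illustrative data, reused from the mean example

s_def = stdev_definition(data)
s_comp = stdev_computational(data)
print(round(s_def, 3), round(s_comp, 3), round(stdev(data), 3))
```

The computational form avoids calculating the mean first, which is convenient by hand, but it subtracts two large numbers and so is more sensitive to rounding of intermediate values.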

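Standard deviation - grouped data
$s = \sqrt{\frac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n - 1)}}$, where n = $\sum f_i$, $f_i$ = frequency in the ith cell, $x_i$ = midpoint of the ith cell.

Cell (i)   Class boundary   Mid point (xi)   Freq (fi)   fi·xi   Cumulative fi
1          1 - 20           10               2           20      2
2          21 - 40          30               10          300     12
3          41 - 60          50               20          1000    32
4          61 - 80          70               12          840     44
5          81 - 100         90               6           540     50
Totals                                       50          2700

NOTE: DO NOT ROUND OFF fi·xi AND fi·xi²; ACCURACY IS AFFECTED.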
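The grouped calculation can be sketched with the same table. This assumes the computational formula for grouped data, s = sqrt((n·Σfx² − (Σfx)²)/(n(n − 1))), and the midpoints and frequencies from the table; the fi·xi² values are computed in code rather than taken from the slide:

```python
import math


def grouped_stdev(midpoints, freqs):
    """Sample standard deviation of grouped data (computational form)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    # Intermediate sums are kept unrounded, as the note above requires.
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))


midpoints = [10, 30, 50, 70, 90]
freqs = [2, 10, 20, 12, 6]

fx2 = [f * x * x for f, x in zip(freqs, midpoints)]
s = grouped_stdev(midpoints, freqs)
print(fx2, sum(fx2), round(s, 2))
```

With Σfx = 2700 and Σfx² = 166600 this gives s = sqrt(1040000/2450), about 20.60.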

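Concept Of Population and Sample
Population: e.g. the total daily production of steel shafts, or a year's production volume of calculators. Its true parameters are μ and σ.
Sample: a subset drawn from the population, from which the sample statistics x̄ and s are computed.
Why sample? It is not always possible to measure the whole population, there are costs involved, and 100% manual inspection is itself subject to accuracy problems and error.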

Concept Of Population and Sample
Sample statistics: x̄ (mean), s (standard deviation).
Population parameters: μ (mean), σ (standard deviation).

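Normal Distribution
Also called the Gaussian distribution. It is a symmetrical, unimodal, bell-shaped distribution whose mean, median, and mode have the same value.
Population curve: as the sample size increases and the cell interval decreases, the frequency polygon becomes a smooth curve, the normal distribution (ND).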

Normal Distribution
Much of the variation in nature and industry follows the normal distribution: the height of humans, the weight of elephants, casting weights, the size of piston rings, electrical properties, and material properties such as tensile strength.

Example - ND

Characteristics of ND
Normal curves can have different means but the same standard deviation.

Normal curves can also have different standard deviations but the same mean.

Relationship between std deviation and area under curve

Normal Distribution Example
Solving a normal distribution problem requires estimates of the mean and standard deviation, and the Normal Table.
Example: From past experience a manufacturer concludes that the burnout time of a particular light bulb follows a normal distribution. A sample has been tested and the average (x̄) found to be 60 days with a standard deviation (σ) of 20 days. What proportion of bulbs can be expected to be still working after 100 days?

Solution
The problem is to find the area under the curve beyond 100 days.
1. Sketch the normal distribution (μ = 60, σ = 20) and shade the area needed, x > 100.
2. Calculate the z value corresponding to the x value: Z = (xi - μ)/σ = (100 - 60)/20 = +2.00
3. Look in the Normal Table for z = +2.00, which gives the area under the curve to the left as 0.9773.
4. We want x > 100, i.e. z > 2.00. Therefore Area = 1.0000 - 0.9773 = 0.0227, i.e. a 2.27% probability that the life of a light bulb exceeds 100 days.
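The table lookup can be cross-checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library (3.8 or later) in place of the printed Normal Table; the variable names are illustrative:

```python
from statistics import NormalDist

# Burnout time modelled as normal with mean 60 days and standard deviation 20 days.
burnout = NormalDist(mu=60, sigma=20)

# z value for x = 100 days.
z = (100 - burnout.mean) / burnout.stdev

# Area under the curve beyond 100 days = 1 - cumulative area up to 100 days.
p_beyond = 1 - burnout.cdf(100)

print(z, round(p_beyond, 4))
```

The computed tail area is about 0.02275, which agrees with the table-based 0.0227 up to the rounding of the four-digit table.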

Test For Normality
To determine whether data are normal, use a probability plot: plot the data on normal probability paper.
Steps:
1. Order the data.
2. Rank the observations.
3. Calculate the plotting position (i = rank, n = sample size, PP = plotting position in %).
4. Label the data scale.
5. Plot the points on normal probability paper.
6. Attempt to fit the 'best line' by eye.
7. Determine normality.
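Steps 1 to 3 can be sketched in code. The plotting-position formula itself is not preserved in this transcript, so the example assumes the common convention PP = 100(i − 0.5)/n; other conventions exist. The sample values are hypothetical, and the z column stands in for the non-linear scale of normal probability paper.

```python
from statistics import NormalDist


def plotting_positions(data):
    """Return (value, rank, PP in %, standard normal z) for each observation.

    Assumption: PP = 100 * (rank - 0.5) / n.
    """
    ordered = sorted(data)          # step 1: order the data
    n = len(ordered)
    std_normal = NormalDist()
    rows = []
    for rank, value in enumerate(ordered, start=1):  # step 2: rank
        pp = 100 * (rank - 0.5) / n                  # step 3: plotting position
        rows.append((value, rank, pp, std_normal.inv_cdf(pp / 100)))
    return rows


sample = [32, 29, 35, 31, 28, 34, 30, 33, 36, 27]  # hypothetical measurements

rows = plotting_positions(sample)
for value, rank, pp, z in rows:
    print(value, rank, pp, round(z, 3))
```

Plotting each value against its z score and judging whether the points fall near a straight line is the numerical counterpart of fitting the 'best line' by eye on probability paper.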

Example