Engineering Statistics Mnge 417 Introduction ©Dr. B. C. Paul 2003.


Why Should Engineers Even Care?
Engineers design and plan.
– We have all heard of significant figures.
– Is everything we build actually built to the exact dimensions on the plan? Are our roof bolt spacings in the field really 5 feet?
– Much of the profession is built around engineering tolerances: how close do I have to be to make it work?
Reality is then (hopefully) a bunch of minor variations around an acceptable answer.

Building to Tolerance
The design says shafts are machined to 1.25 inches +/- some tolerance.
Reality says there is actually a bunch of very similar sizes that are close to 1.25 inches.
– Every so often we will get a dud (we accept that reality but want to minimize how often it happens).
We often can’t check every part for tolerance, but we can check every so many parts to make sure the process is under control.
– A sample is a few values collected from a larger population.
A statistical probability distribution is a model of this process.
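A minimal sketch of the inspection idea, assuming a hypothetical machining process whose shaft diameters are normally distributed with mean 1.25 in and standard deviation 0.002 in, and a hypothetical tolerance of +/- 0.005 in. The function names `inspect_every_kth` and `fraction_out_of_tolerance` are illustrative only.

```python
import random

def inspect_every_kth(parts, k):
    """Return every k-th part, mimicking a periodic inspection of the line."""
    return parts[::k]

def fraction_out_of_tolerance(diameters, nominal, tolerance):
    """Fraction of diameters falling outside nominal +/- tolerance (the 'duds')."""
    bad = sum(1 for d in diameters if abs(d - nominal) > tolerance)
    return bad / len(diameters)

if __name__ == "__main__":
    random.seed(1)
    # Hypothetical process model: mean 1.25 in, standard deviation 0.002 in.
    production_run = [random.gauss(1.25, 0.002) for _ in range(10_000)]
    sample = inspect_every_kth(production_run, 50)  # inspect 1 part in 50
    print(f"Inspected {len(sample)} of {len(production_run)} parts")
    rate = fraction_out_of_tolerance(sample, 1.25, 0.005)
    print(f"Sample fraction out of tolerance: {rate:.3f}")
```

The sample stands in for the whole run: we never measure all 10,000 shafts, only the 200 we pulled.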

Engineers Make Changes as a Means to Improve Things
Mining engineering example:
– Coal production from a face area is critical to costs and competitiveness.
– We make a policy or equipment change. Does it work?
Will Joy’s new high-voltage miner really improve coal production?
Will a change in the ventilation pattern really reduce the dust violations that limit production?

Cause and Effect Relationships
Very few real results of anything are just one value.
– Coal production has up and down days.
Does the new policy, equipment, or practice result in more up days, or up days of higher value?
– How many good results do you actually need to see before you can feel confident that they are not just coincidentally higher (or lower) values?
Varying effects can be modeled as a probability distribution.
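To make the "how many good results" question concrete, here is a rough sketch under a deliberately simplified assumption that is not in the original slides: if the change had no effect and each day were independently an up day with probability 0.5, the chance of seeing n up days in a row purely by coincidence is 0.5 raised to the power n. The function name and the 0.5 figure are illustrative assumptions.

```python
def chance_all_up_by_coincidence(n_days, p_up=0.5):
    """Probability that n independent days are all 'up' days by chance alone.

    Assumes (hypothetically) each day is an up day with probability p_up,
    independent of every other day.
    """
    return p_up ** n_days

if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, chance_all_up_by_coincidence(n))
```

Under these assumptions, five straight up days would happen by luck about 3% of the time, and ten straight about 0.1% of the time.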

Engineering Design Practice
You have a bunch of equations and formulas that tell you whether something should work.
The next design step is often to consider that things don’t always work exactly as they should.
– A mining truck or a water treatment plant processing train does not work all the time.
Thus real production is different from what the design equation predicts.
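A minimal sketch contrasting the design equation with a production model that allows for downtime. The numbers (100 tons per hour, a 20-hour shift, 85% availability) and the assumption that each hour is independently up or down are hypothetical, chosen only for illustration.

```python
import random

def design_production(rate_per_hour, hours):
    """What the design equation says: the equipment runs every hour."""
    return rate_per_hour * hours

def simulated_production(rate_per_hour, hours, availability, rng):
    """Each hour the equipment is independently running with probability `availability`."""
    running_hours = sum(1 for _ in range(hours) if rng.random() < availability)
    return rate_per_hour * running_hours

if __name__ == "__main__":
    rng = random.Random(42)
    print("Design:", design_production(100, 20))
    # Hypothetical truck: 100 tons/hour, 20-hour shift, 85% availability.
    shifts = [simulated_production(100, 20, 0.85, rng) for _ in range(5)]
    print("Simulated shifts:", shifts)
```

Real production comes out as a spread of shift totals below the design figure, which is exactly the kind of distribution statistics is built to describe.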

Modeling
We build a mathematical model of the situation and then do the math to see if it is going to work for us in the real world.
We may not think of it, but most of our engineering design equations are mathematical models that were fit to actual data long ago.
– Newtonian physics (we call them laws now)
– Darcy’s law and the Bernoulli equation

How Do You Decide if a Mathematical Model Fits What You See?
Because you usually can’t measure with 100% accuracy, and you don’t think of (or can’t consider) every minor effect:
– Real results tend to be distributed around our potential mathematical models.
Statistical models consider a distribution of answers around an underlying trend.
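As an illustration not taken from the slides, the sketch below assumes a hypothetical proportional model y = k·x (the form Darcy’s law takes when flow is proportional to hydraulic gradient) and made-up measurements. It fits k by least squares through the origin, k = Σxy / Σx², and prints the residuals: the scatter of real results around the underlying trend.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope k for the model y = k * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def residuals(xs, ys, k):
    """Observed minus predicted values for the model y = k * x."""
    return [y - k * x for x, y in zip(xs, ys)]

if __name__ == "__main__":
    # Hypothetical measurements scattered around a straight-line trend.
    gradient = [0.1, 0.2, 0.3, 0.4, 0.5]
    flow = [0.21, 0.39, 0.62, 0.80, 0.98]
    k = fit_through_origin(gradient, flow)
    print(f"fitted k = {k:.3f}")
    print("residuals:", [round(r, 3) for r in residuals(gradient, flow, k)])
```

Small residuals with no obvious pattern suggest the model fits; large or patterned residuals suggest it does not.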

Sometimes You Don’t Know What Is Driving a Result
Is absenteeism being driven by work assignments, health, deer season, etc.?
Statistical models can compare variations to possible causes and help identify what is driving things.

Spatial Relationships
We take samples of an ore body.
– Do the results mean we have a certain tonnage of ore at a certain grade?
We use samples to tell us what material to send to the processing plant or the waste dump.
We may want to tell our mill operator how much the grade of ore may go up or down.
We can build statistical models with a spatial (location) relationship.

How Statistics Works
We are often trained to think that the answer to real-world problems comes out of an equation.
We actually create mathematical models that approximately fit reality and then work off of something predictable.
– The math used to study these mathematical models may be something only a French mathematician could love.
– A lot of the basic ideas are fairly intuitive.

Example
If I have a random number generator that produces numbers between 1 and 100, what value is most likely?
If I take 25 of those random numbers, what will the average value most likely be close to?
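A quick simulation of this example, assuming the generator produces whole numbers 1 through 100 with equal probability (here Python’s `random.randint`). It repeats the "take 25 numbers and average them" experiment 1,000 times.

```python
import random
import statistics

def average_of_draws(n_draws, rng):
    """Average of n_draws whole numbers drawn uniformly from 1 to 100."""
    return statistics.mean(rng.randint(1, 100) for _ in range(n_draws))

if __name__ == "__main__":
    rng = random.Random(2003)
    averages = [average_of_draws(25, rng) for _ in range(1000)]
    print("Theoretical mean:", (1 + 100) / 2)
    print("Mean of 1000 averages of 25:", round(statistics.mean(averages), 2))
    print("Smallest and largest average:", min(averages), max(averages))
```

Under that assumption no single value is more likely than any other, while the averages of 25 draws cluster near 50.5.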

What Did You Assume to Get Those Answers?
You assumed how those values were distributed.
– You considered what is called a uniform distribution (all numbers are equally likely to come up).
– Statistics begins with a series of standard mathematical distributions.
We try to pick the one that most nearly matches our reality.

Getting Your Answers
You also assumed that the numbers were taken from that distribution at random.
– That is, no one is cherry-picking any values in preference to any others.
– This is one of the reasons statisticians get so crazy if they think someone is cherry-picking the sample.
The root of all statistics is that you assume reality follows a standard mathematical distribution, and that the part we see was picked at random from that distribution.
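A small sketch of why cherry-picking matters, using a hypothetical population of 5,000 uniform whole numbers from 1 to 100. A random sample of 25 gives a mean near the population mean; "sampling" the 25 largest values does not.

```python
import random
import statistics

def random_sample_mean(population, n, rng):
    """Mean of n values chosen at random (no preference for any value)."""
    return statistics.mean(rng.sample(population, n))

def cherry_picked_mean(population, n):
    """Mean of the n largest values: a deliberately biased 'sample'."""
    return statistics.mean(sorted(population, reverse=True)[:n])

if __name__ == "__main__":
    rng = random.Random(7)
    population = [rng.randint(1, 100) for _ in range(5000)]
    print("Population mean:", round(statistics.mean(population), 2))
    print("Random sample of 25:", round(random_sample_mean(population, 25, rng), 2))
    print("Cherry-picked 25:", round(cherry_picked_mean(population, 25), 2))
```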

How Do We Come Up With the Distribution That Most Closely Resembles Our Reality?
The process starts with figuring out which of our standard model distributions it is.
There are three levels of effort.
Level 1: Say “I believe” and assume one.
– This is most commonly done with the “Normal Distribution” (the “Bell Curve”).
– Many things tend to be normally distributed.
– The strength of past experience becomes the rationale.
There are also people who do it without having any idea what they have done.
– Standard statistics is built around the normal distribution.

Levels of Effort: Level 2
– Study the distribution to see if we are doing something terrible.
– A common approach is called a “histogram”: a bar graph that we plot our data on so we can look at it.
– There are also things like probability paper, where you plot your data and see if you get a straight line.
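A minimal text histogram, as a stand-in for the bar graph described above. The data (500 hypothetical normally distributed measurements) and the choice of 10 equal-width bins are assumptions for illustration.

```python
import random

def text_histogram(data, n_bins, width=40):
    """Return (lines, counts) for a simple text histogram with equal-width bins."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; nothing to bin")
    bin_width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value would land one past the last bin, so clamp it in.
        i = min(int((x - lo) / bin_width), n_bins - 1)
        counts[i] += 1
    biggest = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * bin_width
        bar = "#" * round(width * c / biggest)
        lines.append(f"{left:8.2f} | {bar} {c}")
    return lines, counts

if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.gauss(50, 10) for _ in range(500)]  # hypothetical measurements
    lines, _ = text_histogram(data, 10)
    print("\n".join(lines))
```

If the bars form a single, roughly symmetric hump, assuming a normal distribution is probably not "something terrible."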

Effort Level 3
Use statistical techniques to test whether our sample data is like a set that could reasonably have been pulled from some standard distribution.
– These are often our goodness-of-fit tests.
Each of the three levels of effort is customary in some practices.
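One widely used goodness-of-fit test (the slides do not say which test they mean) is the chi-square test, which compares observed bin counts O with the counts E that the assumed distribution predicts: χ² = Σ (O - E)² / E. The sketch computes only the statistic, assuming the uniform 1-to-100 random-number example; converting it to a p-value needs a chi-square table or a statistics library, which is not shown.

```python
import random

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

if __name__ == "__main__":
    rng = random.Random(11)
    draws = [rng.randint(1, 100) for _ in range(1000)]
    # Bin into 10 bins: 1-10, 11-20, ..., 91-100.
    observed = [0] * 10
    for d in draws:
        observed[(d - 1) // 10] += 1
    expected = [len(draws) / 10] * 10  # uniform: equal counts per bin
    print("observed:", observed)
    print("chi-square statistic:", round(chi_square_statistic(observed, expected), 2))
```

With 10 bins there are 9 degrees of freedom; the 5% critical value is about 16.9, so a statistic well above that would suggest the data are not uniform.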

Measuring Properties of Distributions
Put sample data into a standard equation that generates a number.
– We often actually call that number a statistic.
– It measures some property of the distribution that the data was taken from.
Some statistics have an obvious tangible meaning.
– Example: the mean, the mathematical average value of the sample or population.

Calculating a Mean (or Simple Average)
Add up all the numbers and then divide by however many numbers you added.
Example
– Numbers: 5, 10, 15, 20, 25
– What is the mean?
Calculate
– (5 + 10 + 15 + 20 + 25)/5
– The numerator totals 75.
– The denominator is the number of values I put in.
– Divide the total by the number of values put in.
– The answer is 15 (the mean or average value).
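The same calculation as a short function; Python’s standard `statistics.mean` gives the same answer.

```python
def mean(values):
    """Add up all the numbers, then divide by how many numbers were added."""
    total = sum(values)
    count = len(values)
    return total / count

if __name__ == "__main__":
    data = [5, 10, 15, 20, 25]
    print(mean(data))  # prints 15.0
```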

Statisticians Need Confusing Ways to Write Equations
$X_i$ means a sample value.
– The $i$ subscript tells you whether it was the first, second, third, etc. sample.
From the example on the last slide, we know $X_2$ was the second number we looked at, which was 10.
$\Sigma$ means the sum of a series of values.
$n$ means the number of samples considered.
Thus we write the formula for the mean as
– $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
We of course also have a special symbol for a mean:
– $\bar{X}$ (read "X bar")

We Can Do Problems with Software (in This Case SPSS)
Type in the data.

Ready to Enter Data
Type in the data.

Command to Analyze
Pull down the Analyze menu.
Highlight Descriptive Statistics.
Highlight Frequencies.

Click on Frequencies
It gives me a list of variables to use. (This list is tough with only one variable!)
Highlight the variable and push the arrow to move it into the use area.
Click Statistics.

Choose Your Statistics
Check off Mean and push Continue.

Click OK on the Frequencies Screen
Read off our mean at 2.89.

More Measurements
Mode
– The value that has the greatest chance of coming up.
Example
– I have 10 people who are 5’10”,
– 2 people who are 4’3”,
– and 2 people who are 6’10”.
– If you pick a person at random from my group, what height will that person most likely be?
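The height example worked with Python’s standard `statistics.mode`. Converting the heights to inches (5’10” = 70, 4’3” = 51, 6’10” = 82) is our own step for convenience.

```python
import statistics
from collections import Counter

# Heights in inches: 5'10" = 70, 4'3" = 51, 6'10" = 82.
heights = [70] * 10 + [51] * 2 + [82] * 2

most_likely = statistics.mode(heights)
chance = Counter(heights)[most_likely] / len(heights)
print(most_likely, round(chance, 3))  # prints: 70 0.714
```

A person picked at random is most likely 5’10”, with a chance of 10 in 14.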

More Measures
Median
– Half of the values are higher, half are lower.
Mean, median, and mode all seem to have somewhat obvious physical meanings.
Other statistics are less obvious.
– Variance: a number that comes out of a formula and tells you how spread out the distribution is.
The square root of the variance is the standard deviation.
– Roughly, a typical difference between a sample and the mean value.
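A minimal median function: sort the data and take the middle value, or the average of the two middle values when the count is even (the usual convention, also used by Python’s `statistics.median`).

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

if __name__ == "__main__":
    print(median([25, 5, 15, 10, 20]))  # odd count: the middle value
    print(median([1, 2, 3, 4]))         # even count: average of 2 and 3
```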

The Standard Deviation
The standard deviation is, roughly, a kind of average difference between individual samples and the mean.
What does it mean?
Take each sample number, subtract the average sample value from it, and square the result. Do this for every number and add up the results. Then divide the total by one less than the number of samples you took, and take the square root of that value:
– $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
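The recipe above translated step by step into Python, using the 5, 10, 15, 20, 25 example data. The deviations are -10, -5, 0, 5, 10; their squares sum to 250; 250 / 4 = 62.5; and the square root is about 7.91.

```python
import math

def sample_std_dev(values):
    """Squared deviations from the mean, summed, divided by n - 1, then square-rooted."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    avg = sum(values) / n                              # compute the average first
    squared_deviations = [(x - avg) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

if __name__ == "__main__":
    print(round(sample_std_dev([5, 10, 15, 20, 25]), 4))  # prints 7.9057
```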

As a Practical Matter, That’s a Pain
I have to compute the average before I can do the math for the standard deviation.
An alternative formula tells you to keep track of two numbers:
1- Take each number, square it, and then add the squares up.
2- Take each number, add them up, and then square the total.
– $s = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n-1}}$

Getting Standard Deviation
Statistical calculators have multiple memories.
– They add up numbers in one memory.
– They square and add up numbers in another.
– They total the number of entries in another.
– They then apply the standard deviation formula.
Of course, we can also use SPSS.
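A sketch of the calculator approach: keep three running "memories" (count, sum, sum of squares) and apply the shortcut formula s = sqrt((ΣX² - (ΣX)²/n) / (n - 1)) at the end. The class name `RunningStats` is hypothetical. No average is needed before the data are entered.

```python
import math

class RunningStats:
    """Mimics a statistical calculator's memories: count, sum, and sum of squares."""

    def __init__(self):
        self.n = 0            # number of entries
        self.total = 0.0      # sum of the numbers
        self.total_sq = 0.0   # sum of the squared numbers

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def std_dev(self):
        """Sample standard deviation from the shortcut formula."""
        if self.n < 2:
            raise ValueError("need at least two entries")
        numerator = self.total_sq - self.total ** 2 / self.n
        # Rounding can push a true zero slightly negative; clamp before the square root.
        return math.sqrt(max(numerator, 0.0) / (self.n - 1))

if __name__ == "__main__":
    stats = RunningStats()
    for x in [5, 10, 15, 20, 25]:
        stats.add(x)
    print(round(stats.std_dev(), 4))
```

For the example data, ΣX² = 1375 and (ΣX)²/n = 1125, giving 250 / 4 = 62.5 before the square root, the same as the two-pass recipe. One known caveat: when the values are large compared with their spread, this shortcut can lose precision to floating-point cancellation.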

Doing Standard Deviation with SPSS
Pull down Analyze.
Highlight Descriptive Statistics.
Highlight and click Frequencies.

Check Off Standard Deviation
Push Continue.
Push OK on the Frequencies menu.

Read Off the Output
The standard deviation is 1.12.

Variance is also a measure of how much things differ from their average.
Variance is just the standard deviation squared.
To calculate a variance, do the standard deviation procedure without taking the square root at the end.
Of course, I could also check off Variance instead of Standard Deviation in SPSS.
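As a software alternative to the SPSS check box, Python’s standard `statistics` module provides both `variance` and `stdev` (each dividing by n - 1). The sketch confirms that the variance is the standard deviation squared for the 5, 10, 15, 20, 25 example.

```python
import math
import statistics

data = [5, 10, 15, 20, 25]
var = statistics.variance(data)   # sample variance, divides by n - 1
sd = statistics.stdev(data)       # sample standard deviation
print(var, round(sd, 4), math.isclose(sd ** 2, var))
```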

Types of Distributions
The idea is that we try to approximate reality with a mathematically defined distribution.
– Then we can use mathematical operations to predict our answers.
Distributions that often fit reality:
– Normal distribution (developed in 1733), the bell curve
– Uniform distribution
– Binomial distribution
– T distribution
– Chi-square distribution
– Lognormal distribution

Derived Distributions
The T, chi-square, and lognormal distributions are all derived from the normal distribution for specific types of situations.
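A sketch of the standard textbook constructions that derive these distributions from normal draws: a chi-square value with k degrees of freedom is the sum of k squared standard normals; a t value is a standard normal divided by the square root of (chi-square / k); a lognormal value is e raised to a normal draw. The function names are illustrative; Python’s `random` module already offers `lognormvariate` directly.

```python
import math
import random

def chi_square_draw(k, rng):
    """Sum of k squared standard normal draws: chi-square with k degrees of freedom."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

def t_draw(k, rng):
    """Standard normal divided by sqrt(chi-square / k): t with k degrees of freedom."""
    return rng.gauss(0, 1) / math.sqrt(chi_square_draw(k, rng) / k)

def lognormal_draw(mu, sigma, rng):
    """e raised to a normal draw: a lognormal value."""
    return math.exp(rng.gauss(mu, sigma))

if __name__ == "__main__":
    rng = random.Random(1733)
    chi = [chi_square_draw(4, rng) for _ in range(20000)]
    print("mean of chi-square(4) draws, expect about 4:", round(sum(chi) / len(chi), 2))
    print("one t(5) draw:", round(t_draw(5, rng), 3))
    print("one lognormal draw:", round(lognormal_draw(0.0, 1.0, rng), 3))
```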

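Normal Distribution
Shaped like a bell curve.
Formula, where $\mu$ is the mean and $\sigma$ is the standard deviation:
– $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$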
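The normal curve formula f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)) written out directly, and checked against Python’s standard `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal (bell) curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

if __name__ == "__main__":
    for x in (-2, -1, 0, 1, 2):
        # Hand-written formula next to the standard library's version.
        print(x, round(normal_pdf(x, 0, 1), 4), round(NormalDist(0, 1).pdf(x), 4))
```

The printed heights peak at the center and fall off symmetrically to both sides, which is the bell shape.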

Symmetric Distributions with a Central Tendency
The normal distribution is the classic example.
– Most of the chances are right near the center of the distribution.
The frequency drops off to the sides.
The mode is at the center of the distribution.
– The distribution is a mirror image about its center.
This allows us to compute just one side.
The median, the mean, and the mode are the same value.
A lot of reality has a central tendency with relatively symmetric sides.
– The T distribution is like that too.
Its sides slope off a little differently.
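A short check of the symmetry properties using `statistics.NormalDist` (Python 3.8+ for `.mode`), assuming a hypothetical normal distribution of heights with mean 70 in and standard deviation 3 in. Mean, median, and mode coincide, and the tail below μ - d has the same area as the tail above μ + d, so computing one side gives you the other.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)  # hypothetical: mean 70 in, sd 3 in

print(heights.mean, heights.median, heights.mode)  # all three coincide
d = 4.5
# Mirror image: the area below mu - d equals the area above mu + d.
lower_tail = heights.cdf(70 - d)
upper_tail = 1 - heights.cdf(70 + d)
print(round(lower_tail, 4), round(upper_tail, 4))
```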

Why the Normal Distribution?
It was one of the first mathematically defined distributions that was a really good fit.
– People developed other formulas and distributions from calculations done on the normal distribution.
The T distribution and the chi-square distribution both result from performing mathematical operations on samples from a normal distribution.
– The normal distribution was first to press with a distribution that was heavy at the center and symmetric.

Reality 101 for Statistical Distributions
There is probably no such thing as a true normal distribution in real life.
Even if there were, we almost never count each and every member of the population, so you’d never know if it was one.
Statistical distributions let us take limited data and see approximately what it is.
– Then we use the defined mathematical model to suddenly know everything about it.

Back to Why the Normal Distribution
A big part of the real world has a central tendency and is symmetric.
We found that calculations done with a normal distribution were robust.
– A minor lack of fit in real-world data doesn’t change the answers much.
– Thus it works on almost anything with a central tendency that is nearly symmetric.

Most Common Lack of Fit
Not symmetric.
Robustness covers a little skewness.
This type of shape can be fit with a distribution adapted from the normal, called the lognormal.
If you take averages of about 25 samples from this, the averages will be approximately normal (averaging normalizes).
Taking logarithms of the data will make the transformed distribution normal.
Taking the square root will also help normalize it.
There are a few other transformations as well.
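A sketch of these fixes applied to hypothetical skewed data: 10,000 lognormal values (μ = 0, σ = 0.5) from Python’s `random.lognormvariate`. It measures skewness (the average cubed standardized deviation, 0 for a symmetric shape) of the raw data, the square roots, averages of 25, and the logarithms. Python 3.8+ is assumed for `statistics.fmean`.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed standardized deviation (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((x - m) / s) ** 3 for x in values)

if __name__ == "__main__":
    rng = random.Random(25)
    raw = [rng.lognormvariate(0.0, 0.5) for _ in range(10000)]  # hypothetical skewed data
    logs = [math.log(x) for x in raw]
    roots = [math.sqrt(x) for x in raw]
    averages = [statistics.fmean(raw[i:i + 25]) for i in range(0, len(raw), 25)]
    print("raw data:      ", round(skewness(raw), 2))
    print("square roots:  ", round(skewness(roots), 2))
    print("averages of 25:", round(skewness(averages), 2))
    print("logarithms:    ", round(skewness(logs), 2))
```

Expect the raw data to be clearly right-skewed. The square roots and the averages should be less skewed, and the logarithms close to zero, since the log of lognormal data is exactly normal.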

Multi-Modal Distributions
These types of distributions are often 3 different normally distributed families overlying each other.
Finding what is causing the three families often helps us better understand our world.

Uniform Distribution
All values within some range (which may or may not be plus or minus infinity) are equally likely.
The distribution has no central tendency.
It tends to be associated with truly random events (or at least events where the underlying cause is eluding our mathematical modeling).

Characteristics of the Uniform Distribution
Because all values are equally likely, it has no single mode.
The mean is at the center of the range.
The uniform distribution is still symmetric about the mean, so the median and the mean are the same.
The standard deviation is the range divided by √12, about 0.29 times the range (if the range is infinite, obviously that’s not defined).
The variance is the standard deviation squared (the range squared divided by 12).
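A quick simulation check of these characteristics, assuming a hypothetical uniform distribution on [0, 10]. For a uniform distribution on [a, b], the standard deviation works out to (b - a)/√12, about 0.29 times the range. The sketch prints that value alongside range/4 for comparison.

```python
import math
import random
import statistics

def uniform_summary(a, b, n, rng):
    """Sample mean, median, and standard deviation of n uniform draws on [a, b]."""
    draws = [rng.uniform(a, b) for _ in range(n)]
    return statistics.fmean(draws), statistics.median(draws), statistics.stdev(draws)

if __name__ == "__main__":
    rng = random.Random(241)
    a, b = 0.0, 10.0
    mean, median, sd = uniform_summary(a, b, 100_000, rng)
    print("mean:", round(mean, 3), "center of range:", (a + b) / 2)
    print("median:", round(median, 3))
    print("sd:", round(sd, 3),
          "range/sqrt(12):", round((b - a) / math.sqrt(12), 3),
          "range/4:", (b - a) / 4)
```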

Binomial Distribution
Outcomes that are either off or on.
– This clearly describes computers and digital data.
Many things either work or they don’t.
– Mining: whether our trucks are in working order.
– Water treatment plant: the water purification train is working or not working.
– Coin tosses are heads or tails.

New Problem
We can’t talk about means, modes, and medians in the usual way, because the outcome has no continuous distribution.
We want to know what fraction of the outcomes are “yes.”
– P = the fraction (%) of members of the two-outcome (binary) population that are positive.
We are usually interested in the chance that we can take 5 members out of the population and have them all be positive.
– Example: if I have 5 mining trucks, how much of the time will all 5 be running?
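A sketch of the truck question using the binomial probability formula C(n, k) · P^k · (1 - P)^(n - k). It assumes a hypothetical P = 0.9 (each truck running 90% of the time) and that trucks break down independently of one another. Python 3.8+ is assumed for `math.comb`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'yes' outcomes in n independent trials, each 'yes' with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

if __name__ == "__main__":
    p_running = 0.9   # hypothetical: each truck is available 90% of the time
    n_trucks = 5
    for k in range(n_trucks + 1):
        print(f"{k} of {n_trucks} running: {binomial_pmf(k, n_trucks, p_running):.4f}")
```

Under these assumptions all 5 trucks are running only 0.9⁵ ≈ 59% of the time, even though each individual truck is up 90% of the time.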

The Ordinate Problem
How continuously distributed are our outcomes?
– Our number line is continuous, so at first glance we almost assumed everything was continuous.
When are they not, and what do we do if they are not?
It usually doesn’t take a very smart statistician to figure this out.
Some things are yes-or-no distributed.
– Use the binomial distribution model. Duh!

Some Things Are Integer Distributed
Continuity really is a function of observational scale.
– According to quantum physics, everything is made of integer numbers of discrete quanta.
– At our observation scale, the little integer jumps are perhaps so small we cannot even measure them.
– Many times, the effect of integer continuity is negligible.

What If Integer Continuity Is Not Negligible?
This happens when we have small numbers or integer-distributed data.
– How does one deal with teacher rankings in classes of 5 students?
Our scale of observation is integer.
Our sample size is small enough that we can’t mask it.
If it were a class of 500 students, we could probably model the outcomes rather well as if they were continuous.
For these cases we use non-parametric statistical models.

Summary of Ideas
Real-world data comes as distributions of answers, not as single numbers from one equation.
We can represent these distributions with mathematical models that fully define how the data is distributed.
– This allows us to approximate things we could never get enough data to count.
We work on these models and call our work statistics.