Data Mining for Engineers

Data Mining for Engineers: Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets. Slides extracted from a talk given at the Vibo-Rama 2011 Meeting of the Vibration Institute, March 10, 2011, Holiday Inn Express, Latham, NY 12065.

Talk Outline

Data Mining for Engineers: Assessment of Learning
Questions* You Will Be Asked To Answer When The Talk Ends
Who’s taken a formal course in statistics?
What is Regression Analysis?
What is data dependence?
Who’s tried using statistics to analyze data?
What is average? Mean? Median? Standard deviation?
What is Correlation?
Who knows what derived variables are? Sliders?
What other kinds of data manipulators can you think of?
What is replicated data, and when can it be used, and not used?
When is it OK to delete/not include data points in a statistical analysis?
What kinds of non-numerical information might you want to apply statistical methods to?
*Questions will be interspersed at the beginning and throughout the presentation to assess participant pre-knowledge as well as audience understanding of and experience with basic statistical parameters.

Answers to Questions
Data Mining for Engineers: Assessment of Learning
Who’s taken a formal course in statistics? (See Hands Raised)
What is Regression Analysis? Regression Analysis is a statistical approach to forecasting … change in a dependent variable on the basis of observed changes in one or more independent variables. Regression Analysis is also known as … curve fitting or line fitting, because a regression analysis equation can be used in fitting a curve or line to data points. Relationships depicted in a Regression Analysis are, however, associative only, and any cause-effect inference is purely subjective unless otherwise proven.
What is a simple definition of Data Dependence? Data dependence is when one set of information is directly related to another. One goal of regression analysis is to find a mathematical relationship that describes the connection between the two sets of data.
Who’s tried using statistics to analyze data? (See Hands Raised)
What is Average? An Average is the total numeric sum of all the data divided by the number of data points. Mathematically it can be stated as follows: Average = Sum of Numbers / Quantity of Numbers
What is Arithmetic Mean? Arithmetic Mean: the average obtained by dividing a sum by the number of its addends. Note that the halfway point between the extreme values in the data is a different measure, the midrange, even though it is sometimes loosely called a “mean”.
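The definitions of Average and Regression Analysis above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the data mining package in the talk: the function name `fit_line` and the sample numbers are made up for the example, and the fit is an ordinary least-squares straight line through one independent variable.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data (not from the talk): y is roughly 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

# Average = Sum of Numbers / Quantity of Numbers
average = sum(ys) / len(ys)

# Regression: forecast the dependent variable from the independent one.
slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]
```

The fitted line only describes an association between `xs` and `ys`; as the slide says, it does not by itself establish cause and effect.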

More Answers to Questions
Data Mining for Engineers: Assessment of Learning
What is the Median? The Median is the value of the term in the middle when the data are sorted (with an even number of terms, the average of the two middle values).
Define Standard Deviation? The Standard Deviation is a statistical measure of the spread or variability in a data set. Mathematically, the Standard Deviation is the root mean square (RMS) of the deviations of the values from their arithmetic mean.
What is Correlation? Correlation is the amount of positive or negative relationship existing between two measures.
What are Derived Variables? Derived Variables come from a user-provided formula.
What are Sliders? In the Data Desk program, Sliders are a rapid way of changing and entering variable values to get quick results.
What other kinds of data manipulators can you think of? (Students’ Ideas Only)
What is Replicated Data, and when can it be used? Replicating data is the process of adding subsets of data you already have into your database. You might want to Replicate Data if it truly strengthens the associated relationship between this …
When is it OK to Delete (or Not Include) data points in a statistical analysis? You should Not Delete data entries if a faulty (untrue) relationship between your data sets would result after deletion.
What kinds of Non-numerical Information might you want to apply statistical methods to? Any that help describe actual relationships your data may have.
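As a rough illustration of the Median, Standard Deviation, Correlation and Derived Variable definitions above, here is a short Python sketch. The variable names (`speeds`, `levels`) and their values are hypothetical, and the correlation shown is the Pearson coefficient, which is one common way to quantify a "positive or negative relationship"; the talk does not say which measure the package uses.

```python
from math import sqrt
from statistics import mean, median, pstdev

# Hypothetical measurements (not from the talk).
speeds = [10.0, 20.0, 30.0, 40.0, 50.0]
levels = [12.0, 19.0, 33.0, 41.0, 45.0]


def rms_deviation(values):
    """Standard deviation as the RMS of deviations from the mean."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


middle = median(levels)             # value in the middle of the sorted data
spread = rms_deviation(levels)      # population standard deviation
spread_check = pstdev(levels)       # same quantity from the standard library
r = correlation(speeds, levels)     # close to +1 for this data

# Derived variable: a new column computed from a user-provided formula.
level_per_speed = [lv / sp for sp, lv in zip(speeds, levels)]
```

Dividing by the number of points gives the population standard deviation, which matches the RMS wording on the slide; many packages report the sample version instead, which divides by one less than the number of points.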

NSEWACOUSTICS.WORDPRESS.COM WEBSITE

At the Start … Let me start off by saying that this presentation cannot be appreciated by just looking at a set of static slides presented ONE AT A TIME. What I’m about to show you is highly dynamic and requires the use of a real-time computer. Only after experiencing the dynamic effects of this presentation will you get a real feel for what it’s like to DATA MINE. In this highly interactive presentation I will give you just a glimpse of what you can learn from huge amounts of data in very short order using some graphic analytical tools that are available today.

1 MILLION DATA POINTS Did you ever think about what a million data points look like? Have you ever seen a million data points all at once? You’re looking at ’em. The plot below contains 1 million data points.

I can’t believe there’s really 1 million points in this plot … If you don’t believe there are a million data points here, let’s rotate ’em in real time and see if you can pick out each and every point and count them one by one. Rotate Plot. Now that I’m rotating them, do you believe there are 1 million points? Half of them are Green.

LOOKING CLOSER AT THIS DATA Here’s an output from the data mining package I’m going to use throughout the day. Below is a plot matrix of the data from three different axes. One of the viewing angles has been magnified to reveal individual points. Half of the data has been highlighted in Green.

Here’s a closer look at what this data mining package can tell us. Now, along with some of the multi-plots, we can see a list of the data rows by count. The arrow points to the one millionth row, which just happens to be highlighted green. Other details show data subset icons listed by name and a few other icons which represent the action plots we have made up to this point. The nice thing is, the program keeps track of whatever you do, as you do it, so you can back-track and review whatever you did and found out.

Package Has Dynamic Tables & Plots

Box plots, area plots and Multi-Series Plots

MULTI-PLOT MATRIX

MULTI-PLOT MATRIX Estimate Accuracy Plot

Stock Market Data: Can PROFITS Be Estimated by Company Info?

Seeded cloud data

Individual Plots by Data Row

Highlight Data by Clicking

Mining Tools are Menu Driven

How to Do Regression Analysis

One Click Regression

Multi-Click Multi-Regression

Regression Analysis Live Demo Using Five Variables & 11 Sliders

Wheelset Data Multi Plots: Almost Parallel and Vertically Offset (MEASURED DATA, REGRESSION FIT DATA)

Lake Michigan Level Analysis

Lake Michigan Water Levels Predicted Versus Actual

Dynamic Slider Demo

Simulation Using Real Data

Scatter Plots and Histograms

Four Variable Regression DEMO. Some data is obviously extraneous and must be removed. You can use color to highlight it and remove it, all automatically.
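In the talk, extraneous points are highlighted by color and removed interactively. A non-interactive stand-in for that step might look like the Python sketch below. It is an assumption, not the package's method: it fits a straight line, flags points whose residual is far larger than the typical residual, and refits without them. The function names, the threshold factor `k`, and the data are all hypothetical, and the example uses a single independent variable rather than four to stay short.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def drop_extraneous(xs, ys, k=3.0):
    """Keep points whose |residual| is at most k times the median |residual|."""
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    limit = k * median(residuals)
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals) if r <= limit]
    return [x for x, _ in kept], [y for _, y in kept]


# Hypothetical data: y = 2x + 1, with one obviously extraneous reading at x = 6.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 60.0

slope_raw, intercept_raw = fit_line(xs, ys)        # pulled off by the bad point
kept_xs, kept_ys = drop_extraneous(xs, ys)
slope_clean, intercept_clean = fit_line(kept_xs, kept_ys)
```

A rule like this should be applied with the earlier caution in mind: points should not be deleted if doing so would create a faulty relationship between the data sets.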

Eliminate Extraneous Data

FAST REGRESSION SIMULATOR

Fast Regression Simulator with Data Scatter Added
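The idea of a regression simulator with data scatter added can be sketched offline in Python. This is only an illustration under stated assumptions: the slides do not describe how the simulator works, so the straight-line model, the Gaussian scatter, the function names and every parameter value below are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def simulate(n, slope, intercept, scatter, seed=0):
    """Generate n points on a known line, plus Gaussian scatter in y."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, scatter) for x in xs]
    return xs, ys


# No scatter: the fit recovers the known line almost exactly.
xs0, ys0 = simulate(n=50, slope=2.0, intercept=1.0, scatter=0.0)
slope0, intercept0 = fit_line(xs0, ys0)

# Scatter added: the fit is close to, but no longer exactly, the known line.
xs1, ys1 = simulate(n=1000, slope=2.0, intercept=1.0, scatter=0.5)
slope1, intercept1 = fit_line(xs1, ys1)
```

Changing `scatter` plays roughly the role of a slider: rerunning with a larger value shows how quickly the fitted slope and intercept drift away from the values used to generate the data.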

Roller Bearing Acoustic Signal Simulator