Data Mining for Engineers: Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets
Slides Extracted from a Talk Given at the Vibo-Rama 2011 Meeting of the Vibration Institute, March 10, 2011, Holiday Inn Express, Latham, NY 12065
Talk Outline
Data Mining for Engineers: Assessment of Learning
Questions* You Will Be Asked to Answer When the Talk Ends
Who’s taken a formal course in statistics?
What is Regression Analysis?
What is data dependence?
Who’s tried using statistics to analyze data?
What is average? Mean? Median? Standard deviation?
What is Correlation?
Who knows what derived variables are? Sliders?
What other kinds of data manipulators can you think of?
What is replicated data, and when can it be used, and not used?
When is it OK to delete/not include data points in a statistical analysis?
What kinds of non-numerical information might you want to apply statistical methods to?
*Questions will be interspersed at the beginning and throughout the presentation to assess participant pre-knowledge as well as audience understanding of and experience with basic statistical parameters.
Answers to Questions
Data Mining for Engineers: Assessment of Learning
Who’s taken a formal course in statistics? (See Hands Raised)
What is Regression Analysis? Regression Analysis is a statistical approach to forecasting … change in a dependent variable on the basis of observed changes in one or more independent variables. Regression Analysis is also known as … curve fitting or line fitting, because a regression equation can be used to fit a curve or line to data points. Relationships depicted in a Regression Analysis are, however, associative only, and any cause-effect inference is purely subjective unless otherwise proven.
What is a simple definition of Data Dependence? Data dependence is when one set of information is directly related to another. One goal of regression analysis is to find a mathematical relationship that describes the connection between the two sets of data.
Who’s tried using statistics to analyze data? (See Hands Raised)
What is Average? An Average is the total numeric sum of all the data divided by the number of data points. Mathematically it can be stated as follows: Average = Sum of Numbers / Quantity of Numbers
What is Arithmetic Mean? Arithmetic Mean: the average obtained by dividing a sum by the number of its addends. Sometimes the word “MEAN” by itself is loosely used for the halfway point between the extreme values in the data; that quantity is more precisely called the midrange, and it generally differs from the arithmetic mean.
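The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the data mining package in the talk: the function names (average, midrange, fit_line) and the load/vibration numbers are made up for this example, and the line fit is ordinary least squares, which is only one kind of regression.

```python
def average(values):
    """Average = Sum of Numbers / Quantity of Numbers."""
    return sum(values) / len(values)


def midrange(values):
    """Halfway point between the extreme values (not the arithmetic mean)."""
    return (min(values) + max(values)) / 2


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = average(x), average(y)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: an independent variable and a dependent variable.
load = [1.0, 2.0, 3.0, 4.0, 5.0]
vibration = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(load, vibration)
print(f"vibration ~= {slope:.2f} * load + {intercept:.2f}")

# The arithmetic mean and the midrange can differ on the same data.
sample = [2, 4, 9]
print(average(sample), midrange(sample))
```

A fitted line like this describes an association only; as noted above, it does not by itself show that changes in load cause changes in vibration.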
More Answers to Questions
Data Mining for Engineers: Assessment of Learning
What is the Median? The Median is the value of the middle term when the data are sorted in order. With an even number of values, it is the average of the two middle terms.
Define Standard Deviation? The Standard Deviation is a statistical measure of the spread or variability in a data set. Mathematically the Standard Deviation is the root mean square (RMS) of the deviations of the values from their arithmetic mean.
What is Correlation? Correlation is the amount of positive or negative relationship existing between two measures.
What are Derived Variables? Derived Variables are new variables computed from existing ones by a user-provided formula.
What are Sliders? In the Data Desk program, Sliders are a rapid way of changing and entering variable values to get quick results.
What other kinds of data manipulators can you think of? (Student’s idea Only)
What is Replicated Data, and when can it be used? Replicating data is the process of adding subsets of data you already have into your database. You might want to Replicate Data if it truly strengthens the associated relationship between this
When is it OK to Delete (or Not Include) data points in a statistical analysis? You should Not Delete data entries if a faulty (untrue) relationship between your data sets would result after deletion.
What kinds of Non-numerical Information might you want to apply statistical methods to? Any that help describe actual relationships your data may have.
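As a rough illustration of the median, standard deviation, correlation, and derived-variable answers, here is a small Python sketch. The function names and sample numbers are hypothetical. The standard deviation follows the RMS definition given above (dividing by N, the "population" form); many packages divide by N - 1 by default. The correlation shown is the Pearson coefficient, which is an assumption about which measure the slide has in mind.

```python
import math


def median(values):
    """Middle term of the sorted data (mean of the two middle terms if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def std_dev(values):
    """RMS of the deviations from the arithmetic mean (divides by N)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(x, y):
    """Pearson correlation coefficient, between -1 and +1."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Derived variable: a new column computed from an existing one by a formula.
# Hypothetical example: shaft speed in rpm converted to frequency in Hz.
shaft_rpm = [1800, 3600]
shaft_hz = [rpm / 60 for rpm in shaft_rpm]

print(median([4, 1, 3, 2]))                   # even count: mean of 2 and 3
print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))      # spread about the mean of 5
print(correlation([1, 2, 3], [6, 4, 2]))      # perfectly negative relationship
print(shaft_hz)
```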
WEBSITE: NSEWACOUSTICS.WORDPRESS.COM
At the Start …
Let me start off by saying that this presentation cannot be appreciated by just looking at a set of static slides presented ONE AT A TIME. What I’m about to show you is highly dynamic and requires the use of a real-time computer. Only after experiencing the dynamic effects of this presentation will you get a real feel for what it’s like to DATA MINE. In this highly interactive presentation I will give you just a glimpse of what you can learn from huge amounts of data in very short order using some graphic analytical tools that are available today.
1 MILLION DATA POINTS
Did you ever think about what a million data points look like? Have you ever seen a million data points all at once? You’re looking at ‘em. The plot below contains 1 million data points.
I can’t believe there are really 1 million points in this plot …
If you don’t believe there are a million data points here, let’s rotate ‘em in real time and see if you can pick out each and every point and count them one by one.
[Rotate Plot]
Now that I’m rotating them, do you believe there are 1 million points? Half of them are Green.
LOOKING CLOSER AT THIS DATA
Here’s an output from the data mining package I’m going to use throughout the day. Below is a plot matrix of the data from three different axes. One of the viewing angles has been magnified to reveal individual points. Half of the data has been highlighted in Green.
Here’s a closer look at what this data mining package can tell us
Now, along with some of the multi-plots, we can see a list of the data rows by count. The arrow points to the one millionth row, which just happens to be highlighted green. Other details show data subset icons listed by name and a few other icons which represent the action plots we have made up to this point. The nice thing is, the program keeps track of whatever you do, as you do it, so you can back-track and review whatever you did and found out.
Package Has Dynamic Tables & Plots
Box plots, area plots and Multi-Series Plots
MULTI-PLOT MATRIX
MULTI-PLOT MATRIX: Estimate Accuracy Plot
Stock Market Data: Can PROFITS Be Estimated by Company Info?
Seeded cloud data
Individual Plots by Data Row
Highlight Data by Clicking
Mining Tools are Menu Driven
How to Do Regression Analysis
One Click Regression
Multi-Click Multi-Regression
Regression Analysis Live Demo Using Five Variables & 11 Sliders
Wheelset Data Multi Plots: Almost Parallel and Vertically Offset
[Plot series: MEASURED DATA, REGRESSION FIT DATA]
Lake Michigan Level Analysis
Lake Michigan Water Levels Predicted Versus Actual
Dynamic Slider Demo
Simulation Using Real Data
Scatter Plots and Histograms
Four Variable Regression DEMO
Some data is obviously extraneous and must be removed. You can use color to highlight it and then remove it, all automatically.
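One possible way to automate the "highlight, then remove" step is sketched below in Python. It is not the procedure used by the package in the demo. It assumes a single independent variable, flags points whose regression residual is far from the typical residual using a robust score based on the median absolute deviation (MAD), and then refits without them. The function names, the 3.5 threshold, and the data are all hypothetical choices for illustration.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_extraneous(x, y, threshold=3.5):
    """Return one boolean per point; True marks a point as extraneous.

    A point is flagged when its residual lies more than `threshold`
    robust standard deviations (1.4826 * MAD) from the median residual.
    """
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    if mad == 0:
        # Residuals are essentially identical; nothing stands out.
        return [False] * len(x)
    scale = 1.4826 * mad
    return [abs(r - centre) / scale > threshold for r in residuals]


# Hypothetical data: y = 2x + 1, with one corrupted reading at x = 5.
x = [float(i) for i in range(10)]
y = [2 * xi + 1 for xi in x]
y[5] += 30.0

flags = flag_extraneous(x, y)        # the "highlight" step
kept_x = [xi for xi, f in zip(x, flags) if not f]
kept_y = [yi for yi, f in zip(y, flags) if not f]

print("flagged rows:", [i for i, f in enumerate(flags) if f])
print("fit with all data:   ", fit_line(x, y))
print("fit after removal:   ", fit_line(kept_x, kept_y))
```

Removing points changes the fitted relationship, so the earlier caution still applies: a point should be dropped only when it is genuinely extraneous, not merely inconvenient.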
Eliminate Extraneous Data
FAST REGRESSION SIMULATOR
Fast Regression Simulator with Data Scatter Added
Roller Bearing Acoustic Signal Simulator