1
Data Mining for Engineers
Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets. Slides extracted from a talk given at the Vibo-Rama 2011 Meeting of the Vibration Institute, March 10, 2011, Holiday Inn Express, Latham, NY
2
DATA MINING FOR ENGINEERS
Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets
3
Talk Outline
4
Talk Outline
5
Data Mining for Engineers Assessment of Learning
Questions* You Will Be Asked To Answer When The Talk Ends
Who’s taken a formal course in statistics?
What is Regression Analysis?
What is data dependence?
Who’s tried using statistics to analyze data?
What is average? Mean? Median? Standard deviation?
What is Correlation?
Who knows what derived variables are? Sliders?
What other kinds of data manipulators can you think of?
What is replicated data, and when can it be used, and not used?
When is it OK to delete/not include data points in a statistical analysis?
What kinds of non-numerical information might you want to apply statistical methods to?
*Questions will be interspersed at the beginning and throughout the presentation to assess participants’ prior knowledge as well as the audience’s understanding of and experience with basic statistical parameters.
6
Answers to Questions Data Mining for Engineers Assessment of Learning
Who’s taken a formal course in statistics? (See Hands Raised)
What is Regression Analysis? Regression Analysis is a statistical approach to forecasting … change in a dependent variable on the basis of observed changes in one or more independent variables. Regression Analysis is also known as … curve fitting or line fitting, because a regression equation can be used to fit a curve or line to data points. The relationships depicted in a Regression Analysis are, however, associative only; any cause-effect inference is purely subjective unless otherwise proven.
What is a simple definition of Data Dependence? Data dependence is when one set of information is directly related to another. One goal of regression analysis is to find a mathematical relationship that describes the connection between the two sets of data.
Who’s tried using statistics to analyze data? (See Hands Raised)
What is Average? An Average is the total numeric sum of all the data divided by the number of data points. Mathematically: Average = Sum of Numbers / Quantity of Numbers
What is Arithmetic Mean? Arithmetic Mean: the average obtained by dividing a sum by the number of its addends. The word “MEAN” by itself is sometimes loosely used for the halfway point between the extreme values in the data; strictly, that halfway point is the midrange, which generally does not equal the arithmetic mean.
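To make the difference between the arithmetic mean and the halfway point between the extreme values concrete, here is a minimal Python sketch. The function names and the readings are made up for illustration and are not data from the talk; a single high reading pulls the midrange further than it pulls the mean.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def midrange(values):
    """Halfway point between the smallest and largest values."""
    return (min(values) + max(values)) / 2


# Hypothetical readings with one high value (illustrative numbers only).
readings = [2.0, 3.0, 4.0, 11.0]
print(arithmetic_mean(readings))  # 5.0
print(midrange(readings))         # 6.5
```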
7
More Answers to Questions
Data Mining for Engineers Assessment of Learning
What is the Median? The Median is the middle value when the data are sorted in order; for an even number of values it is the average of the two middle values.
Define Standard Deviation. The Standard Deviation is a statistical measure of the spread or variability in a data set. Mathematically, the Standard Deviation is the root mean square (RMS) of the deviations of the values from their arithmetic mean.
What is Correlation? Correlation is the degree of positive or negative relationship existing between two measures.
What are Derived Variables? Derived Variables are new variables computed from existing ones using a user-provided formula.
What are Sliders? In the Data Desk program, Sliders are a rapid way of changing and entering variable values to get quick results.
What other kinds of data manipulators can you think of? (Students’ ideas only)
What is Replicated Data, and when can it be used? Replicated data is the process of adding copies of subsets of data you already have into your database. You might want to Replicate Data if it truly strengthens the associated relationship between this
When is it OK to Delete (or Not Include) data points in a statistical analysis? You should not delete data entries if a faulty (untrue) relationship between your data sets would result after deletion.
What kinds of Non-numerical Information might you want to apply statistical methods to? Any that help describe actual relationships your data may have.
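The median, standard deviation, and correlation defined above can each be computed in a few lines. This is a minimal Python sketch using only the standard library; the function names are hypothetical. The standard deviation here is the population form (divide by n), matching the RMS-of-deviations definition; many packages default to dividing by n − 1 for a sample. Correlation is computed as the Pearson coefficient, which measures linear relationship only.

```python
import math


def median(values):
    """Middle value of the sorted data; average of the two middle values for an even count."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def std_dev(values):
    """Population standard deviation: RMS of the deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Pearson correlation coefficient, from -1 (perfect negative) to +1 (perfect positive)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(median([4, 1, 3, 2]))                # 2.5
print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))   # 2.0
print(correlation([1, 2, 3], [6, 4, 2]))   # -1.0
```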
9
NSEWACOUSTICS.WORDPRESS.COM WEBSITE
10
NSEWACOUSTICS.WORDPRESS.COM
11
At the Start …
Let me start off by saying that this presentation cannot be appreciated by Just Looking at a set of static slides presented ONE AT A TIME. What I’m about to show you is highly dynamic and requires a real-time computer. Only after experiencing the dynamic effects of this presentation will you get a real feel for what it’s like to DATA MINE. In this highly interactive presentation I will give you just a glimpse of what you can learn from huge amounts of data in very short order, using some graphic analytical tools that are available today.
12
1 MILLION DATA POINTS
Did you ever think about what a million data points look like? Have you ever seen a million data points all at once? You’re looking at ’em. The plot below contains 1 million data points.
13
I can’t believe there are really 1 million points in this plot …
If you don’t believe there are a million data points here, let’s rotate ’em in real time and see if you can pick out each and every point and count them one by one. [Rotate Plot] Now that I’m rotating them, do you believe there are 1 million points? Half of them are green.
14
LOOKING CLOSER AT THIS DATA
Here’s an output from the data mining package I’m going to use throughout the day. Below is a plot matrix of the data along three different axes. One of the views has been magnified to reveal individual points. Half of the data has been highlighted in green.
15
Here’s a closer look at what this data mining package can tell us
Now, along with some of the multi-plots, we can see a list of the data rows by count. The arrow points to the one-millionth row, which happens to be highlighted green. Other details show data subset icons listed by name and a few other icons that represent the plots we have made up to this point. The nice thing is that the program keeps track of whatever you do, as you do it, so you can backtrack and review what you did and what you found out.
16
Package Has Dynamic Tables & Plots
17
Box plots, area plots and Multi-Series Plots
18
MULTI-PLOT MATRIX
19
MULTI-PLOT MATRIX: Estimate Accuracy Plot
20
Stock Market Data: Can PROFITS Be Estimated by Company Info?
22
Seeded cloud data
23
Individual Plots by Data Row
28
Highlight Data by Clicking
29
Mining Tools are Menu Driven
30
Mining Tools are Menu Driven
31
Mining Tools are Menu Driven
33
How to Do Regression Analysis
34
One Click Regression
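The package’s one-click regression is not reproduced here. As a hedged illustration of what a straight-line regression computes, this Python sketch fits the intercept and slope by ordinary least squares; `fit_line` is a hypothetical helper name. As noted in the answers above, the fitted line describes association, not cause and effect.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x (the n's cancel).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean x, mean y).
    intercept = my - slope * mx
    return intercept, slope


# Points lying exactly on y = 1 + 2x recover that line.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```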
35
Multi-Click Multi-Regression
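Multi-regression extends the straight-line fit to several independent variables at once. The sketch below is a minimal, standard-library-only Python illustration, not the package’s implementation: it solves the least-squares normal equations (XᵀX)b = Xᵀy by Gaussian elimination. The helper names `solve` and `fit_multiple` are hypothetical, the code assumes the independent variables are not collinear, and a production tool would typically use a QR or SVD solver for better numerical stability.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations act on a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 column gives the intercept
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
print(fit_multiple(rows, ys))  # approximately [1.0, 2.0, 3.0]
```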
36
Regression Analysis Live Demo Using Five Variables & 11 Sliders
37
Regression Analysis Live Demo Using Five Variables & 11 Sliders
38
Regression Analysis Live Demo Using Five Variables & 11 Sliders
39
Wheelset Data Multi-Plots: Almost Parallel and Vertically Offset
Panels: Measured Data; Regression Fit Data
41
Lake Michigan Level Analysis
44
Lake Michigan Water Levels Predicted Versus Actual
45
Lake Michigan Water Levels Predicted Versus Actual
46
Lake Michigan Water Levels Predicted Versus Actual
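The predicted-versus-actual slides compare regression output with measured lake levels. One common way to put numbers on such a comparison, sketched here in Python as an assumption rather than the method used in the talk, is the RMS prediction error (in the same units as the data) and the coefficient of determination R². The function names and sample values are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Root mean square of the prediction errors, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Fraction of the variance in `actual` accounted for by the predictions (1.0 = perfect)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only, not lake-level data.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rms_error(actual, predicted))  # 0.5
print(r_squared(actual, predicted))  # about 0.8
```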
47
Dynamic Slider Demo
48
Simulation Using Real Data
49
Simulation Using Real Data
50
Simulation Using Real Data
51
Scatter Plots and Histograms
52
Four Variable Regression DEMO
Some data is obviously extraneous and must be removed; you can use color to highlight it and then remove it, all automatically.
53
Eliminate Extraneous Data
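In the talk, extraneous points are marked with color in the package and then removed. The Python sketch below shows one possible automatic rule, which is an assumption and not the package’s method: flag points whose residual from a least-squares line exceeds k times the residual RMS (k = 2 is a hypothetical choice), then refit without them. The helper names are hypothetical. Whether a flagged point is truly extraneous remains an engineering judgment; deleting it should not create a false relationship in the data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def flag_extraneous(xs, ys, k=2.0):
    """Indices whose residual from the fitted line exceeds k times the residual RMS."""
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Least-squares residuals average to zero, so their RMS is their standard deviation.
    sd = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


def remove_flagged(xs, ys, flagged):
    """Keep only the points that were not flagged (the 'highlight, then remove' step)."""
    flagged_set = set(flagged)
    keep = [i for i in range(len(xs)) if i not in flagged_set]
    return [xs[i] for i in keep], [ys[i] for i in keep]


# Points on y = 1 + 2x with one obviously bad reading at index 5.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[5] = 100
flagged = flag_extraneous(xs, ys)
print(flagged)  # [5]
print(fit_line(*remove_flagged(xs, ys, flagged)))  # close to (1.0, 2.0)
```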
54
FAST REGRESSION SIMULATOR
55
Fast Regression Simulator with Data Scatter Added
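A regression simulator with scatter added can be approximated by generating points on a known line, adding random noise, and checking how well a least-squares fit recovers the line. This Python sketch is a hypothetical stand-in for the simulator shown in the talk: it assumes Gaussian scatter with a chosen standard deviation, evenly spaced x values from 0 to 10, and a fixed random seed so runs are repeatable.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(intercept, slope, scatter, n=50, seed=0):
    """Generate n points on a known line plus Gaussian scatter of standard deviation `scatter`."""
    rng = random.Random(seed)  # fixed seed so the same data is produced on every run
    xs = [i / (n - 1) * 10 for i in range(n)]  # evenly spaced from 0 to 10
    ys = [intercept + slope * x + rng.gauss(0.0, scatter) for x in xs]
    return xs, ys


xs, ys = simulate(intercept=1.0, slope=2.0, scatter=0.5)
# The recovered coefficients land near (1.0, 2.0); exact values depend on the random draws.
print(fit_line(xs, ys))
```

Raising `scatter` and rerunning shows the fitted coefficients wandering further from the true line, which is the effect the simulator is meant to make visible.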
56
Roller Bearing Acoustic Signal Simulator