Data Mining for Engineers

1 Data Mining for Engineers
Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets. Slides Extracted from a Talk Given at the Vibo-Rama 2011 Meeting of the Vibration Institute, March 10, 2011, Holiday Inn Express, Latham, NY

2 DATA MINING FOR ENGINEERS
Graphic Methods of Pulling Useful Diagnostic Information from Large Messy Data Sets

3 Talk Outline

4 Talk Outline

5 Data Mining for Engineers Assessment of Learning
Questions* You Will Be Asked To Answer When The Talk Ends
Who’s taken a formal course in statistics?
What is Regression Analysis?
What is data dependence?
Who’s tried using statistics to analyze data?
What is average? Mean? Median? Standard deviation?
What is Correlation?
Who knows what derived variables are? Sliders?
What other kinds of data manipulators can you think of?
What is replicated data, and when can it be used, and not used?
When is it OK to delete/not include data points in a statistical analysis?
What kinds of non-numerical information might you want to apply statistical methods to?
*Questions will be interspersed at the beginning and throughout the presentation to assess participant pre-knowledge as well as audience understanding of and experience with basic statistical parameters.

6 Answers to Questions Data Mining for Engineers Assessment of Learning
Who’s taken a formal course in statistics? (See Hands Raised)
What is Regression Analysis? Regression Analysis is a statistical approach to forecasting … change in a dependent variable on the basis of observed changes in one or more independent variables. Regression Analysis is also known as … curve fitting or line fitting, because a regression equation can be used to fit a curve or line to data points. The relationships depicted in a Regression Analysis are, however, associative only; any cause-effect inference is purely subjective unless otherwise proven.
What is a simple definition of Data Dependence? Data dependence is when one set of information is directly related to another. One goal of regression analysis is to find a mathematical relationship that describes the connection between the two sets of data.
Who’s tried using statistics to analyze data? (See Hands Raised)
What is Average? An Average is the total numeric sum of all the data divided by the number of data points. Mathematically: Average = Sum of Numbers / Quantity of Numbers
What is Arithmetic Mean? The Arithmetic Mean is the average obtained by dividing a sum by the number of its addends. In statistics the word “MEAN” by itself normally refers to this arithmetic mean. The halfway point between the extreme values in the data is a different quantity, usually called the midrange.
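
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up, the helper names `fit_line` and `midrange` are hypothetical, and ordinary least squares is assumed as the line-fitting rule (the talk does not say which fitting method its software uses).

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def midrange(values):
    """Halfway point between the extreme values (not the arithmetic mean)."""
    return (min(values) + max(values)) / 2

# Made-up example data: xs is the independent variable, ys the dependent one.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

average = sum(ys) / len(ys)        # Average = Sum of Numbers / Quantity of Numbers
slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]

print(f"average={average:.2f}  midrange={midrange(ys):.2f}")
print(f"fit: y = {slope:.3f} * x + {intercept:.3f}")
```

The fitted line describes an association between the two columns only; as the slide notes, it does not by itself establish cause and effect.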

7 More Answers to Questions
Data Mining for Engineers Assessment of Learning
What is the Median? The Median is the middle value when the data are sorted in order; with an even number of values, it is the average of the two middle values.
Define Standard Deviation? The Standard Deviation is a statistical measure of the spread or variability in a data set. Mathematically, the Standard Deviation is the root mean square (RMS) of the deviations of the values from their arithmetic mean.
What is Correlation? Correlation is the degree of positive or negative relationship existing between two measures.
What are Derived Variables? Derived Variables are computed from a user-provided formula.
What are Sliders? In the Data Desk program, Sliders are a rapid way of changing and entering variable values to get quick results.
What other kinds of data manipulators can you think of? (Students’ ideas only)
What is Replicated Data, and when can it be used? Replicating data is the process of adding subsets of data you already have into your database. You might want to Replicate Data if it truly strengthens the associated relationship between this
When is it OK to Delete (or Not Include) data points in a statistical analysis? You should Not Delete data entries if a faulty (untrue) relationship between your data sets would result after deletion.
What kinds of Non-numerical Information might you want to apply statistical methods to? Any that help describe actual relationships your data may have.
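
As a rough illustration of the median, standard deviation, correlation, and derived-variable answers above, here is a minimal Python sketch. The data are made up, the names `pearson_r` and `derived_variable` are hypothetical, and Pearson's coefficient is assumed as the correlation measure. The `gain` and `offset` arguments stand in for what a slider would adjust; this is not the Data Desk implementation.

```python
from math import sqrt
from statistics import mean, median, pstdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient, ranging from -1 to +1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def derived_variable(values, gain, offset):
    """A derived variable: each new value comes from a user-provided formula."""
    return [gain * v + offset for v in values]

# Made-up measurements.
load = [2, 4, 4, 4, 5, 5, 7, 9]

middle = median(load)   # average of the two middle values, since the count is even
spread = pstdev(load)   # RMS of the deviations from the mean (divides by N)

# Changing gain or offset and recomputing is what a slider does interactively.
response = derived_variable(load, gain=3, offset=1)
r = pearson_r(load, response)

print(f"median={middle}  std dev={spread}  correlation={r:.3f}")
```

`pstdev` matches the RMS definition on the slide because it divides by the number of points; `statistics.stdev` divides by one less than that and gives the sample estimate instead.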

8

9 NSEWACOUSTICS.WORDPRESS.COM WEBSITE

10 NSEWACOUSTICS.WORDPRESS.COM

11 At the Start … Let me start off by saying - this presentation cannot be appreciated by Just Looking at a set of static slides - presented ONE AT A TIME. What I’m about to show you is highly dynamic and requires the use of a real-time computer. Only after experiencing the dynamic effects of this presentation will you get a real feel for what it’s like to DATA MINE. In this highly interactive presentation I will give you just a glimpse of what you can learn from huge amounts of data in very short order using some graphic analytical tools that are available today.

12 1 MILLION DATA POINTS Did you ever think about what a million data points look like? Have you ever seen a million data points all at once? You’re looking at ‘em. The plot below contains 1 million data points.

13 I can’t believe there’s really 1 million points in this plot …
If you don’t believe there’s a million data points here - Let’s rotate ‘em in real time and see if you can pick out each and every point and count them one by one. Rotate Plot Now that I’m rotating them do you believe there’s 1 million points? – half of them are Green

14 LOOKING CLOSER AT THIS DATA
Here’s an output from the data mining package I’m going to use throughout the day. Below is a plot matrix of the data from three different axes. One of the viewing angles has been magnified to reveal individual points. Half of the data has been highlighted in Green.

15 Here’s a closer look at what this data mining package can tell us
Now, along with some of the multi-plots, we can see a list of the data rows by count. The arrow points to the one millionth row, which just happens to be highlighted green. Other details show data subset icons listed by name and a few other icons which represent the action plots we have made up to this point. The nice thing is, the program keeps track of whatever you do, as you do it, so you can back-track and review whatever you did and found out.

16 Package Has Dynamic Tables & Plots

17 Box plots, area plots and Multi-Series Plots

18 MULTI-PLOT MATRIX

19 MULTI-PLOT MATRIX Estimate Accuracy Plot

20 Can PROFITS Be Estimated by Company Info
Stock Market Data

21

22 Seeded cloud data

23 Individual Plots by Data Row

24

25

26

27

28 Highlight Data by Clicking

29 Mining Tools are Menu Driven

30 Mining Tools are Menu Driven

31 Mining Tools are Menu Driven

32

33 How to Do Regression Analysis

34 One Click Regression

35 Multi-Click Multi-Regression

36 Regression Analysis Live Demo Using Five Variables & 11 Sliders

37 Regression Analysis Live Demo Using Five Variables & 11 Sliders

38 Regression Analysis Live Demo Using Five Variables & 11 Sliders

39 Wheelset Data Multi Plots Almost Parallel and Vertically Offset
MEASURED DATA REGRESSION FIT DATA

40

41 Lake Michigan Level Analysis

42

43

44 Lake Michigan Water Levels Predicted Versus Actual

45 Lake Michigan Water Levels Predicted Versus Actual

46 Lake Michigan Water Levels Predicted Versus Actual

47 Dynamic Slider Demo

48 Simulation Using Real Data

49 Simulation Using Real Data

50 Simulation Using Real Data

51 Scatter Plots and Histograms

52 Four Variable Regression DEMO
Some data is obviously extraneous and must be removed. You can use color to highlight it and remove it, all automatically.
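
One way to picture the highlight-and-remove step in code is sketched below. The slide does not say how the package decides which points are extraneous, so the rule used here (flag any point whose residual exceeds three times the median absolute residual) is purely an assumption, as are the made-up data and the helper names `fit_line` and `flag_extraneous`.

```python
from statistics import mean, median

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def flag_extraneous(xs, ys, k=3.0):
    """Indices of points whose residual exceeds k times the median |residual|.

    The threshold rule is an assumption made for this sketch only.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    limit = k * median(residuals)
    return [i for i, res in enumerate(residuals) if res > limit]

# Made-up data following y = 2x + 1, with one bad reading at index 5.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] = 40

flagged = flag_extraneous(xs, ys)   # the "highlighted" points
kept = [i for i in range(len(xs)) if i not in flagged]
slope, intercept = fit_line([xs[i] for i in kept], [ys[i] for i in kept])

print(f"flagged rows: {flagged}")
print(f"refit: y = {slope:.3f} * x + {intercept:.3f}")
```

Removing flagged rows is only appropriate when they are known to be bad readings; as noted earlier in the talk, data should not be deleted if doing so would produce a faulty relationship.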

53 Eliminate Extraneous Data

54 FAST REGRESSION SIMULATOR

55 Fast Regression Simulator with Data Scatter Added

56 Roller Bearing Acoustic Signal Simulator

