Published by Jean Parsons. Modified over 9 years ago.
1
University of Toronto 8/30/2015
Data Mining: The Art and Science of Obtaining Knowledge from Data
Dr. Saed Sayad
2
Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science and engineering
- Challenges and opportunities
3
Explosion of Data
- Data in the world doubles every 20 months!
- NASA’s Earth Observing System: 46 megabytes of data per second, 4,000,000,000,000 bytes a day
- FBI fingerprint image library: 200,000,000,000,000 bytes
- In-line image analysis for particle detection: 1 megabyte per second
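As a quick sanity check on the NASA figures, the per-second rate can be converted to a daily volume. This is a minimal sketch that assumes decimal megabytes (1 MB = 1,000,000 bytes); the slide does not say which convention it uses.

```python
# Convert the quoted per-second data rate into bytes per day.
# Assumption: 1 megabyte = 1,000,000 bytes (not stated on the slide).
BYTES_PER_MB = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

rate_mb_per_s = 46  # rate quoted on the slide
bytes_per_day = rate_mb_per_s * BYTES_PER_MB * SECONDS_PER_DAY

print(f"{bytes_per_day:,} bytes per day")
```

Under that assumption the result is about 4 × 10^12 bytes, in line with the slide's rounded daily figure.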
4
Explosion of Data (cont.)
5
Explosion of Data (cont.)
6
Explosion of Data (cont.)
7
Explosion of Data (cont.)
8
What do we need?
Fast, accurate, and scalable data analysis techniques to extract useful knowledge.
The answer is Data Mining.
9
What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
Data -> Data Mining -> Knowledge
10
[Diagram labels] AI, Machine Learning; Statistics; Data Mining; Database; Data Analysis; Data Warehouse; OLAP
11
[Diagram labels] Data Mining; Data Analysis; Database; Statistics; Machine Learning; Data Warehouse; OLAP
12
Database
 | Text Files | Relational Database | Multi-dimensional Database
Entities | File | Table | Cube
Attributes | Row and Col | Record, Field, Index | Dimension, Level, Measurement
Methods | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language | - | SQL | MDX
13
Data Analysis
- Classification
- Regression
- Clustering
- Association
- Sequence Analysis
14
Data Analysis
Input Variables or Attributes (X1, X2): Numeric (age, income, …); Categorical (gender, occupation, …)
Model (W1, W2): Linear Models or Decision Trees
Output Variables or Targets (Y1, Y2): Numeric: Regression (0,1); Categorical: Classification (good, bad)
15
Data Analysis (cont.)
Clustering: plot of Age and Income
Association:
1, chips, coke, chocolate
2, gum, chips
3, chips, coke
4, …
Probability (chips, coke) ?
Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…; X t-1, X t, T
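The "Probability (chips, coke) ?" question from the association example can be made concrete with the usual support and confidence measures. The sketch below uses only the three complete transactions listed on the slide (transaction 4 is elided there); the function names are illustrative, not from the presentation.

```python
# Transactions as listed on the slide. Transaction 4 is elided there ("4, …"),
# so only the three complete baskets are used here.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Share of baskets containing both chips and coke.
print(support({"chips", "coke"}, transactions))
# Share of chips baskets that also contain coke.
print(confidence({"chips"}, {"coke"}, transactions))
```

On these three baskets both values are 2/3, because every basket contains chips and two of the three also contain coke.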
16
Data Mining in Research Life Cycle
[Diagram labels] Questions; Needs; Search; Research; Experiment; Modeling; Report; Library; Data; Database; Data Analysis
17
Data Mining – Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment
18
Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science and engineering
- Challenges and opportunities
19
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”
20
1. Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
Motions: Supination, Pronation, Flexion, Extension
Muscle Contraction | Biceps | Triceps
Supination | H | H
Pronation | L | L
Flexion | H | L
Extension | L | H
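The muscle contraction table amounts to a lookup from a (biceps, triceps) level pair to a motion. A minimal sketch follows; the fixed threshold and the normalized 0-to-1 signal scale are hypothetical, and the presentation itself goes on to train classifiers on the signals rather than hard-coding a cutoff.

```python
# Lookup table from the slide: (biceps level, triceps level) -> arm motion.
MOTION_BY_LEVELS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}

def level(signal, threshold):
    """Map a signal amplitude to 'H' (high) or 'L' (low)."""
    return "H" if signal >= threshold else "L"

def classify_motion(biceps, triceps, threshold=0.5):
    """Return the motion for a pair of EMG amplitudes.

    The default threshold of 0.5 is a hypothetical value for signals
    normalized to the range 0..1; it does not come from the presentation.
    """
    key = (level(biceps, threshold), level(triceps, threshold))
    return MOTION_BY_LEVELS[key]

# High biceps activity with low triceps activity maps to flexion.
print(classify_motion(0.9, 0.1))
```

A learned model replaces the hand-picked threshold with decision boundaries estimated from the recorded data.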
21
2. Data Preparation
- The dataset includes 80 records.
- There are two input variables: biceps signal and triceps signal.
- There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.
22
3. Exploration
Scatter Plot: Triceps vs. Record# (Flexion, Extension, Supination, Pronation)
23
3. Exploration (cont.)
Scatter Plot: Biceps vs. Record# (Flexion, Extension, Supination, Pronation)
24
4. Modeling
Classification:
- OneR
- Decision Tree
- Naïve Bayesian
- K-Nearest Neighbors
- Neural Networks
- Linear Discriminant Analysis
- Support Vector Machines
- …
25
6. Model Deployment
A neural network model was successfully implemented inside the robotic arm.
26
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”
27
Plastics Extrusion
[Figure labels] Plastic pellets; Plastic melt
28
Film Extrusion
[Figure labels] Extruder; Plastic Film; Defect due to particle contaminant
29
In-Line Monitoring
[Figure labels] Transition Piece; Window Ports
30
In-Line Monitoring
[Figure labels] Light Source; Extruder and Interface; Optical Assembly; Imaging Computer; Light
31
Melt Without Contaminant Particles (WO)
32
Melt With Contaminant Particles (WP)
33
1. Problem Definition
Classify images into those with particles (WP) and those without particles (WO).
[Example images] WO; WP
34
2. Data Preparation
- 2000 images
- 54 input variables, all numeric
- One output variable with two possible values: With Particle, Without Particle
35
2. Data Preparation (cont.)
- Pre-processed images to remove noise
- Dataset 1 with sharp images: 1350 images including 1257 without particles and 91 with particles
- Dataset 2 with sharp and blurry images: 2000 images including 1909 without particles or with blurry particles and 91 with particles
- 54 input variables, all numeric
- One output variable with two possible values (WP and WO)
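Because the classes are heavily imbalanced, any accuracy figure is easier to read next to the majority-class baseline, that is, the accuracy of always predicting the most common class. A minimal sketch using the Dataset 2 counts from the slide; grouping all 1909 non-WP images under the label "WO" follows the two-class setup and is a simplification.

```python
# Class counts for Dataset 2 as given on the slide.
# "WO" here covers the 1909 images described as without particles or blurry.
counts = {"WO": 1909, "WP": 91}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy obtained by always predicting the majority class.
baseline_accuracy = counts[majority_class] / total

print(total, majority_class, f"{baseline_accuracy:.2%}")
```

A classifier is only informative on this dataset to the extent that it beats this baseline.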
36
3. Exploration
Demo!
37
4. Modeling
Classification:
- OneR
- Decision Tree
- 3-Nearest Neighbors
- Naïve Bayesian
38
5. Evaluation
Columns: Dataset, Attrib., Class, One-R, C4.5, 3-NN, Bayes
Sharp Images: 54 attributes, 2 classes; values 99.9, 99.8, 95.8
Sharp + Blurry Images: 54 attributes, 2 classes; values 98.5, 97.8, 93.3
Sharp + Blurry Images: 54 attributes, 3 classes; values 87, 84, 79
10-fold cross-validation
If pixel_density_max < 142 then WP
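A rule of the form quoted on this slide, together with 10-fold cross-validation, can be sketched in plain Python. This is an illustration under stated assumptions, not the presentation's implementation: the threshold search is a simplified stand-in for OneR, the toy data is hypothetical, and the input needs at least k rows.

```python
import random

def predict(pixel_density_max, threshold):
    """Rule of the form quoted on the slide: below the threshold means WP."""
    return "WP" if pixel_density_max < threshold else "WO"

def accuracy(rows, threshold):
    """Fraction of (value, label) rows that the rule labels correctly."""
    return sum(predict(x, threshold) == y for x, y in rows) / len(rows)

def fit_threshold(rows):
    """Pick the observed value that maximizes training accuracy as threshold."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))

def cross_validate(rows, k=10, seed=0):
    """k-fold cross-validation: fit on k-1 folds, score on the held-out fold."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(held_out, fit_threshold(training)))
    return sum(scores) / k

# Hypothetical toy data (not from the presentation): (pixel_density_max, label).
toy_rows = (
    [(v, "WP") for v in (100, 105, 110, 115, 120, 125, 130, 135, 138, 141)]
    + [(v, "WO") for v in (150, 150, 150, 160, 170, 180, 190, 200, 210, 220)]
)

print(fit_threshold(toy_rows))   # threshold learned on all toy rows
print(cross_validate(toy_rows))  # mean held-out accuracy over 10 folds
```

The threshold learned from the toy data differs from the slide's value of 142, which this data does not reproduce. The point of the sketch is only that each fold is scored with a rule fitted without seeing that fold.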
39
6. Deploy model
A Visual Basic program will be developed to implement the model.
40
Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science & engineering
- Challenges and opportunities
41
Challenges and Opportunities
- Data mining is a ‘top ten’ emerging technology.
- High-paying jobs in the financial, medical, and engineering fields.
- Faster, more accurate, and more scalable techniques.
- Incremental, on-line, and real-time learning algorithms.
- Parallel and distributed data processing techniques.
42
Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.
You can be part of the solution!