University of Toronto, 8/30/2015
Data Mining: The Art and Science of Obtaining Knowledge from Data
Dr. Saed Sayad
Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science and engineering
- Challenges and opportunities
Explosion of Data
- Data in the world doubles every 20 months!
- NASA’s Earth Observing System: 46 megabytes of data per second, about 4,000,000,000,000 bytes a day
- FBI fingerprint image library: 200,000,000,000,000 bytes
- In-line image analysis for particle detection: 1 megabyte per second
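As a quick check of the NASA figure, 46 megabytes per second sustained over a full day comes to roughly 4 terabytes. The sketch below assumes decimal megabytes (10^6 bytes); the function name is illustrative and not from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def bytes_per_day(megabytes_per_second):
    """Convert a rate in MB/s to bytes per day (assumes 1 MB = 10**6 bytes)."""
    return megabytes_per_second * 1_000_000 * SECONDS_PER_DAY


nasa = bytes_per_day(46)
print(f"{nasa:,.0f} bytes per day")  # 3,974,400,000,000 bytes per day
```

The result, just under 4 x 10^12 bytes, matches the rounded figure quoted on the slide.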
What do we need?
Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is data mining.
What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
[Diagram: Data -> Data Mining -> Knowledge]
[Diagram labels: Data Mining; AI, Machine Learning; Statistics; Database; Data Analysis; Data Warehouse; OLAP]
[Diagram labels: Data Mining; Data Analysis, Database; Statistics, Machine Learning, Data Warehouse, OLAP]
Database
           | Text Files  | Relational Database            | Multi-dimensional Database
Entities   | File        | Table                          | Cube
Attributes | Row and Col | Record, Field, Index           | Dimension, Level, Measurement
Methods    | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language   | -           | SQL                            | MDX
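The four relational methods in the table can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer table and its columns are hypothetical and do not come from the slides.

```python
import sqlite3

# Throwaway in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, age INTEGER, income REAL)")

# Insert: add two records.
cur.executemany(
    "INSERT INTO customer (age, income) VALUES (?, ?)",
    [(25, 40000.0), (52, 85000.0)],
)
# Update: change a field of one record.
cur.execute("UPDATE customer SET income = ? WHERE id = ?", (42000.0, 1))
# Delete: remove the records matching a condition.
cur.execute("DELETE FROM customer WHERE age > ?", (50,))
# Select: read back what is left.
rows = cur.execute("SELECT id, age, income FROM customer ORDER BY id").fetchall()
print(rows)  # [(1, 25, 42000.0)]
conn.close()
```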
Data Analysis
- Classification
- Regression
- Clustering
- Association
- Sequence Analysis
Data Analysis
[Diagram: input variables X1, X2 with weights W1, W2 feed a model that produces output variables Y1, Y2]
- Input Variables or Attributes: Numeric (age, income, …) or Categorical (gender, occupation, …)
- Model: Linear Models or Decision Trees
- Output Variables or Targets: Numeric, Regression (0,1); Categorical, Classification (good, bad)
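A minimal sketch of the linear-model case in the diagram: two inputs are combined with weights W1 and W2 into a numeric output (regression), and that output can be thresholded into a category (classification). The weights, inputs and threshold are made-up values chosen for illustration, not values from the slides.

```python
def linear_score(x1, x2, w1, w2):
    """Numeric output of a two-input linear model: w1*x1 + w2*x2."""
    return w1 * x1 + w2 * x2


def classify(score, threshold):
    """Turn the numeric output into one of two categories."""
    return "good" if score >= threshold else "bad"


# Hypothetical weights and inputs.
w1, w2 = 0.5, 0.25
y = linear_score(1.0, 2.0, w1, w2)  # regression-style numeric output
print(y, classify(y, threshold=0.75))
```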
Data Analysis (cont.)
- Clustering: [scatter plot of Age vs. Income]
- Association: transactions 1, chips, coke, chocolate; 2, gum, chips; 3, chips, coke; 4, … Probability (chips, coke) ?
- Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…; [time series plot of X(t-1), X(t) against T]
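The association question, Probability (chips, coke), can be answered by counting the share of transactions that contain both items. The sketch below uses only the three transactions listed on the slide; transaction 4 is elided in the source and is left out. The function name support is the usual term in association-rule mining but does not appear on the slide.

```python
# The three transactions given on the slide (transaction 4 is elided there).
transactions = [
    {"chips", "coke", "chocolate"},  # transaction 1
    {"gum", "chips"},                # transaction 2
    {"chips", "coke"},               # transaction 3
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


p = support({"chips", "coke"}, transactions)
print(f"P(chips, coke) = {p:.2f}")  # P(chips, coke) = 0.67
```

Two of the three listed transactions contain both chips and coke, so the estimate is 2/3.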
Data Mining in Research Life Cycle
[Diagram labels: Questions, Needs, Search, Research, Experiment, Modeling, Report, Library, Data, Database, Data Analysis]
Data Mining – Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment
Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science and engineering
- Challenges and opportunities
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”
Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
[Figure labels: Supination, Pronation, Flexion, Extension]
Muscle Contraction
Movement   | Biceps | Triceps
Supination | H      | H
Pronation  | L      | L
Flexion    | H      | L
Extension  | L      | H
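The muscle-contraction table can be read as a lookup from a (biceps, triceps) pair to a movement. The sketch below assumes H and L stand for high and low contraction, that signals are normalized to the range 0 to 1, and that a single threshold of 0.5 separates the two levels; none of these details are stated on the slide.

```python
# Movement for each (biceps, triceps) contraction pair, as in the table.
MOVEMENTS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold=0.5):
    """Map a normalized signal to 'H' or 'L' (the threshold is an assumption)."""
    return "H" if signal >= threshold else "L"


def movement(biceps, triceps, threshold=0.5):
    """Look up the movement for a pair of EMG signal values."""
    return MOVEMENTS[(level(biceps, threshold), level(triceps, threshold))]


print(movement(0.9, 0.1))  # Flexion
```

A fixed threshold is the simplest possible rule; the classification models listed under Modeling learn the decision boundary from the recorded data instead.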
Data Preparation
- The dataset includes 80 records.
- There are two input variables: biceps signal and triceps signal.
- There is one output variable with four possible values: Supination, Pronation, Flexion and Extension.
Exploration
[Scatter plot: Triceps vs. Record#, by class: Flexion, Extension, Supination, Pronation]
Exploration (cont.)
[Scatter plot: Biceps vs. Record#, by class: Flexion, Extension, Supination, Pronation]
Modeling
Classification:
- OneR
- Decision Tree
- Naïve Bayesian
- K-Nearest Neighbors
- Neural Networks
- Linear Discriminant Analysis
- Support Vector Machines
- …
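To make one of the listed classifiers concrete, here is a minimal k-nearest-neighbors sketch for the two-signal problem. The training points are invented for illustration and are not the 80 records from the study; signals are assumed to be normalized to the range 0 to 1.

```python
import math
from collections import Counter

# Invented (biceps, triceps) training points; NOT the study's data.
train = [
    ((0.90, 0.80), "Supination"), ((0.80, 0.90), "Supination"), ((0.85, 0.85), "Supination"),
    ((0.10, 0.20), "Pronation"), ((0.20, 0.10), "Pronation"), ((0.15, 0.15), "Pronation"),
    ((0.90, 0.10), "Flexion"), ((0.80, 0.20), "Flexion"), ((0.85, 0.15), "Flexion"),
    ((0.10, 0.90), "Extension"), ((0.20, 0.80), "Extension"), ((0.15, 0.85), "Extension"),
]


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(train, (0.7, 0.3)))  # Flexion
```

With k = 3 the three closest points to (0.7, 0.3) are all Flexion examples, so the vote is unanimous. Ties between classes would need an explicit rule in a real implementation.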
Model Deployment
A neural network model was successfully implemented inside the robotic arm.
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”
Plastics Extrusion
[Figure labels: Plastic pellets, Plastic melt]
Film Extrusion
[Figure labels: Extruder, Plastic Film, Defect due to particle contaminant]
In-Line Monitoring
[Figure labels: Transition Piece, Window Ports]
In-Line Monitoring
[Diagram labels: Light Source, Extruder and Interface, Optical Assembly, Imaging Computer, Light]
Melt Without Contaminant Particles (WO)
Melt With Contaminant Particles (WP)
Problem Definition
Classify images into those with particles (WP) and those without particles (WO).
[Example images: WO, WP]
Data Preparation
- 2000 images
- 54 input variables, all numeric
- One output variable with two possible values:
  - With Particle
  - Without Particle
Data Preparation (cont.)
- Pre-processed images to remove noise
- Dataset 1 with sharp images: 1350 images, including 1257 without particles and 91 with particles
- Dataset 2 with sharp and blurry images: 2000 images, including 1909 without particles or with blurry particles, and 91 with particles
- 54 input variables, all numeric
- One output variable with two possible values (WP and WO)
Exploration
Demo!
Modeling
Classification:
- OneR
- Decision Tree
- 3-Nearest Neighbors
- Naïve Bayesian
Evaluation
[Table: columns Dataset, Attrib., Class, One-R, C4.5, 3.N.N, Bayes; rows Sharp Images, Sharp + Blurry Images, Sharp + Blurry Images; evaluated with fold cross-validation; cell values and fold count not recoverable]
If pixel_density_max < 142 then WP
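The single-attribute rule on the slide is simple enough to write out directly. In the sketch below, the "otherwise WO" branch is an assumption: the slide states only the WP condition. The function name is illustrative.

```python
def classify_image(pixel_density_max):
    """Apply the slide's rule: WP if pixel_density_max < 142.

    Returning "WO" in every other case is an assumption; the slide
    gives only the WP branch.
    """
    return "WP" if pixel_density_max < 142 else "WO"


for value in (100, 142, 200):
    print(value, classify_image(value))
```

A rule like this uses just one of the 54 input variables, which is the defining feature of the OneR method listed under Modeling.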
Deploy model
A Visual Basic program will be developed to implement the model.
Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science & engineering
- Challenges and opportunities
Challenges and Opportunities
- Data mining is a ‘top ten’ emerging technology.
- High-paying jobs in the financial, medical and engineering sectors!
- Faster, more accurate and more scalable techniques.
- Incremental, on-line and real-time learning algorithms.
- Parallel and distributed data processing techniques.
Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!