University of Toronto, 8/30/2015. Data Mining: The Art and Science of Obtaining Knowledge from Data. Dr. Saed Sayad.

Presentation transcript:


Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science and engineering
- Challenges and opportunities

Explosion of Data
- Data in the world doubles every 20 months!
- NASA’s Earth Observing System: 46 megabytes of data per second, or about 4,000,000,000,000 bytes a day
- FBI fingerprint image library: 200,000,000,000,000 bytes
- In-line image analysis for particle detection: 1 megabyte per second
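As a quick sanity check on the NASA figures, the per-second rate can be converted to a daily volume. This is a minimal sketch that assumes 1 megabyte = 1,000,000 bytes; the slide does not say which convention it uses, and the function name `bytes_per_day` is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_MEGABYTE = 1_000_000  # assumption: decimal megabytes


def bytes_per_day(megabytes_per_second: int) -> int:
    """Convert a sustained rate in MB/s into bytes accumulated over one day."""
    return megabytes_per_second * BYTES_PER_MEGABYTE * SECONDS_PER_DAY


nasa_daily = bytes_per_day(46)
print(f"{nasa_daily:.3e} bytes per day")  # 3.974e+12
```

Under that assumption the result is within about 1% of the 4,000,000,000,000 bytes a day quoted on the slide.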

Explosion of Data (cont.)

What do we need?
Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is data mining.

What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
Diagram: Data -> Data Mining -> Knowledge

Diagram of related fields: AI and Machine Learning, Statistics, Data Mining, Database, Data Analysis, Data Warehouse, OLAP

Data Mining
- Data Analysis: Statistics, Machine Learning
- Database: Data Warehouse, OLAP

Database
           | Text Files  | Relational Database            | Multi-dimensional Database
Entities   | File        | Table                          | Cube
Attributes | Row and Col | Record, Field, Index           | Dimension, Level, Measurement
Methods    | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language   | -           | SQL                            | MDX

Data Analysis
- Classification
- Regression
- Clustering
- Association
- Sequence Analysis

Data Analysis
- Input variables or attributes: X1, numeric (age, income, …); X2, categorical (gender, occupation, …)
- Model: Linear Models or Decision Trees, with weights W1 and W2
- Output variables or targets: Y1, numeric, Regression (0,1); Y2, categorical, Classification (good, bad)

Data Analysis (cont.)
- Clustering: plot of Age against Income
- Association: transactions "1, chips, coke, chocolate", "2, gum, chips", "3, chips, coke", "4, …"; Probability (chips, coke)?
- Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…; X(t-1), X(t), T
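The "Probability (chips, coke)?" question on the association slide can be read as asking how often the two items are bought together. Below is a minimal sketch using only the three complete transactions listed; the fourth is elided on the slide and is left out. The names `support` and `confidence` follow common association-rule terminology and are not taken from the slides.

```python
# The three complete transactions from the slide (transaction 4 is elided there).
transactions = [
    {"chips", "coke", "chocolate"},  # transaction 1
    {"gum", "chips"},                # transaction 2
    {"chips", "coke"},               # transaction 3
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


pair_support = support({"chips", "coke"}, transactions)
chips_to_coke = confidence({"chips"}, {"coke"}, transactions)
coke_to_chips = confidence({"coke"}, {"chips"}, transactions)
print(pair_support, chips_to_coke, coke_to_chips)
```

On these three transactions, chips and coke appear together in two of three baskets, and every basket with coke also contains chips.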

Data Mining in Research Life Cycle
- Questions
- Needs
Diagram labels: Search, Research, Experiment, Modeling, Report, Library, Data, Database, Data Analysis

Data Mining – Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment

Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science and engineering
- Challenges and opportunities

Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
Muscle contraction by movement:
Movement   | Biceps | Triceps
Supination | H      | H
Pronation  | L      | L
Flexion    | H      | L
Extension  | L      | H
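The biceps/triceps table can be written down directly as a lookup. The sketch below assumes that H and L stand for high and low muscle contraction and that each EMG signal has been normalized to the range [0, 1]; the threshold of 0.5 and the function names are hypothetical. It is a hand-written rule, not a model learned from data.

```python
# (biceps level, triceps level) -> movement, copied from the slide's table.
MOVEMENTS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}

THRESHOLD = 0.5  # hypothetical cut-off on a signal normalized to [0, 1]


def level(signal: float, threshold: float = THRESHOLD) -> str:
    """Map a normalized signal to 'H' (high) or 'L' (low)."""
    return "H" if signal >= threshold else "L"


def classify_movement(biceps: float, triceps: float) -> str:
    """Look up the movement for a pair of normalized EMG readings."""
    return MOVEMENTS[(level(biceps), level(triceps))]


print(classify_movement(0.9, 0.2))  # Flexion
```

A learned classifier replaces the fixed threshold with a decision boundary estimated from the recorded signals.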

Data Preparation
- The dataset includes 80 records.
- There are two input variables: biceps signal and triceps signal.
- There is one output variable with four possible values: Supination, Pronation, Flexion and Extension.

Exploration
Scatter plot: Triceps signal against Record#, with classes Flexion, Extension, Supination, Pronation

Exploration (cont.)
Scatter plot: Biceps signal against Record#, with classes Flexion, Extension, Supination, Pronation

Modeling
Classification:
- OneR
- Decision Tree
- Naïve Bayesian
- K-Nearest Neighbors
- Neural Networks
- Linear Discriminant Analysis
- Support Vector Machines
- …

Model Deployment
A neural network model was successfully implemented inside the robotic arm.

Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Plastics Extrusion
Figure labels: Plastic pellets, Plastic melt

Film Extrusion
Figure labels: Extruder, Plastic Film, Defect due to particle contaminant

In-Line Monitoring
Figure labels: Transition Piece, Window Ports

In-Line Monitoring
Figure labels: Light Source, Extruder and Interface, Optical Assembly, Imaging Computer, Light

Melt Without Contaminant Particles (WO)

Melt With Contaminant Particles (WP)

Problem Definition
Classify images into those with particles (WP) and those without particles (WO).
Example images: WO, WP

Data Preparation
- 2000 images
- 54 input variables, all numeric
- One output variable with two possible values: With Particle, Without Particle

Data Preparation (cont.)
- Pre-processed images to remove noise
- Dataset 1 with sharp images: 1350 images including 1257 without particles and 91 with particles
- Dataset 2 with sharp and blurry images: 2000 images including 1909 without particles and blurry particles and 91 with particles
- 54 input variables, all numeric
- One output variable, with two possible values (WP and WO)
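The class counts show how unbalanced the two datasets are, which matters when reading accuracy figures. The sketch below computes the share of WP images and the accuracy of a trivial baseline that always predicts the majority class. It uses the per-class counts as given; note that the Dataset 1 counts sum to 1348 rather than the stated 1350. The helper name `majority_baseline_accuracy` is illustrative.

```python
# Per-class image counts as stated on the slide.
datasets = {
    "Dataset 1 (sharp)": {"WO": 1257, "WP": 91},
    "Dataset 2 (sharp + blurry)": {"WO": 1909, "WP": 91},
}


def majority_baseline_accuracy(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def class_share(counts, label):
    """Fraction of all images that carry the given label."""
    return counts[label] / sum(counts.values())


for name, counts in datasets.items():
    print(
        f"{name}: total={sum(counts.values())}, "
        f"WP share={class_share(counts, 'WP'):.2%}, "
        f"baseline accuracy={majority_baseline_accuracy(counts):.2%}"
    )
```

For Dataset 2, always answering WO is already right for 1909 of 2000 images, so a useful classifier has to beat roughly 95% accuracy.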

Exploration
Demo!

Modeling
Classification:
- OneR
- Decision Tree
- 3-Nearest Neighbors
- Naïve Bayesian
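To make the 3-Nearest Neighbors entry concrete, here is a minimal sketch of a k-nearest-neighbors classifier with Euclidean distance and majority voting. The training points are made-up two-feature examples, not the 54-variable image data from the talk, and `knn_predict` is an illustrative name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data for illustration only.
train = [
    ((1.0, 1.0), "WO"),
    ((1.2, 0.8), "WO"),
    ((0.9, 1.1), "WO"),
    ((5.0, 5.0), "WP"),
    ((5.2, 4.9), "WP"),
    ((4.8, 5.1), "WP"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

With two classes, an odd k such as 3 avoids tied votes.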

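Evaluation
Table columns: Dataset, Attrib., Class, One-R, C4.5, 3-NN, Bayes
Table rows: Sharp Images; Sharp + Blurry Images; Sharp + Blurry Images (cell values are missing from the transcript)
Note: fold cross-validation (the fold count is missing from the transcript)
Rule: If pixel_density_max < 142 then WP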
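The single rule on the evaluation slide is simple enough to run as code. The sketch below assumes that any image not matching the rule is labeled WO, which the slide implies but does not state, and it scores the rule on a handful of made-up values rather than on the talk's data. The names `one_rule` and `accuracy` are illustrative.

```python
def one_rule(pixel_density_max: float, threshold: float = 142) -> str:
    """Apply the slide's rule: below the threshold means 'WP'.

    Returning 'WO' otherwise is an assumption; the slide gives only the WP branch.
    """
    return "WP" if pixel_density_max < threshold else "WO"


def accuracy(samples) -> float:
    """Fraction of (value, true_label) pairs that the rule labels correctly."""
    correct = sum(1 for value, label in samples if one_rule(value) == label)
    return correct / len(samples)


# Made-up (pixel_density_max, true label) pairs for illustration only.
samples = [(120, "WP"), (200, "WO"), (141, "WP"), (142, "WO"), (130, "WO")]
print(accuracy(samples))
```

Cross-validation would repeat this scoring on held-out folds, with the threshold re-estimated from the remaining data each time.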

Deploy model
- A Visual Basic program will be developed to implement the model.

Agenda
- Explosion of data
- Introduction to data mining
- Examples of data mining in science & engineering
- Challenges and opportunities

Challenges and Opportunities
- Data mining is a ‘top ten’ emerging technology.
- High-paying jobs in the financial, medical and engineering sectors.
- Faster, more accurate and more scalable techniques.
- Incremental, on-line and real-time learning algorithms.
- Parallel and distributed data processing techniques.

Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!