Introduction to Data Mining
Chapter 1: Introduction to Data Mining
Chen Chun-Hsien, Department of Information Management, Chang Gung University
Thursday, October 22, 2015

Outline
- Motivation for data mining
- What is data mining?
- Applications of data mining
- The data mining process
- Main data mining techniques
- Classification of data mining systems

Motivation
- Phenomenon: data explosion, driven by automated data collection tools and mature database technology
  - A tremendous number of Web pages
  - 40+ billion photos on Facebook
  - 1 million new transactions per hour added to Walmart's database
  - Big data in the cloud
  - Data from wearable devices and the Internet of Things (IoT)
- Problem: we are drowning in data, but we need knowledge for decision making
- Solution: data mining, one of the main emerging technologies expected to change the world in the near future

What Is Data Mining?
- Formal definition: the automatic extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) knowledge (rules, regularities, patterns, trends, affinities) from large amounts of data
- Alternative names: business intelligence (BI), knowledge discovery in databases (KDD), data/pattern analysis, knowledge extraction, data dredging, information harvesting, etc.

Example: Mining a Concept Hierarchy
Levels, from top to bottom: all → region → country → city → office
- all: Europe, North_America
- region → country: Europe → Spain, Germany; North_America → Mexico, Canada
- country → city: Canada → Vancouver, Toronto; Germany → Frankfurt; ...
- office: M. Wind, L. Chan, ...
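A concept hierarchy lets detailed values be generalized ("rolled up") to a higher level, for example from city to region. The Python sketch below is a minimal illustration, assuming the city, country, and region levels of the location hierarchy above are stored as a hypothetical child-to-parent map; the office level is left out because the slide does not show which office belongs to which city. `PARENT`, `roll_up`, and `aggregate` are illustrative names, and the sales figures in the usage note are made up.

```python
from collections import defaultdict

# Hypothetical child -> parent map for the location hierarchy
# (city -> country -> region -> all). Offices are omitted.
PARENT = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Frankfurt": "Germany",
    "Canada": "North_America",
    "Mexico": "North_America",
    "Germany": "Europe",
    "Spain": "Europe",
    "North_America": "all",
    "Europe": "all",
}

# Levels ordered from most specific to most general.
LEVELS = ["city", "country", "region", "all"]


def level_of(value: str) -> str:
    """Return the level of a value by counting the steps up to 'all'."""
    depth = 0
    while value != "all":
        value = PARENT[value]
        depth += 1
    # depth 3 -> city, 2 -> country, 1 -> region, 0 -> all
    return LEVELS[len(LEVELS) - 1 - depth]


def roll_up(value: str, target_level: str) -> str:
    """Climb the hierarchy from `value` until it reaches `target_level`."""
    target = LEVELS.index(target_level)
    while LEVELS.index(level_of(value)) < target:
        value = PARENT[value]
    return value


def aggregate(sales: dict[str, float], target_level: str) -> dict[str, float]:
    """Sum per-city amounts at a more general level of the hierarchy."""
    totals: dict[str, float] = defaultdict(float)
    for city, amount in sales.items():
        totals[roll_up(city, target_level)] += amount
    return dict(totals)
```

For example, `aggregate({"Vancouver": 120.0, "Toronto": 80.0, "Frankfurt": 50.0}, "region")` returns `{"North_America": 200.0, "Europe": 50.0}`.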

Example: Part of International Sales and Shipping Data

Confluence of Multiple Disciplines
Data mining (the core of the KDD process) sits at the confluence of:
- Database technology
- Statistics
- Machine learning
- Artificial intelligence
- Information science
- Visualization

Evolution of Database Technology
- 1960s: data collection, database creation, network DBMS
- 1970s: relational data model, relational DBMS
- 1980s: advanced data models (object-oriented, spatial, temporal databases, etc.)
- 1990s onward: data mining, data warehousing, multimedia databases, and the Web

Applications of Data Mining
- Decision support
  - Business decision support
    - Consumer understanding and service improvement
    - Market trend analysis and management
    - Risk analysis and management
    - Fraud detection and management
  - Medical decision support
- Other applications
  - Web analysis
  - Bioinformatics
  - Text mining

Applications of Data Mining (Market Analysis and Management) (1/2)
- Data sources for analysis
  - Credit card transactions, retail transactions, etc.
  - Public lifestyle studies (breakfast, brunch, coffee)
  - Various questionnaires
- Market basket analysis and cross-selling
  - Associations/correlations between product sales
  - Prediction based on the association information

Applications of Data Mining (Market Analysis and Management) (2/2)
- Group profiling of customers
  - Find clusters of "model" customers who share the same characteristics, e.g., spending habits, income level, interests
  - Data mining can tell you which types of customers buy which products (using clustering or classification techniques)
- Identifying customer requirements
  - Identifying potential product sales for e-commerce (EC) customers
  - Using prediction to find which factors will attract new customers

Applications of Data Mining (Risk Analysis and Management)
- Financial planning and asset evaluation
  - Cash flow analysis and prediction
  - Asset evaluation
  - Time series analysis (trend analysis)
- Competitive analysis and market segmentation
  - Monitoring competitors and market directions
  - Setting pricing strategy in a highly competitive market
  - Grouping customers for class-based pricing

Applications of Data Mining (Fraud Detection and Management)
- Applications: health care, insurance, credit card services
- Approach: use historical data to build models of fraudulent behavior, then use data mining techniques to help identify similar instances
- Examples
  - Detection of money laundering: detect suspicious money transaction patterns in banks
  - Medical insurance fraud detection: detect fraud rings of colluding patients and doctors

Applications of Data Mining (Other Applications)
- Web mining: mining Web logs (e.g., Facebook, news portals)
  - Discovering customer preferences and behavior
  - Analyzing the effectiveness of Web marketing
- Biomedical informatics
  - Finding genes related to genetic diseases
  - Drug discovery
  - Bacterial identification
- Text mining
  - Spam detection: analyze message content
  - Medical informatics: automatic classification of cancer reports
  - News classification: find related articles

Steps in the KDD Process (Technical View)
Databases → relevant data → data preprocessing → data mining → patterns → evaluation/presentation → knowledge
Data mining is the core step of the KDD process.

Main Steps of a KDD Process (Full View)
1. Domain knowledge acquisition: learn the relevant prior knowledge and the goals of the application
2. Data collection and preprocessing (may take 60% of the effort!)
   - Data selection and integration: create a target data set
   - Data cleaning, data transformation, and data reduction
3. Data mining
   - Choose the data mining function: association, classification, clustering, regression, summarization
   - Choose the mining algorithm(s)
   - Search for patterns of interest
4. Pattern evaluation and knowledge presentation: remove redundant patterns; apply transformation, visualization, etc.
5. Use of the discovered knowledge
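To make the order of the middle steps concrete, here is a toy Python sketch that chains selection, cleaning, transformation, a mining step, and evaluation. It is only an illustration under stated assumptions: the records in `RAW`, the lower-casing transformation, and the "count how often each value occurs" mining step are hypothetical placeholders for whatever a real application would use, and all function names are made up for this example.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "city": "Taipei", "product": "PC"},
    {"id": 2, "city": "Taipei", "product": "pc"},
    {"id": 3, "city": None, "product": "iPad"},
    {"id": 4, "city": "Taoyuan", "product": "iPad"},
    {"id": 5, "city": "Taoyuan", "product": "PC"},
]


def select(records, attributes):
    """Data selection: keep only the attributes relevant to the goal."""
    return [{a: r.get(a) for a in attributes} for r in records]


def clean(records):
    """Data cleaning: drop records that have any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Data transformation: normalize string values to lower case."""
    return [
        {k: v.lower() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]


def mine(records, attribute):
    """Stand-in mining step: count occurrences of each value of one attribute."""
    return Counter(r[attribute] for r in records)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


def kdd(records, attributes, target, min_count):
    """Run the steps in KDD order: select, clean, transform, mine, evaluate."""
    data = transform(clean(select(records, attributes)))
    return evaluate(mine(data, target), min_count)
```

On this data, `kdd(RAW, ["city", "product"], "product", 2)` drops the record with the missing city, merges "PC" and "pc", and returns `{"pc": 3}`. Note that the cleaning step already removed one of the two iPad records, which shows how preprocessing choices shape the patterns found.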

Mining on What Kind of Data?
- Relational databases
- Transactional databases
- Data warehouses
- Advanced databases and information repositories
  - Web pages
  - Temporal (time-series) data
  - Spatial databases
  - Text and multimedia databases
  - Object-oriented databases
  - Heterogeneous and legacy databases

Steps in the KDD Process
Databases → relevant data → data preprocessing

Why Data Preprocessing?
- Data in the real world is dirty (e.g., Facebook):
  - Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
  - Noisy: containing errors or outliers
  - Inconsistent: containing discrepancies in codes or names
- No quality data, no quality mining results! Quality decisions must be based on quality data.
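Each of the three kinds of dirty data can be detected with a simple check. The Python sketch below is one possible set of checks, not a standard procedure: it assumes records are dictionaries with `None` for missing values, uses a z-score rule (with a hypothetical threshold of 2 standard deviations) to flag noisy numbers, and compares coded values against a hypothetical canonical code set. The function names are illustrative.

```python
from statistics import mean, stdev


def missing_fields(records, required):
    """Incomplete data: (row index, attribute) pairs where a required value is missing."""
    return [
        (i, a) for i, r in enumerate(records) for a in required if r.get(a) is None
    ]


def noisy_values(values, z=2.0):
    """Noisy data: values more than z sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - m) / s > z]


def inconsistent_codes(records, attribute, canonical):
    """Inconsistent data: values of `attribute` outside the agreed code set."""
    present = {r[attribute] for r in records if r.get(attribute) is not None}
    return sorted(present - set(canonical))
```

For example, with the hypothetical records `[{"name": "A", "age": 30, "gender": "M"}, {"name": "B", "age": None, "gender": "F"}, {"name": "C", "age": 41, "gender": "male"}]`, the checks report the missing age in row 1 and the non-canonical gender code "male"; `noisy_values([23, 25, 24, 26, 22, 25, 240])` flags 240.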

Major Tasks in Data Preprocessing
- Data cleaning
- Data integration
- Data transformation
- Data reduction
- Data discretization
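Two of these tasks fit in a few lines: min-max normalization (a common data transformation) and equal-width binning (a simple form of discretization). The sketch assumes numeric attribute values held in a plain Python list; the target range and the number of bins `k` are hypothetical parameters, and other methods (z-score normalization, equal-frequency binning) are equally valid choices.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Data transformation: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute: map everything to new_min
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Data discretization: map each value to one of k equal-width intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        bins.append(min(idx, k - 1))  # the maximum value belongs to the last bin
    return bins
```

For example, `min_max([10, 20, 30, 50])` gives `[0.0, 0.25, 0.5, 1.0]`, and `equal_width_bins([10, 20, 30, 50], 2)` splits the range 10..50 into [10, 30) and [30, 50], giving `[0, 0, 1, 1]`.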

Steps in the KDD Process
Databases → relevant data → data preprocessing → data mining → patterns

Main Data Mining Techniques
- Association rule mining (descriptive analysis)
- Classification and prediction (predictive analysis)
- Cluster analysis (exploratory analysis)
- Regression analysis
- Outlier analysis
- Trend analysis

Main Data Mining Techniques (1/4): Association Rule Mining
Association rules capture correlation and, possibly, causality. Examples of the rule form:
- buy(T, "Beer") ⇒ buy(T, "Diaper") [support = 2%, confidence = 70%]: the Walmart story
- sales(T, "computer") ⇒ sales(T, "software") [support = 1%, confidence = 75%]: 3C retail stores
- age(X, "21..25") ∧ income(X, "30..39K") ⇒ buys(X, "PC") [support = 2%, confidence = 60%]: the IBM failure story
- age(X, "31..35") ∧ income(X, "40..49K") ⇒ buys(X, "iPad") [support = 1%, confidence = 70%]: the Acer failure story

Association Rule Mining (Support and Confidence)
Given a transaction database, find all rules X ⇒ Y that meet a minimum support and a minimum confidence:
- Support S: the probability that a transaction contains both X and Y, S = P(X ∪ Y), where X ∪ Y is the combined itemset
- Confidence C: the conditional probability that a transaction containing X also contains Y, C = P(Y | X) = support(X ∪ Y) / support(X)
Association rules with support >= 50%:
- A ⇒ C (support 50%, confidence 66.7%)
- C ⇒ A (support 50%, confidence 100%)
Figure: Venn diagram of all transactions, showing the transactions that buy X, those that buy Y, and those that buy both.
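The definitions above translate directly into code. The slide's transaction table is not part of this text, so the Python sketch below uses a hypothetical four-transaction database chosen so that it reproduces the two rules listed (A ⇒ C at 50% support and 66.7% confidence, C ⇒ A at 50% and 100%). For brevity it only enumerates rules with one item on each side; a full miner such as Apriori would also handle larger itemsets. All names are illustrative.

```python
from itertools import combinations

# Hypothetical transaction database (one set of items per transaction).
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(lhs, rhs, transactions):
    """P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def one_to_one_rules(transactions, min_sup, min_conf):
    """All rules x => y with single items x, y that meet both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):  # try both directions of each pair
            s = support({x, y}, transactions)
            if s < min_sup:
                continue
            c = confidence({x}, {y}, transactions)
            if c >= min_conf:
                found.append((x, y, s, c))
    return found
```

`one_to_one_rules(TRANSACTIONS, 0.5, 0.5)` returns `A ⇒ C` with confidence 2/3 and `C ⇒ A` with confidence 1.0, both at support 0.5: A appears in three transactions, C in two, and both together in two.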

Main Data Mining Techniques (2/4): Supervised Learning
Use a training set to construct a model that forecasts the outcome of future events. Two main types:
- Classification: find a model that distinguishes classes for future events
  - e.g., loan approval, customer classification, fingerprint recognition
  - Model representation: decision trees, artificial neural networks
- Prediction: find a model that predicts numerical values for future events
  - e.g., stock price prediction
  - Model representation: regression, artificial neural networks

Classification vs. Prediction
Both use a training set to construct a model that forecasts the outcome of future events.
- Classification
  - Predicts categorical class labels
  - Constructs a classification model to classify new data
- Prediction
  - Predicts numerical values
  - Constructs a continuous-valued function to predict unknown or missing values
- Typical applications: credit card approval, medical diagnosis and treatment, pattern recognition

Process of Classification and Prediction (a Two-Step Process)
1. Model construction: training data (I, O) → learning algorithm → model f, where y = f(x), x ∈ I, y ∈ O
2. Model usage: input features x' → model f → output y' (a class label or a value)

An Example Training Dataset (Consumers' Buying Behavior)
This follows an example from Quinlan's ID3.
- Input features (I): customer characteristics
- Class label (O): whether the customer buys a PC
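ID3 builds its tree by choosing, at each node, the attribute with the highest information gain (the drop in class entropy after splitting on that attribute). The training table is not reproduced in this text, so the Python sketch below assumes the class counts of the widely used textbook version of this example: 14 customers, 9 "yes" and 5 "no", split by age into <=30 (2 yes / 3 no), 31..40 (4 yes / 0 no), and >40 (3 yes / 2 no). If the slide's table differs, substitute its counts. The function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Class entropy in bits, given the number of records in each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def info_gain(parent_counts, partitions):
    """Entropy before the split minus the weighted entropy of the partitions."""
    total = sum(parent_counts)
    expected = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - expected


# Assumed textbook counts: [yes, no] overall and per age group.
overall = [9, 5]
by_age = [[2, 3], [4, 0], [3, 2]]  # <=30, 31..40, >40
```

Under these assumptions, `entropy(overall)` is about 0.940 bits and `info_gain(overall, by_age)` is about 0.247, which is why age ends up as the root test in the tree that follows.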

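A Decision Tree Model for Predicting buy_PC
Model: buy_PC = f(age, student, credit_rating), where x = (age, student, credit_rating) and y = buy_PC
- age <= 30 → student? no → no; yes → yes
- age 31..40 → yes
- age > 40 → credit rating? excellent → no; fair → yes
Legend: "?" marks a test (input) attribute; branch labels are attribute values; leaves (yes/no) are class labels for buy_PC.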
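Once built, a decision tree is just a set of nested tests, which is what "model usage" means in the two-step process. The Python sketch below writes the buy_PC tree as a function, assuming age is given in years, `student` as a boolean, and credit rating as the string "fair" or "excellent". The middle age branch (31..40 → yes) is assumed from the standard textbook version of this example. `buy_pc` is an illustrative name.

```python
def buy_pc(age: int, student: bool, credit_rating: str) -> str:
    """Classify a customer as 'yes'/'no' for buy_PC by walking the tree."""
    if age <= 30:
        # Young customers: the decision depends on student status.
        return "yes" if student else "no"
    if age <= 40:
        # The 31..40 branch is a pure "yes" leaf.
        return "yes"
    # age > 40: the decision depends on credit rating.
    return "yes" if credit_rating == "fair" else "no"
```

For instance, `buy_pc(25, True, "fair")` returns "yes", while `buy_pc(50, False, "excellent")` returns "no".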

A Decision Tree for CAD Screening (Constructed from ~500 Records)

Main Data Mining Techniques (3/4): Cluster Analysis
- Cluster analysis (unsupervised learning): class labels are unknown; group the data to form new classes
- Application example: customer profiling for product recommendation (online bookstores)
- Typical clustering principle: maximize intra-class similarity and minimize inter-class similarity
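k-means is one common way to put the "high intra-class, low inter-class similarity" principle into practice: it repeatedly assigns each point to its nearest centroid, then moves each centroid to the mean of its assigned points. The slide does not name a specific algorithm, so k-means is used here as a stand-in. The Python sketch below assumes points are numeric tuples and uses Euclidean distance; the points, `k`, and the random seed are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` into k groups; return (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
```

On `POINTS`, `kmeans(POINTS, 2)` separates the three points near (1, 1) from the three near (8, 8). k-means needs `k` in advance and can settle in a poor local optimum, so in practice it is often run with several seeds.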

Example of 2D Cluster Analysis
Figure: three clusters (A, B, C), with points X, Y, and Z as outliers.
Difficulty: high-dimensional data distributions cannot be inspected visually.

Clustering Example in High Dimension (Cluster Analysis of CAD Data)
Figure panels: data matrix for visualization; clustering dendrogram; profile of CAD patients; profile of healthy people
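A dendrogram records the order in which clusters are merged and the distance at each merge. The Python sketch below runs single-linkage agglomerative clustering, one of several linkage choices (the slide does not say which one was used for the CAD data), on a few hypothetical 2D points and returns the merge steps a dendrogram would draw. It recomputes distances naively, which is fine for a handful of points but too slow for large data sets. `single_linkage` is an illustrative name.

```python
import math


def single_linkage(points):
    """Return merge steps (cluster_a, cluster_b, distance), bottom-up."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for idx, x in enumerate(clusters) for y in clusters[idx + 1:]),
            key=lambda pair: dist(*pair),
        )
        merges.append((sorted(a), sorted(b), dist(a, b)))
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges
```

For the points `[(0, 0), (0, 1), (5, 5), (5, 7)]`, the merges are {0} with {1} at distance 1, then {2} with {3} at distance 2, then {0, 1} with {2, 3} at distance √41 ≈ 6.4. The large jump in the last merge distance is what a dendrogram makes visible as two well-separated groups.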

Profile of Stroke Patients (Diagnosed by Indices of Chinese Medicine)

Main Data Mining Techniques (4/4): Example of Linear Regression
Figure: data points in the x-y plane with the fitted line y = ax + b; the value Y1 at X1 is unknown ("?").
- Predict the value of y at X1 using linear regression: y = f(x); what is f?
- Explore the meaning of a and b
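The slide leaves open how a and b are found. A standard choice, assumed here, is ordinary least squares, which gives a = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)² and b = ȳ - a·x̄, so a is the slope and b the intercept of the fitted line. The Python sketch below implements exactly these two formulas; the data points are hypothetical, and `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx          # slope: change in y per unit change in x
    b = my - a * mx        # intercept: the line passes through (mean x, mean y)
    return a, b


def predict(a, b, x):
    """Evaluate the fitted model f(x) = a*x + b at a new point."""
    return a * x + b
```

With the points (1, 3), (2, 5), (3, 7), (4, 9), `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` returns `(2.0, 1.0)`, and the prediction at X1 = 5 is `predict(2.0, 1.0, 5) == 11.0`.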

Other Data Mining Techniques (4/4)
- Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - It can be treated as noise or an exception, but it is quite useful in fraud detection and rare-event analysis
- Trend analysis
  - Trend and deviation: regression analysis
  - Sequential pattern mining, periodicity analysis
  - Other pattern-directed or statistical analyses
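Trend and deviation can be illustrated together: estimate the local trend of a time series, then flag points that depart sharply from it. To keep the sketch short, it uses a trailing moving average as the trend estimate instead of the regression analysis named on the slide; the window size and deviation threshold are hypothetical parameters, and the function names are illustrative.

```python
def moving_average(series, window):
    """Trend estimate: mean of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def deviations(series, window, threshold):
    """Indices whose value differs from the mean of the previous
    `window` values by more than `threshold` (candidate outliers)."""
    flagged = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window : i]) / window
        if abs(series[i] - baseline) > threshold:
            flagged.append(i)
    return flagged
```

For the series `[10, 11, 12, 13, 40, 15, 16]`, `deviations(series, 3, 10)` flags index 4 (the value 40 against a baseline of 12). The spike also inflates the next two baselines, which is one reason robust trend estimates are preferred when outliers are frequent.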

Are All the "Discovered" Patterns Interesting?
- A data mining system or query may generate thousands of patterns, and not all of them are interesting, so pattern screening becomes a problem.
- Interestingness: a measure for automatic pattern screening
  - A pattern is interesting if it is easily understood, potentially useful, novel, valid on new or test data with some degree of certainty, or if it validates a hypothesis that the user seeks to confirm
- Objective vs. subjective interestingness measures for screening
  - Objective: based on statistics and the structure of patterns, e.g., support, confidence
  - Subjective: based on the user's beliefs about the data, e.g., unexpectedness, novelty, actionability
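Objective measures make screening mechanical. The Python sketch below applies support and confidence thresholds to the four example association rules from the association-rule slide (their support and confidence values are taken from that slide). The thresholds of 2% support and 65% confidence are hypothetical, and `Rule` and `interesting` are illustrative names. Subjective measures such as unexpectedness cannot be captured this simply, since they depend on what the user already believes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: str
    support: float     # fraction of all transactions/customers
    confidence: float  # P(rhs | lhs)


# The four example rules from the association-rule slide.
RULES = [
    Rule('buy(T, "Beer")', 'buy(T, "Diaper")', 0.02, 0.70),
    Rule('sales(T, "computer")', 'sales(T, "software")', 0.01, 0.75),
    Rule('age(X, "21..25") ^ income(X, "30..39K")', 'buys(X, "PC")', 0.02, 0.60),
    Rule('age(X, "31..35") ^ income(X, "40..49K")', 'buys(X, "iPad")', 0.01, 0.70),
]


def interesting(rules, min_support, min_confidence):
    """Keep rules that pass both objective thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]
```

With `interesting(RULES, 0.02, 0.65)` only the beer/diaper rule survives: the computer/software and iPad rules fall below the support threshold, and the PC rule falls below the confidence threshold.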

Can We Find All and Only the Interesting Patterns?
- Completeness vs. optimization
  - Completeness: find all the interesting patterns. Can a data mining system find all of them?
  - Optimization: find only interesting patterns. Can a data mining system find only the interesting ones?
- Approaches
  - First generate all the patterns, then filter out the uninteresting ones
  - Generate only the interesting patterns (mining query optimization)

Classification Scheme of DM Techniques
- General functionality
  - Descriptive/exploratory data mining
  - Predictive data mining
- Different views lead to different classifications:
  - Kinds of databases to be mined
  - Kinds of knowledge to be discovered
  - Kinds of techniques utilized
  - Kinds of applications adapted

A Multi-Dimensional View of DM Technique Classification
- Databases to be mined: relational, transactional, Web, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, etc.
- Knowledge to be mined: association, classification, clustering, trend, characterization, deviation and outlier analysis, etc.
- Techniques utilized: database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
- Applications adapted: retail, telecommunications, banking, fraud analysis, stock market analysis, Web mining, biomedical informatics, etc.

Summary for Data Mining
- Data mining: the automatic discovery of interesting knowledge from large amounts of data
- A natural evolution of database technology, in great demand, with wide applications
- A KDD process includes data preprocessing, data mining, pattern evaluation, and knowledge presentation
- Main data mining functions: association, classification, clustering, outlier and trend analysis, characterization, etc.

Thanks! Have a nice day!