Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.


Copyright © 2011 Ramez Elmasri and Shamkant Navathe

What Is Data Mining?
Data mining (knowledge discovery from data) is the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.
Data mining: a misnomer? Alternative names include knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, and business intelligence.
Watch out: is everything "data mining"? Simple search and query processing and (deductive) expert systems are not.

Definitions of Data Mining
- The discovery of new information in terms of patterns or rules from vast amounts of data.
- The process of finding interesting structure in data.
- The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Data Warehousing
The data warehouse is a historical database designed for decision support. Data mining can be applied to the data in a warehouse to help with certain types of decisions. Proper construction of a data warehouse is fundamental to the successful use of data mining.

Knowledge Discovery (KDD) Process
Data mining is the core of the knowledge discovery process.
[Figure: KDD process flow. Labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation.]

Data Mining and Business Intelligence
[Figure: layered pyramid; the potential to support business decisions increases toward the top.]
Layers, from top to bottom:
- Decision Making
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: Statistical Summary, Querying, and Reporting
- Data Preprocessing/Integration, Data Warehouses
- Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles shown beside the pyramid, from top to bottom: End User, Business Analyst, Data Analyst, DBA.

Architecture: Typical Data Mining System
[Figure: system architecture. Components shown:]
- Graphical User Interface
- Pattern Evaluation
- Data Mining Engine
- Knowledge Base
- Database or Data Warehouse Server
- Data cleaning, integration, and selection
- Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories

Knowledge Discovery in Databases (KDD)
Data mining is actually one step of a larger process known as knowledge discovery in databases (KDD). The KDD process model comprises six phases:
1. Data selection
2. Data cleansing
3. Enrichment
4. Data transformation or encoding
5. Data mining
6. Reporting and displaying discovered knowledge
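
The six KDD phases can be pictured as a chain of small functions. The following is only a minimal sketch under made-up assumptions: the records, the field names (`customer`, `zip`, `items`), the region lookup, and the "mining" step (a plain item count) are all hypothetical and stand in for whatever a real system would do at each phase.

```python
# Hypothetical toy records; all field names and values are illustrative assumptions.
raw = [
    {"customer": "a", "zip": "30332", "items": "milk, bread"},
    {"customer": "b", "zip": "", "items": "milk, juice"},  # missing ZIP code
    {"customer": "c", "zip": "30301", "items": "Bread,Milk "},
]

def select(records):
    """Phase 1, data selection: keep only the attributes relevant to the task."""
    return [{"zip": r["zip"], "items": r["items"]} for r in records]

def cleanse(records):
    """Phase 2, data cleansing: drop records whose ZIP code is empty."""
    return [r for r in records if r["zip"]]

def enrich(records, region_by_zip):
    """Phase 3, enrichment: add a region from a separate (made-up) lookup table."""
    return [{**r, "region": region_by_zip.get(r["zip"], "unknown")} for r in records]

def transform(records):
    """Phase 4, transformation/encoding: turn the item string into a set of codes."""
    return [
        {**r, "items": {i.strip().lower() for i in r["items"].split(",")}}
        for r in records
    ]

def mine(records):
    """Phase 5, data mining: here simply count how often each item occurs."""
    counts = {}
    for r in records:
        for item in r["items"]:
            counts[item] = counts.get(item, 0) + 1
    return counts

def report(counts):
    """Phase 6, reporting: list items by descending count, then by name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

regions = {"30332": "north", "30301": "south"}
result = report(mine(transform(enrich(cleanse(select(raw)), regions))))
print(result)
```

Each phase takes and returns plain lists or dictionaries, so any one of them could be swapped for a more realistic implementation without touching the others.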

Goals of Data Mining and Knowledge Discovery (PICO)
- Prediction: Determine how certain attributes will behave in the future.
- Identification: Identify the existence of an item, event, or activity.
- Classification: Partition data into classes or categories.
- Optimization: Optimize the use of limited resources.

Types of Discovered Knowledge
- Association rules
- Classification hierarchies
- Sequential patterns
- Patterns within time series
- Clustering

Association Rules
Association rules are frequently used to generate rules from market-basket data. A market basket corresponds to the set of items a consumer purchases during one visit to a supermarket. The set of items purchased by customers is known as an itemset.
An association rule is of the form $X \Rightarrow Y$, where $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are sets of items, with $x_i$ and $y_j$ being distinct items for all $i$ and all $j$.
For an association rule to be of interest, it must satisfy a minimum support and a minimum confidence.

Association Rules: Confidence and Support
Support: the percentage of transactions in the database that contain all of the items in the itemset LHS ∪ RHS, that is, all items listed in the association rule. A rule is of interest only if its support reaches a chosen minimum.
Confidence: given a rule of the form $A \Rightarrow B$, rule confidence is the conditional probability that B is true when A is known to be true. Confidence can be computed as support(LHS ∪ RHS) / support(LHS).
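
A small worked example may make the two measures concrete. This is a sketch, not part of the original slides: the four transactions are invented sample data, and the helper names `support` and `confidence` are chosen here for illustration.

```python
# Invented market-basket data: one set of items per transaction.
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """support(LHS ∪ RHS) / support(LHS); assumes LHS occurs at least once."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Rule: milk => juice
s = support({"milk", "juice"}, transactions)       # 2 of 4 transactions
c = confidence({"milk"}, {"juice"}, transactions)  # (2/4) / (3/4)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

Here milk appears in three of the four transactions and milk together with juice in two, so the rule milk => juice has support 0.5 and confidence 2/3.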

Clustering
Unsupervised learning or clustering builds models from data without predefined classes. The goal is to place records into groups where the records in a group are highly similar to each other and dissimilar to records in other groups. The k-Means algorithm is a simple yet effective clustering technique.
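
The slide names k-Means without describing it. The following is a minimal sketch of the usual assign-then-recompute loop, under some assumptions that are not from the slides: squared Euclidean distance is the similarity measure, the first k points serve as the initial means (real implementations usually pick them randomly), and the sample points are invented.

```python
def kmeans(points, k, iterations=100):
    """Cluster points into k groups; returns (centroids, cluster index per point)."""
    # Assumption: use the first k points as the initial means.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the means are stable
        assignment = new_assignment
        # Update step: move each mean to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Invented sample: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this sample the first three points end up in one cluster and the last three in the other, even though both initial means start inside the first group.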

Additional Data Mining Methods
- Sequential pattern analysis
- Time series analysis
- Regression
- Neural networks
- Genetic algorithms

Sequential Pattern Analysis
Transactions ordered by time of purchase form a sequence of itemsets. The problem is to find all subsequences from a given set of sequences that have a minimum support. The sequence $S_1, S_2, S_3, \ldots$ is a predictor of the fact that a customer purchasing itemset $S_1$ is likely to buy $S_2$, and then $S_3$, and so on.
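
To illustrate what "support of a subsequence" means, here is a short sketch. It assumes one common reading that the slide does not spell out: a customer's sequence supports a pattern when the pattern's itemsets occur, in order, as subsets of distinct itemsets in that sequence. The purchase histories and function names are invented.

```python
def contains(sequence, pattern):
    """True if pattern's itemsets appear in order as subsets of sequence's itemsets."""
    pos = 0  # index of the next pattern itemset still to be matched
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1  # each sequence itemset can match at most one pattern itemset
    return pos == len(pattern)

def sequence_support(pattern, sequences):
    """Fraction of sequences that contain the pattern as a subsequence."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Invented purchase histories: one time-ordered list of itemsets per customer.
sequences = [
    [{"milk", "bread"}, {"juice"}, {"eggs", "cookies"}],
    [{"milk"}, {"bread", "juice"}],
    [{"juice"}, {"milk"}],            # wrong order for the pattern below
    [{"milk", "juice"}, {"cookies"}],  # same visit, not a later one
]
pattern = [{"milk"}, {"juice"}]  # milk first, juice on a later visit
sup = sequence_support(pattern, sequences)
print(sup)
```

Only the first two customers bought milk and then juice on a later visit, so the pattern has support 0.5. A mining algorithm would enumerate candidate patterns and keep those whose support reaches the chosen minimum.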

Time Series Analysis
Time series are sequences of events. For example, the closing price of a stock is an event that occurs each weekday. Time series analysis can be used to identify the price trends of a stock or mutual fund. Time series analysis is an extended functionality of temporal data management.
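
One simple way to look for a price trend is to smooth the series with a moving average. The slides do not prescribe a technique, so this is only an illustrative sketch; the closing prices, the window size, and the crude "up / down or flat" rule are all assumptions.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Invented daily closing prices for one stock.
closing = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 15.0]
smoothed = moving_average(closing, window=3)

# Crude trend indicator: compare the last smoothed value with the first.
trend = "up" if smoothed[-1] > smoothed[0] else "down or flat"
print(smoothed, trend)
```

The dip on the fourth day disappears in the smoothed series, which rises from 11.0 to 14.0.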

Regression Analysis
A regression equation estimates a dependent variable using a set of independent variables and a set of constants. The independent variables as well as the dependent variable are numeric. A regression equation can be written in the form $Y = f(x_1, x_2, \ldots, x_n)$, where $Y$ is the dependent variable. If $f$ is linear in the domain variables $x_i$, the equation is called a linear regression equation.
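
For the simplest case, one independent variable and a linear f, the constants can be estimated by ordinary least squares. The sketch below assumes that fitting method (the slide does not name one) and uses invented data that lie exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares estimates (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # assumes the xs are not all equal
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
estimate = slope * 5.0 + intercept  # estimate Y for an unseen x = 5
print(slope, intercept, estimate)
```

Because the sample points lie exactly on a line, the fit recovers slope 2 and intercept 1; with noisy data the same formulas give the line that minimizes the squared error.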

Neural Networks
A neural network is a set of interconnected nodes designed to imitate the functioning of the brain. Node connections have weights which are modified during the learning process. Neural networks can be used for supervised learning and unsupervised clustering. The output of a neural network is quantitative and not easily understood.
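
How weights are "modified during the learning process" can be shown with the smallest possible network, a single node trained with the classic perceptron rule. This is a deliberately reduced sketch and an assumption of this note, not the method of the slides: real networks have many nodes and typically use other training procedures. The task (logical AND), learning rate, and epoch count are chosen for illustration.

```python
def train_perceptron(samples, epochs=20, rate=1.0):
    """Train one node with two inputs using the perceptron learning rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the node is already correct
            # Supervised update: shift each weight in proportion to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

def predict(weights, bias, inputs):
    """Output of the trained node for one input pair."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Supervised training data for logical AND: (inputs, expected output).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(weights, bias, predictions)
```

The learned weights and bias are just numbers, which echoes the slide's remark that a network's output is quantitative and hard to interpret directly.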

Genetic Learning
Genetic learning is based on the theory of evolution. An initial population of several candidate solutions is provided to the learning model. A fitness function defines which solutions survive from one generation to the next. Crossover, mutation and selection are used to create new population elements.
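
The loop of selection, crossover, and mutation can be sketched on a toy problem. Everything specific here is an assumption made for illustration: candidate solutions are bit strings, fitness is the number of 1 bits, the fitter half of each generation survives unchanged, and the parameter values and random seed are arbitrary.

```python
import random

def fitness(bits):
    """Toy fitness function: the number of 1 bits in the candidate."""
    return sum(bits)

def evolve(length=12, population_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Run a small genetic algorithm; returns (best candidate, best fitness per generation)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        # Selection: the fitter half survives into the next generation.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randint(1, length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = survivors + children
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history

best, history = evolve()
print(fitness(best), history)
```

Because survivors are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.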

Data Mining Applications
- Marketing: marketing strategies and consumer behavior
- Finance: fraud detection, creditworthiness, and investment analysis
- Manufacturing: resource optimization
- Health: image analysis, side effects of drugs, and treatment effectiveness