Chapter 2: Data Mining, Dr. Goutam Sarker

Chapter 2: Data Mining
Dr. Goutam Sarker, Fellow: IE(I), Fellow: IETE(I), Senior Member: IEEE(USA), Associate Professor, CSE, NITD
Data Mining / CSE Department / Dr. Goutam Sarker, 9/23/2017 6:11 AM

What is Data Mining?
The term “data mining” refers to finding relevant and useful information in databases.

Definition 1
Data mining, or knowledge discovery in databases, is the nontrivial extraction of implicit, previously unknown and potentially useful information from data. It encompasses a number of technical approaches, such as clustering, data summarization, classification and pattern recognition.

Definition 2
Data mining is the search for the relationships and global patterns that exist in large databases but are hidden among vast amounts of data.

Definition 3
Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition techniques as well as statistical and mathematical techniques.

KDD vs. Data Mining
Knowledge Discovery in Databases (KDD) was formalized in 1989 as a broad, high-level term for the pursuit of knowledge from data. Data mining is only one of the many steps involved in knowledge discovery in databases. The steps in the knowledge discovery process include data selection, data cleaning and preprocessing, data transformation and reduction, data mining algorithm selection, and finally post-processing and interpretation of the discovered knowledge. The KDD process tends to be highly iterative and interactive.

Stages of KDD
1. Selection
2. Preprocessing
3. Transformation
4. Data Mining
5. Interpretation and Evaluation
6. Data Visualization

Stages of KDD (contd.)
Selection: This stage is concerned with selecting or segmenting the data that are relevant according to some criteria.
Preprocessing: This is the data cleaning stage, where unnecessary information is removed.
Transformation: The data are not merely transferred across but transformed so that they are suitable for the data mining task. In this stage, the data are made usable and navigable.
Data Mining: This stage is concerned with extracting patterns from the data.
Interpretation and Evaluation: The patterns obtained in the data mining stage are converted into knowledge, which in turn is used to support decision making.
Data Visualization: Data visualization makes it possible for the analyst to gain a deeper, more intuitive understanding of the data.
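The stages above can be sketched as a chain of small functions, one per stage. This is a minimal illustration under stated assumptions, not a prescribed implementation: the records, the field names (`region`, `age`, `product`), the age bands and the `min_count` threshold are all hypothetical, and the visualization stage is omitted.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
raw = [
    {"region": "east", "age": 34, "product": "loan"},
    {"region": "east", "age": None, "product": "card"},
    {"region": "west", "age": 51, "product": "loan"},
    {"region": "east", "age": 45, "product": "card"},
    {"region": "east", "age": 29, "product": "loan"},
    {"region": "east", "age": 38, "product": "loan"},
    {"region": "east", "age": 62, "product": "card"},
]

def select(records, region):
    """Selection: keep only the records relevant to the chosen criterion."""
    return [r for r in records if r["region"] == region]

def preprocess(records):
    """Preprocessing: data cleaning, here dropping records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: discretize age into bands the mining step can count."""
    return [("under 40" if r["age"] < 40 else "40 and over", r["product"]) for r in records]

def mine(pairs, min_count):
    """Data mining: (age band, product) combinations seen at least min_count times."""
    return {p: c for p, c in Counter(pairs).items() if c >= min_count}

def interpret(patterns, total):
    """Interpretation and evaluation: express each pattern as a share of the data."""
    return {p: c / total for p, c in patterns.items()}

clean = preprocess(select(raw, "east"))
result = interpret(mine(transform(clean), min_count=2), len(clean))
print(result)
```

Because each stage is a separate function, an analyst can revisit any one of them (for example, changing the age bands) and rerun the rest, which mirrors the iterative nature of KDD.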

DBMS vs. DM
A DBMS supports query languages, which are useful for query-triggered data exploration, whereas data mining supports automatic data exploration. If we know exactly what information we are seeking, a DBMS query suffices; if we only vaguely know the possible correlations or patterns, data mining techniques are useful. One of the tasks of data mining is hypothesis testing, wherein we formulate a hypothesis and test it by sifting through the database. The data mining application goes where the data naturally reside. This avoids performance degradation and takes full advantage of database technology.
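The contrast between an exact query and a hypothesis test run inside the database can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the in-memory table `purchases`, its columns and its rows are hypothetical, and each item is assumed to appear at most once per transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (txn INTEGER, item TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [
    (1, "bread"), (1, "butter"), (2, "bread"), (2, "butter"),
    (3, "bread"), (4, "milk"), (4, "butter"),
])

# DBMS-style query: we know exactly what we want (which transactions contain milk?).
milk_txns = [row[0] for row in conn.execute(
    "SELECT txn FROM purchases WHERE item = 'milk' ORDER BY txn")]

# Hypothesis test evaluated where the data reside: of the transactions that
# contain bread, how many also contain butter?
bread_total, bread_and_butter = conn.execute("""
    SELECT COUNT(*),
           SUM(EXISTS (SELECT 1 FROM purchases b
                       WHERE b.txn = a.txn AND b.item = 'butter'))
    FROM purchases a
    WHERE a.item = 'bread'
""").fetchone()

print(milk_txns, bread_and_butter / bread_total)
```

Only the two aggregate numbers leave the database; the hypothesis "bread buyers also buy butter" holds here for two of the three bread transactions.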

Related Areas
* Statistics
* Machine Learning
  * Supervised Learning
  * Unsupervised Learning

Artificial Intelligence (AI) vs. Data Mining
The task of automatically discovering patterns in data has so far been mostly the domain of Artificial Intelligence. There are two main aspects that differentiate DM from AI:

1. Data mining emphasizes the human understandability of discovered patterns, whereas in AI the discovered patterns are meant to be used by the machine itself.
2. Data mining techniques are meant to scale to huge data stores such as the World Wide Web (WWW). In contrast, traditional AI approaches have mostly been researched using small “toy” data sets that fit in main memory.

Data mining has borrowed a good deal from AI, especially from machine learning, the field in which a program dynamically improves itself. Almost all classification techniques of machine learning have been used in data mining; only those classification models that are not easily understood by human users (e.g. neural network techniques) have been omitted.

Goals and DM Techniques
The two fundamental goals of data mining are:
* Prediction
* Description
Prediction makes use of existing variables in the database to predict unknown or future values of interest. Description focuses on finding patterns that describe the data and presenting them for user interpretation.
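The two goals can be contrasted on a toy series. In this sketch, description is a simple summary of the data and prediction is an ordinary least-squares line used to estimate a future value; least squares is just one illustrative predictive technique chosen here, and the monthly sales figures and the function name `fit_line` are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical data: sales recorded in months 1 to 4.
months = [1, 2, 3, 4]
sales = [10, 12, 14, 16]

# Description: summarize the existing data for the user.
summary = {"min": min(sales), "max": max(sales), "mean": mean(sales)}

# Prediction: use an existing variable (month) to estimate an unknown future value.
a, b = fit_line(months, sales)
forecast = a + b * 5
print(summary, forecast)
```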

Classification of Techniques
* User-guided, or verification-driven, data mining
* Discovery-driven data mining, or automatic discovery of rules

Data Mining Techniques
Verification Model: In this model of data mining, the user makes a hypothesis and tests it on the data to verify its validity. The emphasis is on the user, who is responsible for formulating the hypothesis.
Discovery Model: The discovery model differs in its emphasis: here the system automatically discovers important information hidden in the data. The data are sifted in search of frequently occurring patterns, trends and generalizations, without guidance from the user.

Discovery-Driven Tasks
* Discovery of association rules
* Discovery of classification rules
* Clustering
* Discovery of frequent episodes
* Deviation detection

Discovery of Association Rules
An association rule has the form X ⇒ Y, where X and Y are sets of items. The intuitive meaning of such a rule is that transactions in the database that contain X tend to also contain Y. Given a database, the goal is to discover all rules whose support and confidence are greater than or equal to the specified minimum support and minimum confidence.
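Using the standard definitions, the support of X ⇒ Y is the fraction of transactions that contain every item of X ∪ Y, and its confidence is support(X ∪ Y) / support(X). The sketch below finds all rules meeting both thresholds by brute force. It enumerates every itemset, so it is only suitable for tiny examples; practical algorithms such as Apriori prune this search. The transactions and thresholds are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def association_rules(transactions, min_support, min_confidence):
    """Return (X, Y, support, confidence) for every rule X => Y meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue  # not frequent enough to yield any rule
            # Split the frequent itemset into every antecedent X / consequent Y pair.
            for k in range(1, size):
                for x in combinations(itemset, k):
                    y = tuple(i for i in itemset if i not in x)
                    confidence = s / support(x, transactions)
                    if confidence >= min_confidence:
                        rules.append((x, y, s, confidence))
    return rules

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.6)
for x, y, s, c in rules:
    print(f"{set(x)} => {set(y)}  support={s:.2f}  confidence={c:.2f}")
```

Here only {bread, butter} and {butter, milk} reach 50% support; of the four rules derived from them, milk ⇒ butter has confidence 1.0 because every transaction with milk also contains butter.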

Classification
Classification involves finding rules that partition the data into disjoint groups. The input for classification is the training data set, whose class labels are already known.

Clustering
Clustering is a method of grouping data into different groups, so that the data in each group share similar trends and patterns. Clustering constitutes a major class of data mining algorithms. The objectives of clustering are:
* To uncover natural groupings
* To initiate hypotheses about the data
* To find a consistent and valid organization of the data
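As one concrete clustering algorithm, the sketch below implements k-means (Lloyd's algorithm), which is not named in the slides and is used here only as a common example. It assumes the number of clusters and the initial centroids are chosen by hand; the points are hypothetical.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means on coordinate tuples, starting from the given centroids.

    Returns (centroids, assignment), where assignment[i] is the index of the
    cluster that points[i] belongs to.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assignment

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, [(1, 1), (8, 8)])
print(labels, centroids)
```

The two groups found here are the kind of "natural grouping" the objectives above refer to; k-means needs the number of clusters in advance, which is one reason many other clustering algorithms exist.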

Discovery of Classification Rules
Classification involves finding rules that partition the data into disjoint groups. The input to classification is the training data set, whose class labels are already known. This is also termed supervised learning.

There are several classification discovery models:
* Decision Trees
* Neural Networks
* Genetic Algorithms
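A very small example of learning classification rules from labeled training data is the 1R ("one rule") method, which amounts to a decision tree of depth one. It is not one of the models listed above and is shown only because it fits in a few lines. The attributes (`income`, `owns_home`), the class labels and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, labels, attributes):
    """1R: pick the single attribute whose value -> majority-class rules make
    the fewest errors on the training data. Returns (attribute, rules, errors)."""
    best = None
    for attr in attributes:
        # Tally class labels separately for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # One rule per value: predict the most frequent class seen with that value.
        rules = {value: tally.most_common(1)[0][0] for value, tally in counts.items()}
        errors = sum(sum(tally.values()) - tally[rules[value]] for value, tally in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

def classify(model, row):
    """Apply the learned rules; returns None for a value never seen in training."""
    attr, rules, _ = model
    return rules.get(row[attr])

# Hypothetical training set whose class labels are already known.
rows = [
    {"income": "high", "owns_home": "yes"},
    {"income": "high", "owns_home": "no"},
    {"income": "low", "owns_home": "yes"},
    {"income": "low", "owns_home": "no"},
    {"income": "medium", "owns_home": "yes"},
    {"income": "medium", "owns_home": "no"},
]
labels = ["approve", "approve", "reject", "reject", "approve", "approve"]

model = one_r(rows, labels, ["income", "owns_home"])
print(model)
print(classify(model, {"income": "low", "owns_home": "yes"}))
```

Each value of the chosen attribute defines one disjoint group of records with its own rule, which is exactly the partitioning described above.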

Frequent Episodes
Frequent episodes are sequences of events that occur frequently and close to each other; they are extracted from time sequences.

Let $R$ be a set of event types and $A$ a particular event type, so that $A \in R$. An event is defined as a pair $(A, t)$, where $A \in R$ is the event type and $t$ is the time at which it occurs.

A sequence of events (also called an event sequence) on $R$ is a triple $(T_S, T_C, S)$, where $T_S$ is the starting time, $T_C$ is the ending time, and

$$S = \langle (A_1, t_1), (A_2, t_2), \ldots, (A_n, t_n) \rangle$$

is the ordered sequence of events, such that $A_i \in R$ and $T_S \le t_i \le T_C$ for all $i = 1, 2, \ldots, n$, and $t_i \le t_{i+1}$ for all $i = 1, 2, \ldots, n-1$.

There are three types of episodes:
a) Serial episodes: the event types occur in a fixed order (in sequence).
b) Parallel episodes: there are no constraints on the relative order in which the event types occur.
c) Non-serial, non-parallel episodes: for example, occurrences of A and B precede an occurrence of C, but there is no constraint on the relative order of A and B.
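How often an episode occurs can be measured with a sliding time window. The sketch below is a simplified version of that idea, assuming integer timestamps and windows [start, start + width) that lie entirely within [T_S, T_C]; the window width, the sample sequence and the function names are hypothetical. It covers serial and parallel episodes only, not type (c).

```python
def occurs(episode, events, serial=True):
    """Does `episode` (a tuple of event types) occur among time-ordered `events`?

    Serial: the types must appear in the given order (other events may come
    in between). Parallel: each type must appear, in any order.
    """
    types = [event_type for event_type, _ in events]
    if serial:
        i = 0
        for event_type in types:
            if i < len(episode) and event_type == episode[i]:
                i += 1
        return i == len(episode)
    remaining = list(types)
    for event_type in episode:
        if event_type not in remaining:
            return False
        remaining.remove(event_type)
    return True

def window_frequency(episode, sequence, width, serial=True):
    """Fraction of windows [start, start + width) inside [T_S, T_C] in which the
    episode occurs; assumes integer times and width <= T_C - T_S + 1."""
    t_s, t_c, events = sequence
    starts = range(t_s, t_c - width + 2)
    hits = sum(
        occurs(episode, [e for e in events if start <= e[1] < start + width], serial)
        for start in starts
    )
    return hits / len(starts)

# Event sequence (T_S, T_C, S) with S ordered by time.
seq = (1, 10, [("A", 1), ("B", 2), ("C", 4), ("A", 6), ("B", 7), ("A", 9)])
serial_freq = window_frequency(("A", "B"), seq, width=3)
parallel_freq = window_frequency(("A", "B"), seq, width=3, serial=False)
print(serial_freq, parallel_freq)
```

The parallel episode {A, B} is found in one more window than the serial episode A → B, because the window starting at time 7 contains B before A.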

Deviation Detection
Deviation detection aims to identify outlying points in a particular data set and to explain whether they are due to noise or other impurities present in the data, or due to trivial reasons.
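A simple statistical way to flag outlying points is the z-score: how many standard deviations a value lies from the mean. This is one illustrative technique chosen here, not the method of the slides; the readings and the threshold of 2.0 are hypothetical. The code only flags candidates; deciding whether each one is noise or a genuine deviation is left to the analyst.

```python
from statistics import mean, pstdev

def deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
flagged = deviations(readings)
print(flagged)
```

With only eight values, no z-score computed this way can exceed √7 ≈ 2.65, so a stricter threshold of 3 would flag nothing here; small samples call for a lower threshold or a different test.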

Mining Problems
* Neural Networks
* Genetic Algorithms
* Rough Set Techniques
* Support Vector Machines

Other Mining Problems
* Sequence Mining: concerned with mining sequence data.
* Web Mining: the World Wide Web is a fertile area for data mining research, given the huge amount of information available online.
* Text Mining: text documents are given structure by means of information extraction, text categorization, etc.

Spatial Data Mining: Spatial data mining is the branch of data mining that deals with spatial (location) data, such as:
* Geographically referenced data
* Digital mapping
* Remote sensing

DM Applications: Case Studies
1. Housing Loan Prepayment Prediction
2. Crime Detection
3. Customer Retention
4. Brand Loyalty

5. Banking
* Detection of patterns of fraudulent credit card use
* Identifying ‘loyal’ customers
* Determining ‘credit card spending’ by customer group

6. Astronomy: Detection of unusual stars, galaxies, nebulae or super galaxies may lead to the discovery of previously unknown phenomena and celestial bodies.

End of Chapter 2