Chapter 3 Introduction to Data Mining


Chapter 3 Introduction to Data Mining Prof. Chintan H. Makwana

Syllabus Topics
- Introduction
- Classification of Data Mining System
- Data Mining Primitives
- KDD Process
- Data Mining Architecture
- Data Mining Functionalities
- Integration of a Data Mining System with a Database or Data Warehouse System
- Issues in Data Mining
- Importance of Data Mining
- Application of Data Mining
- Social Impacts

Introduction Data??? Information??? Database??? DBMS???

Introduction: Data
- Structured (DBMS): Dhaval Gohel 40 50 60; Rishabh Chauhan 70 80; Mayur Chauhan; Jaldhi Patel 30; Viral Prajapati 90
- Semi-structured (XML): <Name>Dhaval Gohel</Name> <CA>40</CA> <IP>50</IP> <CS>60</CS>
- Unstructured (text): Dhaval Gohel,40,50,60 Rishabh Chauhan 60,70,80

Introduction: Information
- Dhaval Gohel has 50% in the current semester.
- Viral Prajapati has the highest marks in Research Skill.
- Ankit Prajapati has the lowest marks in CA.

Introduction: Database
- ID and name: 120160107001 Dhaval Gohel; 120160107002 Rishabh Chauhan; 120160107004 Mayur Padiya; 120160107007 Jaldhi Patel; 120160107008 Viral Prajapati
- Name and place: Dhaval Gohel, Dakor; Rishabh Chauhan, Modasa; Mayur Padiya, Nadiyad; Jaldhi Patel, Dehgam; Viral Prajapati, Naroda
- Name and marks: Dhaval Gohel 40 50 60; Rishabh Chauhan 70 80; Mayur Padiya; Jaldhi Patel 30; Viral Prajapati 90

Introduction DBMS

Introduction
- Data: raw facts
- Information: processed data
- Database: a collection of organized, related data
- DBMS: a set of software and tools used to manipulate the database

What do you mean by Data Mining?
Data Mining: "Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories."
Also known as: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Different Names of Data Mining
- Knowledge discovery (mining) in databases (KDD)
- Knowledge extraction
- Data/pattern analysis
- Data archeology
- Data dredging
- Information harvesting
- Business intelligence, etc.

Database vs Data Mining
Database:
- Find all employees with a salary >= 50,000
- Find all students who had 0% attendance last month
- Find all students who have an Apple laptop
Data Mining:
- Find all employees who are contractual (Classification)
- Find all students who are attending lectures (Clustering)
- Find all students who have an Apple laptop and an Apple phone (Association Rule)

Classification of Data Mining System
- Database technology
- Information science
- Statistics
- Machine learning
- Visualization
- Other disciplines

Integration of Multiple Technologies
[Figure: Data Mining as the confluence of Information Science, Machine Learning, Database Technology, Statistics, Algorithms, and Visualization]

Classification of Data Mining System
Classification is based on the kind of database mined:
- Data model: relational, transactional, object-relational, or data warehouse.
- Special types of data handled: spatial, time-series, text, stream, or multimedia data mining systems, or a World Wide Web mining system.

Classification of Data Mining System
Kind of knowledge mined:
- Data mining functionalities: characterization and discrimination, mining frequent patterns, classification and prediction, cluster analysis, outlier analysis, evolution analysis
- Data regularities vs. data irregularities

Classification of Data Mining System
Kinds of techniques utilized:
- Degree of user interaction involved, e.g., autonomous systems, interactive exploratory systems, query-driven systems
- Method of data analysis employed, e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on.

Classification of Data Mining System
Application adapted: finance, telecommunications, DNA, stock markets, e-mail, and so on.

Data Mining Primitives
- The set of task-relevant data to be mined
- The kind of knowledge to be mined
- The background knowledge to be used in the discovery process
- The interestingness measures and thresholds for pattern evaluation
- The expected representation for visualizing the discovered patterns
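
One way to picture the five primitives is as the fields of a task specification handed to a mining system. The sketch below is only an illustration: the class name `MiningTask`, its field names, and the sample values are hypothetical and do not come from any real data mining query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container holding the five data mining primitives."""
    task_relevant_data: str                                   # which data to mine
    knowledge_kind: str                                       # e.g. "association", "classification"
    background_knowledge: dict = field(default_factory=dict)  # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # measure name -> threshold
    presentation: str = "table"                               # how patterns are shown


# Assumed example values, chosen only for illustration.
task = MiningTask(
    task_relevant_data="sales transactions",
    knowledge_kind="association",
    background_knowledge={"city": "state"},
    interestingness={"support": 0.02, "confidence": 0.60},
    presentation="rules",
)
print(task)
```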

Knowledge Discovery from Data
[Figure: Databases -> Data Cleaning and Data Integration -> Preprocessed Data -> Selection and Transformation -> Task-relevant Data -> Data Mining -> Patterns -> Pattern Evaluation]

KDD Process steps
- Cleaning: remove noise and inconsistent data.
- Integration: multiple data sources may be combined.
- Selection: data relevant to the analysis task are retrieved from the database.
- Transformation: data are transformed into a form appropriate for mining, for example by summary or aggregation operations.
- Data Mining: techniques such as association rule mining, classification, and clustering are applied to identify and count patterns.
- Pattern Evaluation: identify the truly interesting patterns representing knowledge, based on some interestingness measure, for example support and confidence for association rule mining.
- Knowledge Presentation: visualization and knowledge representation techniques are used to present the mined knowledge to the user.

KDD Process on Web Log Data

KDD Process on Web Log Data
- Cleaning: remove error logs.
- Integration: multiple logs may be combined.
- Selection: data with a valid status and media type are selected.
- Transformation: transform the data into day-wise and week-wise form.
- Data Mining: identify patterns and count frequent accesses.
- Pattern Evaluation: display frequently accessed sequences.
- Knowledge Presentation: a graph of user count per URL/page, and a graph of the number of pages visited per IP address.
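
The steps above can be sketched on a toy web log. Everything concrete here is an assumption made for illustration: the record layout `(ip, date, url, status, media_type)`, the sample rows, the rule that a status of 400 or above counts as an error, and the `min_count` threshold.

```python
from collections import Counter

# Hypothetical log records: (ip, date, url, status, media_type)
log_a = [
    ("10.0.0.1", "2016-01-04", "/home", 200, "text/html"),
    ("10.0.0.1", "2016-01-04", "/courses", 200, "text/html"),
    ("10.0.0.2", "2016-01-04", "/home", 404, "text/html"),
]
log_b = [
    ("10.0.0.2", "2016-01-05", "/home", 200, "text/html"),
    ("10.0.0.3", "2016-01-05", "/logo.png", 200, "image/png"),
    ("10.0.0.3", "2016-01-05", "/home", 200, "text/html"),
]


def clean(log):
    """Cleaning: drop error entries (assumed here to be status >= 400)."""
    return [rec for rec in log if rec[3] < 400]


# Integration: combine the cleaned logs into one data set.
records = clean(log_a) + clean(log_b)

# Selection: keep only page requests (assumed media type "text/html").
records = [rec for rec in records if rec[4] == "text/html"]

# Transformation: arrange the data day-wise.
by_day = {}
for ip, day, url, status, media in records:
    by_day.setdefault(day, []).append((ip, url))

# Data mining: count how often each URL is accessed.
url_counts = Counter(url for ip, day, url, status, media in records)

# Pattern evaluation: keep URLs accessed at least min_count times.
min_count = 2
frequent = {url: n for url, n in url_counts.items() if n >= min_count}

# Knowledge presentation: pages visited per IP address, plus frequent URLs.
pages_per_ip = Counter(ip for ip, day, url, status, media in records)
for url, n in sorted(frequent.items()):
    print(url, n)
```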

Data Mining Architecture Components
- Databases, data warehouse, World Wide Web, or other information repository
- Database or data warehouse server
- Knowledge base
- Data mining engine
- Pattern evaluation module
- User interface

Data Mining Functionalities
Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks.
Tasks: descriptive and predictive
- Descriptive: characterize the general properties of the data in the database
- Predictive: perform inference (conclusion) on the current data in order to make predictions

Data Mining task

Data Mining Functionalities
- Characterization and Discrimination
- Mining Frequent Patterns
- Classification and Prediction
- Cluster Analysis
- Outlier Analysis
- Evolution Analysis

Characterization and Discrimination
Data characterization is a summarization of the general characteristics or features of a target class of data.
For example: analyze the improvement of the students who study in 2nd Semester ME at GECM and whose marks increased by 5% in the current semester.
Display forms: pie charts, bar charts, multidimensional data cubes, etc.

Characterization and Discrimination
Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
For example: faculty may want to compare the results of students who study in 2nd Semester ME at GECM and whose marks increased by 5% with those whose marks decreased by 5% in the current semester.
Display forms: pie charts, bar charts, multidimensional data cubes, etc.
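
A minimal sketch of both ideas, assuming hypothetical student records: characterization summarizes one class, and discrimination compares that summary with the summary of a contrasting class. The `attendance` attribute, the sample values, and the helper name `characterize` are assumptions, not part of the slides.

```python
from statistics import mean

# Hypothetical student records; "attendance" is an assumed attribute.
students = [
    {"name": "S1", "marks_change": 5, "attendance": 90},
    {"name": "S2", "marks_change": 5, "attendance": 80},
    {"name": "S3", "marks_change": -5, "attendance": 60},
    {"name": "S4", "marks_change": -5, "attendance": 70},
]


def characterize(records):
    """Characterization: summarize a class by size and average attendance."""
    return {
        "count": len(records),
        "avg_attendance": mean(r["attendance"] for r in records),
    }


# Target class: marks increased. Contrasting class: marks decreased.
target = [s for s in students if s["marks_change"] > 0]
contrast = [s for s in students if s["marks_change"] < 0]

target_summary = characterize(target)
contrast_summary = characterize(contrast)

# Discrimination: compare the same feature across the two classes.
gap = target_summary["avg_attendance"] - contrast_summary["avg_attendance"]
print(target_summary, contrast_summary, gap)
```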

Mining Frequent Patterns, Association Rule Mining
Frequent patterns are patterns that occur frequently in a data set.
Forms: frequent itemsets, subsequences, and substructures.
- Frequent itemsets: e.g., milk and bread.
- Subsequences: e.g., a PC followed by software.
- Substructures: subgraphs, trees, or lattices.

Mining Frequent Patterns, Association Rule Mining
Association rule mining is a method used to find interesting frequent patterns in a large set of data items.
computer => antivirus [support = 2%, confidence = 60%]
A support of 2% means that computer and antivirus were purchased together in 2% of all transactions.
A confidence of 60% means that 60% of the customers who purchased a computer also purchased antivirus.
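
Support and confidence can be computed directly from transaction counts. The sketch below uses a small, made-up list of transactions (so the resulting percentages differ from the 2% and 60% on the slide); the function names are illustrative.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


rule_support = support({"computer", "antivirus"}, transactions)
rule_confidence = confidence({"computer"}, {"antivirus"}, transactions)
print(rule_support, rule_confidence)
```

Here 3 of the 5 transactions contain both items, and 3 of the 4 transactions containing a computer also contain antivirus.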

Classification and Prediction
Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts. The model is derived from the analysis of a set of training data and is used to predict the class label of objects for which the class label is unknown.
Classification is a two-phase process:
1) Learning: training data are analyzed by a classification algorithm.
2) Classification: the model is used to assign data to a class label.
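
The two phases can be illustrated with a very simple classifier. The slides do not name an algorithm, so a nearest-centroid classifier is swapped in here purely as an example; the training marks and the Pass/Fail labels are hypothetical.

```python
def learn(training):
    """Learning phase: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in training:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(model, features):
    """Classification phase: assign the label of the nearest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: marks in three subjects with a known class label.
training = [
    ((40, 50, 60), "Pass"),
    ((70, 80, 75), "Pass"),
    ((20, 30, 25), "Fail"),
    ((30, 20, 35), "Fail"),
]

model = learn(training)
print(classify(model, (65, 70, 60)))
print(classify(model, (28, 35, 30)))
```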

Classification and Prediction
Prediction models continuous-valued functions, i.e., it is used to predict missing or unavailable numeric data values rather than class labels. Regression analysis is a statistical method used for numeric prediction.
[Table: Dhaval Gohel 40 50 60 Pass; Rishabh Chauhan 70 80; Mayur Padiya 30 Fail; Ankit Prajapati. Figure labels: Classification, Prediction]
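
As a sketch of numeric prediction, the code below fits a straight line by ordinary least squares and uses it to estimate an unseen value. The data (marks per semester) are made up for illustration, and simple linear regression is only one of many regression methods.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: semester number and marks obtained.
semesters = [1, 2, 3, 4]
marks = [40, 50, 60, 70]

slope, intercept = fit_line(semesters, marks)
predicted = slope * 5 + intercept  # estimate for semester 5
print(slope, intercept, predicted)
```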

Classification and Prediction

Cluster Analysis Clustering analyzes data objects without consulting class labels. Clustering can be used to generate class labels for a group of data which did not exist at the beginning. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
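
To make the grouping principle concrete, here is a minimal one-dimensional k-means sketch. k-means is just one clustering method, chosen here for brevity; the marks, the starting centroids, and the fixed iteration count are assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around centroids without using any class labels."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical marks with no labels attached.
marks = [30, 35, 40, 80, 85, 90]
centroids, clusters = kmeans_1d(marks, centroids=[30, 90])
print(centroids, clusters)
```

The resulting groups could then be given labels such as "low" and "high", which is the sense in which clustering can generate class labels.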

Outlier Analysis
Outliers are data objects that do not comply with the general behavior or model of the data. Many data mining techniques discard outliers or exceptions as noise. However, in some applications these rare events are more interesting than the regular ones. The analysis of outlier data is referred to as outlier analysis or outlier mining, e.g., fraud detection.
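
One simple way to flag outliers is a z-score test: values far from the mean, measured in standard deviations, are reported. This is only one of several possible methods; the transaction amounts and the threshold of 2.0 are assumptions for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [50, 52, 49, 51, 48, 50, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```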

Evolution Analysis
Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. This may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data. Distinct features of such analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.
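
A small sketch of one time-series technique: smoothing a series with a moving average and checking whether the smoothed values rise over time. The series, the window size, and the "upward"/"mixed" labels are assumptions for illustration.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical weekly visit counts.
series = [10, 12, 11, 15, 18, 17, 21]
smoothed = moving_average(series, 3)

# Label the trend as upward if the smoothed values never decrease.
trend = "upward" if all(b >= a for a, b in zip(smoothed, smoothed[1:])) else "mixed"
print(smoothed, trend)
```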

Integration of a Data Mining System with a Database or Data Warehouse System
- No coupling: the DM system will not utilize any function of a DB/DW system.
- Loose coupling: the DM system will use some facilities of a DB/DW system.
- Semitight coupling: besides linking a DM system to a DB/DW system, efficient implementations of a few essential data mining primitives are provided in the DB/DW system.
- Tight coupling: the DM system is smoothly integrated into the DB/DW system.
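
The difference between the coupling levels can be hinted at with Python's built-in `sqlite3` module. In the first query the database is used only to fetch rows and the counting happens in the mining code (loose coupling); in the second the aggregation is pushed into the database, which is closer in spirit to semitight coupling. The table name, columns, and rows are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical in-memory database with a small purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("c1", "computer"), ("c1", "antivirus"), ("c2", "computer"), ("c3", "printer")],
)

# Loose coupling: the database only fetches rows; counting is done in Python.
rows = conn.execute("SELECT item FROM purchases").fetchall()
counts_in_python = dict(Counter(item for (item,) in rows))

# Closer to semitight coupling: the database performs the aggregation itself.
counts_in_db = dict(
    conn.execute("SELECT item, COUNT(*) FROM purchases GROUP BY item").fetchall()
)

conn.close()
print(counts_in_python, counts_in_db)
```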

Major Issues in Data Mining
- Mining different kinds of data
- Handling multiple levels of abstraction
- Incorporation of background knowledge
- Visualization of mining results
- Handling of incomplete or noisy data
- Scalability of algorithms

Importance of Data Mining
Data collected in large data repositories become "data tombs". Data mining tools perform data analysis and may uncover important data patterns, contributing greatly to business strategies, knowledge bases, and scientific and medical research. Data mining tools turn data tombs into "golden nuggets" of knowledge.

Application of Data Mining
- Market analysis
- Fraud detection
- Customer retention
- Production control
- Science exploration

Social Impacts of Data Mining
- Privacy
- Profiling
- Unauthorized use