1 Knowledge Mining (data mining) Ch. Papatheodorou, Laboratory on Digital Libraries & Electronic Publishing, Department of Archives and Library Science, Ionian University.


1 Knowledge Mining (data mining) Ch. Papatheodorou, Laboratory on Digital Libraries & Electronic Publishing, Department of Archives and Library Science, Ionian University

2 Data Mining
- Extraction of knowledge from very large data collections
- Knowledge: rules, behavior patterns, and associations between objects (non-obvious, latent, previously unknown, and useful)
- Object: consists of a set of features
- It is not:
  - (Deductive) query processing
  - Expert systems, small machine learning/statistical programs

3 Why Data Mining? Potential Applications
- Database analysis and decision support
  - Market analysis and management: target marketing, customer relation management, market basket analysis, cross selling, market segmentation
  - Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
  - Fraud detection and management
- Other applications
  - Text mining (newsgroups, email, documents) and Web analysis
  - Intelligent query answering

4 Market Analysis and Management (1)
- Where are the data sources for analysis?
  - Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of model customers who share the same characteristics: interest, income level, spending habits, etc.
- Determine customer purchasing patterns over time
  - Conversion of a single to a joint bank account: marriage, etc.
- Cross-market analysis
  - Associations/correlations between product sales
  - Prediction based on the association information

5 Market Analysis and Management (2)
- Customer profiling
  - Data mining can tell you what types of customers buy what products (clustering or classification)
- Identifying customer requirements
  - Identifying the best products for different customers
  - Use prediction to find what factors will attract new customers
- Provides summary information
  - Various multidimensional summary reports
  - Statistical summary information (data central tendency and variation)

6 Corporate Analysis and Risk Management
- Finance planning and asset evaluation
  - Cash flow analysis and prediction
  - Contingent claim analysis to evaluate assets
  - Cross-sectional and time series analysis (financial ratio, trend analysis, etc.)
- Resource planning
  - Summarize and compare the resources and spending
- Competition
  - Monitor competitors and market directions
  - Group customers into classes and apply a class-based pricing procedure
  - Set pricing strategy in a highly competitive market

7 Steps of a KDD Process
- Learning the application domain: relevant prior knowledge and goals of the application
- Creating a target data set: data selection
- Data cleaning and preprocessing (may take 60% of the effort!)
- Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation
- Choosing the functions of data mining: summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge

8 Data Mining: A KDD Process
- Data mining: the core of the knowledge discovery process
[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

9 Data pre-processing
- Data preparation is a big issue for data mining
- Data preparation includes
  - Data cleaning and data integration
  - Data reduction and feature selection
  - Discretization
- Many methods have been developed, but this is still an active area of research
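
As a rough illustration of two of the preparation steps listed above, the sketch below drops missing values (a very simple form of cleaning) and then discretizes a numeric attribute into equal-width bins. The function name `equal_width_bins`, the sample ages, and the choice of three bins are assumptions made for this example; equal-width binning is only one of many discretization methods.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in the range 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical raw attribute with missing entries (None).
raw_ages = [18, None, 22, 25, 31, None, 40, 47, 58, 60]

# Cleaning: here we simply discard missing values.
ages = [v for v in raw_ages if v is not None]

# Discretization: three equal-width intervals over [18, 60].
bins = equal_width_bins(ages, 3)
print(list(zip(ages, bins)))
```

Dropping incomplete records is the crudest cleaning strategy; imputing a value is a common alternative when data are scarce.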

10 Data pre-processing

11 Clustering
- Partition the data set into clusters; one can then store only the cluster representation
- Clustering can be hierarchical, with the clusters stored in multi-dimensional index tree structures
- There are many choices of clustering definitions and clustering algorithms
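
The slide does not name a particular algorithm. As one possible illustration, the sketch below uses k-means (an assumption, chosen because it is simple) to partition a small set of 2-D points and then keeps only the centroids as the cluster representation. The points, the initial centroids, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, init_centroids, n_iter=20):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = list(init_centroids)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                # New centroid is the coordinate-wise mean of the group.
                new_centroids.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids = kmeans(points, init_centroids=[points[0], points[3]])

# Only the two centroids need to be stored; any point can be re-assigned from them.
labels = [nearest(p, centroids) for p in points]
print(centroids, labels)
```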

12 Cluster Analysis

13 Classification
- Classification is an extensively studied problem (mainly in statistics, machine learning, and neural networks)
- Classification is probably one of the most widely used data mining techniques, with many extensions
- Scalability is still an important issue for database applications; combining classification with database techniques is therefore a promising topic
- Research directions: classification of non-relational data, e.g., text, spatial, multimedia, etc.

14 Classification process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction: training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: for classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the classified result from the model
    - Accuracy rate is the percentage of test set samples that are correctly classified by the model
    - The test set must be independent of the training set; otherwise the estimate is over-optimistic and over-fitting goes undetected
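
The accuracy estimate described above can be sketched in a few lines. The slide does not prescribe a classifier, so this example swaps in a 1-nearest-neighbour rule on a single numeric attribute purely for illustration; the data, labels, and function names are hypothetical. It also shows why the test set has to be independent: measured on its own training data, this model looks perfect.

```python
def predict_1nn(train, x):
    """Return the label of the training sample whose attribute value is closest to x."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(train, samples):
    """Fraction of samples whose known label matches the model's prediction."""
    correct = sum(1 for x, label in samples if predict_1nn(train, x) == label)
    return correct / len(samples)


# Training set: (attribute value, class label).
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]

# Independent test set with known labels.
test = [(1.5, "low"), (4.0, "low"), (6.0, "low"), (8.5, "high")]

test_accuracy = accuracy(train, test)    # honest estimate on unseen samples
train_accuracy = accuracy(train, train)  # over-optimistic: the model has seen these

print(test_accuracy, train_accuracy)
```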

15 Classification Process (1): Model Construction
[Figure: Training Data -> Classification Algorithms -> Classifier (Model)]
Example of a learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

16 Classification Process (2): Use the Model in Prediction
[Figure: Testing Data and Unseen Data are passed to the Classifier]
Unseen data: (Jeff, Professor, 4). Tenured?
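
The rule from the model construction slide (rank = 'professor' OR years > 6) can be applied to the unseen tuple directly. The sketch below assumes the tuple layout is (name, rank, years), which the transcript does not state explicitly; `is_tenured` is a hypothetical helper name.

```python
def is_tenured(rank, years):
    """Apply the learned rule: rank = 'professor' OR years > 6."""
    return rank.lower() == "professor" or years > 6


# Unseen tuple from the slide, assumed to be (name, rank, years).
name, rank, years = ("Jeff", "Professor", 4)
prediction = "yes" if is_tenured(rank, years) else "no"
print(f"{name}: tenured = {prediction}")
```

Under this rule Jeff is classified as tenured because the rank condition holds, even though the years condition does not.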

17 Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data are classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

18 Document category modelling
- Example: filtering spam email.
- Task: classify incoming email as spam or legitimate (2 document categories).
- Simple blacklist and keyword-based methods have failed.
- More intelligent, adaptive approaches are needed (e.g. naive Bayesian category modeling).

19 Document category modelling
- Step 1 (linguistic pre-processing): tokenization, removal of stopwords, stemming/lemmatization.
- Step 2 (vector representation): bag-of-words or n-gram modeling (n = 2, 3).
- Step 3 (feature selection): information gain evaluation.
- Step 4 (machine learning): Bayesian modeling, using word/n-gram frequency.
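
A minimal sketch of these steps, under stated assumptions: it tokenizes and removes stopwords (step 1, without stemming), builds bag-of-words counts (step 2), skips information-gain feature selection (step 3), and trains a multinomial naive Bayes model with add-one smoothing (step 4). The stopword list, the four training messages, and all function names are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "to", "is", "at", "for", "your", "you"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def train_nb(docs):
    """docs: list of (text, label). Returns document counts, word counts, vocabulary."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("Win cash now", "spam"),
    ("Cheap pills, win a prize now", "spam"),
    ("Meeting agenda for Monday", "legit"),
    ("Project report attached", "legit"),
]
model = train_nb(training)
print(classify(model, "win a cheap prize"))
print(classify(model, "Project meeting on Monday"))
```

A realistic filter would need far more training data and a feature selection step; with only four messages the result is illustrative, not indicative of real accuracy.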

20 What Is Association Mining?
- Association rule mining:
  - Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
- Applications:
  - Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
- Example:
  - Rule form: "Body => Head [support, confidence]".
  - buys(x, "diapers") => buys(x, "beers") [0.5%, 60%]

21 Association Rule: Basic Concepts
- Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
- Find: all rules that correlate the presence of one set of items with that of another set of items
  - E.g., 98% of people who purchase tires and auto accessories also get automotive services done
- Applications
  - * => Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
  - Home Electronics => * (what other products should the store stock up on?)

22 Rule Measures: Support and Confidence
- Find all the rules X & Y => Z with minimum confidence and support
  - Support, s: probability that a transaction contains {X & Y & Z}
  - Confidence, c: conditional probability that a transaction having {X & Y} also contains Z
- Find the rules with support and confidence greater than or equal to a given threshold
[Figure: Venn diagram of "Customer buys diaper" and "Customer buys beer", with the overlap labeled "Customer buys both"]
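
Written out as formulas, the two measures can be stated as follows. The symbols D (the transaction database) and T (a single transaction) are notation introduced here for convenience, not taken from the slides.

```latex
\[
s(X \wedge Y \Rightarrow Z)
  = P(\{X, Y, Z\} \subseteq T)
  = \frac{|\{T \in D : \{X, Y, Z\} \subseteq T\}|}{|D|}
\]
\[
c(X \wedge Y \Rightarrow Z)
  = P(Z \in T \mid \{X, Y\} \subseteq T)
  = \frac{s(\{X, Y, Z\})}{s(\{X, Y\})}
\]
```

Confidence is therefore the support of the whole rule divided by the support of its body, which is the ratio used when a rule's confidence is computed from itemset supports.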

23 Mining Association Rules: An Example
- Min. support 50%, min. confidence 50%
- For rule A => C:
  - support = support({A, C}) = 50%
  - confidence = support({A, C}) / support({A}) = 66.6%
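
The transaction table behind this example did not survive in the transcript. The sketch below therefore uses a hypothetical four-transaction database, chosen so that it reproduces the 50% support and 66.6% confidence quoted for the rule A => C; the function names are also assumptions.

```python
# Hypothetical transaction database (the slide's own table is not in the transcript).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(body, head, transactions):
    """support(body and head together) divided by support(body)."""
    return support(set(body) | set(head), transactions) / support(body, transactions)


s_ac = support({"A", "C"}, transactions)
c_a_to_c = confidence({"A"}, {"C"}, transactions)
c_c_to_a = confidence({"C"}, {"A"}, transactions)

print(f"support(A, C)      = {s_ac:.1%}")
print(f"confidence(A => C) = {c_a_to_c:.1%}")
print(f"confidence(C => A) = {c_c_to_a:.1%}")
```

With these transactions both A => C and C => A pass the 50% thresholds; the reverse rule has the same support but a higher confidence, since every transaction containing C also contains A.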
