Data Mining Functionalities

Data Mining Functionalities
Data mining functionalities specify the kinds of patterns to be found in data mining tasks. In general, data mining tasks fall into two categories: descriptive and predictive.
- Descriptive mining tasks characterize the general properties of the data in the database.
- Predictive mining tasks perform inference on the current data in order to make predictions.

Data Mining Functionalities
Sometimes users do not know what kinds of patterns in their data may be interesting, so they may want to search for several different kinds of patterns in parallel. Data mining systems should therefore be able to:
- Discover patterns at various granularities, that is, at different levels of abstraction.
- Allow users to specify hints to focus the search for interesting patterns.
Because some patterns may not hold for all of the data in the database, a measure of certainty or "trustworthiness" is usually associated with each discovered pattern.

Data Mining Functionalities
Data mining functionalities, and the kinds of patterns they can discover, are:
- Concept/Class Description: Characterization and Discrimination
- Mining Frequent Patterns, Associations, and Correlations
- Classification and Prediction
- Cluster Analysis
- Outlier Analysis
- Evolution Analysis

Concept/Class Description
Data can be associated with classes or concepts.
- Class: a collection of things sharing a common attribute. Classes of items for sale include computers and printers.
- Concept: an abstract or general idea inferred or derived from specific instances. Concepts of customers include bigSpenders and budgetSpenders.
Summarized, concise, and precise descriptions of individual classes and concepts are called class/concept descriptions. These descriptions can be derived via data characterization, data discrimination, or both.

Concept/Class Description
Data characterization is a summary of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are typically collected by a database query. For example, to study the characteristics of software products whose sales increased by 10% in the last year, the data related to such products can be collected by executing an SQL query.
Simple data summaries can be based on statistical measures and plots. The data cube-based OLAP roll-up operation can be used to perform data summarization along a specified dimension.

Concept/Class Description
The output of data characterization can be presented in various forms. Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables. The resulting descriptions can also be presented as generalized relations or in rule form.

Concept/Class Description
Example of data characterization: A data mining system should be able to produce a description summarizing the characteristics of customers who spend more than $1,000 a year at AllElectronics. The result could be a general profile of these customers, for example that they are 40–50 years old, employed, and have excellent credit ratings. The system should allow users to drill down on any dimension, such as occupation, in order to view these customers according to their type of employment.
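The characterization example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the `age_band` helper, and the ten-year band width are all hypothetical, and a real system would run the selection as a database query and the roll-up on a data cube.

```python
from collections import Counter

# Hypothetical customer records: (age, occupation, credit_rating, yearly_spend)
customers = [
    (45, "engineer", "excellent", 1800),
    (42, "teacher", "excellent", 1250),
    (48, "engineer", "excellent", 2100),
    (23, "student", "fair", 300),
    (51, "manager", "good", 1500),
    (35, "teacher", "good", 700),
]

def age_band(age, width=10):
    """Generalize an exact age to a band such as '40-49' (assumed band width)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Step 1: select the target class (stands in for the database query).
target = [c for c in customers if c[3] > 1000]

# Step 2: roll up to generalized attribute values and count each combination.
profile = Counter((age_band(age), credit) for age, _, credit, _ in target)

# Step 3: drill down on the occupation dimension.
by_occupation = Counter(occupation for _, occupation, _, _ in target)

print(profile.most_common())
print(by_occupation.most_common())
```

The most frequent entry of `profile` plays the role of the "general profile", while `by_occupation` shows the same target class viewed along one more dimension.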

Concept/Class Description
Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. The target and contrasting classes can be specified by the user, and the corresponding data objects retrieved through database queries. For example, the user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by at least 30% during the same period.

Concept/Class Description
Example of data discrimination: A data mining system should be able to compare two groups of AllElectronics customers, such as those who shop for computer products regularly versus those who rarely shop for such products.

Concept/Class Description
The resulting comparison might show that:
- 80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education.
- 60% of the customers who infrequently buy such products are either seniors or youths, and have no university degree.
Drilling down on a dimension, such as occupation, or adding new dimensions, such as income level, may help in finding even more discriminative features between the two classes.
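A discrimination report of this kind boils down to measuring how often the same feature occurs in the target class and in the contrasting class. The sketch below assumes made-up customer records and a hypothetical `young_graduate` feature; it is not the method of any particular data mining system.

```python
def share(records, predicate):
    """Fraction of records that satisfy predicate (0.0 for an empty class)."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)

# Hypothetical records: (age, has_university_degree)
frequent_buyers = [(25, True), (31, True), (38, True), (29, True), (55, False)]
infrequent_buyers = [(67, False), (17, False), (72, False), (35, True), (16, False)]

def young_graduate(record):
    """Feature under comparison: aged 20 to 40 with a university education."""
    age, has_degree = record
    return 20 <= age <= 40 and has_degree

target_share = share(frequent_buyers, young_graduate)
contrast_share = share(infrequent_buyers, young_graduate)

# A large gap suggests the feature discriminates well between the two classes.
difference = target_share - contrast_share
print(f"target {target_share:.0%}, contrast {contrast_share:.0%}")
```

Repeating the comparison for other candidate features (occupation, income level) corresponds to drilling down or adding dimensions.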

Mining Frequent Patterns
Frequent patterns are patterns that occur frequently in data. They come in several kinds:
- A frequent itemset is a set of items that frequently appear together in a transactional data set, such as milk and bread.
- A frequent subsequence is a sequential pattern, such as the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card.
- A frequent substructure can refer to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.
Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
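To make the idea of a frequent itemset concrete, the following sketch counts every itemset of a given size by brute force and keeps those that reach a minimum support. The transactions and the 60% threshold are invented for illustration, and the exhaustive counting is an assumption made for brevity; practical miners such as Apriori or FP-growth prune the search instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, size, min_support):
    """Return {itemset: support} for itemsets of the given size.

    Support is the fraction of transactions that contain the itemset.
    """
    counts = Counter()
    for transaction in transactions:
        # Sorting gives each itemset one canonical tuple representation.
        for itemset in combinations(sorted(transaction), size):
            counts[itemset] += 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}

pairs = frequent_itemsets(transactions, size=2, min_support=0.6)
print(pairs)
```

With these transactions only milk and bread appear together often enough to count as a frequent pair.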

Mining Associations and Correlations
Association analysis example: Suppose we mine AllElectronics transactions to determine which items are frequently purchased together within the same transaction. A mined rule has the form
buys(X, "computer") ⇒ buys(X, "software") [support = 1%, confidence = 50%]
where X is a variable representing a customer.
- A confidence of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well.
- A support of 1% means that 1% of all of the transactions under analysis showed that computer and software were purchased together.
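Support and confidence are simple ratios over the transaction set. As a minimal sketch, assuming transactions are represented as Python sets and using a tiny made-up data set (so the numbers differ from the 1% in the example), they can be computed as follows. The function names are chosen here for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Hypothetical transactions.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software", "printer"},
]

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here one of four transactions contains both items, and one of the two computer purchases also includes software.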

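Mining Associations and Correlations
Association rules that contain a single predicate are referred to as single-dimensional association rules. Rules that involve more than one attribute or predicate are multidimensional association rules.
Example of a multidimensional association rule:
age(X, "20...29") ∧ income(X, "20,000...29,000") ⇒ buys(X, "CD player") [support = 2%, confidence = 60%]
- 2% of the customers under analysis are 20 to 29 years of age with an income of 20,000 to 29,000 and have purchased a CD player.
- There is a 60% probability that a customer in this age and income group will purchase a CD player.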

Classification and Prediction
Classification is the process of finding a model that describes and distinguishes data classes or concepts.
- The model is used to predict the class of objects whose class label is unknown.
- The derived model is based on the analysis of a set of training data.
How is the derived model presented?
- Classification (IF-THEN) rules
- Decision trees
- Mathematical formulae
- Neural networks
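As a rough illustration of deriving IF-THEN rules from training data, the sketch below learns one rule per value of a single attribute by predicting the majority class seen for that value. It is a simplification in the spirit of the OneR method with the attribute fixed in advance, not a full classifier, and the items, attributes, and labels are hypothetical.

```python
from collections import Counter, defaultdict

def one_rule(rows, labels, attribute):
    """Learn 'IF attribute = value THEN class' rules by majority vote."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[attribute]][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}

def classify(rules, row, attribute, default=None):
    """Predict the class of an unlabelled row; unseen values get `default`."""
    return rules.get(row[attribute], default)

# Hypothetical training data: store items and their response to a campaign.
training_rows = [
    {"price": "low", "brand": "A"},
    {"price": "low", "brand": "B"},
    {"price": "high", "brand": "A"},
    {"price": "high", "brand": "B"},
    {"price": "medium", "brand": "A"},
]
training_labels = ["good", "good", "none", "none", "mild"]

rules = one_rule(training_rows, training_labels, "price")
for value, label in rules.items():
    print(f"IF price = {value} THEN response = {label}")

prediction = classify(rules, {"price": "low", "brand": "C"}, "price")
```

The learned dictionary is the model, and `classify` applies it to an object whose class label is unknown.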

Classification and Prediction
- Classification predicts categorical (discrete, unordered) labels.
- Prediction models continuous-valued functions. It is used to predict missing or unavailable numerical data values rather than class labels.
Regression analysis is a statistical methodology that is most often used for numeric prediction, although other methods exist as well.
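For numeric prediction, the simplest regression model is a straight line fitted by ordinary least squares. The sketch below assumes a single predictor and made-up price and revenue figures chosen to lie exactly on a line, so it only demonstrates the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: item price and revenue in a past sale.
prices = [10.0, 20.0, 30.0, 40.0]
revenue = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(prices, revenue)

# Predict the unavailable numeric value for a new item priced at 50.
predicted = slope * 50.0 + intercept
print(slope, intercept, predicted)
```

Unlike the classification rules, the output here is a continuous value rather than a class label.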

Classification and Prediction
Classification example: Classify a large set of items in the store, based on three kinds of responses to a sales campaign: good response, mild response, and no response. Derive a model for each of these three classes based on the descriptive features of the items, such as price, brand, place made, type, and category. The derived model can be represented as IF-THEN rules.

Classification and Prediction
The derived classification model can also be represented as a decision tree.
Prediction example: Predict the amount of revenue that each item will generate during an upcoming sale at AllElectronics, based on previous sales data.

Cluster Analysis
Unlike classification and prediction, which analyse class-labelled data objects, clustering analyses data objects without consulting a known class label. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. Objects within a cluster are highly similar to one another, but are very dissimilar to objects in other clusters.
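One common way to group unlabelled objects is k-means, shown here as a bare-bones one-dimensional sketch. The choice of k-means, the data points, the starting centers, and the fixed iteration count are all assumptions made for illustration; the slides do not prescribe a particular clustering algorithm.

```python
def k_means_1d(points, centers, iterations=10):
    """Cluster numbers by repeatedly assigning each to its nearest center."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of the closest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabelled data with two visible groups.
points = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means_1d(points, centers=[0, 5])
print(centers, clusters)
```

No class labels are consulted: the two groups emerge only from the distances between the objects.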

Outlier Analysis
A database may contain data objects that do not comply with the general behaviour or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. However, in some applications, such as fraud detection, the rare events can be more interesting than the more regularly occurring ones.
Example: Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to the regular charges incurred by the same account.
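The credit card example can be approximated with a robust deviation test: flag a charge when it lies far above the account's median, measured in units of the median absolute deviation (MAD). This is one possible detector among many, and the charges, the function name, and the threshold of 5 are assumptions for illustration.

```python
from statistics import median

def large_charge_outliers(charges, threshold=5.0):
    """Return charges far above the account's typical amount.

    A charge is flagged when (charge - median) / MAD exceeds `threshold`.
    """
    typical = median(charges)
    mad = median([abs(c - typical) for c in charges])
    if mad == 0:
        return []  # all charges nearly identical: nothing to compare against
    return [c for c in charges if (c - typical) / mad > threshold]

# Hypothetical charges for a single account number.
charges = [42.0, 38.5, 51.0, 45.0, 40.0, 2900.0, 47.5]
suspicious = large_charge_outliers(charges)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme charge would otherwise distort the baseline it is compared against.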

Evolution Analysis
Data evolution analysis describes and models regularities or trends for objects whose behaviour changes over time. Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include:
- time-series data analysis
- sequence or periodicity pattern matching
- similarity-based data analysis

Evolution Analysis
Example: Suppose that you have the major stock market (time-series) data of the last several years available from the New York Stock Exchange, and you would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to your decision making regarding stock investments.
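A very simple form of time-series analysis is smoothing with a moving average to expose the underlying trend. The sketch below uses invented closing prices and a window of three observations; it only illustrates trend extraction and is not a method for forecasting real stock prices.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Hypothetical daily closing prices for one stock.
prices = [10, 11, 13, 12, 14, 15, 17]
smoothed = moving_average(prices, window=3)

# The trend is rising if every smoothed value exceeds the previous one.
rising = all(later > earlier for earlier, later in zip(smoothed, smoothed[1:]))
print(smoothed, rising)
```

The raw series dips once, but the smoothed series rises throughout, which is the kind of regularity evolution analysis looks for.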