Published by Drusilla Blair. Modified over 9 years ago.
1
Academic Year 2014 Spring
2
MODULE CC3005NI: Advanced Database Systems
“Distributed Database (DDB) and Data Mining (DM)” (PART – 2)
3
Data Mining:
Data Mining is a knowledge discovery process: the automated extraction of hidden predictive information or patterns from data in large databases.
Key Points
- Knowledge discovery in databases (KDD)
- Automated process
- Extraction of, or search for, interesting or useful information, patterns or trends
- From large databases
4
Motivation:
Problem: Data Explosion
- Automated data collection tools and mature database technology have led to tremendous amounts of data being stored in databases, data warehouses and other information repositories.
- We are drowning in data, but starving for knowledge!
5
Motivation:
Solution: Data Warehousing and Data Mining
- Data Warehousing and On-Line Analytical Processing (OLAP)
- Extraction of interesting knowledge (rules, regularities, patterns, future trends, predictions) from large databases
6
Data Mining: From Where?
Data Mining aims to find information from data in:
- Relational Databases
- Data Warehouses
- Advanced databases and information repositories
  o Object-Oriented and Object-Relational Databases
  o Spatial Databases
  o Time-Series Data and Temporal Data
  o Text Databases and Multimedia Databases
  o Heterogeneous and Legacy Databases
  o WWW
7
Data Mining: Objectives
Data Mining is aimed at providing these capabilities:
- Automated discovery of previously unknown patterns
  o Data Mining tools sift through large amounts of data to discover meaningful new correlations and hidden patterns.
- Automated prediction of trends and behaviours
  o Data Mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions.
- Results are used to help businesses make better decisions and gain a competitive advantage.
8
Data Mining: Example
Database Analysis and Decision Support
- Market Analysis and Management
  o Target Marketing, Customer Relation Management, Market Basket Analysis, Market Segmentation
- Risk Analysis and Management
  o Forecasting, Customer Retention, Quality Control, Competitive Analysis
- Fraud Detection and Management
  o Identifying unusual spending patterns and irregularities
- Other Applications
  o Text Mining (news groups, email, documents) and Web Analysis
  o Intelligent Query Answering
9
Data Mining: Interdisciplinary Subject
Data Mining draws on:
- Database Technology
- Statistics
- Visualization
- Artificial Intelligence
- Machine Learning
- Other Disciplines
10
Data Mining Process:
The process of information / knowledge extraction is carried out repetitively, adaptively and progressively, in four phases:
- Comprehension of the application domain
- Preparation of data sets
- Discovery of patterns
- Evaluation of patterns and implications
Comprehension of Application Domain
- Develop a good understanding of the application domain.
11
Data Mining Process:
Preparation of Data Sets
- Identify the subset of data in the database / Data Warehouse on which to carry out Data Mining.
- Encode and clean the data to make it suitable input for Data Mining algorithms.
Discovery of Patterns
- Apply Data Mining techniques to the data set extracted earlier in order to discover repetitive patterns in the data.
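The preparation step above (select, clean, encode) can be sketched as follows. This is only an illustration: the record layout, field names and item values are hypothetical and do not come from the module.

```python
# Hypothetical raw purchase records; field names and values are illustrative only.
raw_records = [
    {"customer": "c1", "items": " Milk, bread ,Cereal"},
    {"customer": "c2", "items": "milk,BREAD,sugar,eggs"},
    {"customer": "c3", "items": ""},  # empty basket, dropped during cleaning
    {"customer": "c4", "items": "milk,bread,butter"},
]


def clean(record):
    """Normalise item names: trim whitespace, lower-case, drop blanks."""
    items = {part.strip().lower() for part in record["items"].split(",")}
    items.discard("")
    return items


def encode(baskets):
    """Encode each basket as a 0/1 row over a sorted item vocabulary."""
    vocabulary = sorted(set().union(*baskets))
    rows = [[1 if item in basket else 0 for item in vocabulary] for basket in baskets]
    return vocabulary, rows


# Step 1: clean the selected records and skip baskets that end up empty.
baskets = [b for b in (clean(r) for r in raw_records) if b]
# Step 2: encode the baskets into a form a mining algorithm can consume.
vocabulary, rows = encode(baskets)

print(vocabulary)
for row in rows:
    print(row)
```

Each output row is one basket in Boolean form, which is the kind of input an association algorithm typically expects.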
12
Data Mining Process:
Evaluation of Patterns and Implications
- Draw implications from the discovered patterns.
- Evaluate which experiments to carry out next, which hypotheses to formulate, or which consequences to draw in the process of knowledge discovery.
13
Data Mining: Some Techniques
There are various techniques / algorithms used for Data Mining, including:
- Association
- Classification
- Sequential Patterns
- Patterns with Time Series
- Categorisation and Segmentation
14
Association Rules
- Association rules discover regular patterns within large data sets, such as the presence of two items within a group of tuples.
- These rules capture situations in which the presence of one item in a transaction is linked, with high probability, to the presence of another item.
15
Association Rules
The quality of an association rule can be measured precisely through two properties, SUPPORT and CONFIDENCE.
- SUPPORT is the percentage of transactions (or baskets) that contain both items A and B (A and B can each be a single item or a group of items).
- CONFIDENCE is the percentage of baskets containing both A and B among those containing A.
- A rule is considered interesting when its support and confidence reach chosen minimum values.
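Written as formulas, the two measures for a rule A ⇒ B look as follows. This is a sketch that assumes D denotes the set of all transactions and T a single transaction; both symbols are introduced here and are not used in the slides.

```latex
\[
\mathrm{support}(A \Rightarrow B)
  = \frac{\left|\{\, T \in D : A \cup B \subseteq T \,\}\right|}{|D|},
\qquad
\mathrm{confidence}(A \Rightarrow B)
  = \frac{\left|\{\, T \in D : A \cup B \subseteq T \,\}\right|}
         {\left|\{\, T \in D : A \subseteq T \,\}\right|}
\]
```

Confidence is therefore support(A ∪ B) divided by support(A), so it is only defined when at least one transaction contains A.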
16
Association – Example: Shopping Habits
Marketing Analyst: "Hmmm... which items are frequently purchased together by my customers?"

Shopping Baskets (Customer 1, Customer 2, Customer 3, ..., Customer n)
- milk + bread + cereal
- milk + bread + sugar + eggs
- milk + bread + butter
- milk + bread + butter

Boolean Representation
            milk  bread  sugar  butter  cereal  egg
Basket 1     1     1      0      0       1       0
Basket 2     1     1      1      0       0       1
Basket 3     1     1      0      1       0       0
Basket n     0     0      1      0       0       1
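The Boolean rows above can be mined directly for item pairs. The sketch below computes support and confidence for every ordered pair and keeps those reaching two thresholds; `MIN_SUPPORT` and `MIN_CONFIDENCE` are assumed values chosen for illustration, not figures from the module.

```python
from itertools import combinations

ITEMS = ["milk", "bread", "sugar", "butter", "cereal", "egg"]
# Rows copied from the Boolean representation (Basket 1, 2, 3 and n).
ROWS = ["110010", "111001", "110100", "001001"]

# Turn each 0/1 row back into the set of items it contains.
baskets = [{item for item, bit in zip(ITEMS, row) if bit == "1"} for row in ROWS]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent):
    """Among baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)


MIN_SUPPORT = 0.5      # assumed threshold
MIN_CONFIDENCE = 0.8   # assumed threshold

rules = []
for a, b in combinations(ITEMS, 2):
    for lhs, rhs in ((a, b), (b, a)):
        s = support({lhs, rhs})
        # The support check runs first, so confidence never divides by zero.
        if s >= MIN_SUPPORT and confidence({lhs}, {rhs}) >= MIN_CONFIDENCE:
            rules.append((lhs, rhs, s, confidence({lhs}, {rhs})))

for lhs, rhs, s, c in rules:
    print(f"{lhs} => {rhs}: support {s:.0%}, confidence {c:.0%}")
```

With these four rows, milk and bread appear together in 3 of 4 baskets (support 75%), and every basket containing milk also contains bread (confidence 100%).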
17
Association – Example
18
How Data Mining (DM) Improves Business?
Strategy 1: Placing milk and bread in close proximity may further encourage customers to purchase these items together in a single visit to the store!
19
How Data Mining (DM) Improves Business?
Strategy 2: Placing milk and bread at opposite ends of the store may entice customers who purchase these items to pick up other items along the way!
20
Strategy 3: Put these two items into a package at a reduced price!
21
Classification
- Assign a phenomenon to one of a set of predefined classes.
- A classifier is an algorithm that carries out classification.
- Classifiers are typically presented as decision trees, in which nodes are labelled with conditions that allow decision making.
Examples:
- Motor Insurance
- Health Insurance
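A decision-tree classifier of the kind described above can be sketched in a few lines. The tree below is hypothetical: the attributes (`age`, `car_type`), the conditions and the class labels are invented for a motor-insurance illustration and are not taken from the module.

```python
# Hypothetical decision tree for motor-insurance risk.
# An inner node is (condition, subtree_if_true, subtree_if_false);
# a leaf is simply the class label as a string.
TREE = (
    lambda policy: policy["age"] < 23,
    "high risk",
    (
        lambda policy: policy["car_type"] == "sports",
        "high risk",
        "low risk",
    ),
)


def classify(node, policy):
    """Walk from the root, following each condition, until a leaf label is reached."""
    while not isinstance(node, str):
        condition, if_true, if_false = node
        node = if_true if condition(policy) else if_false
    return node


print(classify(TREE, {"age": 20, "car_type": "saloon"}))
print(classify(TREE, {"age": 40, "car_type": "sports"}))
print(classify(TREE, {"age": 40, "car_type": "saloon"}))
```

Here the tree is written by hand. In Data Mining the tree is normally learned from past, already-classified records.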
22
Classification – Example
24
Sequential Patterns
- Discover patterns between events such that the presence of one set of items/objects is followed by another set in a database of events over a period of time.
- Detecting sequential patterns is equivalent to detecting associations among events that have certain temporal relationships (the time dimension).
Examples
- Understanding and analysing long-term customer buying behaviour
- Medical diagnosis
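One simple way to illustrate this idea is to count, across customers, how often one purchase is later followed by another. The sketch below uses hypothetical purchase histories and only looks at pairs of events; real sequential-pattern algorithms handle longer sequences and time windows.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each already ordered by time.
histories = {
    "c1": ["laptop", "mouse", "printer"],
    "c2": ["laptop", "printer"],
    "c3": ["mouse", "laptop", "printer"],
    "c4": ["laptop", "mouse"],
}


def followed_by_counts(histories):
    """Count each ordered pair (earlier, later) at most once per customer."""
    counts = Counter()
    for events in histories.values():
        # combinations() preserves input order, so the first element of
        # each pair always occurs before the second in the history.
        pairs = {(a, b) for a, b in combinations(events, 2) if a != b}
        counts.update(pairs)
    return counts


counts = followed_by_counts(histories)
# Support of a sequential pair: fraction of customers showing that order.
support = {pair: n / len(histories) for pair, n in counts.items()}

for (earlier, later), s in sorted(support.items(), key=lambda kv: -kv[1]):
    print(f"{earlier} -> {later}: {s:.0%} of customers")
```

Unlike an association rule, the pair (laptop, printer) is counted separately from (printer, laptop), because the order of events matters.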
25
Patterns with Time Series
- Discover links between two time-dependent sets of data, based on the degree of similarity between the patterns that the two time series exhibit.
- Similarities can be detected at specific positions within the time series.
Examples
- Stock market movement (compare market performance of Oct 2001 with Oct 2007)
- New home owners’ buying patterns within two months of purchase
- Product selling patterns in different seasons
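As a rough illustration of "degree of similarity", the sketch below scores two series with the Pearson correlation coefficient and slides a short pattern along a longer series to find where it matches best. Pearson correlation is one possible similarity measure chosen here for simplicity, and all the numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def best_offset(pattern, series):
    """Position in series where a window of len(pattern) correlates best with pattern."""
    offsets = range(len(series) - len(pattern) + 1)
    return max(offsets, key=lambda i: pearson(pattern, series[i:i + len(pattern)]))


# Hypothetical values: the second series has the same shape at twice the scale.
october_a = [10, 12, 11, 15, 14]
october_b = [20, 24, 22, 30, 28]
print(round(pearson(october_a, october_b), 3))

# The pattern reappears inside a longer series, starting at index 3.
longer = [5, 5, 5, 10, 12, 11, 15, 14, 5]
print(best_offset(october_a, longer))
```

Correlation ignores scale, so two periods with the same shape but different absolute levels still count as similar.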
26
Categorisation & Segmentation
- Categorisation is the process of partitioning a given collection of events or items into a set of segments/clusters whose members share some common properties.
- Segments/clusters may be predefined, or may be determined during the process of categorisation.
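The two cases, predefined segments and segments determined during categorisation, can be contrasted in a short sketch. The purchase amounts, the 50.0 cut-off and the starting centroids are all assumed values, and plain k-means on a single attribute is used only as one simple example of a clustering method.

```python
def predefined_segment(amount):
    """Segments fixed in advance; the 50.0 cut-off is an assumed value."""
    return "low spender" if amount < 50.0 else "high spender"


def kmeans_1d(values, centroids, iterations=10):
    """Discover segments with plain k-means on one numeric attribute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign the value to the cluster with the closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (unchanged if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical purchase amounts per customer.
amounts = [12, 15, 14, 80, 85, 90]

print([predefined_segment(a) for a in amounts])

centroids, clusters = kmeans_1d(amounts, centroids=[12.0, 90.0])
print(clusters)
```

In the first case the analyst decides the boundaries; in the second the boundaries emerge from the data.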
27
Categorisation & Segmentation
Examples
- Classification of customer profiles: by frequency of visits, types of financing used, amount of purchase, etc.
- Demographic information: age, income group, place of residence, buying habits, etc.
- Planning store promotions and advertisements, planning seasonal marketing strategies, planning additional stores.
28
Other Data Mining Approaches
29
Typical Application Areas of DM
- Finance and Banking
- Retail and Sales
- Credit Card Operations
- Medical Diagnoses and Healthcare
- Insurance
- Others
30
Integrated DM Environment
- To maximise their potential and performance, Data Mining tools must be fully integrated with the Data Warehouse environment as well as with flexible, interactive business analysis tools.
- OLAP (On-Line Analytical Processing) enables a more sophisticated end-user business model to be applied when navigating the Data Warehouse.
- The Data Mining server can be integrated with the Data Warehouse and the OLAP server.
- This integration enables operational decisions to be directly implemented and tracked.
- As the warehouse grows with new decisions and results, the organisation can continually mine best practices and apply them to future decisions.
31
Integrated DM Environment
32
Thank you!!! Questions are WELCOME