IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1, 2004 Knowledge-Driven Business Intelligence Systems: Part I Week 10 Dr. Jocelyn San Pedro School of Information Management & Systems Monash University

Lecture Outline
• Knowledge-Driven BIS
• Knowledge-Driven BIS Technologies
• Data mining
• Data Mining Techniques

Learning Objectives
At the end of this lecture, students will:
• have a better understanding of knowledge-driven business intelligence systems
• understand some data mining techniques used in knowledge-driven business intelligence systems
• understand some data mining applications

Knowledge-Driven BIS
Information systems that provide BI through access to and manipulation of predictive/descriptive models and/or knowledge bases (containing experts' domain knowledge).
• Predictive models: used to forecast explicit values based on patterns determined from known results
• Descriptive models: describe patterns in existing data and are generally used to create meaningful subgroups such as demographic clusters
• Knowledge base: a collection of organised facts, rules and procedures

Predictive Models
Predictive models can provide answers to questions such as:
• Which products should be promoted to a particular customer?
• What is the probability that a certain customer will respond to a planned promotion?
• Which securities will be most profitable to buy or sell during the next trading session?
• What is the likelihood that a certain customer will default or pay back on schedule?
• What is the appropriate medical diagnosis for this patient?

Descriptive Models
Sample demographic clusters/subgroups:
• Men who buy diapers also buy beer
• People who buy scuba gear take Australian vacations
• People who purchase skim milk also tend to buy whole wheat bread
• Customers who responded to a particular offer are likely to respond to a similar offer

Knowledge-Driven BIS Technologies
• Data Mining
• Data Visualisation
[Figure: Data mining positioning]

IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1, Data Mining  Set of activities used to find new, hidden, or unexpected patterns in the data  Process of using raw data to infer business relationships  Collection of powerful data analysis techniques intended to assist in analysing extremely large datasets Marakas, 2002  Process of extracting knowledge hidden from large volumes of raw data

Data Mining Techniques
Classification: discover rules that define whether an item or event belongs to a particular subset or class of data.
• Involves building a model, then using it to predict classifications
• e.g. matching buyer attributes with product attributes to predict which customers are likely to buy a particular product next month → a targeted promotional contact or mailing list

Example: Using Decision Trees to Predict Classifications (ALICE d'ISoft)
A credit officer wishes to identify customers who had trouble paying back their loans.
[Figure: Parent node of the tree, showing the number of customers in the database; N: number and % of customers who had trouble paying back the loan; Y: number and % of customers who had no trouble paying back the loan; and a chart of the success rate Y and failure rate N]

Example: Using Decision Trees to Predict Classifications (ALICE d'ISoft)
Split the records according to the most discriminating attribute: housing type.

Example Classification Rule
People who rent their home and earn more than 7853 Francs have an 86% success rate.
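
The slides do not say which criterion ALICE d'ISoft uses to find the "most discriminating attribute". As a minimal sketch, assuming Gini impurity as the splitting criterion (a common choice in decision-tree learners, swapped in here purely for illustration), the Python below picks the attribute whose split most reduces impurity. The loan records and the attribute names `housing`, `income` and `repaid` are hypothetical, not the data behind the example.

```python
from collections import Counter

# Hypothetical loan records (illustrative only, not the ALICE example data).
RECORDS = [
    {"housing": "rent", "income": "high", "repaid": "Y"},
    {"housing": "rent", "income": "high", "repaid": "Y"},
    {"housing": "rent", "income": "low", "repaid": "N"},
    {"housing": "own", "income": "high", "repaid": "Y"},
    {"housing": "own", "income": "low", "repaid": "Y"},
    {"housing": "other", "income": "low", "repaid": "N"},
    {"housing": "other", "income": "high", "repaid": "N"},
    {"housing": "own", "income": "high", "repaid": "Y"},
]

def gini(rows, target="repaid"):
    """Gini impurity of the class labels in rows (0.0 means a pure node)."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(rows, attribute, target="repaid"):
    """Weighted Gini impurity of the child nodes produced by splitting on attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r)
    n = len(rows)
    return sum(len(g) / n * gini(g, target) for g in groups.values())

def most_discriminating(rows, attributes, target="repaid"):
    """Attribute whose split gives the largest drop in impurity from the parent node."""
    parent = gini(rows, target)
    return max(attributes, key=lambda a: parent - split_impurity(rows, a, target))

print(most_discriminating(RECORDS, ["housing", "income"]))  # prints "housing"
```

A full tree learner would repeat this choice recursively on each child node until the nodes are pure enough, which is how a rule such as "rents and earns more than 7853 Francs" emerges as a path from the root to a leaf.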

Data Mining Techniques
Association (or link analysis): search all details or transactions from operational systems for patterns with a high probability of repetition.
• Results in the development of an associative algorithm that correlates one set of events or items with another set of events or items
• e.g. association rules or patterns:
  • 83% of all records that contain items A, B and C also contain items D and E
  • 83% is the confidence factor
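
The confidence factor is a simple ratio: of the records that contain every left-hand item, the share that also contain every right-hand item. A minimal sketch follows; the eight records are hypothetical and were chosen so that the rule {A, B, C} → {D, E} reproduces the 83% figure.

```python
# Hypothetical transaction records, each a set of items (illustrative only).
RECORDS = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E", "G"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "F"},
    {"A", "D", "E"},
    {"B", "F"},
]

def confidence(records, lhs, rhs):
    """Share of records containing all lhs items that also contain all rhs items."""
    lhs, rhs = set(lhs), set(rhs)
    matching = [r for r in records if lhs <= r]
    if not matching:
        return 0.0  # the rule's left-hand side never occurs
    return sum(1 for r in matching if rhs <= r) / len(matching)

print(f"{confidence(RECORDS, {'A', 'B', 'C'}, {'D', 'E'}):.0%}")  # prints "83%"
```

Six records contain A, B and C, and five of those also contain D and E, giving 5/6 ≈ 83%.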

Data Mining Techniques
Another example of link analysis:
• Market basket analysis: analysing the products contained in a purchaser's basket and then using an associative rule to compare hundreds of thousands of baskets
  • 29% of the time that the brand X blender is sold, the customer also buys a set of kitchen tumblers
  • 68% of the time that a customer buys beverages, the customer also buys pretzels
  → Determine the location and content of promotional or end-of-aisle displays

IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1, Market Basket Analysis  This is the most widely used and, in many ways, most successful data mining algorithm.  It essentially determines what products people purchase together.  Stores can use this information to place these products in the same area.  Direct marketers can use this information to determine which new products to offer to their current customers.  Inventory policies can be improved if reorder points reflect the demand for the complementary products. Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

Association Rules for Market Basket Analysis
Rules are written in the form "left-hand side implies right-hand side", for example:
Yellow Peppers IMPLIES Red Peppers, Bananas, Bakery
To make effective use of a rule, three numeric measures about that rule must be considered: (1) support, (2) confidence and (3) lift.
(Marakas, 2002)

Measures of Predictive Ability
[Figure: LEFT/RIGHT basket diagrams for support, confidence and lift]
• Support refers to the percentage of baskets where the rule was true (both the left- and right-hand products were present).
• Confidence measures what percentage of baskets that contained the left-hand product also contained the right-hand product.
• Lift measures how much more frequently the left-hand item is found with the right than without the right.
(Marakas, 2002)
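
The three measures can be written as ratios of basket counts. The formulation below is the common textbook one, where N is the total number of baskets and n(X) is the number of baskets containing every item in itemset X. The lift formula is the standard "confidence divided by the baseline frequency of the right-hand side" form; treat it as an assumption, since it may differ in detail from the verbal definition on the slide.

```latex
\text{support}(L \Rightarrow R) = \frac{n(L \cup R)}{N}
\qquad
\text{confidence}(L \Rightarrow R) = \frac{n(L \cup R)}{n(L)}
\qquad
\text{lift}(L \Rightarrow R) = \frac{\text{confidence}(L \Rightarrow R)}{n(R)/N}
                              = \frac{N \, n(L \cup R)}{n(L) \, n(R)}
```

As a worked check with the five convenience-store transactions later in this lecture (N = 5; pizza appears in 2 baskets, cola in 3, and both together in 2), the rule Pizza IMPLIES Cola has support 2/5 = 0.4, confidence 2/2 = 1.0 and lift (5 × 2)/(2 × 3) ≈ 1.67. A lift above 1 means the right-hand item turns up with the left-hand item more often than it does in baskets generally.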

An Example
Rules:
• Green Peppers IMPLIES Bananas
• Red Peppers IMPLIES Bananas
• Yellow Peppers IMPLIES Bananas
[Table: Lift, Support and Confidence for each rule]
• The confidence suggests that people buying any kind of pepper also buy bananas.
• Green peppers sell in about the same quantities as red or yellow peppers, but are not as predictive.
(Marakas, 2002)

Market Basket Analysis Methodology
• We first need a list of transactions and what was purchased. This is easily obtained these days from scanning cash registers.
• Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the others.
• The diagonal of the table shows how often each product is purchased in any combination, and the off-diagonal cells show how often each pair of products was bought together.
(Marakas, 2002)

A Convenience Store Example
Consider the following simple example of five transactions at a convenience store:
• Transaction 1: frozen pizza, cola, milk
• Transaction 2: milk, potato chips
• Transaction 3: cola, frozen pizza
• Transaction 4: milk, pretzels
• Transaction 5: cola, pretzels
These need to be cross-tabulated and displayed in a table.
(Marakas, 2002)

A Convenience Store Example
Product bought | Pizza also | Milk also | Cola also | Chips also | Pretzels also
Pizza          | 2          | 1         | 2         | 0          | 0
Milk           | 1          | 3         | 1         | 1          | 1
Cola           | 2          | 1         | 3         | 0          | 1
Chips          | 0          | 1         | 0         | 1          | 0
Pretzels       | 0          | 1         | 1         | 0          | 2
• Pizza and cola sell together more often than any other combination: a cross-marketing opportunity?
• Milk sells well with everything; people probably come here specifically to buy it.
(Marakas, 2002)
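
The cross-tabulation step can be sketched in a few lines of Python, using the five transactions from the slide. The function name `cross_tabulate` is hypothetical; it simply counts, for every basket, each product on the diagonal and each pair of products in both off-diagonal cells, which reproduces the table above.

```python
from itertools import combinations

PRODUCTS = ["Pizza", "Milk", "Cola", "Chips", "Pretzels"]

# The five convenience-store transactions from the slide.
TRANSACTIONS = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]

def cross_tabulate(transactions, products):
    """Square co-occurrence table: diagonal = baskets containing the product,
    off-diagonal = baskets containing both products."""
    index = {p: i for i, p in enumerate(products)}
    table = [[0] * len(products) for _ in products]
    for basket in transactions:
        items = [p for p in basket if p in index]  # ignore products not analysed
        for p in items:
            table[index[p]][index[p]] += 1
        for a, b in combinations(items, 2):
            table[index[a]][index[b]] += 1  # the table is symmetric,
            table[index[b]][index[a]] += 1  # so fill both cells
    return table

TABLE = cross_tabulate(TRANSACTIONS, PRODUCTS)
for name, row in zip(PRODUCTS, TABLE):
    print(f"{name:<10}" + "".join(f"{n:>5}" for n in row))
```

With thousands of products the full table grows quickly, which is one reason the analyst first chooses a short list of products to analyse.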

Limitations of Market Basket Analysis
• A large number of real transactions is needed for an effective basket analysis, but the data's accuracy is compromised if the products do not all occur with similar frequency.
• The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).
(Marakas, 2002)

Market Basket Analysis in PolyAnalyst
[Figure: groups of products that sell well together; association rules]

HealthCare Fraud Example
Market basket analysis combined with summary statistics reveals providers sharing a large number of patients → potential provider fraud.
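
One way to read this example in basket terms is to treat each patient as a "basket" of providers and count, for every pair of providers, how many patients they share. The sketch below does that with hypothetical claims data; the provider and patient identifiers, the function names and the threshold of 3 are all illustrative assumptions, not details from the slide.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (provider, patient) claim records, illustrative only.
CLAIMS = [
    ("P1", "pat1"), ("P2", "pat1"),
    ("P1", "pat2"), ("P2", "pat2"),
    ("P1", "pat3"), ("P2", "pat3"), ("P3", "pat3"),
    ("P3", "pat4"),
]

def shared_patients(claims):
    """Number of distinct patients shared by each pair of providers."""
    patients_by_provider = defaultdict(set)
    for provider, patient in claims:
        patients_by_provider[provider].add(patient)
    return {
        (a, b): len(patients_by_provider[a] & patients_by_provider[b])
        for a, b in combinations(sorted(patients_by_provider), 2)
    }

def flag_pairs(claims, threshold):
    """Provider pairs sharing at least `threshold` patients, for manual review."""
    return {pair: n for pair, n in shared_patients(claims).items() if n >= threshold}

print(flag_pairs(CLAIMS, threshold=3))  # prints {('P1', 'P2'): 3}
```

A flagged pair is only a lead: the summary statistics mentioned on the slide (for example, each provider's total patient count) are what help decide whether the overlap is unusual.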

Data Mining Techniques
Sequencing or time-series analysis: techniques that relate events in time.
• Prediction of interest rate fluctuations or stock performance based on a series of preceding events
• e.g. buying sequence: parents buy promotional toys associated with a particular movie within 2 weeks after renting the movie → a flyer campaign for the promotional toys should be linked to customer lists created as a result of movie rentals
• A sequence of customer purchases → a catalogue of specific product types can be target-mailed to the customer
(Marakas, 2002)
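
What sequencing adds over association is a time window. As a minimal sketch of the movie-and-toy example, the Python below finds customers whose toy purchase falls within 14 days after their first rental of the movie. The event log, the event labels `rent:MovieX` and `buy:MovieX-toy`, and the function name are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical event log: (customer_id, event_date, event), illustrative only.
EVENTS = [
    ("c1", date(2004, 3, 1), "rent:MovieX"),
    ("c1", date(2004, 3, 9), "buy:MovieX-toy"),
    ("c2", date(2004, 3, 2), "rent:MovieX"),
    ("c2", date(2004, 4, 20), "buy:MovieX-toy"),
    ("c3", date(2004, 3, 5), "rent:MovieX"),
    ("c3", date(2004, 3, 6), "buy:MovieX-toy"),
    ("c4", date(2004, 3, 7), "rent:MovieX"),
]

def followed_within(events, first, then, window):
    """Return (customers who did `first`, customers who did `then`
    no later than `window` after their earliest `first`)."""
    first_seen = {}
    for cust, day, ev in sorted(events, key=lambda e: e[1]):
        if ev == first:
            first_seen.setdefault(cust, day)  # keep the earliest occurrence
    hits = set()
    for cust, day, ev in events:
        start = first_seen.get(cust)
        if ev == then and start is not None and start <= day <= start + window:
            hits.add(cust)
    return set(first_seen), hits

renters, hits = followed_within(EVENTS, "rent:MovieX", "buy:MovieX-toy", timedelta(days=14))
print(f"{len(hits)}/{len(renters)} renters bought the toy within 14 days")  # prints "2/4 ..."
```

The customers in `hits` are the kind of list a flyer campaign could be linked to; customer c2 bought the toy too, but outside the window.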

Association and Sequencing
Association and sequencing tools analyse data to discover rules that identify patterns of behaviour. An association tool will find rules such as:
• When people buy diapers they also buy beer 50 percent of the time.
A sequencing technique is very similar to an association technique, but it adds time to the analysis and produces rules such as:
• People who have purchased a VCR are three times more likely to purchase a camcorder in the period two to four months after the VCR was purchased.

Association and Sequencing
Example in care management (procedure interactions and pharmaceutical interactions):
• Patients who are taking drugs A, B and C are two and a half times more likely to also be taking drug D.
• Patients receiving procedure X from Doctor Y are three times less likely to get infection Z.

Association and Sequencing
Example in the financial industry:
• The prices of stocks in industry Q are 1.8 times more likely to close up one day after stocks in industry R closed down.

Association and Sequencing
Example in fraud detection in telecommunications and insurance:
• International credit card calls longer than three minutes originating in area code 555 between 1:00 AM and 3:00 AM are three times more likely to go uncollected.
• Accident claims involving soft tissue trauma where attorney P represents the claimant are twice as likely to be fraudulent.

Data Mining Techniques
Clustering: a technique for creating partitions so that all members of each set are similar according to some metric or set of metrics.
• e.g. credit card purchase data:
  • Cluster 1: business-issued gold card, meals charged on weekdays, mean values greater than $250
  • Cluster 2: personal platinum card, meals charged on weekends, mean value $175, bottle of wine charged more than 65% of the time
(Marakas, 2002)
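
The slide does not name a clustering algorithm. As one common choice, swapped in here for illustration, the sketch below runs plain k-means on hypothetical per-cardholder features: mean meal value (divided by 100 so it is on a scale comparable to the second feature) and the share of meals charged on weekends. The data, the scaling and the naive "first k points" initialisation are all assumptions.

```python
import math

# Hypothetical cardholders: (mean meal value in $, share of meals on weekends).
RAW = [(260, 0.10), (310, 0.05), (280, 0.15), (170, 0.90), (180, 0.85), (175, 0.95)]
POINTS = [(value / 100, weekend) for value, weekend in RAW]  # crude rescaling

def kmeans(points, k, iterations=20):
    """Plain k-means with Euclidean distance; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(POINTS, k=2)
print(labels)
```

On this toy data the three high-value, weekday cardholders end up in one cluster and the three weekend diners in the other. Real tools typically use better initialisation and stop when assignments no longer change, rather than after a fixed number of iterations.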

Clustering Example
Identifying natural clusters of patient populations

Current Limitations and Challenges to Data Mining
Despite its potential power and value, data mining is still a new field. Some things that have so far limited its advancement are:
• Identification of missing information: not all knowledge gets stored in a database
• Data noise and missing values: future systems need better ways to handle these
• Large databases and high dimensionality: future applications need ways to partition data into more manageable chunks
(Marakas, 2002)

Summary
• Business intelligence systems with data mining tools can find hidden patterns in large datasets and use these patterns to turn data into actionable information.
• BIS using data mining tools need data visualisation tools to present such hidden patterns to the end user.
• Hidden patterns, when placed in the hands of decision makers, become actionable information or business intelligence.

References
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall (or other editions).
Power, D. (2002) Decision Support Systems: Concepts and Resources for Managers. Quorum Books.
Free online resource: Data Mining booklet

Questions?
School of Information Management and Systems, Monash University
T1.28, T Block, Caulfield Campus