IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1, 2004 Knowledge-Driven Business Intelligence Systems: Part I Week 10 Dr. Jocelyn San Pedro School of Information.


1 IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1, 2004 Knowledge-Driven Business Intelligence Systems: Part I Week 10 Dr. Jocelyn San Pedro School of Information Management & Systems Monash University

2 Lecture Outline
- Knowledge-Driven BIS
- Knowledge-Driven BIS Technologies
- Data mining
- Data Mining Techniques

3 Learning Objectives
At the end of this lecture, students will:
- Have a better understanding of knowledge-driven business intelligence systems
- Have an understanding of some data mining techniques used in knowledge-driven business intelligence systems
- Have an understanding of some data mining applications

4 Knowledge-driven BIS
Information systems that provide BI through access to and manipulation of predictive/descriptive models and/or knowledge bases (containing experts' domain knowledge)
- Predictive models – used to forecast explicit values based on patterns determined from known results
- Descriptive models – describe patterns in existing data and are generally used to create meaningful subgroups such as demographic clusters
- Knowledge base – a collection of organised facts, rules and procedures

5 Predictive models
Predictive models can provide answers to questions like:
- Which products should be promoted to a particular customer?
- What is the probability that a certain customer will respond to a planned promotion?
- Which securities will be most profitable to buy or sell during the next trading session?
- What is the likelihood that a certain customer will default or pay back on schedule?
- What is the appropriate medical diagnosis for this patient?

6 Descriptive models
Sample demographic clusters/subgroups:
- Men who buy diapers also buy beer
- People who buy scuba gear take Australian vacations
- People who purchase skim milk also tend to buy whole wheat bread
- Customers who responded to a particular offer are likely to respond to a similar offer

7 Knowledge-Driven BIS Technologies
- Data Mining
- Data Visualisation
[Figure: Data mining positioning. Source: http://www.redbooks.ibm.com/redbooks/pdfs/sg245252.pdf]

8 Data Mining
- Set of activities used to find new, hidden, or unexpected patterns in the data
- Process of using raw data to infer business relationships
- Collection of powerful data analysis techniques intended to assist in analysing extremely large datasets (Marakas, 2002)
- Process of extracting knowledge hidden in large volumes of raw data (http://www.megaputer.com/dm/dm101.php3)

9 Data Mining Techniques
Classification – discover rules that define whether an item or event belongs to a particular subset or class of data
- Involves building a model, then predicting classifications
- e.g. matching buyer attributes with product attributes
- predict customers likely to buy a particular product next month
- targeted promotional contact or mailing list

10 Example: Using Decision Trees to Predict Classifications - ALICE d'ISoft
A credit officer wishes to identify customers who had trouble paying back their loans.
[Figure: parent node of the decision tree. Annotations: number of customers in the database; N: number and % of customers who had trouble paying back the loan; Y: number and % of customers who had no trouble paying back the loan; graphical chart representing success rate Y and failure rate N.]
Source: http://www.alice-soft.com/html/tech_dt.htm

11 Example: Using Decision Trees to Predict Classifications - ALICE d'ISoft
Split the records according to the most discriminating attribute: housing type.
Source: http://www.alice-soft.com/html/tech_dt.htm

12 Example Classification Rule
People who rent their home and earn more than 7853 Francs have an 86% success rate.
Source: http://www.alice-soft.com/html/tech_dt.htm
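The slides say the records are split on the "most discriminating attribute" but do not say how ALICE measures that. The sketch below assumes Gini impurity as the criterion, which is one common choice, and uses a handful of made-up loan records rather than the ALICE data. The names `records`, `gini` and `most_discriminating` are illustrative only.

```python
from collections import Counter

# Hypothetical loan records: (housing type, income band, repaid without trouble?).
# These values are invented for illustration; they are not the ALICE data.
records = [
    ("rent", "high", "Y"), ("rent", "high", "Y"), ("rent", "low", "N"),
    ("own", "high", "Y"), ("own", "low", "Y"), ("own", "low", "Y"),
    ("family", "low", "N"), ("family", "high", "N"),
]
ATTRIBUTES = {"housing": 0, "income": 1}  # attribute name -> column index
LABEL = 2                                 # column holding the Y/N outcome


def gini(rows):
    """Gini impurity of the Y/N labels in rows (0.0 means a pure node)."""
    if not rows:
        return 0.0
    counts = Counter(row[LABEL] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def weighted_gini_after_split(rows, column):
    """Average impurity of the child nodes, weighted by child size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def most_discriminating(rows):
    """Return the attribute whose split leaves the purest child nodes."""
    return min(
        ATTRIBUTES,
        key=lambda name: weighted_gini_after_split(rows, ATTRIBUTES[name]),
    )


best_attribute = most_discriminating(records)
print(best_attribute)
```

With these invented records the housing split leaves purer child nodes than the income split, so it is chosen first, mirroring the housing-type split on slide 11.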

13 Data Mining Techniques
Association (or link analysis) – search all details or transactions from operational systems for patterns with a high probability of repetition
- Results in the development of an associative algorithm that correlates one set of events or items with another set of events or items
- Example of an association rule or pattern: 83% of all records that contain items A, B, C also contain items D and E
- The 83% is the confidence factor of the rule

14 Data Mining Techniques
Another example of link analysis:
- Market basket analysis – analysing the products contained in a purchaser’s basket and then using an associative rule to compare hundreds of thousands of baskets
- 29% of the time that the brand X blender is sold, the customer also buys a set of kitchen tumblers
- 68% of the time that a customer buys beverages, the customer also buys pretzels
Such rules help determine the location and content of promotional or end-of-aisle displays.

15 Market Basket Analysis
- This is the most widely used and, in many ways, most successful data mining algorithm.
- It essentially determines what products people purchase together.
- Stores can use this information to place these products in the same area.
- Direct marketers can use this information to determine which new products to offer to their current customers.
- Inventory policies can be improved if reorder points reflect the demand for the complementary products.
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

16 Association Rules for Market Basket Analysis
Rules are written in the form “left-hand side implies right-hand side”. An example is:
Yellow Peppers IMPLIES Red Peppers, Bananas, Bakery
To make effective use of a rule, three numeric measures about that rule must be considered: (1) support, (2) confidence and (3) lift.
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

17 Measures of Predictive Ability
- Support refers to the percentage of baskets where the rule was true (both left and right side products were present).
- Confidence measures what percentage of baskets that contained the left-hand product also contained the right.
- Lift measures how much more frequently the left-hand item is found with the right than without the right.
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall
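A minimal sketch of how the three measures could be computed from a list of baskets. The baskets are hypothetical, and lift is taken here as confidence divided by the overall share of baskets containing the right-hand side. That is a common formulation and an assumption on our part, since the slide describes lift only in words.

```python
def rule_measures(baskets, left, right):
    """Return (support, confidence, lift) for the rule left IMPLIES right."""
    left, right = set(left), set(right)
    n = len(baskets)
    n_left = sum(1 for b in baskets if left <= b)             # baskets with the left side
    n_right = sum(1 for b in baskets if right <= b)           # baskets with the right side
    n_both = sum(1 for b in baskets if (left | right) <= b)   # baskets where the rule holds

    support = n_both / n
    confidence = n_both / n_left if n_left else 0.0
    baseline = n_right / n  # how often the right side appears regardless of the left
    lift = confidence / baseline if baseline else 0.0
    return support, confidence, lift


# Hypothetical baskets, invented for illustration.
baskets = [
    {"yellow peppers", "bananas"},
    {"yellow peppers", "bananas", "bread"},
    {"yellow peppers"},
    {"bananas"},
    {"bananas", "bread"},
    {"bread"},
    {"yellow peppers", "bananas"},
    {"milk"},
]

support, confidence, lift = rule_measures(baskets, ["yellow peppers"], ["bananas"])
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.2f}")
```

A lift above 1 suggests the right-hand item turns up more often in baskets with the left-hand item than it does in baskets overall.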

18 An Example
- The confidence suggests people buying any kind of pepper also buy bananas.
- Green peppers sell in about the same quantities as red or yellow, but are not as predictive.

Rule                             Lift   Support (%)   Confidence (%)
Green Peppers IMPLIES Bananas    1.37       3.77          85.96
Red Peppers IMPLIES Bananas      1.43       8.58          89.47
Yellow Peppers IMPLIES Bananas   1.17      22.12          73.09

Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall
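As a rough consistency check on the figures on this slide, suppose lift equals confidence divided by the overall share of baskets containing bananas (an assumed definition, not one stated in the source). Dividing each confidence by its lift should then give roughly the same banana share for all three rules.

```python
# (lift, confidence %) pairs in the order the three rules appear on the slide.
rules = [(1.37, 85.96), (1.43, 89.47), (1.17, 73.09)]

# Under the assumed definition, confidence / lift is the share of all
# baskets that contain bananas, so the three results should roughly agree.
implied_banana_share = [round(confidence / lift, 1) for lift, confidence in rules]
print(implied_banana_share)
```

The three values land within about a quarter of a percentage point of each other, at roughly 62.5 to 62.7%, which is what the assumed definition predicts.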

19 Market Basket Analysis Methodology
- We first need a list of transactions and what was purchased. This is easily obtained these days from scanning cash registers.
- Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the others.
- The diagonal of the table shows how often a product is purchased in any combination, and the off-diagonal cells show which combinations were bought.
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

20 A Convenience Store Example
Consider the following simple example about five transactions at a convenience store:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
These need to be cross tabulated and displayed in a table.
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

21 A Convenience Store Example

Product bought   Pizza also   Milk also   Cola also   Chips also   Pretzels also
Pizza                2            1           2           0             0
Milk                 1            3           1           1             1
Cola                 2            1           3           0             1
Chips                0            1           0           1             0
Pretzels             0            1           1           0             2

- Pizza and Cola sell together more often than any other combo; a cross-marketing opportunity?
- Milk sells well with everything – people probably come here specifically to buy it.
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall
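The cross-tabulation can be reproduced with a few lines of code. This is a minimal sketch using the five transactions from the previous slide, with product names shortened to match the table. The function name `cross_tabulate` is our own.

```python
from itertools import combinations

# The five convenience-store transactions, using the table's short names.
transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]
products = ["Pizza", "Milk", "Cola", "Chips", "Pretzels"]


def cross_tabulate(transactions, products):
    """Count how often each product is bought, alone or with each other product."""
    table = {row: {col: 0 for col in products} for row in products}
    for basket in transactions:
        items = [p for p in products if p in basket]  # ignore products not analysed
        for item in items:
            table[item][item] += 1        # diagonal: total purchases of the product
        for a, b in combinations(items, 2):
            table[a][b] += 1              # off-diagonal: bought together
            table[b][a] += 1              # keep the table symmetric
    return table


table = cross_tabulate(transactions, products)
for row in products:
    print(f"{row:<9}" + "".join(f"{table[row][col]:>3}" for col in products))
```

The printed rows match the table above: the diagonal holds each product's total purchase count and the off-diagonal cells hold the pair counts.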

22 Limitations of Market Basket Analysis
- A large number of real transactions are needed to do an effective basket analysis, but the data’s accuracy is compromised if all the products do not occur with similar frequency.
- The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

23 Market Basket Analysis - PolyAnalyst
[Figure: Market basket analysis in PolyAnalyst, showing association rules and groups of products that sell well together.]
Source: http://www.megaputer.com/products/pa/algorithms/ba.php3

24 HealthCare Fraud Example
Market basket analysis combined with summary statistics reveals providers sharing a large number of patients, which points to potential provider fraud.
Source: http://www.megaputer.com

25 Data Mining Techniques
Sequencing or time-series analysis – techniques that relate events in time
- Prediction of interest rate fluctuations or stock performance based on a series of preceding events
- E.g. buying sequence: parents buy promotional toys associated with a particular movie within 2 weeks after renting the movie, so a flyer campaign for promotional toys should be linked to customer lists created as a result of movie rentals
- E.g. sequence of customer purchases: a catalogue of specific product types can be target-mailed to the customer
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

26 Association and Sequencing
Association and sequencing tools analyse data to discover rules that identify patterns of behaviour.
An association tool will find rules such as:
- When people buy diapers they also buy beer 50 percent of the time.
A sequencing technique is very similar to an association technique, but it adds time to the analysis and produces rules such as:
- People who have purchased a VCR are three times more likely to purchase a camcorder in the time period two to four months after the VCR was purchased.
Source: http://www.dbmsmag.com/9807m03.html
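To illustrate how a sequencing rule adds time to an association rule, the sketch below counts how many buyers of one product go on to buy another within a time window. The purchase log is invented, the function name `sequence_confidence` is our own, and "two to four months" is approximated as 60 to 120 days.

```python
from datetime import date

# Hypothetical purchase log: (customer, product, purchase date).
purchases = [
    ("c1", "VCR", date(2004, 1, 10)), ("c1", "camcorder", date(2004, 4, 1)),
    ("c2", "VCR", date(2004, 1, 15)), ("c2", "camcorder", date(2004, 1, 20)),
    ("c3", "VCR", date(2004, 2, 1)),
    ("c4", "VCR", date(2004, 2, 5)), ("c4", "camcorder", date(2004, 5, 1)),
    ("c5", "camcorder", date(2004, 3, 1)),
]


def sequence_confidence(purchases, first, then, min_days, max_days):
    """Share of customers who bought `first` and later bought `then`
    between min_days and max_days after their earliest `first` purchase."""
    first_dates = {}
    for customer, product, day in purchases:
        if product == first:
            first_dates[customer] = min(day, first_dates.get(customer, day))

    followed = set()
    for customer, product, day in purchases:
        if product == then and customer in first_dates:
            gap = (day - first_dates[customer]).days
            if min_days <= gap <= max_days:
                followed.add(customer)

    return len(followed) / len(first_dates) if first_dates else 0.0


share = sequence_confidence(purchases, "VCR", "camcorder", 60, 120)
print(share)
```

Comparing this share with the camcorder purchase rate among customers who never bought a VCR would give the "times more likely" figure quoted in the rule. That comparison is left out here to keep the sketch short.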

27 Association and Sequencing
Example in care management, procedure interactions and pharmaceutical interactions:
- Patients who are taking drugs A, B, and C are two and a half times more likely to also be taking drug D.
- Patients receiving procedure X from Doctor Y are three times less likely to get infection Z.
Source: http://www.dbmsmag.com/9807m03.html

28 Association and Sequencing
Example in the financial industry:
- The prices of stocks in industry Q are 1.8 times more likely to close up one day after stocks in industry R closed down.
Source: http://www.dbmsmag.com/9807m03.html

29 Association and Sequencing
Example in fraud detection in telecommunications and insurance:
- International credit card calls longer than three minutes originating in area code 555 between 1:00 AM and 3:00 AM are three times more likely to go uncollected.
- Accident claims involving soft tissue trauma where attorney P represents the claimant are twice as likely to be fraudulent.
Source: http://www.dbmsmag.com/9807m03.html

30 Data Mining Techniques
Clustering – technique for creating partitions so that all members of each set are similar according to some metric or set of metrics
- e.g., credit card purchase data
- Cluster 1: business-issued gold card, meals charged on weekdays, mean value greater than $250
- Cluster 2: personal platinum card, meals charged on weekends, mean value $175, bottle of wine charged more than 65% of the time
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall
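The slide does not name a clustering algorithm. As one possible illustration, the sketch below runs a bare-bones k-means, a common partitioning method chosen here as an assumption, on invented card data described by two metrics: mean meal charge in dollars and the percentage of meals charged on weekends.

```python
def kmeans(points, centroids, iterations=10):
    """Very small k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical cards: (mean meal charge in $, % of meals charged on weekends).
cards = [(260, 5), (300, 10), (280, 0), (175, 90), (160, 80), (190, 95)]

# Fixed starting centroids keep the example deterministic.
centroids, clusters = kmeans(cards, centroids=[cards[0], cards[3]])
print(centroids)
print(clusters)
```

With this data the two groups resemble the weekday high-spend and weekend lower-spend clusters on the slide. On real data the metrics would usually need rescaling so that no single one dominates the distance.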

31 Clustering - Example
[Figure: Identifying natural clusters of patient populations. Source: http://www.enee.umd.edu/medlab/papers/dcsThShort/thpaper1.html]

32 Clustering - Example
[Figure: Identifying natural clusters of patient populations. Source: http://www.enee.umd.edu/medlab/papers/dcsThShort/thpaper1.html]

33 Current Limitations and Challenges to Data Mining
Despite its potential power and value, data mining is still a new field. Some things that have so far limited advancement are:
- Identification of missing information – not all knowledge gets stored in a database
- Data noise and missing values – future systems need better ways to handle this
- Large databases and high dimensionality – future applications need ways to partition data into more manageable chunks
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall

34 Summary
- Business intelligence systems with data mining tools can find hidden patterns in large datasets, and use these patterns to turn data into actionable information
- BIS using data mining tools need data visualisation tools to present such hidden patterns to the end-user
- Hidden patterns, when placed in the hands of decision makers, become actionable information or business intelligence

35 References
Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall (or other editions)
Power, D. (2002) Decision Support Systems: Concepts and Resources for Managers, Quorum Books.
FREE online resource: Data Mining booklet http://www.twocrows.com/intro-dm.pdf

36 Questions?
Jocelyn.sanpedro@sims.monash.edu.au
School of Information Management and Systems, Monash University
T1.28, T Block, Caulfield Campus
9903 2735

