Decision Trees and Association Rules Prof. Sin-Min Lee Department of Computer Science.


Data Mining: A KDD Process. Data mining is the core of the knowledge discovery process, whose stages are: data cleaning, data integration (from databases into a data warehouse), selection of task-relevant data, data mining, and pattern evaluation.

Data Mining process model -DM

People and Computers. WHAT IS INTELLIGENCE? Cognitive scientists and AI researchers share a model of intelligent behavior: develop theories of human intelligence, program computer models, test the validity of the theories, and gain new knowledge from testing the models.

Topics in Artificial Intelligence: 1) Machine Vision 2) Robotics 3) Speech Processing 4) Natural Language Processing 5) Theorem Proving 6) General Problem Solving 7) Pattern Recognition 8) Game Playing

DEVELOPING AN EXPERT SYSTEM. The stages of development are iterative processes: identify the problem; identify the key concepts, both general and specific, and the relationships between them; determine which techniques will be used (knowledge representation, search methods, use of development tools, or adapting an existing expert system); implement a prototype; refine the prototype through testing; and verify that the expert system works correctly.

COMPONENTS OF AN EXPERT SYSTEM: USER INTERFACE. Communication is bi-directional: in its simplest form, the user describes the problem and the system responds with its recommendations. Additional functions let the user ask the system to explain its reasoning, and let the system request additional information. It is critically important that an expert system be easy to use; a natural-language front end helps.

COMPONENTS OF AN EXPERT SYSTEM: INFERENCE ENGINE. Knowledge is not intertwined with the control structure, so expert-system development time can be reduced by reusing the inference engine from one expert system with a different knowledge base. EMYCIN (Essential MYCIN) eliminated the need to develop a new inference engine.

(Figure residue: Fig. 39, process of resolution by breadth-first strategy — input clauses, resolvent clauses of depth 1, and resolvent clauses of depth 2, terminating in NIL. Fig. 40, resolution process using linear resolution, also terminating in NIL.)

Why Knowledge-based Navigation?

Human cognitive architecture shown as comprising three independent elements: long-term memory, cognitive processor, and working memory.

(Figures: performance plotted against level of experience — one case where performance is dependent on experience, and one where it is not.)

Modules of an Expert System. An expert system can be divided into three modules: 1. a knowledge base, 2. an inference engine, 3. a user interface.

Linda Fleming, 1990: COMPONENTS OF AN EXPERT SYSTEM — components of the knowledge base. The knowledge base covers the domain of the expert system and contains facts about the domain and heuristics for the domain.

Linda Fleming, 1990: COMPONENTS OF AN EXPERT SYSTEM. The user interacts through the user interface with the inference engine (rule interpreter and control strategy), which draws on the knowledge base (rules, frames, semantic nets, etc.) and the database or working memory (system status: initial states, present state, facts).

Search in State Spaces

Decision Trees A decision tree is a special case of a state-space graph. It is a rooted tree in which each internal node corresponds to a decision, with a subtree at these nodes for each possible outcome of the decision. Decision trees can be used to model problems in which a series of decisions leads to a solution. The possible solutions of the problem correspond to the paths from the root to the leaves of the decision tree.

Decision Trees Example: The n-queens problem. How can we place n queens on an n × n chessboard so that no two queens can capture each other? A queen can move any number of squares horizontally, vertically, and diagonally. In the figure, the possible target squares of the queen Q are marked with an x.

Let us consider the 4-queens problem. Question: How many possible configurations of 4 × 4 chessboards containing 4 queens are there? Answer: There are 16!/(12! × 4!) = (13 × 14 × 15 × 16)/(2 × 3 × 4) = 13 × 7 × 5 × 4 = 1820 possible configurations. Shall we simply try them out one by one until we encounter a solution? No, it is generally useful to think about a search problem more carefully and discover constraints on the problem’s solutions. Such constraints can dramatically reduce the size of the relevant state space.
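The count above is the binomial coefficient C(16, 4); a quick check of the arithmetic in Python:

```python
# Choosing 4 of the 16 squares on a 4x4 board for the queens
# gives C(16, 4) = 16! / (12! * 4!) placements.
from math import comb

total = comb(16, 4)
print(total)  # 1820
```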

Obviously, in any solution of the n-queens problem, there must be exactly one queen in each column of the board. Otherwise, the two queens in the same column could capture each other. Therefore, we can describe the solution of this problem as a sequence of n decisions: Decision 1: Place a queen in the first column. Decision 2: Place a queen in the second column.... Decision n: Place a queen in the n-th column.

Backtracking in Decision Trees. (Figure: the decision tree for the 4-queens problem — starting from the empty board, place the 1st queen, the 2nd queen, the 3rd queen, and the 4th queen, backtracking whenever no safe square remains.)
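The column-by-column backtracking search sketched in the figure can be written out as a short program; this is an illustrative sketch, and the function names are made up rather than taken from the slides:

```python
# Place one queen per column, pruning any placement that attacks
# an earlier queen, and backtrack when a column has no safe row.
def n_queens(n):
    solutions = []

    def safe(rows, row):
        # rows[c] is the row of the queen already placed in column c.
        col = len(rows)
        return all(r != row and abs(r - row) != abs(c - col)
                   for c, r in enumerate(rows))

    def place(rows):
        if len(rows) == n:               # one queen in every column: a solution
            solutions.append(tuple(rows))
            return
        for row in range(n):             # try each row in the next column
            if safe(rows, row):
                place(rows + [row])      # extend; returning here backtracks

    place([])
    return solutions

print(len(n_queens(4)))  # the 4-queens problem has 2 solutions
```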

Neural Network: many inputs and a single output; trained on signal and background samples; well understood and mostly accepted in HEP. Decision Tree: many inputs and a single output; trained on signal and background samples; used mostly in the life sciences and business.

Decision tree: Basic Algorithm. Initialize the top node to all examples. While impure leaves are available: select the next impure leaf L; find the splitting attribute A with maximal information gain; for each value of A, add a child to L.
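The greedy split-selection step above can be sketched as follows; the tiny dataset, attribute names, and function names are illustrative, not taken from the slides:

```python
# Information gain = entropy(parent) - weighted entropy of the children
# produced by splitting on an attribute; the algorithm picks the
# attribute with maximal gain.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    n = len(labels)
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

rows = [{"size": "med",   "shape": "brick"},
        {"size": "small", "shape": "wedge"},
        {"size": "small", "shape": "sphere"},
        {"size": "large", "shape": "wedge"}]
labels = ["yes", "no", "yes", "no"]

best = max(("size", "shape"), key=lambda a: info_gain(rows, labels, a))
print(best)  # "shape": it separates yes/no perfectly here (gain = 1 bit)
```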

Decision tree: Find a good split. Sufficient statistics to compute information gain: a count matrix for each attribute — outlook, humidity, temperature, windy. The gains shown on the slide are 0.25 bits, 0.16 bits, 0.03 bits, and 0.14 bits.

Decision trees: Simple depth-first construction needs the entire data set to fit in memory, making it unsuitable for large data sets; it needs to be “scaled up.”

Decision Trees

Planning Tool

Decision Trees Enable a business to quantify decision making Useful when the outcomes are uncertain Places a numerical value on likely or potential outcomes Allows comparison of different possible decisions to be made

Decision Trees Limitations: –How accurate is the data used in the construction of the tree? –How reliable are the estimates of the probabilities? –Data may be historical – does this data relate to real time? –Necessity of factoring in the qualitative factors – human resources, motivation, reaction, relations with suppliers and other stakeholders

Process

Advantages

Disadvantages

(Figure: trained decision tree — binned likelihood fit — limit.)

Decision Trees from Data Base

Ex Num  Size   Colour  Shape   Concept Satisfied
1       med    blue    brick   yes
2       small  red     wedge   no
3       small  red     sphere  yes
4       large  red     wedge   no
5       large  green   pillar  yes
6       large  red     pillar  no
7       large  green   sphere  yes

Choose target: Concept satisfied. Use all attributes except Ex Num.

Rules from Tree
IF (SIZE = large AND (SHAPE = wedge OR (SHAPE = pillar AND COLOUR = red))) OR (SIZE = small AND SHAPE = wedge) THEN NO
IF (SIZE = large AND ((SHAPE = pillar AND COLOUR = green) OR SHAPE = sphere)) OR (SIZE = small AND SHAPE = sphere) OR (SIZE = medium) THEN YES
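As a sanity check, the rules can be encoded and replayed over the seven examples in the table. Since the YES rule is the complement of the NO rule here, this sketch implements only the NO conditions and defaults to yes; the function and variable names are illustrative:

```python
# Encode the NO rule from the tree; everything else is classified yes.
def satisfied(size, colour, shape):
    no = ((size == "large" and (shape == "wedge" or
                                (shape == "pillar" and colour == "red")))
          or (size == "small" and shape == "wedge"))
    return "no" if no else "yes"

# The seven examples from the table: (size, colour, shape, concept).
examples = [
    ("med",   "blue",  "brick",  "yes"),
    ("small", "red",   "wedge",  "no"),
    ("small", "red",   "sphere", "yes"),
    ("large", "red",   "wedge",  "no"),
    ("large", "green", "pillar", "yes"),
    ("large", "red",   "pillar", "no"),
    ("large", "green", "sphere", "yes"),
]

for size, colour, shape, concept in examples:
    assert satisfied(size, colour, shape) == concept
print("all 7 examples reproduced")
```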

Association Rule. Used to find all rules in basket data (also called transaction data), analyzing how items purchased by customers in a shop are related. Discover all rules that have support greater than minsup and confidence greater than minconf, both specified by the user. Example of transaction data:
– CD player, music CD, music book
– CD player, music CD
– music CD, music book
– CD player

Association Rule. Let I = {i1, i2, …, im} be the total set of items and D a set of transactions, where one transaction d consists of a set of items, d ⊆ I. An association rule is X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅, with
support = (# of transactions containing X ∪ Y) / |D|
confidence = (# of transactions containing X ∪ Y) / (# of transactions containing X)

Association Rule. Example of transaction data:
– CD player, music CD, music book
– CD player, music CD
– music CD, music book
– CD player
Here I = {CD player, music CD, music book} and |D| = 4. The number of transactions containing both CD player and music CD is 2; the number containing CD player is 3. So: CD player ⇒ music CD (sup = 2/4, conf = 2/3).
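The support and confidence computations above can be sketched in a few lines of Python; the item names are simplified, and this is an illustration rather than code from the slides:

```python
# support(X) = fraction of transactions containing all items of X;
# confidence(X => Y) = #transactions containing X∪Y / #containing X.
def support(transactions, items):
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, x, y):
    return (sum(1 for t in transactions if (x | y) <= t)
            / sum(1 for t in transactions if x <= t))

# The four transactions from the slide.
D = [{"CD player", "music CD", "music book"},
     {"CD player", "music CD"},
     {"music CD", "music book"},
     {"CD player"}]

print(support(D, {"CD player", "music CD"}))       # 2/4 = 0.5
print(confidence(D, {"CD player"}, {"music CD"}))  # 2/3
```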

Association Rule How are association rules mined from large databases ? Two-step process:- –find all frequent itemsets –generate strong association rules from frequent itemsets

Association Rules: antecedent ⇒ consequent — if ⇒ then. Examples: beer ⇒ diaper (Walmart); bad economy ⇒ higher unemployment; higher unemployment ⇒ higher cost of unemployment benefits. Rules are associated with a population, support, and confidence.

Association Rules. Population: instances such as grocery store purchases. Support: the % of the population satisfying both antecedent and consequent. Confidence: the % of the time the consequent is true when the antecedent is true.

2. Association rules: Support. Every association rule has a support and a confidence. “The support is the percentage of transactions that demonstrate the rule.” Example: database with transactions (customer_#: item_a1, item_a2, …):
1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, …
4: 2, 1, 8.
support {8, 12} = 2 (or 50%: 2 of 4 customers)
support {1, 5} = 1 (or 25%: 1 of 4 customers)
support {1} = 3 (or 75%: 3 of 4 customers)

2. Association rules: Support. An itemset is called frequent if its support is equal to or greater than an agreed-upon minimal value, the support threshold. Adding to the previous example: with a threshold of 50%, the itemsets {8, 12} and {1} are called frequent.

2. Association rules: Confidence. Every association rule has a support and a confidence. An association rule is of the form X => Y: if someone buys X, he also buys Y. The confidence is the conditional probability that, given X present in a transaction, Y will also be present. By definition: confidence(X => Y) = support(X, Y) / support(X).

2. Association rules Confidence We should only consider rules derived from itemsets with high support, and that also have high confidence. “A rule with low confidence is not meaningful.” Rules don’t explain anything, they just point out hard facts in data volumes.

3. Example. Database with transactions (customer_#: item_a1, item_a2, …):
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf({5} => {8})? supp({5}) = 5, supp({8}) = 7, supp({5, 8}) = 4, so conf({5} => {8}) = 4/5 = 0.8, or 80%.

3. Example (same transaction database as above). Conf({5} => {8})? 80%. Done. Conf({8} => {5})? supp({5}) = 5, supp({8}) = 7, supp({5, 8}) = 4, so conf({8} => {5}) = 4/7 ≈ 0.57, or 57%.
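Both confidence computations can be replayed over the ten transactions as a sketch (the sets below copy the example database; `supp` counts transactions, as in the slide):

```python
# The ten transactions of the example database.
D = [{3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
     {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10}]

def supp(items):
    # Number of transactions containing every item of the itemset.
    return sum(1 for t in D if items <= t)

conf_5_8 = supp({5, 8}) / supp({5})   # 4/5 = 0.8
conf_8_5 = supp({5, 8}) / supp({8})   # 4/7, about 0.57
print(conf_5_8, round(conf_8_5, 2))
```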

3. Example. Conf({5} => {8})? 80%. Done. Conf({8} => {5})? 57%. Done. Rule {5} => {8} is more meaningful than rule {8} => {5}.

3. Example (same transaction database as above). Conf({9} => {3})? supp({9}) = 1, supp({3, 9}) = 1, so conf({9} => {3}) = 1/1 = 1.0, or 100%. OK?

3. Example. Conf({9} => {3}) = 100%. Done. Notice: high confidence, low support → rule {9} => {3} is not meaningful.

Association Rules. Population: {MS, MSA, MSB, MA, MB, BA}, where M = Milk, S = Soda, A = Apple, B = Beer.
Support(M ⇒ S) = 3/6: (MS, MSA, MSB) out of (MS, MSA, MSB, MA, MB, BA).
Confidence(M ⇒ S) = 3/5: (MS, MSA, MSB) out of (MS, MSA, MSB, MA, MB).