
National Centre for Agricultural Economics and Policy Research (NCAP), New Delhi Rajni Jain

Expert System: Major Constraint
Expert systems are automated systems that provide expert-like knowledge and are useful when it is difficult to obtain expert advice.
Developing an expert system is challenging because it is hard to build a complete rule base that can work like an expert, owing to:
– Non-availability of experts
– Scarcity of their time
– Cost of their time

DM: An Aid for Development of Expert Systems
Data mining (DM): extraction of potentially useful, meaningful and novel patterns from data using suitable algorithms.
Data mining tools and techniques can be used to extract rules and to reduce the time required for interactions with the domain experts.
Rules should never be used without validation by the experts.

Introduction
• Importance of Rules
• Rules directly from Decision Tables
• DM – DT – RS – Fuzzy – ANN

Decision Tree
Used to represent a sequence of decisions to arrive at a particular result.
[Diagram: a small example tree branching on Colour (White / Red) with Y / N outcomes]

Decision-Tree (ID3)
• A decision tree is a set of nodes and leaves, where each node tests the value of an attribute and branches on all its possible values.
• ID3 is one of the earliest and most popular decision-tree building algorithms.
• A statistical property called information gain is used to decide which attribute is best for the node under consideration.
• ID3 is a greedy algorithm: at each node, all attributes not yet used on the path from the root remain available for computation until the examples reach a leaf; the attribute used at the parent node is eliminated.

CLS (Concept Learning System)
• Step 1: If all instances in C are positive, create a YES node and halt.
  – If all instances in C are negative, create a NO node and halt.
  – Otherwise, select a feature F with values v1, ..., vn and create a decision node.
• Step 2: Partition the training instances in C into subsets C1, C2, ..., Cn according to their values of F.
• Step 3: Apply the algorithm recursively to each set Ci.

CLS contd.
• Note: in CLS, the trainer (the expert) decides which feature to select at each node. A minimal sketch of this recursion is given below.
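As an illustration (not part of the original slides), the CLS steps above can be sketched in Python; the feature-selection step is left as a pluggable function, mirroring the note that the trainer chooses the feature. Names such as `cls_build` and `select_feature` are illustrative.

```python
# A minimal, illustrative sketch of the CLS recursion.
# select_feature stands in for the trainer's (expert's) choice of feature.

def cls_build(instances, features, select_feature):
    """instances: list of (attribute_dict, label) pairs with labels 'yes'/'no'."""
    labels = {label for _, label in instances}
    if labels == {"yes"}:          # Step 1: all positive -> YES leaf
        return "YES"
    if labels == {"no"}:           # Step 1: all negative -> NO leaf
        return "NO"
    feature = select_feature(instances, features)     # expert-chosen feature
    node = {feature: {}}
    values = {attrs[feature] for attrs, _ in instances}
    for v in values:               # Step 2: partition on the feature's values
        subset = [(a, l) for a, l in instances if a[feature] == v]
        remaining = [f for f in features if f != feature]
        node[feature][v] = cls_build(subset, remaining, select_feature)  # Step 3: recurse
    return node
```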

ID3 Algorithm
• ID3 improves on CLS by adding a feature-selection heuristic.
• ID3 searches through the attributes of the training instances and selects the attribute that best separates the given examples. If that attribute perfectly classifies the training set, ID3 stops; otherwise it operates recursively on the n partitioned subsets (where n is the number of possible values of the attribute) to find their "best" attribute.
• The algorithm uses a greedy search: it picks the best attribute and never looks back to reconsider earlier choices.

Data Requirements for ID3
• Attribute-value description: the same attributes must describe each example, and each attribute must have a fixed number of values.
• Predefined classes: each example's class must already be defined; classes are not learned by ID3.
• Discrete classes
• Sufficient examples

Attribute Selection in ID3
• How does ID3 decide which attribute is best?
  – Information gain, a statistical property, is used.
• Information gain measures how well a given attribute separates the training examples into the target classes.
• The attribute with the highest information gain is selected.
• Entropy measures the amount of information (impurity) in a set of examples.

Entropy
Given a collection S with c possible classes:
Entropy(S) = Σ −p(I) log2 p(I)
where p(I) is the proportion of S belonging to class I, the sum Σ runs over the c classes, and log2 is the logarithm to base 2. Note that S is not an attribute but the entire sample set.

Example 1: Entropy
If S is a collection of 14 examples with 9 YES and 5 NO examples, then
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Notice that entropy is 0 if all members of S belong to the same class (the data is perfectly classified). For a two-class problem, entropy ranges from 0 ("perfectly classified") to 1 ("totally random").
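A short Python sketch (not from the slides) of the entropy formula above; it reproduces the 0.940 value for the 9 YES / 5 NO example.

```python
import math

def entropy(class_counts):
    """Entropy of a collection, given the count of examples in each class."""
    total = sum(class_counts)
    ent = 0.0
    for count in class_counts:
        if count:                     # 0 * log(0) is taken as 0
            p = count / total
            ent -= p * math.log2(p)
    return ent

print(entropy([9, 5]))                # ≈ 0.940 for 9 YES and 5 NO examples
```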

Information Gain
Gain(S, A), the information gain of example set S on attribute A, is defined as
Gain(S, A) = Entropy(S) − Σ (|Sv| / |S|) * Entropy(Sv)
where:
– the sum Σ is over each value v of attribute A
– Sv = subset of S for which attribute A has value v
– |Sv| = number of elements in Sv
– |S| = number of elements in S

Example 2: Information Gain
Suppose S is a set of 14 examples in which one of the attributes is wind speed. The values of Wind can be Weak or Strong. The classification of these 14 examples is 9 YES and 5 NO. For attribute Wind, suppose there are 8 occurrences of Wind = Weak and 6 occurrences of Wind = Strong. For Wind = Weak, 6 of the examples are YES and 2 are NO. For Wind = Strong, 3 are YES and 3 are NO. Therefore:
Entropy(S_weak) = −(6/8)*log2(6/8) − (2/8)*log2(2/8) = 0.811
Entropy(S_strong) = −(3/6)*log2(3/6) − (3/6)*log2(3/6) = 1.00
Gain(S, Wind) = Entropy(S) − (8/14)*Entropy(S_weak) − (6/14)*Entropy(S_strong)
              = 0.940 − (8/14)*0.811 − (6/14)*1.00 = 0.048
For each attribute the gain is calculated, and the attribute with the highest gain is used in the decision node.
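A small Python sketch (not from the slides) of the information-gain formula; with the Wind counts above it reproduces a gain of about 0.048.

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(parent_counts, splits):
    """splits: list of per-value class-count lists, e.g. [[6, 2], [3, 3]] for Weak/Strong."""
    total = sum(parent_counts)
    weighted = sum(sum(split) / total * entropy(split) for split in splits)
    return entropy(parent_counts) - weighted

# Wind example: S has 9 YES / 5 NO; Weak -> 6 YES / 2 NO, Strong -> 3 YES / 3 NO
print(information_gain([9, 5], [[6, 2], [3, 3]]))    # ≈ 0.048
```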

Example 3: ID3
• Suppose we want ID3 to decide whether the weather is amenable to playing baseball. Over the course of 2 weeks, data is collected to help ID3 build a decision tree (see the table that follows).
• The target classification is "should we play baseball?", which can be yes or no.
• The weather attributes are outlook, temperature, humidity, and wind speed. They can have the following values:
  – outlook = {sunny, overcast, rain}
  – temperature = {hot, mild, cool}
  – humidity = {high, normal}
  – wind = {weak, strong}

Day  Outlook   Temperature  Humidity  Wind    Play ball
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
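For illustration, a compact, self-contained Python sketch (not part of the original slides) of ID3 applied to the table above; on this data the attribute selected at the root should be Outlook.

```python
import math
from collections import Counter

DATA = [
    # (Outlook, Temperature, Humidity, Wind, Play ball)
    ("Sunny", "Hot", "High", "Weak", "No"),       ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),   ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),   ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, attr_index):
    total = len(rows)
    remainder = 0.0
    for v in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attr_indices):
    labels = {r[-1] for r in rows}
    if len(labels) == 1:                     # pure subset -> leaf
        return labels.pop()
    if not attr_indices:                     # no attributes left -> majority class
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(attr_indices, key=lambda i: gain(rows, i))   # greedy choice
    tree = {ATTRS[best]: {}}
    for v in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == v]
        remaining = [i for i in attr_indices if i != best]
        tree[ATTRS[best]][v] = id3(subset, remaining)
    return tree

print(id3(DATA, list(range(len(ATTRS)))))    # the root split is on Outlook for this data
```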

Real Example: Summary of the Dataset
• The NSSO carried out a nationwide survey on cultivation practices as part of its 54th round (January-June 1998).
• The enquiry was carried out in the rural areas of India through a household survey to ascertain the problems, the utilization of facilities, and the level of adoption on farms of different sizes in the country.
• The focus was on the spread of improved agricultural technology.
• Data were extracted for the state of Haryana.

Dataset Summary
• 39 independent variables from Haryana
• 1 dependent variable (ifpesticide) with values 1, 2
  – 1: adoption of PIF
  – 2: non-adoption of PIF
• 36 nominal and 4 real-valued attributes
• 1832 cases
  – 48% adopting PIF
  – 52% not adopting

Why Haryana?
• Reasonable dataset size (1832 cases)
• The class variable (ifpesticide) is somewhat uniformly distributed in Haryana

Distribution of the class variable by state/UT (% of adopters):
Lakshadweep    0    Arunachal      35
Mizoram        1    Bihar          40
Sikkim         2    Maharashtra    43
Dadra          8    A&N            43
Nagaland      11    Haryana        48
Himachal      14    Goa            52
Meghalaya     16    Karnataka      55
Kerala        20    Tripura        58
UP            23    Gujarat        65
Rajasthan     24    Punjab         71
MP            27    Daman & Diu    74
J&K           29    Andhra         79
Manipur       31    TN             80
Assam         32    WB             82
Orissa        32    Pondicherry    88
Delhi         33    Chandigarh    100

Variables in the Haryana Farmers Dataset (NSSO, 54th round)
#   Attribute Name        Value set
1   SnoCropGroup          {1,2,3,4,5}
2   CropCode              {1,2,3,4,5,6,7,8,9,11,99}
3   Season                {1,2}
4   NumAreaSown           {1,2,3,4,5,6}
5   iftractor             {1,2,3}
6   ifepump               {1,2,3}
7   ifoilpump             {1,2,3}
8   ifmanure              {1,2,3}
9   iffertilizers         {1,2,3}
10  ifimproveseed         {0,1,2,3}
11  seedtype              {1,2,3,4,9}
12  typemanure            {1,2,3,9}
13  ifhired               {1,2,9}
14  ifirrigate            {1,2}
15  ifhiredirri           {1,2,9}
16  ifweedicide           {1,2}
17  ifunusdelectricpump   {1,2,9}
18  harvested             {1,2,3}
19  woodpurpose           {1,2,3,4,5,9}
20  iflivestock           {1,2}
21  jointforest           {1,2}
22  managetanks           {1,2}
23  iftreepatta           {1,2}
24  iftimberright         {1,2}
25  howoftenright         {1,2,9}
26  schhhdcpr             {1,2,9}
27  anymemberpreventcpr   {1,2}
28  LandWithRightOfSale   {1,2,3,4,5}
29  TotLandOwnedNum       {1,2,3,4,5}
30  LandPossessNum        {1,2,3,4,5}
31  NASNum                {1,2,3,4,5}
32  ifsoiltested          {1,2}
33  ifsoilRecfollow       {1,2,9}
34  ifhhdtubewell         {1,2}
35  iftubewellunused      {1,2,9}
36  irrigatesource        {1,2,9}
37  ifdieselpump          {1,2}
38  ifunuseddieselpump    {1,2,9}
39  ifelectricpump        {1,2}
40  ifpesticide           {1,2}

Hypothetical Farmer Dataset
ID  Crop    Seedtype  AreaSown  Iffertilizer  Ifpesticide
X1  wheat   2         small     no            yes
X2  wheat   3         medium    yes           no
X3  wheat   1         medium    yes           no
X4  rice    1         medium    no            yes
X5  fodder  2         large     no            yes
X6  rice    3         large     no
X7  rice    2         large     no
X8  wheat   1         small     yes           no

Decision Tree (DT) using RDT Architecture for Hypothetical Farmer Data
[Tree diagram: the root tests Seedtype (branches 1, 2, 3); Crop (rice / fodder / wheat) is tested beneath it, with leaves labelled yes, no and ?]
• The tree is simple to comprehend
• The tree can be easily traversed to generate rules

Description of the codes of the attributes in the Approximate Core based RDT model
Attribute name   #    Code specifications
Ifweedicide      16   yes - 1, no - 2
CropCode         2    paddy - 1, wheat - 2, other cereals - 3, pulses - 4, oil seeds - 5, mixed crop - 6, sugarcane - 7, vegetables - 8, fodder - 9, fruits & nuts - 10, other cash crops - 11, others - 99
Iffertilizer     9    entirely - 1, partly - 2, none - 3
Ifpesticide      40   yes - 1, no - 2

Approximate Core based RDT model for Farmers Adopting Plant Protection Chemicals
[Tree diagram: the root tests Ifweedicide; the next level tests CropCode, followed by Iffertilizer]
Legend:
Ifweedicide: yes - 1, no - 2
Ifpesticide: yes - 1, no - 2
CropCode: paddy - 1, wheat - 2, other cereals - 3, pulses - 4, oil seeds - 5, mixed crop - 6, sugarcane - 7, vegetables - 8, fodder - 9, fruits & nuts - 10, other cash crops - 11, others - 99
Iffertilizer: entirely - 1, partly - 2, none - 3

Decision Rules for Adopters
1. If weedicide = yes then pesticide = yes
2. If weedicide = no and crop = paddy then pesticide = yes
3. If weedicide = no and crop = vegetables and fertilizer = entirely then pesticide = yes
4. If weedicide = no and crop = vegetables and fertilizer = none then pesticide = yes
5. If weedicide = no and crop = other cash crops and fertilizer = entirely then pesticide = yes
6. If weedicide = no and crop = other cash crops and fertilizer = partly then pesticide = yes

Decision Rules for Non-adopters
1. If weedicide = no and crop = (wheat, cereals other than rice, pulses, oil seeds, mixed crop, sugarcane, fodder, other crops) then pesticide = no
2. If weedicide = no and crop = vegetables and fertilizer = partly then pesticide = no
3. If weedicide = no and crop = other cash crops and fertilizer = none then pesticide = no
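The rule sets above map directly to code. A hedged Python sketch (illustrative only, using plain string values rather than the NSSO numeric codes) of how the adopter and non-adopter rules could be expressed as a classifier:

```python
# Illustrative translation of the extracted rules into a classifier.
# Attribute values are written as plain strings rather than the NSSO numeric codes.

def predict_pesticide(weedicide, crop, fertilizer):
    """Return 'yes' if the rules predict adoption of plant-protection chemicals, else 'no'."""
    if weedicide == "yes":
        return "yes"                                    # adopter rule 1
    if crop == "paddy":
        return "yes"                                    # adopter rule 2
    if crop == "vegetables":
        return "yes" if fertilizer in ("entirely", "none") else "no"    # adopter rules 3-4 vs non-adopter rule 2
    if crop == "other cash crops":
        return "yes" if fertilizer in ("entirely", "partly") else "no"  # adopter rules 5-6 vs non-adopter rule 3
    return "no"                                         # non-adopter rule 1 (all remaining crops)

print(predict_pesticide("no", "paddy", "none"))         # -> yes
```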

A case study of forewarning Powdery Mildew of Mango (PWM) disease

Dataset: PWM status for 11 years and the corresponding temperature and humidity conditions
[Table with columns YEAR, T811, H811, T812, H812, T813, H813, T814, H814, STATUS]

Techniques for Rule-based Learning
• FS
• DT
• RS
• or hybrids like RDT

Selection of Algorithms
• Accuracy: fraction of test instances that are predicted correctly
• Interpretability: ability to understand the model
• Complexity:
  – number of conditions in the model
  – number of rules
  – number of variables
• Selecting an algorithm involves a trade-off among these parameters (a small illustration follows)
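As a simple illustration (not from the slides), accuracy and rule-count style complexity measures could be computed as follows, assuming rules are stored as hypothetical (conditions, prediction) pairs:

```python
# Illustrative metrics, assuming rules are stored as (list_of_conditions, prediction) pairs.
rules = [
    (["weedicide=yes"], "yes"),
    (["weedicide=no", "crop=paddy"], "yes"),
    (["weedicide=no", "crop=vegetables", "fertilizer=partly"], "no"),
]

predictions = ["yes", "no", "yes", "yes"]   # model output on a small test set
actuals     = ["yes", "no", "no",  "yes"]   # true labels

accuracy = sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)
n_rules = len(rules)
n_conditions = sum(len(conds) for conds, _ in rules)

print(accuracy, n_rules, n_conditions)      # 0.75 3 6
```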

DT: Advantage
• A decision tree can be easily mapped to rules
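For example, with scikit-learn (assuming it is installed; this snippet is not part of the original slides and the tiny dataset is made up), a fitted tree can be printed as nested if-then rules via export_text:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny illustrative dataset: [humidity_high, wind_strong] -> play (1) / don't play (0)
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
y = [0, 0, 1, 0, 0, 1]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# export_text renders the learned tree as indented if-then rules
print(export_text(clf, feature_names=["humidity_high", "wind_strong"]))
```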

Train and Test Data
[Table with columns ID, TRAIN and TEST, listing the years used for training and for testing in each partition (e.g. 1996, 1997, ...)]

The prediction model for the PWM epidemic obtained using the CJP algorithm on the training dataset
[Decision tree: the root tests H811 (<= 87 vs > 87); the H811 <= 87 branch further tests T814 (<= 30.7 vs > 30.7)]

Rules
If (H811 > 87) then Status = 1
If (H811 <= 87) and (T814 <= 30.71) then Status = 0
If (H811 <= 87) and (T814 > 30.71) then Status = 1

Conclusions
• Rule induction can be very useful for the development of expert systems
• Strong linkages and collaboration are required among expert system developers, domain experts and data mining experts