Knowledge Acquisition and Machine Learning
Reading: textbook chapters 10 and 20
INFO 629, Dr. Rosina Weber
Knowledge Acquisition
INFO 612, 3/18/2003, Dr. R. Weber
Knowledge acquisition: outline
Knowledge engineering
Knowledge acquisition & elicitation
Knowledge elicitation – steps, recommendations, issues, results
KA tools and techniques: manual methods, interactive methods, automated methods
Knowledge engineering (diagram): sources of expertise (books, documents, humans) supply facts to the knowledge engineer, who builds the knowledge base and inference procedures. The process spans problem assessment, knowledge acquisition, knowledge representation, design, testing, and documentation.
Knowledge engineering (diagram): the knowledge engineer acquires facts from sources of expertise (books, documents, humans) and, through knowledge acquisition and knowledge representation, builds the knowledge-based system.
Knowledge acquisition
The transference of expertise from a knowledge source to a program; the capture of expertise from a domain expert to be represented in a program.
Knowledge elicitation (KE).
Types of Knowledge (from Durkin, 1994)
Sources of Knowledge
Experts, end-users, multiple experts, reports, books, regulations, guidelines
First steps in KE
The knowledge engineer must:
– obtain a general view of the domain
– identify a framework to structure the new domain
– capture the reasoning style of the experts in the domain
First meeting with experts
– explain what a KBS is
– state the goals of the system
– establish commitments (e.g., confidentiality)
– give experts the choice to leave
Recommendations
– meet only once a week with each expert
– limit meetings to 40 minutes at most
– keep 2/3 of the interview on technical topics and 1/3 on general topics
– process each interview before the next one
– limit total meetings to 3 hours a day
– never mention other experts' views
– employ the same methods in the same order with all experts
– be consistent and provide a convenient environment
From [DIA89]
Interviews
Unstructured interviews, structured interviews, observational, retrospective
Issues in KE
Compiled knowledge: knowledge that can be executed but whose internal structure cannot easily be understood (e.g., riding a bike); knowledge that has become so obvious that humans cannot explain it.
When asked something, an expert might try to answer with things that are unknown, or that are compiled knowledge.
Psychologists do not identify a direct association between verbal reports and cognition.
Problems with KE
Plausible lines of reasoning can have little to do with actual problem-solving.
Academic knowledge may be obtained in place of compiled knowledge.
Experts may be insecure: they could be afraid of losing their jobs; they may not want computers encroaching on their "private domain"; they may not want to expose their problem-solving methods to the scrutiny of colleagues or the general public.
Interpersonal interviewing problems can result when knowledge engineers are not trained in interviewing techniques.
Protocol analysis (observational & retrospective) is labor-intensive, error-prone, and yields a series of random behavior samples that must be synthesized by the knowledge engineer.
Results of KE
Low productivity:
– knowledge engineers need to study the field
– it is hard to find a framework to structure the new domain
– experts reason at a low level of specificity
KA tools and techniques
1. Manual methods
2. Interactive methods
3. Automated methods
From: Boose, J. H. (1990). Knowledge acquisition tools, methods, and mediating representations. In Motoda, H., Mizoguchi, R., Boose, J. H., & Gaines, B. R. (Eds.), Proceedings of the First Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop (JKAW-90). Japan: Ohmsha.
Manual methods (i)
Brainstorming – rapidly generate a large number of ideas
Interviewing:
– unstructured (general questions)
– semi-structured (open questions + topics)
– structured (strict agenda)
– Neurolinguistic Programming (eye movement, body language)
– tutorial
Manual methods (ii)
Knowledge organization techniques:
– card sorting
– ethnoscience techniques (names & categories)
– knowledge analysis
– mediating representations
– overcoming bias
– psychological scaling
– uncertain information elicitation and representation
Hoffman (1987) describes various methods to elicit expertise, each with different advantages and disadvantages.
Manual methods (iii)
Protocol analysis techniques:
– participant observation
– protocol analysis (retrospective)
User interface techniques:
– in the Wizard of Oz technique, an expert simulates the behavior of a future system
Interactive methods
Problem-to-method relationship – usually either a domain-specific problem employing a highly specialized method using much domain knowledge, or a general problem employing a general method with little domain knowledge (e.g., interdependency models).
Representation languages – for defining and describing problems and methods (e.g., method ontologies).
Intelligent editors – that help AI programmers construct large knowledge bases (e.g., CYC).
Automated Methods (i)
Analogy – apply knowledge from old situations to similar new situations
Apprenticeship learning – learn by watching experts solve problems
Neural networks
Discovery – learn by experimentation and observation
The picnic game
Let's practice learning rules.
Inductive Learning: Definition
According to Michalski (1983), "A theory and methodology of inductive learning" (in Machine Learning, chapter 4): "inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances."
Inductive Learning
Learning by generalization
Performance of classification tasks (also categorization)
Rules indicate categories
Goal: characterize a concept
Concept Learning is a Form of Inductive Learning
The learner uses:
– positive examples (instances that ARE examples of a concept), and
– negative examples (instances that are NOT examples of a concept)
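Learning from positive and negative examples can be sketched with a minimal Find-S-style learner: the hypothesis is generalized just enough to cover each positive example, and the negative examples check that it did not over-generalize. The attributes and data here are made up for illustration.

```python
# Minimal sketch of concept learning from positive and negative
# examples (Find-S-style; attribute names are hypothetical).

def generalize(hypothesis, example):
    """Minimally generalize a conjunctive hypothesis to cover an example."""
    if hypothesis is None:            # first positive example: most specific
        return list(example)
    return [h if h == e else "?"      # '?' = any value allowed
            for h, e in zip(hypothesis, example)]

def covers(hypothesis, example):
    return all(h == "?" or h == e for h, e in zip(hypothesis, example))

def learn(examples):
    """examples: list of (attribute_tuple, is_positive)."""
    h = None
    for attrs, positive in examples:
        if positive:
            h = generalize(h, attrs)
    # negative examples validate the result: none should be covered
    for attrs, positive in examples:
        if not positive and covers(h, attrs):
            raise ValueError("hypothesis covers a negative example")
    return h

# Toy data: (sky, temperature) -> is it a picnic day?
data = [(("sunny", "warm"), True),
        (("sunny", "cold"), True),
        (("rainy", "cold"), False)]
print(learn(data))   # ['sunny', '?']
```

The learned hypothesis says the concept depends only on the sky attribute; the temperature slot was generalized away by the second positive example.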
Concept Learning
Needs empirical validation.
Whether the data are dense or sparse determines the quality of different methods.
Validation of Concept Learning (i)
The learned concept should be able to correctly classify new instances of the concept:
– when it succeeds on a real instance of the concept, it finds true positives
– when it fails on a real instance of the concept, it finds false negatives
Validation of Concept Learning (ii)
The learned concept should be able to correctly classify new instances of the concept:
– when it succeeds on a counterexample, it finds true negatives
– when it fails on a counterexample, it finds false positives
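The four validation outcomes above can be tallied directly; a minimal sketch, assuming predictions and actual labels are boolean lists:

```python
# Tally the four validation outcomes (true/false positives/negatives)
# for a learned classifier's predictions against the actual labels.

def confusion_counts(predictions, actuals):
    """Count TP, FN, TN, FP given predicted and actual labels (booleans)."""
    tp = fn = tn = fp = 0
    for pred, real in zip(predictions, actuals):
        if real:                  # a real instance of the concept
            if pred: tp += 1      # success -> true positive
            else:    fn += 1      # failure -> false negative
        else:                     # a counterexample
            if pred: fp += 1      # failure -> false positive
            else:    tn += 1      # success -> true negative
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}

print(confusion_counts([True, False, True, False],
                       [True, True,  False, False]))
# {'TP': 1, 'FN': 1, 'TN': 1, 'FP': 1}
```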
Rule Learning
Learning algorithms widely used in data mining:
– decision trees
– neural networks
Decision trees
A knowledge representation formalism.
Represent mutually exclusive rules (a disjunction).
A way of breaking up a data set into classes or categories.
Classification rules determine, for each instance with attribute values, whether it belongs to one class or another.
Decision trees consist of:
– leaf nodes (classes)
– decision nodes (tests on attribute values)
– from decision nodes, branches grow for each possible outcome of the test
From Cawsey, 1997
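The structure above can be sketched as nested dictionaries: decision nodes name the attribute they test, branches map outcomes to subtrees, and leaves hold a class. The tree and attribute names are hypothetical.

```python
# A decision tree as a nested dict: decision nodes test an attribute,
# branches are outcomes of the test, leaf nodes carry a class.

tree = {"attribute": "outlook",
        "branches": {"sunny": {"class": "picnic"},
                     "rainy": {"class": "stay home"}}}

def classify(node, instance):
    """Follow the branch matching the instance's attribute value
    at each decision node until a leaf (class) is reached."""
    while "class" not in node:
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node["class"]

print(classify(tree, {"outlook": "sunny"}))  # picnic
```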
Decision tree induction
The goal is to correctly classify all example data.
Several algorithms induce decision trees: ID3 (Quinlan, 1979), CLS, ACLS, ASSISTANT, IND, C4.5.
ID3 constructs the decision tree from past data; it is not incremental.
It attempts to find the simplest tree (not guaranteed, because it is based on heuristics).
ID3 algorithm
From:
– a set of target classes
– training data containing objects of more than one class
ID3 uses tests to refine the training data set into subsets that contain objects of only one class each.
Choosing the right test is the key.
How does ID3 choose tests?
Information gain, or "minimum entropy".
Maximizing information gain corresponds to minimizing entropy.
Predictive features (good indicators of the outcome).
Choosing tests
Information gain is a statistical property.
Compute entropy: how best to classify the training instances.
Predictive features (good indicators of the outcome).
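Entropy and information gain, as ID3 uses them, can be sketched as follows; rows here are hypothetical (attribute-dict, class-label) pairs:

```python
# Entropy H = -sum(p * log2(p)) over class proportions, and
# information gain = entropy before a split minus the weighted
# entropy of the subsets the split produces.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute):
    """rows: list of (attribute_dict, class_label) pairs."""
    labels = [label for _, label in rows]
    before = entropy(labels)
    subsets = {}
    for attrs, label in rows:               # partition labels by attribute value
        subsets.setdefault(attrs[attribute], []).append(label)
    after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return before - after

rows = [({"refund": "yes"}, "no"), ({"refund": "yes"}, "no"),
        ({"refund": "no"}, "yes"), ({"refund": "no"}, "no")]
print(round(information_gain(rows, "refund"), 3))
```

ID3 evaluates every candidate attribute this way and splits on the one with the highest gain, i.e., the one whose subsets have the lowest remaining entropy.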
ID3 algorithm: worked example
(Training data with attributes Refund, Marital Status, Taxable Income; target class: Cheat = Yes/No.)
Class counts by Refund:
– Refund = Yes: No 3 times
– Refund = No: No 4 times, Yes 3 times
Class counts by Marital Status:
– Single: No 2 times, Yes 2 times
– Married: No 3 times
– Divorced: No 1 time, Yes 1 time
The tree is built step by step:
– root test: Refund? yes → No; no → continue
– next test: Marital Status? married → No; single, divorced → continue
– next test: Taxable Income? < 80K → No; > 80K → Yes
What rules can you derive from this decision tree?
Refund? yes → No; no → Marital Status? married → No; single, divorced → Taxable Income? < 80K → No; > 80K → Yes
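One rule can be read off each root-to-leaf path of the tree. Written as a classifier (a sketch using the example's attribute names):

```python
# The decision tree as rules: one if-branch per root-to-leaf path.

def cheat(refund, marital_status, taxable_income):
    if refund == "yes":
        return "No"           # IF refund = yes THEN cheat = no
    if marital_status == "married":
        return "No"           # IF refund = no AND married THEN cheat = no
    if taxable_income < 80_000:
        return "No"           # IF refund = no AND single/divorced AND income < 80K THEN no
    return "Yes"              # IF refund = no AND single/divorced AND income > 80K THEN yes

print(cheat("no", "single", 90_000))  # Yes
```

The rules are mutually exclusive: exactly one path, and therefore one rule, applies to any instance.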
Knowledge Discovery
1. Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and understandable patterns in data (R. Feldman, 2000).
1.1 Data mining is one step in the KDD process.
1.2 Text mining applies data mining techniques to unstructured text.
2. Knowledge discovery from processes.
Automated Methods (ii)
Example selection – select an appropriate set of examples for various learning techniques
Explanation-based learning – deduce a general rule from a single example by relating it to an existing theory
Function induction – learn functions from input data
Genetic algorithms – crossover, mutation
Automated Methods (iii)
Performance feedback – performance feedback is used to reinforce behavior
Rule induction
Similarity-based learning – learn similarities from sets of positive examples and differences from sets of negative examples
Systemic principles derivation – use general principles to derive specific laws
References
Boose, J. H. (1990). Knowledge acquisition tools, methods, and mediating representations. In Motoda, H., Mizoguchi, R., Boose, J. H., & Gaines, B. R. (Eds.), Proceedings of the First Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop (JKAW-90). Japan: Ohmsha.
Buchanan, B. G., & Wilkins, D. C. (Eds.). Readings in Knowledge Acquisition and Learning: Automating the Construction and Improvement of Expert Systems.
Diaper, D. (1989). Knowledge Elicitation: Principles, Techniques and Applications. Chichester: John Wiley & Sons, pp. 96-97.
Hoffman, R. R. (1987). The problem of extracting the knowledge of experts from the perspective of experimental psychology. AI Magazine, pp. 53-67.