Searching by Authority Artificial Intelligence CMSC 25000 February 12, 2008.

Presentation transcript:

Searching by Authority Artificial Intelligence CMSC 25000 February 12, 2008

“A Conversation with Students” Speaker: Bill Gates Title: Bill Gates Unplugged: On Software, Innovation, Entrepreneurship, and Giving Back Date: February 20, 2008 Tickets: By lottery

Authoritative Sources Based on vector space alone, what would you expect to get searching for “search engine”? –Would you expect to get Google?

Conferring Authority Authorities rarely link to each other –Competition Hubs: –Relevant sites that point to prominent sites on a topic Often not prominent themselves Professional or amateur –Good hubs point to good authorities; good authorities are pointed to by good hubs

Google’s PageRank Identifies authorities –Important pages are those pointed to by many other pages; better pointers, higher rank –Ranks search results –PR(A) = (1 - d) + d * Σ_t PR(t)/C(t), where t ranges over pages pointing to A; C(t): number of outbound links from t; d: damping factor –Actual ranking on logarithmic scale –Iterate until ranks converge
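A minimal sketch of this iterative computation, assuming a small hand-built link graph; the graph, damping factor d = 0.85, and fixed iteration count are illustrative choices rather than details from the lecture.

def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    ranks = {p: 1.0 for p in pages}                    # initial rank for every page
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Sum PR(t)/C(t) over every page t that points to this page
            incoming = sum(ranks[t] / len(links[t]) for t in links if page in links[t])
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(graph))   # C, with the most inbound links, ends up ranked highest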

Contrasts Internal links –Large sites carry more weight If well-designed –H&A ignores site-internals Outbound links explicitly penalized Lots of tweaks….

Web Search Search by content –Vector space model Word-based representation “Aboutness” and “Surprise” Enhancing matches Simple learning model Search by structure –Authorities identified by link structure of web Hubs confer authority

Medical Decision Making Learning: Decision Trees Artificial Intelligence CMSC 25000 February 12, 2008

Agenda Decision Trees: –Motivation: Medical Experts: Mycin –Basic characteristics –Sunburn example –From trees to rules –Learning by minimizing heterogeneity –Analysis: Pros & Cons

Expert Systems Classic example of classical AI –Narrow but very deep knowledge of a field E.g. Diagnosis of bacterial infections –Manual knowledge engineering Elicit detailed information from human experts

Expert Systems Knowledge representation –If-then rules Antecedent: Conjunction of conditions Consequent: Conclusion to be drawn –Axioms: Initial set of assertions Reasoning process –Forward chaining: From assertions and rules, generate new assertions –Backward chaining: From rules and goal assertions, derive evidence of assertion
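As a rough illustration of the forward-chaining loop, here is a minimal sketch; the rule representation ((antecedent set, consequent) pairs) and the example facts and rules are invented for illustration and are not taken from Mycin.

def forward_chain(rules, facts):
    """Repeatedly fire rules whose antecedents all hold until nothing new is asserted."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)          # fire the rule: assert its consequent
                changed = True
    return facts

rules = [
    ({"gram-negative", "rod-shaped"}, "likely-enterobacteriaceae"),
    ({"likely-enterobacteriaceae", "grew-in-blood-culture"}, "suggest-blood-infection"),
]
print(forward_chain(rules, {"gram-negative", "rod-shaped", "grew-in-blood-culture"}))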

Medical Expert Systems: Mycin Mycin: –Rule-based expert system –Diagnosis of blood infections –~450 rules: performance approaching that of experts, better than junior MDs –Rules acquired by extensive expert interviews Captures some elements of uncertainty

Medical Expert Systems: Issues Works well but.. –Only diagnoses blood infections NARROW –Requires extensive expert interviews EXPENSIVE to develop –Difficult to update, can’t handle new cases BRITTLE

Modern AI Approach Machine learning –Learn diagnostic rules from examples –Use general learning mechanism –Integrate new rules, less elicitation Decision Trees –Learn rules –Duplicate MYCIN-style diagnosis Automatically acquired Readily interpretable cf Neural Nets/Nearest Neighbor

Learning: Identification Trees (aka Decision Trees) Supervised learning Primarily classification Rectangular decision boundaries –More restrictive than nearest neighbor Robust to irrelevant attributes, noise Fast prediction

Sunburn Example
Name | Hair | Height | Weight | Lotion | Result
Sarah | Blonde | Average | Light | No | Burn
Dana | Blonde | Tall | Average | Yes | None
Alex | Brown | Short | Average | Yes | None
Annie | Blonde | Short | Average | No | Burn
Emily | Red | Average | Heavy | No | Burn
Pete | Brown | Tall | Heavy | No | None
John | Brown | Average | Heavy | No | None
Katie | Blonde | Short | Light | Yes | None

Learning about Sunburn Goal: –Train on labeled examples –Predict Burn/None for new instances Solution?? –Exact match: same features, same output Problem: 2*3^3 feature combinations –Could be much worse –Nearest Neighbor style Problem: What’s close? Which features matter? –Many match on two features but differ on result

Learning about Sunburn Better Solution: –Identification tree: –Training: Divide examples into subsets based on feature tests Sets of samples at leaves define classification –Prediction: Route NEW instance through tree to leaf based on feature tests Assign same value as samples at leaf

Sunburn Identification Tree
Hair Color?
–Blonde: Lotion Used?
  –No: Sarah: Burn, Annie: Burn
  –Yes: Katie: None, Dana: None
–Red: Emily: Burn
–Brown: Alex: None, John: None, Pete: None

Simplicity Occam’s Razor: –Simplest explanation that covers the data is best Occam’s Razor for ID trees: –Smallest tree consistent with samples will be best predictor for new data Problem: –Finding all trees & finding smallest: Expensive! Solution: –Greedily build a small tree

Building ID Trees Goal: Build a small tree such that all samples at leaves have same class Greedy solution: –At each node, pick test such that branches are closest to having same class Split into subsets with least “disorder” –(Disorder ~ Entropy) –Find test that minimizes disorder

Minimizing Disorder
Candidate first tests on the full sample set:
–Hair Color: Blonde: Sarah:B, Dana:N, Annie:B, Katie:N | Red: Emily:B | Brown: Alex:N, Pete:N, John:N
–Height: Short: Alex:N, Annie:B, Katie:N | Average: Sarah:B, Emily:B, John:N | Tall: Dana:N, Pete:N
–Weight: Light: Sarah:B, Katie:N | Average: Dana:N, Alex:N, Annie:B | Heavy: Emily:B, Pete:N, John:N
–Lotion: No: Sarah:B, Annie:B, Emily:B, Pete:N, John:N | Yes: Dana:N, Alex:N, Katie:N

Minimizing Disorder
Candidate second tests on the blonde-haired branch:
–Height: Short: Annie:B, Katie:N | Average: Sarah:B | Tall: Dana:N
–Weight: Light: Sarah:B, Katie:N | Average: Dana:N, Annie:B | Heavy: (none)
–Lotion: No: Sarah:B, Annie:B | Yes: Dana:N, Katie:N

Measuring Disorder Problem: –In general, tests on large DB’s don’t yield homogeneous subsets Solution: –General information theoretic measure of disorder –Desired features: Homogeneous set: least disorder = 0 Even split: most disorder = 1

Measuring Entropy If we split m objects into 2 bins of size m1 & m2, what is the entropy? Entropy = -(m1/m) log2 (m1/m) - (m2/m) log2 (m2/m)

Measuring Disorder
Entropy (disorder) of a split: Entropy = -Σ_i p_i log2 p_i, where p_i is the probability of being in bin i (with 0 log2 0 taken as 0)
–p1 = ½, p2 = ½: -½ log2 ½ - ½ log2 ½ = ½ + ½ = 1
–p1 = ¼, p2 = ¾: -¼ log2 ¼ - ¾ log2 ¾ ≈ 0.5 + 0.31 = 0.81
–p1 = 1, p2 = 0: -1 log2 1 - 0 log2 0 = 0 + 0 = 0
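A minimal sketch of this entropy measure; the function name and the class-count list representation are conventions assumed for illustration.

import math

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]       # 0 * log2(0) treated as 0
    return -sum(p * math.log2(p) for p in probs)

print(entropy([1, 1]))   # even split        -> 1.0
print(entropy([1, 3]))   # 1/4 vs 3/4 split  -> ~0.81
print(entropy([4, 0]))   # homogeneous set   -> 0.0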

Computing Disorder
Average disorder of a test = Σ_i (N_i / N) * Entropy(branch i), where N_i / N is the fraction of the N instances sent down branch i and Entropy(branch i) is the disorder of the class distribution on that branch
–Example: N instances split into Branch 1 (N1a of class a, N1b of class b) and Branch 2 (N2a of class a, N2b of class b)
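A minimal sketch of the average-disorder computation, reusing the entropy() helper from the previous sketch; representing each branch as a list of class counts is an assumed convention.

def average_disorder(branches):
    """branches: one list of class counts per branch of the candidate test."""
    total = sum(sum(branch) for branch in branches)
    return sum(sum(branch) / total * entropy(branch) for branch in branches)

# Two branches: one with 1 of each class, one homogeneous with 2 of one class
print(average_disorder([[1, 1], [0, 2]]))   # -> 0.5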

Entropy in Sunburn Example Average disorder of each candidate first test: Hair color = 4/8 (-2/4 log2 2/4 - 2/4 log2 2/4) + 1/8 * 0 + 3/8 * 0 = 0.5 Height = 0.69 Weight = 0.94 Lotion = 0.61 –Hair color yields the least disorder, so it becomes the root test

Entropy in Sunburn Example Average disorder of each candidate test on the blonde-haired branch: Height = 2/4 (-1/2 log2 1/2 - 1/2 log2 1/2) + 1/4 * 0 + 1/4 * 0 = 0.5 Weight = 2/4 (-1/2 log2 1/2 - 1/2 log2 1/2) + 2/4 (-1/2 log2 1/2 - 1/2 log2 1/2) = 1 Lotion = 0 –Lotion Used yields zero disorder, so it is the next test on that branch
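The figures above can be checked with a short script that reuses entropy() and average_disorder() from the earlier sketches on the sunburn data; the tuple encoding of the table and the helper names are assumptions made for illustration.

data = [  # (hair, height, weight, lotion, result)
    ("Blonde", "Average", "Light",   "No",  "Burn"),   # Sarah
    ("Blonde", "Tall",    "Average", "Yes", "None"),   # Dana
    ("Brown",  "Short",   "Average", "Yes", "None"),   # Alex
    ("Blonde", "Short",   "Average", "No",  "Burn"),   # Annie
    ("Red",    "Average", "Heavy",   "No",  "Burn"),   # Emily
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),   # Pete
    ("Brown",  "Average", "Heavy",   "No",  "None"),   # John
    ("Blonde", "Short",   "Light",   "Yes", "None"),   # Katie
]
FEATURES = ["hair", "height", "weight", "lotion"]

def disorder_of_feature(rows, feature_index):
    """Average disorder of splitting the rows on the given feature."""
    branches = {}
    for row in rows:
        counts = branches.setdefault(row[feature_index], [0, 0])
        counts[0 if row[-1] == "Burn" else 1] += 1
    return average_disorder(list(branches.values()))

for i, name in enumerate(FEATURES):
    print(name, round(disorder_of_feature(data, i), 2))
# hair 0.5, height 0.69, weight 0.94, lotion 0.61 -> split on hair color first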

Building ID Trees with Disorder Until each leaf is as homogeneous as possible –Select an inhomogeneous leaf node –Replace that leaf node by a test node creating subsets with least average disorder Effectively creates set of rectangular regions –Repeatedly draws lines in different axes
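A minimal sketch of this greedy loop, reusing data, FEATURES, and disorder_of_feature() from the sketch above; representing test nodes as nested dicts and leaves as class labels is an assumption for illustration, not the representation used in the lecture.

def build_tree(rows, features):
    """Greedily grow a tree, always splitting on the least-disorder feature."""
    labels = {row[-1] for row in rows}
    if len(labels) == 1:
        return labels.pop()                 # homogeneous leaf: return the class
    if not features:
        return sorted(labels)               # mixed leaf with no tests left
    best = min(features, key=lambda f: disorder_of_feature(rows, FEATURES.index(f)))
    idx = FEATURES.index(best)
    tree = {best: {}}
    for value in {row[idx] for row in rows}:
        subset = [row for row in rows if row[idx] == value]
        tree[best][value] = build_tree(subset, [f for f in features if f != best])
    return tree

print(build_tree(data, FEATURES))
# -> {'hair': {'Blonde': {'lotion': {'No': 'Burn', 'Yes': 'None'}},
#              'Red': 'Burn', 'Brown': 'None'}}  (branch order may vary)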

Features in ID Trees: Pros Feature selection: –Tests features that yield low disorder E.g. selects features that are important! –Ignores irrelevant features Feature type handling: –Discrete type: 1 branch per value –Continuous type: Branch on >= value Need to search to find best breakpoint Absent features: Distribute uniformly
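For the continuous case, a minimal sketch of the breakpoint search mentioned above, reusing average_disorder() from the earlier sketch; the hypothetical numeric feature values and the midway-between-sorted-values candidate strategy are illustrative assumptions.

def class_counts(labels):
    return [labels.count("Burn"), labels.count("None")]

def best_threshold(values, labels):
    """Try a threshold midway between each adjacent pair of sorted values."""
    pairs = sorted(zip(values, labels))
    best_disorder, best_thresh = float("inf"), None
    for i in range(1, len(pairs)):
        thresh = (pairs[i - 1][0] + pairs[i][0]) / 2
        below = [lab for v, lab in pairs if v < thresh]
        above = [lab for v, lab in pairs if v >= thresh]
        disorder = average_disorder([class_counts(below), class_counts(above)])
        if disorder < best_disorder:
            best_disorder, best_thresh = disorder, thresh
    return best_thresh, best_disorder

# Hypothetical numeric feature (e.g. hours in the sun) with its class labels
print(best_threshold([1, 2, 5, 6, 8], ["None", "None", "Burn", "Burn", "Burn"]))
# -> (3.5, 0.0): branching on >= 3.5 separates the classes perfectly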

Features in ID Trees: Cons Features –Assumed independent –If want group effect, must model explicitly E.g. make new feature AorB Feature tests conjunctive

From Trees to Rules Tree: –Each branch from root to leaf = one rule: tests => classification –Tests along the branch become the if-antecedents; the leaf label is the consequent –All ID trees can be converted to rules; not all rule sets can be expressed as trees

From ID Trees to Rules
Reading the sunburn identification tree above as rules:
(if (equal haircolor blonde) (equal lotionused yes) (then None))
(if (equal haircolor blonde) (equal lotionused no) (then Burn))
(if (equal haircolor red) (then Burn))
(if (equal haircolor brown) (then None))
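A minimal sketch of reading such rules off a tree built by the build_tree() sketch above; the rule string format loosely mirrors the slide and is an assumption for illustration.

def tree_to_rules(tree, conditions=()):
    """Walk root-to-leaf paths, turning each path into one if-then rule."""
    if not isinstance(tree, dict):                      # leaf: emit one rule
        antecedents = " ".join(f"(equal {feat} {val})" for feat, val in conditions)
        return [f"(if {antecedents} (then {tree}))"]
    (feature, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((feature, value),)))
    return rules

for rule in tree_to_rules(build_tree(data, FEATURES)):
    print(rule)
# e.g. (if (equal hair Blonde) (equal lotion No) (then Burn))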

Identification Trees Train: –Build tree by forming subsets of least disorder Predict: –Traverse tree based on feature tests –Assign the label of the samples at the leaf reached Pros: Robust to irrelevant features and some noise; fast prediction; perspicuous rule reading Cons: Poor at capturing feature combinations and dependencies; building the optimal tree is intractable

C4.5 vs Mycin C4.5: Decision tree implementation Learning diagnosis –Trains on symptom set + diagnosis for blood infections (like Mycin) –Constructs decision trees/rules –Classification accuracy comparable to Mycin Diagnosis training requires only records –Automatically manages rule ranking –Automatically extracts expert-type rules