Short Introduction to Machine Learning
Instructor: Rada Mihalcea


Learning? What can we learn from here?
- If Sky=Sunny and Air Temperature=Warm → EnjoySport=Yes
- If Sky=Sunny → EnjoySport=Yes
- If Air Temperature=Warm → EnjoySport=Yes
- If Sky=Sunny and Air Temperature=Warm and Wind=Strong → EnjoySport=Yes
??

What is machine learning?
- (H. Simon) "Any process by which a system improves performance"
- (T. Mitchell) "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
- Machine learning has to do with designing computer programs that improve their performance through experience.

Related areas
- Artificial intelligence
- Probability and statistics
- Computational complexity theory
- Information theory
- Human language technology

Applications of ML
- Learning to recognize spoken words: SPHINX (Lee, 1989)
- Learning to drive an autonomous vehicle: ALVINN (Pomerleau, 1989)
- Learning to classify celestial objects (Fayyad et al., 1995)
- Learning to play world-class backgammon: TD-GAMMON (Tesauro, 1992)
- Learning to translate between languages
- Learning to classify texts into categories: Web directories

Main directions in ML
- Data mining: finding patterns in data; use "historical" data to make a decision (e.g., predict weather based on current conditions)
- Self-customization: automatic feedback integration; adapt to user "behaviour" (e.g., recommender systems)
- Writing applications that cannot be programmed by hand, in particular because they involve huge amounts of data: speech recognition, handwriting recognition, text understanding

Terminology
- Learning is performed from EXAMPLES (or INSTANCES)
- An example contains ATTRIBUTES or FEATURES, e.g. Sky, Air Temperature, Water
- In concept learning, we want to learn the value of the TARGET ATTRIBUTE
- Classification problems: in the binary case, +/- → positive/negative
- Attributes have VALUES:
  - a single value (e.g. Warm)
  - ? indicates that any value is possible for this attribute
  - ∅ indicates that no value is acceptable
- All features in an example, taken together, are sometimes referred to as the FEATURE VECTOR

Terminology
- Feature vector for our learning problem: (Sky, Air Temp, Humidity, Wind, Water, Forecast); the target attribute is EnjoySport
- How to represent "Aldo enjoys sports only on cold days with high humidity"? (?, Cold, High, ?, ?, ?)
- How about "Emma enjoys sports regardless of the weather"?
- Hypothesis = the entire set of vectors that cover the given examples
- Most general hypothesis: (?, ?, ?, ?, ?, ?)
- Most specific hypothesis: (∅, ∅, ∅, ∅, ∅, ∅)
- How many hypotheses can be generated for our feature vector?
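The hypothesis-counting question can be worked out directly. This is a sketch that assumes the value counts from Mitchell's standard EnjoySport example (three values for Sky, two for each of the other five attributes); the slides do not list the counts themselves.

```python
# Counting hypotheses for the six-attribute EnjoySport feature vector.
# Value counts assumed from Mitchell's textbook example, not the slides.
from math import prod

value_counts = {"Sky": 3, "AirTemp": 2, "Humidity": 2,
                "Wind": 2, "Water": 2, "Forecast": 2}

# Syntactically, each attribute slot can hold one of its values, '?',
# or the empty symbol, giving |values| + 2 choices per slot.
syntactic = prod(n + 2 for n in value_counts.values())

# Any hypothesis containing the empty symbol classifies every example
# negative, so all of them collapse into a single semantic hypothesis.
semantic = 1 + prod(n + 1 for n in value_counts.values())

print(syntactic, semantic)  # 5120 973
```

Under these assumed counts there are 5120 syntactically distinct but only 973 semantically distinct hypotheses.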

Task in machine learning
Given:
- a set of examples X
- a set of hypotheses H
- a target concept c
Determine:
- a hypothesis h in H such that h(x) = c(x) for every x in X
Practically, we want to determine those hypotheses that best fit our examples:
- (Sunny, ?, ?, ?, ?, ?) → Yes
- (?, Warm, ?, ?, ?, ?) → Yes
- (Sunny, Warm, ?, ?, ?, ?) → Yes
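The "does hypothesis h cover example x" check behind this task fits in a few lines. A minimal sketch with illustrative names (not from the original slides), using None for the empty symbol:

```python
# A conjunctive hypothesis covers an example if every constraint matches:
# '?' accepts any value, a concrete value must match exactly, and the
# empty symbol (represented here as None) accepts nothing.
def covers(hypothesis, example):
    return all(h == "?" or (h is not None and h == x)
               for h, x in zip(hypothesis, example))

x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(covers(("Sunny", "?", "?", "?", "?", "?"), x))  # True
print(covers(("Rainy", "?", "?", "?", "?", "?"), x))  # False
```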

Machine learning applications
Until now, a toy example: decide whether X enjoys sport given the current and future forecast.
Practical problems:
- Part-of-speech tagging. How?
- Word sense disambiguation
- Text categorization
- Chunking
Any problem that can be modeled through examples should support learning.

Machine learning algorithms
- Concept learning via searching on the general-to-specific ordering of hypotheses
- Decision tree learning
- Instance-based learning
- Rule-based learning
- Neural networks
- Bayesian learning
- Genetic algorithms

Basic elements of information theory
How to determine which attribute is the best classifier? Measure the information gain of each attribute.
- Entropy characterizes the (im)purity of an arbitrary collection of examples.
- Given a collection S with a fraction p of positive and q of negative examples:
  Entropy(S) = -p log2 p - q log2 q   (by convention, 0 log 0 = 0)
- Entropy is at its maximum when p = q = 1/2
- Entropy is at its minimum when p = 1 and q = 0
- Example: S contains 14 examples, 9 positive and 5 negative:
  Entropy(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
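The two-class entropy formula above can be sketched directly, reproducing the 9-positive / 5-negative example:

```python
from math import log2

def entropy(p, q):
    """Two-class entropy; the 0*log(0) = 0 convention is handled
    by skipping zero fractions."""
    return sum(-f * log2(f) for f in (p, q) if f > 0)

# S has 14 examples: 9 positive and 5 negative
print(round(entropy(9/14, 5/14), 3))  # 0.94
print(entropy(0.5, 0.5))              # 1.0  (maximum)
print(entropy(1.0, 0.0))              # -0.0 (minimum)
```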

Basic elements of information theory
- Information gain measures the expected reduction in entropy caused by partitioning the examples on an attribute A:
  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
- Many learning algorithms make decisions based on information gain.
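Information gain, as the expected reduction in entropy, can be sketched over labeled examples. The numeric check below uses the Wind attribute split from Mitchell's textbook (8 Weak / 6 Strong over the 14 examples), which is an assumption taken from the book rather than these slides:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection given its class labels."""
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Entropy(S) minus the weighted entropy of each partition S_v."""
    n = len(labels)
    remainder = 0.0
    for v in {ex[attr] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Wind: Weak -> [6+, 2-], Strong -> [3+, 3-]  (assumed textbook split)
examples = [{"Wind": "Weak"}] * 8 + [{"Wind": "Strong"}] * 6
labels = ["+"] * 6 + ["-"] * 2 + ["+"] * 3 + ["-"] * 3
print(round(information_gain(examples, labels, "Wind"), 3))  # 0.048
```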

Basic elements of information theory (worked example shown as a figure in the original slides)

Decision trees (example trees shown as figures in the original slides)

Decision trees
- Have the capability of generating rules:
  IF outlook=sunny AND temperature=hot THEN play tennis = no
- Powerful! It would be very hard for a human to derive such rules by hand.
- Implementations: C4.5 (Quinlan), ID3; an integral part of MLC++ and of Weka (in Java)
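The tree-to-rules idea can be sketched with a toy tree over the outlook/temperature/windy attributes. The nested-dict representation below is hand-written for illustration; it is not the data structure Weka or C4.5 actually uses.

```python
# Toy tree: an internal node is {attribute: {value: subtree}},
# a leaf is just the class label.
tree = {"outlook": {
    "sunny":    {"humidity": {"high": "no", "normal": "yes"}},
    "overcast": "yes",
    "rain":     {"windy": {"true": "no", "false": "yes"}},
}}

def tree_to_rules(node, conditions=()):
    """Each root-to-leaf path becomes one IF ... THEN rule."""
    if not isinstance(node, dict):  # reached a leaf
        cond = " and ".join(f"{a}={v}" for a, v in conditions)
        return [f"IF {cond} THEN play={node}"]
    (attr, branches), = node.items()
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules

for rule in tree_to_rules(tree):
    print(rule)
```

With this toy tree, the five leaves yield five IF-THEN rules, e.g. "IF outlook=overcast THEN play=yes".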

Instance-based algorithms
- Based on the distance between examples. Remember the WSD algorithm?
- K-nearest neighbour:
  - Given a set of examples X, each represented as (a1(x), a2(x), ..., an(x))
  - Classify a new instance based on the distance between the current example and all examples in the training set
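A minimal k-nearest-neighbour sketch for nominal attributes; the mismatch-count distance and the tiny training set are illustrative choices, not from the slides:

```python
from collections import Counter

def distance(x, y):
    """Distance for nominal attributes: number of mismatched features."""
    return sum(a != b for a, b in zip(x, y))

def knn_classify(train, query, k=3):
    """Label a new instance by majority vote among its k nearest
    training examples; `train` is a list of (features, label) pairs."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
         (("rain", "mild"), "yes"), (("overcast", "hot"), "yes"),
         (("rain", "cool"), "yes")]
print(knn_classify(train, ("rain", "hot"), k=3))  # yes
```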

Instance-based algorithms
- Take into account every single example. Advantage? Disadvantage?
- "Do not forget exceptions"
- Very good for NLP tasks: WSD, POS tagging

Measuring learning performance
- Error on test data:
  - Sample error: wrong cases / total cases on the test sample
  - True error (generalization error): estimated as an error range starting from the sample error
- Cross-validation schemes, for more accurate evaluations. 10-fold cross-validation:
  - Divide the training data into 10 sets
  - Use one set for testing and the other 9 sets for training
  - Repeat 10 times; measure the average accuracy
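The 10-fold scheme above can be sketched as a plain round-robin split (real tools such as Weka usually shuffle and stratify the folds first; this sketch does neither):

```python
def cross_validation_folds(examples, k=10):
    """Split examples into k folds; each fold serves once as the test
    set while the remaining k-1 folds form the training set."""
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(20))  # placeholder "examples"
for train, test in cross_validation_folds(data, k=10):
    # every example appears exactly once as test data across the 10 runs
    assert len(test) == 2 and len(train) == 18
```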

Practical issues: using Weka
- Weka: a freeware Java implementation of many learning algorithms
  + boosting
  + capability of handling very large data sets
  + automatic cross-validation
- To run an experiment: provide file.arff [test set optional; if not present, Weka will evaluate through cross-validation]

Specify the feature types
- Discrete: value drawn from a set of nominal values
- Continuous: numeric value
Example: the golf data

  Play, Don't Play.                  | the target attribute
  outlook: sunny, overcast, rain.    | features
  temperature: real.
  humidity: real.
  windy: true, false.

Weather data

  sunny, 85, 85, false, Don't Play
  sunny, 80, 90, true, Don't Play
  overcast, 83, 78, false, Play
  rain, 70, 96, false, Play
  rain, 68, 80, false, Play
  rain, 65, 70, true, Don't Play
  overcast, 64, 65, true, Play
  sunny, 72, 95, false, Don't Play
  sunny, 69, 70, false, Play
  rain, 75, 80, false, Play
  sunny, 75, 70, true, Play
  overcast, 72, 90, true, Play
  overcast, 81, 75, false, Play
  rain, 71, 80, true, Don't Play
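For reference, the attribute declarations and the first rows of this data might look as follows in Weka's ARFF format. This is a sketch: the attribute order (class last, per ARFF convention) and the quoting of the "Don't Play" label are assumptions, not taken from the slides.

```text
@relation golf

@attribute outlook {sunny, overcast, rain}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {Play, "Don't Play"}

@data
sunny, 85, 85, false, "Don't Play"
sunny, 80, 90, true, "Don't Play"
overcast, 83, 78, false, Play
```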

Running Weka
Check "Short Intro to Weka".