Kansas State University, Department of Computing and Information Sciences, Laboratory for Knowledge Discovery in Databases (KDD): KDD Group Research Seminar


KDD Group Research Seminar, Fall, Presentation 8-11
Incremental Learning
Friday, November 16
James Plummer
Reference: Mitchell, Tom M. Machine Learning. McGraw-Hill.

Outline
- Machine Learning
  - Extracting information from data
  - Forming concepts
- The Data
  - Arrangement of data: attributes, labels, and instances
  - Categorization of data
  - Results
- MLJ (Machine Learning in Java)
  - A collection of machine learning algorithms
  - Current inducers
- Incremental Learning
  - Description of the technique
  - Nearest Neighbor algorithm
  - Distance-Weighted algorithm
- Advantages and Disadvantages
  - Gains and losses

Machine Learning
- Sometimes called data mining
- The process of extracting useful information from data
  - Marketing databases, medical databases, weather databases
  - e.g., finding consumer purchase patterns
- Used to form concepts:
  - Predictions
  - Classifications
  - Numeric answers

The Data
- Arrangement of data
  - A piece of data is a set of attributes a_i that make up an instance x_j; attributes can be considered evidence
  - Each instance has a label or category f(x_j) (the outcome value): x_j = a_1, a_2, a_3, ..., a_i; f(x_j)
  - A set of data is a set of instances
- Categorization
  - A set of instances is used as the reference for new query instances x_q (training)
  - Calculate f^(x_q) based on the training data; f^(x_q) is the predicted value of the actual f(x_q)
- Results
  - The number of correctly predicted values over the total number of query instances: f^(x_q) correct / f(x_q) total
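The "Results" measure above (correctly predicted values over total query instances) can be sketched in Java, the language MLJ targets. The class and method names here are illustrative, not MLJ's actual API:

```java
// A minimal sketch of the accuracy measure from the slide:
// accuracy = (number of f^(xq) that match f(xq)) / (total query instances).
public class Accuracy {
    static double accuracy(String[] predicted, String[] actual) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++)
            if (predicted[i].equals(actual[i])) correct++;   // f^(xq) == f(xq)
        return (double) correct / actual.length;
    }

    public static void main(String[] args) {
        String[] fHat = {"Yes", "No", "Yes", "Yes"};  // predicted labels f^(xq)
        String[] f    = {"Yes", "No", "No",  "Yes"};  // actual labels f(xq)
        System.out.println(accuracy(fHat, f));        // 0.75 (3 of 4 correct)
    }
}
```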

Data Example
Predict the values of examples 6, 7, and 8 given data examples 1 through 5.

Example | Outlook  | Air Temp | Humidity | Wind   | Play Tennis?
--------+----------+----------+----------+--------+-------------
1       | Sunny    | Hot      | High     | Weak   | No
2       | Sunny    | Hot      | High     | Strong | No
3       | Overcast | Hot      | High     | Weak   | Yes
4       | Rain     | Mild     | High     | Weak   | Yes
5       | Rain     | Cool     | Normal   | Weak   | Yes
6       | Rain     | Cool     | Normal   | Strong | ?
7       | Overcast | Cool     | Normal   | Strong | ?
8       | Sunny    | Mild     | High     | Weak   | ?

Answers revealed on the slide: 6 = Yes, 7 = No, 8 = Yes.
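As a sketch of how such a prediction could be computed, the following applies 1-nearest-neighbor to the table using a simple mismatch-count (Hamming) distance over the categorical attributes. This is an illustration under that assumed distance, not necessarily the procedure used on the slide:

```java
// 1-NN over the Play Tennis table, predicting example 6 (Rain, Cool, Normal, Strong).
// Distance between two instances = number of attribute values that differ.
public class TennisKnn {
    static int mismatches(String[] a, String[] b) {
        int d = 0;
        for (int r = 0; r < a.length; r++)
            if (!a[r].equals(b[r])) d++;
        return d;
    }

    public static void main(String[] args) {
        String[][] train = {
            {"Sunny",    "Hot",  "High",   "Weak"},    // 1: No
            {"Sunny",    "Hot",  "High",   "Strong"},  // 2: No
            {"Overcast", "Hot",  "High",   "Weak"},    // 3: Yes
            {"Rain",     "Mild", "High",   "Weak"},    // 4: Yes
            {"Rain",     "Cool", "Normal", "Weak"},    // 5: Yes
        };
        String[] labels = {"No", "No", "Yes", "Yes", "Yes"};
        String[] query  = {"Rain", "Cool", "Normal", "Strong"}; // example 6

        int best = 0;
        for (int i = 1; i < train.length; i++)
            if (mismatches(train[i], query) < mismatches(train[best], query))
                best = i;
        // Example 5 differs only in Wind, so it is the nearest neighbor.
        System.out.println(labels[best]); // Yes
    }
}
```

The 1-NN prediction agrees with the slide's answer for example 6.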

MLJ (Machine Learning in Java)
- MLJ is a collection of learning algorithms (inducers) that categorize data to learn concepts
- Currently in development:
  - ID3: uses decision trees
  - Naive Bayes: uses probabilistic calculations
  - C4.5: uses decision trees with pruning techniques
  - Incremental learning: uses comparison techniques (soon to be added)

Incremental Learning
- Instance-based learning: k-Nearest Neighbor
- All instances correspond to points in an n-dimensional space
- The distance between two instances is given by*
  d(x_i, x_j) = sqrt( sum over r of (a_r(x_i) - a_r(x_j))^2 )
  where a_r(x) is the r-th attribute of instance x
- Given a query instance x_q to be categorized, the k nearest neighbors are calculated
- f^(x_q) is assigned the most frequent value among the k nearest f(x_j)
- For k = 1, f^(x_q) is assigned f(x_i), where x_i is the closest instance in the space
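The algorithm above can be sketched in Java (MLJ's language); the class and method names are illustrative, not MLJ's API. All training instances are stored, and a query is classified by a majority vote among the k closest points under the Euclidean distance:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Sketch of k-Nearest Neighbor classification as described on the slide.
public class Knn {
    // d(xi, xq) = sqrt( sum_r (a_r(xi) - a_r(xq))^2 )
    static double distance(double[] xi, double[] xq) {
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double diff = xi[r] - xq[r];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // f^(xq) = most frequent label among the k nearest training instances.
    static String classify(double[][] X, String[] f, double[] xq, int k) {
        Integer[] idx = new Integer[X.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(X[i], xq)));
        Map<String, Integer> votes = new HashMap<>();
        for (int n = 0; n < k; n++)
            votes.merge(f[idx[n]], 1, Integer::sum);
        return votes.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .get().getKey();
    }

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {5, 5}, {5, 6}, {6, 5}};
        String[] f = {"No", "No", "Yes", "Yes", "Yes"};
        System.out.println(classify(X, f, new double[]{4.5, 5.0}, 3)); // Yes
        System.out.println(classify(X, f, new double[]{0.5, 0.5}, 1)); // No
    }
}
```

Note that "training" is just storing the data; all distance work happens at query time, matching the gains and losses discussed later.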

Distance-Weighted Nearest Neighbor
- Same as k-Nearest Neighbor, except the effect of each f(x_j) on f^(x_q) is weighted by the distance d(x_q, x_j)
- In the case x_q = x_i, then f^(x_q) = f(x_i)
- Examine three cases for the two-dimensional space shown on the slide (figure not reproduced): k = 1; k = 5; weighted, k = 5
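A sketch of the weighted variant, assuming the inverse-square weighting w_j = 1 / d(x_q, x_j)^2 that Mitchell uses (the slide does not show the weight formula, so this choice is an assumption). Each neighbor's vote is scaled by its weight, so closer instances influence f^(x_q) more strongly:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of distance-weighted nearest neighbor: every training instance votes
// with weight 1 / d(xq, xi)^2 (assumed weighting, per Mitchell).
public class WeightedKnn {
    static String classify(double[][] X, String[] f, double[] xq) {
        Map<String, Double> votes = new HashMap<>();
        for (int i = 0; i < X.length; i++) {
            double d2 = 0.0;
            for (int r = 0; r < xq.length; r++) {
                double diff = X[i][r] - xq[r];
                d2 += diff * diff;
            }
            if (d2 == 0.0) return f[i];           // xq == xi, so f^(xq) = f(xi)
            votes.merge(f[i], 1.0 / d2, Double::sum);
        }
        return votes.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .get().getKey();
    }

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {10, 0}, {10, 1}};
        String[] f = {"No", "Yes", "Yes"};
        // Two "Yes" points would outvote the single "No" under plain counting,
        // but the query sits next to the "No" point, whose 1/d^2 weight dominates.
        System.out.println(classify(X, f, new double[]{1, 0})); // No
    }
}
```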

Advantages and Disadvantages
Gains of using k-Nearest Neighbor:
- Individual attributes can be weighted differently
- d(x_i, x_q) can be changed to allow the nearest x_i to have a stronger or weaker effect on f^(x_q)
- Robust to noise in the training data
- Very effective when provided a large set of training data
- Flexible: f^(x_q) can be calculated in many useful ways
- Very small training time
Losses:
- Not good when training data is insufficient
- Not very effective if similar x_i have dissimilar f(x_i)
- More computation time is needed to categorize new instances

References
- Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997.
- Witten, Ian H., and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

* Equation reduced for simplicity.