Lecture 6: Analytical Learning Discussion (3 of 4): Learning with Prior Knowledge
Monday, January 31, 2000
Aiming Wu, Department of Computing and Information Sciences, KSU
Readings: Chown and Dietterich

Paper: "A divide-and-conquer approach to learning from prior knowledge," by Eric Chown and Thomas G. Dietterich.
Reviewed by Aiming Wu.

What's the problem?
Goal: calibrate the free parameters of MAPSS (Mapped Atmosphere-Plant-Soil System).
Term: conceptual parameters -- not directly measured, but summarizing lower-level details.

MAPSS
Purpose: predict the influence of global climate change on the distribution of plant ecosystems worldwide.
Inputs: climate data from 1,211 U.S. weather stations, interpolated to 70,000 sites.
Outputs: amount of vegetation (an LAI tuple: tree, grass, and shrub) and biome classification (Runoff?).

Calibration task
Using manually chosen parameter values, predict outputs for LAI and Runoff.
Define an error function J(s, θ) = Σ (predicted - actual)^2 over the outputs at site s.
Find a value of θ that minimizes the sum of J(s, θ) over all sites s (a toy sketch of this objective follows below).
Characteristics of the task:
--Imagine the computational burden: every candidate θ must be evaluated by running the model over all calibration sites.
--Non-linear nature: competition, thresholds, exponential equations.
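
A minimal, runnable sketch of this calibration objective, assuming a toy one-output stand-in for MAPSS (the toy_model function, its saturating form, and the synthetic observations are all invented for illustration; the real simulator predicts LAI and runoff from climate inputs):

```python
import numpy as np

def toy_model(theta, x):
    """Toy stand-in for the MAPSS prediction at one site with climate input x."""
    a, b = theta
    return a * (1.0 - np.exp(-b * x))   # saturating response, vaguely LAI-like

def site_error(theta, site):
    """J(s, theta): squared error between predicted and observed output at one site."""
    predicted = toy_model(theta, site["x"])
    return (predicted - site["observed"]) ** 2

def total_error(theta, sites):
    """Calibration objective: the sum of J(s, theta) over all sites s."""
    return sum(site_error(theta, s) for s in sites)

# Synthetic "observations" generated with theta = (3.0, 0.5) plus noise.
rng = np.random.default_rng(0)
sites = [{"x": x, "observed": 3.0 * (1.0 - np.exp(-0.5 * x)) + rng.normal(0, 0.05)}
         for x in np.linspace(0.5, 10.0, 50)]

print(total_error((3.0, 0.5), sites))   # near the noise floor
print(total_error((1.0, 1.0), sites))   # much larger error
```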

Approaches to attack the task
Search approach (illustrated in the sketch below):
--gradient descent (hill-climbing)
--simulated annealing
--Powell's method
Set-interaction approach.
Decomposition approach: decompose the overall problem into independent sub-problems.
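
As a point of comparison, a hedged sketch of the plain search approach: treat calibration as one global optimization of the total error, here using SciPy's Powell method on a toy objective (the error function and starting point are assumptions, not the paper's setup):

```python
import numpy as np
from scipy.optimize import minimize

def total_error(theta):
    """Toy stand-in for the summed calibration error over all sites."""
    a, b = theta
    xs = np.linspace(0.5, 10.0, 50)
    target = 3.0 * (1.0 - np.exp(-0.5 * xs))   # pretend observations
    pred = a * (1.0 - np.exp(-b * xs))
    return float(np.sum((pred - target) ** 2))

# One global search over all parameters at once; the paper's point is that
# this scales poorly once the model is large, non-linear, and expensive to run.
result = minimize(total_error, x0=[1.0, 1.0], method="Powell")
print(result.x)   # recovers roughly (3.0, 0.5) on this easy toy problem
```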

How to solve the problem? A divide-and-conquer calibration method.
Pre-requirement: there are sites where only a subset of the parameters is relevant to MAPSS' computations.
Sub-problems:
--Identify "operating regions"
--Identify the sites related to each region
--Calibrate the parameters in each operating region

Identify operating regions (toy illustration below)
Identify control paths through the MAPSS program that involve the relevant parameters.
Problems:
--The MAPSS C program makes it hard to find the paths.
--The iterative search for good LAI values makes the number of paths infinite.
Approaches:
--Translate the MAPSS C program into a "declarative single-assignment, loop-free programming language" and analyze it using partial evaluation techniques.
--Use the actual LAI values instead of searching for them.
--Start with M = 1 and increase it.
--Stop when M unknown parameters have been found.
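
Purely as a toy illustration of what an "operating region" means, the sketch below groups sites by which branch of a small piecewise model their inputs exercise, i.e., which subset of parameters actually matters there. The parameter names, thresholds, and example sites are invented; the paper derives real control paths by partial evaluation of the translated MAPSS code.

```python
from collections import defaultdict

def active_parameters(site):
    """Return the parameter subset a toy piecewise model would actually use at this site."""
    if site["precip"] < 300:     # arid branch: only grass/shrub parameters matter
        return frozenset({"grass_max_lai", "shrub_threshold"})
    if site["temp"] < 0:         # cold branch: tree/frost parameters matter
        return frozenset({"tree_max_lai", "frost_limit"})
    return frozenset({"tree_max_lai", "grass_max_lai"})   # default mixed branch

sites = [
    {"name": "desert",    "precip": 120, "temp": 22},
    {"name": "boreal",    "precip": 450, "temp": -3},
    {"name": "temperate", "precip": 800, "temp": 12},
]

# Each distinct parameter subset is one "operating region"; sites that share it
# can be used together to calibrate just those parameters.
regions = defaultdict(list)
for s in sites:
    regions[active_parameters(s)].append(s["name"])

for params, names in regions.items():
    print(sorted(params), "->", names)
```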

Identify training examples for a path (1)
EM-style algorithm (sketched below):
--Initialize θ; compute the probability of each example belonging to each path.
--Hold the probabilities fixed; modify θ to minimize the expected error.
Problems with the algorithm:
--Global optimization, not well suited to a large model.
--Local maxima.
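
A hedged sketch of such an EM-style alternation on a toy problem: soft-assign each example to one of two candidate paths, then re-fit each path's single scale parameter against its weighted examples (the toy model and the softmax-style responsibilities are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: each example truly belongs to one of two "paths",
# with per-path scale parameters 2.0 and 5.0.
x = rng.uniform(1, 5, size=200)
true_path = rng.integers(0, 2, size=200)
y = np.where(true_path == 0, 2.0 * x, 5.0 * x) + rng.normal(0, 0.2, size=200)

theta = np.array([1.0, 8.0])   # one scale parameter per path, deliberately poor start
for _ in range(20):
    # E-step: responsibility of each path for each example (softmax of negative error).
    err = (y[:, None] - x[:, None] * theta[None, :]) ** 2
    err -= err.min(axis=1, keepdims=True)            # for numerical stability
    resp = np.exp(-err)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted least-squares update of each path's scale parameter.
    theta = (resp * x[:, None] * y[:, None]).sum(axis=0) / (resp * x[:, None] ** 2).sum(axis=0)

print(theta)   # approaches (2.0, 5.0)
```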

Identify training examples for a path (2)
Data Gathering Algorithm (sketched below):
--Filtering phase: initialize 40 random θs, get 40 training examples, compute J, select the 20 models with the smallest J, then test each example and determine its "pass" status until 40 examples have passed the filter.
--Calibration phase: map the examples one-to-one to the models, run a simulated-annealing search, and update the 40 models.
Characteristics of the approach:
--The voted decision of the 40 models is more accurate than the decision of any single model.
--The filtering step is robust to bad models (only the best 20 vote).
--The one-to-one match makes it robust to bad examples.
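
Below is a hedged sketch of the filtering-and-voting idea on a toy one-parameter path model: keep a population of candidate parameter vectors, retain the better half, crudely re-calibrate the survivors, and accept a new example only if enough of them vote that it fits. The 40/20 population sizes mirror the slide, but the model, tolerances, and bootstrap re-fit are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def predict(theta, x):
    """Toy one-parameter model standing in for one MAPSS control path."""
    return theta * x

def sse(theta, examples):
    return sum((predict(theta, x) - y) ** 2 for x, y in examples)

def refit(examples):
    """Least-squares estimate of the single scale parameter from a set of examples."""
    xs = np.array([x for x, _ in examples])
    ys = np.array([y for _, y in examples])
    return float((xs * ys).sum() / (xs * xs).sum())

# Examples believed to lie on this path (true scale 3.0 plus noise).
seed = [(x, 3.0 * x + rng.normal(0, 0.1)) for x in rng.uniform(1, 5, size=20)]

# Filtering: 40 random candidates, keep the 20 with the smallest error.
candidates = rng.uniform(0.0, 10.0, size=40)
survivors = sorted(candidates, key=lambda t: sse(t, seed))[:20]

# Crude stand-in for the calibration phase: re-fit each survivor on a bootstrap sample.
survivors = [refit([seed[i] for i in rng.integers(0, len(seed), size=len(seed))])
             for _ in survivors]

def passes_filter(example, models, tol=1.0, votes_needed=15):
    """An example passes if enough calibrated models predict it accurately."""
    x, y = example
    votes = sum(abs(predict(t, x) - y) < tol for t in models)
    return votes >= votes_needed

print(passes_filter((4.0, 12.1), survivors))   # consistent with the path -> True
print(passes_filter((4.0, 40.0), survivors))   # inconsistent -> False
```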

Calibrate the path
Simulated annealing, starting from the parameter values of the best model found in the last filtering phase.
Parameter-temperature mechanism: already-calibrated parameters get low "temperatures," reduced exponentially as a function of the number of previous paths that have calibrated them (see the sketch below).
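
A hedged sketch of simulated annealing with per-parameter temperatures: each parameter's proposal step is scaled by its own temperature, so a parameter already calibrated on previous paths can start cold and be barely disturbed. The toy objective and the cooling constants are assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def objective(theta):
    """Toy calibration error with its optimum at (2.0, 5.0)."""
    return (theta[0] - 2.0) ** 2 + (theta[1] - 5.0) ** 2

def anneal(theta, temps, n_steps=5000, cooling=0.999):
    theta = np.array(theta, dtype=float)
    temps = np.array(temps, dtype=float)
    best, best_err = theta.copy(), objective(theta)
    err = best_err
    for _ in range(n_steps):
        # Proposal: perturb each parameter in proportion to its own temperature.
        proposal = theta + rng.normal(0.0, temps)
        new_err = objective(proposal)
        # Metropolis acceptance, using the mean temperature as the global one.
        if new_err < err or rng.random() < np.exp(-(new_err - err) / max(temps.mean(), 1e-12)):
            theta, err = proposal, new_err
            if err < best_err:
                best, best_err = theta.copy(), err
        temps *= cooling          # exponential cooling
    return best

# Suppose parameter 1 was already calibrated on a previous path (value near 5.0):
# give it a much lower starting temperature so it is barely disturbed here.
start = [0.0, 5.1]
print(anneal(start, temps=[1.0, 0.01]))
```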

Was the problem solved?
According to the authors, yes: "The results agree closely with the target values except for the top soil level saturated drainage parameters, which are evidently somewhat underconstrained by the model and the data."

Summary (1): content critique
--Key contribution: a decomposition approach to calibrating the free parameters of large, complex, non-linear models.
--Strengths: identifying the control paths of a program; the Data Gathering Algorithm.
--Weakness: "the whole is larger than the sum of its parts" -- is that true here? What is the role of the prior knowledge?

Summary (2): presentation critique
--Audience: machine learning experts with an interest in large, complex scientific models.
--Positive points: detailed explanation of the MAPSS model and the divide-and-conquer approach; good comparison of the authors' approach with others'.
--Negative points: tiger's head, snake's tail (a strong start that trails off); the final results need more elaboration to be convincing; some important terms, such as Runoff and Vmin, are left undefined.