Probability estimation and weights: Weighting training sequences

Probability estimation and weights 1: Weighting training sequences
Why do we want to weight training sequences? Many different proposals:
– Based on trees
– Based on the 3D position of the sequences
– Interested only in classifying family membership
– Maximizing entropy

Probability estimation and weights 2: Why do we want to weight training sequences?
Some of the training sequences can be closely related to each other and do not deserve the same influence in the estimation process as a sequence which is highly diverged.
– Phylogenetic trees
– Example sequences: AGAA, CCTC, AGTC
(Figure: a phylogenetic tree relating AGAA, CCTC and AGTC)

Probability estimation and weights 3: Weighting schemes based on trees
– Thompson, Higgins & Gibson (1994): represents the weights as electric currents, calculated by Kirchhoff's laws
– Gerstein, Sonnhammer & Chothia (1994)
– Root weights from Gaussian parameters (Altschul-Carroll-Lipman weights for a three-leaf tree, 1989)

Probability estimation and weights 4: Thompson, Higgins & Gibson
(Figure: the example tree with leaves 1, 2 and 3 drawn as an electric network of voltages, currents and resistances)

Probability estimation and weights 5: Thompson, Higgins & Gibson
(Figure: the currents flowing to leaves 1, 2 and 3 of the example tree)
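A minimal Python sketch of the electric-current idea. The tree used below is a hypothetical stand-in for the example tree on the slides (branch lengths 2 and 2 below the internal node, 3 from the root to that node, 4 from the root to the third leaf; these values are assumptions chosen so that the ratios quoted later on slide 13 come out). Edge lengths play the role of resistances, a unit voltage is applied at the root, and each leaf's weight is the current flowing into it.

```python
# Thompson, Higgins & Gibson (1994): leaf weights proportional to electric currents.
# A node is ("leaf", name, edge_length) or ("internal", edge_length, [children]).

def resistance(node):
    """Total resistance of the subtree hanging below `node`'s edge."""
    if node[0] == "leaf":
        return node[2]
    _, edge, children = node
    parallel = 1.0 / sum(1.0 / resistance(c) for c in children)  # children in parallel
    return edge + parallel                                       # own edge in series

def currents(node, voltage):
    """Current flowing into each leaf when `voltage` is applied across this subtree."""
    if node[0] == "leaf":
        return {node[1]: voltage / node[2]}
    _, edge, children = node
    total = voltage / resistance(node)
    v_below = voltage - total * edge      # voltage left at the internal node
    out = {}
    for c in children:
        out.update(currents(c, v_below))
    return out

# Hypothetical example tree (branch lengths are assumptions, chosen for illustration):
# root --3--> internal node --2--> seq1, --2--> seq2;  root --4--> seq3
tree = ("internal", 0.0, [
    ("internal", 3.0, [("leaf", "seq1", 2.0), ("leaf", "seq2", 2.0)]),
    ("leaf", "seq3", 4.0),
])

w = currents(tree, 1.0)
total = sum(w.values())
print({name: i / total for name, i in w.items()})  # seq1 0.25, seq2 0.25, seq3 0.5 -> 1:1:2
```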

Probability estimation and weights 6: Gerstein, Sonnhammer & Chothia
Works up the tree, incrementing the weights
– Initially, the weights are set to the edge lengths (the resistances in the previous example)

Probability estimation and weights 7: Gerstein, Sonnhammer & Chothia

Probability estimation and weights 8: Gerstein, Sonnhammer & Chothia
Small difference with Thompson, Higgins & Gibson?
(Figure: leaves 1 and 2)
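A sketch of the incremental scheme in Python, using the same hypothetical branch lengths as in the previous sketch. The increment rule applied at each internal node, delta_w_i = t_n * w_i / (sum of w_k over the leaves below n), is the standard Gerstein-Sonnhammer-Chothia rule and is assumed to be the one illustrated on the slides.

```python
# Gerstein, Sonnhammer & Chothia (1994): work up the tree, incrementing the weights.
# Increment rule at an internal node n with branch length t_n above it:
#   delta_w_i = t_n * w_i / sum(w_k for leaves k below n)
# A node is ("leaf", name, edge_length) or ("internal", edge_length, [children]).

def gsc_weights(node):
    """Return {leaf_name: weight} for the subtree rooted at `node`."""
    if node[0] == "leaf":
        return {node[1]: node[2]}              # initial weight = leaf edge length
    _, t_n, children = node
    weights = {}
    for c in children:
        weights.update(gsc_weights(c))
    total = sum(weights.values())
    if total > 0:                              # share this branch among the leaves below it
        for leaf in weights:
            weights[leaf] += t_n * weights[leaf] / total
    return weights

# Same hypothetical tree as in the previous sketch.
tree = ("internal", 0.0, [
    ("internal", 3.0, [("leaf", "seq1", 2.0), ("leaf", "seq2", 2.0)]),
    ("leaf", "seq3", 4.0),
])
print(gsc_weights(tree))   # seq1 3.5, seq2 3.5, seq3 4.0 -> the 7:7:8 ratio on slide 13
```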

Probability estimation and weights 9: Root weights from Gaussian parameters
– Continuous values instead of discrete members of an alphabet
– A probability density instead of a substitution matrix
– Example: Gaussian

Probability estimation and weights 10: Root weights from Gaussian parameters

Probability estimation and weights 11: Root weights from Gaussian parameters
Altschul-Carroll-Lipman weights for a tree with three leaves

Probability estimation and weights 12: Root weights from Gaussian parameters
(Figure: the three-leaf example tree with leaves 1, 2, 3)
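A sketch of the Altschul-Carroll-Lipman calculation for a three-leaf tree, assuming leaf values x_1, x_2, x_3 evolve by Gaussian increments with variance proportional to branch length, and assuming the same hypothetical branch lengths as in the earlier sketches (t_1 = t_2 = 2 below the internal node n, t_n = 3 above it, t_3 = 4 to leaf 3). Each leaf's weight is its coefficient in the maximum-likelihood estimate of the value at the root.

```latex
\begin{align*}
% inverse-variance average at the internal node n, and the variance seen from the root
\hat{x}_n &= \frac{x_1/t_1 + x_2/t_2}{1/t_1 + 1/t_2},
\qquad
v_n = t_n + \Bigl(\frac{1}{t_1} + \frac{1}{t_2}\Bigr)^{-1}
\\
% combine the node estimate (variance v_n) with leaf 3 (variance t_3) at the root
\hat{x}_r &= \frac{\hat{x}_n/v_n + x_3/t_3}{1/v_n + 1/t_3}
          \;=\; w_1 x_1 + w_2 x_2 + w_3 x_3
\\
% with the assumed branch lengths t_1 = t_2 = 2, t_n = 3, t_3 = 4 we get v_n = 4, hence
(w_1, w_2, w_3) &= \Bigl(\tfrac14,\; \tfrac14,\; \tfrac12\Bigr)
\end{align*}
```

With these assumed lengths the weights come out in the 1:1:2 ratio quoted on slide 13.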

Probability estimation and weights 13: Weighting schemes based on trees
– Thompson, Higgins & Gibson (electric current): 1:1:2
– Gerstein, Sonnhammer & Chothia: 7:7:8
– Altschul-Carroll-Lipman weights for a tree with three leaves: 1:1:2

Probability estimation and weights 14: Weighting scheme using ‘sequence space’: Voronoi weights
Each sequence is weighted in proportion to the volume of the region of sequence space that lies closer to it than to any other training sequence (its Voronoi cell): w_i = V_i / Σ_k V_k, estimated in practice by random sampling.
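A minimal Monte Carlo sketch of the Voronoi idea, assuming aligned sequences of equal length, Hamming distance, and random sequences sampled column-wise from the residues observed at each column; ties are split equally among the nearest training sequences.

```python
import random

def voronoi_weights(seqs, n_samples=10000, seed=0):
    """Monte Carlo estimate of Voronoi weights for aligned, equal-length sequences.

    Random sequences are drawn column-wise from the residues seen in that column;
    each random sequence 'votes' for its nearest training sequence (Hamming distance),
    with ties split equally.  The weights are the normalised vote counts.
    """
    rng = random.Random(seed)
    length = len(seqs[0])
    columns = [[s[i] for s in seqs] for i in range(length)]
    counts = [0.0] * len(seqs)
    for _ in range(n_samples):
        r = [rng.choice(col) for col in columns]
        dists = [sum(a != b for a, b in zip(r, s)) for s in seqs]
        best = min(dists)
        nearest = [k for k, d in enumerate(dists) if d == best]
        for k in nearest:
            counts[k] += 1.0 / len(nearest)
    total = sum(counts)
    return [c / total for c in counts]

print(voronoi_weights(["AGAA", "CCTC", "AGTC"]))
```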

Probability estimation and weights 15: More weighting schemes
– Maximum discrimination weights
– Maximum entropy weights
  – Based on averaging
  – Based on maximum ‘uniformity’ (entropy)

Probability estimation and weights 16: Maximum discrimination weights
Does not try to maximize the likelihood or the posterior probability of the model; instead it optimizes the model's ability to decide whether a sequence is a member of the family.

Probability estimation and weights 17: Maximum discrimination weights
The discrimination D is the product of the posterior membership probabilities of the training sequences. Maximizing D puts the emphasis on distant or difficult members.
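Written out, with the usual notation assumed here (model M for the family, background "random" model R):

```latex
D = \prod_{k} P(M \mid x^{k}),
\qquad
P(M \mid x) \;=\; \frac{P(x \mid M)\,P(M)}{P(x \mid M)\,P(M) + P(x \mid R)\,P(R)}
```

Because D is a product, a single member with a low posterior pulls the whole criterion down, which is why maximizing D emphasizes the distant or difficult members.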

Probability estimation and weights 18: Maximum discrimination weights
Differences with the previous schemes:
– Iterative method: the initial weights give rise to a model; the newly calculated posterior probabilities P(M|x) give rise to new weights and hence a new model, until convergence is reached
– It optimizes performance for what the model is designed for: classifying whether a sequence is a member of a family
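A sketch of the iterative loop in Python. The model-fitting and scoring routines are hypothetical placeholders (not from the slides), and the particular update used here, w_k proportional to 1 - P(M|x_k), is one common choice and an assumption rather than something stated on the slides.

```python
def maximum_discrimination_weights(seqs, fit_model, posterior, n_iter=20, tol=1e-6):
    """Iterative maximum-discrimination weighting (sketch).

    fit_model(seqs, weights) -> model        # placeholder: e.g. a weighted profile fit
    posterior(model, seq)    -> P(M | seq)   # placeholder: posterior membership probability
    """
    n = len(seqs)
    weights = [1.0 / n] * n
    for _ in range(n_iter):
        model = fit_model(seqs, weights)              # current weights give rise to a model
        post = [posterior(model, s) for s in seqs]    # the model gives new posteriors
        new = [1.0 - p for p in post]                 # assumed update: w_k ~ 1 - P(M|x_k)
        total = sum(new) or 1.0
        new = [w / total for w in new]
        if max(abs(a - b) for a, b in zip(new, weights)) < tol:
            return new                                # converged
        weights = new
    return weights
```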

Probability estimation and weights 19: More weighting schemes
– Maximum discrimination weights
– Maximum entropy weights
  – Based on averaging
  – Based on maximum ‘uniformity’ (entropy)

Probability estimation and weights 20: Maximum entropy weights
Entropy: a measure of the average uncertainty of an outcome (maximal when we are maximally uncertain about the outcome)
Averaging (worked example on the next slide):
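For reference, the entropy meant here is the standard Shannon entropy of a distribution p over outcomes, which is maximal when p is uniform:

```latex
H(p) = -\sum_{i} p_i \log p_i
```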

Probability estimation and weights 21: Maximum entropy weights
Sequences: AGAA, CCTC, AGTC
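A sketch of an "averaging" scheme of this kind, assuming it is the position-based scheme of Henikoff & Henikoff (1994): at each column a sequence receives 1/(r*s), where r is the number of distinct residues in the column and s is how many sequences share this sequence's residue; a sequence's weight is the normalised sum of its per-column scores.

```python
from collections import Counter

def position_based_weights(seqs):
    """Henikoff & Henikoff (1994) position-based weights (assumed to be the
    'averaging' scheme on the slide).  Sequences must be aligned / equal length."""
    length = len(seqs[0])
    weights = [0.0] * len(seqs)
    for i in range(length):
        column = [s[i] for s in seqs]
        counts = Counter(column)
        r = len(counts)                        # number of distinct residues in the column
        for k, res in enumerate(column):
            weights[k] += 1.0 / (r * counts[res])
    total = sum(weights)
    return [w / total for w in weights]        # normalise to sum to 1

print(position_based_weights(["AGAA", "CCTC", "AGTC"]))
# -> [0.375, 0.375, 0.25]: AGAA and CCTC each get 3/8, AGTC gets 1/4
```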

Probability estimation and weights 22: Maximum entropy weights
‘Uniformity’: choose the weights that maximize the entropy of the weighted residue distribution in each column, i.e. make those distributions as uniform as possible.

Probability estimation and weights 23: Maximum entropy weights
Sequences: AGAA, CCTC, AGTC

Probability estimation and weights 24: Maximum entropy weights
Solving the equations leads to:
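A sketch of the calculation, assuming the "uniformity" criterion means choosing weights (summing to 1) that make the weighted residue distribution in every column as uniform as possible:

```latex
\begin{align*}
\text{column 1 (A,C,A)} &: \; w_1 + w_3 = w_2 &
\text{column 2 (G,C,G)} &: \; w_1 + w_3 = w_2 \\
\text{column 3 (A,T,T)} &: \; w_1 = w_2 + w_3 &
\text{column 4 (A,C,C)} &: \; w_1 = w_2 + w_3 \\[4pt]
\text{with } w_1 + w_2 + w_3 = 1 &\;\Longrightarrow\; w_1 = w_2 = \tfrac12,\; w_3 = 0
\end{align*}
```

Under this criterion AGTC contributes nothing, which is intuitive: at every position it agrees with one of the other two sequences, so it adds no information beyond AGAA and CCTC.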

Probability estimation and weights 25: Summary of the entropy methods
– Maximum entropy weights (averaging)
– Maximum entropy weights (‘uniformity’)

Probability estimation and weights 26: Conclusion
Many different methods exist; which one to use depends on the problem.
Questions?