Lecture 16: Wrap-Up COMP 538 Introduction of Bayesian networks.


Phylogeny / Slide 2 Nevin L. Zhang, HKUST
Recap
- Latent class models
  - Clustering
  - Clustering criterion: conditional independence
  - Drawback: assumption too strong
- Hierarchical latent class (HLC) models
  - Identifiability issues: regularity, equivalence
  - Hill climbing algorithm

Phylogeny / Slide 3
Today
- Phylogenetic (evolution) trees
  - Closely related to HLC models
  - An example of viewing existing models in the framework of BNs
    - Another example: HMM
  - Interesting because
    - It eases understanding
    - Techniques in one field can be applied to another
      - Structural EM for phylogenetic trees
      - Dynamic BNs for speech understanding
    - It supports the development of general-purpose algorithms
- Bayesian networks for classification
  - Hand waving only

Phylogeny / Slide 4
Phylogenetic Tree Outline
- Introduction to phylogenetic trees
- Probabilistic models of evolution
- Tree reconstruction

Phylogeny / Slide 5
Phylogenetic Trees
- Assumption
  - All organisms on Earth have a common ancestor
  - This implies that any set of species is related
- Phylogeny
  - The relationship between any set of species
- Phylogenetic tree
  - Usually, the relationship can be represented by a tree, which is called a phylogenetic (evolution) tree
    - This is not always true

Phylogeny / Slide 6
Phylogenetic Trees
[Figure: example phylogenetic tree over giant panda, lesser panda, moose, goshawk, vulture, duck, and alligator, drawn along a time axis with current-day species at the bottom]

Phylogeny / Slide 7
Phylogenetic Trees
- Taxa (sequences) identify species
- Edge lengths represent evolution time
- Assumption: bifurcating tree topology
[Figure: tree drawn along a time axis, with node sequences AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT, AAGACTT, AGCACTT, AAGGCCT, AAGGCAT]

Phylogeny / Slide 8
Probabilistic Models of Evolution
- Characterize the relationship between taxa using substitution probabilities:
  - $P(x \mid y, t)$: probability that ancestral sequence $y$ evolves into sequence $x$ along an edge of length $t$
  - $P(x_7)$, $P(x_5 \mid x_7, t_5)$, $P(x_6 \mid x_7, t_6)$, $P(s_1 \mid x_5, t_1)$, $P(s_2 \mid x_5, t_2)$, …
[Figure: tree with leaves s_1, s_2, s_3, s_4, internal nodes x_5, x_6, x_7, and edge lengths t_1 through t_6]

Phylogeny / Slide 9
Probabilistic Models of Evolution
- What should $P(x \mid y, t)$ be?
- Two assumptions of commonly used models
  - There are only substitutions, no insertions or deletions (the sequences are aligned)
    - One-to-one correspondence between sites in different sequences
  - Each site evolves independently and identically:
    $P(x \mid y, t) = \prod_{i=1}^{m} P(x(i) \mid y(i), t)$
  - $m$ is the sequence length
[Figure: example tree with node sequences AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT, AAGACTT, AGCACTT, AAGGCCT, AAGGCAT]

Phylogeny / Slide 10
Probabilistic Models of Evolution
- What should $P(x(i) \mid y(i), t)$ be?
  - Jukes-Cantor (character evolution) model [1969]
    - Rate of substitution $\alpha$ (constant or parameter?)
    - Substitution matrix over the bases A, C, G, T:

             A     C     G     T
        A   r_t   s_t   s_t   s_t
        C   s_t   r_t   s_t   s_t
        G   s_t   s_t   r_t   s_t
        T   s_t   s_t   s_t   r_t

      $r_t = \frac{1}{4}\left(1 + 3e^{-4\alpha t}\right)$, $s_t = \frac{1}{4}\left(1 - e^{-4\alpha t}\right)$
    - Limit values when $t = 0$ or $t = \infty$?
- Multiplicativity (lack of memory)
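The two questions on this slide (limit values and multiplicativity) can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names and the default rate `alpha=1.0` are assumptions made for illustration, and the matrix follows the r_t / s_t formulas above.

```python
import math

def jc_matrix(t, alpha=1.0):
    """Return the 4x4 Jukes-Cantor matrix with entry [y][x] = P(x | y, t).

    Rows and columns are ordered A, C, G, T. alpha is the substitution rate
    (the default of 1.0 is an arbitrary illustrative choice).
    """
    decay = math.exp(-4.0 * alpha * t)
    r_t = 0.25 * (1.0 + 3.0 * decay)  # base is unchanged after time t
    s_t = 0.25 * (1.0 - decay)        # base is replaced by one specific other base
    return [[r_t if x == y else s_t for x in range(4)] for y in range(4)]

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two 4x4 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))

# Limit values: t = 0 gives the identity, a very long edge gives uniform 1/4.
p_zero = jc_matrix(0.0)
p_long = jc_matrix(1e6)

# Multiplicativity: evolving for 0.3 and then 0.5 equals evolving for 0.8.
gap = max_abs_diff(matmul(jc_matrix(0.3), jc_matrix(0.5)), jc_matrix(0.8))
```

At t = 0 no substitution has happened, so r_0 = 1 and s_0 = 0; as t grows, all memory of the ancestral base is lost and every entry tends to 1/4. The value `gap` is zero up to floating-point rounding, which is the multiplicativity property in matrix form.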

Phylogeny / Slide 11
Tree Reconstruction
- Given: collection of current-day taxa
  - AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
- Find: tree
  - Tree topology: T
  - Edge lengths: t
- Maximum likelihood
  - Find the tree that maximizes P(data | tree)

Phylogeny / Slide 12
Tree Reconstruction
- When restricted to one particular site, a phylogenetic tree is an HLC model where
  - The structure is a binary tree and all variables share the same state space.
  - The conditional probabilities come from the character evolution model, parameterized by edge lengths instead of the usual parameterization.
  - The model is the same for different sites.
[Figure: example tree with node sequences AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT, AAGACTT, AGCACTT, AAGGCCT]
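To make the HLC view concrete, here is a hedged sketch that computes P(data | tree) for the four-leaf tree of Slide 8 by brute-force summation over the latent nodes x5, x6, x7. Several things are assumptions for illustration only: that s3 and s4 hang under x6 with edge lengths t3 and t4, that the root prior P(x7) is uniform, that the rate is 1, and all function names and the edge lengths used at the end.

```python
import itertools
import math

BASES = "ACGT"

def jc(x, y, t, alpha=1.0):
    """Jukes-Cantor P(x | y, t) for single bases x and y."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if x == y else 0.25 * (1.0 - decay)

def site_likelihood(leaves, t):
    """Likelihood of one site (s1, s2, s3, s4) on the assumed tree.

    t maps edge index 1..6 to its length. The latent nodes x5, x6, x7 are
    summed out by enumerating all 4^3 joint assignments.
    """
    s1, s2, s3, s4 = leaves
    total = 0.0
    for x5, x6, x7 in itertools.product(BASES, repeat=3):
        total += (0.25                                  # assumed uniform P(x7)
                  * jc(x5, x7, t[5]) * jc(x6, x7, t[6])
                  * jc(s1, x5, t[1]) * jc(s2, x5, t[2])
                  * jc(s3, x6, t[3]) * jc(s4, x6, t[4]))
    return total

def log_likelihood(seqs, t):
    """Sites are i.i.d., so the log-likelihood is a sum over alignment columns."""
    return sum(math.log(site_likelihood(col, t)) for col in zip(*seqs))

# Four of the lecture's taxa on the assumed topology; edge lengths are made up.
seqs = ["AGGGCAT", "TAGCCCA", "TAGACTT", "AGCACAA"]
edge_lengths = {k: 0.1 for k in range(1, 7)}
ll = log_likelihood(seqs, edge_lengths)
```

Enumeration is exponential in the number of latent nodes, so it is only practical for very small trees; it is used here because it mirrors the factorization on Slide 8 term by term.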

Phylogeny / Slide 13
Tree Reconstruction
- Current-day taxa: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
- Samples for the HLC model: one sample per site. The samples are i.i.d.
  - 1st site: (A, T, T, A, A)
  - 2nd site: (G, A, A, G, G)
  - 3rd site: (G, G, G, C, C)
  - …
[Figure: example tree with node sequences AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT, AAGACTT, AGCACTT, AAGGCCT]
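Turning aligned taxa into per-site samples is a transpose of the alignment. A small sketch using the five taxa from the slide (the variable names are illustrative):

```python
# The five current-day taxa from the slide, one aligned sequence per species.
taxa = ["AGGGCAT", "TAGCCCA", "TAGACTT", "AGCACAA", "AGCGCTT"]

# zip(*taxa) walks the alignment column by column, so each tuple holds the
# base observed at one site across all five taxa: one i.i.d. sample.
site_samples = list(zip(*taxa))

first_three = site_samples[:3]
# first_three == [('A','T','T','A','A'), ('G','A','A','G','G'), ('G','G','G','C','C')]
```

There are as many samples as sites (seven here), and each sample has one entry per taxon.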

Phylogeny / Slide 14
Tree Reconstruction
- Finding the ML phylogenetic tree == finding the ML HLC model
- Model space:
  - Model structures: binary trees where all variables share the same, known state space.
  - Parameterization: one parameter for each edge. (In general, P(x|y) has (|x| - 1)|y| free parameters.)

Phylogeny / Slide 15
Bayesian Networks for Classification
- The problem:
  - Given data:

        A1  A2  …   An  C
        0   1   1   0   T
        1   0   1   1   F
        .   .

  - Find a mapping
    - (A1, A2, …, An) -> C
- Possible solutions
  - ANN
  - Decision tree (Quinlan)
  - …

Phylogeny / Slide 16
Bayesian Networks for Classification
- Naïve Bayes model
  - From data, learn
    - P(C), P(Ai|C)
  - Classification
    - arg max_c P(C=c|A1=a1, …, An=an)
  - Very good in practice
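The learn-then-classify recipe above fits in a few lines. This is a sketch under stated assumptions, not the lecture's implementation: the function names and the toy data are invented for illustration, and add-one (Laplace) smoothing is added to avoid zero probabilities, which the slide does not mention.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, smoothing=1.0):
    """Collect the counts needed for P(C) and P(Ai | C)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)       # (attribute index, class) -> value counts
    values = [set() for _ in rows[0]]         # observed values of each attribute
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            value_counts[(i, c)][a] += 1
            values[i].add(a)
    return class_counts, value_counts, values, smoothing

def classify_nb(model, row):
    """Return arg max_c P(C=c) * prod_i P(Ai=ai | C=c), computed in log space."""
    class_counts, value_counts, values, k = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)
        for i, a in enumerate(row):
            # Smoothed estimate of P(Ai = a | C = c) (smoothing is an assumption).
            p = (value_counts[(i, c)][a] + k) / (n_c + k * len(values[i]))
            score += math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data in the shape of the table on Slide 15.
rows = [(0, 1, 1, 0), (1, 0, 1, 1), (0, 1, 0, 0), (1, 0, 0, 1)]
labels = ["T", "F", "T", "F"]
model = train_nb(rows, labels)
pred_a = classify_nb(model, (0, 1, 1, 0))
pred_b = classify_nb(model, (1, 0, 0, 1))
```

The posterior is proportional to P(C=c) times the product of the P(Ai=ai|C=c), so the normalizing constant can be dropped when taking the arg max; working with logarithms avoids underflow when there are many attributes.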

Phylogeny / Slide 17
Bayesian Networks for Classification
- Drawback of NB:
  - Assumes the attributes are mutually independent given the class variable
  - This is often violated, leading to double counting of evidence
- Fixes:
  - General BN classifiers
  - Tree augmented Naïve Bayes (TAN) models
  - Hierarchical NB
  - …

Phylogeny / Slide 18
Bayesian Networks for Classification
- General BN classifier
  - Treat the class variable as just another variable
  - Learn a BN
  - Classify the next instance based on the values of the variables in the Markov blanket of the class variable
  - Performs rather poorly because it does not use all the available information

Phylogeny / Slide 19
Bayesian Networks for Classification
- TAN model
  - Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29:131-163.
  - Capture dependence among attributes using a tree structure
  - During learning,
    - First learn a tree among the attributes: use the Chow-Liu algorithm
    - Add the class variable and estimate parameters
  - Classification
    - arg max_c P(C=c|A1=a1, …, An=an)
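The tree-learning step can be sketched as a maximum-weight spanning tree over the attributes. This sketch weights each attribute pair by its conditional mutual information given the class, which is the variant used for TAN in the cited paper; if plain mutual information was intended, treat that choice as an assumption. The function names, the use of Prim's algorithm, the choice of attribute 0 as root, and the toy data are all illustrative.

```python
import math
from collections import Counter

def cond_mutual_info(rows, labels, i, j):
    """Empirical I(Ai; Aj | C) computed from counts."""
    n = len(rows)
    n_c = Counter(labels)
    n_ic = Counter((r[i], c) for r, c in zip(rows, labels))
    n_jc = Counter((r[j], c) for r, c in zip(rows, labels))
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(rows, labels))
    total = 0.0
    for (a, b, c), cnt in n_ijc.items():
        # P(a,b,c) * log [ P(a,b|c) / (P(a|c) P(b|c)) ], written with counts.
        total += (cnt / n) * math.log(cnt * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)]))
    return total

def attribute_tree(rows, labels):
    """Return directed edges (parent, child) of a maximum-weight spanning tree."""
    n_attrs = len(rows[0])
    weight = {(i, j): cond_mutual_info(rows, labels, i, j)
              for i in range(n_attrs) for j in range(i + 1, n_attrs)}
    in_tree = {0}                              # attribute 0 is the arbitrary root
    edges = []
    while len(in_tree) < n_attrs:
        # Prim's step: heaviest edge leaving the current tree.
        parent, child = max(
            ((i, j) for i in in_tree for j in range(n_attrs) if j not in in_tree),
            key=lambda e: weight[(min(e), max(e))])
        edges.append((parent, child))
        in_tree.add(child)
    return edges

# Hypothetical toy data: A0 and A1 always agree, A2 is unrelated to both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 2
labels = ["T"] * 4 + ["F"] * 4
edges = attribute_tree(rows, labels)
```

After the tree is found, the class variable is added as a parent of every attribute and the conditional probability tables are estimated from counts, as in the Naïve Bayes case but with one extra attribute parent per non-root node.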

Phylogeny / Slide 20
Bayesian Networks for Classification
- Hierarchical Naïve Bayes models
  - N. L. Zhang, T. D. Nielsen, and F. V. Jensen (2002). Latent variable discovery in classification models. Artificial Intelligence in Medicine, to appear.
  - Capture dependence among attributes using latent variables
  - Detect interesting latent structures besides classification
  - Currently slow
