
1 The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006

2 Overview
Bayesian networks and other probabilistic graph models

3 Bayesian Networks (Informal)
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ "directly influences")
a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.
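As a concrete illustration, here is a minimal sketch (in Python, not from the slides; the node, its parents, and all numbers are invented for illustration) of how one node's CPT can be stored and queried:

```python
# CPT for a Boolean node X with two Boolean parents: one entry per combination
# of parent values, each giving P(X = true); P(X = false) is the complement.
# All probabilities below are illustrative, not taken from the slides.
cpt_x = {
    (True,  True):  0.9,    # P(X=true | Parent1=true,  Parent2=true)
    (True,  False): 0.6,    # P(X=true | Parent1=true,  Parent2=false)
    (False, True):  0.3,    # P(X=true | Parent1=false, Parent2=true)
    (False, False): 0.05,   # P(X=true | Parent1=false, Parent2=false)
}

def p_x(x, parent1, parent2):
    """Return P(X = x | Parent1 = parent1, Parent2 = parent2)."""
    p_true = cpt_x[(parent1, parent2)]
    return p_true if x else 1.0 - p_true
```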

4 Example
The topology of the network encodes conditional independence assertions:
Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.

5 Example
I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
A burglar can set the alarm off.
An earthquake can set the alarm off.
The alarm can cause Mary to call.
The alarm can cause John to call.

6 Example contd.

7 Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
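A minimal numeric sketch of this product for the burglary example. The CPT values from the slide's figure are not reproduced in the transcript, so the numbers below are illustrative ones commonly used with this example, not the slides' own:

```python
# Illustrative CPT entries (assumptions, not from the slides).
p_j_given_a     = 0.90    # P(j | a)
p_m_given_a     = 0.70    # P(m | a)
p_a_given_nb_ne = 0.001   # P(a | ¬b, ¬e)
p_not_b         = 0.999   # P(¬b)
p_not_e         = 0.998   # P(¬e)

# Joint probability of this particular assignment is just the product.
p = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(p)   # ≈ 0.000628
```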

8 Inference
Given the evidence that "neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call", how do we decide among the following four possible explanations?
Nothing at all
Burglary but not Earthquake
Earthquake but not Burglary
Burglary and Earthquake
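One way to answer this is inference by enumeration: sum out the hidden variable Alarm for each of the four (Burglary, Earthquake) explanations and normalize. The sketch below is self-contained; as before, the CPT numbers are illustrative assumptions rather than values taken from the slides:

```python
from itertools import product

# Prior and conditional tables (illustrative values, not from the slides).
P_B = {True: 0.001, False: 0.999}                     # P(Burglary)
P_E = {True: 0.002, False: 0.998}                     # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Full joint as the product of the local conditional distributions."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Evidence: John calls (j = true), Mary does not (m = false).
# Sum out Alarm for each explanation, then normalize.
unnormalized = {}
for b, e in product([True, False], repeat=2):
    unnormalized[(b, e)] = sum(joint(b, e, a, True, False) for a in [True, False])

z = sum(unnormalized.values())
for (b, e), p in sorted(unnormalized.items(), key=lambda kv: -kv[1]):
    print(f"P(Burglary={b}, Earthquake={e} | j, ~m) = {p / z:.4f}")
```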

9 Learning
Suppose that we only have a joint distribution; how do we "learn" the topology of a BN?

10 Application: Clustering Users
Input: TV shows that each user watches
Output: TV show "clusters"
Assumption: shows watched by the same users are similar
Class 1: Power rangers, Animaniacs, X-men, Tazmania, Spider man
Class 2: Young and restless, Bold and the beautiful, As the world turns, Price is right, CBS eve news
Class 3: Tonight show, Conan O'Brien, NBC nightly news, Later with Kinnear, Seinfeld
Class 4: 60 minutes, NBC nightly news, CBS eve news, Murder she wrote, Matlock
Class 5: Seinfeld, Friends, Mad about you, ER, Frasier

11 App.: Finding Regulatory Networks
Expression level in each module is a function of the expression of its regulators: P(Level | Module, Regulators).
(The slide's figure shows a model over Experiment and Gene with variables Expression Module, Regulator 1-3, and Level, annotated with questions such as "What module does gene g belong to?" and "Expression level of Regulator 1 in experiment", and example regulators BMH1, GIC2, HAP4, CMK1.)

12 App.: Finding Regulatory Networks
(The slide's figure shows the inferred regulatory-network map: numbered modules grouped under processes such as amino acid metabolism, energy and cAMP signaling, and DNA and RNA processing (nuclear), connected to regulators, both signaling molecules and transcription factors, e.g. Hap4, Xbp1, Yap6, Gat1, Msn4, Gac1, Gis1, Bmh1, Cmk1, Tpk1, Tpk2, and to enriched cis-regulatory motifs such as STRE, GATA, GCN4, MIG1, GCR1, HSF. The legend marks regulation supported in the literature, inferred regulation, experimentally tested regulators, module numbers, and enriched cis-regulatory motifs.)

13 Constructing Bayesian Networks
Base: we know the joint distribution of X = X_1, …, X_n, and we know the "topology" of X: for each X_i ∈ X, we know the parents of X_i.
Goal: we want to create a Bayesian network that captures the joint distribution according to the topology.
Theorem: such a BN exists.

14 Proof by Construction
A leaf of X is an X_i ∈ X such that X_i has no children.
For each X_i:
add X_i to the network
select parents from X_1, …, X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i-1})
remove X_i from the working set: X = X − {X_i}
This choice of parents guarantees:
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | X_1, …, X_{i-1})  (chain rule)
= ∏_{i=1}^{n} P(X_i | Parents(X_i))  (by construction)
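For instance, applying the chain rule with the ordering B, E, A, J, M to the burglary network (a worked instance, assuming the topology sketched on the earlier slides):
P(B, E, A, J, M) = P(B) P(E | B) P(A | B, E) P(J | B, E, A) P(M | B, E, A, J)  (chain rule)
= P(B) P(E) P(A | B, E) P(J | A) P(M | A)  (since E ⊥ B, J ⊥ {B, E} | A, and M ⊥ {B, E, J} | A)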

15 Compactness
A CPT for a Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values.
Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p).
If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
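A quick check of these counts (a sketch; the parent counts below simply restate the burglary topology from the earlier slides):

```python
# Number of parents for each node in the burglary network.
parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2, "JohnCalls": 1, "MaryCalls": 1}

# One number per CPT row (P(X=true | parents)), 2**k rows for k Boolean parents.
cpt_numbers = sum(2 ** k for k in parents.values())
print(cpt_numbers)              # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** len(parents) - 1)    # 31 numbers for the full joint over 5 Boolean variables
```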

16 Reasoning: Probability Theory
A well-understood framework for modeling uncertainty:
partial knowledge of the state of the world
noisy observations
phenomena not covered by our model
inherent stochasticity
Clear semantics; can be learned from data.

17 Probability Theory
A (discrete) probability P over (Ω, S = 2^Ω) is a mapping from events in S to real values such that:
Ω is the set of all possible outcomes (the sample space) of a probabilistic experiment, and S is the set of "events"
P(α) ≥ 0 for all α ∈ S
P(Ω) = 1
If α, β ∈ S and α ∩ β = ∅, then P(α ∪ β) = P(α) + P(β)
Conditional probability: P(α | β) = P(α ∩ β) / P(β)
Chain rule: P(α ∩ β) = P(α | β) P(β)
Bayes rule: P(α | β) = P(β | α) P(α) / P(β)
Conditional independence: α is independent of β given γ if P(α | β ∩ γ) = P(α | γ)

18 Random Variables & Notation
Random variable: a function from Ω to a set of values; the probability of each value is non-negative, and the probabilities sum to 1.
Val(X): the set of possible values of RV X.
Upper-case letters denote RVs (e.g., X, Y, Z); upper-case bold letters denote sets of RVs (e.g., X, Y).
Lower-case letters denote RV values (e.g., x, y, z); lower-case bold letters denote sets of RV values (e.g., x).
E.g., P(X = x), P(X) = {P(X = x) | x ∈ Val(X)}

19 Joint Probability Distribution
Given a group of random variables X = X_1, …, X_n, where X_i takes values from a set Val(X_i), the joint probability distribution is a function that maps each element of Ω = ∏_i Val(X_i) to a non-negative value, such that the values sum to 1.
For example, the RV Weather takes four values (sunny, rainy, cloudy, snow) and the RV Cavity takes two values (true, false). P(Weather, Cavity) is a 4 × 2 matrix of values:

Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08

20 Marginal Probability
Given a set of RVs X and its joint probabilities, the marginal probability distribution over X' ⊆ X is obtained by summing the joint over the variables not in X': P(X') = Σ_{X \ X'} P(X).

Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08

P(Weather = sunny) = 0.144 + 0.576 = 0.72
P(Cavity = true) = 0.144 + 0.02 + 0.016 + 0.02 = 0.2

21 Independence
Two RVs X, Y are independent, denoted X ⊥ Y, if P(X, Y) = P(X) P(Y).
Conditional independence: X is independent of Y given Z if P(X | Y, Z) = P(X | Z).

Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08

P(Weather = sunny) = 0.144 + 0.576 = 0.72
P(Cavity = true) = 0.144 + 0.02 + 0.016 + 0.02 = 0.2
In this table Weather and Cavity are independent: every cell equals the product of its marginals, e.g. P(sunny, cavity) = 0.144 = 0.72 × 0.2.
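A minimal sketch that reproduces these numbers directly from the joint table above, computing the marginals and checking independence cell by cell:

```python
weathers = ["sunny", "rainy", "cloudy", "snow"]
joint = {  # P(Weather = w, Cavity = c), values copied from the table above
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginals: sum out the other variable.
p_weather = {w: sum(joint[(w, c)] for c in [True, False]) for w in weathers}
p_cavity = {c: sum(joint[(w, c)] for w in weathers) for c in [True, False]}
print(p_weather["sunny"])   # 0.72
print(p_cavity[True])       # 0.2 (up to floating-point rounding)

# Independence check: does P(w, c) == P(w) * P(c) hold for every cell?
independent = all(abs(joint[(w, c)] - p_weather[w] * p_cavity[c]) < 1e-9
                  for w in weathers for c in [True, False])
print(independent)          # True: Weather and Cavity are independent here
```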

22 Representing Joint Distributions
Random variables: X_1, …, X_n; P is a joint distribution over X_1, …, X_n.
If X_1, …, X_n are binary, we need 2^n parameters to describe P.
Can we represent P more compactly? Key: exploit independence properties.

23 Independent Random Variables
If X and Y are independent, then P(X, Y) = P(X | Y) P(Y) = P(X) P(Y).
If X_1, …, X_n are independent, then P(X_1, …, X_n) = P(X_1) … P(X_n): only O(n) parameters, and all 2^n probabilities are implicitly defined.
However, full independence cannot represent many types of distributions; we may need to consider conditional independence.

24 Conditional Parameterization
S = score on test, Val(S) = {s0, s1}
I = intelligence, Val(I) = {i0, i1}
G = grade, Val(G) = {g0, g1, g2}
Assume that G and S are independent given I.
Joint parameterization: 2 × 2 × 3 = 12 entries, i.e. 12 − 1 = 11 independent parameters.
Conditional parameterization: P(I, S, G) = P(I) P(S | I) P(G | I, S) = P(I) P(S | I) P(G | I)
P(I): 1 independent parameter
P(S | I): 2 × 1 = 2 independent parameters
P(G | I): 2 × 2 = 4 independent parameters
Total: 7 independent parameters (see the check below).
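A quick arithmetic check of the two counts above (a sketch that simply restates |I| = 2, |S| = 2, |G| = 3):

```python
# Full joint over (I, S, G) vs. the conditional parameterization P(I)P(S|I)P(G|I).
joint_params = 2 * 2 * 3 - 1                                  # 11
conditional_params = (2 - 1) + 2 * (2 - 1) + 2 * (3 - 1)      # 1 + 2 + 4 = 7
print(joint_params, conditional_params)                       # 11 7
```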

25 Naïve Bayes Model
Class variable C, Val(C) = {c1, …, ck}
Evidence variables X_1, …, X_n
Naïve Bayes assumption: the evidence variables are conditionally independent given C, so P(C, X_1, …, X_n) = P(C) ∏_i P(X_i | C).
Applications in medical diagnosis and text classification.
Used as a classifier: pick the class with the highest posterior P(C | X_1, …, X_n).
Problem: double counting of correlated evidence.
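A minimal classifier sketch under the naïve Bayes assumption; the class priors, the per-class evidence probabilities, and the variable names below are all invented for illustration:

```python
import math

# Illustrative (invented) parameters for two classes and three Boolean evidence variables.
p_class = {"c1": 0.6, "c2": 0.4}                      # P(C)
p_evidence_true = {"c1": [0.8, 0.1, 0.7],             # P(X_i = true | C = c1)
                   "c2": [0.3, 0.5, 0.2]}             # P(X_i = true | C = c2)

def unnormalized_posterior(c, x):
    """P(C = c) * prod_i P(X_i = x_i | C = c), computed in log space for stability."""
    log_p = math.log(p_class[c])
    for p_true, xi in zip(p_evidence_true[c], x):
        log_p += math.log(p_true if xi else 1.0 - p_true)
    return math.exp(log_p)

x = [True, False, True]                               # observed evidence
scores = {c: unnormalized_posterior(c, x) for c in p_class}
z = sum(scores.values())
posterior = {c: s / z for c, s in scores.items()}
print(max(posterior, key=posterior.get), posterior)   # most probable class and posterior
```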

26 Bayesian Network: A Formal Study
A Bayesian network on a group of random variables X = X_1, …, X_n is a tuple (T, P) such that:
The topology T ⊆ X × X is a directed acyclic graph.
P is a joint distribution such that for all i ∈ [1, n] and all possible values x_i and x_S: P(X_i = x_i | X_S = x_S) = P(X_i = x_i | the values of Parents(X_i) in x_S), where S is the set of non-descendants of X_i in X.
In other words, X_i is conditionally independent of any of its non-descendant variables, given Parents(X_i).

27 Factorization Theorem
If G is an independence map (I-map) of P, then P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i)).
Proof: let X_1, …, X_n be an ordering consistent with G.
By the chain rule: P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | X_1, …, X_{i-1}).
Since G is an I-map, (X_i ⊥ NonDesc(X_i) | Pa(X_i)) ∈ I(P); because X_1, …, X_{i-1} are non-descendants of X_i, it follows that P(X_i | X_1, …, X_{i-1}) = P(X_i | Pa(X_i)).

28 Factorization Implies I-Map
If P factorizes according to G, i.e. P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i)), then G is an I-map of P.
Proof: we need to show that P(X_i | ND(X_i)) = P(X_i | Pa(X_i)), where D is the set of descendants of node i and ND is the set of all nodes except i and D.

29 Probabilistic Graphical Models
A tool for representing complex systems and performing sophisticated reasoning tasks.
Fundamental notion: modularity. Complex systems are built by combining simpler parts.
Why have a model?
Compact and modular representation of complex systems
Ability to execute complex reasoning patterns
Make predictions
Generalize from particular problems

30 Probabilistic Graphical Models
Increasingly important in machine learning.
Many classical probabilistic problems in statistics, information theory, pattern recognition, and statistical mechanics are special cases of the formalism.
Graphical models provide a common framework. Advantage: specialized techniques developed in one field can be transferred between research communities.

31 Representation: Graphs
An intuitive data structure for modeling highly interacting sets of variables.
An explicit model of modularity.
A data structure that allows for the design of efficient general-purpose algorithms.

32 Reference
"Bayesian Networks and Beyond", Daphne Koller (Stanford) & Nir Friedman (Hebrew U.)

