Presentation on theme: "Learning with Hypergraphs: Discovery of Higher-Order Interaction Patterns from High-Dimensional Data" — Presentation transcript:

1 Learning with Hypergraphs: Discovery of Higher-Order Interaction Patterns from High-Dimensional Data
Moscow State University, Faculty of Computational Mathematics and Cybernetics, Feb. 22, 2007, Moscow, Russia
Byoung-Tak Zhang
Biointelligence Laboratory, School of Computer Science and Engineering
Brain Science, Cognitive Science, Bioinformatics Programs
Seoul National University, Seoul, Korea

I will talk about evolving DNA-encoded genetic programs in a test tube. We evaluate the potential of this approach by solving a medical diagnosis problem on a simulated DNA computer. Each individual genetic program represents a decision list of variable length, and the whole population takes part in making probabilistic decisions.

2 Probabilistic Graphical Models (PGMs)
Represent a joint probability distribution over random variables in graphical form.
Undirected PGMs / Directed PGMs
Generative: the probability distribution of some variables given the values of other variables can be obtained (probabilistic inference).
Example (nodes A, B, C, D, E): C and D are independent given B; C asserts a dependency between A and B; B and E are independent given C.
© 2007, SNU Biointelligence Lab, http://bi.snu.ac.kr/

3 Kinds of Graphical Models
Graphical Models
- Undirected: Boltzmann Machines, Markov Random Fields
- Directed: Bayesian Networks
- Latent Variable Models: Hidden Markov Models, Generative Topographic Mapping, Non-negative Matrix Factorization

4 Bayesian Networks
A Bayesian network BN = (S, P) consists of a network structure S and a set of local probability distributions P.
<BN for detecting credit card fraud>
The structure can be found by relying on prior knowledge of causal relationships.

5 From Bayes Nets to High-Order PGMs
(1) Naïve Bayes  (2) Bayesian Net  (3) High-Order PGM
(diagrams over the variables J, A, F, G, S)

6 Hypernetworks

7 Hypergraphs
A hypergraph is an (undirected) graph G whose edges may connect any non-empty set of vertices, i.e. G = (V, E), where V = {v1, v2, …, vn} and E = {E1, E2, …, Em}, each Ei being a non-empty subset of V.
An m-hypergraph consists of a set V of vertices and a subset E of V[m], i.e. G = (V, V[m]), where V[m] is a set of subsets of V whose elements have precisely m members.
A hypergraph G is said to be k-uniform if every edge Ei in E has cardinality k.
A hypergraph G is k-regular if every vertex has degree k.
Remark: an ordinary graph is a 2-uniform hypergraph.

8 An Example Hypergraph
G = (V, E), with
V = {v1, v2, v3, …, v7}
E = {E1, E2, E3, E4, E5}, where
E1 = {v1, v3, v4}, E2 = {v1, v4}, E3 = {v2, v3, v6}, E4 = {v3, v4, v6, v7}, E5 = {v4, v5, v7}
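Slide 8's hypergraph can be written down directly in code. A minimal sketch (Python; the helper functions and string vertex names are illustrative, not from the talk):

```python
# A hypergraph G = (V, E): vertices plus a set of hyperedges,
# each hyperedge being a set of vertices (slide 8's example).
V = {"v1", "v2", "v3", "v4", "v5", "v6", "v7"}
E = {
    "E1": frozenset({"v1", "v3", "v4"}),
    "E2": frozenset({"v1", "v4"}),
    "E3": frozenset({"v2", "v3", "v6"}),
    "E4": frozenset({"v3", "v4", "v6", "v7"}),
    "E5": frozenset({"v4", "v5", "v7"}),
}

def is_k_uniform(edges, k):
    """True if every hyperedge has cardinality k."""
    return all(len(e) == k for e in edges.values())

def degree(vertex, edges):
    """Number of hyperedges containing the vertex."""
    return sum(vertex in e for e in edges.values())

# This hypergraph is not uniform: edge sizes range from 2 to 4.
assert not is_k_uniform(E, 3)
# v4 appears in E1, E2, E4 and E5, so its degree is 4.
assert degree("v4", E) == 4
```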

9 Hypernetworks [Zhang, DNA-2006]
A hypernetwork is a hypergraph with weighted edges. It is defined as a triple H = (V, E, W), where V = {v1, v2, …, vn}, E = {E1, E2, …, Em}, and W = {w1, w2, …, wm}, with weight wi attached to hyperedge Ei.
An m-hypernetwork consists of a set V of vertices and a subset E of V[m], i.e. H = (V, V[m], W), where V[m] is a set of subsets of V whose elements have precisely m members and W is the set of weights associated with the hyperedges.
A hypernetwork H is said to be k-uniform if every edge Ei in E has cardinality k.
A hypernetwork H is k-regular if every vertex has degree k.
Remark: an ordinary graph is a 2-uniform hypernetwork with wi = 1.
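The only change relative to a plain hypergraph is the weight set W. A small sketch of the triple H = (V, E, W) and of the remark above (names are illustrative):

```python
# A hypernetwork H = (V, E, W): a hypergraph whose hyperedges carry weights.
V = {"v1", "v2", "v3"}
E = {"E1": frozenset({"v1", "v2"}), "E2": frozenset({"v2", "v3"})}
W = {"E1": 1.0, "E2": 1.0}

def is_ordinary_graph(edges, weights):
    """The remark on slide 9: an ordinary graph is the special case of a
    2-uniform hypernetwork with all weights equal to 1."""
    return (all(len(e) == 2 for e in edges.values())
            and all(w == 1 for w in weights.values()))

assert is_ordinary_graph(E, W)
```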

10 A Hypernetwork
(figure: a hypernetwork over the 15 vertices x1, …, x15)

11 Learning with Hypernetworks

12 The Hypernetwork Model of Learning
[Zhang, 2006]

13 Deriving the Learning Rule

14 Derivation of the Learning Rule

15
(figure: four 15-variable training examples, e.g. example 1 with x1=1, x2=0, y=1 and example 2 with x1=0, x2=1, y=0, and the order-3 hyperedges sampled from them over rounds 1-3, such as (x1, x4, x10; y=1) and (x2, x3, x9; y=0))
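The sampling of fixed-order hyperedges from labeled examples, as illustrated above, might be sketched as follows (a simplified simulation; the function and variable names are assumptions, not from the slides):

```python
import random

def sample_hyperedges(example, label, k, n_samples, rng):
    """Sample n_samples order-k hyperedges from one labeled example.

    Each hyperedge fixes k of the example's variables (with their observed
    values) together with the class label, as on slide 15.
    """
    variables = sorted(example)
    edges = []
    for _ in range(n_samples):
        picked = rng.sample(variables, k)
        edges.append((frozenset((v, example[v]) for v in picked), label))
    return edges

rng = random.Random(0)
example = {"x1": 1, "x2": 0, "x3": 1, "x4": 0}
edges = sample_hyperedges(example, 1, 2, 3, rng)
assert all(len(e) == 2 for e, _ in edges)   # every hyperedge has order 2
assert all(y == 1 for _, y in edges)        # each carries the class label
```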

16 Molecular Self-Assembly of Hypernetworks
(figure: molecular encoding of a hyperedge (xi, xj, y) and the corresponding hypernetwork representation over X1, …, X8)

17 Encoding a Hypernetwork with DNA
a) Collection of (labeled) hyperedges:
z1: (x1=0, x2=1, x3=0, y=1)
z2: (x1=0, x2=0, x3=1, x4=0, x5=0, y=0)
z3: (x2=1, x4=1, y=1)
z4: (x2=1, x3=0, x4=1, y=0)
b) Library of DNA molecules corresponding to (a):
z1: AAAACCAATTGGAAGGCCATGCGG
z2: AAAACCAATTCCAAGGGGCCTTCCCCAACCATGCCC
z3: AATTGGCCTTGGATGCGG
z4: AATTGGAAGGCCCCTTGGATGCCC
(codewords: x1=AAAA, x2=AATT, x3=AAGG, x4=CCTT, x5=CCAA, y=ATGC; value 0=CC, value 1=GG)
For example, a program (x1=1, x3=1, x5=1, y=1) in the form of a decision list, or its DNA encoding, denotes a decision rule saying: diagnose the DNA sample as positive for disease y if it contains all three markers x1, x3 and x5.

18 DNA Molecular Computing
Nanostructure: molecular recognition, self-assembly, self-replication
(figure: thermal cycling, heat / cool / repeat, assembling a polymer)

19 Learning the Hypernetwork (by Molecular Evolution)
[Zhang, DNA11]
The aim is to build a decision-making system f that outputs a label.
Library of combinatorial molecules + example → hybridize → select the library elements matching the example → amplify the matched library elements by PCR → next-generation library.

20 Molecular Information Processing
(video: MP4.avi)

21 The Theory of Bayesian Evolution [Zhang, CEC-99]
Evolution as a Bayesian inference process: evolutionary computation (EC) is viewed as an iterative process of generating individuals of ever higher posterior probability, moving from the priors P0(Ai) at generation 0 to the posteriors Pg(Ai | D) at generation g, given the observed data D.

22 Evolutionary Learning Algorithm for Hypernetwork Classifiers
1. Let the hypernetwork H represent the current distribution P(X, Y).
2. Get a training example (x, y).
3. Classify x using H as follows:
   3.1 Extract all molecules matching x into M.
   3.2 From M, separate the molecules by class: the molecules with label Y=0 into M0 and those with label Y=1 into M1.
   3.3 Compute y* = argmax_{Y ∈ {0,1}} |M_Y| / |M|.
4. Update H:
   If y* = y, then Hn ← Hn-1 + {c(u, v)} for u = x and v = y, for (u, v) ∈ Hn-1.
   If y* ≠ y, then Hn ← Hn-1 − {c(u, v)} for u = x and v ≠ y, for (u, v) ∈ Hn-1.
5. Go to step 2 if not terminated.
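Step 3 of the algorithm (classification by counting matching molecules) can be sketched in a few lines. This is a software simulation sketch only; the talk's actual implementation is molecular, and the names here are illustrative:

```python
def classify(x, library):
    """Step 3 of slide 22: majority vote among hyperedges matching x.

    `library` is a list of (hyperedge, label) pairs, where a hyperedge is a
    set of (variable, value) items; it matches x if all its items agree.
    """
    matched = [y for edge, y in library
               if all(x.get(var) == val for var, val in edge)]
    if not matched:
        return None
    counts = {0: matched.count(0), 1: matched.count(1)}
    return max(counts, key=counts.get)

library = [
    (frozenset({("x1", 1), ("x2", 0)}), 1),
    (frozenset({("x1", 1)}), 1),
    (frozenset({("x2", 0)}), 0),
]
assert classify({"x1": 1, "x2": 0}, library) == 1  # two votes for 1, one for 0
```

Step 4's reinforcement of matching hyperedges is omitted here; the weight-update variant appears on slide 30.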

23 Learning with Hypergraphs: Application Results

24 Biological Applications
DNA-Based Molecular Diagnosis MicroRNA-Based Diagnosis Aptamer-Based Diagnosis

25 DNA-Based Diagnosis
120 samples from 60 leukemia patients [Cheok et al., Nature Genetics, 2003]
Gene expression data; class: ALL/AML
Training hypernets with 6-fold validation → diagnosis

26 Learning Curve Fitness evolution of the population of hyperedges

27 Order Effects on Learning
Fitness curves for runs with fixed-cardinality hyperedges (card = 1, 4, 7, 10)

28 Aptamer-Based Cardiovascular Disease Diagnosis

29 Training Data
▷ Disease: Cardiovascular Disease (CVD)
▷ Classes: 4 [Normal / 1st / 2nd / 3rd stages]
▷ Number of samples: 135 [Normal: 40 / 1st: 38 / 2nd: 19 / 3rd: 18]
▷ Preprocessing: 3K aptamer array → convert to real values (3K real-value data) → feature selection using gain ratio (150 real-value data) → binarization using MDL (150 Boolean data)
▷ Simulation parameters: 1) order: 2~70; 2) sampling rate: 50; 3) each case repeated 10 times and averaged
▷ Classification: majority voting with the sum of library element weights
▷ Training / test size: training 108 (80%) / test 27 (20%)

30 Learning & Classification by Hypernetworks
(figure: source data are binarized and split into training and test sets; sampled hyperedges, e.g. (X0=1, X1=1, X2=0, X3=0, C=1) with initial weight W=1000, form the library; a learning loop [evolution stage] adjusts the learning rate and updates library weights, then tests)
Weight update rule (learning), error correction: when all index-values of a library element match an example, if the class is correct, w = w * 1.0001; else w = w * 0.95.
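The weight-update rule on slide 30 is simple enough to state as code. A sketch (the constants 1.0001 and 0.95 come from the slide; the function name is an assumption):

```python
def update_weight(w, matched, correct):
    """Weight-update rule from slide 30: reward matched hyperedges whose
    class is correct, penalize those whose class is wrong; leave
    non-matching hyperedges untouched."""
    if not matched:
        return w
    return w * 1.0001 if correct else w * 0.95

w = 1000.0
w = update_weight(w, matched=True, correct=False)
assert abs(w - 950.0) < 1e-9  # a wrong vote shrinks the weight by 5%
```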

31 Simulation Result (1/3)
▷ Training & test errors as learning goes on (order k=12)

32 Simulation Result (2/3)
▷ Accuracy on test data as learning goes on (order k=12)

33 Simulation Result (3/3)
▷ The effect of learning

34 Mining Cancer-Related MicroRNA Modules from miRNA Expression Profiles

35 Gene Regulation by microRNAs
MicroRNAs (miRNAs) are endogenous ~22-nt RNAs that can play important regulatory roles in animals, plants and viruses.
Post-transcriptional gene regulation: binding target genes for degradation or translational repression.
Recently, miRNAs have been reported to be related to cancer development and progression.

36 Dataset
The miRNA expression microarray data: expression profiles of miRNA in human tissues among 11 tumor types, namely bladder, breast, colon, kidney, lung, pancreas, prostate, uterus, melanoma, mesothelioma, and ovary (Lu et al., 2005).
This dataset consists of an expression matrix of 151 miRNAs (rows) and 89 samples (columns).
(table: cancer and normal sample counts per tissue type; across all tissues, 21 samples in one class and 68 in the other)

37 Representing a Hypernetwork from miRNA Expression Data
(figure: each of the 89 data items is a 151-miRNA binary sample labeled cancer or normal; sampling produces a library of normal/cancer classification rules, e.g. (X=1, X=2 → cancer) and (X=10, X=20 → normal), forming a hypernetwork H = (X, E, W) of DNA molecules)

38 Performance (leave-one-out cross-validation)
Algorithms — correct classification rate:
Bayesian Network: 79.77 %
Naïve Bayes: 83.15 %
ID3: 88.76 %
Hypernetworks: 90.00 %
Sequential Minimal Optimization (SMO): 91.01 %
Multi-layer Perceptron (MLP): 92.13 %

39 Accuracy vs. Order for Test Data (sampling only)

40 Learning Curves for Training Data

41 miRNA Data Mining
(tables: (a) miRNA modules related to cancer, with weights, over miRNAs such as hsa-miR-215, hsa-miR-7, hsa-miR-194, hsa-miR-30d, hsa-miR-214, hsa-miR-30e, hsa-miR-21, hsa-miR-321, hsa-miR-142-3p, hsa-miR-34b, hsa-miR-96, hsa-miR-126, hsa-miR-30c, hsa-miR-26b, hsa-miR-29b, hsa-let-7f, hsa-miR-9*, hsa-miR-224, hsa-miR-301; (b) individual miRNAs related to cancer, with weights, including hsa-miR-155, hsa-miR-105, hsa-miR-223, hsa-miR-21, hsa-let-7c, hsa-miR-142-3p, hsa-miR-29b, hsa-miR-224, hsa-miR-183, hsa-miR-184, hsa-let-7a)

42 Non-Biological Applications
Digit Recognition Face Classification Text Classification Movie Title Prediction

43 Digit Recognition: Dataset
Original data: handwritten digits (0~9)
Training data: 2,630 (263 examples for each class)
Test data: 1,130 (113 examples for each class)
Preprocessing: each example is an 8×8 binary matrix; each pixel is 0 or 1.

44 Pattern Classification
“Layered” Hypernetwork / Probabilistic Library (DNA Representation)

45 Simulation Results – without Error Correction
|Train set| = 3760, |Test set| = 1797.

46 Performance Comparison
Methods — accuracy:
MLP with 37 hidden nodes: 0.941
MLP with no hidden nodes: 0.901
SVM with polynomial kernel: 0.926
SVM with RBF kernel: 0.934
Decision Tree: 0.859
Naïve Bayes: 0.885
kNN (k=1): 0.936
kNN (k=3): 0.951
Hypernet with learning (k=10): 0.923
Hypernet with sampling (k=33): 0.949

47 Error Correction Algorithm
Initialize the library as before.
maxChangeCnt := librarySize
For i := 0 to iteration_limit:
    trainCorrectCnt := 0
    Run classification for all training patterns; for each correctly classified pattern, increment trainCorrectCnt.
    For each library element:
        Initialize its fitness to 0.
        For each misclassified training pattern:
            If the library element matches that pattern:
                If the element's class label is correct, its fitness gains 2 points; else it loses 1 point.
    changeCnt := max{ librarySize * 1.5 * (trainSetSize - trainCorrectCnt) / trainSetSize, maxChangeCnt * 0.9 }
    maxChangeCnt := changeCnt
    Delete the changeCnt library elements of lowest fitness, and resample library elements whose classes match those of the deleted ones.
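The changeCnt schedule above combines an error-proportional term with a geometric decay of the previous cap. A hedged sketch of just that step (function and variable names are illustrative, not from the slides):

```python
def change_count(library_size, max_change, n_train, n_correct):
    """Resampling size from slide 47: proportional to the current training
    error rate, but never dropping by more than 10% per iteration."""
    by_error = library_size * 1.5 * (n_train - n_correct) / n_train
    new_count = max(by_error, max_change * 0.9)
    return int(new_count)

# With 10% training error the decayed cap (900) still dominates the
# error-proportional term (150), so the schedule shrinks gradually.
assert change_count(1000, 1000, 100, 90) == 900
```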

48 Simulation Results – with Error Correction
iterationLimit = 37, librarySize = 382,300

49 Performance Comparison
Algorithms — correct classification rate:
Random Forest (f=10, t=50): 94.10 %
kNN (k=4): 93.49 %
Hypernetwork (order=26): 92.99 %
AdaBoost (weak learner: J48): 91.93 %
SVM (Gaussian kernel, SMO): 91.37 %
MLP: 90.53 %
Naïve Bayes: 87.26 %
J48: 84.86 %

50 Face Classification Experiments

51 Face Data Set
Yale dataset: 15 people, 11 images per person, 165 images in total

52 Training Images of a Person
10 images for training; the remaining 1 for test

53 Bitmaps for Training Data (Dimensionality = 480)

54 Classification Rate by Leave-One-Out

55 Classification Rate (Dimensionality = 64 by PCA)

56 Text Classification Experiments

57 Text Classification
1. Documents
2. Bag-of-words representation (terms such as baseball, specs, graphics, hockey, unix, space)
3. Term vectors for documents d1, d2, d3, …, dn
4. Binary term-document matrix
5. DNA-encoded kernel functions, e.g. hyperedges like (x1=0, x2=1, x3=1, y=1) and (x2=0, x3=0, y=0)
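Steps 2-4 above (bag-of-words to binary term-document matrix) can be sketched as follows (a minimal illustration; whitespace tokenization and the sample vocabulary are assumptions):

```python
def term_document_matrix(docs, vocab):
    """Binary bag-of-words: entry (i, j) is 1 iff term j occurs in doc i
    (steps 2-4 of slide 57)."""
    return [[1 if term in doc.split() else 0 for term in vocab]
            for doc in docs]

docs = ["unix graphics unix", "baseball hockey"]
vocab = ["baseball", "graphics", "hockey", "unix"]
M = term_document_matrix(docs, vocab)
assert M == [[0, 1, 0, 1], [1, 0, 1, 0]]  # binary, not term counts
```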

58 Text Classification
Data from Reuters (‘ACQ’ and ‘EARN’)
Learning curves: average for 10 runs

59 Performance Comparison
‘ACQ’ data (4,724 documents); ‘EARN’ data (7,888 documents)
Higher-dimensional kernel functions can improve the performance further.

60 Learning from Movie Captions Experiments

61 Learning Hypernets from Movie Captions
Order: sequential, range 2~3
Corpora: Friends, Prison Break, 24

62 Learning Hypernets from Movie Captions
(figure)

63 Learning Hypernets from Movie Captions
(figure)

64 Learning Hypernets from Movie Captions
(figure)

65 Learning Hypernets from Movie Captions
Classification
Query generation, e.g. for "I intend to marry her": "I ? to marry her", "I intend ? marry her", "I intend to ? her", "I intend to marry ?"
Matching, e.g. for "I ? to marry her": order 2: "I intend", "I am", "intend to", …; order 3: "I intend to", "intend to marry", …
Count the number of max-perfect-matching hyperedges.
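The query-generation and matching procedure above might be sketched as follows (a simplified simulation; treating hyperedges as word n-grams follows the slide, the rest is an assumption):

```python
def make_queries(sentence):
    """Mask each word position in turn with '?', as in slide 65."""
    words = sentence.split()
    return [" ".join("?" if i == j else w for j, w in enumerate(words))
            for i in range(len(words))]

def matching_ngrams(query, ngrams):
    """Count stored n-gram hyperedges that perfectly match a contiguous
    span of the query (the '?' position must fall outside the span)."""
    q = query.split()
    hits = 0
    for g in ngrams:
        n = len(g)
        for i in range(len(q) - n + 1):
            span = q[i:i + n]
            if "?" not in span and tuple(span) == g:
                hits += 1
    return hits

queries = make_queries("I intend to marry her")
assert queries[0] == "? intend to marry her"
ngrams = [("I", "intend"), ("intend", "to"), ("to", "marry")]
assert matching_ngrams(queries[0], ngrams) == 2  # "intend to", "to marry"
```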

66 Learning Hypernets from Movie Captions
Completion & classification examples:
Query: "who are you" (corpora: Friends, 24, Prison Break)
  Masked queries: "? are you", "who ? you", "who are ?"
  Completion: "what are you"; classification: Friends
Query: "you need to wear it" (corpora: 24, Prison Break, House)
  Masked queries: "? need to wear it", "you ? to wear it", "you need ? wear it", "you need to ? it", "you need to wear ?"
  Completions: "i need to wear it", "you want to wear it", "you need to do it", "you need to wear a"; classifications: 24, House

67 Conclusion
Hypernetworks are a graphical model that employs higher-order nodes explicitly, allowing a more natural representation for learning higher-order graphical models.
We introduce an evolutionary learning algorithm that makes use of the high information density and massive parallelism of molecular computing to address the combinatorial explosion problem.
Applied to pattern recognition (and completion) problems in IT and BT, it obtained performance competitive with conventional ML classifiers.
Why does this work? It exploits the huge population size available in DNA computing to build an ensemble machine, i.e. a hypernetwork, of simple random hyperedges.
It is a new kind of evolutionary algorithm in which very simple "molecular" operators are applied to a "huge" population of individuals in a "massively parallel" way.
Another potential of hypernetworks is in solving biological problems where data are given as "wet" DNA or RNA molecules.

68 Acknowledgements
Simulation experiments: Joo-Kyoung Kim, Sun Kim, Soo-Jin Kim, Jung-Woo Ha, Chan-Hoon Park, Ha-Young Jang
Collaborating labs:
- Biointelligence Laboratory, Seoul National University
- RNomics Lab, Seoul National University
- DigitalGenomics, Inc.
- GenoProt, Inc.
Supported by:
- National Research Lab Program of Min. of Sci. & Tech. ( )
- Next Generation Tech. Program of Min. of Ind. & Comm. ( )
More information at - -

