Institute of Health Informatics (衛資所), Bioinformatics Group, 陳俊宇, April 07, 03
Graph                 Node      Edge
Chromosome            gene      positional correlations
Pathway               enzyme    functional correlations
Gene expression       gene      coexpressed
Protein interaction   protein   protein-protein interaction
Protein structure     protein   3D structural similarity
What questions do they want to answer?
– To extract a set of correlated genes with respect to multiple biological features.
– To provide biological information to classify genes.
Notation: C_i: correlated gene cluster (correlated cluster); h_i: hyperedge.
Method: clustering of hyperedges
Input datasets:
– graphs G = {G_1, …, G_n}
– hyperedges H = {h_1, …, h_m}
Distance between hyperedges:
E. coli correlated gene clusters
– E. coli genome dataset (G_1: 4,396 nodes and 4,396 edges)
– E. coli pathway dataset (G_2: 761 nodes and 1,223 edges)
– E. coli structure similarity dataset (G_3: 538 nodes and 3,823 edges)
– 917 hyperedges
– Threshold parameters: p_1 = 2, p_2 = 3, p_3 = 0
Screening the two-hybrid protein-protein interaction dataset (yeast protein interactions)
The dataset was compared with the following datasets:
– S. cerevisiae coexpression dataset
– S. cerevisiae pathway dataset
– E. coli genome dataset
If an interaction or relation is also observed in a biological attribute other than protein-protein interaction, we judge the interaction to be relevant.
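The screening rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenters' implementation: the function name `screen_interactions`, the gene identifiers, and the toy edge lists are all hypothetical, and it assumes every dataset already uses the same identifiers for the same genes or proteins.

```python
def screen_interactions(interactions, references):
    """Keep interactions that are also observed in at least one reference dataset.

    interactions: iterable of (a, b) pairs from the two-hybrid dataset.
    references:   dict mapping a dataset name to an iterable of (a, b) pairs.
    Returns a dict mapping each supported interaction to the sorted list of
    reference datasets in which the same pair appears.
    """
    # Edges are treated as undirected, so (a, b) and (b, a) are the same relation.
    ref_sets = {
        name: {frozenset(edge) for edge in edges}
        for name, edges in references.items()
    }
    supported = {}
    for a, b in interactions:
        pair = frozenset((a, b))
        support = sorted(name for name, edges in ref_sets.items() if pair in edges)
        if support:  # observed in some attribute other than protein interaction
            supported[(a, b)] = support
    return supported


# Hypothetical toy data, for illustration only.
two_hybrid = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
references = {
    "coexpression": [("geneB", "geneA")],
    "pathway": [("geneA", "geneB"), ("geneD", "geneC")],
}
relevant = screen_interactions(two_hybrid, references)
```

Recording which datasets support each interaction, rather than returning a plain yes/no, makes it possible to see how much each reference dataset contributes to the screening.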
S. cerevisiae two-hybrid vs. E. coli genome
Discussion
– The graphs being compared really can provide biological information to classify genes.
– It is not clear whether including the yeast genome dataset can improve the confidence of screening the two-hybrid dataset.
– Deriving sub-networks that indicate how genes are connected within a correlated gene cluster.
– Extending edges to weighted edges (this requires normalizing edge weights among graphs so that they can be compared).
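The slides do not say how edge weights would be normalized among graphs. As one possible illustration, the sketch below applies min-max scaling within each graph so that all weights fall in [0, 1]. The choice of min-max scaling, the function name `normalize_edge_weights`, and the toy weights are assumptions made here, not part of the original method.

```python
def normalize_edge_weights(graph):
    """Rescale the edge weights of one graph to the range [0, 1] (min-max scaling).

    graph: dict mapping an edge (a, b) to a numeric weight.
    Min-max scaling is an assumed choice; the source does not name a method.
    """
    if not graph:
        return {}
    lo, hi = min(graph.values()), max(graph.values())
    if hi == lo:
        # All weights are equal; map every edge to 1.0 to avoid dividing by zero.
        return {edge: 1.0 for edge in graph}
    return {edge: (w - lo) / (hi - lo) for edge, w in graph.items()}


# Hypothetical weighted graphs on very different scales.
graphs = {
    "coexpression": {("g1", "g2"): 0.2, ("g2", "g3"): 0.8, ("g1", "g3"): 0.5},
    "structure": {("g1", "g2"): 10.0, ("g2", "g3"): 30.0},
}
normalized = {name: normalize_edge_weights(g) for name, g in graphs.items()}
```

After scaling, a weight from the coexpression graph and a weight from the structure graph lie on the same scale, although whether they are biologically comparable remains a separate question.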