ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING
BACKGROUND Completion of sequencing projects Need for functional discovery Emerging area of study: Large scale genomic analysis Similarity of living systems
GENETIC NETWORKS Modelling genetic networks Interaction of genes and proteins Relationship between topology and function
MOTIVATION Common biological processes Comparison of networks Discovering missing interactions Discovering missing genes
GRAPH MATCHING Search-based Algorithm Pruning Techniques [Figure: example graphs G1 and G2, with nodes mpn124, mpn132, mpn133, mpn134, mpn141, mpn145 and mge234, mge235, mge236, mge310, mge312, mge313, mge314, mge336, mge337]
ROADMAP Scale-Free Networks Modelling Genetic Networks Graph Matching Algorithm Results
SCALE-FREE NETWORKS
COMPLEX NETWORKS Small-world model –WWW –Human acquaintances network –Citation networks –Biological networks
SMALL-WORLD Features: –Characteristic path length –Clustering coefficient –Sparseness
SMALL-WORLD Somewhere in between regular & random graphs
SMALL-WORLD Highly clustered Short diameter
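The two small-world measures named on these slides can be computed directly from an adjacency map. The following is a minimal sketch, assuming an undirected, unweighted graph stored as a dict of neighbor sets; the function names and the toy graph are illustrative and do not come from the original tool.

```python
from collections import deque

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy graph (hypothetical): a triangle a-b-c with a tail c-d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(toy), characteristic_path_length(toy))
```

A small-world graph would show a high average clustering together with a short characteristic path length relative to a random graph of the same size and density.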
SCALE-FREE NETWORKS Complex networks: biological, social, WWW, power grid, citation, etc. Power-law connectivity: P(k) ~ k^(-γ) Hubs and authorities
SCALE-FREE NETWORKS Application for testing scale-free behavior Yeast Helicobacter pylori Mycoplasma pneumoniae Mycoplasma genitalium Linear log-log graph Slope =
SCALE-FREE NETWORKS Slope is calculated by the least-squares method
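The test for scale-free behavior can be sketched as a straight-line fit on the log-log degree distribution. The code below is an illustration under assumptions: it uses an ordinary least-squares fit with no binning, and the function names are hypothetical rather than taken from the application on the slide. For a power-law network the fitted slope approximates the negative of the exponent.

```python
import math
from collections import Counter

def degree_distribution(adj):
    """Return {k: P(k)} over nodes with degree k >= 1."""
    counts = Counter(len(nbrs) for nbrs in adj.values() if len(nbrs) > 0)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}

def loglog_slope(pk):
    """Ordinary least-squares slope of log P(k) against log k."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct degrees to fit a slope")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check (not real data): an exact power law with exponent 2.5.
synthetic = {k: k ** -2.5 for k in range(1, 20)}
print(loglog_slope(synthetic))
```

On real interaction data the high-degree tail is noisy, so the fitted value depends on how the distribution is binned and truncated.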
TOPOLOGY & FUNCTIONALITY Small diameter – ease of dissemination of information – ease of restoring after disturbance Cliquishness –Alternate paths are found Heterogeneity –Random removal does not affect the network –Hubs are vulnerable to attack
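The heterogeneity point can be illustrated by measuring the largest connected component after removing nodes. This is a minimal sketch on a hypothetical star-shaped toy graph, not an analysis of the networks in this talk: removing a low-degree node leaves the graph connected, while removing the hub fragments it.

```python
def largest_component(adj, removed=()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

# Toy star graph (hypothetical): one hub linked to five leaves.
leaves = ["l1", "l2", "l3", "l4", "l5"]
star = {"hub": set(leaves)}
star.update({leaf: {"hub"} for leaf in leaves})

print(largest_component(star))                   # intact network
print(largest_component(star, removed=["l1"]))   # a low-degree node fails
print(largest_component(star, removed=["hub"]))  # targeted attack on the hub
```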
BIOLOGICAL ASPECTS Multifunctionality –Grouped into functional units Stability Reason: Most of the interactions are between hubs and authorities
MODELLING GENETIC NETWORKS
TYPES OF GENETIC NETWORKS Categorized by data sources –Metabolic pathways –Gene expression arrays –Protein interactions –Gene interactions
INTERACTION MAPS High level perspective –Nodes: Genes or proteins –Edges: Presence of an interaction Data sources –Two-hybrid analysis –Fusion analysis –Chromosomal proximity –Phylogenetic analysis
GRAPH MATCHING
PROBLEM DEFINITION Attributed Relational Graph (ARG) G = {V, E, X}. V = {v_1, v_2, …, v_n} Nodes E = {e_1, e_2, …, e_m} Edges X = {x_1, x_2, …, x_n} Attributes
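One possible in-memory representation of an ARG is sketched below. The class name, its methods, and the attribute vectors are assumptions made for illustration; only the node identifiers and the two links come from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class ARG:
    """Attributed relational graph G = {V, E, X} with undirected edges."""
    attrs: dict = field(default_factory=dict)  # X: node -> attribute vector (keys form V)
    adj: dict = field(default_factory=dict)    # E, stored as adjacency sets

    def add_node(self, v, x):
        self.attrs[v] = x
        self.adj.setdefault(v, set())

    def add_edge(self, u, v):
        if u not in self.attrs or v not in self.attrs:
            raise KeyError("both endpoints must be added as nodes first")
        self.adj[u].add(v)
        self.adj[v].add(u)

    @property
    def nodes(self):
        return list(self.attrs)

    @property
    def edges(self):
        return {frozenset((u, v)) for u in self.adj for v in self.adj[u]}

# Attribute vectors here are placeholders, not measured values.
g = ARG()
for name, x in [("MPN237", [0.2, 0.8]), ("MPN238", [0.5, 0.5]), ("MPN678", [0.9, 0.1])]:
    g.add_node(name, x)
g.add_edge("MPN237", "MPN238")
g.add_edge("MPN238", "MPN678")
print(g.nodes, len(g.edges))
```

Keeping attributes per node means n attribute entries for n nodes, which matches X = {x_1, …, x_n} in the definition.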
INEXACT SUBGRAPH MATCHING Allow for: –Mismatching attribute values –Missing nodes –Missing links Also called error-correcting subgraph isomorphism NP-complete
SEARCH TECHNIQUES Cost function Pruning (Structure Constraints) Backtracking
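The three ingredients on this slide (cost function, pruning, backtracking) can be combined in a generic branch-and-bound search. The sketch below is not the tool's algorithm: the penalties, the function name `match`, and the rule that an unmatched node is charged a flat penalty are all assumptions chosen to keep the example small. It is exhaustive and therefore only practical for toy graphs.

```python
def match(adj1, x1, adj2, x2, attr_cost, node_penalty=1.0, edge_penalty=1.0):
    """Map each node of G1 to a distinct node of G2 or to None (missing node).

    Cost = attribute mismatch of mapped nodes
         + node_penalty per unmatched G1 node
         + edge_penalty per G1 link whose mapped endpoints are not linked in G2.
    """
    order = sorted(adj1, key=lambda v: -len(adj1[v]))  # most connected first
    best = {"cost": float("inf"), "map": None}

    def extend(i, mapping, used, cost):
        if cost >= best["cost"]:
            return  # pruning: this branch cannot beat the best solution so far
        if i == len(order):
            best["cost"], best["map"] = cost, dict(mapping)
            return
        v = order[i]
        for cand in list(adj2) + [None]:
            if cand is not None and cand in used:
                continue
            if cand is None:
                step = node_penalty
            else:
                step = attr_cost(x1[v], x2[cand])
                for u in adj1[v]:
                    img = mapping.get(u)
                    if img is not None and img not in adj2[cand]:
                        step += edge_penalty  # link missing in G2
            mapping[v] = cand
            if cand is not None:
                used.add(cand)
            extend(i + 1, mapping, used, cost + step)
            if cand is not None:
                used.discard(cand)  # backtracking: undo the assignment
            del mapping[v]

    extend(0, {}, set(), 0.0)
    return best["cost"], best["map"]

# Toy input (hypothetical): G2 equals G1 except that the link q-r is missing.
adj1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
x1 = {"a": 0, "b": 1, "c": 2}
adj2 = {"p": {"q"}, "q": {"p"}, "r": set()}
x2 = {"p": 0, "q": 1, "r": 2}
cost, mapping = match(adj1, x1, adj2, x2, lambda s, t: abs(s - t), node_penalty=2.0)
print(cost, mapping)
```

In this toy case the cheapest mapping pairs nodes with equal attributes and pays one edge penalty, which is how a missing link surfaces in the result.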
ATTRIBUTED GRAPH MATCHING TOOL
ATTRIBUTE MATCHING Amino acid sequence content –Composition: array of 20, percentage of each amino acid –Amino acids grouped into classes: array of 6 –Amino acid triples grouped into classes: array of 216 (6 x 6 x 6) Example sequence: MKVLNKNEL
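The three attribute arrays can be sketched as follows. Several details are assumptions, because the slide does not give them: the six-class grouping `CLASSES` is one hypothetical physicochemical grouping, triples are counted with an overlapping sliding window, and `composition_difference` uses a sum of absolute differences. Sequences are assumed to contain only the 20 standard one-letter codes.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical grouping into 6 classes; the slides do not specify the classes used.
CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
CLASS_OF = {aa: i for i, group in enumerate(CLASSES) for aa in group}

def composition(seq):
    """Array of 20: percentage of each amino acid in seq."""
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def class_composition(seq):
    """Array of 6: percentage of residues falling in each class."""
    counts = [0] * 6
    for aa in seq:
        counts[CLASS_OF[aa]] += 1
    return [100.0 * c / len(seq) for c in counts]

def triple_composition(seq):
    """Array of 216 (6 x 6 x 6): percentage of each class triple, overlapping windows."""
    counts = [0] * 216
    for i in range(len(seq) - 2):
        a, b, c = (CLASS_OF[x] for x in seq[i:i + 3])
        counts[36 * a + 6 * b + c] += 1
    total = max(len(seq) - 2, 1)
    return [100.0 * n / total for n in counts]

def composition_difference(x, y):
    """Assumed score: sum of absolute differences between two arrays."""
    return sum(abs(a - b) for a, b in zip(x, y))

seq = "MKVLNKNEL"  # example sequence from the slide
print(class_composition(seq))
```

Lower difference values indicate more similar sequence content, so a score of this kind can serve as the attribute term of a matching cost.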
ATTRIBUTE MATCHING Difference in amino acid composition values of gene pairs for M. genitalium and M. pneumoniae. Score observations
STRUCTURAL CONSTRAINTS Effect of scale-free behaviour –Connectivity information: Highly heterogeneous, thus start with the most connected node and work around it –Pruning strategy: comparability is determined by the power law
STRUCTURAL CONSTRAINTS Neighborhood connectivity –Choose the neighbor at the next stage Backtracking –Component by component –Go back to the neighbor with the most connectivity within the component
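The visiting order suggested by these structural constraints can be sketched as a degree-driven traversal: start at the most connected node, grow outward by always taking the most connected unvisited neighbor, and handle one component at a time. The function name, the alphabetical tie-breaking rule, and the toy graph are assumptions for illustration.

```python
def matching_order(adj):
    """Order nodes hub-first, expanding through the highest-degree frontier node,
    finishing each connected component before starting the next."""
    order, visited = [], set()
    # Candidate seeds: highest degree first, names break ties deterministically.
    for seed in sorted(adj, key=lambda v: (-len(adj[v]), str(v))):
        if seed in visited:
            continue
        frontier = {seed}
        while frontier:
            v = min(frontier, key=lambda n: (-len(adj[n]), str(n)))
            frontier.remove(v)
            visited.add(v)
            order.append(v)
            frontier.update(w for w in adj[v] if w not in visited)
    return order

# Toy graph (hypothetical): hub h, secondary hub a, and a separate pair x-y.
toy_net = {
    "h": {"a", "b", "c", "f"},
    "a": {"h", "d", "e"},
    "b": {"h"}, "c": {"h"}, "f": {"h"},
    "d": {"a"}, "e": {"a"},
    "x": {"y"}, "y": {"x"},
}
print(matching_order(toy_net))
```

When a search fails at some node, an order like this also gives a natural place to backtrack to: the most connected node already placed in the same component.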
TEST CASE Mycoplasma genitalium: –Smallest genome (470 ORFs) Mycoplasma pneumoniae: –Very similar, superset (688 ORFs)
TEST CASE... Mycoplasma genitalium: –232 nodes –211 links Mycoplasma pneumoniae: –267 nodes –257 links Inputs: MGE links, MPN links, MGE synonyms, MPN synonyms, MGE amino acid sequence, MPN amino acid sequence
RESULTS [Figure panels: MGE, MPN]
DISCOVERY OF MISSING DATA Missing link The link between MPN632 and MPN637 is missing in our data but exists in the literature
DISCOVERY OF MISSING DATA Missing node with known COG MPN MPN237---MPN238---MPN678 MG MG MG MG459 MG459 is the ortholog of MPN678
DISCOVERY OF MISSING DATA Missing node without known ortholog
CONCLUSION Large-scale genomics Interaction data captures system structure and dynamics Graph matching exploits the scale-free characteristics Novel interactions and genes can be identified
ACKNOWLEDGEMENT YASEMİN TÜRKELİ