Molecular Re-Classification of Renal Disease Using Approximate Graph Matching, Clustering and Pattern Mining


Ramakrishna Varadarajan (1), Felix Eichinger (2), Jignesh Patel (1) and Matthias Kretzler (2)
(1) University of Wisconsin-Madison, Madison, WI; (2) University of Michigan, Ann Arbor, MI

Abstract

Classification of patients with a chronic disease course, such as kidney diseases, relies mainly on descriptive disease definitions. To develop molecular-based disease stratification, we aimed to define patient subgroups by conserved transcriptional networks. Defining the similarity of patients at the regulatory network level, rather than at the individual gene level, might yield more robust indicators of function. Network nodes for each patient were derived from Affymetrix microarrays of kidney biopsies compared to healthy controls. Subsequently, relations between the nodes were established by natural language processing of PubMed abstracts and automated promoter analysis for transcription factor binding sites. The resulting networks are typically noisy or incomplete; therefore, network similarities are determined with an approximate graph-matching tool that allows a degree of mismatch (within a preset threshold) in the transcriptional networks. Based on a similarity score, the patient networks are clustered with the goal of attaining high intra-cluster similarity (networks within a cluster are highly similar) and low inter-cluster similarity (networks from different clusters are dissimilar). To extract the underlying biological mechanisms inside each cluster, we employ graph mining techniques and search for frequently occurring motifs (recurring subnetworks) within each cluster, indicative of characteristic disease processes (commonly occurring phenomena within each cluster). Motifs are compared across clusters to define mechanistic similarities and differences between network clusters. Finally, both clusters and motifs are matched back to the established descriptive clinical classifications to compare molecular and clinical classification.

Gene Selection

How do we select genes for an individual patient? With a single sample no significance test is possible, so we revert to fold change to make a binary decision on whether a gene is "differentially expressed".

Compare to controls:
- For each gene, calculate the median and standard deviation in the controls.
- Subtract the control medians from the patient's expression values. Genes with little change will have values close to 0.
- If the value is smaller than 2 x standard deviation, discard the gene.

Result: a gene list for each patient. The lists differ in length and composition.

Construct Networks

Feed those ~250 gene lists into Bibliosphere.
- Generate gene networks from PubMed abstracts.
- Edges are co-citations of genes in abstracts.
- Level of expression does not play ANY role.
- All networks are created on the same knowledge base, so they are subnets of the same core.

Merge Networks

For all combinations of the 250 networks:
- Perform approximate graph matching (using TALE). This again works solely on structure; the expression levels play no role.
- Approximate matching helps to account for noise and redundancy.

Result: pair-wise similarity of networks.

Clustering Algorithm

We use the MCL algorithm to cluster networks, based on the pair-wise network similarities. MCL is short for the Markov Cluster Algorithm, a fast and scalable unsupervised clustering algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs. Before clustering the networks, we use a similarity threshold to eliminate insignificant pair-wise network similarities. Different similarity thresholds give different clustering results: a higher threshold results in many smaller clusters, and vice versa.
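The thresholding and clustering step can be illustrated with a small pure-Python sketch of the MCL iteration (expansion by matrix squaring, then inflation by element-wise powering and column re-normalization). This is a simplified illustration, not the MCL program used for the poster: the toy similarity matrix, the threshold values, the inflation value of 2.0, the fixed iteration count and all function names are assumptions made for the example.

```python
# Minimal sketch: threshold pair-wise similarities, then run a basic MCL loop.
# All values and names below are illustrative assumptions, not poster data.

def apply_threshold(sim, threshold):
    """Drop pair-wise similarities below the threshold (diagonal is ignored)."""
    n = len(sim)
    return [[sim[i][j] if i != j and sim[i][j] >= threshold else 0.0
             for j in range(n)] for i in range(n)]


def normalize_columns(m):
    """Scale every column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(sim, inflation=2.0, iterations=50, eps=1e-6):
    """Return clusters (sorted lists of indices) found by a basic MCL loop."""
    n = len(sim)
    # Self-loops keep every column non-empty and stabilize the flow.
    m = [[1.0 if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: flow spreads along paths of length two (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strong flows are boosted, weak flows are suppressed.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Rows with a positive diagonal entry are attractors; the columns they
    # still receive flow from form one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return sorted(sorted(c) for c in clusters)


# Toy symmetric similarity matrix for six hypothetical patient networks:
# two tight groups (0-2 and 3-5) with weak similarity between the groups.
SIM = [
    [0.00, 0.90, 0.80, 0.20, 0.20, 0.20],
    [0.90, 0.00, 0.85, 0.20, 0.20, 0.20],
    [0.80, 0.85, 0.00, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.00, 0.90, 0.80],
    [0.20, 0.20, 0.20, 0.90, 0.00, 0.85],
    [0.20, 0.20, 0.20, 0.80, 0.85, 0.00],
]

for threshold in (0.5, 0.95):
    print(threshold, mcl(apply_threshold(SIM, threshold)))
```

On this toy input a threshold of 0.5 removes the weak between-group similarities and leaves two clusters of three networks, while a threshold of 0.95 removes every similarity and leaves each network in its own cluster, which mirrors the statement that a higher threshold yields many smaller clusters.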
Introduction

Two approaches:
1) Network-based approach: For each patient, create a network with genes as nodes and additional information (PPI, literature search) as edges. Run an approximate graph matching algorithm (TALE) to determine which networks are similar.
2) Annotation-based approach: Group genes by annotation and cluster patients for each of those groups.

In this poster we focus on the network-based approach.

Why networks?
- Cross-reference the information in the gene lists with independent knowledge.
- Compare not only identities, but also structures.
- This can help stabilize the comparison, but it will also introduce bias.
- Since we compare individual patients (n=1), we estimate the potential gain to be higher than the loss.

Summary

1. Select genes for each patient.
2. Generate a network for each patient.
3. Merge networks using TALE.
4. Cluster networks using the similarity determined by TALE.
5. Within each of the clusters, search for common motifs (sub-networks).

Pattern Mining in Clusters (find common motifs in clusters)

[Figure: Sample Cluster Pattern Output Page]

Patterns are frequently occurring sub-graphs within the networks present in a cluster. Note that we currently mine patterns only within each cluster, so we have a set of patterns for each cluster.

In each of the clusters:
- Find common sub-networks (motifs).
- These could be used for patient classification.
- They might be a starting point to define function specific to a patient group.

In this poster, we are particularly interested in mining contrast patterns in each cluster. Contrast patterns are those with high frequency in one cluster and low frequencies in the remaining clusters. Contrast patterns are unique to a cluster and hence are particularly interesting.

Frequent pattern mining has attracted a lot of interest recently. Frequent substructures are very basic patterns that can be discovered in a collection of graphs, and recent studies have developed several frequent substructure-mining methods.
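The definition of a contrast pattern (high frequency in one cluster, low frequency in every other cluster) can be sketched in a few lines of Python. The sketch makes a simplifying assumption that is not stated on the poster: because every node is a uniquely named gene, a pattern is treated as a set of labeled edges and a network contains the pattern when it contains all of those edges, so no general subgraph-isomorphism search is needed. The gene labels, the support cut-offs `min_in` and `max_out`, the limit of two edges per pattern and all function names are hypothetical.

```python
# Minimal sketch of contrast-pattern selection over clusters of networks.
# Networks are sets of undirected labeled edges; all data are made up.
from itertools import combinations


def edge(a, b):
    """Undirected edge between two uniquely labeled genes."""
    return frozenset((a, b))


def is_connected(edges):
    """True if the given edges form one connected sub-network."""
    edges = list(edges)
    reached = set(edges[0])
    remaining = edges[1:]
    changed = True
    while remaining and changed:
        changed = False
        for e in list(remaining):
            if e & reached:
                reached |= e
                remaining.remove(e)
                changed = True
    return not remaining


def candidate_patterns(networks, max_edges=2):
    """Connected edge sets with at most max_edges edges seen in the networks."""
    edges = sorted(set().union(*networks), key=sorted)
    return {frozenset(combo)
            for k in range(1, max_edges + 1)
            for combo in combinations(edges, k)
            if is_connected(combo)}


def support(pattern, networks):
    """Fraction of networks that contain every edge of the pattern."""
    return sum(pattern <= net for net in networks) / len(networks)


def contrast_patterns(clusters, min_in=0.8, max_out=0.2, max_edges=2):
    """Patterns frequent in one cluster and rare in each remaining cluster."""
    result = {}
    for name, nets in clusters.items():
        result[name] = {
            p for p in candidate_patterns(nets, max_edges)
            if support(p, nets) >= min_in
            and all(support(p, other_nets) <= max_out
                    for other, other_nets in clusters.items() if other != name)
        }
    return result


# Two hypothetical clusters of three patient networks each.
CLUSTERS = {
    "A": [
        {edge("G1", "G2"), edge("G2", "G3"), edge("G4", "G5")},
        {edge("G1", "G2"), edge("G2", "G3")},
        {edge("G1", "G2"), edge("G2", "G3"), edge("G5", "G6")},
    ],
    "B": [
        {edge("G4", "G5"), edge("G5", "G6")},
        {edge("G4", "G5"), edge("G5", "G6"), edge("G1", "G2")},
        {edge("G4", "G5"), edge("G5", "G6")},
    ],
}

for cluster_name, patterns in contrast_patterns(CLUSTERS).items():
    print(cluster_name, [sorted(sorted(e) for e in p) for p in patterns])
```

In this toy data the two-edge motif G4-G5-G6 is a contrast pattern for cluster B even though neither of its edges is one on its own, since each edge also appears in a third of the networks in cluster A. Dedicated frequent-subgraph miners would be needed for larger patterns or for graphs whose node labels are not unique.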
Clustering

A cluster is an aggregation of networks that share some similarity. The goal of clustering is to maximize intra-cluster similarity and minimize inter-cluster similarity.

Goal: group patients by network similarity thresholds.

Key problem: find appropriate parameters, so that
- each cluster has a reasonable number of members, and
- most or all patients are assigned to a cluster.

Pair-wise Network Similarity Computation

We load all graphs into the database and use TALE to query each of the 250 networks against the database. This gives 250 x 250 comparisons, from which we obtain the pair-wise matching results. We consider both:
- the size of the match (the number of matching nodes), and
- how similar the connectivity of the nodes is in the match.

The similarity scores are computed after the shared network between the graphs has been computed. To be precise, we use the following measure to assess the quality of the match:

[Equation: similarity score, not preserved in the transcript]

Under this similarity model, a higher score means the networks are more similar. Note that this similarity score is asymmetric. Therefore, for each pair of networks, we use the maximum of the two scores as the similarity between the two networks. StructDist is the sum of the shortest distances between every matching pair of nodes in the two networks.
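The symmetrization rule described above (take the larger of the two directional scores for each pair) can be sketched as follows. The directional scores are treated as given, since the scoring formula itself is not reproduced here; the patient identifiers, score values and function name are hypothetical.

```python
# Minimal sketch: turn asymmetric query-vs-target scores into one symmetric
# similarity per pair by keeping the maximum of the two directions.
# The scores below are made-up placeholders, not TALE output.

def symmetric_similarity(directed_scores):
    """Map each unordered pair of networks to the larger directional score.

    directed_scores[(a, b)] is the score obtained when network a is used as
    the query and network b as the target.
    """
    symmetric = {}
    for (query, target), score in directed_scores.items():
        if query == target:
            continue  # self-comparisons carry no pair-wise information
        pair = frozenset((query, target))
        symmetric[pair] = max(score, symmetric.get(pair, float("-inf")))
    return symmetric


DIRECTED = {
    ("P1", "P1"): 1.0,
    ("P1", "P2"): 0.6,
    ("P2", "P1"): 0.8,
    ("P1", "P3"): 0.1,
    ("P3", "P1"): 0.3,
    ("P2", "P3"): 0.5,
}

for pair, score in symmetric_similarity(DIRECTED).items():
    print(sorted(pair), score)
```

A pair that was scored in only one direction simply keeps that score. The resulting symmetric values are what a thresholding and clustering step would consume.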