Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach,


Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar
Department of Computer Science and Engineering
RECOMB Systems Biology 12/05/2009

Differential Expression (DE)
–Traditional analysis targets the changes of expression level
[Golub et al., 1999], [Pan 2002], [Cui and Churchill, 2003] etc.
[Figure: expression level over samples in controls and cases; expression matrix of genes by samples (controls, cases) [Kostka & Spang, 2005]]

Differential Coexpression (DC)
–Targets changes in the coherence of expression between controls and cases
[Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al. 2009] etc.
Question: Is this gene interesting, i.e. associated with the phenotype?
Answer: No, in terms of differential expression (DE). However, what if there are another two genes ……? Yes!
Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.
[Figure: matrix of expression values (genes by samples, controls and cases); expression over samples in controls and cases [Kostka & Spang, 2005]]

Differential Coexpression (DC)
Existing work on differential coexpression
–Pairs of genes with differential coexpression [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al. 2004]
–Clustering based differential coexpression analysis [Ihmels et al., 2005], [Watson., 2006]
–Network based analysis of differential coexpression [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al. 2006], [Oldham et al. 2006], [Fuller et al., 2007], [Xu et al., 2008]
–Beyond pair-wise (size-k) differential coexpression [Kostka and Spang., 2004], [Prieto et al., 2006]
–Gene-pathway differential coexpression [Rosemary et al., 2008]
–Pathway-pathway differential coexpression [Cho et al., 2009]

Existing DC work is “full-space”
Full-space differential coexpression
–Full-space measures: e.g. correlation difference
–May have limitations due to the heterogeneity of
  –Causes of a disease (e.g. genetic difference)
  –Populations affected (e.g. demographic difference)
Motivation: patterns that hold on only a subset of the samples (subspace patterns) may be missed by full-space models
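
As a rough illustration of a full-space measure, the sketch below computes the difference in Pearson correlation of one gene pair between controls and cases, using every sample in each class. The function names and the toy expression values are hypothetical and are not taken from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_difference(g1_controls, g2_controls, g1_cases, g2_cases):
    """Full-space DC score: absolute change in correlation across ALL samples."""
    return abs(pearson(g1_controls, g2_controls) - pearson(g1_cases, g2_cases))

# Toy data (hypothetical): the pair is positively correlated in controls
# and negatively correlated in cases.
score = correlation_difference(
    [1, 2, 3, 4], [2, 4, 6, 8],   # gene 1 and gene 2 in controls
    [1, 2, 3, 4], [8, 6, 4, 2],   # gene 1 and gene 2 in cases
)
print(score)
```

Because the correlation is taken over all samples of a class, a change that affects only a fraction of the cases is averaged together with the unaffected samples, which is the limitation the slide points to.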

Extension to Subspace Differential Coexpression
Definition of Subspace Differential Coexpression Pattern
–A set of k genes α = {g_1, g_2, …, g_k}
–s_A(α): Fraction of samples in class A, on which the k genes are coexpressed
–s_B(α): Fraction of samples in class B, on which the k genes are coexpressed
–SDC(α) = |s_A(α) − s_B(α)| as a measure of subspace differential coexpression
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
Problem: given n genes, find all the subsets of genes, s.t. SDC ≥ d
–Given n genes, there are 2^n candidates of SDC pattern!
–How to effectively handle the combinatorial search space?
–Similar motivation and challenge as biclustering, but here differential biclustering!
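
The definition can be made concrete with a small sketch that computes, for one gene set, the fraction of samples in each class on which the set is coexpressed, and then the gap between the two fractions. Two things here are assumptions made for illustration: each sample is represented as the set of genes flagged as expressed in it (so "coexpressed" means all k genes are flagged), and the measure is taken as the absolute difference of the two fractions. The names `support` and `sdc` are hypothetical.

```python
def support(gene_set, samples):
    """Fraction of samples in which every gene of gene_set is flagged."""
    if not samples:
        return 0.0
    hits = sum(1 for sample in samples if gene_set <= sample)
    return hits / len(samples)

def sdc(gene_set, class_a, class_b):
    """Assumed SDC score: absolute difference of the two class supports."""
    return abs(support(gene_set, class_a) - support(gene_set, class_b))

# Toy data (hypothetical): each sample is the set of genes flagged in it.
class_a = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1", "g2", "g3"},
           {"g1"}, {"g3"}]
class_b = [{"g1", "g2", "g3"}] + [{"g1"}] * 9

pattern = frozenset({"g1", "g2", "g3"})
print(support(pattern, class_a), support(pattern, class_b),
      sdc(pattern, class_a, class_b))
```

Scoring a single gene set is cheap; the difficulty named on the slide is that there are 2^n gene sets to consider.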

Direct Mining of Differential Patterns [Fang, Pandey, Gupta, Steinbach and Kumar, TR]
Refined SDC measure: “direct”
–A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

An Association-analysis Approach
Refined SDC measure
–A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
Systematic and efficient combinatorial search [Agrawal et al. 1994]
–Disqualified: prune all the supersets
Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency
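
The pruning idea can be sketched as a level-wise (Apriori-style) search. The refined SDC measure itself is not spelled out on the slides, so this sketch swaps in a simpler antimonotonic stand-in: the larger of the two class supports, which can only shrink as genes are added and is never smaller than the support difference. Any gene set whose stand-in value falls below the threshold d is disqualified together with all its supersets. The sample representation (sets of flagged genes), the function names, and the `min_size` parameter are all assumptions.

```python
from itertools import combinations

def support(gene_set, samples):
    """Fraction of samples in which every gene of gene_set is flagged."""
    return sum(1 for s in samples if gene_set <= s) / len(samples)

def mine_sdc_patterns(class_a, class_b, d, min_size=2):
    """Return {gene_set: score} for all gene sets with |s_A - s_B| >= d."""
    def bound(gs):
        # Antimonotonic stand-in for the refined measure (an assumption).
        return max(support(gs, class_a), support(gs, class_b))

    def score(gs):
        return abs(support(gs, class_a) - support(gs, class_b))

    genes = sorted(set().union(*class_a, *class_b))
    level = [frozenset([g]) for g in genes if bound(frozenset([g])) >= d]
    results = {}
    while level:
        for gs in level:
            if len(gs) >= min_size and score(gs) >= d:
                results[gs] = score(gs)
        survivors = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Join two size-k sets into one size-(k+1) candidate, and keep it
            # only if every size-k subset survived the previous level.
            if len(union) == len(a) + 1 and all(
                union - {g} in survivors for g in union
            ):
                candidates.add(union)
        # Disqualified candidates are dropped here, so none of their
        # supersets is ever generated.
        level = [c for c in candidates if bound(c) >= d]
    return results

# Toy data (hypothetical): each sample is the set of genes flagged in it.
class_a = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1", "g2", "g3"},
           {"g1"}, {"g4"}]
class_b = [{"g1"}, {"g1", "g4"}, {"g2"}, {"g3", "g4"}, {"g1", "g2"}]
patterns = mine_sdc_patterns(class_a, class_b, d=0.5)
for gs, value in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(gs), value)
```

Because the stand-in is an upper bound on the score, pruning on it never discards a qualifying gene set, which corresponds to the completeness advantage listed on the slide; a tighter antimonotonic measure would prune more candidates.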

Validation
Three lung cancer datasets
–[Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
–All are from Affymetrix microarrays (first two: HG-U95A, and the third: HG-U133A)
–Lung cancer samples & normal samples
Combined dataset
–More samples
–Proper normalizations before combining: RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]
–Lung cancer samples (102)
–Normal samples (67)

Statistical Significance
Phenotype permutation test (n=1000)
[Figure: panels A, B, C]
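
A phenotype permutation test can be sketched as follows: shuffle the class labels, recompute the score on the relabeled samples, and report how often the shuffled score is at least as large as the observed one. The slide does not say how the null distribution was built for mined patterns, so this sketch tests a single fixed gene set. The scoring function (absolute difference of class supports), the sample representation, and all names are assumptions.

```python
import random

def support(gene_set, samples):
    """Fraction of samples in which every gene of gene_set is flagged."""
    return sum(1 for s in samples if gene_set <= s) / len(samples)

def sdc(gene_set, class_a, class_b):
    """Assumed score: absolute difference of the two class supports."""
    return abs(support(gene_set, class_a) - support(gene_set, class_b))

def permutation_p_value(gene_set, class_a, class_b, n_perm=1000, seed=0):
    """Empirical p-value of the score under random phenotype labels."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = sdc(gene_set, class_a, class_b)
    pooled = list(class_a) + list(class_b)
    n_a = len(class_a)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # reassign phenotype labels at random
        if sdc(gene_set, pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_perm + 1)

# Toy data (hypothetical): the pair is flagged in every class A sample only.
class_a = [{"g1", "g2"} for _ in range(10)]
class_b = [{"g1"} for _ in range(10)]
p = permutation_p_value(frozenset({"g1", "g2"}), class_a, class_b)
print(p)
```

When patterns are found by a search rather than fixed in advance, a stricter variant would rerun the search on each permuted dataset and compare against the best score found there.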

Could Subspace DC patterns have been discovered in full-space?
[Figure: full-space DC measures vs. subspace DC measures for 88 statistically significant size-3 patterns (stars), with a phenotype permutation based significance cutoff for the full-space measure. The cutoff separates patterns that can also be found in full-space from those that can NOT be found in full-space.]
DC (Differential Coexpression)

A 10-gene Subspace DC Pattern
[Figure: the 10-gene pattern, with labels ≈ 60% and ≈ 10%; enriched Ingenuity subnetwork (www.ingenuity.com)]
–Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: 1.4 × 10^-5)
–Suggests that the dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
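
An overlap such as 6 of 10 genes is commonly scored with a hypergeometric tail probability. The sketch below shows that calculation in general form. The slide does not state which test produced the reported P-value, nor the pathway or background sizes, so the function name and every number in the example are hypothetical, and the sketch makes no attempt to reproduce the reported value.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_pattern, n_overlap):
    """P(overlap >= n_overlap) when n_pattern genes are drawn at random,
    without replacement, from n_background genes of which n_pathway
    belong to the pathway."""
    total = comb(n_background, n_pattern)
    largest = min(n_pathway, n_pattern)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_pattern - k)
        for k in range(n_overlap, largest + 1)
    )
    return tail / total

# Tiny check by hand: 4 genes, 2 in the pathway, draw 2, both in the pathway.
# Only 1 of the 6 possible draws qualifies.
print(hypergeom_enrichment_p(4, 2, 2, 2))

# Hypothetical sizes only: 10-gene pattern, 6 genes in a 200-gene pathway,
# 10000-gene background.
print(hypergeom_enrichment_p(10000, 200, 10, 6))
```

The result depends strongly on the background chosen, so the background gene list should match the genes that could have appeared in a pattern.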

Biological Interpretations
Specific interpretation
–Enriched cancer-related signaling pathways
  –TNF-α/NFkB
  –WNT
–Target gene sets of cancer-related microRNA & TFs
  –microRNA: miR-101 ({PIK3C2B,TSC22D1} + AKAP12)
  –Transcription factor (TF): ATF2 ({ETV4,PTHLH} + CBX5)
miR-101 is shown to be down-regulated in cancer [Friedman et al 2009]
Mutations of ATF2 are shown to be related to cancer [Woo et al. 2002]

Summary & Future Directions
Summary
–Proposed the problem definition & a systematic approach for subspace DC
–Subspace DC analysis can identify many statistically significant & biologically relevant patterns that would have been missed in full-space
Potential biomedical utility
–Study the demographic and genetic difference within each class
–Phenotype classification with subspace DC patterns
  –Combine DE and Subspace DC patterns
  –Compare
DE (Differential Expression); DC (Differential Coexpression)

Acknowledgement
Co-authors at Dept. Computer Science, Univ. of Minnesota
–Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad Myers, Vipin Kumar
–Data Mining for Biomedical Informatics Group; Comp. Bio. Group; Comp. Bio. & Func. Genomic Group
Conference organizers
NSF grants #CRI #IIS #ITR
UMR-IBM-Mayo BICB Fellowship

Paper
–Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing, 2010
Source codes:
Questions:
–Gang Fang:
Thanks!