A Combinatorial Approach to the Analysis of Differential Gene Expression Data: The Use of Graph Algorithms for Disease Prediction and Screening

The Goal
To classify patients based on expression profiles
–Presence of cancer
–Type of cancer
–Response to treatment
To identify the genes required for accurate classification
–Too many = unnecessary noise
–Too few = insufficient information

Classic Clustering Problem
Current techniques:
–Hierarchical Clustering
–K-Means Clustering
–Self-Organizing Maps
–Others
Drawbacks:
–Determining cluster boundaries difficult with diffuse data
–Objects can only belong to one group

Algorithmic Training
[Flowchart: Raw Data -> Eliminate Poorly Discriminating Genes -> Eliminate Poorly Covering Genes -> Calculate Sample Similarities -> Apply Threshold -> Verify by Classification -> Set of Discriminatory Genes, Gene Scores]
[Techniques labeled in the flowchart: Gene Scoring, Dominating Set, Maximal Cliques]

Algorithmic Training
[Flowchart: Raw Data -> Eliminate Poorly Discriminating Genes]

The Gene Scoring Function: Identifying Discriminators vs.

Algorithmic Training
[Flowchart: Raw Data -> Eliminate Poorly Discriminating Genes -> Eliminate Poorly Covering Genes]

Eliminate Poorly Covering Genes
[Figure labels: Samples, Genes, Class 1, Class 2]

Algorithmic Training
[Flowchart: Raw Data -> Eliminate Poorly Discriminating Genes -> Eliminate Poorly Covering Genes -> Calculate Sample Similarities -> Apply Threshold]

Create Unweighted Graph
Complete, edge-weighted graph
–Vertices = samples
–Edge weight = similarity metric
Remove edge weights
–If edge weight < threshold, remove edge from graph
–Otherwise, keep edge, ignore weight
Result: incomplete unweighted graph
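The thresholding step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `pearson` and `threshold_graph`, the toy sample profiles, and the threshold value 0.9 are all assumptions made for the example. Pearson correlation is used only as a stand-in similarity metric, since the edge weight function from the slides is not preserved in this transcript.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Stand-in similarity metric (an assumption, not the slides' weight function).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as dissimilar.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def threshold_graph(profiles, threshold, similarity=pearson):
    """Build an unweighted graph over samples.

    Conceptually starts from the complete edge-weighted graph, drops every
    edge whose weight is below `threshold`, and forgets the weights of the
    edges that remain. Returns an adjacency dict: sample -> set of neighbours.
    """
    adjacency = {name: set() for name in profiles}
    for u, v in combinations(profiles, 2):
        if similarity(profiles[u], profiles[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


# Hypothetical toy data: four samples, four genes each.
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.1, 2.9, 4.2],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [3.9, 3.2, 1.8, 1.1],
}
graph = threshold_graph(profiles, threshold=0.9)
```

In this toy input, s1 and s2 rise across the genes while s3 and s4 fall, so only the edges s1-s2 and s3-s4 survive the threshold.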

The Edge Weight Function
[Equation not preserved in the transcript]
where expression value_ij = expression value of gene i for sample j

Algorithmic Training
[Flowchart: Raw Data -> Eliminate Poorly Discriminating Genes -> Eliminate Poorly Covering Genes -> Calculate Sample Similarities -> Apply Threshold -> Verify by Classification -> Set of Discriminatory Genes, Gene Scores]

What is a Clique?
A completely connected subset of vertices in a graph
Maximal clique = local optimization (no further vertex can be added)
Finding a maximum clique is NP-complete (as a decision problem)
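As an illustration of the clique definition, the sketch below enumerates all maximal cliques of a small graph with the Bron-Kerbosch algorithm (with pivoting). The slides do not say which clique algorithm the group used, so this is an assumed, textbook choice; the function name `maximal_cliques` and the example graph are hypothetical.

```python
def maximal_cliques(adjacency):
    """Enumerate all maximal cliques of an undirected graph.

    `adjacency` maps each vertex to the set of its neighbours.
    Uses Bron-Kerbosch with pivoting (an assumed choice of algorithm).
    """
    cliques = []

    def expand(current, candidates, excluded):
        # No vertex can extend the clique, and none was skipped: it is maximal.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # only non-neighbours of the pivot need to be tried at this level.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical example: a triangle a-b-c, a pendant edge c-d, and an isolated vertex e.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
found = set(maximal_cliques(example))
```

Here the maximal cliques are {a, b, c}, {c, d}, and {e}. Worst-case running time is exponential in the number of vertices, which is consistent with the hardness noted on the slide.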

Classification Using Clique
[Figure: graph with groups labeled Class 1, Class 2, and Class 3]

A Selection of Discriminators
ADH1B: alcohol dehydrogenase IB; alcohol dehydrogenase activity
FHL1: four and a half LIM domains 1; cell growth, cell differentiation
HBB: hemoglobin, beta; oxygen transport
CYP4B1: cytochrome P450 4B1; electron transport
TNA: tetranectin; plasminogen binding protein
TGFBR2: transforming growth factor, beta receptor II; transmembrane receptor protein serine/threonine kinase signaling pathway

The Algorithm - Unsupervised
[Flowchart: Raw Data; Set of Discriminatory Genes, Scores -> Calculate Sample Similarities -> Apply Threshold -> Classify Unknown Samples]

Summary
Intersection of clique and dominating set techniques improves results
Combined orthogonal scoring identifies limited number of discriminatory genes
Clique offers means of validating obtained scores and weights
Our technique identifies differing set of discriminatory genes from original paper
Clique-based classification a viable complement to present clustering methods

Ongoing and Future Research
Reverse training
Train to distinguish among types of cancer
Experiment with different weight functions (e.g., Pearson’s coefficient)
Investigate using less stringent techniques
–Near-cliques
–Neighborhood search
–K-dense subgraphs
Port code to SGI Altix supercomputer

Our Research Group
Mike Langston, Ph.D.
Lan Lin
Chris Symons
Xinxia Peng
Bing Zhang, Ph.D.