Pathway Ranking Tool Dimitri Kosturos Linda Tsai SoCalBSI, 8/21/2003.

Presentation transcript:

Pathway Ranking Tool Dimitri Kosturos Linda Tsai SoCalBSI, 8/21/2003

Project Overview: BioDiscovery, Inc., Marina del Rey. Analyzing microarray data at the pathway level instead of the individual gene level. Methods: -Enrichment Analysis -Permutational Statistics -S. Metric -Multivariate test

Project Overview, cont.: Validation of statistical methods. 2 data sets: Brain Tumor, Interferon-gamma. Sources of annotation: BioCarta, KEGG, Gene Ontology.

Project Flowchart: phenotype, microarray, algorithm, pathway. Dimitri (computer scientist), Linda (biologist).

Research and Development in GeneSight: GeneSight is a data analysis software package. Features: -Statistical significance testing -Multiple data visualizations -Automated gene annotation -Complete result reports -Pathway analysis (?)

Biology of Brain Tumor: Glioblastoma multiforme (GBM) is the most malignant of the glial tumors, classified as grade IV. Many brain tumors are currently incurable. Average survival time: 1 year.

Bad Genes Foment Trouble: Oncogenes are mutated forms of proto-oncogenes, which promote normal cell growth; the mutated forms drive uncontrolled growth. Tumor suppressor genes retard cell growth.

Biology of Interferon: Interferons (IFNs) are a class of cytokines that mediate antiviral, antiproliferative, and antitumor activities, among others. IFN-gamma is produced by T lymphocytes in response to mitogens or antigens. IFNs bind to their receptors and initiate the JAK-STAT signaling cascade.

Biology of Interferon, cont.

Gene Annotations: Grouping related genes together into pathways: (A) BioCarta, ex: p53 Signaling Pathway; (B) KEGG, ex: Citrate cycle (TCA cycle). Grouping genes into structured, controlled vocabularies (ontologies): Gene Ontology -Biological Process, ex: angiogenesis, apoptosis -Molecular Function, ex: DNA binding activity -Cellular Component, ex: nucleus, mitochondria

Traditional method of ranking gene pathways. Steps: 1. Mann-Whitney Test: obtain the list of probe sets that satisfy a certain p-value. 2. Cluster analysis: see how many of the listed probe sets occur in each cluster (pathway). Example: 1. Original data: 12,625 genes. Select genes by p-value threshold => narrow to 927 genes. 2. Group those 927 genes into clusters (pathways).
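The counting step of the traditional method can be sketched in a few lines of Python. This is a minimal sketch, not the GeneSight implementation: it assumes the per-gene Mann-Whitney filtering has already produced a list of flagged genes, the pathway names and gene IDs are hypothetical, and the hypergeometric tail used for ranking is a common choice for enrichment analysis that the slides do not name.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def rank_pathways(flagged, pathways, total_genes):
    """Count flagged genes per pathway and rank pathways by enrichment p-value."""
    flagged = set(flagged)
    rows = []
    for name, members in pathways.items():
        hits = len(flagged & set(members))
        p = hypergeom_tail(hits, len(members), len(flagged), total_genes)
        rows.append((name, hits, len(members), p))
    return sorted(rows, key=lambda r: r[3])  # smallest p-value first

# Hypothetical toy data: 20 genes on the chip, 5 flagged by the per-gene test.
pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4"],
    "pathway_B": ["g5", "g6", "g7", "g8"],
}
flagged = ["g1", "g2", "g3", "g9", "g10"]
ranking = rank_pathways(flagged, pathways, total_genes=20)
for name, hits, size, p in ranking:
    print(f"{name}: {hits}/{size} flagged, p = {p:.4f}")
```

Each row reports how many flagged genes fall in the pathway, which is the quantity the slide describes; the p-value only orders the pathways.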

4 of the genes in the SODD/TNFR1 Signaling Pathway satisfy p-value < 0.001 (Mann-Whitney Test, Denovo Glioblastoma).

How Affy. Microarray Chips Work Best results: Genes hybridize perfectly with Perfect Match, and not at all with Mismatch. PM: Perfect Match MM: Mismatch

Example of GeneSight Plot. [Figure: theoretical tumor expression levels (log transformed). Columns are conditions (Normal, Tumor); rows are genes (Probe Set A, Probe Set B, Probe Set C, Probe Set A).] Notice column replicates, Probe Set replicates.

Given Data Sets: Two data sets were given: Brain Tumor and IFN-γ. The Brain Tumor data set has 5+ tumor types; however, only 2 tumor types were used (Denovo Glioblastoma, Progressive Glioblastoma). For the IFN-γ data set, the entire data set was used.

What and why? Goal: write a prototype extension to GeneSight that uses permutational statistics to develop a custom distribution for a given Microarray data set. Overall significance: the software provides a list of (potentially) significant pathways that enables researchers to focus their work.

What is permutational statistics? Choose different Control (C) and Experiment (E) groupings (permute). Example: samples 1 2 3 4 labeled E E C C are relabeled E C E C. By iterating through an adequate number of permutations, we can determine whether a pathway is likely to be significant (p-value), in this context.
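The relabeling idea can be illustrated on a single measurement per sample. The sketch below is an assumption-laden toy, not the project's code: the statistic (absolute difference in group means) and the sample values are made up for illustration, and the function name `permutation_p_value` is hypothetical.

```python
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(values, experiment_idx):
    """Two-sided permutation p-value for the difference in group means.

    values: one measurement per sample; experiment_idx: indices labeled E.
    Every way of choosing the same number of E samples is one permutation.
    """
    n = len(values)
    k = len(experiment_idx)

    def stat(e_idx):
        e = [values[i] for i in e_idx]
        c = [values[i] for i in range(n) if i not in e_idx]
        return abs(mean(e) - mean(c))

    observed = stat(set(experiment_idx))
    perms = [stat(set(idx)) for idx in combinations(range(n), k)]
    # Fraction of relabelings at least as extreme as the observed labeling.
    return sum(s >= observed for s in perms) / len(perms)

# Samples 1..4 labeled E E C C (values are made up).
p = permutation_p_value([5.0, 6.0, 1.0, 2.0], experiment_idx=[0, 1])
print(p)
```

With only four samples there are six possible groupings, so even perfectly separated groups cannot reach a small p-value; more replicates are needed for the permutation distribution to be useful.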

Permutational Stats. There are two versions of the S. Metric currently implemented, S. Metric I and S. Metric II (formulas not preserved in this transcript). Both are defined in terms of: M = number of genes flagged as significant; Total = total number of genes in the pathway.

(Layman's) How Statistics Works: Data => Statistic (S. Metric I, II) => P-Value. The permutation is applied to the data. After all permutations are done, calculate the p-value.

Algorithm: Take at least 10,000 unique permutations. A unique permutation is determined by a Permute class.

For each condition
    For each permutation
        For each gene
            Calc. Mean diff.
            Calc. T-stat
        End For
        For each pathway
            store the statistic
        End For
    End For
    calcPvalue(stored statistic)
End For

Stages labeled on the slide: S. Metric, Initial Significance Flagging, pValue.
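The loop structure above can be sketched in Python for a single condition. Several pieces are assumptions and not the authors' method: the pathway statistic is a hypothetical stand-in (M / Total, the fraction of flagged genes) because the S. Metric formulas are not available here, genes are flagged with an assumed cutoff `t_cut` on a Welch t-statistic, and all unique groupings are enumerated instead of drawing at least 10,000 of them through a Permute class.

```python
from itertools import combinations
from math import sqrt

def t_stat(a, b):
    """Welch t-statistic for two lists of expression values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = sqrt(va / len(a) + vb / len(b))
    if se == 0:
        return 0.0 if ma == mb else float("inf")
    return (ma - mb) / se

def pathway_stats(expr, exp_idx, pathways, t_cut):
    """Flag genes with |t| >= t_cut, then score each pathway as M / Total."""
    n = len(next(iter(expr.values())))
    ctl_idx = [i for i in range(n) if i not in exp_idx]
    flagged = {
        g for g, v in expr.items()
        if abs(t_stat([v[i] for i in exp_idx], [v[i] for i in ctl_idx])) >= t_cut
    }
    return {p: len(flagged & set(m)) / len(m) for p, m in pathways.items()}

def permutation_p_values(expr, exp_idx, pathways, t_cut=2.0):
    """P-value per pathway: share of groupings scoring >= the observed score."""
    n = len(next(iter(expr.values())))
    observed = pathway_stats(expr, set(exp_idx), pathways, t_cut)
    counts = dict.fromkeys(pathways, 0)
    total = 0
    for idx in combinations(range(n), len(exp_idx)):  # each grouping is unique
        perm = pathway_stats(expr, set(idx), pathways, t_cut)
        for p in pathways:
            counts[p] += perm[p] >= observed[p]
        total += 1
    return {p: counts[p] / total for p in pathways}

# Hypothetical toy data: 4 genes, samples 0-2 are Experiment, 3-5 are Control.
expr = {
    "g1": [10, 11, 12, 1, 2, 3],
    "g2": [9, 10, 11, 2, 3, 4],
    "g3": [5, 4, 6, 5, 6, 4],
    "g4": [3, 5, 4, 4, 3, 5],
}
pathways = {"up": ["g1", "g2"], "flat": ["g3", "g4"]}
p_values = permutation_p_values(expr, [0, 1, 2], pathways)
print(p_values)
```

Recomputing the gene-level flags inside every permutation is what makes the method expensive: the cost grows with genes times permutations, which matches the computational limits noted in the slides.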

Limitations: Computational power (memory, CPU). Required number of replicates (8, 8).
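One plausible reading of the (8, 8) replicate requirement, offered as an assumption since the slides do not state it, is that it follows from the need for at least 10,000 unique permutations: the number of distinct ways to split the samples into Experiment and Control groups is a binomial coefficient. The helper name `unique_groupings` is hypothetical.

```python
from math import comb

def unique_groupings(n_experiment, n_control):
    """Number of distinct ways to relabel the samples as Experiment or Control."""
    return comb(n_experiment + n_control, n_experiment)

for reps in (6, 7, 8):
    print(reps, unique_groupings(reps, reps))
# 6 924
# 7 3432
# 8 12870
```

Under this reading, 7 replicates per group give only 3,432 groupings, while 8 per group give 12,870, the first balanced design that exceeds 10,000.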

Output of result

Validation of pathway analysis, Method 1. Problem: lack of insignificant pathways?

Validation of pathway analysis, Method 2. [Figure: Comparison of Prediction Methods. Axes: # of pathways in BioCarta sorted by p-value; # of identified significant pathways. Curves: best algorithm, random, worst.]
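A curve of the kind described for Method 2 could be computed as a cumulative count. This is a sketch under assumptions: it reads the plot as "how many expected pathways appear among the top k pathways when sorted by p-value", and the pathway names, the reference set, and the function name `recovery_curve` are all hypothetical.

```python
def recovery_curve(ranked_pathways, expected):
    """Cumulative count of expected pathways among the top-k ranked ones."""
    expected = set(expected)
    curve, found = [], 0
    for name in ranked_pathways:
        found += name in expected
        curve.append(found)
    return curve

# Hypothetical ranking (sorted by p-value) and hypothetical reference set.
ranked = ["p53", "TCA", "SODD/TNFR1", "apoptosis", "glycolysis"]
expected = ["p53", "SODD/TNFR1"]

curve = recovery_curve(ranked, expected)
# The best possible ordering places every expected pathway first.
best = recovery_curve(["p53", "SODD/TNFR1", "TCA", "apoptosis", "glycolysis"], expected)
print(curve)  # [1, 1, 2, 2, 2]
print(best)   # [1, 2, 2, 2, 2]
```

A method whose curve rises faster sits closer to the best-case ordering; a random ordering would rise roughly linearly.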

Result Brain Tumor-BioCarta

Result IFNG-Molecular Function (GO)

Biological Limitations: Predicting which pathways should be significant in the conditions of interest is subjective. The analysis assumes similar biological states between Denovo Glioblastoma and Progressive Glioblastoma.

Future Direction Finish modifying the Multivariate Statistic for use in the permutational method. This method uses PCA and Multivariate statistics. Finish Validating the data produced using the Multivariate Statistic.

Initial Results of Multivariate Stat. Sorted by p-value.

Conclusion: It is not clear which is better, the S. Metric or traditional Enrichment Analysis. Improvements can be made to the S. Metric.

Acknowledgements: Dr. Bruce Hoff, Dr. Anton Petrov. SoCalBSI: Dr. Jamil Momand, Dr. Sandra Sharp, Dr. Nancy Warter-Perez, Dr. Wendie Johnston. National Science Foundation. National Institutes of Health.