PaLS: Pathways and Literature Strainer Filtering common literature, ontology terms and pathway information. Andrés Cañada Pallarés Instituto Nacional de Bioinformática

- Studies of differential expression and, especially, gene selection in the context of classification and prediction with microarray data usually output lists of “interesting genes”.
- Do some of the members of those lists have a function in common, or do they belong to the same metabolic pathway?
- PaLS takes a list, or a set of lists, of gene or protein identifiers and shows which ones share certain descriptors.
- Variable selection with microarray data (where number of variables >> number of samples) can lead to many solutions. Different runs of the same algorithm often return different lists of “interesting genes”, which makes the results hard to interpret.
- PaLS helps to discover the major biological themes that are shared among different solutions, even if the identity of the genes in each solution is different.

Main input file (plain text):

#Run.1.component.1
NM_ NM_ NM_ NM_ NM_ NM_ NM_ NM_
#Run.2.component.1
NM_ NM_ NM_ NM_ NM_ NM_

- A single list or several lists of genes/proteins
- Each list can have its own name
- Types of identifiers accepted:
  - Ensembl Gene IDs
  - UniGene Cluster IDs
  - Gene names (HUGO)
  - GenBank accessions
  - Clone IDs
  - Affymetrix IDs
  - EntrezGene IDs
  - RefSeq_RNAs
  - RefSeq_peptides
  - SwissProt Names
- Organisms accepted:
  - Human
  - Mouse
  - Rat
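The input layout described above (a `#name` header followed by the identifiers of that list) can be read with a few lines of Python. This is only a minimal sketch, not PaLS code: the function name `parse_gene_lists`, the fallback list name `unnamed`, and the identifiers `GENE_A`, `GENE_B`, `GENE_C` are hypothetical placeholders, and it assumes identifiers are separated by whitespace or line breaks.

```python
def parse_gene_lists(text):
    """Parse '#name' headers followed by identifiers into {name: [ids]}.

    Assumption: identifiers are whitespace-separated; identifiers that
    appear before any header go into a list called 'unnamed'.
    """
    lists = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # A header line starts a new named list.
            current = line[1:].strip()
            lists.setdefault(current, [])
        else:
            if current is None:
                current = "unnamed"
                lists.setdefault(current, [])
            lists[current].extend(line.split())
    return lists


# Hypothetical example input; the identifiers are placeholders.
example = """#Run.1.component.1
GENE_A
GENE_B
#Run.2.component.1
GENE_B
GENE_C
"""
parsed = parse_gene_lists(example)
```

Here `parsed` maps each list name to its identifiers in file order, which is the shape the filtering steps need.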

- PaLS has three different methods of filtering annotations:
  1. Filter the descriptors referenced by more than a given percentage of identifiers, giving results for each list separately. Intended to discern which lists have common published information showing that their genes/proteins share a similar function.
  2. Group all lists into one list (removing duplicates) and display the descriptors that are most referenced in the global list. Intended to reveal commonalities even if they are not seen within each list.
  3. Look for the descriptors that are referenced by more than a given threshold of identifiers in more than a given percentage of lists. Intended to find commonalities present within and among sets of lists.
- Threshold values are part of the required input; the default is 50%.
- Lower values are suggested.
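The three filtering methods can be sketched as follows. This is an illustration under stated assumptions, not the PaLS implementation: the function names, the `annotations` mapping (identifier to set of descriptors), and the example genes `G1` to `G4` are hypothetical, and the sketch treats a fraction equal to the threshold as passing, which the slides do not specify.

```python
def descriptor_fraction(ids, annotations):
    """Fraction of the unique identifiers in `ids` that reference each descriptor."""
    unique = set(ids)
    counts = {}
    for gene in unique:
        for descriptor in annotations.get(gene, ()):
            counts[descriptor] = counts.get(descriptor, 0) + 1
    return {d: c / len(unique) for d, c in counts.items()}


def filter_per_list(lists, annotations, threshold=0.5):
    """Method 1: descriptors passing the threshold, reported per list."""
    return {
        name: {d for d, f in descriptor_fraction(ids, annotations).items()
               if f >= threshold}  # assumption: boundary is inclusive
        for name, ids in lists.items()
    }


def filter_global(lists, annotations, threshold=0.5):
    """Method 2: merge all lists (duplicates removed), then filter once."""
    merged = {gene for ids in lists.values() for gene in ids}
    fractions = descriptor_fraction(merged, annotations)
    return {d for d, f in fractions.items() if f >= threshold}


def filter_across_lists(lists, annotations, id_threshold=0.5, list_threshold=0.5):
    """Method 3: descriptors passing `id_threshold` in enough of the lists."""
    per_list = filter_per_list(lists, annotations, id_threshold)
    counts = {}
    for descriptors in per_list.values():
        for d in descriptors:
            counts[d] = counts.get(d, 0) + 1
    return {d for d, c in counts.items() if c / len(lists) >= list_threshold}


# Hypothetical annotations and gene lists, for illustration only.
annotations = {
    "G1": {"cell cycle", "nucleus"},
    "G2": {"cell cycle"},
    "G3": {"nucleus"},
    "G4": {"mitosis", "nucleus"},
}
lists = {"run1": ["G1", "G2"], "run2": ["G3", "G4"]}
```

With these toy data, "nucleus" is the dominant term in only one of the two lists at a 60% threshold, yet it stands out in the merged list and across lists, which is the kind of shared theme the second and third methods are meant to surface.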

- For lists of fewer than 100 nodes, graph plots that describe the data structure of the lists are created. These plots show the genes/proteins that share at least one descriptor. The more descriptors they share, the closer they appear.
- The output consists of lists of the descriptors that fulfill the threshold criteria selected by the user. Every input identifier related to each descriptor is linked to IDClight to give the user as much information as possible. The most time-consuming step is the first search. After that, the user can change the thresholds for each type of descriptor and filtering method and obtain an answer quickly (Redo Analysis button, see figure later).
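The structure behind such a plot can be sketched as a weighted edge list: two identifiers are connected when they share at least one descriptor, and the number of shared descriptors serves as the weight. This is a hedged sketch only; the slides do not say how PaLS computes or lays out its graphs, and `shared_descriptor_edges`, `max_nodes`, and the example data are hypothetical.

```python
from itertools import combinations


def shared_descriptor_edges(ids, annotations, max_nodes=100):
    """Return {(gene_a, gene_b): number_of_shared_descriptors}.

    Returns None when the list has `max_nodes` or more unique identifiers,
    mirroring the "fewer than 100 nodes" limit mentioned in the slides.
    """
    nodes = sorted(set(ids))
    if len(nodes) >= max_nodes:
        return None
    edges = {}
    for a, b in combinations(nodes, 2):
        shared = annotations.get(a, set()) & annotations.get(b, set())
        if shared:  # keep only pairs sharing at least one descriptor
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical annotations, for illustration only.
graph_annotations = {
    "G1": {"cell cycle", "nucleus"},
    "G2": {"cell cycle", "nucleus", "mitosis"},
    "G3": {"mitosis"},
    "G4": {"apoptosis"},
}
edges = shared_descriptor_edges(["G1", "G2", "G3", "G4"], graph_annotations)
```

A layout routine could then place pairs with larger weights closer together; that step is left out here because it depends on the plotting library.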

- Data set from van’t Veer et al. (Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871)).
- Lists of genes obtained using our CNIO application SignS (Díaz-Uriarte, R).
- At the 50% threshold, GO terms in most lists refer to “nucleus”.
- At the 40% threshold, the term “cell cycle” appears in several of the lists. As reported in the original van’t Veer et al. paper, genes involved in the cell cycle are upregulated in the poor prognosis signature.
- At the 20% threshold, the term “mitosis” appears in most of the lists.
- If we examine the PaLS results from Reactome at the 20% threshold, we see “Cell Cycle, Mitotic” in most of the lists.
- The list “6th. Cross-validation run” shows “E2F mediated regulation of DNA replication”.

- Ramón Díaz-Uriarte. Structural Biology and Biocomputing. CNIO
- Andreu Alibés. EMBL-CRG Systems Biology Unit
- Edward R. Morrissey. Systems Biology DTC. University of Warwick