Strategies & Examples for Functional Modeling


Strategies & Examples for Functional Modeling COST Functional Modeling Workshop 22-24 April, Helsinki

Types of data sets and modeling. Commercial array data – more likely to have tools that support the use of array IDs. Custom/USDA array data – problems with updating IDs, linking to function and using array IDs directly in functional modeling tools. Proteomics data – larger data sets; need to make background references to determine enrichment. RNA-Seq data – larger and more complex data sets; novel transcripts currently can’t be included in modeling (contact AgBase to assign GO). Real-time data or quantitative proteomics data – hypothesis testing.

Functional Modeling Strategies: 1. GO summary (using Slim sets). 2. GO enrichment (statistical!). 3. Pathways analysis. 4. Interaction or networks analysis. 5. Hypothesis testing. Note: Functional modeling should be integrated. Approaches are complementary, not exclusive. Modeling is driven by the biology (not the other way round).

Modeling Strategy. Think about using multiple functional approaches: GO, pathways and networks are complementary. What is available for your species? What GO is available? What species does the pathways/network analysis use? What resources do you have: at your institute (e.g. commercial pathways analysis), open source (e.g. GO enrichment analysis), online vs installed? Modeling is iterative: further functional modeling is based on initial results. GO hypothesis testing?

1. GO Functional Summary. High-throughput data sets give us 1,000s to 10,000s of gene products. We can’t know everything about all gene products, and there is a tendency to ‘cherry pick’ the ones you recognize. Instead, we can group gene products by function: this gives us a manageable number of categories to process and enables us to see trends, patterns, etc. Use GO Slim sets to ‘summarize’ data. You lose details (but can gain perspective). Some GO Slim sets are ageing – not being updated as changes to the GO are made. Different Slim sets have different terms – which is best for your data? AgBase GOSlimViewer tool.

http://www.agbase.msstate.edu/help/slimviewerhelp.htm The Slim set you use matters - need to determine which one to use & report it in Methods.

Functional Summary. Not all GO terms are annotated equally, e.g., metabolism! You can slim the complete GO for a species as a background set and then determine which terms in your data are disproportionately represented. You can use Slims to compare two data sets (e.g., control vs treatment). Use Slims for your own sanity – are you seeing what you expect to see?
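The background comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene IDs, the Slim assignments and the helper names (`slim_summary`, `compare_to_background`) are hypothetical, and GOSlimViewer itself may work differently.

```python
from collections import Counter

def slim_summary(gene_to_slims, genes):
    """Count how many of the given gene products fall under each Slim term."""
    counts = Counter()
    for gene in genes:
        # Gene products with no annotation are tallied as "function unknown".
        terms = gene_to_slims.get(gene) or {"function unknown"}
        counts.update(terms)
    return counts

def compare_to_background(gene_to_slims, dataset, background):
    """Return {term: (fraction of dataset, fraction of background)}."""
    data_counts = slim_summary(gene_to_slims, dataset)
    bg_counts = slim_summary(gene_to_slims, background)
    return {
        term: (data_counts[term] / len(dataset), bg_counts[term] / len(background))
        for term in sorted(bg_counts)
    }

# Hypothetical Slim assignments (not real annotations).
GENE_TO_SLIMS = {
    "g1": {"metabolism"},
    "g2": {"metabolism", "signal transduction"},
    "g3": {"immune response"},
    "g4": {"immune response", "apoptosis"},
    "g5": {"metabolism"},
    "g6": set(),
}
BACKGROUND = ["g1", "g2", "g3", "g4", "g5", "g6"]
DATASET = ["g3", "g4", "g6"]

summary = compare_to_background(GENE_TO_SLIMS, DATASET, BACKGROUND)
for term, (in_data, in_background) in summary.items():
    print(f"{term:20s} data={in_data:.2f} background={in_background:.2f}")
```

A term whose share in the data set is far from its share in the background is a candidate for a proper statistical test; the fractions alone are only a summary.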

[Figure: membrane proteins grouped by GO BP, B-cells vs stroma. Categories: cell cycle/cell proliferation, cell adhesion, cell growth, apoptosis, immune response, ion/proton transport, cell migration, cell-cell signaling, function unknown, development, endocytosis, proteolysis and peptidolysis, signal transduction, protein modification.]

[Figure: membrane proteins grouped by GO BP, B-cells vs stroma, showing a subset of categories: cell cycle/cell proliferation, apoptosis, immune response, cell migration, cell-cell signaling, function unknown.]

BVDV Infection – cytopathic (CP) vs non-cytopathic (NCP) infection (comparing function between 2 different conditions)

2. Determining over-represented or under-represented function. This is the most typically used functional analysis method. Many, many tools do this – see: http://www.geneontology.org/GO.tools.microarray.shtml The tools differ greatly in their visualization. We will use some of these tools in the practical session.
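To make the idea of over- and under-representation concrete, here is a minimal sketch using the hypergeometric distribution, which is one common choice for this kind of test. The function names and the example counts are assumptions for illustration; individual tools may use different tests and corrections.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k annotated genes in a list of n), given K of N background genes carry the term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def over_under_pvalues(k, N, K, n):
    """One-sided p-values for seeing at least k (over) or at most k (under) annotated genes."""
    lowest = max(0, n - (N - K))   # fewest annotated genes the list can contain
    highest = min(K, n)            # most annotated genes the list can contain
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under

# Hypothetical counts: 20 background genes, 5 carry the GO term,
# and 3 of the 5 genes in our list carry it.
p_over, p_under = over_under_pvalues(k=3, N=20, K=5, n=5)
print(f"over-represented p={p_over:.4f}, under-represented p={p_under:.4f}")
```

In practice many GO terms are tested at once, so the raw p-values need a multiple-testing correction; check how your chosen tool handles this.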

http://david.abcc.ncifcrf.gov/home.jsp

Some useful expression analysis tools:
Database for Annotation, Visualization and Integrated Discovery (DAVID): http://david.abcc.ncifcrf.gov/
AgriGO, GO Analysis Toolkit and Database for Agricultural Community: http://bioinfo.cau.edu.cn/agriGO/ (used to be EasyGO; chicken, cow, pig, mouse, cereals, dicots; adding new species by request)
Onto-Express: http://vortex.cs.wayne.edu/projects.htm#Onto-Express (can provide your own gene association file)
Ontologizer: http://compbio.charite.de/contao/index.php/ontologizer2.html (WebStart widget, requires Java; now on Galaxy; requires OBO file & GAF, which enables users to select their own annotations)

GO Enrichment tools that support agricultural species.

Structurally and functionally re-annotated a microarray; quantified the impact of this re-annotation based on GO annotations & pathways represented on the array; tested using a previously published experiment that used this microarray. Re-annotation allows more comprehensive GO-based modeling and improves pathway coverage. Re-annotation resulted in a different model from previously published research findings.

Evaluating GO tools. Some criteria for evaluating GO tools: Does it include my species of interest (or do I have to “humanize” my list)? What does it require to set up (computer usage/online)? What was the source for the GO (primary or secondary) and when was it last updated? Does it report the GO evidence codes (and is IEA included)? Does it report which of my gene products have no GO? Does it report both over- and under-represented GO groups, and how does it evaluate this? Does it allow me to add my own GO annotations? Does it represent my results in a way that facilitates discovery?
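Two of these criteria (which gene products have no GO, and which evidence codes are behind the annotations) can be checked directly against a GAF file. The sketch below assumes the standard tab-separated GAF layout, where column 2 is the object ID, column 5 the GO ID and column 7 the evidence code; the IDs, the `ExampleDB` name and the helper functions are hypothetical, and the example rows are shortened to the first seven columns.

```python
from collections import Counter

def read_gaf(lines):
    """Yield (object_id, go_id, evidence_code) from GAF-format lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and "!" header/comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[1], cols[4], cols[6]

def annotation_report(gaf_lines, my_ids):
    """Return (IDs with no GO, Counter of evidence codes) for my gene products."""
    wanted = set(my_ids)
    annotated = set()
    evidence = Counter()
    for object_id, _go_id, code in read_gaf(gaf_lines):
        if object_id in wanted:
            annotated.add(object_id)
            evidence[code] += 1
    return sorted(wanted - annotated), evidence

def row(object_id, go_id, code):
    # Shortened example row: DB, ID, symbol, qualifier, GO ID, reference, evidence.
    return "\t".join(["ExampleDB", object_id, object_id, "", go_id, "REF:0", code])

GAF_LINES = [
    "!gaf-version: 2.2",
    row("P00001", "GO:0006955", "IEA"),
    row("P00001", "GO:0007165", "IDA"),
    row("P00002", "GO:0006915", "IEA"),
]
MY_IDS = ["P00001", "P00002", "P00003"]

missing, evidence = annotation_report(GAF_LINES, MY_IDS)
iea_fraction = evidence["IEA"] / sum(evidence.values())
print("No GO:", missing)
print("Evidence codes:", dict(evidence), f"IEA fraction={iea_fraction:.2f}")
```

A high IEA fraction is not wrong in itself, but it is worth knowing whether a tool silently drops those annotations before you interpret its results.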

RNASeq GO Enrichment. In RNASeq experiments, longer transcripts and more highly expressed transcripts are more likely to be called differentially expressed (DE). Current GO enrichment tools do not account for this RNASeq platform bias (most are based upon arrays): they assume that all genes are independent and equally likely to be selected as DE.
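One quick way to see whether this bias is present in your own results is to bin genes by transcript length and look at the fraction called DE in each bin. The sketch below is a diagnostic only, not a correction method; the function name and the gene lengths are hypothetical.

```python
def de_fraction_by_length(genes, n_bins=3):
    """genes: list of (length, is_de). Returns (shortest, longest, fraction DE) per length bin."""
    ordered = sorted(genes, key=lambda gene: gene[0])
    bins = []
    for b in range(n_bins):
        # Split the length-sorted genes into roughly equal-sized bins.
        start = b * len(ordered) // n_bins
        stop = (b + 1) * len(ordered) // n_bins
        chunk = ordered[start:stop]
        if not chunk:
            continue
        n_de = sum(1 for _length, is_de in chunk if is_de)
        bins.append((chunk[0][0], chunk[-1][0], n_de / len(chunk)))
    return bins

# Hypothetical transcript lengths (bp) and DE calls.
GENES = [(300, False), (500, False), (1200, False),
         (1500, True), (4000, True), (5200, True)]

length_bins = de_fraction_by_length(GENES)
for shortest, longest, fraction in length_bins:
    print(f"{shortest}-{longest} bp: {fraction:.0%} DE")
```

If the DE fraction climbs steadily with length, an enrichment test that treats all genes as equally likely to be selected will tend to favor GO terms made up of long genes.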

3. Pathway Analysis. Freely available tools: from public databases, e.g. KEGG & Reactome; other freely available tools, e.g. Cytoscape. Commercial pathways analysis tools: e.g., Ingenuity Pathways Analysis (IPA), Pathway Studio, etc. Some tools only cover a limited number of species, so you need to “humanize” animal data (for plants, map to Arabidopsis); everything gives you cancer. Many pathways analysis tools combine pathways analysis and network analysis.

Reactome Skypainter http://www.reactome.org/cgi-bin/skypainter2

KEGG Pathways http://www.kegg.jp/kegg/download/kegtools.html

Analysis tools (commercial). Ingenuity Pathway Analysis (http://www.ingenuity.com): networks, pathways, functions and diseases. Pathway Studio (http://www.ariadnegenomics.com/): Gene Ontology (GO) groups, GSEA, pathways. IPA analysis included as IPA.txt

Data Curation. Ingenuity: manually curated database by Ph.D.-level scientists (mining 32 different peer-reviewed journals). Pathway Studio: automated curation by MedScan Reader using natural language processing (NLP) technology, mining PubMed abstracts and peer-reviewed journals; users can do their own text mining.

Comparison Criteria (comparison by Divya Peddinti): features; proportion of proteins involved in modeling; data generation; display. Test dataset: 3,600 bovine spermatozoa proteins.

Feature comparison: Ingenuity Pathway Analysis (IPA) vs Pathway Studio
Input, IPA: GI number, Microarray ID, Affymetrix ID, GenBank, Swiss-Prot accession, UniGene ID, name or alias, HUGO ID
Input, Pathway Studio: Entrez Gene, name or alias, HUGO ID
Databases, IPA: contains biological interactions data for human, mouse, rat. Orthologous mapping available for dog, cow, chimp, chicken, rhesus macaque monkey, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio
Databases, Pathway Studio: contains biological data for human, mouse, rat, bacteria, chicken, zebrafish, frog, cow, bee, dog, Arabidopsis, Drosophila, yeast, and transplantation research etc.

Feature comparison: Ingenuity Pathway Analysis (IPA) vs Pathway Studio
Statistical test, IPA: the significance value (p value) is assigned to the function/pathways using Fisher’s exact test
Statistical test, Pathway Studio: the statistical significance of the overlap between the protein list and a GO group or pathway is calculated using Fisher’s exact test
Updates: quarterly
Networks, IPA: builds networks with a maximum of 35 genes/proteins
Networks, Pathway Studio: -

Proteins involved in modeling

Data generation 37 7 26

Pathway display EGF signaling pathway

4. Network Analysis. IPA & Pathway Studio are equally efficient at drawing networks of relationships. IPA simplifies the pathway display and creates a more manageable, user-friendly network for users to analyze. Pathway Studio shows the relations in a table format. STRING Database: known and predicted protein interactions.
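Behind any of these displays, a network is just a list of pairwise relationships. The sketch below builds one from scored interaction pairs and ranks proteins by their number of partners. The edges, the scores, the 0.5 cutoff and the function names are all made up for illustration; they are not taken from STRING or from either commercial tool.

```python
from collections import defaultdict

def build_network(interactions, min_score):
    """Build an undirected network from (protein_a, protein_b, score) triples."""
    network = defaultdict(set)
    for a, b, score in interactions:
        # Keep only interactions at or above the chosen confidence cutoff.
        if score >= min_score and a != b:
            network[a].add(b)
            network[b].add(a)
    return dict(network)

def rank_by_degree(network):
    """Return (protein, number of partners), most connected first."""
    return sorted(((node, len(partners)) for node, partners in network.items()),
                  key=lambda item: (-item[1], item[0]))

# Illustrative edges with made-up confidence scores.
INTERACTIONS = [
    ("EGF", "EGFR", 0.9),
    ("EGFR", "GRB2", 0.8),
    ("GRB2", "SOS1", 0.7),
    ("EGFR", "SHC1", 0.6),
    ("SHC1", "GRB2", 0.3),   # falls below the cutoff used here
]

network = build_network(INTERACTIONS, min_score=0.5)
ranking = rank_by_degree(network)
for protein, degree in ranking:
    print(protein, degree)
```

The cutoff matters: lowering it adds predicted or weakly supported interactions, which changes which proteins look like hubs, so report the threshold you used.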

http://string-db.org/

http://www.cytoscape.org/

5. Hypothesis Testing. High-throughput data sets are often a ‘fishing expedition’ or hypothesis generation, but GO also serves as a repository of biological function and can be used for hypothesis testing based on these data sets.

The critical time point in MD lymphomagenesis. Hypothesis: At the critical time point of 21 dpi, MD-resistant genotypes have a T-helper (Th)-1 microenvironment (consistent with CTL activity), but MD-susceptible genotypes have a T-reg or Th-2 microenvironment (antagonistic to CTL). [Figure: mean total lesion score vs days post infection for the Susceptible (L72) and Resistant (L61) genotypes; non-MHC associated resistance and susceptibility.]

CYTOKINES AND T HELPER CELL DIFFERENTIATION (Shyamesh Kumar). [Figure labels: APC, naive CD4+ T cell, T reg, Th-2, Th-1.]

Th-1, Th-2, T-reg? Inflammatory? [Figure: cytokines and T helper cell differentiation. Cell labels: APC, naive CD4+ T cell, T reg, Th-1, Th-2, CTL, macrophage, NK cell. Molecule labels: IL 4, IL10, IL 12, IL 18, IFN γ, TGFβ, Smad 7. Sample labels: L6 Whole, L7 Whole, L7 Micro.]

Step I. GO-based phenotype scoring (scores of 1 or -1 per gene product for each phenotype: Th1, Th2, Treg, Inflammation). Step II. Multiply by quantitative data for each gene product. Step III. Include the quantitative data in the phenotype scoring table and calculate the net effect.
Gene products scored: IL-2, IL-4, IL-6, IL-8, IL-10, IL-12, IL-13, IL-18, IFN-g, TGF-b, CTLA-4, GPR-83, SMAD-7. ND = No data.
Values per gene product after Step II (the assignment of each value to a phenotype column is not preserved in this transcript): IL-2 1.58, -1.58; IL-4 0.00; IL-6 -1.20, 1.20; IL-8 1.18; IL-13 1.51, -1.51; IL-18 0.91; TGF-b -1.71, 1.71; CTLA-4 -1.89, 1.89; GPR-83 -1.69, 1.69. No values survive for IL-10, IL-12, IFN-g and SMAD-7.
Net Effect: Th1 -1.29; Th2 -5.38; Treg 10.15; Inflammation -5.98.
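The three steps can be sketched as a small calculation. The scores and quantitative values below are illustrative stand-ins, not the numbers from the slide (whose column layout did not survive), and the function name `net_effect` is an assumption.

```python
PHENOTYPES = ["Th1", "Th2", "Treg", "Inflammation"]

# Step I: GO-based phenotype scores (1 = promotes, -1 = inhibits, absent = 0).
# Illustrative values only.
SCORES = {
    "IL-4": {"Th1": -1, "Th2": 1},
    "IFN-g": {"Th1": 1, "Th2": -1},
    "IL-10": {"Th1": -1, "Treg": 1},
    "IL-18": {"Th1": 1, "Inflammation": 1},
}

# Quantitative data per gene product (e.g. a fold change); None means ND (no data).
QUANTITIES = {"IL-4": 2.0, "IFN-g": 0.5, "IL-10": 1.5, "IL-18": None}

def net_effect(scores, quantities):
    """Steps II and III: weight each score by its quantity, then sum per phenotype."""
    totals = {phenotype: 0.0 for phenotype in PHENOTYPES}
    for gene, quantity in quantities.items():
        if quantity is None:
            continue  # ND: this gene product contributes nothing
        for phenotype in PHENOTYPES:
            totals[phenotype] += scores[gene].get(phenotype, 0) * quantity
    return totals

totals = net_effect(SCORES, QUANTITIES)
for phenotype in PHENOTYPES:
    print(f"{phenotype:13s} {totals[phenotype]:6.2f}")
```

A positive net effect suggests the measured gene products push toward that phenotype and a negative one suggests they push away from it; the result is only as good as the GO annotations behind the Step I scores.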

[Figure: microscopic lesions. Net effect by phenotype (Th-1, Th-2, T-reg, Inflammation) for L6 (R) and L7 (S); scale bar 5mm.]

[Figure: summary for L6 (Resistant) and L7 (Susceptible). Labels: Pro T-reg, Pro Th-1, Pro Th-2, Anti CTL, Pro CTL.]

Concluding thoughts on functional modeling. “By doing just a little every day, I can gradually let the task overwhelm me.” Ashleigh Brilliant

Bringing it all together… There is no one “correct” way; there is no “right” answer. Using multiple functional modeling strategies (e.g., GO, pathways, networks) can help with insights. Need to use biological knowledge to bring these different approaches together. Functional modeling is often iterative. Need to focus not only on what is known but what is new!

Overview of Functional Modeling Strategy. [Diagram: data sources: Microarrays, Proteomics, RNASeq. Stages: Protein/Gene identifiers; GO annotations; Genes/Proteins with no GO annotations; GO Enrichment analysis; Pathways and network analysis. Tools shown for identifiers and annotation: ArrayIDer, Genome2seq, GORetriever, GOanna, Blast2GO. Tools shown for GO analysis: DAVID, AgriGO, Onto-tools, GOSlimViewer, AutoSlim. Tools shown for pathways and network analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID.] Yellow boxes represent AgBase tools. Green boxes are non-AgBase resources.

Functional Modeling Considerations. Should I add my own GO? Use GOProfiler to see how much GO is available for your species; use GORetriever to find existing GO for your dataset. Does the analysis tool allow me to add my own GO? Should I do GO analysis and pathway analysis and network analysis? Different functional modeling methods show different aspects of your data (complementary). Is this type of data available for your species (or a close ortholog)? What tools should I use? Which tools have data for your species of interest? What types of accessions are accepted? Availability (commercial and freely available).

Some Limitations. Annotation is not complete: not all the data is annotated, and some gene products have no functional information. Gene Ontology is only one aspect of functional modeling; others include anatomy, tissue expression, phenotype, disease, etc. Gene nomenclature – need to know what we are annotating! Functional modeling tools need to handle larger data sets (& multiple ontologies?).