1
Strategies & Examples for Functional Modeling
COST Functional Modeling Workshop 22-24 April, Helsinki
2
Types of data sets and modeling
Commercial array data – more likely to have tools that support the use of array IDs.
Custom/USDA array data – problems with updating IDs, linking to function and using array IDs directly in functional modeling tools.
Proteomics data – larger data sets; need to make background references to determine enrichment.
RNA-Seq data – larger and more complex data sets; novel transcripts currently can’t be included in modeling (contact AgBase to assign GO).
Real-time data or quantitative proteomics data – hypothesis testing.
3
Functional Modeling Strategies
GO summary (using Slim sets)
GO enrichment (statistical!)
Pathways analysis
Interaction or networks analysis
Hypothesis testing
Note: Functional modeling should be integrated. Approaches are complementary, not exclusive. Modeling is driven by the biology (not the other way round).
4
Modeling Strategy
Think about using multiple functional approaches.
  GO, pathways and networks are complementary
What is available for your species?
  What GO is available?
  What species does the pathways/network analysis use?
What resources do you have?
  at your institute (e.g. commercial pathways analysis)
  open source (e.g. GO Enrichment analysis)
  using online vs installed
Iterative – further functional modeling based on initial results
  GO hypothesis testing?
5
1. GO Functional Summary
High throughput data sets give us 1000s–10,000s of gene products
  can’t know everything about all gene products
  tendency to ‘cherry pick’ ones you recognize
Instead, can group gene products by function
  this gives us a manageable number of categories to process
  enables us to see trends, patterns, etc.
Use GO Slim sets to ‘summarize’ data
  Lose details (but can gain perspective).
  Some GO Slim sets are ageing – not being updated as changes to the GO are made.
  Different Slim sets have different terms – which is best for your data?
AgBase GOSlimViewer tool.
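The grouping idea above can be sketched in a few lines of Python. This is a minimal illustration only, not how GOSlimViewer or any other tool is implemented: the toy ontology in `PARENTS`, the slim set in `SLIM`, the term names and the gene names are all hypothetical, and a real analysis would read terms and relationships from the ontology and annotation files.

```python
from collections import Counter

# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS = {
    "T cell activation": ["immune response"],
    "B cell activation": ["immune response"],
    "immune response": ["biological process"],
    "caspase activation": ["apoptosis"],
    "apoptosis": ["biological process"],
    "biological process": [],
}

# Hypothetical slim set: the broad categories to report.
SLIM = {"immune response", "apoptosis"}


def slim_terms(term, parents=PARENTS, slim=SLIM):
    """Return the slim categories reached by walking up from `term`."""
    found, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
            continue  # stop at the first slim term on this path
        stack.extend(parents.get(current, []))
    return found


def slim_summary(annotations):
    """Count gene products per slim category; unmapped ones go to 'other'."""
    counts = Counter()
    for gene, terms in annotations.items():
        categories = set()
        for term in terms:
            categories |= slim_terms(term)
        for category in categories or {"other"}:
            counts[category] += 1
    return counts


# Hypothetical annotations for four gene products.
annotations = {
    "geneA": ["T cell activation"],
    "geneB": ["B cell activation", "caspase activation"],
    "geneC": ["caspase activation"],
    "geneD": ["biological process"],
}
summary = slim_summary(annotations)
for category, count in sorted(summary.items()):
    print(category, count)
```

A gene product annotated to several detailed terms is counted once per slim category it reaches, so the category counts can add up to more than the number of gene products. Swapping in a different `SLIM` set changes the summary, which is why the Slim set used should be reported.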
6
The Slim set you use matters - need to determine which one to use & report it in Methods.
9
Functional Summary
Not all GO terms are annotated equally, e.g., metabolism!
  can slim the complete GO for a species as a background set and then determine which terms in your data are disproportionately represented.
Can use Slims to compare two data sets (e.g., control vs treatment).
Use Slims for your own sanity – are you seeing what you expect to see?
10
Membrane proteins grouped by GO BP:
[Figure: membrane proteins from B-cells and stroma grouped by GO biological process. Categories: cell cycle/cell proliferation, cell adhesion, cell growth, apoptosis, immune response, ion/proton transport, cell migration, cell-cell signaling, function unknown, development, endocytosis, proteolysis and peptidolysis, signal transduction, protein modification.]
11
Membrane proteins grouped by GO BP:
[Figure: membrane proteins from B-cells and stroma grouped by GO biological process. Categories: cell cycle/cell proliferation, apoptosis, immune response, cell migration, cell-cell signaling, function unknown.]
12
BVDV Infection – cytopathic (CP) vs non-cytopathic (NCP) infection
(comparing function between 2 different conditions)
14
2. Determining over-represented or under-represented function.
most typically used functional analysis method
many, many tools that do this – see:
very different visualization
will use some of these tools in practical session
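As a rough illustration of what such tools compute, the sketch below tests a single GO term for over- and under-representation with the hypergeometric distribution (the one-sided form of Fisher's exact test). It is a minimal sketch under stated assumptions, not the method of any particular tool: the function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, annotated, sample):
    """P(exactly k annotated genes in a sample drawn without replacement)."""
    return (
        comb(annotated, k)
        * comb(population - annotated, sample - k)
        / comb(population, sample)
    )


def enrichment_p_values(k, population, annotated, sample):
    """One-sided p-values for over- and under-representation of one GO term.

    k          genes in the study list annotated to the term
    population genes in the background set
    annotated  background genes annotated to the term
    sample     genes in the study list
    """
    lowest = max(0, sample - (population - annotated))
    highest = min(annotated, sample)
    p_over = sum(
        hypergeom_pmf(i, population, annotated, sample)
        for i in range(k, highest + 1)
    )
    p_under = sum(
        hypergeom_pmf(i, population, annotated, sample)
        for i in range(lowest, k + 1)
    )
    return p_over, p_under


# Hypothetical counts: 1000 background genes, 50 annotated to the term,
# 100 genes in the study list, 15 of which carry the term.
p_over, p_under = enrichment_p_values(15, 1000, 50, 100)
print(f"over-representation p = {p_over:.3g}")
print(f"under-representation p = {p_under:.3g}")
```

The result depends directly on the background set, so the choice of background matters as much as the choice of tool. Because many GO terms are tested at once, the raw p-values also need a multiple-testing correction before they are interpreted.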
16
Some useful expression analysis tools:
Database for Annotation, Visualization and Integrated Discovery (DAVID)
AgriGO -- GO Analysis Toolkit and Database for Agricultural Community
  used to be EasyGO
  chicken, cow, pig, mouse, cereals, dicots
  adding new species by request
Onto-Express
  can provide your own gene association file
Ontologizer
  WebStart widget (requires Java); now on Galaxy
  requires OBO file & GAF (enables users to select their own annotations)
17
GO Enrichment tools that support agricultural species.
20
structurally and functionally re-annotated a microarray
quantified the impact of this re-annotation based on GO annotations & pathways represented on the array
tested using a previously published experiment that used this microarray
re-annotation allows more comprehensive GO based modeling and improves pathway coverage
re-annotation resulted in a different model from previously published research findings
22
Evaluating GO tools
Some criteria for evaluating GO Tools:
Does it include my species of interest (or do I have to “humanize” my list)?
What does it require to set up (computer usage/online)?
What was the source for the GO (primary or secondary) and when was it last updated?
Does it report the GO evidence codes (and is IEA included)?
Does it report which of my gene products has no GO?
Does it report both over/under represented GO groups and how does it evaluate this?
Does it allow me to add my own GO annotations?
Does it represent my results in a way that facilitates discovery?
23
RNASeq GO Enrichment
RNASeq experiments: longer transcripts and more highly expressed transcripts are more likely to be detected as differentially expressed.
Current GO enrichment tools do not account for RNASeq platform bias (most are based upon arrays).
  they assume that all genes are independent and equally likely to be selected as DE
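One way to see the effect of this bias is to compare an empirical enrichment p-value computed with uniform gene weights against one computed with length-based weights. The sketch below is a simplified, hypothetical resampling scheme and not the algorithm of any specific tool: the gene names, the lengths, and the choice of weights proportional to length are all assumptions made for illustration.

```python
import random


def weighted_sample(genes, weights, k, rng):
    """Draw k distinct genes, favouring genes with larger weights.

    Uses the Efraimidis-Spirakis key trick: each gene gets the key
    u ** (1 / weight) and the k largest keys are kept.
    """
    keyed = [(rng.random() ** (1.0 / weights[g]), g) for g in genes]
    keyed.sort(reverse=True)
    return [g for _, g in keyed[:k]]


def resampling_p_value(category, de_genes, weights, n_resamples=2000, seed=0):
    """Empirical p-value for over-representation of `category` among DE genes."""
    rng = random.Random(seed)
    genes = sorted(weights)
    observed = len(category & set(de_genes))
    at_least = 0
    for _ in range(n_resamples):
        draw = weighted_sample(genes, weights, len(de_genes), rng)
        if len(category & set(draw)) >= observed:
            at_least += 1
    return (at_least + 1) / (n_resamples + 1)


# Hypothetical data: 100 genes, the first 10 are ten times longer.
genes = [f"g{i:03d}" for i in range(100)]
length = {g: (10000 if i < 10 else 1000) for i, g in enumerate(genes)}
category = set(genes[:10])             # a GO term made up of the long genes
de_genes = genes[:5] + genes[50:55]    # 5 long and 5 short genes called DE

uniform = {g: 1.0 for g in genes}                # array-style assumption
by_length = {g: length[g] / 1000 for g in genes}  # assumed length bias

p_uniform = resampling_p_value(category, de_genes, uniform)
p_length = resampling_p_value(category, de_genes, by_length)
print(f"uniform weights: p = {p_uniform:.4f}")
print(f"length weights:  p = {p_length:.4f}")
```

With uniform weights the term looks strongly enriched; once long genes are allowed to be selected more often, the same overlap is much less surprising. In practice the weights would have to be estimated from the data rather than assumed.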
24
3. Pathway Analysis
Freely available tools:
  from public databases, e.g. KEGG & Reactome
  freely available tools, e.g. Cytoscape
Commercial pathways analysis tools:
  e.g., Ingenuity Pathways Analysis (IPA), Pathway Studio, etc.
Some tools only have limited species – need to “humanize” animal data, etc.
  for plants with Arabidopsis
  everything gives you cancer
Many pathways analysis tools combine pathways analysis and network analysis.
25
Reactome Skypainter
26
KEGG Pathways
27
Analysis tools (commercial)
Ingenuity Pathway Analysis: networks; pathways; functions and diseases
Pathway Studio: Gene Ontology (GO) groups; GSEA; pathways
IPA analysis included as IPA.txt
28
Data Curation
Ingenuity: manually curated database by Ph.D.-level scientists (mining 32 different peer-reviewed journals).
Pathway Studio: automated curation by MedScan Reader using natural language processing (NLP) technology.
  mining PubMed abstracts and peer-reviewed journals
  users can do their own text mining
29
(Comparison by Divya Peddinti)
Comparison Criteria
  Features
  Proportion of proteins involved in modeling
  Data generation
  Display
Test Dataset: 3,600 bovine spermatozoa proteins
30
Feature comparison: Ingenuity Pathway Analysis (IPA) vs Pathway Studio
Input
  GI number, Microarray ID, Affymetrix ID, GenBank, Swiss-Prot accession, UniGene ID, name or alias, HUGO ID, Entrez gene, name or alias, HUGO ID
Databases
  IPA: contains biological interactions data for human, mouse, rat. Orthologous mapping available for dog, cow, chimp, chicken, rhesus macaque monkey, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio.
  Pathway Studio: contains biological data for human, mouse, rat, bacteria, chicken, zebrafish, frog, cow, bee, dog, Arabidopsis, Drosophila, yeast, and transplantation research etc.
31
Builds networks with a maximum of 35 genes/proteins
Statistical test
  Ingenuity Pathway Analysis (IPA): the significance value (p value) assigned to the function/pathways using Fisher’s exact test.
  Pathway Studio: the statistical significance of the overlap between the protein list and a GO group or pathway using Fisher’s exact test.
Updates: Quarterly
Networks
  Ingenuity Pathway Analysis (IPA): builds networks with a maximum of 35 genes/proteins.
  Pathway Studio: -
32
Proteins involved in modeling
33
Data generation
[Figure: values shown are 37, 7 and 26.]
34
Pathway display
[Figure: EGF signaling pathway.]
35
4. Network Analysis
IPA & Pathway Studio are equally efficient at drawing networks of relationships.
  IPA: simplifies the pathway display and creates a more manageable, user-friendly network for users to analyze.
  Pathway Studio: shows the relations in a table format.
STRING Database – known and predicted protein interactions.
38
5. Hypothesis Testing
high throughput data sets – ‘fishing expedition’ or hypothesis generation
but GO also serves as a repository of biological function – can be used for hypothesis testing based on these data sets
39
The critical time point in MD lymphomagenesis
Hypothesis: At the critical time point of 21 dpi, MD-resistant genotypes have a T-helper (Th)-1 microenvironment (consistent with CTL activity), but MD-susceptible genotypes have a T-reg or Th-2 microenvironment (antagonistic to CTL).
[Figure: mean total lesion score against days post infection by genotype, Susceptible (L72) and Resistant (L61); non-MHC associated resistance and susceptibility.]
40
CYTOKINES AND T HELPER CELL DIFFERENTIATION
[Diagram: APC, naive CD4+ T cell, Th-1, Th-2 and T-reg cells. Shyamesh Kumar]
41
Th-1, Th-2, T-reg? Inflammatory?
[Diagram: APC, naive CD4+ T cell, Th-1, Th-2, T-reg, CTL, macrophage and NK cell, with IL 4, IL 10, IL 12, IL 18, IFN γ, TGFβ and Smad 7; samples labelled L6 Whole, L7 Whole and L7 Micro.]
42
Step I. GO-based Phenotype Scoring.
  Each gene product is scored 1 or -1 against each phenotype (Th1, Th2, Treg, Inflammation); ND = No data.
  Gene products: IL-2, IL-4, IL-6, IL-8, IL-10, IL-12, IL-13, IL-18, IFN-g, TGF-b, CTLA-4, GPR-83, SMAD-7.
Step II. Multiply by quantitative data for each gene product.
Step III. Inclusion of quantitative data to the phenotype scoring table and calculation of net effect.
  Values shown: IL-2 1.58, -1.58; IL-4 0.00; IL-6 -1.20, 1.20; IL-8 1.18; IL-13 1.51, -1.51; IL-18 0.91; TGF-b -1.71, 1.71; CTLA-4 -1.89, 1.89; GPR-83 -1.69, 1.69.
  Net Effect: Th1 -1.29, Th2 -5.38, Treg 10.15, Inflammation -5.98.
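The three scoring steps can be sketched as a small calculation. This is an illustrative sketch only: the gene names, scores and quantitative values below are hypothetical rather than taken from the slide, and reading 1 as "promotes" and -1 as "opposes" a phenotype is an assumption.

```python
PHENOTYPES = ["Th1", "Th2", "Treg", "Inflammation"]

# Step I: hypothetical GO-based scores per gene product and phenotype
# (assumed meaning: 1 = promotes, -1 = opposes, 0 = no annotated effect).
scores = {
    "geneA": {"Th1": 1, "Th2": -1, "Treg": 0, "Inflammation": 0},
    "geneB": {"Th1": -1, "Th2": 0, "Treg": 1, "Inflammation": 0},
    "geneC": {"Th1": 0, "Th2": 0, "Treg": 0, "Inflammation": 1},
}

# Hypothetical quantitative data; None marks "no data" (ND).
expression = {"geneA": 2.0, "geneB": 1.5, "geneC": None}


def net_effect(scores, expression, phenotypes=PHENOTYPES):
    """Steps II and III: weight each score by the quantitative value, then sum."""
    totals = {p: 0.0 for p in phenotypes}
    for gene, row in scores.items():
        value = expression.get(gene)
        if value is None:
            continue  # gene products without data do not contribute
        for p in phenotypes:
            totals[p] += row.get(p, 0) * value
    return totals


result = net_effect(scores, expression)
for phenotype in PHENOTYPES:
    print(phenotype, result[phenotype])
```

A positive net effect suggests the measured gene products favour that phenotype overall and a negative one suggests they oppose it. Gene products with no data are skipped rather than treated as zero change, which is one of several reasonable choices.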
43
Microscopic lesions
[Figure: net effect by phenotype (Th-1, Th-2, T-reg, Inflammation) for L6 (R) and L7 (S); scale bar 5 mm.]
44
[Diagram: L6 (Resistant) and L7 (Susceptible); labels: Pro T-reg, Pro Th-1, Pro Th-2, Anti CTL, Pro CTL.]
46
Concluding thoughts on functional modeling.
“By doing just a little every day, I can gradually let the task overwhelm me.” Ashleigh Brilliant
47
Bringing it all together…
There is no one “correct” way; there is no “right” answer.
Using multiple functional modeling strategies (e.g., GO, pathways, networks) can help with insights.
Need to use biological knowledge to bring these different approaches together.
Functional modeling is often iterative.
Need to focus not only on what is known but what is new!
48
Overview of Functional Modeling Strategy
[Flowchart]
Data sources: Microarrays, Proteomics, RNASeq
Stages: Protein/Gene identifiers; GO annotations; Genes/Proteins with no GO annotations; GO Enrichment analysis; Pathways and network analysis
Tools shown: ArrayIDer, GORetriever, GOanna, Blast2GO, Genome2seq, GOSlimViewer, AutoSlim, DAVID, AgriGO, Onto-tools, Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape
Yellow boxes represent AgBase tools. Green boxes are non-AgBase resources.
49
Functional Modeling Considerations
Should I add my own GO?
  use GOProfiler to see how much GO is available for your species
  use GORetriever to find existing GO for your dataset
  Does analysis tool allow me to add my own GO?
Should I do GO analysis and pathway analysis and network analysis?
  different functional modeling methods show different aspects about your data (complementary)
  is this type of data available for your species (or a close ortholog)?
What tools should I use?
  which tools have data for your species of interest?
  what type of accessions are accepted?
  availability (commercial and freely available)
50
Some Limitations
Annotation is not complete.
  not all the data is annotated
  some gene products have no functional information
Gene Ontology is only one aspect of functional modeling.
  anatomy, tissue expression, phenotype, disease, etc.
Gene nomenclature – need to know what we are annotating!
Functional modeling tools need to handle larger data sets (& multiple ontologies?).