Benefits of a GO integrated analysis environment.



Problem 1: Selecting and effectively using tools
- Enrichment analysis is a ‘killer app’ for GO
  - Should be more central to what we do
  - Also other tools, e.g. function prediction
- Problem:
  - Multiple tools with different characteristics
    - Statistical method
    - Environment / customizability
    - Visualization
  - Can we better help users:
    - Select the right tool(s) for the job
    - Run their analysis
    - Build scalable workflows that allow replication
http://geneontology.org

Problem 2: Evaluating improvements in GO
- How do we know how well we’re doing?
  - Progress in GO is difficult to quantify
  - The same enrichment analysis may give completely different results from one year to the next
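To make the year-to-year instability concrete, here is a minimal sketch of a one-sided hypergeometric enrichment p-value. It is one common statistical method, not necessarily the one any particular tool uses. The function name `hypergeom_sf` and all numbers are invented for illustration; they are not real GO data.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes in a study set
    of n genes drawn from a population of N genes, K of which carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Illustrative numbers only: the same study set (200 genes, 5 hits) scored
# against two hypothetical annotation releases that differ in how many
# genes in the population carry the term.
p_old = hypergeom_sf(k=5, N=20000, K=50, n=200)   # term annotated to 50 genes
p_new = hypergeom_sf(k=5, N=20000, K=400, n=200)  # term annotated to 400 genes

print(f"older release: p = {p_old:.2e}")
print(f"newer release: p = {p_new:.2e}")
```

With the hit count held fixed, `p_old` comes out far smaller than `p_new`, so a term that looks strongly enriched under one release can drop out of the significant list under another.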

Solution: GO Tools Environment
- Tools:
  - Selecting the right tool
    - Solution: detailed, accurate, up-to-date metadata on each tool
  - Galaxy: a standard platform for running analyses
    - ‘operating system’ for bioinformatics analyses
    - allows plug and play
  - Combining tools
    - Common community interchange standards for GO analysis tools
    - Common term enrichment result format plus converters
- Evaluation of GO:
  - Use Term Enrichment Analyses (TEAs) and standard gene sets to evaluate GO

Tool metadata: background
- We have ~130 GO tools registered
  - ~50 TEA tools
  - We don’t have all of them
  - Some info out of date
- We need to capture more metadata
  - We want to be able to quickly answer queries like:
    Find an EA tool that
    - uses hypergeometric tests
    - can be used for
    - has not updated their annotation sets in > 6 mo
    - has visualization
    - I can use for my RNAseq data
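A query like the one above becomes a simple filter once each tool has structured metadata. The sketch below assumes a hypothetical record layout (`ToolRecord` and its field names) and placeholder tool names. It does not reflect the actual registry schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ToolRecord:
    # Hypothetical metadata fields, chosen to mirror the example query.
    name: str
    tests: frozenset            # statistical tests the tool offers
    has_visualization: bool
    annotations_updated: date   # last refresh of the tool's annotation sets

def find_tools(registry, test=None, visualization=None, stale_days=None, today=None):
    """Return names of tools matching every criterion that is not None."""
    today = today or date.today()
    hits = []
    for tool in registry:
        if test is not None and test not in tool.tests:
            continue
        if visualization is not None and tool.has_visualization != visualization:
            continue
        # stale_days keeps only tools whose annotations are OLDER than the cutoff
        if stale_days is not None and (today - tool.annotations_updated).days <= stale_days:
            continue
        hits.append(tool.name)
    return hits

# Placeholder entries, not real tools.
REGISTRY = [
    ToolRecord("tool-a", frozenset({"hypergeometric"}), True, date(2012, 1, 15)),
    ToolRecord("tool-b", frozenset({"hypergeometric"}), False, date(2013, 5, 1)),
    ToolRecord("tool-c", frozenset({"fisher"}), True, date(2012, 3, 1)),
]

print(find_tools(REGISTRY, test="hypergeometric", visualization=True,
                 stale_days=182, today=date(2013, 6, 1)))
```

Each criterion maps to one metadata field, which is why capturing more fields directly widens the set of questions the registry can answer.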

Tool metadata: progress/tasks
- Progress:
  - Infrastructure migration: migrated from custom tool metadata format and web infrastructure to using the curated NIF registry
    - Built on Semantic MediaWiki
- Ongoing tasks:
  1. Add more fields (curated and automated)
     - Recommendations?
  2. Integrate with go-help
  3. Improve integration with main GO website
     - seamlessness
     - allow intuitive filtering of tools
- See Seth’s presentation

New Tools Registry

Example

Standard Term Enrichment Analysis Platform: background
- Tools run in their own environment
  - Difficult to
    - Compare
    - Integrate into larger workflows
    - Provide uniform interface
- Solution:
  - Standard workflow environment
- Variety of workflow systems
  - Kepler
  - Galaxy
  - Taverna
- Galaxy has a number of advantages
  - Simple to set up and extend
  - Heavily used for next-gen analyses
  - Tools for InterMine etc.

Standard Term Enrichment Analysis Platform: progress/tasks
- Progress
  - Proof-of-concept prototype of GO environment
- Ongoing tasks
  1. Stabilize environment
  2. Add more enrichment tools
  3. Seamless integration with GO website
  4. Outreach
     - User documentation
     - Bio-curators 2013 presentation
- ONTO-ToolKit: enabling bio-ontology engineering via Galaxy. E Antezana, A Venkatesan, C Mungall, V Mironov, M Kuiper. BMC Bioinformatics 11 (Suppl 12), S8, 2010.

Screenshot

Interchange Standards: Outline
- We need a Term Enrichment Result Format (TERF)
  - Standard output format for TE tools
    - Compliant tools will export TERF
    - We will write wrappers for non-compliant tools
  - Allows capture of detailed metadata about each result set
- Example of use:
  1. User runs gene set through tool X
  2. User takes output of tool X and feeds it into visualizer Y
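The tool X to visualizer Y hand-off can be sketched as a round trip through a tab-separated file. The slides do not give the TERF column layout, so the columns in `TERF_COLUMNS`, the `# key: value` metadata header, and the function names below are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column layout; the real TERF specification may differ.
TERF_COLUMNS = ["term_id", "term_label", "p_value", "study_count", "population_count"]

def to_terf(rows, metadata):
    """Wrapper side: serialize a tool's results plus result-set metadata as TSV text."""
    buf = io.StringIO()
    for key, value in metadata.items():
        buf.write(f"# {key}: {value}\n")
    writer = csv.DictWriter(buf, fieldnames=TERF_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def read_terf(text):
    """Visualizer side: recover the metadata dict and the result rows."""
    metadata, body = {}, []
    for line in text.splitlines():
        if line.startswith("# "):
            key, _, value = line[2:].partition(": ")
            metadata[key] = value
        elif line:
            body.append(line)
    return metadata, list(csv.DictReader(body, delimiter="\t"))

# Output of a hypothetical "tool-x"; counts and p-value are made up.
native_rows = [{"term_id": "GO:0006955", "term_label": "immune response",
                "p_value": 0.0001, "study_count": 12, "population_count": 300}]
terf_text = to_terf(native_rows, {"tool": "tool-x", "go_version": "2013-01-01"})
meta, rows = read_terf(terf_text)
print(meta["tool"], rows[0]["term_id"], rows[0]["p_value"])
```

Because the reader depends only on the shared layout, any visualizer written against it works with every compliant or wrapped tool.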

Interchange Standards: progress/tools
- Progress
  - Google Code project created
  - Preliminary format specified
    - TSV form and RDF/Turtle form
  - Some converters written
    - ermine/J, ontologizer
- Ongoing tasks:
  1. Complete specification
     - public working draft for comments
     - incorporate comments
     - final specification
  2. Outreach
     - work with tool developers
  3. Write additional converters
     - target command-line tools that provide diverse capabilities

GO evaluation: background
- How do changes in the GO affect users’ queries and analyses?
  - We would hope that concerted ontology and annotation improvements would yield more complete and informative results
  - But we don’t know!

GO evaluation: background
- How do changes in the GO affect queries and analyses?
  - We don’t know!
  - We would hope that concerted ontology and annotation improvements would yield more informative results
  - TE results are frequently reported in papers but rarely replicated
- If we had a repository of gene sets we could systematically re-analyze these with different variables
  - version of GO, version of annotations
  - different tools
- If these gene sets were annotated with ‘expected’ results we could systematically determine if GO was delivering expected answers
  - danger of overfitting, but still useful
Evaluation slide + analysis by: Erik Clarke (Su Group)

GO evaluation: plan
- Collect gene sets
  - Others can help us here
- Select subset of tools
- Analyze gene sets with this tool set and previous versions of GO + annotations
  - Cache results in TERF
- Every month/quarter re-analyze gene sets
  - Can be done in Jenkins or Galaxy
- Investigate anomalies
- Explore as metric for GO
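One way the "investigate anomalies" step could be automated is to compare the significant terms from consecutive re-analyses of the same gene set and flag pairs with low overlap. The Jaccard measure, the 0.5 threshold, the 0.05 cutoff, and every name below are assumptions for illustration, not part of the plan itself.

```python
def significant_terms(result, alpha=0.05):
    """Terms whose p-value falls below the cutoff."""
    return {term for term, p in result.items() if p < alpha}

def jaccard(a, b):
    """Overlap of two term sets; 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def flag_anomalies(runs, threshold=0.5):
    """runs: list of (release_label, {term_id: p_value}) in chronological order.
    Returns (previous, current, score) for each consecutive pair below threshold."""
    flagged = []
    for (prev_label, prev), (cur_label, cur) in zip(runs, runs[1:]):
        score = jaccard(significant_terms(prev), significant_terms(cur))
        if score < threshold:
            flagged.append((prev_label, cur_label, score))
    return flagged

# Made-up cached results for one gene set across three hypothetical releases.
runs = [
    ("r1", {"T1": 0.01, "T2": 0.02, "T3": 0.20}),
    ("r2", {"T1": 0.01, "T2": 0.03, "T3": 0.04}),
    ("r3", {"T4": 0.001, "T3": 0.04}),
]
print(flag_anomalies(runs))
```

A scheduled job (in Jenkins or Galaxy, as the slide suggests) could run this over every cached gene set and report only the flagged pairs for manual review.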

Conclusions
- An integrated analysis environment can help
  - Users
  - The GOC
- More work to be done
  - Need to dedicate resources

Example user scenario
- User (optionally) specifies their problem, e.g.
  - species
  - specific hypothesis, e.g. immune system
  - identifier type (e.g. Ensembl IDs)
- User is given a list of appropriate tools
  - Either:
    - link to external web interface
    - link to download, install and run tool
    - ability to run tool(s) in GO analysis environment
- User chooses to run ontologizer in GO environment
  - Ensembl IDs are mapped using external ID mapper
  - GO immune subset is created on-the-fly
  - annotations filtered
  - subset, filtered annotations, mapped IDs fed to ontologizer
  - ontologizer results translated to common format
  - results can be plugged into choice of visualization tool
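The preparation steps in the last part of the scenario (map IDs, build the subset, filter annotations) can be sketched with in-memory stand-ins. Everything here is hypothetical: the toy ontology, the identifiers, and the function names. A real environment would call an external ID mapper and work on the actual GO graph.

```python
def descendants(children, root):
    """All terms reachable from root through child links, including root itself."""
    seen, stack = set(), [root]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(children.get(term, ()))
    return seen

def prepare_inputs(gene_ids, id_map, children, root, annotations):
    """Map identifiers, build the on-the-fly subset, and filter annotations to it."""
    mapped = [id_map[g] for g in gene_ids if g in id_map]      # unmapped IDs are dropped
    subset = descendants(children, root)
    filtered = {gene: terms & subset
                for gene, terms in annotations.items() if terms & subset}
    return mapped, subset, filtered

# Toy stand-ins: a tiny "immune" branch and two annotated genes.
children = {"immune": ["innate", "adaptive"], "innate": ["tlr"]}
id_map = {"ENSG_A": "geneA", "ENSG_B": "geneB"}
annotations = {"geneA": {"tlr", "metabolism"}, "geneB": {"metabolism"}}

mapped, subset, filtered = prepare_inputs(
    ["ENSG_A", "ENSG_B", "ENSG_X"], id_map, children, "immune", annotations)
print(mapped, sorted(subset), filtered)
```

The three return values correspond to the inputs the scenario feeds to the enrichment tool; the tool's output would then be translated to the common result format.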