BridgeDb
Martijn van Iersel
BiGCaT Maastricht

The 7 Virtues of Bioinformatics
1. Solve a problem
2. Start small
3. Modularity
4. Design for code re-use
5. Open source
6. Attention to detail
7. Eat your own dog food

Solve a problem
What problem are you solving?

Problem: Identifier mapping
Agilent reporter A46_P45789 → ? → Entrez Gene 3643

Solution: Conversion tools

Problem: Usability
- Check for duplicate IDs
- Check for missing IDs
- Only 1000 IDs at a time
- Check alignment of Excel columns
- Manual
- Error-prone
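The checks on this slide are tedious by hand but easy to automate inside a platform. A minimal sketch, assuming the identifiers arrive as a list of strings and that the conversion service accepts at most 1000 IDs per request (the limit quoted on the slide); `prepare_ids` is a hypothetical helper, not a BridgeDb function.

```python
from typing import Iterable, List, Optional, Tuple


def prepare_ids(
    raw_ids: Iterable[Optional[str]], batch_size: int = 1000
) -> Tuple[List[List[str]], int, int]:
    """Clean a column of identifiers and split it into request-sized batches.

    Returns (batches, number_of_missing_ids, number_of_duplicate_ids).
    """
    seen = set()
    unique: List[str] = []
    n_missing = 0
    n_duplicates = 0
    for raw in raw_ids:
        ident = raw.strip() if raw is not None else ""
        if not ident:
            n_missing += 1  # empty cell: nothing to map
            continue
        if ident in seen:
            n_duplicates += 1  # keep only the first occurrence
            continue
        seen.add(ident)
        unique.append(ident)
    # Chunk the cleaned list so no request exceeds the (assumed) service limit.
    batches = [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]
    return batches, n_missing, n_duplicates
```

For example, `prepare_ids(["3643", " 3643", "", "1017"])` yields one batch `["3643", "1017"]`, reports one missing and one duplicate ID, and never asks the user to count rows.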

Solution: Built-in mapping
Generic bioinformatics platforms, such as Bioconductor, PathVisio and Cytoscape, should have identifier mapping built in: "batteries included".

Solution: Built-in mapping
Agilent reporter A46_P45789 → mapping service → Entrez Gene 3643

Problem: Which mapping service?
Synergizer, EnsMart, DAVID, CRONOS, AliasServer, MatchMiner, OntoTranslate

Solution: Abstraction Layer

interface IDMapper, implemented by:
- class IDMapperRdb (relational database)
- class IDMapperFile (tab-delimited text)
- class IDMapperBiomart (web service)
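The point of the abstraction layer is that callers program against one interface while each class hides a different back end. A minimal Python sketch of the idea, loosely modelled on the `IDMapper.mapID(...)` call named in the API overview; the snake_case names, the `(identifier, data source)` tuple and the file layout (header row of data-source names, one entity per row) are assumptions for illustration, not BridgeDb's Java API or its file format.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set, Tuple

# (identifier, data source name), e.g. ("3643", "Entrez Gene")
Xref = Tuple[str, str]


class IDMapper(ABC):
    """Common interface: callers do not need to know where mappings come from."""

    @abstractmethod
    def map_id(self, ref: Xref, target_source: str) -> Set[Xref]:
        """Return all cross-references of `ref` that belong to `target_source`."""


class IDMapperFile(IDMapper):
    """Mapper backed by tab-delimited text.

    The header row names the data sources; each further row lists
    identifiers that refer to the same entity.
    """

    def __init__(self, lines: Iterable[str]):
        reader = csv.reader(lines, delimiter="\t")
        header = next(reader)
        self._rows: List[Dict[str, str]] = [dict(zip(header, row)) for row in reader]

    def map_id(self, ref: Xref, target_source: str) -> Set[Xref]:
        ident, source = ref
        return {
            (row[target_source], target_source)
            for row in self._rows
            if row.get(source) == ident and row.get(target_source)
        }


if __name__ == "__main__":
    # Toy table that pairs the two example IDs from the slides, for illustration only.
    mapper = IDMapperFile(io.StringIO("Agilent\tEntrez Gene\nA46_P45789\t3643\n"))
    print(mapper.map_id(("A46_P45789", "Agilent"), "Entrez Gene"))
```

A database- or web-service-backed class would subclass `IDMapper` the same way and implement `map_id` with a query or an HTTP call; code that only sees `IDMapper` does not change.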

Diagram (BMC Bioinformatics Jan 4;11(1):5):
- Tools: WikiPathways, PathVisio, Cytoscape plugins (CyThesaurus, Network Merge)
- BridgeDb
- Mapping services: Internet web services (BioMart, BridgeDb-REST, PICR), local database, tab-delimited text files

BridgeDb interfaces
1. Java interface
2. REST interface

API overview
- BridgeDb.connect(...)
- IDMapper.mapID(...)
- Xref.getUrl()
- DataSource.getUrl()
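`BridgeDb.connect(...)` is the single entry point that hands back a mapper, so application code stays the same whichever mapping service sits behind it. The sketch below shows one way such a factory could work; the `prefix:location` string, the driver registry and the `idmapper-text` prefix are assumptions made for this example, not BridgeDb's actual connection-string syntax.

```python
from typing import Callable, Dict

# Hypothetical driver registry: connection-string prefix -> factory for a mapper.
_DRIVERS: Dict[str, Callable[[str], object]] = {}


def register_driver(prefix: str, factory: Callable[[str], object]) -> None:
    """Make `connect` aware of a new kind of mapping service."""
    _DRIVERS[prefix] = factory


def connect(connection_string: str) -> object:
    """Split 'prefix:location' at the first ':' and let the matching driver build the mapper."""
    prefix, sep, location = connection_string.partition(":")
    if not sep or prefix not in _DRIVERS:
        raise ValueError(f"no driver registered for {connection_string!r}")
    return _DRIVERS[prefix](location)


class TextFileMapper:
    """Placeholder mapper; a real one would load the mapping file at `location`."""

    def __init__(self, location: str):
        self.location = location


register_driver("idmapper-text", TextFileMapper)
```

Splitting only at the first `:` keeps locations such as `http://...` intact, so a web-service driver could receive a full URL.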

Easy & Flexible Code

BridgeDb interfaces
1. Java interface
2. REST interface

REST API (diagram of linked identifier types: ILMN_ Illumina, Affy, NP_ RefSeq, IPI, GO: GeneOntology, NM_033282 RefSeq, 94233 Entrez Gene, ENSG Ensembl Human, _at Affy, A6NEB4 Uniprot/TrEMBL, OMIM, A_23_P24234 Agilent, 14449 HUGO)

REST API / / [ /... ]\

R Example

Types of mapping services and their advantages
- Webservice: always up to date; no disk space required; no installation required
- Relational database: highly efficient; versioned, so updated only when you want
- Flat file: easy to customize

Available mapping services (name: type, maintainer)
- Gene databases (Ensembl-based): database, us
- Metabolite databases (HMDB-based): database, us
- BridgeWebservice: webservice, us
- BioMart: webservice, EBI
- CRONOS: webservice, Helmholtz Zentrum
- Synergizer: webservice, Harvard Medical School
- PICR: webservice, EBI

Problem: Custom microarrays
Custom probe #QXZCY!34 → ?

Solution: Stacking
EnsMart + custom table

Diagram: Custom microarray, Ensembl, Entrez
- Custom microarray – Ensembl: relation defined by mapping source A
- Ensembl – Entrez: relation defined by mapping source B
- Custom microarray – Entrez: inferred, transitive relationship
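Stacking works by following links across mapping sources until the wanted data source is reached: one source links the custom probe to Ensembl, the other links Ensembl to Entrez, and the probe-to-Entrez link is inferred. A minimal sketch of that transitive lookup; `DictMapper`, `StackedMapper` and the identifier `ENSG_TOY` are made up for illustration, and the toy tables only store links in one direction.

```python
from typing import Dict, Iterable, List, Set, Tuple

Xref = Tuple[str, str]  # (identifier, data source)


class DictMapper:
    """Toy mapping source: an in-memory table from one xref to related xrefs."""

    def __init__(self, table: Dict[Xref, Set[Xref]]):
        self._table = table

    def related(self, ref: Xref) -> Set[Xref]:
        return set(self._table.get(ref, set()))


class StackedMapper:
    """Combine mapping sources and follow links across them (transitive closure)."""

    def __init__(self, sources: Iterable[DictMapper]):
        self._sources: List[DictMapper] = list(sources)

    def map_id(self, ref: Xref, target_source: str) -> Set[Xref]:
        seen = {ref}
        frontier = [ref]
        while frontier:  # depth-first walk over every reachable xref
            current = frontier.pop()
            for source in self._sources:
                for nxt in source.related(current):
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
        # Keep only the xrefs in the requested data source.
        return {x for x in seen if x[1] == target_source and x != ref}


custom_table = DictMapper({("#QXZCY!34", "Custom probe"): {("ENSG_TOY", "Ensembl")}})
ensmart = DictMapper({("ENSG_TOY", "Ensembl"): {("3643", "Entrez Gene")}})
stack = StackedMapper([custom_table, ensmart])
```

Here `stack.map_id(("#QXZCY!34", "Custom probe"), "Entrez Gene")` returns `{("3643", "Entrez Gene")}` although no single table contains that pair. Unlimited chaining can also pull in loosely related identifiers, so a production version might cap the number of hops.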

Comparison

CyThesaurus

MIRIAM Resources

Solution: MIRIAM Resources
- Regular expression for autodetection
- Pattern for generating URLs
- Link to documentation
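With a regular expression and a URL pattern per data source, a program can guess where an identifier comes from and link to it, which is the job of `Xref.getUrl()` and `DataSource.getUrl()` in the API overview. The three entries below are illustrative assumptions written for this sketch, not copied from MIRIAM; a real registry would load patterns and URL templates from the MIRIAM data. `detect` returns a list because, with a full registry, a bare number such as `3643` can match several numeric ID schemes.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSource:
    name: str
    pattern: str       # regular expression a valid identifier must fully match
    url_template: str  # "$id" is replaced by the identifier

    def matches(self, ident: str) -> bool:
        return re.fullmatch(self.pattern, ident) is not None

    def get_url(self, ident: str) -> str:
        return self.url_template.replace("$id", ident)


# Illustrative entries only (assumed patterns and URLs, not MIRIAM data).
SOURCES: List[DataSource] = [
    DataSource("Entrez Gene", r"\d+", "https://www.ncbi.nlm.nih.gov/gene/$id"),
    DataSource("Ensembl", r"ENS[A-Z]*G\d{11}", "https://www.ensembl.org/id/$id"),
    DataSource("Gene Ontology", r"GO:\d{7}", "https://amigo.geneontology.org/amigo/term/$id"),
]


def detect(ident: str) -> List[DataSource]:
    """Autodetection: every registered data source whose pattern fits the identifier."""
    return [ds for ds in SOURCES if ds.matches(ident)]
```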

The 7 Virtues of Bioinformatics
1. Solve a problem
2. Start small
3. Eat your own dog food
4. Attention to detail
5. Modularity
6. Design for code re-use
7. Open source

A Question to Linus Torvalds Q: “Do you have any tips for people who want to undertake a large open source project?” A: “Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large.… … If it doesn't solve some fairly immediate need, it's almost certainly over-designed.… …You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project”

Also from Linus Torvalds “I'm right and anyone who disagrees is stupid and ugly” “My name is Linus Torvalds and I am your god.”

Code re-use
Reinventing the wheel is one of the 7 Deadly Sins of Bioinformatics.

Code Re-Use

Q: How do you design re-usable code?
A: Actually use it in more than one project from the start. BridgeDb is used by both Cytoscape and PathVisio.

Modularity

Open source
- Public money → public code
- Reproducibility
- Academic ideal
- Trust
- Insurance against vendor lock-in

Open source
Now where are all those free programmers?

Open source
- Web site
- Version control
- Mailing list
- Bug tracker

Eat your own dog food

Are you named “alkfdjlkdsf”? Why not “Hélène O’Brian”? …or “Bobby Tables”?
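Realistic names double as test inputs: “Hélène O’Brian” exercises non-ASCII text and an apostrophe, and “Bobby Tables” is the xkcd student whose name, `Robert'); DROP TABLE Students;--`, wrecks a school database. A minimal sketch with Python's built-in `sqlite3` module and a hypothetical `students` table shows the standard defence: pass values as query parameters instead of pasting them into the SQL string.

```python
import sqlite3
from typing import List


def store_and_read_back(names: List[str]) -> List[str]:
    """Insert names into a throwaway in-memory table and return them in insertion order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE students (name TEXT)")
        # '?' placeholders: sqlite3 binds each value as data, so quotes,
        # accents and SQL fragments inside a name are stored verbatim.
        conn.executemany("INSERT INTO students (name) VALUES (?)", [(n,) for n in names])
        return [row[0] for row in conn.execute("SELECT name FROM students ORDER BY rowid")]
    finally:
        conn.close()


if __name__ == "__main__":
    tricky = ["alkfdjlkdsf", "Hélène O’Brian", "O'Brian", "Robert'); DROP TABLE students;--"]
    assert store_and_read_back(tricky) == tricky
```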

Eat your own dog food
- Real data has missing values
- Real data has commas instead of dots as the decimal separator
- Real data has duplicate identifiers
- Real data starts with “ID” in the first cell*
*Which Excel doesn’t like
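A reader for user-supplied data has to survive every quirk on this list. A minimal sketch, assuming a tab-delimited file with an identifier in the first column and one numeric value in the second; `parse_number` and `read_table` are hypothetical helpers. It treats a comma as a decimal separator, so values with thousands separators such as `1.234,5` are rejected rather than guessed.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

MISSING = {"", "NA", "N/A", "NAN"}


def parse_number(cell: str) -> Optional[float]:
    """Parse '1.5' or '1,5'; return None for empty or NA-style cells."""
    cell = cell.strip()
    if cell.upper() in MISSING:
        return None
    # Assumes a comma is only ever a decimal separator (no thousands separators).
    return float(cell.replace(",", "."))


def read_table(text: str) -> Tuple[Dict[str, Optional[float]], List[str]]:
    """Read 'identifier<TAB>value' rows; return values per ID and the repeated IDs."""
    values: Dict[str, Optional[float]] = {}
    duplicates: List[str] = []
    for i, row in enumerate(csv.reader(io.StringIO(text), delimiter="\t")):
        if not row or not row[0].strip():
            continue  # blank line or missing identifier
        ident = row[0].strip()
        if i == 0 and ident.upper() == "ID":
            continue  # header row such as "ID<TAB>value"
        if ident in values:
            duplicates.append(ident)  # report instead of silently overwriting
            continue
        values[ident] = parse_number(row[1]) if len(row) > 1 else None
    return values, duplicates
```

Reporting duplicates instead of keeping the last value lets the user decide which measurement is right.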

User friendliness

Hallway usability testing
Grab a passer-by from the hallway and put them in front of your program. (We usually use students.)

Thanks
Alex Pico (UCSF), Kristina Hanspers (UCSF), Isaac Ho (UCSF), Bruce Conklin (UCSF), Jianjiong Gao (U. Missouri), Thomas Kelder (BiGCaT, Maastricht), Chris Evelo (BiGCaT, Maastricht), Brian Turner (U. Toronto), Igor Rodchenkov (U. Toronto)

Ways to run BridgeDb (1/3)

Ways to run BridgeDb (2/3)

Ways to run BridgeDb (3/3)

Open source
Is it difficult?

Open source (diagram: some parties have read-write access, "rw"; everyone else, "*", has read access, "r")