BridgeDb
Martijn van Iersel
BiGCaT, Maastricht
The 7 Virtues of Bioinformatics
1. Solve a problem
2. Start small
3. Modularity
4. Design for code re-use
5. Open Source
6. Attention to detail
7. Eat your own dog-food
Solve a problem
What problem are you solving?
Problem: Identifier Mapping
Agilent reporter A46_P45789 -> ? -> Entrez Gene 3643
Solution: Conversion tools
Problem: Usability
- Check for double IDs
- Check for missing IDs
- Only 1000 at once
- Check alignment of Excel columns
- Manual
- Error-prone
Solution: Built-in Mapping
Generic bioinformatics platforms should have identifier mapping built-in.
- BioConductor
- PathVisio
- Cytoscape
- ...
Batteries Included
Solution: Built-in Mapping
Entrez Gene 3643 -> Mapping service -> Agilent reporter A46_P45789
Problem: Which mapping service?
Synergizer, EnsMart, DAVID, CRONOS, AliasServer, MatchMiner, OntoTranslate
Solution: Abstraction Layer
interface IDMapper
- class IDMapperRdb: relational database
- class IDMapperFile: tab-delimited text
- class IDMapperBiomart: web service
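The abstraction layer can be sketched roughly as follows. This is a Python illustration of the idea, not BridgeDb's Java API: the names `IDMapper` and `IDMapperFile` echo the slide, but the method `map_id`, the two-column file layout, `IDMapperMemory` and `annotate` are assumptions made for this example.

```python
import io
from abc import ABC, abstractmethod


class IDMapper(ABC):
    """Common interface: calling code never sees which backend answers."""

    @abstractmethod
    def map_id(self, identifier):
        """Return the set of identifiers that `identifier` maps to."""


class IDMapperFile(IDMapper):
    """Backend for tab-delimited text: source ID, tab, target ID (assumed layout)."""

    def __init__(self, lines):
        self._table = {}
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            source, target = line.split("\t")
            self._table.setdefault(source, set()).add(target)

    def map_id(self, identifier):
        return set(self._table.get(identifier, ()))


class IDMapperMemory(IDMapper):
    """Hypothetical stand-in for a database or web-service backend."""

    def __init__(self, table):
        self._table = table

    def map_id(self, identifier):
        return set(self._table.get(identifier, ()))


def annotate(mapper, identifiers):
    """Works with any IDMapper; the backend can be swapped without changes here."""
    return {identifier: mapper.map_id(identifier) for identifier in identifiers}


file_mapper = IDMapperFile(io.StringIO("A46_P45789\t3643\n"))
memory_mapper = IDMapperMemory({"A46_P45789": {"3643"}})
print(annotate(file_mapper, ["A46_P45789"]))
print(annotate(memory_mapper, ["A46_P45789"]))
```

Both calls to `annotate` give the same answer, which is the point of the layer: the tool depends on the interface only, not on where the mappings live.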
[Figure: BridgeDb architecture. Tools: CyThesaurus, WikiPathways, PathVisio, Network Merge, Cytoscape plugins. BridgeDb sits between the tools and the mapping services. Mapping services: Internet web services (BioMart, BridgeDb-REST, PICR), local database, tab-delimited text files. Source: BMC Bioinformatics Jan 4;11(1):5]
BridgeDb
Interface 1: Java interface
Interface 2: REST interface
API Overview
- BridgeDb.connect(...)
- IDMapper.mapID(...)
- Xref.getUrl()
- DataSource.getUrl()
Easy & Flexible Code
BridgeDb
Interface 1: Java interface
Interface 2: REST interface
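REST API
[Figure: example identifiers and their data sources. Pairs that survive intact:]
- NM_033282: RefSeq
- 94233: Entrez Gene
- A6NEB4: Uniprot/TrEMBL
- A_23_P24234: Agilent
- 14449: HUGO
[Identifier patterns with digits lost: ILMN_ (Illumina), NP_ (RefSeq), IPI (IPI), GO: (GeneOntology), ENSG (Ensembl Human), _at (Affy). OMIM also appears in the figure.]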
REST API / / [ /... ]\
R Example
Types of Mapping Services
Type | Advantages
Webservice | + always up-to-date; + no disk-space required; + no installation required
Relational database | + highly efficient; + versioned: updated only when you want to
Flat file | + easy to customize
Available Mapping Services
Name | Type | Maintainer
Gene databases (Ensembl-based) | Database | Us
Metabolite databases (HMDB-based) | Database | Us
BridgeWebservice | Webservice | Us
BioMart | Webservice | EBI
CRONOS | Webservice | Helmholtz Zentrum
Synergizer | Webservice | Harvard Medical School
PICR | Webservice | EBI
Problem: Custom Microarrays
Custom probe #QXZCY!34 -> ?
Solution: Stacking
EnsMart + custom table
[Figure: Ensembl, Entrez and custom microarray identifiers linked by three kinds of edges]
- Relation defined by mapping source A
- Relation defined by mapping source B
- Inferred, transitive relationship
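The stacking idea can be sketched as follows: two mapping sources are combined, and a relationship that neither source contains is inferred by following links across both. This is a minimal Python illustration, not BridgeDb's implementation. `DictMapper`, `StackedMapper`, `map_id` and the placeholder identifiers `ENSG_EXAMPLE` and `ENTREZ_EXAMPLE` are hypothetical; only the probe name `#QXZCY!34` comes from the slides.

```python
class DictMapper:
    """Minimal mapping source backed by a dict: identifier -> set of identifiers."""

    def __init__(self, table):
        self._table = {key: set(values) for key, values in table.items()}

    def map_id(self, identifier):
        return set(self._table.get(identifier, ()))


class StackedMapper:
    """Combines several mapping sources and follows their links transitively."""

    def __init__(self, *mappers):
        self._mappers = mappers

    def map_id(self, identifier):
        found = set()
        frontier = [identifier]
        while frontier:
            current = frontier.pop()
            for mapper in self._mappers:
                for hit in mapper.map_id(current):
                    # Skip the query itself and anything already visited.
                    if hit != identifier and hit not in found:
                        found.add(hit)
                        frontier.append(hit)
        return found


# Source A: a public service that knows Ensembl -> Entrez (placeholder IDs).
source_a = DictMapper({"ENSG_EXAMPLE": {"ENTREZ_EXAMPLE"}})
# Source B: a custom table that knows custom probe -> Ensembl.
source_b = DictMapper({"#QXZCY!34": {"ENSG_EXAMPLE"}})

stacked = StackedMapper(source_a, source_b)
print(stacked.map_id("#QXZCY!34"))
```

Neither source alone links the custom probe to the Entrez placeholder. The stacked mapper finds it in two hops, which corresponds to the inferred, transitive edge in the figure.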
Comparison
CyThesaurus
MIRIAM Resources
Solution: MIRIAM Resources
- Regular expression for autodetection
- Pattern for generating URLs
- Link to documentation
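A registry of this kind could look roughly like the sketch below: each data source carries a regular expression for autodetection and a URL template. The regular expressions and the `example.org` URLs are illustrative assumptions, not actual MIRIAM entries, and `REGISTRY`, `detect` and `url_for` are hypothetical names.

```python
import re

# (data source name, identifier pattern, URL template); all values are illustrative.
REGISTRY = [
    ("Ensembl Human", re.compile(r"ENSG\d{11}"), "https://example.org/ensembl/{id}"),
    ("Agilent", re.compile(r"A_\d+_P\d+"), "https://example.org/agilent/{id}"),
    ("Entrez Gene", re.compile(r"\d+"), "https://example.org/entrez/{id}"),
]


def detect(identifier):
    """Return the names of all data sources whose pattern matches the whole identifier."""
    return [name for name, pattern, _ in REGISTRY if pattern.fullmatch(identifier)]


def url_for(name, identifier):
    """Fill the URL template of the named data source with the identifier."""
    for entry_name, _, template in REGISTRY:
        if entry_name == name:
            return template.format(id=identifier)
    raise KeyError(name)


print(detect("A_23_P24234"))
print(detect("94233"))
print(url_for("Entrez Gene", "94233"))
```

`detect` returns a list, not a single answer, because patterns can overlap: a purely numeric identifier would match any data source that uses plain numbers, so autodetection can only narrow the candidates.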
The 7 Virtues of Bioinformatics
1. Solve a problem
2. Start small
3. Eat your own dog-food
4. Attention to detail
5. Modularity
6. Design for code re-use
7. Open Source
A Question to Linus Torvalds
Q: “Do you have any tips for people who want to undertake a large open source project?”
A: “Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. … If it doesn't solve some fairly immediate need, it's almost certainly over-designed. … You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project.”
Also from Linus Torvalds
“I'm right and anyone who disagrees is stupid and ugly”
“My name is Linus Torvalds and I am your god.”
Code Re-Use
Reinventing the wheel is one of the 7 Deadly Sins of Bioinformatics
Code Re-Use
Q: How to design re-usable code?
A: Actually use it in more than one project from the start
[Figure: bridgedb used by both Cytoscape and PathVisio]
Modularity
Open source
- Public money -> Public code
- Reproducibility
- Academic ideal
- Trust
- Insurance against vendor lock-in
Open source
Now where are all those free programmers?
Open Source
- Web site
- Version control
- Mailing list
- Bug tracker
Eat your own dog food
Are you named “alkfdjlkdsf”? Why not “Hélène O’Brian?” …or “Bobby Tables”?
Eat your own dog food
- Real data has missing values
- Real data has commas instead of dots
- Real data has duplicate identifiers
- Real data starts with “ID” in the first cell*
*Which Excel doesn’t like
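The first three pitfalls above can be handled defensively when reading a data file. The sketch below assumes a tab-delimited file with a header row, an identifier in the first column and a numeric value in the second; the function name `read_measurements` and the sample rows are hypothetical.

```python
import csv
import io


def read_measurements(text):
    """Parse 'identifier<TAB>value' rows; report duplicates and missing values."""
    values = {}
    duplicates = []
    missing = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip empty rows and rows without an identifier
        identifier = row[0].strip()
        raw = row[1].strip() if len(row) > 1 else ""
        if identifier in values or identifier in missing:
            duplicates.append(identifier)  # keep the first occurrence only
            continue
        if raw == "":
            missing.append(identifier)
            continue
        # Accept a decimal comma ("1,5") as well as a decimal point ("1.5").
        values[identifier] = float(raw.replace(",", "."))
    return values, duplicates, missing


sample = "ID\tvalue\n3643\t1,5\n94233\t\n3643\t2.0\n14449\t0.25\n"
values, duplicates, missing = read_measurements(sample)
print(values, duplicates, missing)
```

The sketch keeps the first value for a duplicated identifier and reports the rest, which is one possible policy among several. Non-numeric values still raise `ValueError`, and thousands separators are not handled.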
User friendliness
Hallway usability testing
Grab a passer-by from the hallway and put them in front of your program.
(We usually use students.)
Thanks
- Alex Pico (UCSF)
- Kristina Hanspers (UCSF)
- Isaac Ho (UCSF)
- Bruce Conklin (UCSF)
- Jianjiong Gao (U. Missouri)
- Thomas Kelder (BiGCaT, Maastricht)
- Chris Evelo (BiGCaT, Maastricht)
- Brian Turner (U. Toronto)
- Igor Rodchenkov (U. Toronto)
Ways to run BridgeDb (1/3)
Ways to run BridgeDb (2/3)
Ways to run BridgeDb (3/3)
Open source Is it difficult?
Open source = = rw
Open source = = rw * = r