BioJava in 2002: An Open-Source Java Library for Bioinformatics (Matthew Pocock, BioJava Consulting LTD)


What is BioJava?
Java code (requires Java 2, i.e. version 1.2 or higher)
Open-source bioinformatics library for building applications
Sequence-centric (we’d love to do more)
Part of the Open Bioinformatics Foundation (OBF)
Drop biojava.jar into your CLASSPATH and go

Where is BioJava? #biojava on irc.openprojects.net

Who is BioJava?
35+ developers across most continents and time zones
Core team of more than 5 individuals
Ever-expanding user group

A look at some API Stuff

What’s Been There for a While?
Sequences with hierarchical features
Sequence databases
Sequence IO
  – Various sequence formats (embl, genbank, gff, swissprot…)
  – Object model can be bypassed for high-performance scanning
Probability distributions over symbols and a dynamic programming toolkit
Blast parsers

What’s Reasonably New?
TagValue parser API
Sequence search APIs
  – Interoperable with BioJava XML-based parsers for many common sequence search algorithms
Pure-Java SSAHA implementation
Bit-packed sequence storage
Taxonomies
Literature references
Phred
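To illustrate what bit-packed sequence storage buys, here is a minimal, self-contained sketch. It is not BioJava's implementation: the class name PackedDna and its methods are hypothetical, and it assumes unambiguous DNA only (a, c, g, t), so each base fits in 2 bits and 32 bases fit in one 64-bit word.

```java
/**
 * Hypothetical sketch of 2-bit DNA packing (not BioJava's actual classes).
 * Assumes only the four unambiguous bases occur in the input.
 */
class PackedDna {
    private static final String ALPHABET = "acgt"; // code = index in this string

    private final long[] words; // 32 bases per 64-bit word
    private final int length;   // number of bases stored

    PackedDna(String dna) {
        length = dna.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(Character.toLowerCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not an unambiguous base: " + dna.charAt(i));
            }
            // Place the 2-bit code at bit position 2 * (i mod 32) of its word.
            words[i / 32] |= ((long) code) << (2 * (i % 32));
        }
    }

    /** Returns the base at position i (0-based), always in lower case. */
    char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("Index: " + i);
        }
        int code = (int) ((words[i / 32] >>> (2 * (i % 32))) & 3L);
        return ALPHABET.charAt(code);
    }

    int length() {
        return length;
    }

    /** Number of 64-bit words used for storage. */
    int wordCount() {
        return words.length;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PackedDna packed = new PackedDna("GATTACAGATTACA");
        System.out.println(packed + " uses " + packed.wordCount() + " long(s)");
    }
}
```

Compared with one 16-bit char per base, this layout uses an eighth of the memory, at the cost of not being able to represent ambiguity codes or gaps.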

What’s Recently Improved?
Gap handling
  – Consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
DAS client is now very robust
  – Distributed sequence API allows DAS-like distributed sequence databases to be easily built and implemented
More ‘framey’ annotation bundles
Sequence rendering
  – Looks much better now
  – Handles ‘dotter-style’ 2D rendering
We now actually write JUnit tests!
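One way to picture a consistent algebra over ambiguities, compound symbols and gaps is to treat every symbol as the set of atomic bases it can stand for. The sketch below makes that assumption explicit; it is not BioJava's Symbol or Alphabet API, and the names AmbiguityDemo, matches, union, compatible and expansions are hypothetical. Only a subset of the IUPAC nucleotide codes is covered, and modelling a gap as the empty set is a choice made for this illustration.

```java
import java.util.EnumSet;

/** Hypothetical set-based model of ambiguity symbols, gaps and codons. */
class AmbiguityDemo {
    enum Base { A, C, G, T }

    /** Maps a (partial) IUPAC code to the atomic bases it can match; '-' matches nothing. */
    static EnumSet<Base> matches(char code) {
        switch (Character.toLowerCase(code)) {
            case 'a': return EnumSet.of(Base.A);
            case 'c': return EnumSet.of(Base.C);
            case 'g': return EnumSet.of(Base.G);
            case 't': return EnumSet.of(Base.T);
            case 'r': return EnumSet.of(Base.A, Base.G); // purine
            case 'y': return EnumSet.of(Base.C, Base.T); // pyrimidine
            case 'n': return EnumSet.allOf(Base.class);  // any base
            case '-': return EnumSet.noneOf(Base.class); // gap (assumption: empty set)
            default:
                throw new IllegalArgumentException("Unsupported code: " + code);
        }
    }

    /** The ambiguity symbol covering both inputs is the union of their sets. */
    static EnumSet<Base> union(EnumSet<Base> x, EnumSet<Base> y) {
        EnumSet<Base> result = EnumSet.copyOf(x);
        result.addAll(y);
        return result;
    }

    /** Two symbols are compatible if they share at least one atomic base. */
    static boolean compatible(char a, char b) {
        EnumSet<Base> shared = matches(a);
        shared.retainAll(matches(b));
        return !shared.isEmpty();
    }

    /** Number of concrete sequences a compound symbol (e.g. a codon) can expand to. */
    static int expansions(String compound) {
        int count = 1;
        for (int i = 0; i < compound.length(); i++) {
            count *= matches(compound.charAt(i)).size();
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("a or g -> " + union(matches('a'), matches('g')));
        System.out.println("r compatible with y? " + compatible('r', 'y'));
        System.out.println("codon ggn expands to " + expansions("ggn") + " concrete codons");
    }
}
```

Under this model a gap needs no special casing: it is compatible with nothing, and any compound symbol containing one expands to zero concrete sequences.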

Java 1.4-reliant Source
Java 1.4 offers APIs that are really useful for bioinformatics
  – Logging
  – NIO interfaces for fast IO and raw data access
  – Regular expressions
  – Cascading exceptions
BioJava code relying on 1.4 APIs is conditionally built
  – SSAHA implementation
  – Some parsers and handlers for TagValue
  – Restriction enzyme digests
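As an illustration of why the regular expression API introduced in Java 1.4 (java.util.regex) suits tasks like restriction digests, here is a small sketch. It uses only the standard library and is not BioJava's restriction enzyme code; the class RegexDigest and its digest method are hypothetical. It assumes a linear, single-stranded view of the sequence and a recognition site that cannot overlap itself, such as EcoRI's GAATTC, which is cut after the leading G.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: cutting a linear sequence at a recognition site with java.util.regex. */
class RegexDigest {

    /**
     * Splits dna at every occurrence of site.
     *
     * @param dna       the sequence to digest
     * @param site      recognition site, interpreted as a regular expression
     * @param cutOffset number of bases from the start of the site to the cut position
     * @return the fragments in order; the whole sequence if the site never occurs
     */
    static List<String> digest(String dna, String site, int cutOffset) {
        Matcher matcher = Pattern.compile(site, Pattern.CASE_INSENSITIVE).matcher(dna);
        List<String> fragments = new ArrayList<String>();
        int fragmentStart = 0;
        while (matcher.find()) {
            int cut = matcher.start() + cutOffset;
            fragments.add(dna.substring(fragmentStart, cut));
            fragmentStart = cut;
        }
        fragments.add(dna.substring(fragmentStart)); // trailing fragment
        return fragments;
    }

    public static void main(String[] args) {
        // EcoRI: recognizes GAATTC and cuts between G and AATTC.
        for (String fragment : digest("ttGAATTCaaaaGAATTCgg", "GAATTC", 1)) {
            System.out.println(fragment);
        }
    }
}
```

Because the site is passed as a regular expression, degenerate recognition sequences can be written as character classes (for example `[AG]` for a purine) without changing the method.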

OBDA and Fun Trips
Sponsored by O’Reilly and Electric Genetics
Developers attended a two-part Hackathon in Tucson, AZ, USA and Cape Town, South Africa
Representatives from BioJava, BioPerl, BioPython, BioRuby, Ensembl, EMBOSS and others
We hammered out and implemented a range of standards designed from the ground up to be
  – Interoperable between the Bio* projects
  – Relatively easy to implement from scratch
We drank lots of red wine

OBDA Support
BioCORBA – CORBA sequence interfaces
BioSQL – relational tables and standard semantics for storing sequences
BioFetch – cgi-bin-based sequence fetching
XEMBL – XML-based sequence fetching
Bio Directories – configuration file for resolving resources
Flat-file Indexing – fetch records by ID and secondary ID from multiple ASCII files
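The idea behind flat-file indexing (remember where each record starts, then seek straight to it) can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: it does not implement the OBDA on-disk index format, it handles primary IDs only, it assumes FASTA-style records whose header line starts with ">" followed by the ID, and the class FlatFileIndex with its addFile and fetch methods is hypothetical.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory index from record ID to (file, byte offset). */
class FlatFileIndex {

    /** Where one record lives: the file and the byte offset of its header line. */
    static final class Location {
        final File file;
        final long offset;

        Location(File file, long offset) {
            this.file = file;
            this.offset = offset;
        }
    }

    private final Map<String, Location> byId = new HashMap<String, Location>();

    /** Scans one ASCII file and records the offset of every ">" header line. */
    void addFile(File file) throws IOException {
        RandomAccessFile in = new RandomAccessFile(file, "r");
        try {
            long offset = in.getFilePointer();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // The ID is the first whitespace-delimited token after ">".
                    String id = line.substring(1).trim().split("\\s+")[0];
                    byId.put(id, new Location(file, offset));
                }
                offset = in.getFilePointer();
            }
        } finally {
            in.close();
        }
    }

    /** Returns the full text of the record with the given ID, or null if unknown. */
    String fetch(String id) throws IOException {
        Location location = byId.get(id);
        if (location == null) {
            return null;
        }
        RandomAccessFile in = new RandomAccessFile(location.file, "r");
        try {
            in.seek(location.offset);
            StringBuilder record = new StringBuilder(in.readLine()).append('\n');
            String line;
            while ((line = in.readLine()) != null && !line.startsWith(">")) {
                record.append(line).append('\n');
            }
            return record.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".fa");
        file.deleteOnExit();
        FileWriter writer = new FileWriter(file);
        try {
            writer.write(">id1 first\nacgt\n>id2 second\nttaa\n");
        } finally {
            writer.close();
        }
        FlatFileIndex index = new FlatFileIndex();
        index.addFile(file);
        System.out.print(index.fetch("id2"));
    }
}
```

Calling addFile once per file lets a single index span multiple flat files. A persistent index would write the ID-to-offset table to disk rather than holding it in a HashMap, and a buffered reader would replace the unbuffered RandomAccessFile.readLine calls used here for brevity.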

Things We’d Like To Do in the Near Future
Support non-DNA areas of bioinformatics
  – Cladistics, evolutionary trees, clusters
  – Expression data
  – Proteomics
  – Networks/pathways
  – Biochemical reactions
Integrate pre- and post-1.4 exception systems
Modify the change notification system
  – Better synchronization and transaction support
  – Easier to optimize events that don’t have listeners
  – More robust handling of event cascades

What Will We See in BioJava 2?
Pervasive use of ontologies
  – Storing annotation data
  – Definition of processing pipelines (e.g. customizing parsers)
  – Bindings between BioJava interfaces and external data sources (DAS, BioSQL, BioCORBA)
  – Pervasive querying, making any BioJava application an Object Data Store with easy routes for data providers to optimize searches
Much more code generation
  – Push most repetitive code into code generators
  – Auto-generate much of the event notification web
Much better transactionality
Reduce implementation cost for developers
Expose any/all BioJava instances through SOAP Naming and Directory Services

And the Biggest Change of All?
Make the library accessible to casual developers writing throw-away scripts as well as to system architects
  – Documentation
  – Tutorials
  – Training
  – Utility classes (e.g. SeqIOTools)

Some Contributors
Brian Gilman, Brian King, Brian Osborne, Colin Hardman, David H. Klatte, David Huen, David Waring, Gerald Loeffler, Greg Cox, Hanning Ni, Jason Stajich, Kalle Näslund, Keith James, Kim Rutherford, Lei Lai, Mark Schreiber, Martin Senger, Mathieu Wiepert, Matthew Pocock, Michael Jones, Mike Jones, Nimesh Singh, Ron Kuhn, Samiul Hasan, Simon Brocklehurst, Stuart Johnston, Thad Welch, Thomas Down, Tim Dilks
O|B|F