Attie Bioinformatics Server Redesign
Andrew Broman & Brian Yandell
October 2010
overview
- scanone tool web page for biologists
  - Attie islet mRNA data
  - minimal functionality for now
- system architecture
  - scanone and other tools as services
  - authentication and authorization
  - access to and use of databases
  - interface to R and other analysis engines
  - collaboration with off-site scientists
big picture
[Diagram: user, services page, security (authenticate, authorize), scanone service]
security modules
- authenticate: who is this?
- authorize: what can this person do/see?
- off-the-shelf tools
  - well tested
  - popular
  - easy to implement
- authenticate & authorize are service units
  - model-view-controller architecture
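The split between the two security units can be sketched in a few lines. This is a minimal illustration in Python, not the off-the-shelf tools the slides refer to; the user table, passwords, and dataset names are hypothetical, and the bare SHA-256 hash is for illustration only and would not be suitable for real password storage.

```python
import hashlib
import hmac


def _hash(password: str) -> str:
    # Illustration only: real systems should use a salted, slow password hash.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical user store: name -> (password hash, datasets the user may see).
USERS = {
    "analyst": (_hash("s3cret"), {"islet", "liver"}),
    "guest": (_hash("guest"), {"islet"}),
}


def authenticate(name: str, password: str) -> bool:
    """Who is this? True only if the user exists and the password matches."""
    record = USERS.get(name)
    if record is None:
        return False
    return hmac.compare_digest(record[0], _hash(password))


def authorize(name: str, dataset: str) -> bool:
    """What can this person see? True if the dataset is in the user's set."""
    record = USERS.get(name)
    return record is not None and dataset in record[1]
```

Keeping the two functions separate mirrors the slide: each could sit behind its own service, and other services would call them without knowing how identities or permissions are stored.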
scanone service unit
[Diagram: selections (Dataset: UCLA; Tissue: liver; Task: scanone), outputs (plot, summary), MongoDB, R analysis engine]
service philosophy
- each service is self-contained, modular
  - IT team designed, or provided by other locations
- each service can contain other services
- use URLs to find data, code, etc.
  - could be anywhere
  - allows expansion to multiple centers
- REpresentational State Transfer (REST)
  - key design idiom
  - stateless client-server architecture
  - web services are resources identified by URLs
  - RESTful Web Services (2007) by Richardson and Ruby
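The idea that a resource is identified by its URL, and that the server keeps no per-client state, can be sketched as below. This is a minimal sketch in Python under stated assumptions: the URL layout (/datasets/.../tissues/.../scanone), the host names, and the resource contents are hypothetical and are not taken from the project.

```python
from urllib.parse import urlparse

# Hypothetical resource table keyed by URL path.
RESOURCES = {
    "/datasets/UCLA/tissues/liver/scanone": {"task": "scanone", "status": "ready"},
}


def handle(method: str, url: str):
    """Stateless handler: the response depends only on the method and the URL."""
    path = urlparse(url).path.rstrip("/")
    if method != "GET":
        return 405, None  # only reads are sketched here
    resource = RESOURCES.get(path)
    if resource is None:
        return 404, None
    return 200, resource
```

Because the handler looks only at the path, the same resource can be served from any host, which is what lets a service move to another server or a remote mirror without changes elsewhere.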
benefits of service architecture
- decoupled/modular
  - easier to create new tools
  - easier to test & modify isolated parts of the system
- scalable
  - any isolated service can be moved to a new server
  - no need to alter the rest of the system
  - enables remote mirrors to be transparent to user
- understandable
  - architecture easy to grasp
  - isolated services easy to understand
  - easy to maintain/extend individual services
MongoDB
- document-oriented database system
  - not relational (MySQL, Oracle, …)
- DB is collection of documents
  - each document can have user-specified parts
  - accommodates huge data files
  - quick access to desired components
- no schemas required: flexible data formats
  - GenePattern has only two data formats
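What "no schemas required" means in practice can be shown with an in-memory stand-in. The sketch below uses plain Python dictionaries rather than MongoDB or its driver; the find helper is a hypothetical imitation of an equality query, and the field names and values are made up for illustration.

```python
# A "collection" is a list of documents; documents need not share the same fields.
collection = [
    {"dataset": "UCLA", "tissue": "liver", "type": "mRNA", "probes": 3},
    {"dataset": "Attie", "tissue": "islet", "type": "mRNA", "platform": "array"},
    {"dataset": "Attie", "type": "clinical", "traits": ["insulin", "glucose"]},
]


def find(docs, query):
    """Return documents whose fields equal every key/value pair in the query."""
    return [d for d in docs if all(k in d and d[k] == v for k, v in query.items())]


mrna_docs = find(collection, {"type": "mRNA"})        # two documents
islet_docs = find(collection, {"tissue": "islet"})    # one document
```

The clinical document has no tissue field and the islet document has a platform field the others lack, yet all three live in the same collection and are queried the same way.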
data and metadata
- metadata describes what data are
  - provenance/history of data creation/acquisition
  - type of data, size of data, other characteristics
  - small “flat” file
  - template to design new data
- data can be raw or processed
  - large data object
- save time/space by passing metadata to R
  - access data only as needed
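Passing small metadata around and touching the large object only when needed can be sketched as lazy loading. This is a minimal Python illustration, not the project's R interface; the LazyDataset class, its field names, and the loader function are hypothetical.

```python
class LazyDataset:
    """Carries small metadata eagerly; loads the large data object on first use."""

    def __init__(self, metadata, loader):
        self.metadata = metadata   # small "flat" description, always available
        self._loader = loader      # callable that fetches the large object
        self._data = None
        self._loaded = False
        self.loads = 0             # counts how often the loader actually ran

    @property
    def data(self):
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
            self.loads += 1
        return self._data


def load_expression_values():
    # Stand-in for an expensive read of a large raw or processed data object.
    return [0.1, 0.5, 0.9]


ds = LazyDataset({"tissue": "islet", "type": "mRNA", "rows": 3}, load_expression_values)
```

Code that only needs to know the tissue or the size reads ds.metadata and never triggers the load; the first access to ds.data fetches the values, and later accesses reuse them.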
scanone service MVC components
[Diagram: view (Dataset: UCLA; Tissue: liver; Task: scanone; plot, summary), controller, model (MongoDB, R analysis engine); arrow labels: modify view, pass details, return objects, pass details]
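The flow between the three components can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, and the model is a stub that returns a made-up result in place of MongoDB and the R analysis engine.

```python
def model_scanone(dataset, tissue):
    """Model stub: stands in for MongoDB plus R and returns a made-up result."""
    return {"dataset": dataset, "tissue": tissue, "task": "scanone", "peaks": []}


def view_render(result):
    """View: turns a result object into what the user sees."""
    return (
        f"{result['task']} summary for {result['dataset']} {result['tissue']}: "
        f"{len(result['peaks'])} peaks"
    )


def controller(form):
    """Controller: passes details to the model, then modifies the view."""
    if form.get("task") != "scanone":
        raise ValueError("unsupported task")
    result = model_scanone(form["dataset"], form["tissue"])
    return view_render(result)


page = controller({"dataset": "UCLA", "tissue": "liver", "task": "scanone"})
```

Only the controller knows about both sides, so the view can be redesigned or the model swapped for a real analysis engine without touching the other.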
project timeline
task                                     duration  completion date
scanone HTML mockup                      now       summer
  islet scanone results
database integration                     1 wk      15 oct
  merge annotation, values (mRNA)
  expect speed, organization benefits
multiple tissues                         2 wks     1 nov
  tissues plus clinical
MVC service architecture                 2 wks     15 nov
security integration                     1 wk      1 dec
  authenticate, authorize services
  communication between services
multiple services                        4 wks     1 jan
  means, hotspots, qtlnet
multiple projects                        4 wks     1 feb
  UCLA, Florida, yeast
MVC service architecture plans
- view (what you see)
  - extract from HTML mockup
  - modular redesign
- controller (how information is passed)
  - extract Ruby-on-Rails from HTML mockup
  - add communication features (RESTful API)
- model (how tasks are performed)
  - little modification needed
analyst pipeline integration
[Diagram: R analysis engine, raw data, processed data, and MongoDB, linked by get and put]
analyst pipeline details
- R engine
  - analysis libraries housed at github.com
  - CHTC cluster offloads major workload
- get/put functions
  - automate with periodic revision
- standardized metadata sheet
  - owner, project, tissue, etc.
  - dropdown menu of data service type
    - scanone, peaks, causal
    - negotiated by IT team
- each data type will have MVC service architecture(s)
future enhancements
- ideas not fully formed yet
- use sockets to connect objects
  - save on I/O: don’t pass large objects, just open them
  - avoid CSV, PDF, PNG unless user wants them
    - plot, summary, result tables from R operations
- model passes socket information to tools
  - connect R and MongoDB database directly
- controller passes socket info from model to view
  - display results by opening RESTful resource