Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB
Overview Current system (UCSB) –Status of collection –Capabilities –Architecture Current system (CMU) –Status of collection –Capabilities –Architecture Joint system under development –Capabilities –Architecture Future –Layered databases –Distributed databases
Current collection Retinal –Confocal microscope –EM (electron micrograph) Microtubule –Light –Atomic force microscopy (AFM) –DIC/Nomarski Table columns: Type | Current | Backlog | Rate/y | Expected 4 Yrs | Total size. Surviving entries: Retinal EM: …,000; 20GB. Retinal confocal P: …,000; 10GB. Retinal confocal Z: …,000; 65GB. Microtubule light: …,000; 12GB. Microtubule AFM: … GB. Microtubule DIC: 0; 0; 2.7M; 10M; 10TB.
Current capabilities Import process Image and meta storage Web access and browsing Limited access by content
Screenshots (browsing)
Screenshots (search)
Screenshot (metadata edit)
Screenshot (retina meta)
Current architecture Metadata Database implementation Front end implementation Image import API Software and hardware infrastructure
Metadata Standard (image types, parameters) –File, size, type, TIFF data, channel info, etc. Retinal –Visible cells –Antibody labeling –Experimental conditions –Researcher Microtubule –Track (hand captured) AFM –Machine parameters Metadata sources –Researcher –Annotated Excel files –Proprietary image formats
Database implementation MySQL First-generation schema –Image parameters: file, size, type, TIFF data, etc. –Metadata: experimenter, condition, antibodies, tissue, notes, etc.
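The two-part first-generation schema (image parameters plus experimental metadata) can be illustrated with a minimal sketch. This is not the project's actual schema: the table and column names are assumptions, the sample row is invented, and SQLite stands in for MySQL only so that the example is self-contained.

```python
import sqlite3

# Hypothetical two-table layout; every table and column name is an assumption.
SCHEMA = """
CREATE TABLE image (
    id          INTEGER PRIMARY KEY,
    file_name   TEXT NOT NULL,
    size_bytes  INTEGER,
    image_type  TEXT
);
CREATE TABLE image_metadata (
    image_id      INTEGER NOT NULL REFERENCES image(id),
    experimenter  TEXT,
    exp_condition TEXT,
    antibodies    TEXT,
    tissue        TEXT,
    notes         TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding one invented sample image."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO image (file_name, size_bytes, image_type) VALUES (?, ?, ?)",
        ("retina_001.tif", 1048576, "confocal"),
    )
    conn.execute(
        "INSERT INTO image_metadata "
        "(image_id, experimenter, exp_condition, antibodies, tissue, notes) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (cur.lastrowid, "researcher_a", "control", "antibody_x", "retina", "demo row"),
    )
    return conn


def search_by_metadata(conn, tissue):
    """Return file names of images whose metadata matches the given tissue."""
    rows = conn.execute(
        "SELECT i.file_name FROM image AS i "
        "JOIN image_metadata AS m ON m.image_id = i.id "
        "WHERE m.tissue = ? ORDER BY i.file_name",
        (tissue,),
    )
    return [row[0] for row in rows]
```

Keeping file-level parameters apart from experimental metadata means a metadata search is a single join, which is the kind of query the "search by metadata" front-end feature would need.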
Front end Apache, PHP, JavaScript Import proprietary image types Browse images Search by metadata Search by similarity Multi-user access and release protection
Image and metadata import Excel parser for metadata Image import library –An image format API and a C/C++ library were developed for database and client applications. –Currently supported proprietary image formats: Metamorph Stack, Fluoview TIFF, BioRad PIC, PSIA TIFF, Nanoscope; plus common formats: JPEG, TIFF, BMP, PNG…
Hardware and software infrastructure Hardware –Dell server with dual Intel Xeon CPUs at 2.4 GHz –140GB SCSI hard drives set up as RAID 1 –Gigabit network switch Software –Linux (Fedora Core 2) –Apache web server with PHP, Perl and graphics modules –MySQL database server
Motivation Common schema between UCSB and CMU Support greater functionality –Analysis and interpretation tools –Ground truth –Semantics –Uncertainty –Complex features and distance metrics MPEG-7 features Other features –Querying and relevance feedback
Capabilities Image and metadata storage Web access and browsing Access and search by content Import/Export –Streamlined XML import/export for external tools Schema extensions –Image5d, semantic, uncertainty, analysis Image processing modules and tools
Infrastructure – Interchange XML A unified interchange XML format is being developed for loading data into and extracting data from the database, for interaction with external client applications, and for communication between databases. Diagram labels: DB; XML; External clients; Image library; External DB interchange; Import/export; Remote access; Ground truth tools; Image processing tools
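A round trip through an interchange format of this kind might look like the sketch below. The slides do not show the actual format, so the element and attribute names (`image`, `tag`, `file`, `name`) are purely illustrative assumptions; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET


def image_to_xml(file_name, metadata):
    """Serialize one image record and its metadata to an XML string.

    The <image>/<tag> layout is a hypothetical stand-in for the real format.
    """
    root = ET.Element("image", {"file": file_name})
    for key, value in sorted(metadata.items()):
        tag = ET.SubElement(root, "tag", {"name": key})
        tag.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_image(xml_text):
    """Parse the XML produced by image_to_xml back into (file_name, metadata)."""
    root = ET.fromstring(xml_text)
    metadata = {t.get("name"): t.text for t in root.findall("tag")}
    return root.get("file"), metadata
```

Values come back as strings after the round trip, so a real format would also need to carry type information for numeric parameters.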
Ground truth acquisition tools The image processing and infrastructure teams are developing universal “ground truth” collection tools that retrieve data from the database and feed user-defined information back into it. The main communication vehicle is the XML interchange format. At the current stage, stand-alone tools are being developed and tested; later they will be combined into a universal application that communicates directly with the database.
Image processing API Enables fast development of image processing tools, so that developers can concentrate on problem solving. The API provides simple access to multi-channel image and mask information. It supports progress output, acquisition of user-defined parameters, and an automatically created filter preview. Example of API usage: noise removal for Fluoview images (figure panels: input, noise, result)
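The shape of such an API (multi-channel image in, optional mask, progress reporting) can be sketched as follows. This is not the project's API, and the slides do not say which noise-removal method was used for Fluoview images; a 3x3 median filter is swapped in here as a common choice, and all function names are hypothetical.

```python
from statistics import median_low


def median_filter(channel, mask=None, progress=None):
    """Apply a 3x3 median filter to one channel (a list of pixel rows).

    Pixels where mask[y][x] is false are copied through unchanged.
    progress, if given, is called after each row with a fraction in (0, 1].
    """
    height = len(channel)
    width = len(channel[0]) if height else 0
    out = [row[:] for row in channel]
    for y in range(height):
        for x in range(width):
            if mask is not None and not mask[y][x]:
                continue
            # Neighbourhood is clipped at the image border.
            window = [
                channel[yy][xx]
                for yy in range(max(0, y - 1), min(height, y + 2))
                for xx in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = median_low(window)
        if progress is not None:
            progress((y + 1) / height)
    return out


def denoise(image, mask=None, progress=None):
    """Run the filter on every channel of a multi-channel image."""
    return [median_filter(ch, mask, progress) for ch in image]
```

A caller would pass something like `progress=lambda f: print(f"{f:.0%}")` to obtain the progress output the slide mentions.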
Semantic data modules Integration of current research in automatic image analysis: –Cell identification –Layer detection –Cell counting –Microtubule detection and tracking –Microtubule dynamicity and global characterization
Modeling uncertainty Uncertain identification/analysis –Simple probability (e.g., 0.8) –“Is this a rod bipolar cell?” Imprecise location/extent/count –90% accuracy in cell count –Line segment (single or sequence), polygon: identified by a sequence of points; each point is Gaussian; store mean x, mean y, and standard deviation –Circle: center is a Gaussian point, as above; radius stored as mean r and standard deviation
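The uncertain primitives listed above map naturally onto small record types. The sketch below is one possible encoding, not the project's data model; all class and field names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GaussianPoint:
    """A point whose position is uncertain: mean x, mean y, one std deviation."""
    mean_x: float
    mean_y: float
    std: float


@dataclass(frozen=True)
class UncertainPolyline:
    """Line segment, segment sequence, or polygon as a sequence of Gaussian points."""
    points: Tuple[GaussianPoint, ...]

    def mean_length(self):
        """Length of the path through the mean positions (ignores the spread)."""
        return sum(
            math.hypot(b.mean_x - a.mean_x, b.mean_y - a.mean_y)
            for a, b in zip(self.points, self.points[1:])
        )


@dataclass(frozen=True)
class UncertainCircle:
    """Circle with a Gaussian center and a radius given as mean and std deviation."""
    center: GaussianPoint
    radius_mean: float
    radius_std: float


@dataclass(frozen=True)
class UncertainLabel:
    """An identification with a simple probability, e.g. 0.8."""
    label: str
    probability: float

    def __post_init__(self):
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
```

For example, `UncertainLabel("rod bipolar cell", 0.8)` records the answer to "Is this a rod bipolar cell?" with 80% confidence.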
Schema Image5d Analysis and interpretation tools –Quantitative data generation –Semantic Labeling Experimental description Shape and geometry Domain knowledge –Ground truth –Semantic objects Uncertainty Features and distance metrics MPEG-7 features Other features Querying and relevance feedback
Schema (image5d) 5d images Image is a set of bit-planes Group planes by which dimensions vary Permits –Multiple formats –Caching
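Grouping planes by which dimensions vary can be sketched as below. The slide does not name the five dimensions; this sketch assumes the usual x, y, z, time and channel, with each 2-D plane indexed by channel, z and t. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def group_planes(planes, fixed_dims):
    """Group plane records by the dimensions held fixed.

    planes: dicts with keys "channel", "z", "t" and "file" (assumed layout).
    fixed_dims: dimensions that are constant within a group; the remaining
    dimensions are the ones that vary inside each group.
    """
    groups = defaultdict(list)
    for plane in planes:
        key = tuple(plane[dim] for dim in fixed_dims)
        groups[key].append(plane["file"])
    return dict(groups)


# Invented example: 2 channels x 2 z-slices at a single time point.
PLANES = [
    {"channel": c, "z": z, "t": 0, "file": f"c{c}_z{z}.tif"}
    for c in range(2)
    for z in range(2)
]
```

Calling `group_planes(PLANES, ("channel",))` yields one z-stack per channel, so each group could be stored in its own file format or cached as a unit.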
Schema (semantic objects) Capture semantics Capture uncertainty Type of object: confidence Position of object: Gaussian domain
Schema (analysis and features) Capture provenance Support type checking Support feature substitution
Hardware and software components Hardware requirements –Same as original system Software –PostgreSQL backend –JSP / JSF front end Migrate current PHP/JavaScript code into components
Architecture Diagram labels: Web Page; UI Generation; View; Menu; Table; Semantic Interface; DB Storage; Image; Cell; Dynamic JSF Components; Programmable Image API; Model API; Object Relational (PostgreSQL); HTML; XML
Overview Current system (UCSB) –Status of collection –Capabilities –Architecture Current system (CMU) –Status of collection –Capabilities –Architecture Joint system under development –Capabilities –Architecture Future –Layered databases –Integration with other databases BIRN OME metadata and schema exchange
Layered database Overlay model (interpretation) on image (raw) data Multiple interpretations of data URI references between databases Pro: Logical distinction, multiple interpretations, flexible implementation
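The layering idea (several interpretations overlaid on one raw image, linked only by URI) can be sketched with two in-memory stores. The URI scheme, host name, record fields and sample records are all invented for illustration; the slides do not specify them.

```python
from urllib.parse import urlparse

# Raw layer: image records keyed by URI path (invented sample data).
RAW_IMAGES = {"/images/42": {"file": "retina_042.tif"}}

# Interpretation layer: independent overlays that point at raw data by URI.
INTERPRETATIONS = [
    {"image_uri": "http://raw.example.org/images/42",
     "author": "lab_a", "label": "rod bipolar cell"},
    {"image_uri": "http://raw.example.org/images/42",
     "author": "lab_b", "label": "cone bipolar cell"},
    {"image_uri": "http://raw.example.org/images/7",
     "author": "lab_a", "label": "microtubule"},
]


def interpretations_for(image_uri, overlays):
    """Return every interpretation that references the given raw image."""
    return [o for o in overlays if o["image_uri"] == image_uri]


def resolve(image_uri, raw_store):
    """Follow a URI reference from the interpretation layer to the raw record."""
    return raw_store[urlparse(image_uri).path]
```

Because the overlays hold only a URI, the raw and interpretation layers could live in different databases, which is the flexibility the slide lists as an advantage.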
BIRN (Biomedical Informatics Research Network) Goals: –Link multiple databases with different schemas, maintained at different research institutions 19 universities, 26 research groups Current collection –Three test beds centered around brain imaging of human neurological disorders and associated animal models: Functional BIRN Morphometry BIRN Mouse BIRN
Integration with BIRN Databases at UCSB/CMU Centers can be integrated into the BIRN federation UCSB/CMU infrastructure supports –Extensive metadata for images –Standard XML interchange format for 5d images –Computational tools to refine data Web based visualization and analysis tools We need to: –Translate UCSB/CMU Schema to F-logic (Knowledge-based mediation) –Link UCSB/CMU dataset to UMLS (Unified Medical Language System) ontology –Reference a common spatial framework Standard atlas coordinate system, e.g., SMART Atlas
OME Open Microscopy Environment –A set of software that interacts with a database to manage images, image metadata, image analysis and analysis results Designed to perform as a local system Integration with OME –Adapt OME XML image interchange mechanism –Adapt the database-oriented modular analysis approach of OME
Conclusion Built prototype and collected ~4000 images –Being used internally Concurrent work on 2nd-generation system –Image loading –Integration of tools –New front end