Regions of Interest

• What’s in a ROI?
• Use cases
• Requirements
• Current Storage System
• Problems
• Alternative Storage



• ROI
• Geometry
• Measurements
• ROI on Channel
• Annotations
  ▪ ROI
  ▪ Measurement
  ▪ Links

• User created ROI
• Measurement tools
• HCS generated ROI
• Automatic
• External
• External analysis
• Particle Tracking
• Other
• Templates
• ROIs without images

• Human generated
• More interactions
  ▪ Merge, Propagate, Split, Delete
• Measurements
  ▪ Geometry
  ▪ Intensity
  ▪ Path
• ROI/ROI Links
• Tags mostly on ROI
• Write Many/Read Many

• HCS Generated ROI
• Lots of ROI
• Attached to Channel
• Measurements Attached
  ▪ Multiple measurements
• Tags on ROI, Measurements
  ▪ Analysis, results and meta.
• Write Once, Read Many

• External Tool can Generate ROI (+ scripts)
• Can be tagged
• Links (ROI/ROI, ROI/Image)
• Results can be in any format

• ROI need not be attached to image
• Template to define other ROI

• N-Dimensional Data
• Storage of Image data simple
• ROI more complex
  ▪ Database entry, file format
• We don’t just want to store in HDF

• Database
• ROI
• ROI Annotations
• PyTables
• Mask ROI
• Measurements

• PyTables
• ROI are heterogeneous
• Concurrency
• Python behind a core service call
• Measurements are optimal
• Tagging is an issue
  ▪ Inside file
  ▪ Multiple annotations reported to be slow

• ROI can be stored in database
• Mask data can be an issue
• Tagging in RDB not best
• Many more annotations than we’d like
• Link to external source for measurements

• Key-Value Pair Stores
  ▪ Berkeley DB
  ▪ Project Voldemort
  ▪ Tokyo Cabinet
• Document DB
  ▪ MongoDB
  ▪ CouchDB
• Graph DB
  ▪ Neo4J
  ▪ InfoGrid
• Table DB
  ▪ Cassandra
  ▪ Hypertable
  ▪ HBase

• Other opinions on the storage solutions
• MongoDB vs CouchDB, Cassandra, ..
• CouchDB vs MongoDB
• Pros and cons of MongoDB
• Digg on Cassandra
• What is a supercolumn
• Cassandra talk
• Indexing nodes in Neo4J

• Document Database
• NoSQL movement
• Schemaless
• No Tables
  ▪ Collections of like data
• No Joins
  ▪ Document is equivalent of row of data
  ▪ Distributed file system (GridFS)

Pros
• It has bindings to numerous languages (C++, C#, Java, Python, ...).
• Allows storage, indexing and linking of any user data
• Annotations are now very easy and efficient
• Has mechanisms for schema upgrade
• Dynamic queries
• Replication
• Sharding
• Map-Reduce framework
• Fast
• GridFS is a distributed file storage mechanism within Mongo.
• Easy to install

Cons
• Schemaless, so data integrity will need to be worked on.
• Graph structures not inherently supported.
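The "schemaless" con means integrity checks move into application code. Below is a minimal sketch of such a check, assuming the ROI document layout used in the insert example in this talk (`type`, `shapes`, `cx`, `cy`, `rx`, `ry`). The names `REQUIRED_SHAPE_FIELDS` and `validate_roi` are hypothetical and not part of MongoDB or OMERO.

```python
# Hypothetical application-side check run before an ROI document is stored.
# The required fields per shape type are an assumption for illustration.
REQUIRED_SHAPE_FIELDS = {
    "Ellipse": {"cx", "cy", "rx", "ry"},
}


def validate_roi(doc):
    """Return a list of problems found in an ROI document (empty if none)."""
    problems = []
    if doc.get("type") != "Roi":
        problems.append("document type is not 'Roi'")
    shapes = doc.get("shapes")
    if not isinstance(shapes, list):
        return problems + ["'shapes' must be a list"]
    for i, shape in enumerate(shapes):
        shape_type = shape.get("type")
        required = REQUIRED_SHAPE_FIELDS.get(shape_type)
        if required is None:
            problems.append(f"shape {i}: unknown type {shape_type!r}")
            continue
        missing = sorted(required - set(shape))
        if missing:
            problems.append(f"shape {i}: missing {missing}")
    return problems
```

A caller would refuse to insert any document for which `validate_roi` returns a non-empty list.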

DEPLOYMENTS
• SourceForge
• BusinessInsider
• New York Times
• Disqus

Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✓
• Intensity ✓
• Path ✓
• ROI/ROI Links ✓
• Tags ✓

HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✓

Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links
• Many formats, unknown types ✓

Other
• N-Dimensional ROI ✓
• Hierarchical Structures ✓

from pymongo import MongoClient

# Connect to a MongoDB server (defaults to localhost) and pick a collection.
client = MongoClient()
db = client['databaseName']
collection = db['collectionName']

# Store one ROI document holding two ellipse shapes, each with its own tags.
collection.insert_one({
    "tags": [],
    "label": "MyROI",
    "shapes": [
        {"tags": [{"tag": "foo1", "namespace": "bob"}],
         "rx": 17, "ry": 17, "label": None, "cy": 75, "cx": 3,
         "t": 0, "z": 0, "type": "Ellipse", "id": 3},
        {"tags": [{"tag": "foo2", "namespace": "bob"}],
         "rx": 10, "ry": 16, "label": None, "cy": 82, "cx": 45,
         "t": 0, "z": 0, "type": "Ellipse", "id": 5},
    ],
    "type": "Roi",
    "id": 565,
})

import re
from pymongo import MongoClient

client = MongoClient()
db = client['databaseName']
collection = db['collectionName']

# Find ROI shapes with a tag containing "mitosis" (case-insensitive).
collection.find({"shapes.tags.tag": re.compile(".*mitosis.*", re.IGNORECASE)})

# Find ROIs with tag "foofoo" and shapes with tag "foo1".
collection.find({"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})
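Queries such as `"shapes.tags.tag"` use dot notation to reach into embedded documents and arrays. The sketch below is a pure-Python approximation of how such a path can be resolved against a nested document. It is not MongoDB's implementation, and `values_at`, `matches` and the sample `roi` are hypothetical names for illustration only.

```python
import re


def values_at(doc, path):
    """Collect every value reachable by a dotted path, fanning out over lists."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for item in current:
            # A list at this level means any of its elements may match.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    # Flatten a trailing list so that array fields match per element.
    flat = []
    for value in current:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def matches(doc, query):
    """True if every path in the query has at least one matching value."""
    for path, wanted in query.items():
        found = values_at(doc, path)
        if isinstance(wanted, re.Pattern):
            ok = any(isinstance(v, str) and wanted.search(v) for v in found)
        else:
            ok = wanted in found
        if not ok:
            return False
    return True


# Sample document (made up for this sketch) in the layout used above.
roi = {
    "type": "Roi",
    "tags": [{"tag": "foofoo", "namespace": "bob"}],
    "shapes": [
        {"type": "Ellipse", "tags": [{"tag": "foo1", "namespace": "bob"}]},
        {"type": "Ellipse", "tags": [{"tag": "early mitosis", "namespace": "bob"}]},
    ],
}
```

With this sample, both `matches(roi, {"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})` and `matches(roi, {"shapes.tags.tag": re.compile("mitosis", re.IGNORECASE)})` hold, mirroring the two queries above.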

• Graph Database
• Uses nodes to represent objects
• User specifies relationship between nodes
• Allows complex traversal of node structures
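The node, relationship and traversal ideas can be sketched with a tiny in-memory property graph. This is an illustration of the concept only, not Neo4j's API. The class `Graph` and its methods are hypothetical, and the relationship names `DERIVE` and `ASSOCIATE` are borrowed from the Java example in this talk.

```python
from collections import deque


class Graph:
    """A tiny in-memory property graph with typed, directed relationships."""

    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.edges = {}  # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def traverse(self, start, rel_type):
        """Breadth-first walk that follows only one relationship type."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            for kind, target in self.edges[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
g.add_node("image", name="original")
g.add_node("crop", name="cropped")
g.add_node("projection", name="projection of crop")
g.add_node("roi", name="nucleus")
g.relate("image", "DERIVE", "crop")
g.relate("crop", "DERIVE", "projection")
g.relate("image", "ASSOCIATE", "roi")
```

Here `g.traverse("image", "DERIVE")` walks the chain of derived images, while the same start node followed along `ASSOCIATE` reaches only the ROI.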

PROS
• Handles graph structures nicely
• Transactional
• Supported by Gremlin
• Native RDF (rdf-sail/)
• Easy to install

CONS
• No C++ language binding.
• Not distributed.
• Tables are not so easily modeled.
• Difficult to query on node contents

DEPLOYMENTS
• The Swedish Defence forces
• Windh Technologies
• Flextoll

// Relationship types used to link OMERO objects in the graph.
public enum OMERORelations implements RelationshipType {
    ASSOCIATE, DERIVE, AGGREGATE, COMPOSE
}

// Node for the original image.
Node image = neo.createNode();
image.setProperty("IObject", imageI);
image.setProperty("id", imageI.getId().getValue());
image.setProperty("name", imageI.getName().getValue());

// Node for the image derived from it.
Node derivedImage = neo.createNode();
derivedImage.setProperty("IObject", derivedImageI);
derivedImage.setProperty("id", derivedImageI.getId().getValue());
derivedImage.setProperty("name", derivedImageI.getName().getValue());

// Link the two nodes and record how the derived image was produced.
Relationship relationship = image.createRelationshipTo(
    derivedImage, OMERORelations.DERIVE );
relationship.setProperty("type", "ROI");
relationship.setProperty("operation", "crop");
relationship.setProperty("roi", cropRoiI);

Human Interaction
• Merge, Propagate, Split ✓
• Geometry
• Intensity
• Path ✓
• ROI/ROI Links ✓
• Tags

HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements

Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✓
• Many formats, unknown types

Other
• N-Dimensional ROI
• Hierarchical Structures ✓

An implementation of Google’s Bigtable model: a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system, and Digg have released LazyBoy to work with Cassandra. It works by creating a table whose columns are grouped together into column families; like data will exist in the same column family (for example, Ellipse ROI).
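The column-family idea can be sketched as nested maps: row key, then column family, then column name. This is a conceptual illustration only, not the Cassandra data model or client API. `ColumnFamilyTable`, the family names and the row-key scheme are assumptions made for the example.

```python
class ColumnFamilyTable:
    """Row key -> column family -> column name -> value; rows may be sparse."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed when the table is made
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        """Return all columns stored for one row in one family."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


# Like data (ellipse geometry) lives in one family, tags in another.
table = ColumnFamilyTable(["ellipse", "tags"])
for column, value in {"cx": 3, "cy": 75, "rx": 17, "ry": 17}.items():
    table.put("roi:565:shape:3", "ellipse", column, value)
table.put("roi:565:shape:3", "tags", "bob", "foo1")
# A different row is free to carry a different set of columns.
table.put("roi:565:shape:5", "ellipse", "rx", 10)
```

Reading one family for one row key returns only the columns that row actually has, which is how different rows can have different columns.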

Pros
• Quick
• Handles heterogeneous data well
• Different rows can have different columns
• Can manage distributed data
• Map/Reduce
• Focus on writes not reads
• Scales nicely
• Easy to install

Cons
• Not simple to work with
• Building hierarchical structures
• Sorting
• Querying
  ▪ Ad hoc queries are bad; Digg still use MySQL for certain queries.
• Have to manage secondary indexes (K/V)
• Version 0.5
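"Have to manage secondary indexes" means the application keeps a second key/value mapping in step with every write. A minimal sketch of that bookkeeping follows, using plain dictionaries in place of a real store. `TaggedStore` and its methods are hypothetical names.

```python
class TaggedStore:
    """Key/value store with a hand-maintained secondary index on tags."""

    def __init__(self):
        self.records = {}  # primary key -> record
        self.by_tag = {}   # tag -> set of primary keys (the secondary index)

    def put(self, key, record):
        self.delete(key)  # drop stale index entries before overwriting
        self.records[key] = record
        for tag in record.get("tags", []):
            self.by_tag.setdefault(tag, set()).add(key)

    def delete(self, key):
        old = self.records.pop(key, None)
        if old is None:
            return
        for tag in old.get("tags", []):
            keys = self.by_tag.get(tag, set())
            keys.discard(key)
            if not keys:
                self.by_tag.pop(tag, None)

    def find_by_tag(self, tag):
        """Look up primary keys through the index instead of scanning records."""
        return sorted(self.by_tag.get(tag, set()))


store = TaggedStore()
store.put("roi:1", {"tags": ["mitosis"]})
store.put("roi:2", {"tags": ["mitosis", "foo1"]})
store.put("roi:1", {"tags": ["foo1"]})  # the overwrite must also fix the index
```

The overwrite of `roi:1` shows the cost: every update touches both the record and each index entry that pointed at the old version.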

Deployments
• Facebook (MAYBE!!)
• Digg

Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✓
• Intensity ✓
• Path
• ROI/ROI Links
• Tags ✓

HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✓

Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✓
• Many formats, unknown types

Other
• N-Dimensional ROI ✓
• Hierarchical Structures

An implementation of Google’s Bigtable model: a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system. Hypertable has a query language called HQL. It works by creating a table whose columns are grouped together into column families; like data will exist in the same column family (for example, Ellipse ROI).

Pros
• Quick
• Handles heterogeneous data well
• Different rows can have different columns
• Can manage distributed data
• Map/Reduce
• Scales nicely
• Easy to install

Cons
• GPL License
• Building hierarchical structures
• Docs are weak
• HQL works for simple queries only
• Map/Reduce for other work
• Limit of 255 column families
• Secondary keys

Deployments
• Rediff
• Zvents

Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✓
• Intensity ✓
• Path
• ROI/ROI Links
• Tags ✓

HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✓

Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✓
• Many formats, unknown types

Other
• N-Dimensional ROI ✓
• Hierarchical Structures

• Why do we have an RDBMS?
• We don’t normalise the data
• Each import will normalise on:
  ▪ Image, ObjectiveSettings, LogicalChannel, LightSettings, DetectorSettings.
• Object Penalty
• Difference between normalisation and view