Methodological Foundations of Biomedical Informatics (BMSC-GA 4449) Himanshu Grover.



Big Data in Biology/Healthcare
3 Vs:
- Volume
- Velocity
- Variety (richness/complexity; structured, semi-structured, unstructured)
Examples:
- Omics technologies (proteomics, genomics/NGS, metabolomics, etc.)
- Clinical data: EMRs (patients, providers, medications, procedures, symptoms, diagnoses, financials)
Need simple and advanced analytics: exploratory analyses and knowledge discovery, complex visualizations, reporting, operations, etc.

Utilization: Persistence + Analytics
File System vs. Databases
File system:
1. Ex. ASCII, semi-structured (XML), binary
2. Overhead in parsing
3. No indexing/search/filtering
4. Too large
Databases:
1. Efficient storage and access
2. Analytics

Example: Proteomics
MS/MS spectrum information (identifier, peaks, peptide info):
{'experimentName' : ' ',
 'filename' : 'GPM mgf',
 'scan' : 3,
 'mz' : ,
 'expPeaks' : [ { 'mz' : , 'intensity' : 14.0},
                { 'mz' : , 'intensity' : 23.0},
                { 'mz' : , 'intensity' : 19.0},
                { 'mz' : , 'intensity' : 22.0},
                ... ]
}
...

Relational Databases
Mapping the spectrum document (identification, peaks, peptide info) to relational tables:
- v1 (un-normalized): Spectrum table with columns id/pk, Exp Name, File Name, scan, ..., Peaks
- v2 (un-normalized): Spectrum table with columns id/pk, Exp Name, File Name, scan, ..., Peak1, Peak2, ...
- v3 (normalized): Spectrum table (id/pk, Exp Name, File Name, scan, ...) plus Peaks table (id/pk, Fk, mz, int)
Problems:
- Impedance mismatch
- Difficulty running on clusters

Spectrum Ex. cont'd
Example of a 1-to-many relationship
Un-normalized schema:
- redundancy and disk wastage
- non-uniformity (ex. different numbers of peaks per spectrum)
- query ability varies in blob storage
Normalized schema:
- effective, but requires joins
Other examples (relationship types?): proteins-to-peptides; genes-to-proteins; patients-to-diseases
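To make the "requires joins" point concrete, here is a minimal sketch of a normalized spectrum/peaks layout using Python's built-in sqlite3 module. The table names, column names, and row values are illustrative assumptions, not the schema or data from the slides.

```python
import sqlite3

# Hypothetical normalized schema: one row per spectrum, one row per peak.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spectrum (
        id        INTEGER PRIMARY KEY,
        exp_name  TEXT,
        file_name TEXT,
        scan      INTEGER
    );
    CREATE TABLE peaks (
        id          INTEGER PRIMARY KEY,
        spectrum_id INTEGER REFERENCES spectrum(id),  -- foreign key (Fk)
        mz          REAL,
        intensity   REAL
    );
""")

# Placeholder values, used only for illustration.
conn.execute("INSERT INTO spectrum VALUES (1, 'demo-experiment', 'demo.mgf', 3)")
conn.executemany(
    "INSERT INTO peaks (spectrum_id, mz, intensity) VALUES (?, ?, ?)",
    [(1, 792.6, 14.0), (1, 874.6, 23.0)],
)

# Reassembling one spectrum together with its peaks needs a join.
rows = conn.execute("""
    SELECT s.scan, p.mz, p.intensity
    FROM spectrum AS s
    JOIN peaks AS p ON p.spectrum_id = s.id
    WHERE s.id = 1
    ORDER BY p.mz
""").fetchall()
```

Each spectrum can have any number of peak rows, so the non-uniformity problem goes away, at the price of a join whenever a whole spectrum is read.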

Not Only SQL (NoSQL) / Non-Relational
Key features:
- Aggregate orientation: closely related data that is accessed as a unit (an aggregate) is stored together, which leads to faster read/write operations
- Facility for rich structure
- Easier to program data access (application development productivity)
- Application/context-specific, unlike the generic relational data model (database as an integration point)
Representation:
- Key-value stores (ex. Riak, Redis, etc.)
- Column-family stores (ex. Cassandra, HBase, etc.)
- Document-oriented stores (ex. MongoDB, CouchDB, etc.)
- Graph databases (ex. Neo4j, etc.)

Why MongoDB: Flexible
Collections (≈ tables) of documents (≈ rows)
Documents = set of key-value pairs (ex. Python dict, Java HashMap, etc.)
doc = { '_id' : , : , : { : , : }, : [ { : }, { : val 42 }, ... ] }
Kinds of fields:
- Simple: 'experimentName' : ' '
- Embedded/hierarchical
- List (non-uniform): ex. 'peaks' : [{'mz' : 792.6, 'int' : 14.0}, {'mz' : 874.6, 'int' : 23.0}, ...]
Non-uniform and dynamic
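Since the slide compares documents to Python dicts, the three kinds of fields can be sketched with plain dictionaries. The two peak values come from the slide's own 'peaks' example. Every other name and value below (the 'source' sub-document, 'charge', the second document) is a placeholder assumed for illustration.

```python
# Illustrative document; most values are placeholders, not real data.
spectrum = {
    "_id": 1,
    "experimentName": "demo-experiment",            # simple key-value pair
    "source": {"filename": "demo.mgf", "scan": 3},  # embedded (hierarchical) document
    "peaks": [                                      # list of embedded documents
        {"mz": 792.6, "int": 14.0},
        {"mz": 874.6, "int": 23.0},
    ],
}

# Non-uniform: a second document in the same collection may have a different
# number of peaks and a field ("rt") that the first one lacks.
other = {
    "_id": 2,
    "experimentName": "demo-experiment",
    "peaks": [{"mz": 500.0, "int": 1.0}],
    "rt": 0.0,
}

collection = [spectrum, other]
peak_counts = [len(doc["peaks"]) for doc in collection]

# Dynamic: a field can be added to one document later without a schema change.
spectrum["charge"] = 2
```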

Collection: SpectrumArchive
{'_id' : ObjectId('52c ded5b32082bb5'),
 'experimentName' : ' ',
 'filename' : 'GPM mgf',
 'scan' : 1749,
 'mz' : ,
 'intensity' : 0.0,
 'rt' : 0.0,
 'expPeaks' : [ { 'mz' : , 'intensity' : 14.0},
                { 'mz' : , 'intensity' : 23.0},
                { 'mz' : , 'intensity' : 19.0},
                { 'mz' : , 'intensity' : 22.0},
                ... ]
}
...

Data Modeling: Design Choices
Document structure (entities/aggregates):
- Data access patterns => what is accessed together must go together
Relationships:
- 1-to-few, 1-to-many, many-to-many
- Embedding (de-normalized) vs. referencing (normalized)
- Cardinality of a relationship may be unbounded and/or quite large (in some cases)
Document growth issue
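The embedding vs. referencing choice can be sketched with plain Python structures. This is an illustration under assumed field names ('spectrum_id' and the helper functions are hypothetical), not a prescribed MongoDB schema.

```python
# Embedding (de-normalized): peaks live inside the spectrum document,
# so one read returns the whole aggregate.
embedded = {
    "_id": 1,
    "scan": 3,
    "peaks": [{"mz": 792.6, "int": 14.0}, {"mz": 874.6, "int": 23.0}],
}

# Referencing (normalized): peaks are separate documents that point back
# to their spectrum through a hypothetical "spectrum_id" field.
spectra = [{"_id": 1, "scan": 3}]
peak_docs = [
    {"_id": 10, "spectrum_id": 1, "mz": 792.6, "int": 14.0},
    {"_id": 11, "spectrum_id": 1, "mz": 874.6, "int": 23.0},
]


def peaks_embedded(doc):
    """One lookup: the peaks travel with the document."""
    return doc["peaks"]


def peaks_referenced(spectrum_id, all_peaks):
    """Second lookup: the application resolves the reference itself."""
    return [
        {"mz": p["mz"], "int": p["int"]}
        for p in all_peaks
        if p["spectrum_id"] == spectrum_id
    ]
```

Embedding suits 1-to-few relationships that are read together. Referencing keeps each document small when the "many" side is large or unbounded, which is where the document growth issue comes from.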

Why MongoDB: Distribution
Aggregates are a natural unit of interaction as well as distribution:
- no notion of joins
- scale out storage and processing
Two forms:
- Replication
- Sharding
Diagram: application code talks to MongoS, which routes to MongoD - 1, MongoD - 2, ..., MongoD - 5, ...
- Automatic data distribution
- Seamless distributed query processing and analytics
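The idea behind sharding, that whole documents are assigned to nodes by a key and queries fan out across nodes, can be shown with a toy sketch. This is not MongoDB's actual balancing or routing algorithm. The function names, the choice of 'scan' as the key, and the hash-modulo placement are all assumptions made for illustration.

```python
import hashlib


def shard_for(key, n_shards):
    """Map a shard-key value to a shard index with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


def distribute(docs, shard_key, n_shards):
    """Place each whole document (aggregate) on exactly one shard."""
    shards = [[] for _ in range(n_shards)]
    for doc in docs:
        shards[shard_for(doc[shard_key], n_shards)].append(doc)
    return shards


def scatter_gather(shards, predicate):
    """Run the same filter on every shard and merge the results."""
    return [doc for shard in shards for doc in shard if predicate(doc)]


# Placeholder documents: 100 spectra with a made-up peak count each.
docs = [{"scan": i, "n_peaks": i % 4} for i in range(100)]
shards = distribute(docs, "scan", 3)
no_peaks = scatter_gather(shards, lambda d: d["n_peaks"] == 0)
```

Because each spectrum document is self-contained, no shard needs data from another shard to answer the filter. That is the practical meaning of aggregates being the unit of distribution.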

Why MongoDB: Other Features
- Powerful query language and operators, including the ability to look into nested/embedded documents and arrays/lists
- Secondary indexing
- Performant and extensive analytics operators over distributed databases

Demo
Basic CRUD:
- C: Create
- R: Read
- U: Update
- D: Delete
Shown two ways:
- From the mongo shell
- From the PyMongo driver
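The demo itself is not in the transcript, so the following is only a sketch of what one CRUD cycle might look like through PyMongo. The calls insert_one, find_one, update_one, and delete_one are standard PyMongo collection methods. The database name "proteomics", the server URI, and the document values are assumptions.

```python
def crud_demo(collection):
    """Run one create/read/update/delete cycle on a PyMongo-style collection."""
    # Create: insert a document (placeholder values) and keep its generated _id.
    result = collection.insert_one(
        {
            "experimentName": "demo-experiment",
            "scan": 3,
            "expPeaks": [{"mz": 792.6, "intensity": 14.0}],
        }
    )
    doc_id = result.inserted_id

    # Read: fetch the document back by its _id.
    found = collection.find_one({"_id": doc_id})

    # Update: set one field without rewriting the whole document.
    collection.update_one({"_id": doc_id}, {"$set": {"rt": 12.5}})
    updated = collection.find_one({"_id": doc_id})

    # Delete: remove the document; the final lookup should find nothing.
    collection.delete_one({"_id": doc_id})
    remaining = collection.find_one({"_id": doc_id})
    return found, updated, remaining


def connect(uri="mongodb://localhost:27017"):
    """Return a demo collection; needs pymongo installed and a running server."""
    from pymongo import MongoClient  # imported lazily so this file loads without pymongo

    # "proteomics" is a hypothetical database name chosen for this sketch.
    return MongoClient(uri)["proteomics"]["SpectrumArchive"]
```

With a server available, crud_demo(connect()) would run the cycle against the SpectrumArchive collection. The mongo shell has matching methods named insertOne, findOne, updateOne, and deleteOne.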

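Some Limitations of MongoDB
- No built-in atomic transactions across multiple documents or collections (this held for the MongoDB versions current when these slides were written; multi-document transactions were added in MongoDB 4.0)
- Difficult to model many many-to-many relationships (use graph databases instead)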

Some Resources
NoSQL Distilled: broad discussion of different categories of NoSQL databases
MongoDB-specific:
- MongoDB website: user manual, public talks/presentations
- MongoDB University: "MongoDB for Developers" course uses Python
- MongoDB: The Definitive Guide

Take Home
- No one size fits all
- Choice depends on application requirements