Open Source distributed document DB for an enterprise

Slides:



Advertisements
Similar presentations
Proposal: Model-Driven SAL for the OpenDaylight Controller
Advertisements

New Release Announcements and Product Roadmap Chris DiPierro, Director of Software Development April 9-11, 2014
Database System Concepts and Architecture
Database Architectures and the Web
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
GGF Toronto Spitfire A Relational DB Service for the Grid Peter Z. Kunszt European DataGrid Data Management CERN Database Group.
Distributed Database Management Systems
Apache Jakarta Tomcat Suh, Junho. Road Map Tomcat Overview Tomcat Overview History History What is Tomcat? What is Tomcat? Servlet Container.
Understanding and Managing WebSphere V5
Talend 5.4 Architecture Adam Pemble Talend Professional Services.
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
Opensource for Cloud Deployments – Risk – Reward – Reality
Architecture Of ASP.NET. What is ASP?  Server-side scripting technology.  Files containing HTML and scripting code.  Access via HTTP requests.  Scripting.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
Module 1: Server Roles and Initial Configuration Tasks
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
— Build your own enterprise-class PaaS platform. Master Cloudset Cloudset1 Cloudset Resource Pool cloud Dedicated resource can be assigned to a cloudset.
Fundamentals of Database Chapter 7 Database Technologies.
DBSQL 14-1 Copyright © Genetic Computer School 2009 Chapter 14 Microsoft SQL Server.
Microsoft SharePoint Server 2010 for the Microsoft ASP.NET Developer Yaroslav Pentsarskyy
IMDGs An essential part of your architecture. About me
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Middleware for FIs Apeego House 4B, Tardeo Rd. Mumbai Tel: Fax:
Database Architectures Database System Architectures Considerations – Data storage: Where do the data and DBMS reside? – Processing: Where.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
Tool Integration with Data and Computation Grid GWE - “Grid Wizard Enterprise”
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
Distributed Computing Systems CSCI 4780/6780. Distributed System A distributed system is: A collection of independent computers that appears to its users.
INNOV-10 Progress® Event Engine™ Technical Overview Prashant Thumma Principal Software Engineer.
Copyright © 2006, GemStone Systems Inc. All Rights Reserved. Increasing computation throughput with Grid Data Caching Jags Ramnarayan Chief Architect GemStone.
Dispatching Java agents to user for data extraction from third party web sites Alex Roque F.I.U. HPDRC.
Tool Integration with Data and Computation Grid “Grid Wizard 2”
Ahmet Fuat – Bahçe ş ehir University İleri Seviyede Oracle Ön Bellek Mekanizması (Oracle Coherence)
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
Your Data Any Place, Any Time Beyond Relational. Overview of Beyond Relational Applications Today Beyond Relational Feature Overview Whirlwind Feature.
Abstract MarkLogic Database – Only Enterprise NoSQL DB Aashi Rastogi, Sanket V. Patel Department of Computer Science University of Bridgeport, Bridgeport,
Amazon Web Services. Amazon Web Services (AWS) - robust, scalable and affordable infrastructure for cloud computing. This session is about:
The Holmes Platform and Applications
Introduction to Oracle Forms Developer and Oracle Forms Services
Chapter 9: The Client/Server Database Environment
InGenius Connector Enterprise Microsoft Dynamics CRM
Introduction to Distributed Platforms
Netscape Application Server
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Introduction to Oracle Forms Developer and Oracle Forms Services
LOCO Extract – Transform - Load
GWE Core Grid Wizard Enterprise (
Spark Presentation.
Introduction to Oracle Forms Developer and Oracle Forms Services
The Client/Server Database Environment
Enabling Scalable and HA Ingestion and Real-Time Big Data Insights for the Enterprise OCJUG, 2014.
The Client/Server Database Environment
In-Memory Performance
PHP / MySQL Introduction
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
What is the Azure SQL Datawarehouse?
Mapping the Data Warehouse to a Multiprocessor Architecture
Oracle Architecture Overview
Introduction to Databases Transparencies
Overview of big data tools
Software models - Software Architecture Design Patterns
Introduction to Teradata
Technical Capabilities
Web Application Server 2001/3/27 Kang, Seungwoo. Web Application Server A class of middleware Speeding application development Strategic platform for.
Introduction of Week 11 Return assignment 9-1 Collect assignment 10-1
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
Database System Architectures
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
Mark Quirk Head of Technology Developer & Platform Group
Presentation transcript:

Open Source distributed document DB for an enterprise technical overview

PROJECT BAGRI: OVERVIEW Distributed system for storage and real-time processing of high volumes of semi-structured data documents Built on top of distributed cache solution like Coherence, Hazelcast, Infinispan, etc Designed to comply with enterprise level requirements providing high availability/fault tolerance/horizontal scalability/ACID transactions capabilities right out of the box Uses XQuery language for data processing Provides standard XQJ driver for client access Open sourced under Apache 2 license

THE MOST VALUABLE FEATURES Multi-document ACID transactions handling via multi-level concurrency control Does not require pre-registered schema, collects meta-data on the fly Can be used for any kind of semi-structured data processing XQuery requests are being transformed and optimized into queries against distributed cache Queries are performed in parallel on cache nodes and partitions Responses are streamed to clients asynchronously preventing client resources overload

UNDERLYING DISTRIBUTED CACHE PLATFORM Standalone Java SE app with built-in clustering capabilities Data distributed between cache partitions by consistent hashing algorithm applied on data keys Data can be accessed via Map interface or queried by predicates Used both as data and computation grid for massive parallel data processing Data processed in-place on the cache nodes by distributed tasks submitted by client; queries are also distributed Data can be loaded/stored in any kind of persistent store via simple and extensible CacheStore interface Clients access server cache interfaces via client proxies

HOW DISTRIBUTED CACHE CAPABILITIES ARE USED IN BAGRI All documents are stored in Schemas. Schema is like a database in RDBMS Every schema is handled by dedicated Distributed Cache cluster Document meta-data (unique paths) are stored in caches and replicated between all cluster nodes Document data are sliced and stored in distributed caches in share-nothing fashion Other distributed caches store indexed values, compiled queries, in-fly transactions, query results and other internal stuff Client connects to server via underlying cache mechanisms Client requests are wrapped to distributed tasks and processed on the server side Results are returned back via dedicated channel (queue) established between client and server

BAGRI COMPONENTS Bagri deployment layers are: Clustered data servers Administration server(s) Persistent Store Clients

OTHER UNIQUE BAGRI FEATURES XQJ (JSR-225) and XDM (proprietary) client interfaces Full access control, role based security Wide indexes capabilities: unique, range, case-insensitive, XPath (wildcards) Server-side modules implemented in XQuery or in Java Custom triggers intercepting all document-level operations Java Binding allows client/server handling of documents as POJOs All document changes are versioned Document formats supported: XML, JSON, POJO, Map, … Open extension APIs to connect external data formats and document stores …

PERFORMANCE ANALYSIS The system demonstrates linear scalability with no performance degradation on volume increase Servers: 2..8 nodes started on 2 metal boxes; 20 cores; 4G per server Client: 1 metal box, 24 cores, 2G Network: 10G Throughput: 40..70K query/sec Latency: 1,1..1,7 ms XDM (direct) interface is 50% faster

Standalone (embedded) mode Client-server (grid) mode DEPLOYMENT MODELS Standalone (embedded) mode Client-server (grid) mode Managed schema clusters Administration server (optional) Host A Host B “Test” cluster TPoX client “TPoX” cluster System cluster Host C Admin host

MANAGEMENT & MONITORING Management layer delivered as a dedicated Distributed Cache cluster All functionality exposed via set of MBeans deployed on administration server Rich API allows management and monitoring of every peace of system functionality: Set of MBeans to handle Roles, Users, Nodes, Schemas, Modules, Libraries, Extensions Set of MBeans to manage and monitor statistics on Schema details: Model, Documents, Indices, Queries, Transactions, Triggers, Clients.. One central place to manage schemas, schema resources and schema access rights Can be used from any standard JMX console or via API

MANAGEMENT UI Implemented as a custom plugin for VisualVM tool Can be used to visually configure common and schema-related resources Provides explorer-style interface to work with documents and collections Provides query console to perform XQueries and see query results Actively developed, open for new features and suggestions

BAGRI COMPETITIVE ADVANTAGES True horizontal scalability, transparent redundancy & high availability inherited from the underlying distributed cache platform An ability to handle any high data volumes in real-time with memory access speed The well-known XQuery language is used for any kind of data manipulation and transformation Standard syntax for joins, sorting, grouping and many other valuable features; driven by industrial community Multi-document ACID transactions supporting all standard transaction isolation levels …

CONTACTS Bagri site: http://bagridb.com Bagri project repository: http://www.github.com/dsukhoroslov/bagri Bagri extensions project repository: http://www.github.com/dsukhoroslov/bagri-extensions Bagri Google Group: https://groups.google.com/forum/#!forum/bagridb Mail to: support@bagridb.com

Thank You!