IKT437 Knowledge Engineering and Representation

NoSQL
Terje Gjøsæter, Ph.D., UiA, Grimstad, 16 November 2015

Overview
- Introduction and Motivation
- History of NoSQL
- Categories of NoSQL
- Examples of NoSQL systems
- Encodings
- Querying
- Examples
- Summary

Introduction
NoSQL has become increasingly popular and important lately.
NoSQL: "No SQL", or "Not Only SQL"?
There are many different variants, covering many different needs and use cases.
So what is NoSQL? Every data store that is not an SQL-based RDBMS?
Q: Opinions?

Typical characteristics
- Non-relational
- Flexible schema
- Less structured data
- Supports big data
- Query languages other than, or in addition to, SQL
- Distributed: horizontal scaling
- Eventual consistency: a trade-off due to the CAP theorem
Q: Are you all familiar with the CAP theorem and consistency models?

CAP Theorem
It is impossible for a distributed system to provide all three of the following guarantees at the same time:
- Consistency: all nodes see the same data at the same time.
- Availability: every request receives a response about whether it succeeded or failed.
- Partition tolerance: the system continues to operate despite partitioning due to network failures.
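To make the trade-off concrete, here is a minimal sketch of a hypothetical two-node register (the class `TwoReplicaRegister` and its `"CP"`/`"AP"` modes are invented for illustration and do not come from any real database). During a network partition a write cannot reach the other node, so the system must either refuse it (keeping consistency, giving up availability) or accept it locally (staying available, giving up consistency).

```python
class TwoReplicaRegister:
    """Hypothetical two-node register illustrating the CAP trade-off.

    mode="CP": refuse writes during a partition (consistent, not available).
    mode="AP": accept writes locally during a partition (available, not consistent).
    """

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"n1": 0, "n2": 0}
        self.partitioned = False  # True when n1 and n2 cannot reach each other

    def write(self, node: str, value: int) -> str:
        other = "n2" if node == "n1" else "n1"
        if not self.partitioned:
            # Normal operation: update both replicas before acknowledging.
            self.values[node] = value
            self.values[other] = value
            return "ok"
        if self.mode == "CP":
            # The other replica is unreachable: sacrifice availability.
            return "unavailable"
        # AP: acknowledge the local write; the replicas now disagree.
        self.values[node] = value
        return "ok"

    def read(self, node: str) -> int:
        return self.values[node]


results = {}
for mode in ("CP", "AP"):
    register = TwoReplicaRegister(mode)
    register.partitioned = True  # simulate a network failure
    status = register.write("n1", 42)
    results[mode] = (status, register.read("n1"), register.read("n2"))
    print(mode, results[mode])
```

Without a partition both modes behave identically; the choice between consistency and availability is only forced when the network splits.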

Consistency Models (from distributed computing)
Eventual consistency: a weak consistency model. If no new updates are made to a data item for a long enough time, all replicas eventually become consistent and return the last written value.
Strict consistency: the strongest consistency model. It requires that whenever a process reads a memory location, the read returns the value written by the most recent write to that location.
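Eventual consistency can be sketched with three in-memory replicas. The conflict rule used here, last-writer-wins on a timestamp, is one common choice and an assumption of this sketch (it also assumes timestamps are unique); the `Replica` class and `sync_from` method are hypothetical. Each replica first accepts writes independently, so reads disagree; once updates stop and the replicas exchange state, they converge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica storing key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self._apply(key, ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def _apply(self, key, ts, value):
        # Last-writer-wins: keep whichever version has the newest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def sync_from(self, other):
        # Anti-entropy: copy over any newer versions held by another replica.
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)


replicas = [Replica("a"), Replica("b"), Replica("c")]
a, b, c = replicas
a.write("post:1", "draft", ts=1)   # accepted by replica a only
b.write("post:1", "edited", ts=2)  # a later update accepted by replica b only
before = [r.read("post:1") for r in replicas]
print(before)  # the replicas disagree for now

# No new updates arrive; the replicas exchange state pairwise.
for x in replicas:
    for y in replicas:
        if x is not y:
            x.sync_from(y)
after = [r.read("post:1") for r in replicas]
print(after)
```

The temporary disagreement in `before` is the kind of effect behind a post that briefly disappears and then shows up again.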

Motivation
Why NoSQL? Less structured databases are needed, since not all data fit into a relational, table-based structure.
Social Media and Big Data are the big drivers for new database types: their data tends to be less structured and too big for a traditional RDBMS.
Let's briefly introduce the data storage needs of Social Media and Big Data.

Social Media and Web 2.0 – Example of Big Data
Google, Facebook, Twitter, Instagram, Amazon and Yahoo, among others, need to store and handle enormous amounts of data.
These data tend to have different characteristics and requirements compared to «typical» structured database data:
- Less strict structure in the data.
- Need for a way to distribute data across clusters that is easy to manage and use.
- Different requirements for consistency (see the CAP theorem). Example: sometimes we see a post on Facebook disappear and then show up again later.

Big Data – Early Definitions
- "Large data sets, taxing the capacities of main memory, local disk, and even remote disk" (NASA, 1997)
- "Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges" (Oxford English Dictionary)
- "Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze" (McKinsey)

Handling and storage of «too big» data
(Image source: Georgia Tech Library, http://d7.library.gatech.edu)

Meanwhile: Big Data – Opportunity-Enablers
- Data Warehousing: a central repository of integrated (and highly structured) data for reporting and analysis.
- Business Intelligence: analysing data to make informed business decisions.
- Data Mining: searching for interesting trends and patterns in data.

Defining Big Data as an Opportunity
Mayer-Schönberger & Cukier 2013: “The ability of society to harness information in novel ways to produce useful insights…” and “…things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”
(Image license: CC0 Public Domain)

Big Data can be…
First mention, 1997 (NASA): “data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.”
Oxford English Dictionary: “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”
McKinsey: “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”
All these initial definitions tended to focus on the challenges of handling and storage: «too big» data.
(Image: «Security and privacy». Source: Wikimedia Commons, Camelia Boban, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.)

Big Data Aspects and Life Cycle Overview
- Selection, Harvesting, Data Integration: Store all? Include external data from different sources?
- Structuring and Storage: Structured or unstructured? With meta-information? SQL or NoSQL?
- Analysis, Visualisation: machine learning; graphs, maps.
- Protection and Usage Policy: security, privacy; sharing policy.

NoSQL to the Rescue!
NoSQL is able to cover the needs of Social Media and Big Data.
But different variants of NoSQL also support:
- Small data
- Simple data
- Awkwardly shaped data
- Funny data
- Odd data

Typical Features of NoSQL
- Running well on clusters
- Mostly open-source
- Schema-less
- No need to convert your data to and from a relational data model; you can use the data model of your software directly.

History of NoSQL
Q: When did people first start talking about NoSQL?
The term NoSQL was used by Carlo Strozzi in 1998 to name his lightweight open-source relational database, Strozzi NoSQL, which did not expose the standard SQL interface but was still relational.
Johan Oskarsson of Last.fm reintroduced the term NoSQL in early 2009 when he organised an event to discuss "open source distributed, non relational databases".
Most early NoSQL systems did not support ACID transactions and joins. This has been changing lately…
Q: Are you all familiar with the ACID requirements for databases?

ACID
- Atomicity: database modifications must follow an all-or-nothing rule. Each transaction is said to be atomic: if one part of the transaction fails, the entire transaction fails.
- Consistency: only valid data will be written to the database.
- Isolation: multiple transactions occurring at the same time must not impact each other's execution.
- Durability: any transaction committed to the database will not be lost.
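As a reference point for what many early NoSQL systems left out, the following sketch uses SQLite (a relational database available through Python's standard `sqlite3` module) to show atomicity and consistency. The `account` table, the `transfer` and `balances` helpers, and the amounts are hypothetical. A transfer first credits the receiver and then debits the sender; when the debit violates the `CHECK` constraint, the whole transaction, including the credit, is rolled back. Isolation and durability are not demonstrated (an in-memory database is not durable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # only valid data may be stored
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src inside one transaction: both happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit to dst
        # made earlier in the same transaction has been rolled back as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
print(ok, failed, balances(conn))
```

The failed transfer leaves both balances exactly as they were after the first one: nothing is half-applied.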

Categories of NoSQL – Key-value-based
Key-value store: supports a dictionary or map of key-value pairs. The value may be simple, or an unstructured or semi-structured blob of data. Often used as a basis for more complex data models.
Wide column store: a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.

Categories of NoSQL – Document-oriented
Document store: supports storing, retrieving, and managing document-oriented information. Documents encapsulate and encode data in some standard format or encoding.
XML database: a subclass of document-oriented databases that are optimized to extract their metadata from XML documents.
Object store: each object includes the data itself, a variable amount of metadata, and a globally unique identifier. Used, for example, for storing photos on Facebook, songs on Spotify, or files in an online collaboration service such as Dropbox.

Categories of NoSQL – Graph-based
Graph database: uses graph structures with nodes, edges and properties to represent and store data, and supports semantic queries over them.
Triplestore (RDF): a variant of graph-based storage. Stores triples of the form subject-predicate-object, e.g. Alice knows Bob; Bob has Cat; Cat catches Mouse; Alice fears Mouse.
Adding a name to each triple makes a "quad store" or named graph.
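The triple example can be stored and queried with a few lines of plain Python. This is a minimal sketch, not a real triplestore: `match` is a hypothetical helper that treats `None` as a wildcard, and chaining several patterns through shared values is roughly what a SPARQL query with multiple triple patterns does declaratively.

```python
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("Alice", "knows", "Bob"),
    ("Bob", "has", "Cat"),
    ("Cat", "catches", "Mouse"),
    ("Alice", "fears", "Mouse"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern, where None acts as a wildcard."""
    return sorted(
        (ts, tp, to)
        for (ts, tp, to) in store
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    )


print(match(triples, s="Alice"))

# Chaining patterns joins triples through shared values:
# who has the animal that catches what Alice fears?
answers = []
for _, _, feared in match(triples, s="Alice", p="fears"):
    for hunter, _, _ in match(triples, p="catches", o=feared):
        for owner, _, _ in match(triples, p="has", o=hunter):
            answers.append((owner, hunter, feared))
print(answers)
```

A quad store would use 4-tuples, with the graph name as the extra element.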

Categories of NoSQL – Hybrids
Multi-model: supports multiple data models against a single, integrated backend. May also contain relational elements.
MultiValue: differs from an RDBMS in that it supports and encourages the use of attributes that can take a list of values, rather than all attributes being single-valued.

Key-Value Databases
Key-value systems treat the data as a single opaque collection, which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Because optional values are not represented by placeholders as in most relational databases, key-value stores often use far less memory to store the same data, which can lead to large performance gains in certain workloads.
Examples: CouchDB, Oracle NoSQL Database, Dynamo, MemcacheDB, Redis
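A minimal in-memory sketch of the idea, assuming values are stored as opaque bytes (here JSON-encoded by the application). The `KeyValueStore` class is hypothetical and not the API of any of the listed products. It shows both sides of the design: records can have different fields with no placeholders, and lookup by key is trivial, but querying by value forces the application to fetch and decode every record.

```python
import json
from typing import Dict, List, Optional


class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque bytes to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def keys(self) -> List[str]:
        return list(self._data)


store = KeyValueStore()
# Records can have different fields; missing fields need no placeholders.
store.put("session:42", json.dumps({"user": "alice", "cart": ["book"]}).encode())
store.put("session:43", json.dumps({"user": "bob"}).encode())

# Lookup by key is the fast, natural operation.
print(json.loads(store.get("session:42"))["cart"])

# Querying by value: the application has to fetch and decode every record itself.
with_cart = [k for k in store.keys() if "cart" in json.loads(store.get(k))]
print(with_cart)
```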

Column-oriented Databases
Wide column store: the names and format of the columns can vary from row to row in the same table. A column has three elements:
- Unique name: used to reference the column.
- Value: the content of the column, of a simple type.
- Timestamp: the system timestamp used to determine the valid content, i.e. to differentiate the current value from stale ones.
Examples: Accumulo, Cassandra, Druid, HBase, Vertica
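The three elements of a column (name, value, timestamp) can be sketched as a nested dictionary in which every cell keeps timestamped versions and a read returns the newest one. `WideColumnTable` and the sample rows are hypothetical; real stores such as HBase or Cassandra add persistence, compaction, and distribution on top of this idea.

```python
import time
from collections import defaultdict


class WideColumnTable:
    """Sketch of a wide-column table: each row has its own set of columns,
    and each cell keeps timestamped versions of its value."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts=None):
        ts = time.time_ns() if ts is None else ts
        self._rows[row][column].append((ts, value))

    def get(self, row, column):
        """Return the value with the newest timestamp; older versions are stale."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def columns(self, row):
        return sorted(self._rows.get(row, {}))


table = WideColumnTable()
table.put("user:1", "name", "Alice", ts=1)
table.put("user:1", "email", "alice@example.org", ts=1)
table.put("user:2", "name", "Bob", ts=1)
table.put("user:2", "twitter", "@bob", ts=2)             # a column only this row has
table.put("user:1", "email", "alice@example.com", ts=3)  # newer version of a cell
print(table.columns("user:1"), table.columns("user:2"))
print(table.get("user:1", "email"))
```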

Document-oriented Databases
A database has collections, which contain documents, which hold semi-structured data.
In a key-value store, the data is considered opaque to the database. A document-oriented system instead relies on the internal structure of the document to extract metadata that the database engine uses for optimisation.
Designed to offer a richer experience with modern programming techniques.
XML databases are a specific subclass of document-oriented databases that are optimised to extract their metadata from XML documents.
Examples: Clusterpoint, Apache CouchDB, Couchbase, Lotus Notes, MongoDB
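Because a document store can see inside its documents, it can filter on nested fields. The sketch below uses dotted paths such as `address.city`, similar in spirit to MongoDB's dot notation, but `find` and `get_path` are hypothetical helpers, not MongoDB's API, and there is no index: every document is scanned.

```python
import json

_MISSING = object()


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested objects."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection, filter_, projection=None):
    """Return documents whose fields equal all values in filter_,
    optionally keeping only the listed fields."""
    results = []
    for doc in collection:
        if all(get_path(doc, path) == value for path, value in filter_.items()):
            if projection is not None:
                picked = {}
                for path in projection:
                    value = get_path(doc, path)
                    if value is not _MISSING:
                        picked[path] = value
                doc = picked
            results.append(doc)
    return results


# A "collection" of JSON documents; note that they do not share one schema.
collection = [json.loads(text) for text in (
    '{"_id": 1, "name": "Alice", "address": {"city": "Grimstad"}, "tags": ["student"]}',
    '{"_id": 2, "name": "Bob", "address": {"city": "Kristiansand"}}',
    '{"_id": 3, "name": "Carol", "address": {"city": "Grimstad"}, "phone": "555-0100"}',
)]

print(find(collection, {"address.city": "Grimstad"}, projection=["_id", "name"]))
```

A real engine would use the same structural knowledge to build indexes on such paths instead of scanning.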

Graph-based Databases
Graph databases are based on graph theory: nodes, properties, and edges.
- Nodes represent entities such as people, businesses, accounts, etc.
- Properties are information that relates to nodes.
- Edges are the lines that connect nodes to nodes or nodes to properties. Most of the important information is really stored in the edges.
Meaningful patterns emerge when one examines the connections and interconnections of nodes, properties, and edges.
Examples: AllegroGraph, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
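The point that "the important information is in the edges" shows up in a typical graph query such as friend-of-friend recommendation. The sketch below keeps nodes with properties and typed edges with properties in plain Python structures; the data and the `suggest_friends` helper are hypothetical, and a graph database would answer the same question by traversing its stored edges (for example with a Cypher pattern in Neo4J).

```python
from collections import defaultdict

# Nodes carry properties; edges are (source, type, target, properties).
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "carol": {"label": "Person", "name": "Carol"},
    "dave": {"label": "Person", "name": "Dave"},
}
edges = [
    ("alice", "KNOWS", "bob", {"since": 2012}),
    ("bob", "KNOWS", "carol", {"since": 2014}),
    ("bob", "KNOWS", "dave", {"since": 2015}),
    ("alice", "KNOWS", "dave", {"since": 2010}),
]

# Adjacency index over KNOWS edges, treated as symmetric.
adjacent = defaultdict(set)
for src, rel, dst, _props in edges:
    if rel == "KNOWS":
        adjacent[src].add(dst)
        adjacent[dst].add(src)


def suggest_friends(person):
    """Friends of friends who are not already friends (a simple recommendation)."""
    direct = adjacent[person]
    candidates = set()
    for friend in direct:
        candidates |= adjacent[friend]
    return sorted(candidates - direct - {person})


print([nodes[n]["name"] for n in suggest_friends("alice")])
```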

How do we Choose a NoSQL Database for our Project?
Key-value databases are generally useful for storing session information, user profiles, preferences, and shopping cart data. We would avoid key-value databases when we need to query by data, have relationships between the stored data, or need to operate on multiple keys at the same time.
Column-oriented databases are generally useful for content management systems, blogging platforms, maintaining counters, expiring usage, and heavy write volumes such as log aggregation. We would avoid column-family databases for systems in early development, with changing query patterns.
Document databases are generally useful for content management systems, blogging platforms, web analytics, real-time analytics, and e-commerce applications. We would avoid document databases for systems that need complex transactions spanning multiple operations, or queries against varying aggregate structures.
Graph databases are very well suited to problem spaces with connected data, such as social networks, spatial data, routing information for goods and money, and recommendation engines.

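Performance
| Data model | Performance | Scalability | Flexibility | Complexity | Functionality |
|---|---|---|---|---|---|
| Key–value store | high | high | high | none | variable (none) |
| Column-oriented store | high | high | moderate | low | minimal |
| Document-oriented store | high | variable (high) | high | low | variable (low) |
| Graph database | variable | variable | high | high | graph theory |
| Relational database | variable | variable | low | moderate | relational algebra |
Source: Ben Scofield, http://www.slideshare.net/bscofield/nosql-codemash-2010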

Examples of NoSQL Systems
Examples of popular real-world NoSQL database systems:
- MongoDB
- CouchDB
- BaseX
- Apache Cassandra
- Amazon DynamoDB
- Redis
- Neo4J

MongoDB
Document-oriented.
- Search by field, range queries, regular expression searches. Queries can return specific fields of documents and can also include user-defined JavaScript functions.
- Indexing: any field in a document can be indexed.
- Replication: MongoDB provides high availability with replica sets.
- Load balancing: MongoDB scales horizontally. The data is split into ranges and distributed across multiple servers (sharding).
- MapReduce can be used for batch processing of data and aggregation operations.
- JavaScript can be used in queries and aggregation functions (e.g. MapReduce).

Apache CouchDB
Document-oriented: "A database that completely embraces the web".
- Uses JSON to store data
- JavaScript as query language, using MapReduce
- HTTP for the API
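CouchDB queries are defined as MapReduce views: a map function (written in JavaScript in CouchDB) calls `emit(key, value)` for each document, and an optional reduce function combines the values per key. The sketch below imitates that flow in Python so it runs stand-alone; `map_fn`, `reduce_fn`, `run_view`, and the documents are hypothetical, and this is not CouchDB's own code. `reduce_fn` behaves like CouchDB's built-in `_sum` reduce.

```python
from collections import defaultdict

docs = [
    {"_id": "1", "type": "post", "author": "alice", "words": 120},
    {"_id": "2", "type": "post", "author": "bob", "words": 80},
    {"_id": "3", "type": "post", "author": "alice", "words": 200},
    {"_id": "4", "type": "comment", "author": "bob"},
]


def map_fn(doc):
    """Like a CouchDB map function: emit zero or more (key, value) pairs per document."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def reduce_fn(values):
    """Combine all values emitted under one key (like the built-in _sum)."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


print(run_view(docs, map_fn, reduce_fn))
```

In CouchDB itself, the equivalent map function would be JavaScript along the lines of `function (doc) { if (doc.type === "post") emit(doc.author, doc.words); }`, queried over HTTP.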

BaseX
XML database.
- Query languages: XPath and XQuery 3.1
- Client-server architecture with user and transaction management and logging facilities
- APIs: RESTful API, WebDAV, XML:DB, Java, C#, Perl, PHP, Python and others
- Supported data formats: XML, HTML, JSON, CSV, text, binary data
- GUI including several visualisations: treemap, table view, tree view, scatter plot

Apache Cassandra
A hybrid between a key-value and a wide-column store.
- Decentralized: every node in the cluster has the same role, so there is no single point of failure. Data is distributed across the cluster, but every node can service any request.
- Replication strategies are configurable.
- Scalable: read and write throughput increase linearly as new machines are added.
- Data is automatically replicated to multiple nodes for fault tolerance.
- Tunable consistency.
- MapReduce support, Hadoop integration.
- Query language: CQL
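"Tunable consistency" means that each read and write chooses how many replicas must respond. A common rule of thumb: with replication factor N, read level R and write level W, if R + W > N then every read overlaps at least one replica that saw the latest write. The level names below (ONE, TWO, THREE, QUORUM, ALL) are real Cassandra consistency levels, but the two functions are a hypothetical sketch of the arithmetic only; multi-datacenter levels such as LOCAL_QUORUM are ignored.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for a (single-datacenter) consistency level."""
    n = replication_factor
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": n // 2 + 1, "ALL": n}
    return levels[level]


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """R + W > N: every read replica set overlaps every write replica set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, reads_see_latest_write(read_level, write_level, 3))
```

With N = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) give strong consistency, while ONE/ONE trades it for lower latency and higher availability.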

Amazon DynamoDB
Key-value database: a fully managed proprietary NoSQL database service offered by Amazon.com.
"Built on the principles of Dynamo" (a storage system Amazon initially used for its own website).
Language bindings for Java, Node.js, .NET, Perl, PHP, Python, Ruby, and Erlang.

Redis
Key-value database. Redis maps keys to different types of values.
Redis typically holds the whole dataset in memory. By default, Redis syncs data to disk at least every 2 seconds, with more or less robust options available if needed. In the case of a complete system failure on default settings, only a few seconds of data would be lost.
Language bindings include ActionScript, C, C++, C#, Clojure, Common Lisp, D, Dart, Erlang, Go, Haskell, Haxe, Io, Java, JavaScript (Node.js), Julia, Lua, Objective-C, Perl, PHP, Pure Data, Python, R, Racket, Ruby, Rust, Scala, Smalltalk and Tcl.

Neo4J
Graph-oriented.
- Implemented in Java, and accessible from software written in other languages using the Cypher query language through a transactional HTTP endpoint.
- ACID-compliant transactional database with native graph storage and processing.
- Often described as the most popular graph database.
- Everything is stored as an edge, a node or an attribute. Each node and edge can have any number of attributes. Both nodes and edges can be labelled, and labels can be used to narrow searches.

Encodings
- XML: includes structure and metadata.
- JSON (JavaScript Object Notation): simpler and less formal than XML.
- YAML: a human-readable data serialization format; since version 1.2, YAML is a superset of JSON.
- BSON: a binary variant of JSON used by MongoDB.
- RDF: many variants!
Q: Can you mention some RDF encodings?
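To compare two of these encodings, the sketch below writes the same hypothetical record as JSON and as XML using only Python's standard library (`json` and `xml.etree.ElementTree`), then parses the XML back. Note the extra design decision XML forces: each field must become either an attribute (here `id`) or a child element, whereas JSON maps directly onto a dictionary.

```python
import json
import xml.etree.ElementTree as ET

record = {"id": "42", "name": "Alice", "city": "Grimstad"}

# JSON: the dictionary is serialised directly.
as_json = json.dumps(record)

# XML: we choose to store "id" as an attribute and the rest as child elements.
root = ET.Element("person", id=record["id"])
for field_name in ("name", "city"):
    ET.SubElement(root, field_name).text = record[field_name]
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)

# Parsing the XML back requires knowing that same attribute/element mapping.
parsed = ET.fromstring(as_xml)
back = {"id": parsed.get("id"), **{child.tag: child.text for child in parsed}}
```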

Encodings
RDF has many different serialisation formats for RDF graphs:
- Turtle: a compact, human-friendly format.
- N-Triples: a very simple, easy-to-parse, line-based format that is not as compact as Turtle.
- N-Quads: a superset of N-Triples, for serializing multiple RDF graphs.
- JSON-LD: a JSON-based serialization.
- N3 (Notation3): a non-standard serialization that is very similar to Turtle, but has some additional features, such as the ability to define inference rules.
- RDF/XML: an XML-based syntax that was the first standard format for serializing RDF.
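The difference between N-Triples and Turtle is easiest to see by writing the same graph both ways. This hand-rolled sketch only handles IRIs (no literals, blank nodes, or escaping), and the namespace `http://example.org/` is a hypothetical placeholder; real applications would use an RDF library rather than string formatting.

```python
BASE = "http://example.org/"  # hypothetical namespace for the example resources

triples = [
    ("Alice", "knows", "Bob"),
    ("Alice", "fears", "Mouse"),
    ("Bob", "has", "Cat"),
    ("Cat", "catches", "Mouse"),
]


def to_ntriples(triples):
    """N-Triples: one complete triple per line, full IRIs, terminated by ' .'."""
    return "\n".join(f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> ." for s, p, o in triples)


def to_turtle(triples):
    """Turtle: a prefix abbreviates IRIs, and ';' repeats the previous subject."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append(f"ex:{p} ex:{o}")
    lines = [f"@prefix ex: <{BASE}> ."]
    for s, predicate_objects in by_subject.items():
        lines.append(f"ex:{s} " + " ;\n    ".join(predicate_objects) + " .")
    return "\n".join(lines)


print(to_ntriples(triples))
print()
print(to_turtle(triples))
```

The N-Triples output repeats the full IRIs on every line, which makes it trivial to parse line by line; the Turtle output is shorter and easier to read.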

Querying
How do we query a NoSQL database?
- SPARQL (SPARQL Protocol and RDF Query Language)
- HTTP REST API (with JSON)
- Specialised query languages, e.g. CQL (Cassandra Query Language)
- Specialised APIs and/or client apps (e.g. the mongo client for MongoDB)
- Java or various other general-purpose programming languages…
Q: And?
SQL! NoSQL means «not only SQL», but SQL support is still not very common.

Querying
The lack of joins means that we will often either:
- do multiple queries,
- create and store complex documents with everything the application needs inside, e.g. a blog post with all its comments included, or
- denormalise the database by duplicating information that is needed in multiple locations.
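The blog-post example can be sketched with plain Python dictionaries standing in for collections (all names and data here are hypothetical). The first approach keeps the data normalised and performs the "join" in application code with several lookups; the second stores one embedded, denormalised document that answers the same request with a single lookup.

```python
# Normalised data in three "collections": the application performs the join.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
posts = {"p1": {"title": "Why NoSQL?", "author_id": "u1"}}
comments = [
    {"post_id": "p1", "author_id": "u2", "text": "Nice overview!"},
    {"post_id": "p1", "author_id": "u1", "text": "Thanks!"},
]


def load_post_with_queries(post_id):
    """Option 1: several queries, combined in application code."""
    post = dict(posts[post_id])                        # query 1: the post
    post["author"] = users[post["author_id"]]["name"]  # query 2: its author
    post["comments"] = [                               # query 3: its comments (+ authors)
        {"author": users[c["author_id"]]["name"], "text": c["text"]}
        for c in comments
        if c["post_id"] == post_id
    ]
    return post


# Options 2 and 3: one embedded document holding everything the page needs,
# with author names duplicated (denormalised) inside it.
post_documents = {
    "p1": {
        "title": "Why NoSQL?",
        "author": "Alice",
        "comments": [
            {"author": "Bob", "text": "Nice overview!"},
            {"author": "Alice", "text": "Thanks!"},
        ],
    }
}

joined = load_post_with_queries("p1")
embedded = post_documents["p1"]  # a single lookup
print(joined["comments"] == embedded["comments"])
```

The price of the embedded version is that duplicated data must be kept in sync by the application: if Alice changes her name, every document holding a copy of it has to be updated.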

Example – Semi-structured Sensor Data
Case: storing sensor readings from mobile phones (CIEM: SmartRescue project).
- MongoDB for storing readings in semi-structured documents containing the measurements available at a given time.
- Transforming data into JSON documents.
- Querying, analysis and visualisation.
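The transformation step might look roughly like the sketch below: each reading becomes one JSON document, and sensors that produced no value at that moment are simply left out rather than stored as empty placeholders. The field names (`device`, `time`, `gps`, `accelerometer`, `light`) and the `reading_to_document` helper are hypothetical and are not the SmartRescue project's actual schema.

```python
import json
from datetime import datetime, timezone


def reading_to_document(device_id, timestamp, **measurements):
    """Build one semi-structured document; sensors without a value are omitted."""
    doc = {"device": device_id, "time": timestamp.isoformat()}
    for sensor, value in measurements.items():
        if value is not None:
            doc[sensor] = value
    return doc


t = datetime(2015, 11, 16, 12, 0, tzinfo=timezone.utc)
doc1 = reading_to_document(
    "phone-1", t,
    gps={"lat": 58.34, "lon": 8.59},
    accelerometer=[0.1, 9.8, 0.2],
    light=None,  # sensor unavailable at this moment
)
doc2 = reading_to_document("phone-2", t, light=312.0)

print(json.dumps(doc1))
print(json.dumps(doc2))
```

With a running MongoDB server, documents like these could then be stored with pymongo's `insert_many` and queried by whichever fields they happen to contain.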

Example – Native XML Storage
Case: storing anonymised IDS alarms in an XML database.
Alarms are formatted as IDMEF messages (an XML-based format) and are to be stored in a BaseX XML database.
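To illustrate querying such alarms by their XML structure, the sketch below parses a simplified, IDMEF-like message with Python's `xml.etree.ElementTree`, which supports a subset of XPath including attribute predicates. The message content is invented, and real IDMEF alerts (RFC 4765) contain further required elements, such as `Analyzer`, that are left out here.

```python
import xml.etree.ElementTree as ET

NS = {"idmef": "http://iana.org/idmef"}

# Simplified IDMEF-like message; real IDMEF alerts carry more required elements.
MESSAGE = """\
<idmef:IDMEF-Message xmlns:idmef="http://iana.org/idmef" version="1.0">
  <idmef:Alert messageid="a1">
    <idmef:CreateTime>2015-11-16T10:00:00Z</idmef:CreateTime>
    <idmef:Classification text="Port scan"/>
  </idmef:Alert>
  <idmef:Alert messageid="a2">
    <idmef:CreateTime>2015-11-16T10:05:00Z</idmef:CreateTime>
    <idmef:Classification text="Login failure"/>
  </idmef:Alert>
</idmef:IDMEF-Message>
"""

root = ET.fromstring(MESSAGE)

# Namespace-aware path expressions with an attribute predicate.
alerts = root.findall("idmef:Alert", NS)
port_scans = [
    alert.get("messageid")
    for alert in alerts
    if alert.findall("idmef:Classification[@text='Port scan']", NS)
]
print(len(alerts), port_scans)
```

In BaseX, the same question could be asked directly in XQuery, roughly as `declare namespace idmef = "http://iana.org/idmef"; //idmef:Alert[idmef:Classification/@text = 'Port scan']/@messageid/string()`, without loading the documents into application code first.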

Example – Emergency Information Integration
Case: collecting information about emergencies from multiple sources (CIEM).
Storage: Redis database.

Summary
- Scalable, distributable databases for unstructured or semi-structured (big) data.
- Very flexible concerning data models.
- Not always ACID.
- CAP theorem: there may be less emphasis on consistency.
- Not only SQL means SQL is still allowed!
- Relational databases are still alive and have their uses; choose wisely!

The End