Database Systems 236363: NoSQL

“Definition”
Literally: NoSQL = Not Only SQL
Practically: anything that deviates from standard Relational Database Management Systems (RDBMS)

Reminder: What is an RDBMS?
Relational data model
  – Structured
  – Data represented through a collection of tables/relations
  – Normalization
Relational query language + relational data manipulation language + relational data definition language
  – As a standard, SQL
Strongly consistent concurrency control
  – The notion of transactions
  – The ACID model
  – Taught in the course
NoSQL = any deviation from RDBMS on any of these axes
This is in fact where most of the hype is

What’s Driving this Trend?
1. The relational data model is not perfectly suited for all applications
  – Semi-structured models (and corresponding query languages) are a better fit for some cases
      We’ve already talked about XML
      Other types of document data models
  – In other cases, a graph-oriented data model (and query language) is a better fit
      For example, for exploring the social graphs of social networks
  – In others, a Column-Family or even a schema-less Key-Value semantics is enough
      In the latter, both the key and value can be arbitrary objects

What’s Driving this Trend?
2. Performance, distribution, and Web scale
  – Traditional databases cannot keep up with the performance required by very large scale analytics applications (OLAP)
  – Internet companies need to handle massive amounts of data, and the data needs to be available all the time with fast response times
      This leads to building large distributed data centers
      Some choose weak consistency: BASE rather than ACID (details in more advanced courses)
Warning: This is a hyped new domain, so there is great confusion over what is what, with no precise agreed-upon definitions

Key-Value
Interface consists of
  – put(key,value) and get(key)
  – sometimes also scan()
Typically, the value can be of arbitrary type
The key can either be a string or an arbitrary type
The responsibility is delegated entirely to the application
  – Both for semantics enforcement and logic execution
These systems are performance and availability driven
  – Most such systems provide BASE instead of ACID
  – Detailed discussion in other courses (File Systems, Distributed Systems, Big Data); here, only a brief discussion
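The put/get/scan interface above can be sketched in a few lines of Python. This is only an illustrative in-memory toy, not any real system's API: the class name `KeyValueStore` and the key-ordered behavior of `scan()` are assumptions made for the example.

```python
from typing import Any, Dict, Hashable, Iterator, Optional, Tuple


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def put(self, key: Hashable, value: Any) -> None:
        # The store never inspects the value; its meaning is the application's business.
        self._data[key] = value

    def get(self, key: Hashable) -> Optional[Any]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def scan(self) -> Iterator[Tuple[Hashable, Any]]:
        # Assumption: scan() yields pairs in key order (keys must be mutually comparable).
        for key in sorted(self._data):
            yield key, self._data[key]


store = KeyValueStore()
store.put("user:1", {"name": "Amy", "age": 31})      # a structured value
store.put("user:2", b"\x00\x01 arbitrary bytes")     # an opaque value
scanned_keys = [key for key, _ in store.scan()]
```

Note that nothing stops the application from storing inconsistent values under related keys; enforcing such rules is left entirely to application code.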

On the Notion of Transactions
A transaction is a collection of operations that form a single logical unit
  – Reserve a seat on the plane
      Verify that the seat is vacant at the price I was quoted and then reserve it, which also prevents others from reserving it
      – This might also involve charging a credit card
  – Order something from an online store
      Verify that the item exists in stock
  – Operations in a transactional file system
      Moving a file from one directory to another
      – Involves verifying that the file exists, creating a copy in the new directory, and deleting the old one
  – Usually, each SQL query forms a single transaction

In Practice Things Get Complicated
In practice, computers are often parallel
  – Further, many systems these days are distributed, adding another element of parallelization
What happens when two people try to reserve the same seat on an airplane concurrently?
  – The common answer: only one should succeed
What happens if a transaction fails in the middle?
  – The seat was already taken: should we charge the credit card?
  – The credit company refused the payment: should we hold the seat?
  – The receiving file system directory is full: should we remove the file's old copy from the old directory?
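The seat-reservation questions above can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table layout, the `reserve` function, and the `card_ok` flag (standing in for a real payment call) are all assumptions made for illustration; the point is only that the seat update and the charge either both happen or both are undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, passenger TEXT)")
conn.execute("CREATE TABLE charges (passenger TEXT, amount INTEGER)")
conn.execute("INSERT INTO seats VALUES ('12A', NULL)")
conn.commit()


def reserve(conn, seat, passenger, amount, card_ok):
    """Reserve a seat and record the charge as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE seats SET passenger = ? WHERE seat = ? AND passenger IS NULL",
                (passenger, seat),
            )
            if cur.rowcount != 1:
                raise RuntimeError("seat already taken")
            if not card_ok:  # hypothetical stand-in for a payment service
                raise RuntimeError("payment refused")
            conn.execute("INSERT INTO charges VALUES (?, ?)", (passenger, amount))
        return True
    except RuntimeError:
        return False


refused = reserve(conn, "12A", "amy", 300, card_ok=False)    # rolled back, seat stays free
booked = reserve(conn, "12A", "bob", 300, card_ok=True)      # succeeds
too_late = reserve(conn, "12A", "carol", 300, card_ok=True)  # seat is already taken
holder = conn.execute("SELECT passenger FROM seats WHERE seat = '12A'").fetchone()[0]
charge_count = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
```

After the refused payment the seat is still vacant, and after the late attempt no second charge exists, which is the behavior the slide's "common answer" asks for.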

ACID vs. BASE
Traditional semantics (ACID)
  – Atomicity
  – Consistency
  – Isolation
  – Durability
  Hard to implement efficiently, especially in a distributed system. The system may need to block during network interruptions to avoid violating the strong consistency requirement
Key-Value typical semantics (BASE)
  – Basic Availability
  – Soft state
  – Eventual consistency
  When needed, willing to sacrifice strong consistency in favor of availability and performance
More on this in other courses (File Systems, Distributed Systems, Big Data)
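To give a feel for eventual consistency, here is a deliberately simplified Python simulation of two replicas. It assumes version numbers and a "highest version wins" rule, which is only one of several reconciliation strategies real systems use; the `Replica` class and `sync` function are hypothetical names for this sketch.

```python
class Replica:
    """One copy of the data; stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions; the highest version wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("seat-12A", "amy", version=1)   # accepted locally, without waiting for r2
stale = r2.read("seat-12A")              # r2 has not seen the write yet
sync(r1, r2)                             # background reconciliation
fresh = r2.read("seat-12A")              # the replicas have converged
```

The write is available immediately, but a reader of the other replica briefly sees old state: that window is what BASE accepts and ACID forbids.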

Column-Family
The data model here provides the abstraction of a table with multiple column-families
  – Each row is identified by a key
  – Each row consists of multiple column-families
      Sometimes a column-family is called a collection
  – Each column-family consists of one or more columns
      Yet, different rows may include different columns in the same column-family
  – Data is typically immutable
      Cells include multiple versions
The motivation here is again performance and availability
  – Decentralized implementations
  – Denormalization instead of normalization and joins
  – Some systems provide strong consistency and atomicity guarantees while others do not
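The row / column-family / column / version nesting described above can be pictured as nested dictionaries. This Python sketch is an assumption-laden toy (the `put` and `get` helpers and the integer timestamps are invented for the example) and does not mirror any particular system's storage layout.

```python
from collections import defaultdict

# row key -> column-family -> column -> list of (timestamp, value) versions
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def put(row, family, column, value, ts):
    # Cells are immutable: a write appends a new version instead of overwriting.
    table[row][family][column].append((ts, value))


def get(row, family, column):
    # Return the value with the highest timestamp, or None if the cell is absent.
    versions = table.get(row, {}).get(family, {}).get(column, [])
    return max(versions, key=lambda v: v[0])[1] if versions else None


put("user1", "profile", "name", "Amy", ts=1)
put("user1", "profile", "name", "Amy B.", ts=2)              # second version of the same cell
put("user2", "profile", "email", "bob@example.com", ts=1)    # a different column in the same family

latest_name = get("user1", "profile", "name")
user1_columns = set(table["user1"]["profile"])
user2_columns = set(table["user2"]["profile"])
```

Both rows share the `profile` column-family, yet each holds a different set of columns, and the older version of `name` is still stored alongside the newer one.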

More on Column-Family
There are many variants of this model
  – The first well-known example of this model is Google’s Bigtable
      Developed for internal use at Google
  – A very well-known open source implementation is called Cassandra
      Initially developed by Facebook
      Currently an Apache open source project
      Cassandra Query Language (CQL)

Graph Database
A property graph contains nodes and directed edges
  – Nodes represent entities
  – Edges represent (directed & labeled) relationships
Both can have properties
  – Properties are key-value pairs
      Keys are strings; values are arbitrary data types
Best suited for highly connected data
  – RDBMS is better suited for aggregated data

Graph Database Example
  – Nodes can be users of a social network and edges can represent “friend” relationships
  – Nodes can represent users and books and edges represent “purchased” relationships
  – Nodes can represent users and restaurants and edges represent “recommended” relationships
      The edge properties here can be the ranking as well as the textual review

Queries on a Graph Database
The basic mechanism is called a Traverse
  – It starts from a given node and explores portions of the graph based on the query
For example
  – Who are the friends of friends of friends of Amy?
  – What is the average rating for a given movie given by users whose friendship-distance from me is at most 5 hops?
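A traversal of this kind can be sketched as a breadth-first search that stops after a fixed number of hops. The Python below is an illustration over a made-up friendship graph (the names and the `within_hops` helper are assumptions); it computes shortest friendship distance, which is a simplification of what a graph query engine does.

```python
from collections import deque

# Hypothetical directed "friend" edges, as adjacency lists.
friends = {
    "Amy": ["Bob", "Cat"],
    "Bob": ["Dan"],
    "Cat": ["Dan", "Eve"],
    "Dan": ["Fay"],
    "Eve": [],
    "Fay": [],
}


def within_hops(graph, start, max_hops):
    """Return {node: distance} for nodes reachable from start in 1..max_hops hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


reach = within_hops(friends, "Amy", 3)
third_degree = sorted(name for name, d in reach.items() if d == 3)
```

Only the neighborhood of the start node is visited, regardless of how many other nodes the graph contains.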

Motivation for Graph Databases
Generality and convenience
  – Many things can be naturally modeled as a graph
Performance
  – The cost of joins does not increase with the total size of the data, but rather depends on the local part of the graph that is traversed by the query processor
Extensibility and flexibility
  – New node types and new relationship types can be added to an existing graph
Agility
  – The ability to follow agile programming and design methods

Neo4j and Cypher
Neo4j is an open source graph database
  – (Relatively) well adopted by industry
      E.g., eBay, HP, National Geographic, Walmart, Cisco, etc.
Cypher is a widely used graph query language, implemented in Neo4j
  – Simple to learn

Cypher The most basic Cypher query includes the following structure: – Pattern matching expression – Return expression based on variables bound in the pattern matching

Cypher Simple Example
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a:user {name:'Michael'}), (c)-[:KNOWS]->(a)
RETURN b, c

Matching Nodes in Cypher
(a) : node a
  – If a is already bound, we search for this specific node; otherwise, any node, which will then be bound to a
() : some node
(:Ntype) : some node of type Ntype
(a:Ntype) : node a of type Ntype
(a { prop:'value' }) : node a that has a property called prop with the value 'value'
(a:Ntype { prop:'value' }) : node a of type Ntype that has a property called prop with the value 'value'

Matching Relationships in Cypher
(a)--(b) : nodes a and b are related by a relationship
(a)-->(b) : node a has a relationship to b
(a)<--(b) : node b has a relationship to a
(a)-->() : node a has a relationship to some node
(a)-[r]->(b) : a is related to b by the relationship r
(a)-[:Rtype]->(b) : a is related to b by a relationship of type Rtype
(a)-[:R1|:R2]->(b) : a is related to b by a relationship of type R1 or type R2
(a)-[r:Rtype]->(b) : a is related to b by a relationship r of type Rtype

Advanced Matching Relationships
(a)-->(b)-->(c) : multiple relationships
(a)-[:Rtype*2]->(b) : (a) is 2 hops away from (b) over relationships of type Rtype
  – If Rtype is not specified, can be any relationship (and different relationships in each hop)
  – When no number is given, it means a path of any length
(a)-[:Rtype*minHops..maxHops]->(b) : (a) is at least minHops and at most maxHops away from (b) over relationships of type Rtype
  – If minHops is not specified, default is 1
  – If maxHops is not specified, default is infinity
  – Can even be 0!
(a)-[r*2]->(b) : (a) is 2 hops away from (b) over the sequence of relationships r
(a)-[*{prop:val}]->(b) : we search for paths in which all relationships have a property prop whose value is val

More Advanced Matching Relationships
Named paths
  MATCH p=(a {prop:val})-->() RETURN p
Shortest path
  – shortestPath((a)-[*minHops..maxHops]-(b))
      Finds the shortest path of length between minHops and maxHops between (a) and (b)
  – allShortestPaths((a)-[*]-(b))
      Finds all shortest paths between (a) and (b)
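Conceptually, a shortest-path search over unweighted relationships is a breadth-first search. The Python sketch below shows that idea on a hypothetical edge list; it ignores edge direction (like the undirected pattern `(a)-[*]-(b)`) and is not a description of how Neo4j implements `shortestPath` internally.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return one shortest path as a list of nodes, ignoring edge direction, or None."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical graph: a 3-hop route via x and y, and a 2-hop route via z.
edges = [("a", "x"), ("x", "y"), ("y", "b"), ("a", "z"), ("z", "b")]
path = shortest_path(edges, "a", "b")
```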

Augmented Return
Column alias
  MATCH (a { name: "A" }) RETURN a.age AS SomethingTotallyDifferent
Unique results
  MATCH (a { name: "A" })-->(b) RETURN DISTINCT b
Other expressions
  – Any expression can be used as a return item: literals, predicates, properties, functions, and everything else
  MATCH (a { name: "A" }) RETURN a.age > 30, "I'm a literal", (a)-->()
  – The result consists of the Boolean value of a.age > 30, the literal "I'm a literal", and the result of evaluating the pattern (a)-->()

ORDER BY
Order results by properties
  MATCH (n) RETURN n ORDER BY n.age, n.name
Descending order
  MATCH (n) RETURN n ORDER BY n.name DESC
NULL is always ordered last in ascending order (default) and first in descending order
  – Note that missing node/relationship properties are evaluated to null

Where Clauses
Provide criteria for filtering the results of a pattern matching expression
Examples:
  MATCH (n) WHERE n.name = 'Peter' XOR n.age < 30 RETURN n
  MATCH (n) WHERE n.name =~ 'Tob.*' RETURN n
  MATCH (tobias { name: 'Tobias' }), (others) WHERE others.name IN ['Andres', 'Peter'] AND (tobias)<--(others) RETURN others

Skip and Limit
LIMIT crops the suffix of the result
SKIP eliminates the prefix
  MATCH (n) RETURN n ORDER BY n.name SKIP 1 LIMIT 2
This expression returns the 2nd and 3rd elements of the previously computed result
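In terms of an ordered list, SKIP and LIMIT behave like a slice. The following Python sketch uses made-up names and a hypothetical `skip_limit` helper to show why `SKIP 1 LIMIT 2` yields the 2nd and 3rd elements.

```python
def skip_limit(rows, skip=0, limit=None):
    """Drop the first `skip` rows, then keep at most `limit` rows."""
    end = None if limit is None else skip + limit
    return rows[skip:end]


names = ["Ann", "Bob", "Cat", "Dan", "Eve"]  # assumed to be already ordered by name
page = skip_limit(names, skip=1, limit=2)
```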

With
Used to manipulate the result sequence before it is passed on to the following query parts
  – One common usage of WITH is to limit the number of entries that are then passed on to other MATCH clauses
  – WITH is also used to separate reading from updating of the graph
      Every part of a query must be either read-only or write-only
      When going from a reading part to a writing part, the switch must be done with a WITH clause
Examples:
  MATCH (david { name: "David" })--(otherPerson)-->() WITH otherPerson, count(*) AS foaf WHERE foaf > 1 RETURN otherPerson
  MATCH (n) WITH n ORDER BY n.name DESC LIMIT 3 RETURN collect(n.name)

Union
Combines the results of two or more queries into a single result set that includes all the rows that belong to any of the queries in the union
  – The number and the names of the columns must be identical in all queries combined by using UNION
To keep all the result rows, use UNION ALL
Using just UNION will combine and remove duplicates from the result set
With duplicates:
  MATCH (n:Actor) RETURN n.name AS name UNION ALL MATCH (n:Movie) RETURN n.title AS name
Without duplicates:
  MATCH (n:Actor) RETURN n.name AS name UNION MATCH (n:Movie) RETURN n.title AS name
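The difference between the two forms is just concatenation versus concatenation followed by duplicate removal. A small Python sketch with invented result rows (the names and both helper functions are assumptions for the example):

```python
def union_all(*results):
    """Concatenate result sets, keeping every row."""
    return [row for result in results for row in result]


def union(*results):
    """Concatenate result sets and drop duplicate rows."""
    # dict.fromkeys keeps the first occurrence of each row, in order.
    return list(dict.fromkeys(union_all(*results)))


actor_names = ["Keanu", "Carrie", "Hugo"]   # hypothetical n.name values
movie_titles = ["Hugo", "Speed"]            # hypothetical n.title values

with_duplicates = union_all(actor_names, movie_titles)
without_duplicates = union(actor_names, movie_titles)
```

The row order shown here is a property of the sketch; a query without ORDER BY should not rely on any particular order.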

CREATE (nodes)
CREATE (n)
  Creates a node n
CREATE (n:Person)
  Creates a node n with the label Person
CREATE (n:Person:Swedish)
  Creates a node n with two labels: Person and Swedish
CREATE (n:Person { name : 'Andres', title : 'Developer' })
  Creates a node n with the label Person and the properties name='Andres' and title='Developer'
CREATE (a { name : 'Andres' })
  Creates a node with the property name='Andres'

CREATE (relationships)
Creates a labeled relationship:
  MATCH (a:Person),(b:Person) WHERE a.name = 'Node A' AND b.name = 'Node B' CREATE (a)-[r:RELTYPE]->(b) RETURN r
Creates a relationship with properties:
  MATCH (a:Person),(b:Person) WHERE a.name = 'Node A' AND b.name = 'Node B' CREATE (a)-[r:RELTYPE { name : a.name + ' ' + b.name }]->(b) RETURN r
Creates a full path (nodes + relationships):
  CREATE p =(andres { name:'Andres' })-[:WORKS_AT]->(neo)<-[:WORKS_AT]-(michael { name:'Michael' }) RETURN p

CREATE UNIQUE Creates only the parts of the graphs that are missing in a CREATE query – Left for the interested students to explore on their own…

Additional Cypher Clauses
DELETE
  – Deletes nodes and relationships
REMOVE
  – Removes labels and properties
SET
  – Updates labels on nodes and properties on nodes and relationships
FOREACH
  – Performs an updating action on each item in a collection or a path
  MATCH p =(source)-[*]->(destination) WHERE source.name='A' AND destination.name='D' FOREACH (n IN nodes(p)| SET n.marked = TRUE )

A Note on Labels in Neo4j
A node can have multiple labels
Labels can be viewed as a combination of a tagging mechanism and an is-a relationship
  – They enable choosing nodes based on their label(s)
  – In the future, they would enable imposing restrictions on properties and values
      I.e., act also as a lightweight optional schema
A label can be assigned upon creation and by using the SET expression
A label can be removed using the REMOVE expression

Operators
Mathematical
  – +, -, *, /, %, ^
Comparison
  – =, <>, <, >, <=, >=
Boolean
  – AND, OR, XOR, NOT
String
  – Concatenation through +
Collection
  – Concatenation through +
  – IN to check if an element exists in a collection

Simple CASE Expression
CASE test WHEN value THEN result [WHEN ...] [ELSE default] END
Example:
  MATCH (n) RETURN CASE n.eyes WHEN 'blue' THEN 1 WHEN 'brown' THEN 2 ELSE 3 END AS result
In CASE expressions, the evaluated test is compared against the value of each WHEN clause, one after the other, until the first one that matches. If none matches, the default is returned if it exists; otherwise, NULL is returned.

Generic CASE Expression
CASE WHEN predicate THEN result [WHEN ...] [ELSE default] END
Example:
  MATCH (n) RETURN CASE WHEN n.eyes = 'blue' THEN 1 WHEN n.age < 40 THEN 2 ELSE 3 END AS result
Here, each predicate is evaluated until the first one matches. If none matches, the default value is returned if it exists; otherwise, NULL.
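The evaluation order of the two CASE forms can be mimicked in Python. Both helper functions below are hypothetical, written only to show "first match wins, else default, else null" (with `None` standing in for NULL); the sample node is invented.

```python
def simple_case(test, whens, default=None):
    """Compare `test` against each WHEN value in order; first equal value wins."""
    for value, result in whens:
        if test == value:
            return result
    return default  # None plays the role of NULL when no ELSE is given


def generic_case(whens, default=None):
    """Evaluate each WHEN predicate in order; first true predicate wins."""
    for predicate, result in whens:
        if predicate():
            return result
    return default


n = {"eyes": "brown", "age": 35}  # hypothetical node properties

simple = simple_case(n["eyes"], [("blue", 1), ("brown", 2)], default=3)
generic = generic_case(
    [(lambda: n["eyes"] == "blue", 1), (lambda: n["age"] < 40, 2)],
    default=3,
)
no_else = simple_case("green", [("blue", 1), ("brown", 2)])
```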

Collections
A literal collection is created by using brackets and separating the elements in the collection with commas
  RETURN [0,1,2,3,4,5,6,7,8,9] AS collection
  – The result is the collection [0,1,2,3,4,5,6,7,8,9]
Many ways of selecting elements from a collection (indices are zero-based; range(0,10) includes both 0 and 10), e.g.,
  RETURN range(0,10)[3] : the element at index 3 (3 in this case)
  RETURN range(0,10)[-3] : 3rd from the end (8 here)
  RETURN range(0,10)[0..3] : [0,1,2]
  RETURN range(0,10)[0..-5] : [0,1,2,3,4,5]
  RETURN range(0,10)[..4] : [0,1,2,3]
  RETURN range(0,10)[-5..] : [6,7,8,9,10]
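These indexing rules closely match Python list indexing and slicing, with one difference worth noting: Cypher's `range(0,10)` includes the upper bound, so the Python equivalent is `range(0, 11)`. The `cypher_range` helper below is a hypothetical name introduced for the comparison (positive steps only).

```python
def cypher_range(start, end, step=1):
    """Python equivalent of Cypher's inclusive range(start, end), for positive steps."""
    return list(range(start, end + 1, step))


c = cypher_range(0, 10)      # 11 elements: 0 through 10

at_index_3 = c[3]            # range(0,10)[3]
third_from_end = c[-3]       # range(0,10)[-3]
first_three = c[0:3]         # range(0,10)[0..3]
all_but_last_five = c[0:-5]  # range(0,10)[0..-5]
up_to_four = c[:4]           # range(0,10)[..4]
last_five = c[-5:]           # range(0,10)[-5..]
```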

More on Collections
RETURN [x IN range(0,10) | x^3] AS result
  Result: [0.0,1.0,8.0,27.0,64.0,125.0,216.0,343.0,512.0,729.0,1000.0]
RETURN [x IN range(0,10) WHERE x % 2 = 0] AS result
  Result: [0,2,4,6,8,10]
RETURN [x IN range(0,10) WHERE x % 2 = 0 | x^3] AS result
  Result: [0.0,8.0,64.0,216.0,512.0,1000.0]
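The same three expressions can be written as Python list comprehensions, which makes the filter (`WHERE`) and projection (`|`) parts easy to tell apart. The conversion to `float` is there because the slide's results for `^` are floating-point values.

```python
numbers = list(range(0, 11))  # Cypher's range(0,10) includes 10

cubes = [float(x ** 3) for x in numbers]                        # projection only
evens = [x for x in numbers if x % 2 == 0]                      # filter only
even_cubes = [float(x ** 3) for x in numbers if x % 2 == 0]     # filter, then projection
```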

Aggregation
Aggregate functions take multiple input values and calculate an aggregated value from them
  – E.g., avg(), min(), max(), count(), sum(), stdev()
  MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person) WHERE me.name = 'A' RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
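The two counts in the query differ whenever the same friend-of-friend is reached through more than one friend. A Python sketch over a made-up graph (node names are assumptions) shows the effect: each matched path contributes one row, and DISTINCT collapses repeated end nodes.

```python
# Hypothetical outgoing relationships between Person nodes.
knows = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}

# One entry per matched path me -> friend -> friend_of_friend.
friends_of_friends = [
    fof
    for friend in knows.get("A", [])
    for fof in knows.get(friend, [])
]

count_all = len(friends_of_friends)            # like count(friend_of_friend)
count_distinct = len(set(friends_of_friends))  # like count(DISTINCT friend_of_friend)
```

Here D is reached via both B and C, so it is counted twice without DISTINCT.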

Back to the Train Operation Example
The graph includes the following elements:
Node types (with properties)
  – Station: S_Name, Height, S_Type
  – Line: L_Num, Direction, L_Type
  – Train: T_Num, Days
  – Service: T_Category, Class, Food
Relationship types (with properties)
  – Serves: Km
  – Arrives: A_Time, D_Time, Platform
  – Travels
  – Gives

Sample Queries
Which stations are served by line 1-South?
  MATCH (line:Line {L_Num:'1',Direction:'South'})-[:Serves]->(station:Station) RETURN station
Which lines have stations below sea level?
  MATCH (line:Line)-[:Serves]->(station:Station) WHERE station.Height<0 RETURN DISTINCT line.L_Num,line.Direction

Sample Queries
Which stations serve multiple lines?
  MATCH (line)-[:Serves]->(station) WITH station,count(line) AS linesCount WHERE linesCount>1 RETURN station.S_Name
How can I get from station A to station B with the minimal number of train changes?
  MATCH (a:Station {S_Name:'A'}), (b:Station {S_Name:'B'}), p=shortestPath((a)-[:Serves*]-(b)) RETURN nodes(p)
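The first query is a group-and-filter: count the Serves relationships per station, then keep stations whose count exceeds 1. A Python sketch of that logic over invented (line, station) pairs:

```python
from collections import Counter

# Hypothetical Serves relationships as (line number, station name) pairs.
serves = [
    ("1", "Central"),
    ("2", "Central"),
    ("1", "North"),
    ("3", "Harbor"),
    ("2", "Harbor"),
]

# WITH station, count(line) AS linesCount
lines_per_station = Counter(station for _line, station in serves)

# WHERE linesCount > 1 RETURN station.S_Name
multi_line_stations = sorted(
    station for station, count in lines_per_station.items() if count > 1
)
```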

Sample Queries
What is the highest station?
  MATCH (s:Station) RETURN s ORDER BY s.Height DESC LIMIT 1
Which trains serve all stations?
  MATCH (s:Station) WITH collect(s) AS sc MATCH (t:Train) WHERE ALL (x IN sc WHERE (t)-[:Arrives]->(x)) RETURN t
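The "serves all stations" query is a form of relational division: collect every station, then keep the trains whose set of arrival stations contains all of them. A Python sketch with invented station and train names:

```python
# Hypothetical data: all stations, and the stations each train arrives at.
stations = {"Haifa", "Tel Aviv", "Beer Sheva"}
arrives = {
    "T1": {"Haifa", "Tel Aviv", "Beer Sheva"},
    "T2": {"Haifa", "Tel Aviv"},
    "T3": {"Beer Sheva", "Tel Aviv", "Haifa"},
}

# WHERE ALL (x IN sc WHERE (t)-[:Arrives]->(x))
trains_serving_all = sorted(
    train for train, stops in arrives.items() if stations <= stops
)
```

The subset test `stations <= stops` plays the role of the ALL predicate over the collected stations.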

How Do I Choose?
As a rule of thumb: [figure: guide for choosing a database type]

Are RDBMS Dead? (Should I forget everything I learned in this course?)
Definitely not!!!
1. RDBMS and SQL are the default time-tested database technology
2. See previous slide
3. RDBMS are making leapfrog improvements in performance due to advances in storage technologies and other optimizations, making them suitable for highly demanding OLAP applications
     E.g., SAP’s HANA
4. Many modern Internet web sites rely on multiple databases, each of a different kind, for their various aspects
     Similarly, C++ or Java might be your default programming language, yet you might opt to use PHP, Ruby/Rails, Perl, Eiffel, Erlang, ML, etc. for various specific tasks

Additional Reading Graph Databases by Robinson, Webber, and Eifrem (O’Reilly) – free eBook