Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS 405G: Introduction to Database Systems

Similar presentations


Presentation on theme: "CS 405G: Introduction to Database Systems"— Presentation transcript:

1 CS 405G: Introduction to Database Systems
24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky

2 Chen Qian @ University of Kentucky
Summary Tree-based indexes: O(logN) for search and update, support range queries Hash-based indexes: best for equality searches O(1), cannot support range searches. Static and dynamic 3/14/2018 Chen University of Kentucky 16

3 NoSQL Systems: Motivation
NoSQL: The Name “SQL” = Traditional relational DBMS Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS “NoSQL” = “No SQL” = Not using traditional relational DBMS “No SQL”  Don’t use SQL language

4 NoSQL Systems: Motivation
NoSQL: The Name “SQL” = Traditional relational DBMS Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS “NoSQL” = “No SQL” = Not using traditional relational DBMS “No SQL”  Don’t use SQL language “NoSQL” = “Not Only SQL”

5 NoSQL Systems: Motivation
Not every data management/analysis problem is best solved using a traditional DBMS Database Management System (DBMS) provides…. … efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.

6 NoSQL Systems: Motivation
Alternative to traditional relational DBMS Flexible schema Quicker/cheaper to set up Massive scalability Relaxed consistency  higher performance & availability No declarative query language  more programming Relaxed consistency  fewer guarantees

7 NoSQL Systems: Motivation
Example #1: Web log analysis Each record: UserID, URL, timestamp, additional-info Task: Load into database system Data cleaning Data extraction Verification Schema Nothing above is needed for noSQL!

8 NoSQL Systems: Motivation
Example #1: Web log analysis Each record: UserID, URL, timestamp, additional-info Task: Find all records for… Given UserID Given URL Given timestamp Certain construct appearing in additional-info

9 NoSQL Systems: Motivation
Example #1: Web log analysis Each record: UserID, URL, timestamp, additional-info Separate records: UserID, name, age, gender, … Task: Find average age of user accessing given URL May not require strict consistency.

10 NoSQL Systems: Motivation
Example #2: Social-network graph Each record: UserID1, UserID2 Separate records: UserID, name, age, gender, … Task: Find all friends of friends of friends of … friends of given user Large number of joins? Not efficient at all! Specially designed graph database may be better

11 NoSQL Systems: Motivation
Example #3: Wikipedia pages Large collection of documents Combination of structured and unstructured data Task: Retrieve introductory paragraph of all pages about U.S. presidents before 1900 Mix of structured and unstructured data

12 NoSQL Systems: Motivation
Alternative to traditional relational DBMS Flexible schema Quicker/cheaper to set up Massive scalability Relaxed consistency  higher performance & availability No declarative query language  more programming Relaxed consistency  fewer guarantees

13 NoSQL Systems Overview

14 NoSQL Systems: Overview
Several incarnations MapReduce framework: OLAP Key-value stores: OLTP Document stores Graph database systems

15 NoSQL Systems: Overview
MapReduce Framework Originally from Google, open source Hadoop No data model, data stored in files User provides specific functions map() reduce() System provides data processing “glue”, fault-tolerance, scalability

16 NoSQL Systems: Overview
Map and Reduce Functions Map: Divide problem into subproblems Reduce: Do work on subproblems, combine results

17 NoSQL Systems: Overview
MapReduce Architecture

18 NoSQL Systems: Overview
MapReduce Example: Web log analysis Each record: UserID, URL, timestamp, additional-info Task: Count number of accesses for each domain (inside URL)

19 NoSQL Systems: Overview
MapReduce Example (modified #1) Each record: UserID, URL, timestamp, additional-info Task: Total “value” of accesses for each domain based on additional-info

20 NoSQL Systems: Overview
MapReduce Framework No data model, data stored in files User provides specific functions System provides data processing “glue”, fault-tolerance, scalability

21 NoSQL Systems: Overview
MapReduce Framework Schemas and declarative queries are missed Hive – schemas, SQL-like query language Pig – more imperative but with relational operators Both compile to “workflow” of Hadoop (MapReduce) jobs

22 NoSQL Systems: Overview
Key-Value Stores Extremely simple interface Data model: (key, value) pairs Operations: Insert(key,value), Fetch(key), Update(key), Delete(key) Implementation: efficiency, scalability, fault-tolerance Records distributed to nodes based on key Replication Single-record transactions, “eventual consistency”

23 NoSQL Systems: Overview
Key-Value Stores Extremely simple interface Data model: (key, value) pairs Operations: Insert(key,value), Fetch(key), Update(key), Delete(key) Some allow (non-uniform) columns within value Some allow Fetch on range of keys Example systems Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …

24 NoSQL Systems: Overview
Document Stores Like Key-Value Stores except value is document Data model: (key, document) pairs Document: JSON, XML, other semistructured formats Basic operations: Insert(key,document), Fetch(key), Update(key), Delete(key) Also Fetch based on document contents Example systems CouchDB, MongoDB, SimpleDB, …

25 NoSQL Systems: Overview
Graph Database Systems Data model: nodes and edges Nodes may have properties (including ID) Edges may have labels or roles

26 NoSQL Systems: Overview
Graph Database Systems Interfaces and query languages vary Single-step versus “path expressions” versus full recursion Example systems Neo4j, FlockDB, Pregel, … RDF “triple stores” can map to graph databases

27 NoSQL Systems: Overview
“NoSQL” = “Not Only SQL” Not every data management/analysis problem is best solved exclusively using a traditional DBMS Current incarnations MapReduce framework Key-value stores Document stores Graph database systems


Download ppt "CS 405G: Introduction to Database Systems"

Similar presentations


Ads by Google