1
CS 292 Special Topics on Big Data
Yuan Xue (yuan.xue@vanderbilt.edu)
2
Part I: Relational Database
Yuan Xue (yuan.xue@vanderbilt.edu)
3
Discussion
Did you ever encounter a data management problem?
  Experimental data from a homework? Personal data? Other data?
How did you manage your data?
4
Database
Database: an integrated collection of related data
  Usually stored on secondary storage (as files)
  Also in-memory databases
Examples of databases
  Vanderbilt student database; course registration and grading database (backend of YES)
  Amazon’s products and customer database
  eBay’s products and transaction database
  Facebook’s user and message database
  And more…
[Figure labels: Database; Data]
5
Database Management System (DBMS)
DBMS: a collection of software programs
  Designed to assist in creating and managing databases
  Supports defining, constructing, manipulating, and sharing databases
Examples of DBMSs
  Relational DBMSs
    Commercial: Oracle, IBM (DB2, Informix), Microsoft (SQL Server, Access)
    Open source: MySQL, PostgreSQL
  NoSQL and NewSQL: BigTable/HBase, Cassandra, Redis, Riak, MongoDB, Dynamo, DynamoDB, Spanner
  Other: object-oriented databases, etc.
6
Database System Environment
[Figure: the database system environment without a DBMS and with a DBMS; labels: Users, Application, DBMS, Database, Data]
7
Benefits of a DBMS
Development convenience
  Reduces application development time
Data independence
  Application programs do not depend on data representation and storage details
Data integrity and consistency
  Enforces consistency constraints on data
Data sharing and concurrency control
  Data is better utilized (discovered and reused); redundancy of data is minimized
  Avoids undesirable race conditions that arise with simultaneous access/updates to data
Centralized control
  The DBA tunes the database to balance users' needs
Security
  Prevents unauthorized access
Crash recovery
  Ensures the integrity of data in the presence of failures
8
Example Application – MiniTwitter
What data do we need?
What capabilities on the data do we need?
9
Example Application – MiniTwitter
What data do we need? (Information required to record system state)
  User profile info: ID, password, email, display name, picture, people I follow, people who follow me
  Tweets: author, time, content (topic), replies (author, time, content), favorites (author, time)
What capabilities on the data do we need? (Operations that update and retrieve system state)
  Register a new user
  Follow/unfollow a user (approve a follow request)
  Post/delete a tweet
  Read/update in real time all the tweets from the people I follow
  Show the number of tweets I posted, the number of people following me, and the number of people I follow
  Trend information
10
Three-Level Architecture
Key question: how to describe data?
  Conceptual data model: entities, attributes, relationships (entity-relationship model)
  Logical data model: coming next
  Physical data model: storage, data structures
11
Database Model
Logical data model: the logical structure of data organization
Types of data model
  Relational model: table
  Semistructured data model (XML/JSON): tree
  Various data models in NoSQL systems: key-value pair, column family, graph
  Object-oriented model: object, class, inheritance
    A layer over the relational model
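To make the difference between these data models concrete, here is a minimal sketch that holds the same MiniTwitter user in three shapes. The key format `user:Alice00` and the `following` field are illustrative assumptions, not something the slides define.

```python
import json

# Relational: a flat tuple whose positions match the table's columns.
columns = ("ID", "Name", "Email")
row = ("Alice00", "Alice", "alice00@gmail.com")

# Semistructured (JSON): a tree, so nested data can live inside the record.
document = {
    "ID": "Alice00",
    "Name": "Alice",
    "Email": "alice00@gmail.com",
    "following": ["Bob2013", "Cathy123"],  # hypothetical nested field
}

# Key-value: the store only sees an opaque value looked up by one key.
kv_store = {"user:Alice00": json.dumps(document)}  # hypothetical key format

# Reading the relational row by attribute name requires the schema (columns).
as_record = dict(zip(columns, row))
restored = json.loads(kv_store["user:Alice00"])
```

In the relational shape, the list of followed users cannot sit inside the row; it would go into a separate table, which is the design question the following slides take up.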
12
Relational Data Model
Schema: structural description of relations in the database
Instance: data in the database at a given point in time

ID       Name   Email              Password
Alice00  Alice  alice00@gmail.com  Aadf1234
Bob2013  Bob    bob13@gmail.com    qwer6789
13
Relational Data Model
Schema = structural description of relations in the database
Instance = actual contents at a given point in time
Database = set of named relations (or tables)
  Each relation has a set of named attributes (or columns)
  Each tuple (or row) has a value for each attribute
  Each attribute has a type (or domain)

ID       Name   Email              Password
Alice00  Alice  alice00@gmail.com  Aadf1234
Bob2013  Bob    bob13@gmail.com    qwer6789
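The schema/instance distinction can be seen directly in a small database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the choice of SQLite and of `TEXT` as the type of every attribute are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema: the structural description of the relation (names and types).
conn.execute(
    "CREATE TABLE User (ID TEXT, Name TEXT, Email TEXT, Password TEXT)"
)

# Instance: the tuples stored at this point in time.
conn.executemany(
    "INSERT INTO User VALUES (?, ?, ?, ?)",
    [
        ("Alice00", "Alice", "alice00@gmail.com", "Aadf1234"),
        ("Bob2013", "Bob", "bob13@gmail.com", "qwer6789"),
    ],
)

# PRAGMA table_info returns one row per attribute: (cid, name, type, ...).
schema = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(User)")]
instance = conn.execute("SELECT * FROM User ORDER BY ID").fetchall()

print(schema)    # attribute names and types
print(instance)  # current rows
```

Inserting or deleting rows changes the instance but leaves the schema untouched.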
14
Discussion
How should we design relations (tables) for MiniTwitter?
What aspects do we need to consider?
15
Design – Version 0.1

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@
(Password values: pretending to be MD5 hash codes ;)

Tweet
ID    Timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

Follow
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
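One way to check a design like this is to see whether it supports the capabilities listed earlier. The sketch below loads the Tweet and Follow rows from this slide into an in-memory SQLite database and answers two of them; the helper names `timeline` and `follower_count` are hypothetical, and the queries are only one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tweet (ID TEXT, Timestamp TEXT, Author TEXT, Content TEXT);
CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT);
""")
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", [
    ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
    ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
    ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
])
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
    ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
    ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
])

def timeline(user_id):
    """IDs of all tweets written by the people user_id follows."""
    rows = conn.execute(
        "SELECT t.ID FROM Tweet AS t "
        "JOIN Follow AS f ON t.Author = f.Followee "
        "WHERE f.Follower = ? ORDER BY t.ID",
        (user_id,),
    )
    return [r[0] for r in rows]

def follower_count(user_id):
    """Number of people following user_id."""
    return conn.execute(
        "SELECT COUNT(*) FROM Follow WHERE Followee = ?", (user_id,)
    ).fetchone()[0]

print(timeline("Bob2013"))        # tweets by the users Bob follows
print(follower_count("Alice00"))  # how many users follow Alice
```

Both capabilities need the Follow table; the timeline additionally joins it with Tweet on the author column.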
16
Relational Data Model
Key: an attribute whose value is unique in each tuple, or a set of attributes whose combined values are unique

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@

Tweet
ID    timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

Follow
ID        Follower  timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
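A minimal sketch of both kinds of key, using SQLite's `PRIMARY KEY` clause. It assumes User.ID is the single-attribute key and that the pair of user columns in Follow forms a composite key; the column name `Followee` is borrowed from the Design – Version 0.1 slide, and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-attribute key: each ID value may appear in only one tuple.
conn.execute("CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT)")

# Composite key: neither column is unique alone, but the pair is.
conn.execute(
    "CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT, "
    "PRIMARY KEY (Followee, Follower))"
)

conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice')")
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2011.1.1.3.6.6')")
# Same Followee, different Follower: the combined value is still unique.
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Cathy123', '2012.11.1.2.3.3')")

def try_insert(sql):
    """Run an INSERT and report whether the key constraint allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

dup_user = try_insert("INSERT INTO User VALUES ('Alice00', 'Someone else')")
dup_follow = try_insert(
    "INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2014.1.1.0.0.0')"
)
print(dup_user, dup_follow)
```

The DBMS, not the application, refuses the duplicate tuples, which is the data integrity benefit listed among the benefits of a DBMS.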
18
Relational Data Model
Foreign key: an attribute or set of attributes in one table that points to the primary key of another table

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@

Tweet
ID    timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

Follow
ID        Follower  timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7
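As an illustration, Tweet.Author can be declared as a foreign key pointing to User.ID. This is a sketch under that assumption; the slides do not spell out which columns are foreign keys. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

conn.execute("CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Tweet ("
    " ID TEXT PRIMARY KEY,"
    " Timestamp TEXT,"
    " Author TEXT REFERENCES User(ID),"  # foreign key to User's primary key
    " Content TEXT)"
)

conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice')")
# Accepted: the author exists in User.
conn.execute(
    "INSERT INTO Tweet VALUES ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello')"
)

# Rejected: 'Nobody' does not match any User.ID.
try:
    conn.execute(
        "INSERT INTO Tweet VALUES ('0002', '2013.12.20.11.23.6', 'Nobody', 'Hi')"
    )
    orphan = "accepted"
except sqlite3.IntegrityError:
    orphan = "rejected"

print(orphan)
```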
20
More on the Relational Data Model
NULL: a special value for “unknown” or “undefined”
Relational model constraint summary
  Domain constraints
  Key constraints
  Integrity constraints
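The sketch below shows how NULL behaves in comparisons and how a constraint on an attribute's values might be declared in SQLite. The `NOT NULL` and `CHECK` clauses are one way to approximate a domain constraint, and the rule that an email must contain `@` is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User ("
    " ID TEXT PRIMARY KEY,"                  # key constraint
    " Name TEXT NOT NULL,"                   # a value is required
    " Email TEXT CHECK (Email LIKE '%@%'),"  # restricts the allowed values
    " Picture TEXT)"                         # may be NULL (unknown)
)
conn.execute(
    "INSERT INTO User VALUES ('Alice00', 'Alice', 'alice00@gmail.com', NULL)"
)

# 'Picture = NULL' evaluates to NULL (unknown), so no row satisfies it.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM User WHERE Picture = NULL"
).fetchone()[0]
# IS NULL is the test that actually finds missing values.
is_null = conn.execute(
    "SELECT COUNT(*) FROM User WHERE Picture IS NULL"
).fetchone()[0]

try:
    conn.execute("INSERT INTO User VALUES ('Bob2013', 'Bob', 'not-an-email', NULL)")
    bad_email = "accepted"
except sqlite3.IntegrityError:
    bad_email = "rejected"

print(eq_null, is_null, bad_email)
```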
21
Relational Data Model and Database
Relational model
  Simple representation
  Efficient implementation
  Driven by relational algebra and relational calculus
  Up-front definition of schemas and types that the data will thereafter adhere to
  High-level, simple yet expressive query language
Relational databases
  Proven success for both open source and proprietary systems
  Provide full ACID guarantees
  SQL as a widely used, standard way of interacting with the database
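As a small illustration of the "A" in ACID (atomicity), the sketch below groups two inserts into one transaction with Python's `sqlite3` connection context manager. When the second insert violates the key constraint, the first is rolled back as well. The table layout is a simplified assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Author TEXT, Content TEXT)")
conn.execute("INSERT INTO Tweet VALUES ('0001', 'Alice00', 'Hello')")
conn.commit()

try:
    # 'with conn' commits if the block succeeds and rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO Tweet VALUES ('0002', 'Bob2013', 'Nice weather')")
        # Duplicate primary key: this statement fails ...
        conn.execute("INSERT INTO Tweet VALUES ('0001', 'Alice00', 'duplicate')")
except sqlite3.IntegrityError:
    pass  # ... and the whole transaction, including '0002', is undone.

count = conn.execute("SELECT COUNT(*) FROM Tweet").fetchone()[0]
print(count)  # only the tweet committed before the failed transaction remains
```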
22
Creating and Using a Relational Database
Steps in creating and using a (relational) database
  1. Design schema (using DDL, the data definition language)
  2. Initialization: “bulk load” initial data
  3. Operation: execute queries and modifications
[Figure labels: Data; Meta-data (database definition)]
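The three steps can be sketched end to end with Python's `sqlite3` module. This is a minimal example on an in-memory database; the reduced column set is an assumption, and the rows come from the Design – Version 0.1 slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Design schema: DDL statements define the tables.
conn.executescript("""
CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT, Author TEXT, Content TEXT);
""")

# 2. Initialization: bulk load the initial data.
conn.executemany("INSERT INTO User VALUES (?, ?)", [
    ("Alice00", "Alice"), ("Bob2013", "Bob"), ("Cathy123", "Cathy"),
])
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", [
    ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
    ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
    ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
])
conn.commit()

# 3. Operation: a query ...
tweets_by_alice = conn.execute(
    "SELECT COUNT(*) FROM Tweet WHERE Author = ?", ("Alice00",)
).fetchone()[0]
# ... and a modification.
conn.execute("DELETE FROM Tweet WHERE ID = ?", ("0003",))
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM Tweet").fetchone()[0]

# Meta-data: the database definition is itself stored and queryable.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tweets_by_alice, remaining, tables)
```

The last query reads SQLite's catalog table `sqlite_master`, which holds the database definition alongside the data.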