CS345: Advanced Databases, Chris Ré

Presentation transcript:

CS345: Advanced Databases Chris Ré

What this course is
Database fundamentals:
– Theory
– Old, Crusty, Good SQL stuff
– No/New/Not-Yet SQL
New stuff: Knowledge bases & Inference
Databases is a strange and beautiful area: Theory, Algorithms, Systems, & Applications. It’s a bit scattered, and I love it.

A Brief, Biased Database History

Three Turing Award Winners: Charles Bachman, Edgar Codd, Jim Gray. Seminal contributions made in industry.

The Birth of the Relational Model (1970)
A database is a handful of relations (tables) with fixed schema, e.g. WorksIn(Employee, Dept).
Queries use a small number of operations: Selection (filter), Projection, Join, Union.
Basically, an operational finite model theory.

Data and Query Model
Data:
$R(A,B) = \{\, (a_1,b_1),\dots,(a_n,b_n) \,\}$
$S(B,C,D) = \{\, (b'_1,c_1,d_1),\dots,(b'_m,c_m,d_m) \,\}$
Projection: $\pi_A(R) = \{\, a : \exists b.\ (a,b) \in R \,\}$
Selection: $\sigma_F(R) = \{\, (a,b) \in R : F((a,b)) \,\}$, where $F : D(R) \to \{\text{True}, \text{False}\}$
Join: $\mathrm{Join}(R,S) = \{\, (a,b,c,d) : (a,b) \in R \wedge (b,c,d) \in S \,\}$
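The set definitions on this slide can be tried out directly. The following is a minimal sketch, assuming relations are modeled as Python sets of tuples; the function names and the sample data are illustrative and do not come from the slides. The join is a plain nested loop, chosen for clarity rather than speed.

```python
# Illustrative sketch only: relations modeled as Python sets of tuples.
# Function names and sample data are assumptions, not from the slides.

def project_a(r):
    """Projection onto A: { a : exists b. (a, b) in R }."""
    return {a for (a, _b) in r}

def select(r, f):
    """Selection: keep the tuples of R for which the predicate F is true."""
    return {t for t in r if f(t)}

def join(r, s):
    """Join(R, S): pair (a, b) in R with (b, c, d) in S on equal b."""
    return {(a, b, c, d) for (a, b) in r for (b2, c, d) in s if b == b2}

# Hypothetical sample data for R(A, B) and S(B, C, D).
R = {(1, "x"), (2, "y"), (3, "x")}
S = {("x", 10, "p"), ("z", 20, "q")}

print(sorted(project_a(R)))
print(sorted(select(R, lambda t: t[1] == "x")))
print(sorted(join(R, S)))
```

Because the results are sets, duplicates disappear automatically, which matches the set semantics of the definitions (SQL itself uses bags unless DISTINCT is requested).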

Key idea of the Relational Model
Declarative: the user says what they want, not how to get it.
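One way to see the difference is to answer the same question twice: once by stating the desired result in SQL, and once by spelling out the steps by hand. This is a hedged sketch using Python's standard `sqlite3` module; the table reuses the WorksIn(Employee, Dept) schema from the earlier slide, and the rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WorksIn (Employee TEXT, Dept TEXT)")
conn.executemany(
    "INSERT INTO WorksIn VALUES (?, ?)",
    [("alice", "db"), ("bob", "ml"), ("carol", "db")],
)

# Declarative: state WHAT is wanted; the engine decides how to compute it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT Employee FROM WorksIn WHERE Dept = ? ORDER BY Employee",
        ("db",),
    )
]

# Imperative: spell out HOW, i.e. scan every row, filter, then sort.
imperative = []
for employee, dept in conn.execute("SELECT Employee, Dept FROM WorksIn"):
    if dept == "db":
        imperative.append(employee)
imperative.sort()

print(declarative, imperative)
```

Both versions return the same employees, but only the declarative one leaves the system free to change its strategy (for example, to use an index) without the query being rewritten.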

Key question: Can one implement the Relational Model efficiently?

System R
In 1974, System R shows it is possible to get good performance. First implementation of SQL.
IBM didn’t push it, worried about IMS cannibalization, but…
Pat Selinger

Others Come on to the Scene… Larry Ellison hears about IBM’s Research prototype and founds a company….

Fast Forward to Today
The relational model is the dominant model of data.

Takeaways about Database Research
Started with mathematical elegance and with close ties to industry.
Improve runtime performance as a proxy to increase programmer productivity.

The Big Ideas

Independence
Declarative languages can improve productivity.
– Different team members work independently: backend, storage, UI, BI, etc.
– Transactional model.
– Challenge: Support efficient concurrent access?

Performance
Parallel programming is hard; SQL is the most popular parallel programming language.
– How do you deal with the asymmetry of the memory hierarchy (disk / main memory / cache)?
– How do you structure parallel optimization?
– Concurrency?

Manageability
Systems live over time, and the system should automate many routine tasks.
– Maintain derived data products (views)
– Self-monitoring systems (autonomic)
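As a rough illustration of what maintaining a derived data product can mean, the sketch below keeps a per-department employee count up to date as base rows are inserted and deleted, instead of recomputing it from scratch. The class name, method names, and data are hypothetical; real systems handle far more general views than this.

```python
from collections import Counter

class DeptCountView:
    """Hypothetical derived view: number of employees per department."""

    def __init__(self, base_rows):
        # Initial computation: one full pass over the (Employee, Dept) rows.
        self.counts = Counter(dept for _employee, dept in base_rows)

    def on_insert(self, row):
        # Incremental update: adjust only the affected group.
        self.counts[row[1]] += 1

    def on_delete(self, row):
        # Assumes the deleted row was present in the base data.
        self.counts[row[1]] -= 1
        if self.counts[row[1]] == 0:
            del self.counts[row[1]]

view = DeptCountView([("alice", "db"), ("bob", "ml")])
view.on_insert(("carol", "db"))
view.on_delete(("bob", "ml"))
print(dict(view.counts))
```

Each update touches a single counter, so the cost per change does not depend on the size of the base table.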

Course Topics

A user says what they want, not how to get it.

Topic 1: QP Fundamentals
Query Processing Fundamentals
1. Empirical join evaluation from the 70s!
2. System R: The Archetype (Cardinality)
3. Formal Query Languages
4. Acyclic Query Evaluation (Structure)
5. Worst-case Optimal Join Algorithms (S + C)
This will be the most formal part of the course.

Analyzing your data before it was big (when it was just very large…)

Topic 2: OLAP-Style Analytics
Building new and old data systems:
1. Theory of Materialized Views
2. Gamma (Parallel DBs)
3. MapReduce & the Rise of NoSQL (2000s)
4. NewSQL & Optimizing Joins on MR (theory)
5. Fagin’s Algorithm (theory)
6. Statistical Analytic Systems

My biased view of the future…

Topic 3: Next-Generation Systems
1. Information Extraction
2. Probabilistic Query Evaluation (Theory)
3. Scalable Inference
4. Knowledge Bases

Transactions.

Topic 4: OLTP-Style Transactional Systems
1. The Rise of Key-Value Stores
2. The Case for Determinism
3. CALM & CAP
4. The Return of Main Memory DBs
5. Spanner, F1, and Data Centers

Course Logistics

Grading
Course Project (more next)
– Do something interesting with data.
– Teams OK.
– Form teams soon and notify me by Jan 12.
Midterm Exam

Projects in each topic
1. Knowledge Base Construction
– Pick a domain and build a KBC system for it with DeepDive
2. Join Algorithms
– Certificate versions (see me)
– MapReduce? GraphLab? Spark?
3. Analytics Systems
4. Transactional Systems
You are free to choose other projects.

Datasets
– Snapshot of the web marked up with NLP tools and structured data (KBP and KBA challenges).
– 500k+ documents used by paleobiologists, plus structured data. We can mark up even more stuff.
– Benchmark ML and graph datasets, if you want to work on analytics or join evaluation.

Wednesday
On Wednesday we begin the ancient art of join evaluation. All who pass this way must pass through this ancient topic!
Read: Shapiro (not too carefully; we’ll go through the details).