CS 440 Database Management Systems


1 CS 440 Database Management Systems
Course overview

2 Welcome to CS440! Instructor: Arash Termehchy
Assistant Professor at EECS. Research on data management and analytics. Information & Data Management and Analytics (IDEA) Lab.

3 The Era of Big Data
Technological shifts, e.g., mobile devices, have created a staggering number of enormous data sets. Both opportunities and challenges.

4 Opportunities: unreasonable effectiveness of data
A. Halevy, et al. The unreasonable effectiveness of data, IEEE Intelligent Systems, 2009. Observation from working with large datasets at Google. In data-centric prediction and discovery, more data generally outperforms more complex statistical models. Conclusion: usually, no need for overly complex statistical models.

5 Opportunities are priceless! The story of John Snow
“In the mid-1850s, Dr. John Snow plotted cholera deaths on a map, and in the corner of a particularly hard-hit buildings was a water pump. A 19th-century version of Big Data, which suggested an association between cholera and the water pump.” Integrating data sets has saved millions of lives!

6 Paradigm shifting influence on scientific discovery
“The Fourth Paradigm: Data-Intensive Scientific Discovery”, Jim Gray. Four paradigms: empirical, theoretical, computational, data-centric. The Sloan Sky Server database is a top-cited resource in the field of astronomy. Astronomical observation => database query. Spread of diseases by analyzing the Google query log. Personalized medicine, drug discovery, …

7 Challenges: data volume
Sloan Sky Server will soon store 30 terabytes per day. The Large Hadron Collider can generate 500 exabytes per day. 90% of the world's data was generated in the last two years (2013). Every two years: ten times more data.

8 Challenges: data variety/ diversity
Database systems used to deal with a single static database. Now they need to transform and/or integrate a large number of evolving data sets. Impossible to do manually. “A data integration expert is never without a job.”

9 Challenges: usability
“… (in the next few years) we project a need for 1.5 million additional analysts in the United States who can analyze data effectively …” -- McKinsey Big Data Study, 2012. Current systems are not built for scientists and normal users. “It may take a PhD in computer science to successfully deploy a data analytics algorithm!”

10 The notion of database management system (DBMS)
Data processing used to be mostly ad-hoc programming. W. McGee, Generalization: Key to Successful Electronic Data Processing, Journal of the ACM, 1959. Generalization, aka abstraction/data modeling. File: a sequence of records. Operations: sort, select part of the file, … Makes data management and processing usable: people can learn and use the abstraction instead of developing new data processing programs. How to build models that provide nice generalizations? How to implement them efficiently?
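The file abstraction on this slide (a sequence of records, with generic operations such as sort and select) can be sketched in a few lines of Python. This is only an illustration, not code from the course: the record layout, the `select` and `sort_by` helpers, and the payroll data are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Assumption: a record is a mapping from field names to values,
# and a "file" is simply a list of such records.
Record = Dict[str, Any]


def select(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the records that satisfy the predicate (select part of the file)."""
    return [r for r in records if predicate(r)]


def sort_by(records: List[Record], field: str) -> List[Record]:
    """Return a new list of records ordered by the given field."""
    return sorted(records, key=lambda r: r[field])


# Hypothetical payroll file, invented for this example.
payroll = [
    {"name": "Ada", "dept": "eng", "salary": 120},
    {"name": "Bob", "dept": "ops", "salary": 90},
    {"name": "Cy", "dept": "eng", "salary": 100},
]

# The same two generic operations work on any file of records;
# no new data processing program is needed per data set.
eng_by_salary = sort_by(select(payroll, lambda r: r["dept"] == "eng"), "salary")
print([r["name"] for r in eng_by_salary])
```

The point of the generalization is that `select` and `sort_by` know nothing about payroll: they are written once against the abstraction and reused for every file.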

11 Abstraction is the key
How to develop usable abstractions for our data? Data models and query languages: relational data model, graph data model, … How to implement these abstractions efficiently? Database system internals: storage management, indexing, …

12 Topics
How to develop usable abstractions for our data? Relational data model, graph data model, database programming. How to implement these abstractions efficiently? Storage management and indexing, query processing algorithms, query optimization, transaction management, parallel and distributed data processing.

13 Our plan
Learn the fundamental concepts and ideas: foundational models, algorithms, and systems, through textbooks, resources, and lectures. Apply them to new problems: apply the lessons learned to interesting database problems by doing assignments.

14 Learning the fundamentals: Lectures
Review and discuss the material. Will be available on the course website after the class. Provide the road map for studying, since the course material can seem overwhelming. Attendance is not required but encouraged. Read the course material before the class. Participate and ask questions!

15 Learning the fundamentals: Readings
Textbooks: Database Management Systems, 3rd edition, R. Ramakrishnan and J. Gehrke (the “Cow book”). Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeff Ullman (free online). Papers for newer material: posted on the course website.

16 Learning the fundamentals: Readings
Recommended: Database Systems: The Complete Book, 2nd edition, Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom (the “Complete book”). Foundations of Databases, Serge Abiteboul, Richard Hull, Victor Vianu (the “Alice book”).

17 Learning the fundamentals: Exam
Midterm exam, in class. Closed books and notes. Tests your knowledge of the subjects discussed in the class. 40% of the overall grade. No final exam.

18 Apply your understanding: assignments
Seven assignments: announced on Piazza and the course website, posted on the course website. Both written and programming. Submit using TEACH. Write using word processors and submit in PDF. Start early! 60% of the overall grade.

19 How to get the most out of the course?
Communicate with the course staff. TAs: Vahid Ghadakchi, Parisa Ataie. Piazza is the preferred method of communication. Office hours: Arash, Tuesday 4:30 – 5:30 pm; Vahid, Monday/Wednesday 4 – 5 pm; Parisa, Monday 9 – 10 am. Email the staff for other types of questions, using the [cs440] tag in the subject line. Communicate with your peers on course materials and lectures. Check Piazza and the course website for announcements or possible changes in the schedule.

20 What is next?
A review of the relational model, relational algebra, and SQL. You will refresh your memory by working on some advanced problems on the relational model and database design.
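As a small warm-up for that review, the sketch below runs one SQL query that corresponds to a relational algebra join, selection, and projection, using Python's standard `sqlite3` module. The `student` and `enrolled` tables and their contents are invented for illustration and are not taken from the course material.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and data, assumed only for this example.
conn.executescript("""
CREATE TABLE student(sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrolled(sid INTEGER, course TEXT);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO enrolled VALUES (1, 'CS440'), (2, 'CS325'), (1, 'CS325');
""")

# Relational algebra: project name from (student JOIN enrolled), keeping
# only the rows where course = 'CS440'.
rows = conn.execute("""
SELECT s.name
FROM student AS s JOIN enrolled AS e ON s.sid = e.sid
WHERE e.course = 'CS440'
ORDER BY s.name
""").fetchall()

names = [row[0] for row in rows]
print(names)
conn.close()
```

The SQL states what result is wanted; how the join is executed (which index, which join algorithm) is left to the system, which is the abstraction/implementation split the course is organized around.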

