Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Introduction Spring 2016.

1 Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Introduction Spring 2016

2 Welcome to COP5725!
COP5725: Advanced Database Systems
– Course website: all you need to know about COP5725
– Time: 2:00pm-3:15pm, Mondays and Wednesdays
– Venue: LOV 103
Please go over the syllabus carefully before taking the class!

3 Welcome to COP5725!
Instructor
– Prof. Peixiang Zhao
– Office hours: Monday and Wednesday, 3:30pm-4:30pm, or by appointment
– Office: LOV 262
– Research interests: databases, data mining, information/social network analysis
TA
– Dr. Esra Akbas
– Office hours: Tuesday, 10am-11am
– Office: MCH 106-A

4 The Goal of COP5725!
1. Reflection on the foundation:
– Climb up onto the shoulders of giants: the foundational models, representations, systems, and techniques for relational database systems, by way of reading and lectures
2. Projection on the outlook:
– And look out from there! Be inspired: what are the next advanced database systems?
– By way of reading and presenting the classics and the state of the art, and by way of doing projects!
“We can do it!”

5 The Contents of COP5725!
Relational Database Internals
– Fundamentals of relational databases
– Data storage and representation
– Advanced indexing
– Query processing and execution
– Query optimization
– ……
Advanced Database Topics
– Parallel/distributed databases (MapReduce)
– Data mining (selected topics)
– Data on the Web
– ……

6 Welcome to COP5725!
Textbook
– Database Systems: The Complete Book, 2nd edition, by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom
Recommended reading
– Database Management Systems, 3rd edition, by Raghu Ramakrishnan and Johannes Gehrke
– Readings in Database Systems, 5th edition, by P. Bailis, J. Hellerstein, and M. Stonebraker
– The Web
Prerequisites
– COP4710: Introduction to Database Systems
– COP4530: Data Structures and Algorithms
– Good programming skills

7 Welcome to COP5725!
Components of the course
1. Two lectures every week
2. Two assignments (10%)
3. A series of papers to be read and summarized (15%)
– A one- or two-page paper summary is to be submitted in class on the due date
4. Paper presentation (5%)
– Every student (or group?) will present one paper related to her/his project in class for 20(?) minutes
5. Semester-long project (30%)
– Research-flavor
– Implementation-flavor
6. A set of quizzes (5%)
7. Final exam (35%)

8 Paper Summaries
Milestone papers in database systems
Every paper will be posted early on the course website and can be downloaded from within the campus network
A one- to two-page summary covers
– What is the problem?
– Why is this problem important and worthy of a thorough study?
– Why is this problem difficult?
– What are the innovative ideas and technical merits?
– Comments on the experimental evaluations
– Any drawbacks and potential improvements?
Summarize based on your own understanding. Verbatim copying from the paper results in low scores
The contents of the papers will be tested in the final exam!

9 Paper Presentation
Every student (or group) will have a chance to select one paper to present in class
– The paper should be closely related to the project you are conducting
– The slides (pptx/ppt/pdf) should be sent to the instructor at least one day before the class in which you present
– The slides should be organized along the same lines as the paper summary
– 20(?) minutes for presentation and Q&A
Students will sign up for presentations in the near future

10 Project
Theme: choose either of the two
1. Research-flavor: mainly for Ph.D. students
– Find an interesting, nontrivial data management problem and propose a novel and effective solution to it
2. Implementation-flavor: mainly for M.S. students
– Find an interesting method/algorithm in a data management paper, implement it, and perform experimental studies
Teamwork: a group of one or two students (but no more!)
The project is partitioned into multiple milestones, each of which requires deliverables

11 Multi-stage Project
1. Group formation (0%)
2. Project proposal (10%) – What do I want to do?
3. Literature survey (20%) – What is the state of the art?
4. Status report (10%) – What have I achieved thus far?
5. Source code, software, and final report (60%) – Dude, these are my deliverables!
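As a worked illustration of the grading arithmetic, the sketch below combines the course component weights listed earlier with the project milestone weights above. The weights are copied from the slides. The function names and dictionary keys are hypothetical, and how the instructor actually aggregates or rounds scores is an assumption.

```python
# Percentage weights copied from the slides.
COURSE_WEIGHTS = {
    "assignments": 10,
    "paper_summaries": 15,
    "paper_presentation": 5,
    "project": 30,
    "quizzes": 5,
    "final_exam": 35,
}

PROJECT_STAGE_WEIGHTS = {
    "group_formation": 0,
    "proposal": 10,
    "literature_survey": 20,
    "status_report": 10,
    "final_deliverables": 60,
}


def weighted_total(scores, weights):
    """Weighted average of per-component scores (each on a 0-100 scale).

    Assumes a plain weighted sum; the real grading policy may differ.
    """
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


def stage_share_of_course(stage):
    """Percentage of the overall course grade carried by one project milestone."""
    return COURSE_WEIGHTS["project"] * PROJECT_STAGE_WEIGHTS[stage] / 100


if __name__ == "__main__":
    # Both weight tables should account for exactly 100%.
    print(sum(COURSE_WEIGHTS.values()), sum(PROJECT_STAGE_WEIGHTS.values()))
    for stage in PROJECT_STAGE_WEIGHTS:
        print(stage, stage_share_of_course(stage))
```

Under these assumptions, the final deliverables (60% of a 30% project) carry 18% of the course grade, while the proposal carries 3%.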

12 Implementation Project
Topics:
– Choose a research paper published after 2001 in one of the following conferences/journals, implement the idea, and finish all experimental studies related to it
– Conferences: SIGMOD, VLDB, ICDE, KDD, ICDM, SDM, SIGIR, WWW, CIKM
– Journals: TODS, VLDB Journal, TKDD, TKDE
Workload (in C/C++, Java, or Python)
– 3000-5000 lines of code; real/synthetic data; experimental studies
Expectation
– Source code, software, detailed readmes and scripts, and a final report
– Repeatability, completeness of datasets and experimental studies, efficiency, effectiveness, scalability ……
– You may demo your implementation to the TA

13 Research Project
Topics:
– A state-of-the-art data management or data mining problem in your research area
Workload
– Problem definition, algorithm design and analysis, implementation (more than 3000 lines of code, in C/C++, Java, or Python), experimental studies
– Your innovative ideas!
Expectation
– A conference-quality (potentially publishable) paper
– Source code, software, detailed readmes and scripts
– You may demo your implementation to the TA

14 Quizzes
The first quiz will be held on Monday 01/11
– Counts for 3% of your full credit!
– Coverage: fundamentals of relational DB; data structures and algorithms
Remaining quizzes will be held throughout the semester
– Check attendance
– Get feedback and suggestions from students

15 Is This Course Suitable For Me?
First-day Attendance Policy at FSU
Prerequisites MUST be satisfied
– Introduction to database systems: relational model, relational algebra, relational design, SQL, B/B+ tree, hashing, transaction management, crash recovery ……
– Data structures and algorithms: What is the difference between a stack and a queue? What is the worst-case complexity of insertion/deletion in red-black trees? Dijkstra's algorithm for shortest-path computation; set cover is NP-complete ……
Feel comfortable programming (a lot)
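For self-assessment, two of the prerequisite questions above can be sketched in a few lines of Python. This is a minimal illustration, not course material: the function names and the adjacency-list graph format are assumptions, and the Dijkstra sketch uses the standard binary-heap variant with non-negative edge weights. (For the red-black tree question, insertion and deletion both take O(log n) time in the worst case.)

```python
import heapq
from collections import deque


def stack_order(items):
    """Push all items, then pop them: a stack is last-in, first-out (LIFO)."""
    stack = []
    for item in items:
        stack.append(item)
    return [stack.pop() for _ in range(len(stack))]


def queue_order(items):
    """Enqueue all items, then dequeue them: a queue is first-in, first-out (FIFO)."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


def dijkstra(graph, source):
    """Shortest-path distances from source to every reachable node.

    graph maps each node to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

If these three functions read as routine, the data structures prerequisite is probably satisfied; if the stale-entry check in `dijkstra` is surprising, reviewing priority queues before the first quiz may help.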

16 COP5725 = How DB Knowledge is created + How to create more
In terms of topics, COP5725 is not:
– about Linux + Apache + PHP + MySQL (LAMP)
– about designing DBs that are in BCNF
– about SQL3 and stored procedures
– about Oracle tuning and implementation
In terms of methodology, COP5725 is not solely:
– reading the textbook and acing it
– implementing a well-specified DB algorithm, e.g., a B+ tree

17 How to Get the Most out of COP5725?
Read and think before class
– read the textbooks for related concepts
– read the papers
Use lectures as a road map for studying
– Lecture notes won’t cover all the material
Use your peers in learning
– discuss in and out of class to enhance understanding
Explore interesting projects creatively
– learning by doing

18 Any questions so far?

19 Evolution of Data Management
Jim Gray: Evolution of Data Management. IEEE Computer 29(10): 38-46 (1996)

20 Prehistory Thoughts: Emergence of the Notion of DBMS
William C. McGee: Generalization: Key to Successful Electronic Data Processing. J. ACM 6(1): 1-23 (1959)
When data processing consisted mostly of ad hoc programs, generalization was needed, e.g.,
– sorting
– file maintenance
– data access
– modification and update
– report generation
– ……

21 How Did We Get Here?
The dominant relational database system, which we take for granted now, was deemed impossible to implement and difficult to use in its early days
But, quoting Jim Gray:
“These innovations give one of the best examples of research prototypes turning into products. The relational model, parallel database systems, active databases, and object-relational databases all came from the academic and industrial research labs. The development of database technology has been a textbook case of successful collaboration between academy and industry.”
– Evolution of Data Management

22 Examples

23 In Industry

24 In Science – Turing Awardees
– CHARLES BACHMAN, 1973
– EDGAR CODD, 1981
– JAMES GRAY, 1998
– MICHAEL STONEBRAKER, 2014

25 The Grand Challenges of Data Management
The relational DBMS was invented in the early 70’s and is now a mature 50+ billion industry
What are we still working on? Big Data!
– What is the ultimate advanced DB?
– Data of all sorts, prevalent on the Web!
– What have you been searching for lately?
– Is what you search for what you want?
New challenges naturally arise
– structured vs. unstructured data
– querying vs. analysis vs. searching
– closed “base” vs. the open Web

26 Tallahassee, Florida, 2016
Have fun!
What Does 'Big Data' Mean and Who Will Win?
