15-826: Multimedia Databases and Data Mining

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos & A. Pavlo Lecture#6: Rel. model - SQL part1 (R&G, chapter.
CMU SCS : Multimedia Databases and Data Mining Lecture#3: Primary key indexing – hashing C. Faloutsos.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#1: Introduction.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Lecture#7 (cont’d): Rel. model - SQL part3.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
1 541: Database Systems S. Muthu Muthukrishnan. 2 Preliminaries  CS541. Thursdays 5 – 8 PM, CORE A. Course webpage:
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos & A. Pavlo Lecture#7 : Rel. model - SQL part2 (R&G, chapters.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Rel. model - SQL part1.
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Relational Model – SQL Part I (based on notes by Silberchatz,Korth,
Computing for Bioinformatics Introduction to databases What is a database? Database system components Data types DBMS architectures DBMS systems available.
CS462: Introduction to Database Systems. ©Silberschatz, Korth and Sudarshan1.2Database System Concepts Course Information Instructor  Kyoung-Don (KD)
Systems analysis and design, 6th edition Dennis, wixom, and roth
CMU SSD7: Database Systems
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Relational Model – SQL Part II (based on notes by Silberchatz,Korth,
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Introduction.
Database Systems I Content: –How to build a database application –Principles of database-system implementation Instructor: John Sieg Required Text:
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Rel. model - SQL part3.
CS453: Databases and State in Web Applications (Part 2) Prof. Tom Horton.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Rel. model - SQL part2.
Chapter 8 Views and Indexes 第 8 章 视图与索引. 8.1 Virtual Views  Views:  “virtual relations”. Another class of SQL relations that do not exist physically.
SQL Introduction to database and SQL. Chapter 1: Databases and Database Users 6 Introduction to Databases Databases touch all aspects of our lives. Examples:
1 Introduction to Database Systems, CS420 SQL Views and Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
Database System Architecture and Implementation Execution Costs 1 Slides Credit: Michael Grossniklaus – Uni-Konstanz.
CSCI5570 Large Scale Data Processing Systems
Databases Stefano Grazioli.
Intro to MIS – MGS351 Databases and Data Warehouses
Query Processing and Optimization, and Database Tuning
CF 1334 Sistem Basis Data (3 SKS)
“Introduction To Database and SQL”
Digital Video Library - Jacky Ma.
Database Management Systems
Course Introduction 공학대학원 데이타베이스
Chapter # 14 Indexing Structures for Files
CS 540 Database Management Systems
MODELS OF DATABASE AND DATABASE DESIGN
Storage and Indexes Chapter 8 & 9
Copyright © 2003 by Kyu-Young Whang
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Databases and Data Warehouses Chapter 3
Introduction to Relational Databases
COSC 6340 Projects & Homeworks Spring 2002
“Introduction To Database and SQL”
1 Demand of your DB is changing Presented By: Ashwani Kumar
The Relational Model Content based on Chapter 3
C. Faloutsos Query Optimization – part 1
Lecture#12: External Sorting (R&G, Ch13)
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
15-826: Multimedia Databases and Data Mining
Selected Topics: External Sorting, Join Algorithms, …
CMPT 354: Database System I
Views and Indexes Controlling Concurrent Behavior
Speeding Accesses to Data
Chapter 17 Designing Databases
15-826: Multimedia Databases and Data Mining
McGraw-Hill Technology Education
C. Faloutsos Query Optimization – part 2
The Relational Model Content based on Chapter 3
15-826: Multimedia Databases and Data Mining
Lecuter-1.
Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授.
Presentation transcript:

15-826: Multimedia Databases and Data Mining C. Faloutsos 15-826 15-826: Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU www.cs.cmu.edu/~christos

Copyright: C. Faloutsos (2017) 15-826 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Problem Given a large collection of (multimedia) records, or graphs, find similar/interesting things, ie: Allow fast, approximate queries, and Find rules/patterns 15-826 Copyright: C. Faloutsos (2017)

Problem Given a large collection of (multimedia) records, or graphs, find similar/interesting things, ie: Allow fast, approximate queries, and Find rules/patterns Q1: Examples, for ‘similar’? 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Sample queries Similarity search Find pairs of branches with similar sales patterns ??? 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Sample queries Similarity search Find pairs of branches with similar sales patterns find medical cases similar to Smith's Find pairs of sensor series that move in sync Find shapes like a spark-plug (nn: ‘case based reasoning’) 15-826 Copyright: C. Faloutsos (2017)

Problem Given a large collection of (multimedia) records, or graphs, find similar/interesting things, ie: Allow fast, approximate queries, and Find rules/patterns Q1: Examples, for ‘interesting’? 15-826 Copyright: C. Faloutsos (2017)

Sample queries –cont’d Rule discovery Clusters (of branches; of sensor data; ...) ??? 15-826 Copyright: C. Faloutsos (2017)

Sample queries –cont’d Rule discovery Clusters (of branches; of sensor data; ...) Forecasting (total sales for next year?) Outliers (eg., unexpected part failures; fraud detection) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Example: ~1B nodes (web sites) ~6B edges (http links) ‘YahooWeb graph’ U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs PAKDD 2014, Tainan, Taiwan. 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Outline Goal: ‘Find similar / interesting things’ (crash) intro to DB Indexing - similarity search Data Mining 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Detailed Outline Intro to DB Relational DBMS - what and why? 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Detailed Outline Intro to DB Relational DBMS - what and why? inserting, retrieving and summarizing data views; security/privacy (concurrency control and recovery) 15-826 Copyright: C. Faloutsos (2017)

What is the goal of rel. DBMSs C. Faloutsos 15-826 What is the goal of rel. DBMSs 15-826 Copyright: C. Faloutsos (2017)

What is the goal of rel. DBMSs C. Faloutsos 15-826 What is the goal of rel. DBMSs Electronic record-keeping: Fast and convenient access to information. Eg.: students, taking classes, obtaining grades; find my gpa <and other ad-hoc queries> 15-826 Copyright: C. Faloutsos (2017)

Main vendors/products Commercial Open source 15-826 Copyright: C. Faloutsos (2017)

Main vendors/products Commercial Oracle IBM/DB2 MS SQL-server Sybase (MS Access, ...) Open source Postgres (UCB) mySQL, sqlite, miniBase (Wisc) (www.sigmod.org) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Detailed Outline Intro to DB Relational DBMS - what and why? inserting, retrieving and summarizing data views; security/privacy (concurrency control and recovery) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) How do DBs work? We use sqlite3 as an example, from http://www.sqlite.org 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) How do DBs work? linux% sqlite3 mydb # mydb: file sqlite> create table student ( ssn fixed; name char(20) ); 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) How do DBs work? sqlite> insert into student values (123, “Smith”); sqlite> select * from student; 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) How do DBs work? sqlite> create table takes ( ssn fixed, c_id char(5), grade fixed)); 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) How do DBs work - cont’d More than one tables - joins Eg., roster (names only) for 15-826 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) How do DBs work - cont’d sqlite> select name from student, takes where student.ssn = takes.ssn and takes.c_id = “15826” 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) SQL-DML General form: select a1, a2, … an from r1, r2, … rm where P [order by ….] [group by …] [having …] 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Aggregation Find ssn and GPA for each student 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) DM! Aggregation sqlite> select ssn, avg(grade) from takes group by ssn; 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Detailed Outline Intro to DB Relational DBMS - what and why? inserting, retrieving and summarizing data views; security/privacy (concurrency control and recovery) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Views - what and why? suppose you ONLY want to see ssn and GPA (eg., in your data-warehouse) suppose secy is only allowed to see GPAs, but not individual grades (or, suppose you want to create a short-hand for a query you ask again and again) -> VIEWS! 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Views sqlite> create view fellowship as ( select ssn, avg(grade) from takes group by ssn); 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Views sqlite> create view fellowship as ( select ssn, avg(grade) from takes group by ssn); 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Views Views = ‘virtual tables’ 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Views sqlite> select * from fellowship; 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Views sqlite> grant select on fellowship to secy; 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Detailed Outline Intro to DB Relational DBMS - what and why? inserting, retrieving and summarizing data views; security/privacy (concurrency control and recovery) What if slow? Conclusions 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) What if slow? sqlite> select * from irs_table where ssn=‘123’; Q: What to do, if it takes 2hours? 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) DM! What if slow? sqlite> select * from irs_table where ssn=‘123’; Q: What to do, if it takes 2hours? A: build an index Q’: on what attribute? Q’’: what syntax? 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) DM! What if slow? sqlite> select * from irs_table where ssn=‘123’; Q: What to do, if it takes 2hours? A: build an index Q’: on what attribute? A: ssn Q’’: what syntax? A: create index 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) What if slow - #2? sqlite> create table friends (p1, p2); Q: Facebook-style: find the 2-step-away people 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) What if slow - #2? sqlite> create table friends (p1, p2); sqlite> select f1.p1, f2.p2 from friends f1, friends f2 where f1.p2 = f2.p1; Q: too slow – now what? 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) DM! What if slow - #2? sqlite> create table friends (p1, p2); sqlite> select f1.p1, f2.p2 from friends f1, friends f2 where f1.p2 = f2.p1; Q: too slow – now what? A: ‘explain’: sqlite> explain select …. 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Long answer: Check the query optimizer (see, say, Ramakrishnan + Gehrke 3rd edition, chapter15): Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill 2002 (3rd ed). 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Conclusions (relational) DBMSs: electronic record keepers customize them with create table commands ask SQL queries to retrieve info 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) 15-826 Conclusions cont’d Data mining practitioner’s guide: create view, for short-hands / privacy group by + aggregates If a query runs slow: explain select – to see what happens create index – often speeds up queries 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) For more info: Sqlite3: www.sqlite.org - @ linux.andrew Postgres: also @ linux.andrew http://www.postgresql.org/docs/ Ramakrishnan + Gehrke, 3rd edition 15-415/615 web page, eg, http://www.cs.cmu.edu/~christos/courses/dbms.F16 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) We assume known: B-tree indices www.cs.cmu.edu/~christos/courses/826.S17/FOILS-pdf/020_b-trees.pdf Hashing www.cs.cmu.edu/~christos/courses/826.S17/FOILS-pdf/030_hashing.pdf (also, [Ramakrishnan+Gehrke, ch. 10, ch.11]) 15-826 Copyright: C. Faloutsos (2017)