CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU www.cs.cmu.edu/~christos.


CMU SCS Copyright: C. Faloutsos (2014)

Outline
Goal: ‘Find similar / interesting things’
- Intro to DB
- Indexing - similarity search
- Data Mining

Problem
Given a large collection of (multimedia) records, or graphs, find similar/interesting things, i.e.:
- Allow fast, approximate queries, and
- Find rules/patterns

Q1: Examples for ‘similar’?

Sample queries
Similarity search
- Find pairs of branches with similar sales patterns
- Find medical cases similar to Smith's
- Find pairs of sensor series that move in sync
- Find shapes like a spark plug
- (nn, i.e. nearest neighbor: ‘case-based reasoning’)

Q1: Examples for ‘interesting’?

Sample queries - cont’d
Rule discovery
- Clusters (of branches; of sensor data; ...)
- Forecasting (total sales for next year?)
- Outliers (e.g., unexpected part failures; fraud detection)

Example: WWW, Seoul
‘YahooWeb graph’: ~1B nodes (web sites), ~6B edges (http links)
U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs. PAKDD 2014, Tainan, Taiwan.

Outline
Goal: ‘Find similar / interesting things’
- (crash) intro to DB
- Indexing - similarity search
- Data Mining

Detailed Outline
Intro to DB
- Relational DBMS - what and why?
  - inserting, retrieving and summarizing data
  - views; security/privacy
  - (concurrency control and recovery)

What is the goal of rel. DBMSs?
Electronic record-keeping: fast and convenient access to information.
E.g.: students, taking classes, obtaining grades; find my GPA.

Main vendors/products
Commercial
- Oracle
- IBM/DB2
- MS SQL-server
- Sybase
- (MS Access, ...)
Open source
- Postgres (UCB)
- mySQL, sqlite, miniBase (Wisc)


How do DBs work?
We use sqlite3 as an example, from

linux% sqlite3 mydb    # mydb: file
sqlite> create table student (ssn fixed, name char(20));
sqlite> insert into student values (123, 'Smith');
sqlite> select * from student;
sqlite> create table takes (ssn fixed, c_id char(5), grade fixed);
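
The sqlite3 session above can also be replayed from a script. The following is a minimal sketch using Python's standard `sqlite3` module. It assumes an in-memory database rather than the `mydb` file, and the enrollment row in `takes` is made up for illustration.

```python
import sqlite3

# In-memory database (an assumption; the slides use the file "mydb").
conn = sqlite3.connect(":memory:")

# Same schema as in the slides; sqlite accepts "fixed" as a type name.
conn.execute("create table student (ssn fixed, name char(20))")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")

# The row from the slides, passed as parameters instead of inline literals.
conn.execute("insert into student values (?, ?)", (123, "Smith"))
# Hypothetical enrollment row, added only for illustration.
conn.execute("insert into takes values (?, ?, ?)", (123, "15826", 4))

students = conn.execute("select * from student").fetchall()
print(students)
```

Passing values as parameters (`?`) avoids the quoting problems that arise when string literals are pasted into SQL text.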

How do DBs work - cont’d
More than one table - joins
E.g., roster (names only) for

sqlite> select name from student, takes where student.ssn = takes.ssn and takes.c_id = '15826';
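
To see the join return something, the two tables need a few rows. The sketch below uses Python's standard `sqlite3` module. The students, courses and grades are hypothetical sample data, and the `order by name` clause is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (ssn fixed, name char(20))")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")

# Hypothetical sample data, for illustration only.
conn.executemany("insert into student values (?, ?)",
                 [(123, "Smith"), (234, "Jones")])
conn.executemany("insert into takes values (?, ?, ?)",
                 [(123, "15826", 4), (234, "15615", 3)])

# Join the two tables on ssn and keep only students enrolled in 15826.
roster = conn.execute(
    "select name from student, takes "
    "where student.ssn = takes.ssn and takes.c_id = ? "
    "order by name",
    ("15826",),
).fetchall()
print(roster)
```

Only Smith appears in the roster, because Jones is enrolled in a different course.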

SQL-DML
General form:
select a1, a2, ... an
from r1, r2, ... rm
where P
[group by ...]
[having ...]
[order by ...]

Aggregation
Find ssn and GPA for each student:
sqlite> select ssn, avg(grade) from takes group by ssn;
DM!
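
A small worked instance of the aggregation, as a sketch with Python's standard `sqlite3` module. The grades are hypothetical, and `order by ssn` is added only so the result order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")

# Hypothetical grades: student 123 took two courses, student 234 took one.
conn.executemany("insert into takes values (?, ?, ?)",
                 [(123, "15826", 4), (123, "15615", 3), (234, "15826", 2)])

# One output row per ssn; avg() is computed within each group.
gpas = conn.execute(
    "select ssn, avg(grade) from takes group by ssn order by ssn"
).fetchall()
print(gpas)
```

Student 123 averages (4 + 3) / 2 = 3.5 and student 234 averages 2.0; `avg` returns a floating-point value even when all grades are integers.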


Views - what and why?
- suppose you ONLY want to see ssn and GPA (e.g., in your data warehouse)
- suppose the secretary (secy) is only allowed to see GPAs, but not individual grades
- (or, suppose you want to create a shorthand for a query you ask again and again)
-> VIEWS!

Views
sqlite> create view fellowship as select ssn, avg(grade) from takes group by ssn;
Views = ‘virtual tables’
sqlite> select * from fellowship;
sqlite> grant select on fellowship to secy;
(grant is standard SQL, available in server DBMSs such as Postgres; sqlite3 itself does not implement it)
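
The ‘virtual table’ behaviour can be checked directly: a view stores the query, not the rows, so it reflects later inserts into `takes`. This is a sketch using Python's standard `sqlite3` module. The grades are hypothetical, and the column alias `gpa` is an addition not present in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")
# Hypothetical grades, for illustration only.
conn.executemany("insert into takes values (?, ?, ?)",
                 [(123, "15826", 4), (123, "15615", 2)])

# The view exposes only ssn and the average, not the individual grades.
conn.execute("create view fellowship as "
             "select ssn, avg(grade) as gpa from takes group by ssn")

view_sql = "select * from fellowship order by ssn"
before = conn.execute(view_sql).fetchall()

# A new row in the base table shows up in the view without redefining it.
conn.execute("insert into takes values (?, ?, ?)", (234, "15826", 3))
after = conn.execute(view_sql).fetchall()

print(before)
print(after)
```

Access control (the `grant` step) is left out of this sketch, since it requires a DBMS with user accounts.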

Detailed Outline
Intro to DB
- Relational DBMS - what and why?
  - inserting, retrieving and summarizing data
  - views; security/privacy
  - (concurrency control and recovery)
- What if slow?
- Conclusions

What if slow?
sqlite> select * from irs_table where ssn='123';
Q: What to do, if it takes 2 hours?
A: build an index
Q’: on what attribute?
A: ssn
Q’’: what syntax?
A: create index
DM!
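
The effect of the index can be observed by asking sqlite for its plan before and after `create index`. The following sketch uses Python's standard `sqlite3` module. The columns of `irs_table` other than `ssn`, the generated rows, and the index name `irs_ssn_idx` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the slides only name the table and the ssn attribute.
conn.execute("create table irs_table (ssn char(9), name char(20), income fixed)")
conn.executemany(
    "insert into irs_table values (?, ?, ?)",
    [(str(i), "person%d" % i, 1000 * i) for i in range(1, 1001)],
)

query = "select * from irs_table where ssn = '123'"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]


plan_before = plan(query)  # expected to be a full table scan
conn.execute("create index irs_ssn_idx on irs_table (ssn)")
plan_after = plan(query)   # expected to search via irs_ssn_idx

rows = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(rows)
```

The exact wording of the plan text differs between sqlite versions, so it is best read for the words SCAN and SEARCH and for the index name rather than compared literally.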

What if slow - #2?
sqlite> create table friends (p1, p2);
Q: Facebook-style: find the 2-step-away people
sqlite> select f1.p1, f2.p2 from friends f1, friends f2 where f1.p2 = f2.p1;
Q: too slow - now what?
A: ‘explain’:
sqlite> explain select …
DM!
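
A tiny instance of the 2-step query, with its plan, as a sketch using Python's standard `sqlite3` module. The three friendship edges and the index name `friends_p1_idx` are hypothetical. The sketch uses `explain query plan`, the more readable sibling of sqlite's plain `explain` (which lists low-level bytecode). Indexing the join attribute `p1` is one possible remedy, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table friends (p1, p2)")
# Hypothetical edges: a -> b, b -> c, b -> d.
conn.executemany("insert into friends values (?, ?)",
                 [("a", "b"), ("b", "c"), ("b", "d")])

two_step_sql = ("select f1.p1, f2.p2 from friends f1, friends f2 "
                "where f1.p2 = f2.p1")

# Self-join: pair every edge (x, y) with every edge (y, z).
two_step = sorted(conn.execute(two_step_sql).fetchall())
print(two_step)

# Ask sqlite how it intends to evaluate the join.
for row in conn.execute("explain query plan " + two_step_sql):
    print("no index:  ", row[-1])

# One possible remedy (an assumption): index the attribute used in the join.
conn.execute("create index friends_p1_idx on friends (p1)")
for row in conn.execute("explain query plan " + two_step_sql):
    print("with index:", row[-1])
```

Only `a` has 2-step-away people here (`c` and `d`, both via `b`), so the query returns two pairs.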

Long answer: check the query optimizer (see, say, Ramakrishnan + Gehrke, 3rd edition, chapter 15):
Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill 2002 (3rd ed.).

Conclusions
(relational) DBMSs: electronic record keepers
- customize them with create table commands
- ask SQL queries to retrieve info

Conclusions cont’d
Data mining practitioner’s guide:
- create view, for shorthands / privacy
- group by + aggregates
- If a query runs slow:
  - explain select - to see what happens
  - create index - often speeds up queries

For more info:
- Sqlite3: linux.andrew
- Postgres: linux.andrew
- Ramakrishnan + Gehrke, 3rd edition
- web page, e.g.,

We assume known:
- B-tree indices
- Hashing
(also, [Ramakrishnan+Gehrke, ch. 10, ch. 11])