ISI’02 Multidimensional Databases

Similar presentations
Ranking Multimedia Databases via Relevance Feedback with History and Foresight Support / 12 I9 CHAIR OF COMPUTER SCIENCE 9 DATA MANAGEMENT AND EXPLORATION.
Spatial Database Systems. Spatial Database Applications GIS applications (maps): Urban planning, route optimization, fire or pollution monitoring, utility.
OLAP Tuning. Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning.
A Framework for Clustering Evolving Data Streams Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu Presented by: Di Yang Charudatta Wad.
On Map-Matching Vehicle Tracking Data
Distance and Similarity Measures
1 NNH: Improving Performance of Nearest- Neighbor Searches Using Histograms Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research) Chen Li (UC Irvine)
Similarity Search for Adaptive Ellipsoid Queries Using Spatial Transformation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa (Nara.
Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
Multimedia DBs.
Indexing Time Series. Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time.
Spatio-Temporal Databases
Efficient Similarity Search in Sequence Databases Rakesh Agrawal, Christos Faloutsos and Arun Swami Leila Kaghazian.
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
1998/5/21by Chang I-Ning1 ImageRover: A Content-Based Image Browser for the World Wide Web Introduction Approach Image Collection Subsystem Image Query.
Visual Querying By Color Perceptive Regions Alberto del Bimbo, M. Mugnaini, P. Pala, and F. Turco University of Florence, Italy Pattern Recognition, 1998.
Multimedia DBs. Time Series Data
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Based on Slides by D. Gunopulos (UCR)
Spatial and Temporal Data Mining
Techniques and Data Structures for Efficient Multimedia Similarity Search.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept.
Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.
Lead Black Slide. © 2001 Business & Information Systems 2/e2 Chapter 7 Information System Data Management.
Trip Planning Queries F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.-H. Teng Boston University.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Spatial and Temporal Databases Efficiently Time Series Matching by Wavelets (ICDE 98) Kin-pong Chan and Ada Wai-chee Fu.
Info Vis: Multi-Dimensional Data Chris North cs3724: HCI.
Data Mining Techniques
Module 04: Algorithms Topic 07: Instance-Based Learning
Multimedia and Time-series Data
1 Wavelets for Efficient Querying of Large Multidimensional Datasets Wavelets for Efficient Querying of Large Multidimensional Datasets Cyrus Shahabi University.
1 Introduction to Spatial Databases Donghui Zhang CCIS Northeastern University.
Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic and Andrew Zisserman.
1 C. Shahabi Mining Multidimensional Databases Mining Multidimensional Databases Cyrus Shahabi University of Southern California Dept. of Computer Science.
1 Data Mining: Data Lecture Notes for Chapter 2. 2 What is Data? l Collection of data objects and their attributes l An attribute is a property or characteristic.
Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction / 16 I9 CHAIR OF COMPUTER SCIENCE 9 DATA MANAGEMENT.
Group 8: Denial Hess, Yun Zhang Project presentation.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept. of Electronic.
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
1/12/ Multimedia Data Mining. Multimedia data types any type of information medium that can be represented, processed, stored and transmitted over.
Indexing OLAP Data Sunita Sarawagi Monowar Hossain York University.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Time Series databases Text databases.
DATA MINING LECTURE 8 Sequence Segmentation Dimensionality Reduction.
1 Database Systems Group Research Overview OLAP Statistical Tests Goal: Isolate factors that cause significant changes in a measured value – Ex:
Jianping Fan Department of Computer Science University of North Carolina at Charlotte Charlotte, NC Relevance Feedback for Image Retrieval.
3/13/2016Data Mining 1 Lecture 1-2 Data and Data Preparation Phayung Meesad, Ph.D. King Mongkut’s University of Technology North Bangkok (KMUTNB) Bangkok.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Overview Issues in Mobile Databases – Data management – Transaction management Mobile Databases and Information Retrieval.
Dense-Region Based Compact Data Cube
Data Mining: Concepts and Techniques
Data Transformation: Normalization
BlinkDB.
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Supervised Time Series Pattern Discovery through Local Importance
Image Segmentation Techniques
Introduction to Spatial Databases
Efficient Evaluation of k-NN Queries Using Spatial Mashups
CSE572, CBS572: Data Mining by H. Liu
Algorithm design (computational geometry)
Nearest Neighbors CSC 576: Data Mining.
Group 9 – Data Mining: Data
Topological Signatures For Fast Mobility Analysis
CSE572: Data Mining by H. Liu
Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)
Presentation transcript:

1 ISI’02 Multidimensional Databases
Challenge: representation for efficient storage, indexing & querying
Examples: time series, images
New multidimensional data sets & approaches:
- Graphs (e.g., road networks)
- Immersidata (e.g., haptic)
- User profiles & aggregation/clustering

2 Challenges
- Storing multidimensional data (matrix vs. relations)
- Indexing multidimensional data (R-tree)
- Queries
  - Search for similar objects (similarity search) [ICDE’00, ICME’00]
  - Spatial and temporal queries [IDEAS’00, ACM-GIS’01, KAIS’02]
- Multidimensional data mining
  - Aggregation [EDBT’02, PODS’02]
  - Clustering [ACM-MMj’02]
  - Classification [INFORMS’02]
  - Finding outliers [SSDBM’01]
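
The slide names the R-tree as the index for multidimensional data. Below is a minimal sketch of R-tree range search, assuming a small, hand-built, static tree: each node is summarized by a minimum bounding rectangle (MBR), and a whole subtree is skipped when its MBR misses the query window. Insertion and node splitting, which a real R-tree needs, are omitted. The names `Node`, `range_search`, and the rectangle layout `(xmin, ymin, xmax, ymax)` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (touching boundaries count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclose(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle (MBR) of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    # Leaf nodes hold (mbr, object_id) entries; internal nodes hold child nodes.
    entries: List[Tuple[Rect, str]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def mbr(self) -> Rect:
        if self.children:
            return enclose([c.mbr for c in self.children])
        return enclose([r for r, _ in self.entries])


def range_search(node: Node, query: Rect) -> List[str]:
    """Return the ids of all objects whose MBR intersects the query window."""
    if not intersects(node.mbr, query):
        return []  # prune this entire subtree
    if not node.children:  # leaf: test each stored rectangle
        return [oid for r, oid in node.entries if intersects(r, query)]
    hits: List[str] = []
    for child in node.children:
        hits.extend(range_search(child, query))
    return hits


# Hypothetical two-leaf tree.
leaf1 = Node(entries=[((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = Node(entries=[((10, 10, 11, 11), "c")])
root = Node(children=[leaf1, leaf2])
print(range_search(root, (0.5, 0.5, 2.5, 2.5)))  # ['a', 'b']
```

Here the second leaf is never examined entry by entry, because its MBR lies entirely outside the query window.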

3 Stock Prices
[Figure: price series S1 … Sn, each plotted as $price against day 1 to 365]
- Raw series: each Si is a point in 365 dimensions (computationally complex).
- Summary features f(Si), e.g., avg and std: a point in 2 dimensions (not accurate enough).
- Transformation-based features g(Si), e.g., FFT or wavelet coefficients: a point in 5 dimensions [SSDBM’00, ’01].
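
The slide contrasts three representations of a 365-day price series. The following is a minimal pure-Python sketch of f(S) as (mean, standard deviation) and of g(S) as the first 5 coefficients of an orthonormal Haar wavelet transform (one of the transformations the slide names). Zero-padding 365 days up to 512 samples and the helper names are assumptions for illustration. Because the orthonormal transform preserves Euclidean distance (and zero-padding adds no distance), the distance between truncated coefficient vectors can never exceed the true distance, so the 5-D points can filter candidates without missing true matches.

```python
import math
from typing import List


def haar(signal: List[float]) -> List[float]:
    """Orthonormal Haar transform. Output order: overall-average coefficient,
    then detail coefficients from coarsest to finest. len(signal) must be a power of 2."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    approx, details = list(signal), []
    r2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level = [(approx[i] - approx[i + 1]) / r2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / r2 for i in pairs]
        details = level + details  # coarser levels end up in front
    return approx + details


def avg_std(series: List[float]) -> List[float]:
    """f(S): the 2-D summary (mean, population standard deviation)."""
    m = sum(series) / len(series)
    return [m, math.sqrt(sum((x - m) ** 2 for x in series) / len(series))]


def wavelet_features(series: List[float], k: int = 5) -> List[float]:
    """g(S): zero-pad to the next power of 2 (365 -> 512), keep the first k Haar coefficients."""
    n = 1
    while n < len(series):
        n *= 2
    return haar(list(series) + [0.0] * (n - len(series)))[:k]


def euclid(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two synthetic (made-up) daily price series, 365 days each.
s1 = [100 + 5 * math.sin(d / 10) for d in range(365)]
s2 = [100 + 5 * math.cos(d / 7) for d in range(365)]
print(euclid(s1, s2), euclid(wavelet_features(s1), wavelet_features(s2)))
```

Raising `k` tightens the lower bound at the cost of a higher-dimensional index, which is the trade-off the slide describes between 2 and 365 dimensions.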

4 More Similarity Search & Clustering
- Images: color histograms (red, green, blue)
- Shapes: angle sequences [ICDE’99 … ICME’00]
- Web navigations (hits): feature vectors over pages P1, P2, P3, P4, P5, … [RIDE’97 … WebKDD’01]
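
As an illustration of the image case, the sketch below turns an image's pixels into a normalized color-histogram feature vector and retrieves the most similar stored image. It is an assumption-laden sketch: 4 quantization levels per channel, Euclidean distance between histograms, and the helper names are choices made here, not details given on the slide.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Quantize each channel into `bins` levels and count pixels per (r, g, b) cell.
    Returns a vector of length bins**3 whose entries are fractions summing to 1."""
    hist = [0.0] * (bins ** 3)
    width = 256 / bins
    for r, g, b in pixels:
        ri, gi, bi = int(r // width), int(g // width), int(b // width)
        hist[(ri * bins + gi) * bins + bi] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def l2(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query: Sequence[Pixel], images: Dict[str, Sequence[Pixel]], bins: int = 4) -> str:
    """Name of the stored image whose histogram is closest to the query's."""
    q = color_histogram(query, bins)
    return min(images, key=lambda name: l2(q, color_histogram(images[name], bins)))


# Hypothetical tiny "images" given as flat pixel lists.
red = [(250, 10, 10)] * 10
blue = [(10, 10, 250)] * 10
query = [(240, 20, 20)] * 5 + [(10, 10, 250)]
print(most_similar(query, {"red": red, "blue": blue}))  # red
```

In a database setting the histograms would be computed once and stored as multidimensional points, so that an index such as an R-tree can answer the similarity query.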

5 On-Line Analytical Processing (OLAP)
Multidimensional data sets:
- Dimension attributes (e.g., Store, Product, Date)
- Measure attributes (e.g., Sale, Price)
Range-sum queries:
- Average sale of shoes in CA in 2001
- Number of jackets sold in Seattle in Sep
Tougher queries:
- Covariance of sale and price of jackets in CA in 2001 (correlation)
- Variance of price of jackets in 2001 in Seattle
Market-Relation:
Store Location | Product | Date    | Sale    | Price
LA             | Shoes   | Jan. 01 | $21,500 | $85.99
NY             | Jacket  | June 01 | $28,700 | $45.99
Query: $\mathrm{Avg}_{\text{sale}}\big(\sigma_{p=\text{shoe} \,\wedge\, s \in \text{CA} \,\wedge\, d \in 2001}(\text{Market-Relation})\big)$
Evaluated directly over the relation, such queries are too slow!
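
The "tougher" queries can be reduced to range-sums: with n = COUNT, Var(x) = SUM(x²)/n - (SUM(x)/n)² and Cov(x, y) = SUM(xy)/n - (SUM(x)/n)(SUM(y)/n). The sketch below collects those five SUM aggregates in one plain scan; any method that answers range-sums faster could supply them instead. The column names and rows are hypothetical, and this is not claimed to be the method of the papers cited earlier.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def range_moments(rows: List[Row], keep: Callable[[Row], bool],
                  x: str, y: str) -> Tuple[float, float, float, float, float]:
    """One pass over the selected rows, returning five SUM aggregates:
    COUNT, SUM(x), SUM(y), SUM(x*x), SUM(x*y)."""
    n = sx = sy = sxx = sxy = 0.0
    for r in rows:
        if keep(r):
            a, b = float(r[x]), float(r[y])
            n += 1
            sx += a
            sy += b
            sxx += a * a
            sxy += a * b
    return n, sx, sy, sxx, sxy


def mean_var_cov(n: float, sx: float, sy: float, sxx: float, sxy: float):
    """Population mean of x, variance of x, and covariance of x and y, from the sums."""
    mx, my = sx / n, sy / n
    return mx, sxx / n - mx * mx, sxy / n - mx * my


# Hypothetical rows (not the slide's data).
rows = [
    {"product": "jacket", "state": "CA", "sale": 1.0, "price": 2.0},
    {"product": "jacket", "state": "CA", "sale": 2.0, "price": 4.0},
    {"product": "jacket", "state": "CA", "sale": 3.0, "price": 6.0},
    {"product": "shoe", "state": "CA", "sale": 9.0, "price": 1.0},
]
sums = range_moments(rows, lambda r: r["product"] == "jacket", "sale", "price")
print(mean_var_cov(*sums))
```

The one-pass sum-of-squares formula can lose precision on large values; it is used here because it shows that only SUM aggregates are needed.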

6 Example Solution (Pre-computation): Prefix-Sum [Agrawal et al. 1997]
Age | Salary
25  | $50k
28  | $55k
30  | $58k
50  | $100k
55  | $130k
57  | $120k
[Figure: the tuples plotted on an Age × Salary grid, salary axis from $40k to $150k]
Query: Sum(salary) when (25 < age < 40) and (55k < salary < 150k)
Result: I - II - III + IV (a combination of four pre-computed prefix sums)
Issues:
- The measure attribute must be pre-selected.
- The aggregation function must be pre-selected (sum or count).
- Updates are expensive (they require re-computation).
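
A minimal sketch of the 2-D prefix-sum idea, assuming ages and salaries (in $k) are used directly as integer grid coordinates. P[i][j] stores the sum of all cells below row i and left of column j, so any rectangular range-sum needs only four lookups, corresponding to the I - II - III + IV combination (the assignment of labels to corners on the slide may differ). The function names are hypothetical.

```python
from typing import List


def build_prefix(grid: List[List[float]]) -> List[List[float]]:
    """P[i][j] = sum of grid[a][b] for all a < i and b < j (zero row/column padding)."""
    rows, cols = len(grid), len(grid[0])
    P = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P: List[List[float]], lo1: int, hi1: int, lo2: int, hi2: int) -> float:
    """Sum over rows lo1..hi1 and columns lo2..hi2 (inclusive) with four lookups."""
    return P[hi1 + 1][hi2 + 1] - P[lo1][hi2 + 1] - P[hi1 + 1][lo2] + P[lo1][lo2]


# (age, salary in $k) from the slide; cell [age][salary] holds the salary measure.
people = [(25, 50), (28, 55), (30, 58), (50, 100), (55, 130), (57, 120)]
grid = [[0.0] * 151 for _ in range(61)]
for age, sal in people:
    grid[age][sal] += sal
P = build_prefix(grid)

# 25 < age < 40 and 55k < salary < 150k -> inclusive integer bounds 26..39 and 56..149
print(range_sum(P, 26, 39, 56, 149))  # 58.0: only the 30-year-old earning $58k qualifies
```

The sketch also shows the slide's issues: the grid bakes in one measure (salary) and one function (sum), and changing a single cell invalidates every prefix entry above and to the right of it.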

7 Spatial & Temporal Data: Complex Queries
Data types:
- Point
- Line segment
- Line: a sequence of line segments
- Region: a closed set of lines
- Moving point (e.g., car, train, …)
- Changing region (e.g., the changing temperature of a county)
Example query elements (from the slide figure): rivers, countries, hospitals, cities, taxi, "5 km of home", "10 min"
Experiments, BrainR [Visual’99] [ACM-GIS’01, VLDB’01]
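
One way to make the "moving point" data type concrete: store each object as timestamped samples and interpolate linearly between them. The sketch below answers a query of the kind the slide's figure suggests (which moving objects are within a given distance of a location at a given time). Linear interpolation, the sample layout `(t, x, y)`, the units, and the function names are all assumptions for illustration.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # (t, x, y); times strictly increasing


def position_at(track: List[Sample], t: float) -> Tuple[float, float]:
    """Linearly interpolate a moving point's position at time t (clamped to the track ends)."""
    times = [s[0] for s in track]
    if t <= times[0]:
        return track[0][1], track[0][2]
    if t >= times[-1]:
        return track[-1][1], track[-1][2]
    i = bisect_right(times, t)  # track[i-1].t <= t < track[i].t
    t0, x0, y0 = track[i - 1]
    t1, x1, y1 = track[i]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


def within_at(tracks: Dict[str, List[Sample]], center: Tuple[float, float],
              radius: float, t: float) -> List[str]:
    """Names of moving points whose interpolated position at time t lies within radius of center."""
    cx, cy = center
    return sorted(name for name, track in tracks.items()
                  if math.hypot(position_at(track, t)[0] - cx,
                                position_at(track, t)[1] - cy) <= radius)


# Hypothetical taxis: minutes and km.
tracks = {
    "taxi1": [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)],
    "taxi2": [(0.0, 20.0, 20.0), (10.0, 30.0, 30.0)],
}
home = (5.0, 0.0)
print(within_at(tracks, home, 5.0, 10.0))  # ['taxi1']
```

A real spatio-temporal database would index the trajectories instead of scanning them, but the interpolation step is the same.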

8 Spatial & Temporal Data & Queries
Data types:
- Point
- Line segment
- Line: a sequence of line segments
- Region: a closed set of lines
- Moving point (e.g., objects, car, train, …)
Example query elements (from the slide figure): molecules, microbes, train stations, cities, round objects, "5 cm of hand", "10 s", "number of distractions in [...] of subject", "station"
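
Treating a region as a closed set of lines leads directly to the basic containment predicate used by many spatial queries. The sketch below uses the standard even-odd (ray-casting) test as an assumed illustration; the slide does not say which containment algorithm is used, and the region coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_region(p: Point, boundary: List[Point]) -> bool:
    """Even-odd (ray-casting) test: cast a horizontal ray to the right of p and
    count boundary segments it crosses; an odd count means p is inside.
    `boundary` lists polygon vertices in order; the last vertex joins the first.
    Points exactly on the boundary may be classified either way."""
    x, y = p
    inside = False
    n = len(boundary)
    for i in range(n):
        x1, y1 = boundary[i]
        x2, y2 = boundary[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # segment straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# Hypothetical square region and two query points.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_region((2.0, 2.0), square), point_in_region((5.0, 2.0), square))
```

The test runs in time linear in the number of boundary segments, so spatial indexes typically first filter candidates by bounding rectangle and apply it only to the survivors.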

9 Spatial & Temporal Data & Queries (continued)
- k-nearest-neighbor (kNN) queries: find the k objects nearest to a query point (e.g., the 5 closest hospitals to my car).
- What is "nearest"? In a road network (or any graph), it is the shortest-path distance, which is expensive to compute in real time for all points of interest.
- Approach: embed the graph into a high-dimensional space where computationally simple Minkowski metrics (e.g., Euclidean) approximate the real distances [ACM-GIS’02?].
[Figure: nodes A, B, C in 2-D space mapped by an embedding technique (e.g., Lipschitz) to points in n-D space]
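
A minimal sketch of a Lipschitz embedding, under stated assumptions: coordinate i of a node is its shortest-path distance to the nearest node of a reference set A_i, computed with multi-source Dijkstra. The reference sets here are hand-picked singletons, and the L-infinity metric (one of the Minkowski metrics) is chosen because, by the triangle inequality, it never exceeds the true network distance on a connected graph. The graph, names, and choices are hypothetical, not the slide's exact technique.

```python
import heapq
import math
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # undirected road network: node -> {neighbor: edge length}


def dijkstra(g: Graph, sources: Set[str]) -> Dict[str, float]:
    """Shortest-path distance from the nearest node in `sources` to every node."""
    dist = {v: math.inf for v in g}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lipschitz_embed(g: Graph, reference_sets: List[Set[str]]) -> Dict[str, List[float]]:
    """Coordinate i of node v = network distance from v to the closest node of reference set i."""
    per_set = [dijkstra(g, A) for A in reference_sets]
    return {v: [d[v] for d in per_set] for v in g}


def linf(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def knn_embedded(emb: Dict[str, List[float]], query: str, candidates: List[str], k: int) -> List[str]:
    """Approximate kNN: rank candidates by L-infinity distance in the embedded space."""
    return sorted(candidates, key=lambda c: linf(emb[query], emb[c]))[:k]


# Hypothetical road network.
g = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"A": 5.0, "B": 2.0, "D": 1.0},
    "D": {"C": 1.0},
}
emb = lipschitz_embed(g, [{"A"}, {"D"}])
print(emb, knn_embedded(emb, "A", ["B", "C", "D"], 1))
```

The Dijkstra runs happen once, offline; at query time only the cheap vector distances are computed. Results are approximate, so a system could re-rank the top candidates with exact shortest paths.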

10 Immersidata and Mining Queries [CIKM’01, UACHI’01]

11 Immersidata and Mining Queries (continued)
Example: a dynamic sign, e.g., ASL (American Sign Language) "colors"

12 User Profiles & Clustering: Offline Processes
Diagram elements: user profiles (User 1 … User U); PPED similarity measure and clustering; clusters; voting; fuzzy aggregation; item database; cluster wish-lists
Example of a profile's favorite features: Rock = High, Classical = Low, Pop = Low, Rap = High

13 User Profiles & Clustering: Online Processes
Diagram elements: current user's profile; clusters; PPED similarity measure; a list of similarity values; cluster wish-lists; fuzzy aggregation; user wish-list
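
The online process can be pictured as: compare the current user's profile with each cluster, then blend the cluster wish-lists according to those similarity values. The sketch below is only a stand-in under loudly stated assumptions: High/Low are encoded as 1/0, similarity is 1 / (1 + Euclidean distance) rather than the slide's PPED measure, and the blend is a similarity-weighted average rather than the slide's fuzzy aggregation. All names and data are hypothetical.

```python
import math
from typing import Dict, List

LEVEL = {"Low": 0.0, "High": 1.0}  # hypothetical numeric encoding of profile levels
FEATURES = ["Rock", "Classical", "Pop", "Rap"]


def encode(profile: Dict[str, str]) -> List[float]:
    return [LEVEL[profile[f]] for f in FEATURES]


def similarity(a: List[float], b: List[float]) -> float:
    """Stand-in similarity in (0, 1]: 1 / (1 + Euclidean distance). NOT the slide's PPED."""
    return 1.0 / (1.0 + math.dist(a, b))


def recommend(user: Dict[str, str], centroids: Dict[str, List[float]],
              wishlists: Dict[str, Dict[str, float]], top_n: int = 3) -> List[str]:
    """Blend cluster wish-lists, weighting each cluster by the user's similarity to it."""
    u = encode(user)
    sims = {c: similarity(u, centroid) for c, centroid in centroids.items()}
    total = sum(sims.values())
    scores: Dict[str, float] = {}
    for c, items in wishlists.items():
        for item, score in items.items():
            scores[item] = scores.get(item, 0.0) + sims[c] / total * score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Profile from the slide's example; clusters and wish-lists are made up.
user = {"Rock": "High", "Classical": "Low", "Pop": "Low", "Rap": "High"}
centroids = {"rockers": [1.0, 0.0, 0.0, 1.0], "classical": [0.0, 1.0, 0.0, 0.0]}
wishlists = {"rockers": {"songA": 0.9, "songB": 0.5}, "classical": {"songC": 0.9}}
print(recommend(user, centroids, wishlists))
```

Because clustering and the cluster wish-lists are computed offline, the online step only needs one similarity per cluster, not one per user.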