1
1 CIDR’03 AIMS: An Immersidata Management System Cyrus Shahabi Computer Science Department & Integrated Media Systems Center University of Southern California Los Angeles, CA 90089-0781 shahabi@usc.edu http://infolab.usc.edu
2
2 CIDR’03 Outline Definitions and Motivating Applications Immersive Data Types (focus: immersidata) AIMS Architecture Subsystems: Acquisition, Storage & Querying Current Status (demo, if time permits) Conclusion and Future Work
3
3 CIDR’03 Immersive Environments Immersive Environments allow a user to become immersed within an augmented or virtual reality environment in order to interact with people, objects, places, and databases. Examples Office of the Future (UNC) Fire Fighter Training System (Georgia Tech) Planetary Exploration (JPL) Physical/Occupational Therapy System (Haifa Univ.) Virtual Classroom and Office (USC IMSC) Haptic Museum (USC IMSC) MRE: Mission Rehearsal Exercise (USC ICT)
4
4 CIDR’03 Thesis (1) It is absolutely critical to understand the data generated by and for immersive environments For example, from the data acquired from a user’s interactions with an immersive environment (i.e., immersidata), we can learn about the user’s behavior to: Study human factor issues Measure the effectiveness of the environment Customize the information delivery Identify pitfalls in the system Better understand the user’s intentions Improve the system performance For immersive and multimedia community! For database community: Immersive sensors are the user interfaces of the future; as a research community we should study their generated data or we will miss the boat.
5
5 CIDR’03 Example: Immersive Sensor Data Streams
6
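6 CIDR’03 Application (1): Immersive Sensor Pattern Recognition, On-Line Query & Analysis [Figure: data from the immersive environment feeds a recognition system backed by a DB of labeled patterns; the system outputs a command. The candidate commands shown are Play, Run, Stop, Zoom-In, Zoom-Out, alongside the example scores 0.72, 0.15, 0.63, 0.92, 0.25.]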
7
7 CIDR’03 Application (1): American Sign Language (ASL) as well-defined patterns [Figure: 1. User makes ASL signs w/ a glove; 2. Sensor values sampled over time; 3. Semantic description of hand; 4. ASL signs recognized. Components shown: Acquisition Module, Immersidata Database, Spatio-Temporal (moving sensors) Query Evaluation, CEF, and the recognition modules (SVD, Bayesian Classifiers, Neural Net).]
8
8 CIDR’03 Application (1): ASL On-Line Q&A … On-line query and analysis challenges: A hand sign is composed of a sequence of data samples across multiple sensor streams. A sequence for one sign has no fixed length (i.e., can’t tell when one ends and the other starts!). Two interdependent (chicken-and-egg) problems must be addressed: isolate the signs, and recognize each isolated sign. An example statement in American Sign Language (ASL): like yellow shoes I
9
9 CIDR’03 Application (2): Immersive Classroom Off-Line Query & Analysis Study attention performance for Normal & ADHD-Diagnosed Children. A classroom as a virtual environment (virtual students, a virtual teacher, desks, a blackboard, a window to the playground, doors). Presence of distracters: paper airplane; ambient classroom noise; students walking; cars passing outside, visible through the window.
10
10 CIDR’03 Application (2) : IC Off-Line Q&A … User, wearing HMD, is immersed into the class Trackers monitor body movements and stream data to the database Task: pressing a button when a particular letter pattern is seen on the virtual blackboard (e.g., AX) Head sensor data Arm sensor data Leg sensor data DB Mouse Clicks Displayed Characters Distracters
11
11 CIDR’03 Application (2) – IC Off-Line Q&A … Off-line query and analysis: Range-sum queries Sum of body movements Average reaction time to the patterns Number of correct hits Classification and clustering Use a classification technique to differentiate between normal and ADHD-diagnosed subjects (e.g., SVM) Distinguishing hyperactive kids from normal by automatically analyzing tracker data: major impact in psychotherapy, able to discriminate and specify diagnosis in a manner not possible using existing traditional methods
12
12 CIDR’03 Thesis (2) Immersive applications in training and simulation domains share common data storage and analysis requirements (i.e., dealing w/ sensor data streams, aka immersidata). Hence, instead of building customized systems for the “acquisition, storage and querying” needs of each immersive application, one can design a general-purpose system addressing many of the shared requirements.
13
13 CIDR’03 Common Data Components of Immersive Environments [ACM-ITP’02] User (subject(s)) Virtual Space Actor Objects Mission (task objective) Immersive Data Types Conventional Data: user data Spatio-Temporal Data: immersive space/time data Immersidata : Sensor Data Streams
14
14 CIDR’03 Focus: Immersidata [MIS’99] Data acquired from user’s interaction with the immersive environment: subject body positions; subject recognized gestures. Can be analyzed to learn about user’s behavior. Specifications: Multidimensional; Spatio-Temporal; Continuous Data Streams (CDS); Potentially large in size and bandwidth requirements; Noisy.
15
15 CIDR’03 AIMS: An Immersidata Management System [Architecture figure] 1. Acquisition module: DWPT basis selection for each dimension; Transformation. 2. Storage module: Wavelets packing into disk blocks or DB BLOBs; Immersidata storage (file-system + OR-DBMS). 3. User interaction module: Pattern isolation heuristic; Pattern matching: SVD-based measure. 4. Query & analysis module: Application-specific GUI; ProPolyne [web] services. Also shown: Sensor Data Streams; Users states and contexts.
16
16 CIDR’03 Challenges of AIMS Subsystems Acquisition [SIGMETRICS’01, ICME’02]: Data should be filtered and transformed (similar to signals); database-friendly signal processing techniques are required. Storage [SIGMOD’03?]: The physical level of the storage system should be designed to store transformed data (e.g., wavelet coefficients); block allocation strategies should consider query patterns. Offline Query and Analysis [EDBT’02, PODS’02]: Approximate, progressive, and efficient polynomial analytical queries on large amounts of multidimensional data. Online Query and Analysis [MMM’03]: Common challenges with querying continuous data streams; real-time pattern recognition on an aggregation of multiple data streams that are incrementally completing; data from all streams together form the meaningful data.
17
17 CIDR’03 1. Acquisition Module Receives multidimensional sensor streams. In real time, selects a different basis per dimension (optimally) from the DWPT (Discrete Wavelet Packet Transform) library. Applies a multidimensional transformation to the data (generates multi-resolution representations of the data). NOTE: no compression is applied, so no data is lost by this process. INPUT: Multidimensional streams OUTPUT: Wavelet coefficients Approaches:
18
18 CIDR’03 2. Storage Module Optimally packs related wavelet coefficients into disk blocks (to reduce future I/O cost) and stores them in the file system or within an OR-DBMS. Records the corresponding disk block info in the DBMS (Database Management System) for future queries. INPUT: Wavelet coefficients OUTPUT: disk blocks metadata records Approaches:
19
19 CIDR’03 Optimal Disk Placement for Wavelet Data Dependency Graph (Haar wavelets)
20
20 CIDR’03 Optimal Disk Placement for Wavelet Data Tiling - Blocking (Haar wavelets)
21
21 CIDR’03 3. User Interaction Module Receives data from various input devices (beyond keyboard and mouse) used by the user (e.g., for data visualization purposes). Understands the set of requested actions (SVD + mutual-information). Translates actions to application-specific commands and/or database queries (takes user profile & context into account). Also stores a history of users’ interactions to be mined off-line and/or on-line to extract user state/behavior and application context, to facilitate future interactions by the same user (e.g., personalization/customization). INPUT: Camera/speech/tracker/immersive-sensor OUTPUT: application commands and queries; user profile/state and application context Approaches:
22
22 CIDR’03 4. Query & Analysis Module Transforms queries into the same wavelet domain as the data. Performs queries efficiently (and perhaps approximately or progressively) in the wavelet domain. Displays the correct resolution/granularity of aggregate value(s) and/or events to the user based on user profile (e.g., tolerable latency time) and/or system requirements and/or data availability. An event is tagged with space (e.g., latitude, longitude and altitude), time, and a bag of attributes. INPUT: Range and point queries OUTPUT: Aggregate values/Integrated events Approaches:
23
23 CIDR’03 AIMS Main Theme: Data Manipulation, Query & Analysis in the WAVELET Domain Main idea/distinction: storage is cheap and queries are ad-hoc; let’s keep all the wavelet coefficients! (no data compression) Intuition: At data population time, we don’t know which coefficients are more/less important. This differs from the signal-processing objective of reconstructing the entire signal as well as possible. This has been observed by [Garofalakis & Gibbons, SIGMOD’02], but they proposed other ways to drop coefficients, assuming a uniform workload. Opportunity: At query time, however, we know what is important to the pending query.
24
24 CIDR’03 AIMS Main Theme: Q&A of Wavelets Define a range-sum query as the dot product of a query vector and a data vector (also observed by [Gilbert et al., VLDB’2001], but with no query transformation). Offline: multidimensional wavelet transform of the data. At query time: “lazy” wavelet transform of the query vector (very fast). Dot product of query and data vectors in the transformed domain → exact result. Choose high-energy query coefficients only → fast approximate result (90% accuracy by retrieving < 10% of data). Choose query coefficients in order of energy → progressive result.
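The evaluation scheme on this slide can be illustrated with a minimal sketch, under several assumptions that are not from the slides: a one-dimensional data vector, an orthonormal Haar transform (so that dot products are preserved), and a full transform of the query rather than the “lazy” one. The names `haar` and `progressive_range_sum` and the sample data are hypothetical.

```python
import math

def haar(vec):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    out = [float(v) for v in vec]
    n = len(out)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:n] = avg + det  # summaries first, then details of this level
        n = half
    return out

def progressive_range_sum(data, lo, hi):
    """Running estimates of sum(data[lo..hi]), refined one query coefficient at a time."""
    query = [1.0 if lo <= i <= hi else 0.0 for i in range(len(data))]
    q_hat, d_hat = haar(query), haar(data)
    # Visit the nonzero query coefficients in decreasing order of magnitude (energy).
    order = sorted((i for i, c in enumerate(q_hat) if abs(c) > 1e-12),
                   key=lambda i: -abs(q_hat[i]))
    estimates, running = [], 0.0
    for i in order:
        running += q_hat[i] * d_hat[i]  # only this data coefficient is "retrieved"
        estimates.append(running)
    return estimates

# Illustrative data only (not from the presentation).
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
estimates = progressive_range_sum(data, 5, 12)
exact = sum(data[5:13])
print(estimates[-1], exact)
```

Stopping the loop early gives an approximate answer; running it to the end gives the exact one, because the transform is orthonormal. Only the data coefficients paired with nonzero query coefficients are ever touched.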
25
25 CIDR’03 Progressive Evaluation of Vector Queries
26
26 CIDR’03 Current Status: ProPolyne Demonstration
27
27 CIDR’03 AIMS with a Twist! [Architecture figure] 1. Acquisition module: DWPT basis selection for each dimension; Transformation. 2. Storage module: Wavelets packing into disk blocks or DB BLOBs; Sensor Data storage (file-system + DBMS). 3. User interaction module: Pattern isolation heuristic; Pattern matching: SVD-based measure. 4. Query & analysis module: Application-specific GUI; ProPolyne [web] services. Also shown: Remote Sensor Data Streams; Users states and contexts.
28
28 CIDR’03 Conclusion and Future Work A new application domain, immersive applications, and one of its data sets, immersidata, were introduced. Database challenges involved in managing immersidata were discussed: some allow direct adoption of typical database research techniques (e.g., OLAP); others require modifications/extensions of current research contributions (e.g., in the area of data streams) that are not applicable immediately. The design of AIMS, an innovative data systems architecture, was reported. Future Work: I/O-efficient ways for wavelet transformation and incremental update; hybrid sorting of both data and query coefficients; prototypical implementation of an end-to-end application using AIMS; performance evaluation.
29
29 CIDR’03 Application (3): Physical/Occupational Therapy, Both On-Line and Off-Line Q&A Rehabilitation research using virtual environments and gaming technologies. Enables individuals with severe physical disabilities to use their residual motor abilities in more efficient and less fatiguing ways. The patient watches her own video projected onto a 2-D virtual environment. Video cameras track body movements. Animated target characters are manipulated within the environment. The patient is asked to hit the targets to score more points. Potential data analysis tasks: offline analysis of user performance in order to find specific motor disabilities; online analysis of body movements to add more targets in the directions that need more exercise.
30
30 CIDR’03 Thanks!
31
31 CIDR’03 Haptic Data Acquisition [SIGMETRICS’01] Temporal aspect: at what rate should the sensor values be sampled? Trade-off between accuracy and bandwidth utilization. Fixed Sampling: sampling at a constant rate; the maximum rate is a function of system speed and/or the haptic glove. Group Sampling: intuitive grouping of sensors; a different sampling rate for each group. Adaptive Sampling: dynamic sampling; within a window of the session, every sensor is sampled at an individual optimal rate.
32
32 CIDR’03 ProPolyne Features “Measure” can be any polynomial on any combination of attributes. Can support COUNT, SUM, AVERAGE. Also supports Covariance, Kurtosis, etc. All using one set of pre-computed aggregates. Independent of how well the data set can be compressed/approximated by wavelets, because we show that “range-sum queries” can always be approximated well by wavelets (not always Haar, though!). Low update cost: $O(\log^d N)$. Can be used for exact, approximate and progressive range-sum query evaluation.
33
33 CIDR’03 Polynomial Range-Sum Queries Polynomial range-sum queries: $Q(R, f, I)$, where $I$ is a finite instance of schema $F$; $R \subseteq \mathrm{Dom}(F)$ is the range; and $f : \mathrm{Dom}(F) \to \mathbb{R}$ is a polynomial of degree $\delta$. Example: $F$ = (Age, Salary); $R$: (25 < age < 40) & (55k < salary < 150k); $I$ = {(25, $50k), (28, $55k), (30, $58k), (50, $100k), (55, $130k), (57, $120k)}.
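As a small illustration of the definition above, the following sketch evaluates a few polynomial range-sums directly over the example instance I. It is a brute-force scan, not how ProPolyne evaluates queries; the helper names `in_range` and `range_sum` are hypothetical, and salaries are written in thousands of dollars.

```python
# Example instance I from the slide: (age, salary in $k).
I = [(25, 50), (28, 55), (30, 58), (50, 100), (55, 130), (57, 120)]

def in_range(age, salary):
    """Range R from the slide; both bounds are strict, as written there."""
    return 25 < age < 40 and 55 < salary < 150

def range_sum(f, relation=I):
    """Q(R, f, I): sum the polynomial f over the tuples of I that fall in R."""
    return sum(f(age, salary) for age, salary in relation if in_range(age, salary))

count = range_sum(lambda age, salary: 1)                   # COUNT: degree-0 polynomial
total = range_sum(lambda age, salary: salary)              # SUM(salary): degree 1
cross = range_sum(lambda age, salary: age * salary)        # degree-2 term, as used by covariance
average = total / count                                    # AVERAGE = SUM / COUNT

print(count, total, cross, average)
```

With strict bounds only the tuple (30, $58k) falls inside R, so COUNT is 1 and SUM(salary) is 58 (in $k).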
34
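34 CIDR’03 Polynomial Range-Sum Queries as “Vector Queries” The data frequency distribution of $I$ is the function $\Delta_I : \mathrm{Dom}(F) \to \mathbb{Z}$ that maps a point $x$ to the number of times it occurs in $I$. Example: for $I$ = {(25, $50k), (28, $55k), (30, $58k), (50, $100k), (55, $130k), (57, $120k)}, $\Delta_I(25, 50) = \Delta_I(28, 55) = \dots = \Delta_I(57, 120) = 1$ and $\Delta_I(x) = 0$ otherwise. To emphasize the fact that a query is an operator on the data frequency distribution, we write $Q(R, f, \Delta_I) = \sum_{x \in R} f(x)\,\Delta_I(x)$. Define $q[x] = f(x)$ if $x \in R$ and $q[x] = 0$ otherwise. Hence: $Q(R, f, \Delta_I) = \sum_{x \in \mathrm{Dom}(F)} q[x]\,\Delta_I[x]$. Or: $Q(R, f, \Delta_I) = \langle q, \Delta_I \rangle$, a vector query: the dot product of the query vector $q$ and the data vector $\Delta_I$.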
35
35 CIDR’03 Overview of Wavelets H operator: computes a local average of array a at every other point to produce an array of summary coefficients: Ha. Example (Haar): h = [1/2, 1/2]. G operator: measures how much the values in array a vary inside each of the summarized blocks to compute an array of detail coefficients: Ga. Example (Haar): g = [1/2, -1/2]. [Figure: cascade from a to Ha and Ga, from Ha to H²a and GHa, and from H²a to H³a and GH²a. H²a holds the summary coefficients of a at level 2; GHa holds the detail coefficients of a at level 2 (aka wavelet coefficients of a). Together the outputs form the DWT of a.]
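The H and G operators can be written out directly for the Haar filters given on this slide. This is a minimal sketch assuming an input whose length is a power of two; the output ordering (coarsest summary first, then details from coarse to fine) is a layout chosen here for illustration.

```python
def H(a):
    """Summary coefficients: local average of each adjacent pair (h = [1/2, 1/2])."""
    return [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(len(a) // 2)]

def G(a):
    """Detail coefficients: half the difference within each pair (g = [1/2, -1/2])."""
    return [(a[2 * i] - a[2 * i + 1]) / 2 for i in range(len(a) // 2)]

def dwt(a):
    """Apply G and H repeatedly; return the final summary followed by all details."""
    details = []
    while len(a) > 1:
        details.insert(0, G(a))  # coarser levels end up in front of finer ones
        a = H(a)
    return a + [c for level in details for c in level]

a = [2, 4, 6, 8, 10, 12, 14, 16]
print(H(a))       # Ha
print(G(a))       # Ga
print(H(H(a)))    # H²a: summary coefficients at level 2
print(G(H(a)))    # GHa: detail coefficients at level 2
print(dwt(a))
```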
36
36 CIDR’03 Naive Evaluation of Vector Queries Using Wavelets Hence, vector queries can be computed in the wavelet-transformed space as: $\langle q, \Delta_I \rangle = \langle \hat{q}, \hat{\Delta}_I \rangle$. Algorithm: Off-line transformation of the data vector (or “data distribution function”, i.e., $\Delta_I$, to be exact): $O(|I|\, l^d \log^d N)$ for sparse data, $O(|I|) = O(N^d)$ for dense data. Transform the query vector at submission: $O(N^d)$! Sum up the products of the corresponding elements of the data and query vectors. Retrieving the elements of the data vector: $O(N^d)$!
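The reason the dot product can be taken in the transformed space is worth spelling out. The short derivation below assumes the wavelet transform is orthonormal, written as a matrix $W$ with $W^T W = I$ (here $I$ is the identity matrix, and $W$ is notation introduced only for this derivation).

```latex
\begin{aligned}
\langle \hat{q}, \hat{\Delta} \rangle
  &= \langle W q,\; W \Delta \rangle
   = (W q)^{T} (W \Delta) \\
  &= q^{T} W^{T} W \Delta
   = q^{T} \Delta
   = \langle q, \Delta \rangle .
\end{aligned}
```

Under this assumption the transformed-domain sum is exact, not an approximation; approximation enters only when some terms of the sum are skipped.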
37
37 CIDR’03 Fast Evaluation of Vector Queries Using Wavelets Main intuitions: The “query vector” can be transformed quickly because most of the coefficients are known in advance. The “transformed query vector” has a large number of negligible (e.g., zero) values (independent of how well the data can be approximated by wavelets). Example: Haar filter & COUNT function on R=[5,12] on the domain of integers from 0 to 15. [Figure: the cascade Ga, GHa, GH²a, GH³a, H⁴a for this query.] At each step, you know the zeros.
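The sparsity claim can be checked on the slide's own example. The sketch below computes a full Haar transform of the COUNT query vector and counts its nonzero coefficients; it is not the “lazy” transform itself (which would avoid touching the known zeros), and the function names are hypothetical. It assumes the filters h = [1/2, 1/2] and g = [1/2, -1/2] and a power-of-two domain size.

```python
def haar_dwt(a):
    """Full Haar DWT with h = [1/2, 1/2], g = [1/2, -1/2]; length must be a power of two."""
    a = list(a)
    details = []
    while len(a) > 1:
        pairs = [(a[2 * i], a[2 * i + 1]) for i in range(len(a) // 2)]
        details = [(x - y) / 2 for x, y in pairs] + details  # prepend the coarser level
        a = [(x + y) / 2 for x, y in pairs]
    return a + details

def count_query(n, lo, hi):
    """Query vector for COUNT on the inclusive range [lo, hi] over the domain 0..n-1."""
    return [1 if lo <= i <= hi else 0 for i in range(n)]

def nonzeros(vec):
    return sum(1 for c in vec if c != 0)

small = nonzeros(haar_dwt(count_query(16, 5, 12)))       # the slide's example
large = nonzeros(haar_dwt(count_query(1024, 100, 700)))  # a larger, made-up domain
print(small, large)
```

For the Haar COUNT case, each level contributes at most two nonzero detail coefficients (one per range boundary), so the transformed query has at most 2·log₂N + 1 nonzero entries out of N.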
38
38 CIDR’03 Exact Evaluation of Vector Queries Query: SUM(salary) when (25 < age < 40) & (55k < salary < 150k). # of Wavelet Coefficients: 837. # of Nonzero Coordinates: 4380.
39
39 CIDR’03 Approximate Evaluation of Vector Queries
40
40 CIDR’03 Optimal Disk Placement for Wavelet Data The goal is to efficiently store wavelet coefficients Efficiently means fast access to stored data, low I/O complexity, little disk access How to achieve this: create a principle of locality of reference Designed for wavelet overlap queries, but can be extended for polynomial range-sum queries over multidimensional data
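To make the locality-of-reference goal concrete, here is a hedged sketch comparing two ways of assigning Haar coefficients to disk blocks. It is not the placement scheme from the presentation: it assumes the coefficients form a heap-ordered binary tree (root at index 1, children of k at 2k and 2k+1), that a query reads one root-to-leaf path, and that a “tile” is a subtree of fixed height. All function names are hypothetical.

```python
def path_to_root(k):
    """Indices of coefficient k and all of its ancestors (root has index 1)."""
    path = []
    while k >= 1:
        path.append(k)
        k //= 2
    return path

def sequential_block(k, block_size):
    """Baseline: fill blocks in index (level) order."""
    return (k - 1) // block_size

def tile_block(k, height):
    """Tiling: identify each block by the root of the height-`height` subtree containing k."""
    depth = k.bit_length() - 1
    return k >> (depth % height)

def blocks_touched(path, block_of):
    return len({block_of(k) for k in path})

# A tree with 12 levels (indices 1..4095); tiles of height 3 hold 7 coefficients each.
path = path_to_root(2048)
seq = blocks_touched(path, lambda k: sequential_block(k, 7))
tiled = blocks_touched(path, lambda k: tile_block(k, 3))
print(len(path), seq, tiled)
```

With equal block sizes, the tiled layout serves this 12-coefficient path from 4 blocks, while the level-order layout spreads it over 10.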
41
41 CIDR’03 Optimal Disk Placement for Wavelet Data: Discrete Wavelet Transform [Figure: the DWT maps eight time-domain samples x0, x1, …, x7 to eight wavelet-domain coefficients, indexed 0 through 7.]
42
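42 CIDR’03 SVD Background The idea of SVD is based on the following theorem of linear algebra: If $A$ is a real $m \times n$ matrix, then there exist column-orthonormal matrices $U$ and $V$ such that $A = U \Sigma V^T$, where $U$ is $m \times k$ and $V$ is $n \times k$ with $k = \min(m, n)$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ is a diagonal matrix such that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k \ge 0$.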
43
43 CIDR’03 Weighted-Sum SVD Each data sequence can be represented as a matrix whose columns are the sensors, so the number of columns (r) is fixed. The similarity metric of two data sequences is defined on ‘square’ matrices: each matrix is multiplied by its transpose, which eliminates the effect of the two matrices having different numbers of rows (i.e., different lengths in the time dimension).
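The squaring step can be sketched in a few lines. The example assumes each sequence is stored as a list of rows, one row per time sample and one column per sensor, and that the product taken is (transpose × matrix), which yields an r × r result; the function name `square` and the sample values are hypothetical.

```python
def square(seq):
    """Return seq^T * seq: an r x r matrix regardless of how many rows (samples) seq has."""
    r = len(seq[0])
    return [[sum(row[i] * row[j] for row in seq) for j in range(r)] for i in range(r)]

# Two made-up sequences over r = 3 sensors, with different durations.
short_seq = [[1, 0, 2],
             [0, 1, 1]]
long_seq = [[1, 0, 2],
            [0, 1, 1],
            [2, 2, 0],
            [1, 1, 1],
            [0, 3, 1]]

sq_short = square(short_seq)
sq_long = square(long_seq)
print(sq_short)
print(len(sq_long), len(sq_long[0]))
```

Both results are 3 × 3 and symmetric, so they can be decomposed and compared even though the original sequences have 2 and 5 samples.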
44
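44 CIDR’03 Weighted-Sum SVD Problem: Obtain the similarity of the input sequence and the pattern. [Figure: each of the two sequences is squared into an r × r matrix (one with entries q_11 … q_rr, the other with entries p_11 … p_rr) and decomposed by SVD. One decomposition yields vectors e_1, e_2, …, e_r with values c_1, c_2, …, c_r; the other yields f_1, f_2, …, f_r with values d_1, d_2, …, d_r. Weights cw_1, cw_2, …, cw_r and dw_1, dw_2, …, dw_r are then derived, with cw_1 + cw_2 + … + cw_r = 1 and dw_1 + dw_2 + … + dw_r = 1.]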
45
45 CIDR’03 Weighted-Sum SVD Problem: Obtain the similarity of the input sequence and the pattern. [Figure: the vectors e_1, e_2, …, e_r with weights cw_1, cw_2, …, cw_r are compared against the vectors f_1, f_2, …, f_r with weights dw_1, dw_2, …, dw_r.] The similarity of the input sequence and the pattern = min(Θ_1, Θ_2)
46
46 CIDR’03 The Ridge-Climbing Heuristic Procedure: Compute the accumulated similarity values (ASVs) between the input sequence and all vocabulary sequences Keep track of all ASVs For each vocabulary sequence, check whether the ASV is monotonically increasing, and whether a maximum is reached Yes: put this vocabulary into the candidates pool Choose the vocabulary from the candidates pool with biggest maximal value Isolate the recognized stream
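The procedure above can be sketched as follows. This is one possible reading of the heuristic, not the authors' implementation: it assumes the ASV traces are already computed per vocabulary sequence, treats "monotonically increasing" as a run of at least `min_rise` consecutive increases, and declares a maximum when the next value drops. The function name, the `min_rise` parameter, and the trace values are all hypothetical.

```python
def ridge_climb(asv_traces, min_rise=2):
    """asv_traces: dict mapping a vocabulary word to its list of ASVs over time.
    Returns (word, peak_time) for the first detected peak, or None if there is none."""
    length = min(len(trace) for trace in asv_traces.values())
    for t in range(min_rise + 1, length):
        candidates = {}
        for word, trace in asv_traces.items():
            window = trace[t - min_rise - 1:t]  # values leading up to the suspected peak at t-1
            rising = all(a < b for a, b in zip(window, window[1:]))
            if rising and trace[t] < trace[t - 1]:  # climbed, then dropped: maximum reached
                candidates[word] = trace[t - 1]
        if candidates:
            # Choose the candidate with the biggest maximal value and isolate there.
            best = max(candidates, key=candidates.get)
            return best, t - 1
    return None

# Made-up ASV traces for a three-word vocabulary.
traces = {
    "like":   [0.1, 0.3, 0.6, 0.9, 0.7, 0.4],
    "yellow": [0.1, 0.2, 0.3, 0.35, 0.3, 0.2],
    "I":      [0.2, 0.15, 0.1, 0.1, 0.1, 0.1],
}
result = ridge_climb(traces)
print(result)
```

In a streaming setting the caller would cut the input at the returned peak time, reset all ASVs, and continue with the remainder of the stream.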
47
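47 CIDR’03 The Ridge-Climbing Heuristic Assume the database has only three vocabulary sequences: like, yellow, and I. [Figure: three plots of ASVs over time, one each for like, yellow, and I, as the input sequence arrives. When the ASV for like reaches its maximum ("Maximum is reached! Isolate!"), the sign like is isolated from the input sequence and the ASVs are reset.]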