Online Analytical Processing Stream Data: Is It Feasible?

Presentation transcript:

Online Analytical Processing Stream Data: Is It Feasible?
Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei, Benjamin W. Wah, Jianyong Wang
Univ. of Illinois at Urbana-Champaign, Wright State Univ., Simon Fraser Univ., Peking Univ.
June 2, 2002

Outline
- Characteristics of stream data
- Why on-line analytical processing and mining of stream data?
- A stream cube architecture
- Stream cube computation
- Discussion
- Conclusions
5/4/2019

What Is a Data Stream?
- Data stream: an ordered sequence of points x1, …, xi, …, xn that can be read only once, or a small number of times, in a fixed order
- Characteristics
  - Huge volumes of data, possibly infinite
  - Fast changing, requiring fast response
  - The data stream model suits many of today's data processing needs
  - Single linear scan algorithms: the data can be looked at only once, and random access is expensive
  - Store only a summary of the data seen so far
  - Most stream data are low-level or multi-dimensional in nature, so they need multi-level/multi-dimensional (ML/MD) processing
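
The "single linear scan, store only a summary" idea can be illustrated with a small Python sketch. This is not from the slides: the class name `RunningSummary` and the choice of summary fields (count, total, minimum, maximum) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Constant-size summary of a stream (hypothetical helper, not from the slides)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, x: float) -> None:
        # Each point is seen exactly once and then discarded.
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def summarize(stream):
    """Consume an iterator in one pass, keeping only the summary."""
    summary = RunningSummary()
    for x in stream:
        summary.update(x)
    return summary


# Illustrative data only; an iterator cannot be rewound, like a stream.
summary = summarize(iter([3.0, 1.0, 4.0, 1.0, 5.0]))
```

Memory use stays constant however long the stream runs, which is the property the slide asks for.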

Stream Data Applications
- Business: credit card transactions
- Telecommunication: phone calls
- Financial market: stock exchange
- Engineering & industrial processes: power supply
- Monitoring & surveillance: video streams
- Web page click streams

Previous Research
- Stream data model
  - STanford stREam datA Manager (STREAM)
  - Data Stream Management System (DSMS)
- Stream query model
  - Continuous queries
  - Sliding windows
- Stream data mining
  - Clustering & summarization (Guha, Motwani et al.)
  - Correlation of data streams (Gehrke et al.)
  - Classification of stream data (Domingos et al.)

Why Stream Cube and Stream OLAP?
- Most stream data are low-level or multi-dimensional in nature, so they need ML/MD processing
- Analysis requirements
  - Multi-dimensional trends and unusual patterns
  - Capturing important changes at multiple dimensions/levels
  - Fast, real-time detection and response
  - Comparison with the data cube: similarities and differences
- Stream (data) cube or stream OLAP
  - Is it feasible?
  - How to implement it efficiently?

An Example
- Analysis of Web click streams
  - Raw data at low levels: seconds, web page addresses, user IP addresses, …
  - Analysts want changes, trends, and unusual patterns at reasonable levels of detail
- A typical data stream OLAP query: can we find patterns like “Average web clicking traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours”?
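
A rough sketch of how such a query could be evaluated, assuming the clicks for one cell (say, North America and sports) have already been aggregated into per-minute counts. The function name `relative_increase` and the sample numbers are illustrative assumptions, not part of the presentation.

```python
def relative_increase(per_minute_counts, short=15, long=24 * 60):
    """Return (short-window average / long-window average) - 1.

    per_minute_counts is assumed to hold one click count per minute,
    oldest first. A long-window average of zero is not handled here.
    """
    if not per_minute_counts:
        raise ValueError("need at least one per-minute count")
    recent = per_minute_counts[-short:]
    baseline = per_minute_counts[-long:]
    short_avg = sum(recent) / len(recent)
    long_avg = sum(baseline) / len(baseline)
    return short_avg / long_avg - 1.0


# Made-up data: steady traffic for most of the day, then a burst in the last 15 minutes.
counts = [100] * (24 * 60 - 15) + [142] * 15
increase = relative_increase(counts)
is_40_percent_higher = increase >= 0.40
```

Keeping all 1,440 per-minute counts is affordable for one cell, but not for every cell at the raw level, which is what motivates the coarser time granularities and critical layers discussed next.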

A Stream Cube Architecture
- A tilt time frame
  - Different time granularities: second, minute, quarter, hour, day, week, …
- Critical layers
  - Minimum interest layer (m-layer)
  - Observation layer (o-layer)
  - The user watches at the o-layer and occasionally needs to drill down to the m-layer
- Partial materialization of stream cubes
  - Full materialization: too costly in space and time
  - No materialization: slow response at query time
  - Partial materialization: what does “partial” mean?

A Tilt Time-Frame Model
[Figure: two tilt time frames drawn on a time axis ending at "Now". Frame labelled "Up to 7 days": 7 days, 24 hrs, 4 qtrs, 15 minutes, 25 sec. Frame labelled "Up to a year": 12 months, 31 days, 24 hours, 4 qtrs.]
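
The "up to a year" frame (4 quarters, 24 hours, 31 days, 12 months) can be sketched as a cascade of fixed-capacity slot lists. This is a simplified illustration under stated assumptions, not the authors' implementation: the class name `TiltTimeFrame`, the use of a plain sum as the measure, and the choice to clear a level when it is promoted are all assumptions.

```python
class TiltTimeFrame:
    """Simplified tilt time frame: recent data fine-grained, older data coarse."""

    # (level name, number of slots), finest level first, as on the slide.
    LEVELS = (("quarter", 4), ("hour", 24), ("day", 31), ("month", 12))

    def __init__(self):
        self.slots = {name: [] for name, _ in self.LEVELS}

    def add_quarter(self, value):
        """Register the aggregated measure of one finished quarter hour."""
        self._push(0, value)

    def _push(self, level, value):
        name, capacity = self.LEVELS[level]
        slots = self.slots[name]
        slots.append(value)
        if level == len(self.LEVELS) - 1:
            # Coarsest level: the oldest month simply falls off.
            if len(slots) > capacity:
                slots.pop(0)
        elif len(slots) == capacity:
            # A full level is merged into one slot of the next coarser level.
            total = sum(slots)
            slots.clear()
            self._push(level + 1, total)


frame = TiltTimeFrame()
for _ in range(4 * 24 + 3):  # one full day plus three quarters, one click each
    frame.add_quarter(1)

max_slots = sum(capacity for _, capacity in TiltTimeFrame.LEVELS)
```

With these capacities the frame never holds more than `max_slots` values, compared with one value per quarter hour for a whole year if everything were kept at the finest granularity.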

Two Critical Layers in the Stream Cube
- o-layer (observation): (*, theme, quarter)
- m-layer (minimal interest): (user-group, URL-group, minute)
- (primitive) stream data layer: (individual-user, URL, second)
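
As a sketch of how m-layer cells relate to o-layer cells, the following rolls (user-group, URL-group, minute) counts up to (*, theme, quarter). The mapping table `URL_GROUP_TO_THEME`, the integer minute encoding, and the sample cells are hypothetical and only serve the illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: URL-group -> theme.
URL_GROUP_TO_THEME = {"nba": "sports", "nfl": "sports", "elections": "politics"}


def roll_up(m_cells):
    """Aggregate m-layer counts into o-layer cells (*, theme, quarter).

    Minutes are assumed to be integers counted from some origin, so
    minute // 15 identifies the quarter hour.
    """
    o_cells = Counter()
    for (user_group, url_group, minute), count in m_cells.items():
        theme = URL_GROUP_TO_THEME[url_group]
        quarter = minute // 15
        o_cells[("*", theme, quarter)] += count  # user dimension generalized to *
    return o_cells


# Made-up m-layer cells.
m_cells = {
    ("students", "nba", 3): 10,
    ("staff", "nfl", 14): 5,
    ("students", "elections", 16): 7,
    ("staff", "nba", 20): 2,
}
o_cells = roll_up(m_cells)
```

Drilling down from an o-layer cell then amounts to looking at the m-layer cells that contributed to it.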

What Are the Issues?
- Materialization problem
  - Only materialize cuboids of the critical layers?
  - Popular path approach vs. exception cell approach
- Computation problem
  - How to compute and store stream cubes efficiently?
  - How to discover unusual cells and patterns between the critical layers?

Stream Cube Structure from the m-layer to the o-layer
[Figure: cells of the stream cube between the two layers, with fully specified cells such as (A1, B1, C2), (A1, B2, C1), (A1, B2, C2), (A2, B1, C1), (A2, B1, C2), (A2, B2, C1), (A2, B2, C2) and generalized cells such as (A1, *, C1), (A1, *, C2), (A2, *, C2).]

Stream Cube Computation
- Cube structure from the m-layer to the o-layer
- Three approaches
  - All cuboids approach: materializing all cells
  - Exceptional cells approach: materializing only exceptional cells
  - Popular path approach: computing and materializing cells only along a popular path
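
The popular path approach can be sketched as follows, under the simplifying assumption that each dimension has a single level and is generalized to * in one step (real concept hierarchies have several levels). The function name `popular_path` and the sample cells are illustrative, not taken from the slides.

```python
from collections import defaultdict


def popular_path(m_cells, drop_order):
    """Materialize only the cuboids along one path from the m-layer upward.

    m_cells maps fully specified cells (tuples) to a summed measure.
    drop_order lists the dimension indexes to generalize, in path order.
    Each cuboid is computed from the previous one, never from raw data.
    """
    current = dict(m_cells)
    cuboids = [current]
    for dim in drop_order:
        generalized = defaultdict(int)
        for key, measure in current.items():
            parent = key[:dim] + ("*",) + key[dim + 1:]
            generalized[parent] += measure
        current = dict(generalized)
        cuboids.append(current)
    return cuboids


# Made-up m-layer cells over dimensions (A, B, C).
m_cells = {("A1", "B1", "C1"): 1, ("A1", "B2", "C1"): 2, ("A2", "B1", "C2"): 3}
cuboids = popular_path(m_cells, drop_order=[1, 2])  # (A,B,C) -> (A,*,C) -> (A,*,*)
```

Cuboids off the path, such as (*, B, C) here, are not stored and would have to be computed on request from the nearest materialized cuboid.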

An H-Cube Structure
[Figure: a tree rooted at "root". The observation layer holds the nodes sports, politics, and enter.; below them are uic and uiuc; the minimal interest layer holds the nodes jeff and mary. Nodes carry quantitative information (Q.I., Quant-Info): Sum: xxxx, Cnt: yyyy, Regression:]
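
A minimal tree in the spirit of the figure might look like the sketch below. It is an assumption-laden simplification: the class name `HNode` is made up, every node keeps a sum and a count (the figure's regression component is omitted), and nothing here claims to match the H-tree of the cited H-cubing paper in detail.

```python
class HNode:
    """Tree node holding quant-info (sum and count) for one prefix of a cell."""

    def __init__(self, label):
        self.label = label
        self.children = {}
        self.total = 0.0  # "Sum" in the figure
        self.count = 0    # "Cnt" in the figure

    def add(self, path, value):
        """Add one measure along a path of labels, updating each node on the way."""
        self.total += value
        self.count += 1
        if path:
            child = self.children.setdefault(path[0], HNode(path[0]))
            child.add(path[1:], value)

    def find(self, path):
        """Return the node reached by following the labels in path."""
        node = self
        for label in path:
            node = node.children[label]
        return node


# Labels taken from the figure; the measures are made up.
root = HNode("root")
root.add(("sports", "uic", "jeff"), 3.0)
root.add(("sports", "uiuc", "mary"), 5.0)
root.add(("politics", "uiuc", "jeff"), 2.0)
sports = root.find(("sports",))
```

Because shared prefixes are stored once, cells along the path from the observation layer to the minimal interest layer reuse the same nodes.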

Feasibility Analysis
- Popular path approach
  - Computing layers along the popular path
  - Other planes/cells are computed when requested
  - Using the H-cube structure to store computed cells (which form the stream cube)
  - Tradeoff in time/space between cube materialization and online query computation
- Exception cells approach
  - How to set appropriate thresholds for all applications?

Feasibility study: time and space vs. # tuples at the m-layer for dataset D3L3C10T400K
[Figure: a) Time vs. m-layer size; b) Space vs. m-layer size]

Time and space vs. # of levels
[Figure: a) Time vs. # levels; b) Space vs. # levels]

Conclusions
- Stream data analysis
  - Besides query and mining, stream cube and OLAP are powerful tools for finding general and unusual patterns
- A multi-dimensional stream cube framework
  - Tilt time frame
  - Critical layers
  - Popular path approach
- An important issue for further study
  - Mining stream data at a high level, at multiple levels, or in multiple dimensions

References
- A previous study on H-cubing: J. Han, J. Pei, G. Dong, and K. Wang, “Computing Iceberg Data Cubes with Complex Measures”, SIGMOD 2001
- A further study on regression cubes and stream data regression analysis: Y. Chen, G. Dong, J. Han, B. Wah, and J. Wang, “Multi-dimensional Regression Analysis of Time-series Data Streams”, VLDB 2002

Thank you!
www.cs.uiuc.edu/~hanj