UNIT: User-ceNtrIc Transaction Management in Web-Database Systems Huiming Qu, Alexandros Labrinidis, Daniel Mosse Advanced Data Management Technologies.

UNIT: User-ceNtrIc Transaction Management in Web-Database Systems Huiming Qu, Alexandros Labrinidis, Daniel Mosse Advanced Data Management Technologies Lab Department of Computer Science University of Pittsburgh

ADMT Lab, Department of Computer Science, University of Pittsburgh

Stock Trading Services (ideal)
[Figure: UPDATES and QUERIES arrive at the web database; queries on GOOG and IBM are answered with $367.9 and $75.8.]

Stock Trading Services (reality)
[Figure: many requests on GOOG, IBM, SUN, MSFT, and OTE reach the web database, which becomes OVERLOADED.]
To avoid overloading, either:
1. increase hardware capacity, or
2. add software support.

Stock Trading Services (UNIT)
[Figure: the web database with UNIT added; data items MSFT, GOOG, IBM, SUN, and OTE; queries on GOOG and IBM are answered with $367.9 and $75.8.]

Problem Statement
Users' satisfaction is based on:
- Freshness: the query is answered based on fresh data.
- Timeliness: the query is answered with a short response time.
Transaction types:
- Read-only queries and write-only updates compete for system resources. More CPU for queries yields better timeliness; more CPU for updates yields better freshness.
Optimization goal: maximize user satisfaction by balancing the load of query and update transactions.

Outline
- Motivating Example
- Performance metric: User Satisfaction
- System overview & algorithms
- Experiments
- Related work
- Conclusions

User Requirements
Timeliness: meeting deadlines
- Query response time ≤ its relative deadline.
Freshness: meeting freshness requirements
- Query freshness ≥ its freshness requirement.
- Query freshness (an aggregation of data freshness): the minimal freshness of the data accessed by the query.
- Data freshness (lag-based): based on the number of unapplied updates.
[Figure: timeline t with updates U1, U2, U3 and query Q1; Q1 returns with U1.]
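To make the two requirements concrete, here is a minimal Python sketch. The slide says only that data freshness is lag-based; the mapping 1 / (1 + lag) used below is an assumption, as are the function names and the example data items. Query freshness is taken as the minimum over the items the query reads, as the slide states.

```python
def data_freshness(unapplied_updates: int) -> float:
    # ASSUMPTION: lag-based freshness mapped to [0, 1] as 1 / (1 + lag);
    # the slide only states that freshness depends on the unapplied-update count.
    return 1.0 / (1 + unapplied_updates)


def query_freshness(unapplied: dict, items: list) -> float:
    # Query freshness is the minimum freshness over the data items it reads.
    return min(data_freshness(unapplied[d]) for d in items)


def meets_requirements(response_time: float, relative_deadline: float,
                       q_freshness: float, freshness_req: float) -> tuple:
    timely = response_time <= relative_deadline  # timeliness requirement
    fresh = q_freshness >= freshness_req         # freshness requirement
    return timely, fresh


# Hypothetical example: GOOG is up to date, IBM has two pending updates.
pending = {"GOOG": 0, "IBM": 2}
qf = query_freshness(pending, ["GOOG", "IBM"])
print(qf, meets_requirements(0.4, 1.0, qf, 0.9))
```

Under this assumed mapping a single pending update already drops an item's freshness to 0.5.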

Is Success Ratio Enough?
A query fails and is dropped if it is:
- rejected by admission control (Rejection Failure),
- unable to meet its deadline (Deadline Missed Failure), or
- unable to meet its freshness requirement (Data Stale Failure).
Otherwise, it succeeds.
Success Ratio: the percentage of queries that meet both their timeliness and freshness requirements.
What does the success ratio miss?
- Users' preferences between timeliness and freshness.
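The four outcomes and the success ratio can be sketched as follows. The check order (a query that is both late and stale is counted as late) is an assumption, and the names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    REJECTED = "R"         # Rejection Failure
    DEADLINE_MISSED = "D"  # Deadline Missed Failure
    DATA_STALE = "F"       # Data Stale Failure
    SUCCESS = "S"


def classify(admitted: bool, response_time: float, deadline: float,
             freshness: float, freshness_req: float) -> Outcome:
    if not admitted:
        return Outcome.REJECTED
    # ASSUMPTION: a query that is both late and stale counts as late.
    if response_time > deadline:
        return Outcome.DEADLINE_MISSED
    if freshness < freshness_req:
        return Outcome.DATA_STALE
    return Outcome.SUCCESS


def success_ratio(outcomes: list) -> float:
    # Fraction of queries meeting both timeliness and freshness requirements.
    return sum(o is Outcome.SUCCESS for o in outcomes) / len(outcomes)
```

Because `success_ratio` counts only S, it scores a rejection, a missed deadline, and a stale answer identically, which is the gap named in the slide's last bullet.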

User Satisfaction Metric (USM)
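The formula on this slide did not survive extraction. The sketch below is an assumed form that is consistent with the later slides: each query contributes either a success gain or the cost of its failure type (rejection, deadline missed, data stale), and USM is the average. With gain = 1 and all penalties = 0 it reduces to the success ratio, matching the "Naïve USM" setting in the experiments. Parameter names are hypothetical.

```python
def usm(outcomes, gain=1.0, c_reject=0.0, c_deadline=0.0, c_stale=0.0):
    """Average per-query value: +gain on success, -cost on each failure type.

    outcomes: iterable of "S", "R", "D", "F" (success, rejected,
    deadline missed, data stale). Parameter names are hypothetical.
    """
    value = {"S": gain, "R": -c_reject, "D": -c_deadline, "F": -c_stale}
    outcomes = list(outcomes)
    return sum(value[o] for o in outcomes) / len(outcomes)


run = ["S", "S", "R", "D"]
print(usm(run))                                # 0.5 (naive USM = success ratio)
print(usm(run, c_reject=5.0, c_deadline=1.0))  # -1.0 (penalty-dominated setting)
```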

UNIT System (User-ceNtrIc Transaction Management)
Web-databases:
- Dual priority queue: updates have priority over queries; EDF (earliest deadline first) for queries; FIFO for updates.
- 2PL-HP (two-phase locking with high priority) for concurrency control.
UNIT: load control
- Load Balancing Controller
- Query Admission Control
- Update Frequency Modulation
[Figure: UNIT architecture. The Load Balancing Controller uses the USM and system statistics to send +/- queries to Admission Control and +/- updates to Frequency Modulation; each query ends as a Reject Failure, Deadline Missed Failure, Data Stale Failure, or Success.]
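The dual priority queue can be sketched as below: updates always run before queries, updates in FIFO order, queries in EDF order. This is an illustrative single-server model; it does not model preemption or 2PL-HP locking, and the class name is hypothetical.

```python
import heapq
from collections import deque
from itertools import count


class DualPriorityQueue:
    """Sketch of the dual queue: updates (FIFO) always run before queries (EDF)."""

    def __init__(self):
        self._updates = deque()  # FIFO order for updates
        self._queries = []       # min-heap keyed on absolute deadline
        self._seq = count()      # tie-breaker: equal deadlines keep arrival order

    def add_update(self, update):
        self._updates.append(update)

    def add_query(self, query, deadline):
        heapq.heappush(self._queries, (deadline, next(self._seq), query))

    def next_transaction(self):
        if self._updates:                        # updates have higher priority
            return self._updates.popleft()
        if self._queries:                        # earliest deadline first
            return heapq.heappop(self._queries)[2]
        return None


q = DualPriorityQueue()
q.add_query("Q1", deadline=5.0)
q.add_query("Q2", deadline=2.0)
q.add_update("U1")
print([q.next_transaction() for _ in range(4)])  # ['U1', 'Q2', 'Q1', None]
```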

Load Balancing Controller
[Figure: the controller weighs the Success Gain against the Failure Costs (Rejection Cost, Deadline Missed Cost, Data Stale Cost). Its output lies on a scale through 0, from + (increase # of queries, decrease # of updates) to − (decrease # of queries, increase # of updates).]

Query Admission Control
- Transaction deadline check: will the query meet its deadline under the current system workload?
- System USM check: will admitting the query jeopardize the system USM?
[Figure: timeline from the current time of queued queries q1 to q7, marking q4's deadline and the deadlines of q5 to q7.]
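A minimal sketch of the two checks, assuming queries run back-to-back in EDF order with known execution times. The USM check here is a hypothetical rule: admit when the gain from the new query (plus the rejection cost it avoids) is at least the loss from queued queries it would push past their deadlines. Names and parameters are illustrative, not the authors' algorithm.

```python
def edf_finish_times(now, queries):
    """Finish time of each query if run back-to-back in EDF order.

    queries: list of (name, exec_time, absolute_deadline).
    """
    t = now
    finish = {}
    for name, exec_time, _deadline in sorted(queries, key=lambda q: q[2]):
        t += exec_time
        finish[name] = t
    return finish


def admit(now, queue, new_query, gain=1.0, c_reject=0.0, c_deadline=0.0):
    name, _exec_time, deadline = new_query
    before = edf_finish_times(now, queue)
    after = edf_finish_times(now, queue + [new_query])

    # 1) Transaction deadline check: would the new query meet its own deadline?
    if after[name] > deadline:
        return False

    # 2) System USM check (hypothetical rule): queued queries that were on time
    #    but would now be late turn a success into a deadline-missed failure.
    newly_late = sum(1 for n, _e, d in queue if before[n] <= d < after[n])
    delta = (gain + c_reject) - newly_late * (gain + c_deadline)
    return delta >= 0
```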

Query Admission Control (cont.)
Use C_flex to increase or decrease the number of admitted queries:
- Decrease C_flex to admit more queries.
- Increase C_flex to admit fewer queries.
[Figure: the same queue of q1 to q7 shown with a smaller and a larger C_flex, against q4's deadline and the deadlines of q5 to q7.]
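How C_flex enters the deadline check is not recoverable from this slide. One plausible reading, used as an assumption below, is that C_flex scales the estimated work ahead of a query, so a larger C_flex predicts later finish times and admits fewer queries. Function names are hypothetical.

```python
def passes_deadline_check(now, exec_times_ahead, exec_time, deadline, c_flex=1.0):
    # ASSUMPTION: C_flex scales the estimated work before the query finishes.
    predicted_finish = now + c_flex * (sum(exec_times_ahead) + exec_time)
    return predicted_finish <= deadline


def admitted_count(now, arrivals, c_flex):
    """arrivals: list of (exec_time, deadline), all arriving at `now`, tried in EDF order."""
    admitted = []
    for exec_time, deadline in sorted(arrivals, key=lambda a: a[1]):
        ahead = [e for e, _d in admitted]  # admitted queries have earlier deadlines
        if passes_deadline_check(now, ahead, exec_time, deadline, c_flex):
            admitted.append((exec_time, deadline))
    return len(admitted)


arrivals = [(1.0, 2.0), (1.0, 3.0), (1.0, 4.0)]
print([admitted_count(0.0, arrivals, c) for c in (0.5, 1.0, 1.5, 3.0)])  # [3, 3, 2, 1]
```

With these three queries (execution time 1, deadlines 2, 3, 4) the admitted count falls from 3 to 1 as C_flex grows from 0.5 to 3.0.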

Update Frequency Modulation
Decrease # of updates:
- Each active data item has a Ticket Value (TV).
- Updates increase TV; queries decrease TV.
- A higher TV means a higher probability of being degraded.
- A lottery scheme [Waldspurger 95] picks the data items whose updates are dropped.
Increase # of updates:
- Randomly pick a degraded data item.
- Restore all its updates.
[Figure: data items D1, D2, D3 with updates U1 and query Q3; D1 is picked to reduce its updates.]
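The ticket bookkeeping and lottery draw can be sketched as below. Lottery scheduling picks an item with probability proportional to its tickets; here the tickets are the ticket values (TV). The increments per update and per query, and treating items with TV ≤ 0 as holding no tickets, are assumptions, and the class name is hypothetical.

```python
import random


class TicketTable:
    """Per-data-item ticket values (TV) for lottery-based update degradation."""

    def __init__(self, update_inc=1.0, query_dec=1.0):
        self.tv = {}
        self.update_inc = update_inc  # hypothetical TV increase per update
        self.query_dec = query_dec    # hypothetical TV decrease per query

    def on_update(self, item):
        self.tv[item] = self.tv.get(item, 0.0) + self.update_inc

    def on_query(self, item):
        self.tv[item] = self.tv.get(item, 0.0) - self.query_dec

    def draw(self, rng):
        # Lottery: pick an item with probability proportional to its TV.
        # ASSUMPTION: items with TV <= 0 hold no tickets.
        candidates = [d for d, t in self.tv.items() if t > 0]
        if not candidates:
            return None
        return rng.choices(candidates, weights=[self.tv[d] for d in candidates])[0]


tickets = TicketTable()
for _ in range(3):
    tickets.on_update("D1")  # D1: updated often, never queried
tickets.on_update("D3")
tickets.on_query("D3")       # D3: one update, one query -> TV 0
tickets.on_query("D2")       # D2: queried only -> negative TV
print(tickets.draw(random.Random(42)))  # D1 (the only item holding tickets)
```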

Algorithms Compared
- IMU: Immediate Update, no admission control, 100% freshness.
- ODU: On-demand Update, no admission control, 100% freshness.
- QMF [Kang, TKDE'04]: Immediate Update with admission control; no weights among rejection, timeliness, and freshness requirements are considered.
- UNIT: is what U need.

Experiment Design
We evaluate:
1. the effectiveness of update frequency modulation,
2. performance under the naïve USM setting (= Success Ratio),
3. performance under various USM settings, and
4. the distribution of the four query outcomes under various USM settings.

Experimental Setup
- Query trace: based on the HP disk cello99a access traces (1069 hours, 110,035 reads).
- Relative deadlines are generated from the query execution time qt, uniformly distributed from avg(qt) to 10 * max(qt).
- The freshness requirement of every query is set to 90%.
- Update traces:

  Update trace   Workload            Correlation to queries
  low-unif       low (6144, 15%)     Uniform
  low-pos        low (6144, 15%)     Positive
  low-neg        low (6144, 15%)     Negative
  med-unif       med (30000, 75%)    Uniform
  med-pos        med (30000, 75%)    Positive
  med-neg        med (30000, 75%)    Negative
  high-unif      high (61440, 150%)  Uniform
  high-pos       high (61440, 150%)  Positive
  high-neg       high (61440, 150%)  Negative
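The deadline rule from this slide, written as a small sketch; the execution times below are made-up placeholders, not values from the cello99a trace, and the function name is hypothetical.

```python
import random


def relative_deadlines(exec_times, rng):
    """Draw each query's relative deadline uniformly from [avg(qt), 10 * max(qt)]."""
    lo = sum(exec_times) / len(exec_times)
    hi = 10 * max(exec_times)
    return [rng.uniform(lo, hi) for _ in exec_times]


# Hypothetical execution times (seconds); the real ones come from the trace.
qt = [0.01, 0.02, 0.05]
print(relative_deadlines(qt, random.Random(7)))
```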

Update Frequency Modulation Evaluation
[Figure: query distribution on data vs. update distribution on data (med-unif); some data items receive few queries.]
Updates to data items that receive few queries can be removed without hurting query freshness.

Update Frequency Modulation Evaluation (cont.)
[Figure: query distribution on data vs. update distribution on data (med-neg); some data items receive few queries.]
Only a very small portion of the updates is needed to keep query freshness high.

2. Naïve USM = Success Ratio (gain = 1, penalties = 0)
[Figure: panels for positive correlation and negative correlation.]
UNIT has the smallest performance drop as the workload increases.

USM (gain = 1, penalties ≠ 0)
- Case 1, gain dominates: penalties = 0.1 or 0.5.
- Case 2, penalty dominates: penalties = 1 or 5.
UNIT has the smallest penalties. UNIT has the highest gain.

Query Outcome Distributions
[Figure: percentage of queries that are rejected (R), fail to meet deadlines (D), fail to meet freshness (F), or succeed (S); panels for the other algorithms and for UNIT under different USM settings.]
UNIT obtains a higher success ratio than the others because it keeps queries from falling into the categories with higher penalties.

Related Work
Web-databases:
- [Luo et al., SIGMOD 02]
- [Datta et al., SIGMOD 02]
- [Challenger et al., INFOCOM 00]
- [Labrinidis et al., VLDBJ 04]
- …
Real-time databases:
- [Adelberg et al., SIGMOD 95]
- [Kang et al., TKDE 04]
- …
Stream processing:
- [Tatbul et al., VLDB 03]
- [Das et al., SIGMOD 03]
- [Ding et al., CIKM 04]
- [Babcock et al., ICDE 04]
- [Sharaf et al., WebDB 05]
- …

Conclusions
We proposed:
- a unified User Satisfaction Metric (USM) for web-database systems,
- a feedback control system, UNIT, that controls the query and update workload to maximize the system USM, and
- two algorithms, query admission control and update frequency modulation, that balance the query and update workload.
Finally, an extensive simulation study based on real data showed that UNIT outperforms two baseline algorithms (IMU, ODU) and the current state of the art (QMF).

Thank you! Questions and Comments Huiming Qu

User Requirements
- Timeliness: meeting deadlines
- Freshness: meeting freshness requirements

Performance Metrics
Timeliness:
- response time
Freshness:
- time-based (t)
- divergence-based (50)
- lag-based (2)
- …
The deficiency of these traditional metrics is their lack of semantic information (user preferences/requirements) from applications.
[Figure: updates U1: $300, U2: $310, U3: $350 arrive over time t; Q1 returns with U1: $300.]
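The three traditional freshness measures can be computed for the slide's example. Divergence (50) and lag (2) follow directly from the values; the slide gives the time-based measure only symbolically as t, so the timestamps and the definition used here (time since the first unapplied update arrived) are assumptions. The function name is hypothetical.

```python
def staleness(updates, applied, now):
    """Traditional freshness measures for one data item.

    updates: list of (arrival_time, value) in arrival order.
    applied: how many of those updates the database has applied (>= 1).
    """
    returned = updates[applied - 1][1]  # value a query would read now
    latest = updates[-1][1]             # most recent value in the update stream
    pending = updates[applied:]
    return {
        # ASSUMPTION: time-based = time since the first unapplied update arrived.
        "time": now - pending[0][0] if pending else 0.0,
        "divergence": abs(latest - returned),
        "lag": len(pending),
    }


# Slide example with hypothetical timestamps: U1 = $300, U2 = $310, U3 = $350.
stream = [(0.0, 300), (2.0, 310), (3.0, 350)]
print(staleness(stream, applied=1, now=5.0))
# {'time': 3.0, 'divergence': 50, 'lag': 2}
```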

Update Frequency Modulation
Degrade updates:
- Each data item j maintains a degrading ticket value T_j.
- Lottery scheme [Waldspurger 95]: a higher ticket value means a higher probability of being degraded.
- A query decreases T_j by DT_j; an update increases T_j by IT_j.
- If picked, the data item is degraded by 10%.
Upgrade updates:
- Randomly pick a degraded data item.
- Upgrade it by 50%.
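The 10% degrade and 50% upgrade steps can be sketched as below. The slide does not say what quantity is scaled; this sketch assumes a per-item fraction of updates to apply, scaled multiplicatively and capped at 1.0. The class and method names are hypothetical.

```python
import random


class UpdateModulator:
    """Per-item fraction of incoming updates that are applied (1.0 = all)."""

    def __init__(self):
        self.keep = {}

    def degrade(self, item):
        # ASSUMPTION: "degraded by 10%" read as multiplying the keep fraction by 0.9.
        self.keep[item] = self.keep.get(item, 1.0) * 0.9

    def upgrade(self, rng):
        degraded = [d for d, k in self.keep.items() if k < 1.0]
        if not degraded:
            return None
        item = rng.choice(degraded)  # randomly pick a degraded data item
        # ASSUMPTION: "upgrade by 50%" read as multiplying by 1.5, capped at 1.0.
        self.keep[item] = min(1.0, self.keep[item] * 1.5)
        return item

    def should_apply(self, item, rng):
        # Apply each incoming update with probability equal to the keep fraction.
        return rng.random() < self.keep.get(item, 1.0)
```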

Naïve USM
UNIT outperforms the other algorithms in all cases.