DEXA 2005 Quality-Aware Replication of Multimedia Data Yicheng Tu, Jingfeng Yan and Sunil Prabhakar Department of Computer Sciences, Purdue University.

Slides:



Advertisements
Similar presentations
Heuristic Search techniques
Advertisements

Lindsey Bleimes Charlie Garrod Adam Meyerson
G5BAIM Artificial Intelligence Methods
Hadi Goudarzi and Massoud Pedram
Mining Compressed Frequent- Pattern Sets Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign.
Chapter 6: Memory Management
Counting the bits Analysis of Algorithms Will it run on a larger problem? When will it fail?
Greedy Algorithms Clayton Andrews 2/26/08. What is an algorithm? “An algorithm is any well-defined computational procedure that takes some value, or set.
Bidding Protocols for Deploying Mobile Sensors Reporter: Po-Chung Shih Computer Science and Information Engineering Department Fu-Jen Catholic University.
IBM Labs in Haifa © 2005 IBM Corporation Adaptive Application of SAT Solving Techniques Ohad Shacham and Karen Yorav Presented by Sharon Barner.
Gossip Scheduling for Periodic Streams in Ad-hoc WSNs Ercan Ucan, Nathanael Thompson, Indranil Gupta Department of Computer Science University of Illinois.
Artificial Intelligence in Game Design Introduction to Learning.
Planning under Uncertainty
CPSC 322, Lecture 9Slide 1 Search: Advanced Topics Computer Science cpsc322, Lecture 9 (Textbook Chpt 3.6) January, 23, 2009.
Small-world Overlay P2P Network
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Introduction to Analysis of Algorithms
DAST, Spring © L. Joskowicz 1 Data Structures – LECTURE 1 Introduction Motivation: algorithms and abstract data types Easy problems, hard problems.
Institute of Computer Science University of Wroclaw Page Migration in Dynamic Networks Marcin Bieńkowski Joint work with: Jarek Byrka (Centrum voor Wiskunde.
Ensemble Learning: An Introduction
Document and Query Forms Chapter 2. 2 Document & Query Forms Q 1. What is a document? A document is a stored data record in any form A document is a stored.
D Nagesh Kumar, IIScOptimization Methods: M1L4 1 Introduction and Basic Concepts Classical and Advanced Techniques for Optimization.
Simultaneous Rate and Power Control in Multirate Multimedia CDMA Systems By: Sunil Kandukuri and Stephen Boyd.
Clustering of Web Content for Efficient Replication Yan Chen, Lili Qiu, Wei Chen, Luan Nguyen and Randy H. Katz {yanchen, wychen, luann,
DAST, Spring © L. Joskowicz 1 Data Structures – LECTURE 1 Introduction Motivation: algorithms and abstract data types Easy problems, hard problems.
Cmpt-225 Simulation. Application: Simulation Simulation  A technique for modeling the behavior of both natural and human-made systems  Goal Generate.
Birch: An efficient data clustering method for very large databases
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Package Transportation Scheduling Albert Lee Robert Z. Lee.
A Randomized Approach to Robot Path Planning Based on Lazy Evaluation Robert Bohlin, Lydia E. Kavraki (2001) Presented by: Robbie Paolini.
LINEAR PROGRAMMING SIMPLEX METHOD.
Called as the Interval Scheduling Problem. A simpler version of a class of scheduling problems. – Can add weights. – Can add multiple resources – Can ask.
Vilalta&Eick: Informed Search Informed Search and Exploration Search Strategies Heuristic Functions Local Search Algorithms Vilalta&Eick: Informed Search.
A Theoretical Study of Optimization Techniques Used in Registration Area Based Location Management: Models and Online Algorithms Sandeep K. S. Gupta Goran.
QoS-Aware In-Network Processing for Mission-Critical Wireless Cyber-Physical Systems Qiao Xiang Advisor: Hongwei Zhang Department of Computer Science Wayne.
Network Aware Resource Allocation in Distributed Clouds.
1 Chapter 1 Analysis Basics. 2 Chapter Outline What is analysis? What to count and consider Mathematical background Rates of growth Tournament method.
Final Exam, May 25, 2007 Quality Management in Multimedia Databases and Data Stream Management Systems Yicheng Tu Department of Computer Sciences Purdue.
Efficient and Scalable Computation of the Energy and Makespan Pareto Front for Heterogeneous Computing Systems Kyle M. Tarplee 1, Ryan Friese 1, Anthony.
Scheduling policies for real- time embedded systems.
1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Heuristic Optimization Methods Greedy algorithms, Approximation algorithms, and GRASP.
Thursday, May 9 Heuristic Search: methods for solving difficult optimization problems Handouts: Lecture Notes See the introduction to the paper.
MINING COLOSSAL FREQUENT PATTERNS BY CORE PATTERN FUSION FEIDA ZHU, XIFENG YAN, JIAWEI HAN, PHILIP S. YU, HONG CHENG ICDE07 Advisor: Koh JiaLing Speaker:
Kanpur Genetic Algorithms Laboratory IIT Kanpur 25, July 2006 (11:00 AM) Multi-Objective Dynamic Optimization using Evolutionary Algorithms by Udaya Bhaskara.
Algorithm Design Methods (II) Fall 2003 CSE, POSTECH.
Data Structure and Algorithms. Algorithms: efficiency and complexity Recursion Reading Algorithms.
Algorithm Analysis Part of slides are borrowed from UST.
1 1 Slide Simulation Professor Ahmadi. 2 2 Slide Simulation Chapter Outline n Computer Simulation n Simulation Modeling n Random Variables and Pseudo-Random.
Reliable Multicast Routing for Software-Defined Networks.
1 CSC 421: Algorithm Design & Analysis Spring 2014 Complexity & lower bounds  brute force  decision trees  adversary arguments  problem reduction.
Minimizing Delay in Shared Pipelines Ori Rottenstreich (Technion, Israel) Joint work with Isaac Keslassy (Technion, Israel) Yoram Revah, Aviran Kadosh.
Data Consolidation: A Task Scheduling and Data Migration Technique for Grid Networks Author: P. Kokkinos, K. Christodoulopoulos, A. Kretsis, and E. Varvarigos.
ICPADS '12 Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, Pages Tianyi Wang, Gang Quan, Shangping.
OR Chapter 4. How fast is the simplex method  Efficiency of an algorithm : measured by running time (number of unit operations) with respect to.
Distributed Control and Autonomous Systems Lab. Sang-Hyuk Yun and Hyo-Sung Ahn Distributed Control and Autonomous Systems Laboratory (DCASL ) Department.
Management of Broadband Media Assets on Wide Area Networks Lars-Olof Burchard.
Resource Optimization for Publisher/Subscriber-based Avionics Systems Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee.
1 R-Trees Guttman. 2 Introduction Range queries in multiple dimensions: Computer Aided Design (CAD) Geo-data applications Support special data objects.
Dense-Region Based Compact Data Cube
Memory Management.
Algorithm Analysis CSE 2011 Winter September 2018.
Data Replication in the Quality Space
Objective of This Course
Analysis & Design of Algorithms (CSCE 321)
QuaSAQ: Enabling End-to-End QoS for Distributed Multimedia Databases
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Presentation transcript:

DEXA 2005 Quality-Aware Replication of Multimedia Data Yicheng Tu, Jingfeng Yan and Sunil Prabhakar Department of Computer Sciences, Purdue University

DEXA 2005 Roadmap Introduction Static data replication Dynamic data replication Experimental (simulation) results Summary

DEXA 2005 Data Replication The problem: given a data item and its popularity, determine how many replicas to put For read/write data, where to put Destination: node(s) in a distributed environment Replicas are identical copies of the original data

DEXA 2005 Quality-Aware Replication Replicas are of different “quality” Destination: point(s) in a metric quality space Costs of transformation among different qualities are very high Applications –Multimedia –Materialized view –Biological structure Good news: read-only Bad news: too much storage needed

DEXA 2005 Delivery of Multimedia Data Quality (QoS) critical –Temporal/spatial resolution –Color –Format Varieties of user quality requirements –Determined by user preference and resource availability –Large number of quality combinations Adaptation techniques to satisfy quality needs –Dynamic adaptation: online transcoding –Static adaptation: retrieve precoded replica from disk

DEXA 2005 Dynamic adaptation Transcoding is very expensive in terms of CPU cost Online transcoding is not feasible in most cases Situation may improve in the future Layered coding –Not standardized yet. –Less popular than people expected

DEXA 2005 Static adaptation Little CPU cost Choice of many commercial service providers What about storage cost? –On the order of total number of quality points –Ignored in previous research assuming Very few quality profiles Storage is dirt cheap –Excessively high for service providers

The fixed-storage replica selection (FSRS) Problem An optimization: get the highest utility given the popularity ( f k ), storage cost ( s k ) of all quality points under total storage S –u(j,k): the utility when a request on quality j is served by replica of quality k Utility is given as a function of distance in quality space –Requests served by the closest replica

DEXA 2005 Roadmap Introduction Static data replication Dynamic data replication Experimental (simulation) results Summary

DEXA 2005 The FSRS Algorithms (I) Problem is NP-hard: a variation of the k-mean proble We propose a heuristic algorithm named Greedy –Aggresively selects replicas based on the ratio of marginal utility gain ( ∆u ) to cost ( s k ) –Time complexity: where I is the # of replicas selected and m the total # of possible replicas selected replica set P := Φ available storage s’ := S while s’ > 0 add the quality point that yields the largest ∆u/s k value to P decrease s’ by s k return P

DEXA 2005 The FSRS Algorithms (II) Greedy could pick some bad replicas, especially the earlier selections Remedy: remove those bad choices and re-select The Iterative Greedy algorithm: Time complexity: same as Greedy with a larger coefficient P ← a solution given by Greedy while there exists solution P’ s.t. U(P’) > U(P) do P ← P’ return P

DEXA 2005 Handling multiple media objects There are V ( V > 1) media objects in the database, each with its own quality space and FSRS solution However, the storage constraint S is global Both Greedy and Iterative Greedy can be easily extended to solve FSRS for multiple media objects The trick: view the V physical media objects as replicas of a virtual object Model the difference in the content of the V objects as values in a new quality dimension. Time complexity:, can be reduced to with some tweaks

DEXA 2005 Roadmap Introduction Static data replication Dynamic data replication Experimental (simulation) results Summary

DEXA 2005 Dynamic replication Popularity f of replicas could change over time We only consider the situation where popularity of all replicas of a media object changes together –Reasonable assumption in many systems –Problem becomes competition for storage among media objects –Study of the more general case is underway Desirable dynamic replication algorithms : –Find solutions as optimal as those by static FSRS algorithms –Fast enough to make online decisions Na ï ve solution: run Greedy every time a change of f occurs

DEXA 2005 Replication Roadmap (RR) Consider the order replicas are selected by Greedy – follow a predefined path (RR) for each media object RRs are all convex Exchanges of storage may happen between two media objects, triggered by the increase/decrease of f –The one that becomes more popular takes storage from the least popular one –The one that becomes less popular gives up storage to the most popular one –It is efficient to make exchanges at the frontiers of the RRs, no need to look inside

DEXA 2005 Replication Roadmap (continued) Storage exchanges, example: Media A should take storage from media B as the slope of its current segment in RR is greater than that of B’s

DEXA 2005 Dynamic FSRS algorithm Based on the RR idea Proved performance: results given are as optimal as those chosen by Greedy Preprocess phase: –Build the RRs Online phase: –Performing exchanges till total utility converges –Time complexity: O(I log V) where I: # of storage exchanges occurs and V is the # of media objects

DEXA 2005 Roadmap Introduction Static data replication Dynamic data replication Experimental (simulation) results Summary

DEXA 2005 Effectiveness of algorithms For comparison : –The optimal solution (by CPLEX) –Random selections –Local popularity-based

DEXA 2005 Efficiency of algorithms CPLEX < Iterative Greedy < Greedy < Random < Local Results on a P4 2.4 GHz CPU:

DEXA 2005 Dynamic replication Randomly generated changes of f Compare with Greedy Results with (almost) the same optimality as Greedy Reason: small number of storage exchanges

DEXA 2005 Summary Storage cost in static adaptation prohibits replication of all qualities Need to optimize toward the highest utility given storage constraints Two heuristics are proposed for static replication that gives near-optimal choices Fast online algorithm for one dynamic replication problem Unsolved puzzles: –General case of dynamic replication –Is there a bound for the performance of Greedy?

Backup Slide Storage for replication Empirical formula to calculate storage after transcoding to a lower quality in one dimension: Sum of all replicas when there are n qualities Three dimensions:, total storage is thus O ( n ^ 3 ) For d dimensions, O ( n ^ d )

Backup Slide An illustration: Greedy

Backup Slide An illustration: Iterative Greedy

Backup Slide More experimental results Selection of replicas by Greedy, 21 X 21 2-D quality space with larger number representing lower quality (i.e., point ( 20,20 ) is of the lowest quality), V = 30 Same inputs, results given by Iterative Greedy