Locality Optimizations in OceanStore Patrick R. Eaton Dennis Geels An introduction to introspective techniques for exploiting locality in wide area storage.

Presentation transcript:

Locality Optimizations in OceanStore Patrick R. Eaton Dennis Geels An introduction to introspective techniques for exploiting locality in wide area storage utilities.

Agenda
OceanStore Review
Problem Overview
Previous Work
Proposed Solution
Prefetching Algorithm
Preliminary Results
Future Work

OceanStore Review
Properties of OceanStore relevant to introspective locality optimizations
– implemented in the extremely wide area
– has many places to put any single piece of data
– cannot rely on users to make relationships among data explicit
– depends on effective locality optimizations for improved performance
– no possible way to solve exactly

Problem Overview
Passively observe data accesses
– data shared among multiple users
– single users accessing the network from different physical locations
– data is replicated across the network
Optimize the location of data to provide quicker access to users
– cluster semantically related data
– replicate data to move it closer to consumers
– migrate primary replicas toward the source of updates

Measurable Attributes
File Temperature
– A measure that indicates the frequency of access to the file
– A hot file is frequently accessed
Semantic Distance (Kuenning)
– Any measure that can quantify relationships between files on the range [0, ∞)
– Local distance relates one instance of a file access to another
– Reference distance is an aggregate measure that summarizes all local distances for a pair of files
– Typical measures use access order or timing information
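The two attributes above can be illustrated with a small sketch. It assumes one simple access-order measure: the local distance from an access to A to a later access to B is the number of positions between them in the trace, and the reference distance is the arithmetic mean of those local distances. The function names, the `window` cutoff, and the choice of mean are illustrative assumptions, not the measures used by Kuenning or by the authors.

```python
from collections import Counter, defaultdict

def temperature(trace):
    """Access count per file; a 'hot' file has a high count."""
    return Counter(trace)

def local_distances(trace, window=4):
    """Yield (a, b, d): b was accessed d positions after a, 1 <= d <= window."""
    for i, a in enumerate(trace):
        for d in range(1, window + 1):
            if i + d < len(trace):
                yield a, trace[i + d], d

def reference_distances(trace, window=4):
    """Summarize all local distances for each ordered pair by their mean."""
    sums = defaultdict(lambda: [0, 0])  # pair -> [total distance, sample count]
    for a, b, d in local_distances(trace, window):
        sums[(a, b)][0] += d
        sums[(a, b)][1] += 1
    return {pair: total / count for pair, (total, count) in sums.items()}
```

For the trace A B C A B with a window of 2, B always directly follows A, so the pair (A, B) gets the smallest possible reference distance under this measure.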

Prefetching Techniques
Automatic Prefetching (Griffioen and Appleton)
– construct a probability graph that records accesses which follow within a lookahead period
– predict a prefetch when the chance of an access is above a tunable parameter
Context Modeling (Kroeger and Long)
– record in a trie all access sequences which have been observed
– maintain pointers to all nodes which represent current contexts
– predict a prefetch when the chance of an access to a child of a current context is above a probability threshold
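A minimal sketch of the probability-graph idea follows. It assumes the lookahead period is counted in accesses rather than time, and it estimates the chance of B following A as the fraction of accesses to A that were followed by B within the lookahead. The names `build_graph` and `predict` and the offline, whole-trace construction are assumptions made for illustration, not details of the Griffioen and Appleton system.

```python
from collections import Counter, defaultdict

def build_graph(trace, lookahead=2):
    """Count, for each file a, how many of its accesses were followed by b
    within the next `lookahead` accesses."""
    visits = Counter(trace)
    follows = defaultdict(Counter)
    for i, a in enumerate(trace):
        window = set(trace[i + 1 : i + 1 + lookahead]) - {a}
        for b in window:
            follows[a][b] += 1
    return visits, follows

def predict(a, visits, follows, threshold=0.5):
    """Files whose estimated chance of following `a` meets the threshold."""
    edges = follows.get(a, {})
    return sorted(b for b, n in edges.items() if n / visits[a] >= threshold)
```

With the trace A B C A B D A B and a lookahead of 1, an access to A predicts B, while an access to B predicts nothing at a threshold of 0.5 because C and D each follow B only a third of the time.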

Our Approach
Exploit the ideas of semantic distance to compute relationships among data
– Cluster data based on the observed relationships
– Store a summary of these relationships with the data
Migrate (prefetch) files based on familiar patterns in the access stream
– recognize higher-order correlations as in context modeling
– tolerate noise in the access stream

Motivation for Prefetching Algorithm
[Figure: example access stream A B Y Z K C A B]
Many patterns can be predicted only by observing higher-order correlation, that is, by combining several pieces of past history.
Other patterns can be detected only by identifying and filtering out noise.

General Prefetching Algorithm
Update
– Record the most recent file accesses in the file history buffer (FHB)
– Each time a new file S is accessed, extract all triples of the form (FHB(i), FHB(j)) → S from the FHB and update them in the second-order distance table
Predict
– Each time a new file S is accessed, examine the distance table entries of (FHB(i), S)
– Prefetch files that are predicted with confidence above a certain threshold
Problems
– O(k^2) work to update distance table
– Noise infects distance table
[Figure: example FHB contents and the resulting second-order distance table entries, keyed by pairs such as (B,F), (y,B), and (o,w)]
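The update and predict steps above can be sketched as follows. The slide does not say how confidence is computed, so this sketch assumes it is the share of an entry's recorded successors that a file accounts for; the class name, the default buffer size `k`, and the ordering of pairs (older access first) are likewise assumptions.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations

class SecondOrderPrefetcher:
    def __init__(self, k=4, threshold=0.5):
        self.fhb = deque(maxlen=k)          # file history buffer, oldest first
        self.table = defaultdict(Counter)   # (x, y) -> counts of files seen next
        self.threshold = threshold

    def access(self, s):
        """Record an access to s and return the files to prefetch."""
        # Update: credit every pair in the FHB with preceding s (O(k^2) pairs).
        for x, y in combinations(self.fhb, 2):
            self.table[(x, y)][s] += 1
        # Predict: look up (FHB(i), s) for every file still in the FHB.
        prefetch = set()
        for x in self.fhb:
            entry = self.table.get((x, s))
            if not entry:
                continue
            total = sum(entry.values())
            for f, n in entry.items():
                if f != s and n / total >= self.threshold:
                    prefetch.add(f)
        self.fhb.append(s)
        return sorted(prefetch)
```

Feeding the accesses A, B, C, A, B into a prefetcher with k=2 yields no prediction until the second access to B, at which point the entry for (A, B) recommends C. Because every pair in the buffer is recorded, unrelated files that happen to be in the FHB also enter the table, which is the noise problem the slide notes.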

Optimizations to the Prefetching Algorithm
First-order distance table
– Records files that are close, as measured by semantic distance
– Allows reverse lookup
Use first-order distance tables to filter out irrelevant file relationships
– Update only relevant entries in the second-order distance table
– Search for predictions based on only relevant access pairs
[Figure: indicative FHBs and example first-order distance table entries]

Prefetching Algorithm Example
Update
– Extract relevant triples by intersecting the FHB with the results from the reverse lookup in the first-order tables
Predict
– Extract relevant doubles by intersecting the FHB with the results from the reverse lookup in the first-order tables
– Prefetch if the second-order table predicts a future access with sufficient confidence
[Figure: worked example with FHB contents y Q t u v R w x followed by a new access to S, a first-order table with rows for Q, R, and S, and a second-order table with entries keyed by (Q,w), (Q,v), and (Q,R). The update path finds the parents of S and then the parents of R before updating the table; the prediction path finds the parents of R and checks the table for a prediction.]
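The filtered update and predict steps might look like the sketch below. It assumes the first-order table is already available as a mapping from each file to the set of files close to it, that a "parent" of S is any file whose first-order entry contains S, and that confidence is the share of an entry's recorded successors. The function names and the requirement that Q precede R in the FHB are illustrative assumptions.

```python
from collections import Counter, defaultdict

def reverse_index(first_order):
    """Reverse lookup: map each file to the files that list it as close."""
    parents = defaultdict(set)
    for f, close in first_order.items():
        for g in close:
            parents[g].add(f)
    return parents

def relevant_triples(fhb, s, parents):
    """Pairs (q, r) from the FHB where r is a parent of s and q, seen
    earlier than r, is a parent of r."""
    pairs = []
    for j, r in enumerate(fhb):
        if r in parents.get(s, ()):
            pairs.extend((q, r) for q in fhb[:j] if q in parents.get(r, ()))
    return pairs

def update(table, fhb, s, parents):
    """Update only the second-order entries that survive the filter."""
    for pair in relevant_triples(fhb, s, parents):
        table[pair][s] += 1

def predict(table, fhb, s, parents, threshold=0.5):
    """Check only the doubles (q, s) where q is in the FHB and a parent of s."""
    prefetch = set()
    for q in fhb:
        entry = table.get((q, s)) if q in parents.get(s, ()) else None
        if entry:
            total = sum(entry.values())
            prefetch.update(f for f, n in entry.items() if n / total >= threshold)
    return sorted(prefetch)
```

Under one reading of the slide's example, where Q's first-order entry contains R and R's contains S, an access to S with Q and R in the FHB updates only the (Q, R) entry, and a later access to R with Q in the FHB predicts S. The other files in the buffer are ignored, which avoids both the quadratic update cost and much of the noise.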

Preliminary Results (Local System)

Future Work
Retarget the simulations to model OceanStore
Continue to refine the prefetching algorithm
Examine the potential of higher-order prefetching
Combine prefetching and clustering
Look for opportunities to test the ideas on different workloads