Two-Tier Architecture of OSD Metadata Management
Xianbo Zhang, Keqiang Wu
11/11/2002

Region
Balance lookup cost against location-information update cost.
Data locality:
– Place the metadata server close to the user
– Place user data close to the user
Enterprise setting: campuses are geographically distributed, and much of the data traffic stays within a campus. Each campus may have its own security, policy-control, and performance requirements.
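One simple way to act on "make the metadata server close to the user" is to have a client pick the candidate metadata server with the lowest measured round-trip time. The sketch below is a minimal illustration under that assumption. The server names and the probing step are hypothetical, and the slides do not say how closeness is measured.

```python
from typing import Dict


def nearest_metadata_server(rtt_ms: Dict[str, float]) -> str:
    """Return the candidate metadata server with the lowest measured RTT.

    rtt_ms maps a (hypothetical) server name to a round-trip time in ms,
    e.g. gathered by periodic probes from the client's campus.
    """
    if not rtt_ms:
        raise ValueError("no candidate metadata servers")
    return min(rtt_ms, key=lambda name: rtt_ms[name])


probes = {"campus-a": 2.1, "campus-b": 35.0, "campus-c": 80.4}
print(nearest_metadata_server(probes))  # campus-a
```

Latency is only one possible notion of "close". The same selection step could rank servers by hop count or by campus membership instead.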

P2P among Regions
Good scalability.
Resilience: each region keeps relatively independent metadata and functionality, so it keeps working through network failures such as the loss of a WAN connection.
Availability: any server can take over the responsibilities of a server that goes down.
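"Any server can take the responsibility of the down server" can be sketched with a consistent-hashing ring. This is an assumption: the slides do not name a takeover rule. Each key (for example a region's metadata) is owned by the first live server clockwise from the key's position. If that server is down, its live successor takes over automatically. The server names and the SHA-1 placement are illustrative only.

```python
import bisect
import hashlib
from typing import Iterable, Set


def ring_position(name: str) -> int:
    """Map a name onto a 32-bit identifier ring (SHA-1 prefix; an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:4], "big")


def responsible_server(key: str, servers: Iterable[str], down: Set[str]) -> str:
    """Return the first live server clockwise from the key's ring position.

    When the usual owner is down, its live successor takes over, so no
    extra configuration is needed to fail over.
    """
    alive = sorted((ring_position(s), s) for s in servers if s not in down)
    if not alive:
        raise RuntimeError("no live metadata servers")
    positions = [pos for pos, _ in alive]
    idx = bisect.bisect_left(positions, ring_position(key)) % len(alive)
    return alive[idx][1]


servers = ["meta-1", "meta-2", "meta-3", "meta-4"]
owner = responsible_server("region-7", servers, down=set())
backup = responsible_server("region-7", servers, down={owner})
```

A failure only moves the keys of the failed server. Keys owned by other servers stay where they are, which keeps takeover cheap.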

Region Self-organization
Configure each server to recognize potential OSD devices and clients.
A device or client joins the region via configuration, through a neighbor, or by broadcasting.
Region splitting/merging: triggered by performance requirements (e.g., throughput, response time).
Region size has several effects.
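The split/merge rule can be sketched as a threshold check on observed region statistics. The thresholds, the field names, and the "half the limit" merge margin are hypothetical. The slides only say that splitting and merging are driven by performance requirements such as throughput and response time.

```python
from dataclasses import dataclass

# Hypothetical targets; the slides only say splitting/merging is driven by
# performance requirements such as throughput and response time.
MAX_RESPONSE_MS = 50.0
MIN_THROUGHPUT_OPS = 100.0


@dataclass
class RegionStats:
    throughput_ops: float    # observed metadata operations per second
    response_time_ms: float  # observed mean response time


def region_action(stats: RegionStats) -> str:
    """Decide whether a region should split, merge with a neighbor, or stay."""
    if stats.response_time_ms > MAX_RESPONSE_MS:
        return "split"  # overloaded: divide devices/clients between two servers
    if (stats.throughput_ops < MIN_THROUGHPUT_OPS
            and stats.response_time_ms < MAX_RESPONSE_MS / 2):
        return "merge"  # lightly used and fast: fold into a neighboring region
    return "keep"


decision = region_action(RegionStats(throughput_ops=500, response_time_ms=120))
```

The merge condition leaves a margin below the split limit, so a region that is already close to the split threshold is not folded into a neighbor.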

Server Main Responsibilities
Naming: map a URN to an object GUID; this can use a mechanism similar to DNS.
Location: map a GUID to the region that stores the object. P2P systems such as CAN or Pastry may be modified/adopted here.
Object migration and replication (based on access monitoring):
– Done by active objects (possibly)
– Done by metadata servers
– Concurrent accesses
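The two lookups can be shown as a resolution pipeline: URN to GUID (naming), then GUID to region (location). In the sketch below, plain dictionaries stand in for the DNS-like naming service and the CAN/Pastry-style location layer. The class and method names are hypothetical. The `migrate` method shows why keeping the two mappings separate helps: moving an object only updates the location mapping, and the URN-to-GUID binding stays the same.

```python
import uuid
from typing import Dict, Tuple


class MetadataServer:
    """Toy model of the naming and location lookups (hypothetical interface).

    Plain dicts stand in for the DNS-like naming service and for the
    P2P location layer (e.g. an adapted CAN or Pastry).
    """

    def __init__(self) -> None:
        self.names: Dict[str, uuid.UUID] = {}      # naming: URN -> object GUID
        self.locations: Dict[uuid.UUID, str] = {}  # location: GUID -> region ID

    def register(self, urn: str, region_id: str) -> uuid.UUID:
        guid = uuid.uuid4()  # random GUID; it does not encode the URN
        self.names[urn] = guid
        self.locations[guid] = region_id
        return guid

    def migrate(self, guid: uuid.UUID, new_region: str) -> None:
        # Only the location mapping changes; the URN -> GUID binding is stable.
        self.locations[guid] = new_region

    def resolve(self, urn: str) -> Tuple[uuid.UUID, str]:
        guid = self.names[urn]             # step 1: naming
        return guid, self.locations[guid]  # step 2: location


srv = MetadataServer()
g = srv.register("urn:example:report", "region-A")
srv.migrate(g, "region-B")
```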

Mapping ObjectGUID
– ObjectGUID = RegionID + ObjectID
– ObjectGUID independent of region
The primary copy of an object is stored in the object owner's region.
ObjectGUID is randomly generated, possibly embedded with user info.
RegionIDs = Hash1(ObjectGUID)
– An object may be in high demand in another region; how should it be migrated/replicated?
Storing to an OSD device:
– How to map an ObjectGUID to a specific device?
– Security (R/W permission), load balance
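The two GUID designs on this slide can be contrasted in a few lines. The sketch makes several assumptions:
- "RegionID + ObjectID" is read as bit-field concatenation, with hypothetical widths of 16 and 112 bits.
- SHA-256 stands in for the unspecified Hash1.
- A second hash picks an OSD device within the region. The slides leave the GUID-to-device mapping open, so this is illustrative only.

```python
import hashlib
import os
from typing import List

# Hypothetical field widths for a 128-bit GUID.
REGION_BITS = 16
OBJECT_BITS = 112


def guid_with_region(region_id: int, object_id: int) -> int:
    """Design 1: ObjectGUID = RegionID + ObjectID, read as bit-field concatenation."""
    if not (0 <= region_id < 1 << REGION_BITS and 0 <= object_id < 1 << OBJECT_BITS):
        raise ValueError("field out of range")
    return (region_id << OBJECT_BITS) | object_id


def embedded_region(guid: int) -> int:
    """Recover the RegionID directly from a design-1 GUID (no lookup needed)."""
    return guid >> OBJECT_BITS


def hash1_region(guid: bytes, num_regions: int) -> int:
    """Design 2: region-independent GUID; RegionID = Hash1(ObjectGUID)."""
    return int.from_bytes(hashlib.sha256(guid).digest()[:8], "big") % num_regions


def device_for(guid: bytes, devices: List[str]) -> str:
    """Pick an OSD device inside the region with a second, independent hash."""
    digest = hashlib.sha256(b"device:" + guid).digest()
    return devices[int.from_bytes(digest[:8], "big") % len(devices)]


random_guid = os.urandom(16)  # design 2: randomly generated 128-bit GUID
```

Design 1 makes the home region readable from the GUID, but the GUID is then tied to that region. Design 2 keeps GUIDs location-free at the cost of a hash step. Plain modulo hashing remaps most objects whenever the number of regions changes. A consistent-hashing ring avoids that.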

Object Mapping Issues
How to efficiently find a region based on its RegionID
How to locate a nearby object copy based on the ObjectGUID
– How to define a nearby copy: number of hops? Server response latency? Bandwidth?
How to find a good location to migrate/replicate a hot object (the actual users of the object may be far away from the mapped location)
– R/W pattern
– User geographic location
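The open question "how to define a nearby copy" matters because each candidate metric can pick a different replica. The sketch below treats each metric from the slide as a cost to minimize. The replica records and their numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Replica:
    region_id: str
    hops: int               # network hops from the client
    latency_ms: float       # server response latency
    bandwidth_mbps: float   # available bandwidth to the client


# Each candidate definition of "nearby" from the slide, as a cost to minimize.
NEARNESS: Dict[str, Callable[[Replica], float]] = {
    "hops": lambda r: r.hops,
    "latency": lambda r: r.latency_ms,
    "bandwidth": lambda r: -r.bandwidth_mbps,  # more bandwidth = nearer
}


def nearest_replica(replicas: List[Replica], metric: str) -> Replica:
    if not replicas:
        raise ValueError("object has no known replicas")
    return min(replicas, key=NEARNESS[metric])


copies = [
    Replica("A", hops=1, latency_ms=40.0, bandwidth_mbps=10.0),
    Replica("B", hops=3, latency_ms=5.0, bandwidth_mbps=100.0),
    Replica("C", hops=2, latency_ms=20.0, bandwidth_mbps=1000.0),
]
```

Here hop count favors A, latency favors B, and bandwidth favors C. A real policy would probably weigh these against the object's size and R/W pattern.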

Fault Tolerance
Server: three levels of failover support: local backup/mirroring of the server, remote backup/mirroring of the server, and, for the most important data, erasure coding across servers for disaster recovery.
– A regional backup server can also be used for load balancing.
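The slides do not say which erasure code would be used. The simplest one is a single XOR parity block, as in RAID 5. It lets a stripe spread across servers survive the loss of any one server. The sketch below uses that code as a stand-in. Production systems typically use Reed-Solomon or similar codes to tolerate several simultaneous losses.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("blocks must have equal length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data_blocks: List[bytes]) -> List[bytes]:
    """Stripe data across servers and append one parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stripe: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild at most one missing block (None) by XOR-ing the survivors."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost server")
    repaired = list(stripe)
    if missing:
        repaired[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return repaired


stripe = encode([b"abcd", b"efgh", b"ijkl"])  # 3 data servers + 1 parity server
```

Recovery works because the XOR of all blocks in the stripe is zero, so any single missing block equals the XOR of the remaining ones.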

Content-based Searching
A mechanism similar to CAN.
A content search is sent to the region managers, which forward it to the OSD devices. Devices respond with ObjectGUIDs if they hold the requested content.
The concept of Active Disks can be helpful.
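The two-stage fan-out can be sketched as in-memory functions. Several details are assumptions:
- Each device is modelled as a dictionary from ObjectGUID to content.
- Matching is a substring test.
- The query is simply sent to every region manager rather than routed as in CAN.

The device-side scan is where the Active Disk idea fits: the filtering runs on the device, and only matching GUIDs travel back.

```python
from typing import Dict, List

# Hypothetical in-memory model: each OSD device maps ObjectGUID -> content.
Device = Dict[str, str]


def device_search(device: Device, term: str) -> List[str]:
    """Active-disk style: the device scans its own objects locally."""
    return [guid for guid, content in device.items() if term in content]


def region_manager_search(devices: List[Device], term: str) -> List[str]:
    """A region manager forwards the query to its devices and merges replies."""
    hits: List[str] = []
    for device in devices:
        hits.extend(device_search(device, term))
    return hits


def content_search(regions: Dict[str, List[Device]], term: str) -> Dict[str, List[str]]:
    """Send the query to every region manager (flooding, a simplification)."""
    return {rid: region_manager_search(devs, term) for rid, devs in regions.items()}


regions = {
    "east": [{"g1": "storage report", "g2": "budget"}],
    "west": [{"g3": "object storage"}, {"g4": "misc"}],
}
```

In a deployment, the per-region and per-device calls would run concurrently, and a CAN-like overlay could limit which regions receive the query.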

How ObjectID Maps to a Region
Details are explained in OceanStore, Pastry, etc.
[Figure: Region ID]
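Pastry delivers a key to the live node whose identifier is numerically closest to the key on a circular ID space. The sketch applies that rule, as an assumption, to map an ObjectID to a RegionID. The 16-bit ID space and the tie-break toward the smaller ID are hypothetical simplifications; Pastry itself uses 128-bit IDs. The sketch also computes the destination directly. Pastry reaches the same destination hop by hop with prefix routing in O(log N) steps.

```python
from typing import List

# Hypothetical 16-bit identifier space; Pastry itself uses 128-bit IDs.
ID_BITS = 16
ID_SPACE = 1 << ID_BITS


def circular_distance(a: int, b: int) -> int:
    """Distance between two IDs on the wrap-around identifier ring."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def region_for(object_id: int, region_ids: List[int]) -> int:
    """Return the RegionID numerically closest to the ObjectID (ties: smaller ID)."""
    return min(region_ids, key=lambda r: (circular_distance(object_id, r), r))


home = region_for(0xFFF0, [0x0010, 0x8000, 0xE000])  # wraps around to 0x0010
```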