Data Management in Distributed Systems Minqi Zhou Software Engineering Institute Office: Room 111 Mathematics Building Phone:



Course Introduction
– Data Management in P2P Systems: weeks 1-4
– Data Management in Cloud Systems: weeks 5-10
– Computational Advertisement: weeks

Final Grades
Usual Grades (60%)
– Attendance
– Presentation
Final Report (40%, English preferred)
– Survey
– Paper

A Brief Introduction to Distributed Systems

What Is a Distributed System?
Multiple computers (“machines,” “hosts,” “boxes,” &c.)
– Each with CPU, memory, disk, network interface
– Interconnected by LAN or WAN (e.g., Internet)
Application runs across this dispersed collection of networked hardware
But user sees single, unified system

What Is a Distributed System? (Alternate Take)
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
– Leslie Lamport, Microsoft Research (ex DEC)

Start Simple: Centralized System
Suppose you run Gmail
Workload:
– Inbound email arrives; store it on disk
– Users retrieve and delete their email
You run Gmail on one server with a disk
[Figure: many senders and readers all connect to a single Gmail server (PC)]
What are the shortcomings of this design?

Why Distribute? For Availability
Suppose the Gmail server goes down, or the network between a client and the server goes down
No incoming mail is delivered, and no users can read their inboxes
Fix: replicate the data on several servers
– Increased chance that some server will be reachable
– Consistency? One server is down when a user deletes a message, then comes back up; the message returns to the inbox
– Latency? Replicas should be far apart so they fail independently, but then keeping them in sync means waiting on distant servers
– Partition resilience? E.g., an airline seat database splits in two with one seat remaining; the seat is bought twice, once in each half!
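The consistency and partition questions above can be reproduced in a few lines. The following is a minimal sketch, not Gmail's actual design. It assumes naive replication: writes and deletes reach only the replicas that are currently up, and a read merges (unions) whatever the reachable replicas hold. The names `Replica`, `deliver`, `delete`, `read_inbox`, and `sell_seat` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server holding a copy of a user's inbox (a set of message IDs)."""
    name: str
    inbox: set = field(default_factory=set)
    up: bool = True


def deliver(replicas, msg_id):
    # Naive replication: store the message on every replica that is up.
    for r in replicas:
        if r.up:
            r.inbox.add(msg_id)


def delete(replicas, msg_id):
    # Naive delete: only replicas that are up learn about the deletion.
    for r in replicas:
        if r.up:
            r.inbox.discard(msg_id)


def read_inbox(replicas):
    # Naive read: show a message if any reachable replica still has it.
    merged = set()
    for r in replicas:
        if r.up:
            merged |= r.inbox
    return merged


a, b = Replica("A"), Replica("B")
deliver([a, b], "m1")
b.up = False          # replica B goes down...
delete([a, b], "m1")  # ...while the user deletes m1 (only A applies it)
b.up = True           # B comes back with its stale copy
print(sorted(read_inbox([a, b])))  # ['m1']: the deleted message returns


def sell_seat(seats_left):
    # Each half of a partitioned database decides locally, without coordination.
    if seats_left > 0:
        return seats_left - 1, True
    return seats_left, False


half1 = half2 = 1  # both halves believe one seat remains
half1, sold1 = sell_seat(half1)
half2, sold2 = sell_seat(half2)
print(sold1 + sold2)  # 2: the single seat was sold twice
```

Both anomalies have the same cause: each replica or partition acts on its local copy without learning about updates it missed. Preventing them takes extra machinery, for example remembering deletions instead of forgetting them, or refusing sales while partitioned. That extra machinery is part of what makes replication hard.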

Why Distribute? For Scalable Capacity
What if Gmail is a huge success?
Workload exceeds the capacity of one server
Fix: spread users across several servers
– Best case: linear scaling; if one box supports U users, N boxes support NU users
– Bottlenecks? If each user’s inbox is on one server, how is inbound mail routed to the right server?
– Scaling? How close to linear?
– Load balance? Some users get more mail than others!
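One common answer to the routing question is to hash each user name to a server. Any front end can then compute where an inbox lives without consulting a lookup service. The sketch below assumes that scheme (hash modulo the number of servers) and a made-up workload with one heavy user. `server_for` and `load_per_server` are hypothetical helpers, not part of any Gmail API.

```python
import hashlib
from collections import Counter


def server_for(user: str, num_servers: int) -> int:
    """Map a user to one of num_servers boxes by hashing the user name.

    Inbound mail for `user` is routed to this server, so every sender-facing
    front end can find the right inbox without a central directory.
    """
    digest = hashlib.sha1(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def load_per_server(mail_per_user: dict, num_servers: int) -> Counter:
    """Sum the messages (per day, say) that land on each server."""
    load = Counter()
    for user, msgs in mail_per_user.items():
        load[server_for(user, num_servers)] += msgs
    return load


# Hypothetical workload: one heavy user and many light users.
mail_per_user = {"heavy_user": 10_000}
mail_per_user.update({f"user{i}": 10 for i in range(1000)})

load = load_per_server(mail_per_user, num_servers=4)
print(sorted(load.values()))
print("max/mean load:", max(load.values()) / (sum(load.values()) / 4))
```

Hashing spreads users roughly evenly, but it balances users, not mail. The server that holds `heavy_user` carries several times the load of the others. Plain modulo hashing has a further drawback: changing the number of servers reassigns most users to a different server.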

Performance Can Be Subtle
Goal: predictable performance under high load
Two employees run a Starbucks
– Employee 1: takes orders from customers, calls them out to Employee 2
– Employee 2: writes down drink orders (5 seconds per order) and makes drinks (10 seconds per order)
What is throughput under increasing load?

Starbucks Throughput
Peak system performance: 4 drinks / min (Employee 2 spends 5 + 10 = 15 seconds per order, and 60 / 15 = 4)
What happens when load > 4 orders / min?
What happens to efficiency as load increases?
What would the preferable curve be?
What design achieves that goal?
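The peak figure follows directly from Employee 2's 15 seconds per order. To explore what might happen past the peak, the sketch below adds an assumption the slide does not state. It assumes Employee 2 writes down every order as soon as it is called out, and makes drinks only in the time left over. The two constants come from the slide; the function names are hypothetical.

```python
WRITE_SECONDS = 5   # Employee 2: time to write down one order (from the slide)
MAKE_SECONDS = 10   # Employee 2: time to make one drink (from the slide)


def peak_throughput() -> float:
    """Drinks/min when every order is both written down and made: 60 / 15."""
    return 60 / (WRITE_SECONDS + MAKE_SECONDS)


def throughput(load_per_min: float) -> float:
    """Drinks/min, ASSUMING writing down orders always takes priority.

    Writing consumes 5 s per incoming order (at most the whole minute);
    only the remaining seconds are spent making drinks.
    """
    writing = min(60.0, WRITE_SECONDS * load_per_min)
    making_time = 60.0 - writing
    return min(float(load_per_min), making_time / MAKE_SECONDS)


print(peak_throughput())  # 4.0
for load in (2, 4, 6, 8, 12):
    # load 2 -> 2.0, 4 -> 4.0, 6 -> 3.0, 8 -> 2.0, 12 -> 0.0
    print(load, throughput(load))
```

Under this assumption, throughput peaks at 4 drinks/min and then falls, reaching zero at 12 orders/min. At that point Employee 2 spends the whole minute writing down orders that never get made, so efficiency (drinks made per order taken) falls as well.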

Why Are Distributed Systems Hard to Design?
Failure: of hosts, of network
– Remember Lamport’s lament
Heterogeneity
– Hosts may have different data representations
Need consistency (many specific definitions)
– Users expect familiar “centralized” behavior
Need concurrency for performance
– Avoid waiting synchronously, leaving resources idle
– Overlap requests concurrently whenever possible
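Byte order is one concrete case of heterogeneity. The same 32-bit integer is laid out differently in memory on little-endian and big-endian machines. This minimal sketch uses Python's standard `struct` module to show the mismatch and the usual remedy, which is to agree on a single wire format. `encode_u32` and `decode_u32` are hypothetical helper names.

```python
import struct


def encode_u32(value: int) -> bytes:
    """Serialize an unsigned 32-bit integer in network byte order (big-endian)."""
    return struct.pack("!I", value)


def decode_u32(data: bytes) -> int:
    """Deserialize an unsigned 32-bit integer sent in network byte order."""
    return struct.unpack("!I", data)[0]


value = 0x01020304
wire = encode_u32(value)
print(wire)  # b'\x01\x02\x03\x04'

# A receiver that wrongly assumes little-endian decodes a different number.
misread = struct.unpack("<I", wire)[0]
print(hex(misread))  # 0x4030201

# When both sides use the agreed format, the original value survives.
print(decode_u32(wire) == value)  # True
```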
