Khoj: A Highly Scalable and Available Search
Harneet Singh, Avinaash Gupta and Krishna Gayatri Kuchimanchi

Sections: System Overview; Load Balancing; Failure Detection; Architecture; Data Partitioning and Replication; Fault Tolerance; Evaluation; References

System Overview
Khoj is a distributed search engine that combines well-known techniques to achieve high scalability and availability.
- It runs on a locality-aware request distribution infrastructure [4] with multiple front-end servers.
- The front-end server that handles a request is selected using round-robin scheduling.
- The front-end server uses a two-level consistent hash ring [1] to determine which backend server will serve the request.
- A coordinator server manages the addition and removal of nodes.
- Inverted indices are sharded across the backend servers.
- Data is replicated across backend servers for fault tolerance and high availability.

Data Partitioning and Replication
- Each backend server is mapped to multiple virtual nodes.
- Data is partitioned evenly among the servers.
- Load is redistributed when a backend server is added or removed.
- Data is replicated on N = 3 backend servers for high availability.

References
[1] D. Karger, A. Sherman, A. Berkheimer, B. Bogstad, R. Dhanidina, K. Iwamoto, B. Kim, L. Matkins, and Y. Yerushalmi. Web caching with consistent hashing. Computer Networks 31(11): 1203–1213, 1999.
[2] G. DeCandia et al. Dynamo: Amazon's Highly Available Key-Value Store. Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October.
[3] R. Nishtala et al. Scaling Memcache at Facebook. NSDI.
[4] V. Pai, G. Banga, et al. Locality-Aware Request Distribution. ASPLOS-VIII.
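The partitioning scheme above (virtual nodes, even spread, N = 3 replicas, redistribution on join/leave) can be sketched as a consistent hash ring. This is a minimal Python sketch, not the authors' implementation: MD5 as the ring hash and 100 virtual nodes per backend are assumptions, and the poster does not define the two levels of its hash ring. The sketch assumes one reading, in which the first level places a key on a ring of virtual nodes and the second maps each virtual node to its physical backend. `HashRing`, `add_backend`, `remove_backend`, and `preference_list` are hypothetical names; `preference_list` walks clockwise to collect N distinct backends, in the style of Dynamo [2].

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Place a string on a 2**64-point ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring where each backend server owns several virtual nodes."""

    def __init__(self, vnodes_per_backend: int = 100):
        self.vnodes_per_backend = vnodes_per_backend
        self._points: list[int] = []      # level 1 (assumed): sorted virtual-node positions
        self._owner: dict[int, str] = {}  # level 2 (assumed): virtual node -> backend server

    def add_backend(self, backend: str) -> None:
        """Called (hypothetically) by the coordinator when a backend joins."""
        for i in range(self.vnodes_per_backend):
            point = ring_hash(f"{backend}#vnode{i}")
            if point not in self._owner:  # ignore the rare position collision
                self._owner[point] = backend
                bisect.insort(self._points, point)

    def remove_backend(self, backend: str) -> None:
        """Called (hypothetically) by the coordinator when a backend leaves."""
        self._points = [p for p in self._points if self._owner[p] != backend]
        self._owner = {p: b for p, b in self._owner.items() if b != backend}

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        """Walk clockwise from the key and return up to n distinct backends.

        The first entry is the key's primary owner; the rest hold replicas.
        """
        if not self._points:
            raise ValueError("no backends on the ring")
        n = min(n, len(set(self._owner.values())))
        start = bisect.bisect_right(self._points, ring_hash(key))
        replicas: list[str] = []
        for step in range(len(self._points)):
            backend = self._owner[self._points[(start + step) % len(self._points)]]
            if backend not in replicas:
                replicas.append(backend)
                if len(replicas) == n:
                    break
        return replicas
```

Because a key's primary owner is the first virtual node clockwise from it, removing a backend only reassigns the keys that backend owned, and adding one only moves keys onto the newcomer; every other key stays where it was. Spreading many virtual nodes per backend around the ring is what keeps the partitions roughly even.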

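The request path from the System Overview (round-robin choice of front end, ring lookup of the backend, sharded inverted index) can be sketched the same way. This sketch assumes the index is term-partitioned, so each term's posting list lives on the single backend the ring picks for that term, and that multi-term queries use AND semantics; the poster states neither. It also omits replication to stay short, and all class and method names are hypothetical.

```python
import bisect
import hashlib
import itertools
from collections import defaultdict


def ring_hash(key: str) -> int:
    """Place a string on a 2**64-point ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Backend:
    """A backend shard holding posting lists only for the terms it owns."""

    def __init__(self, name: str):
        self.name = name
        self.postings: dict[str, set[int]] = defaultdict(set)


class Cluster:
    """Backends plus a compact consistent hash ring with virtual nodes."""

    def __init__(self, backend_names: list[str], vnodes: int = 50):
        self.backends = {name: Backend(name) for name in backend_names}
        ring = sorted((ring_hash(f"{name}#vnode{i}"), name)
                      for name in backend_names for i in range(vnodes))
        self._points = [point for point, _ in ring]
        self._names = [name for _, name in ring]

    def owner(self, term: str) -> Backend:
        """Backend owning the first virtual node clockwise from the term."""
        idx = bisect.bisect_right(self._points, ring_hash(term)) % len(self._points)
        return self.backends[self._names[idx]]

    def index(self, doc_id: int, text: str) -> None:
        """Add a document's terms to the posting lists on their owning shards."""
        for term in set(text.lower().split()):
            self.owner(term).postings[term].add(doc_id)


class FrontEnd:
    def __init__(self, name: str, cluster: Cluster):
        self.name = name
        self.cluster = cluster

    def search(self, query: str) -> set[int]:
        """Assumed AND semantics: return documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        # Each term is looked up only on the backend the ring assigns to it.
        hits = [set(self.cluster.owner(t).postings.get(t, ())) for t in terms]
        return set.intersection(*hits)


class RoundRobinDispatcher:
    """Hands each incoming query to the next front-end server in turn."""

    def __init__(self, front_ends: list[FrontEnd]):
        self._next = itertools.cycle(front_ends)

    def submit(self, query: str) -> tuple[str, set[int]]:
        front_end = next(self._next)
        return front_end.name, front_end.search(query)
```

Round-robin spreads queries across front ends without any shared state, while the ring lookup keeps every request for a given term on the same backend, which is the locality the request-distribution design relies on.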