Ali Kaplan. Advisor: Prof. Geoffrey C. Fox. 2/02/2009

Presentation transcript:


Outline
– Introduction
– Background
– Motivation and Research Issues
– GridTorrent Framework Architecture
– Measurements and Analysis
– Contributions and Future Work

Data, Data, More Data
– Computational science is becoming data intensive.
– Scientists are faced with mountains of data that stem from three sources [1]:
  1. New scientific instruments double their output every year or so.
  2. Simulations generate floods of data.
  3. The Internet and computational Grids allow the replication, creation, and re-creation of more data [2].

Data, Data, More Data (cont.)
– Scientific discovery is increasingly driven by data collection [3]:
  – Computationally intensive analyses
  – Massive data collections
  – Data distributed across networks of varying capability
  – Internationally distributed collaborations
– Data-intensive science
  – Dominant factor: data growth (1 petabyte = 1000 TB)
    2000: ~0.5 petabytes
    2005: ~10 petabytes
    2010: ~100 petabytes
    2015: ~1000 petabytes?
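As a rough illustration of the growth figures above, the implied average yearly growth factor between two estimates can be computed as (end / start)^(1 / years). The function name and the compound-growth model are assumptions made for this sketch; the slides give only the four estimates.

```python
def annual_growth_factor(start_pb: float, end_pb: float, years: int) -> float:
    """Average yearly multiplier, assuming compound (geometric) growth."""
    if start_pb <= 0 or end_pb <= 0 or years <= 0:
        raise ValueError("volumes and years must be positive")
    return (end_pb / start_pb) ** (1.0 / years)


# Estimates quoted on the slide, in petabytes (the 2015 value is a projection).
estimates = [(2000, 0.5), (2005, 10.0), (2010, 100.0), (2015, 1000.0)]

for (year0, volume0), (year1, volume1) in zip(estimates, estimates[1:]):
    factor = annual_growth_factor(volume0, volume1, year1 - year0)
    print(f"{year0}-{year1}: about x{factor:.2f} per year")
```

Under this model the 2000-2005 interval corresponds to roughly 1.8x growth per year, and each later tenfold step over five years to roughly 1.6x per year.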

Scientific Application Examples
Scientific applications that generate petabytes of data are very diverse:
– Fusion power
– Climate modeling
– Earthquake engineering
– Astronomy
– Bioinformatics
– High-energy physics

Scientific Application Examples (cont.)
Some examples:
– Climate modeling: the Community Climate System Model and other simulation applications generate 1.5 petabytes/year
– Bioinformatics: the Pacific Northwest National Laboratory is building new confocal microscopes that will generate 5 petabytes/year
– High-energy physics: the Large Hadron Collider (LHC) project at CERN will create 15 petabytes/year

Background
Systems for transferring bulk data:
– Network-level solutions
– System-level solutions
– Application-level solutions

Background (cont.)
(figure; axis labels: Cost, Prevalence)

System Level Solutions
- Require modifications to the operating system of the machine, the network apparatus, or both
+ Yield very good performance
- Expensive
- Not applicable to every system
Example: Group Transport Protocol for Lambda-Grids (GTP)

Network Level Solutions
– Network Attached Storage (NAS)
  – File-level storage system attached to a traditional network
  – Uses higher-level protocols
  – Does not allow direct access to individual storage
  – Simpler and more economical solution than SAN
– Storage Area Network (SAN)
  – Storage devices attached directly to LAN
  – Uses low-level network protocols (Fibre Channel)
  – Handles large data transfers
  – Provides better performance

Application Level Solutions
+ Use parallel streaming to improve performance
+ Require no modifications to underlying systems
+ Inexpensive
+ Broader use
+- May require an auxiliary component for data management
- May not be as fast as network/system-level solutions
Types of application-level solutions:
– TCP-based solutions
– UDP-based solutions
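The parallel-streaming idea above can be sketched as follows: the file is divided into contiguous byte ranges, and each range is moved by its own stream. This is a minimal sketch in which an in-memory copy on a thread pool stands in for real network connections; the function names `split_ranges` and `parallel_copy` are hypothetical and are not taken from GridTorrent or any of the tools named in these slides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(total_bytes: int, streams: int) -> List[Tuple[int, int]]:
    """Divide [0, total_bytes) into `streams` contiguous half-open ranges."""
    if total_bytes < 0 or streams < 1:
        raise ValueError("need total_bytes >= 0 and streams >= 1")
    base, extra = divmod(total_bytes, streams)
    ranges = []
    start = 0
    for index in range(streams):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def parallel_copy(source: bytes, streams: int) -> bytes:
    """Copy `source` using one worker per range (stand-in for one stream each)."""
    destination = bytearray(len(source))

    def copy_range(bounds: Tuple[int, int]) -> None:
        start, end = bounds
        destination[start:end] = source[start:end]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        # Consume the iterator so any worker exception is raised here.
        list(pool.map(copy_range, split_ranges(len(source), streams)))
    return bytes(destination)
```

In a real transfer each range would be sent over a separate connection, and the receiver would write it at the matching file offset.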

TCP-Based Solutions
+ Harness the good features of TCP
+ Reliability
+- Built-in congestion control mechanism (TCP window)
+ Require no changes to existing systems
+ Easy to implement
+ Broader use
- Not suitable for real-time applications
Examples: GridFTP, GridHTTP, bbFTP, and bbcp
These mainly use FTP or HTTP as the base protocol

UDP-Based Solutions
+ Small segment header overhead (8 vs. 20 bytes)
- Unreliable
+- Require additional mechanisms for reliability and congestion control (at the application level)
+ May overcome existing problems of TCP
+ May make UDP faster
- Integration with existing systems requires some changes and effort
Examples: SABUL, UDT, FOBS, RBUDP, Tsunami, and UFTP
These mainly use rate-based control mechanisms
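To put the "8 vs. 20 bytes" comparison in perspective, the share of each segment taken up by the transport header can be computed directly. This is a small arithmetic sketch: the 1400-byte payload is an assumed example value, the 20-byte figure is the minimum TCP header (no options), and IP and link-layer headers are ignored.

```python
UDP_HEADER_BYTES = 8    # fixed UDP header size
TCP_HEADER_BYTES = 20   # minimum TCP header size, without options


def header_overhead(header_bytes: int, payload_bytes: int) -> float:
    """Fraction of a segment occupied by the transport header."""
    if header_bytes < 0 or payload_bytes <= 0:
        raise ValueError("header must be >= 0 and payload must be > 0")
    return header_bytes / (header_bytes + payload_bytes)


payload = 1400  # assumed payload size in bytes, for illustration only
for name, header in (("UDP", UDP_HEADER_BYTES), ("TCP", TCP_HEADER_BYTES)):
    share = header_overhead(header, payload)
    print(f"{name}: {share:.2%} of each segment is transport header")
```

For large payloads both shares are small, so the header saving alone is modest; the larger differences between the two families come from how reliability and congestion control are handled.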

Auxiliary Components
– Used for file indexing and discovery
– GridFTP uses the Replica Location Service (RLS)
  – Local Replica Catalogs (LRCs)
  – Replica Location Indices (RLIs)
  – LRCs send information about their state to RLIs using soft-state protocols
  – Optional "Bloom filter" compression can be used to summarize the contents of an LRC
  – The current RLS implementation maintains static information about the LRCs and RLIs participating in the distributed system
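The Bloom filter mentioned above is a compact bit array that answers "possibly present" or "definitely absent" for a set of names, which is why it suits summarizing a catalog's contents. The following is a generic, minimal sketch of the data structure, not the RLS implementation; the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative sketch)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        if num_bits < 1 or num_hashes < 1:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits[position // 8] |= 1 << (position % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[position // 8] & (1 << (position % 8))
            for position in self._positions(item)
        )


# Hypothetical logical file names registered in one catalog.
summary = BloomFilter()
for name in ("lfn://climate/run-001.nc", "lfn://climate/run-002.nc"):
    summary.add(name)
print("lfn://climate/run-001.nc" in summary)
```

An index holding only such summaries can rule out catalogs that definitely lack a name, at the cost of occasionally querying a catalog that turns out not to have it.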

Motivation and Research Issues
Problems of existing solutions:
– Built on the client/server model
  – Why not P2P?
– Mainly use FTP/HTTP-type protocols
  – Suffer from the drawbacks of FTP/HTTP
  – Modification is very difficult
– Require some vital services to be built as separate modules
– Use existing system resources inefficiently

Motivation and Research Issues (cont.)
If a P2P model can be a solution:
– Which P2P model is the right one?
– What additional features does it require?
  – Collaborative Framework
  – P2P Client communication hub
  – Moderate security
– Is it scalable?
– How does it perform?
– What is its overhead?
– How flexible and extensible is it?

GridTorrent Framework Architecture (figure)

Collaboration and Content Manager
– An interface between users and the system
– Capabilities:
  – Share content
  – Browse content
  – Download content
  – Add/remove groups
  – Add/remove users for a particular content (access right controls)
  – Add/remove users for a particular group (access right controls)
– Everything is metadata

GridTorrent Framework Architecture (figure)

WS-Tracker Service
– The communication hub of the system
– Loosely coupled, flexible, and extensible
– Delivers tasks to GridTorrent clients
– Updates task status in the database
– Stores and serves .torrent files
(figure: interactions among the Database, the WS-Tracker Service, and the GridTorrent Client; arrow labels: Get Available Tasks, Ask for tasks, Deliver Task, Deliver .torrent file, Update Records)
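The tracker responsibilities listed above (queue tasks per client, hand them out when a client asks, record status, and keep .torrent files) can be sketched with an in-memory stand-in. This is a hypothetical illustration only: the real WS-Tracker is described as a web service backed by a database, and the class name, method names, and status strings below are assumptions rather than its actual interface.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class InMemoryTracker:
    """Toy stand-in for a task-delivering tracker (not the real service)."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)
        self._status: Dict[str, str] = {}
        self._torrents: Dict[str, bytes] = {}

    def add_task(self, client_id: str, task_id: str) -> None:
        """Queue a task for one client and record it as pending."""
        self._pending[client_id].append(task_id)
        self._status[task_id] = "pending"

    def get_available_tasks(self, client_id: str) -> List[str]:
        """Called when a client asks for tasks; each task is handed out once."""
        queue = self._pending[client_id]
        delivered = list(queue)
        queue.clear()
        for task_id in delivered:
            self._status[task_id] = "delivered"
        return delivered

    def update_status(self, task_id: str, status: str) -> None:
        if task_id not in self._status:
            raise KeyError(task_id)
        self._status[task_id] = status

    def status_of(self, task_id: str) -> str:
        return self._status[task_id]

    def store_torrent(self, name: str, content: bytes) -> None:
        self._torrents[name] = content

    def get_torrent(self, name: str) -> bytes:
        return self._torrents[name]
```

Because clients poll the tracker instead of being contacted by it, they only need outbound connectivity, which is one way to keep the components loosely coupled.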

GridTorrent Framework Architecture (figure)

Task
– A task is simply metadata (wrapped actions)
  – Request / Response
  – Periodic / Non-periodic
– Instructs a GridTorrent client what to do and with whom
– Created by users
– Exchanged between the WS-Tracker service and GridTorrent clients
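A task as described above could be modeled as a small metadata record that is serialized for exchange. The slides do not give the field list here, so every field name below (`task_id`, `action`, `target_peer`, `created_by`) and the JSON encoding are hypothetical choices for this sketch; only the request/response and periodic/non-periodic distinctions come from the slide.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class TaskKind(Enum):
    REQUEST = "request"
    RESPONSE = "response"


@dataclass(frozen=True)
class Task:
    task_id: str       # hypothetical identifier
    kind: TaskKind     # request or response
    periodic: bool     # periodic or non-periodic
    action: str        # what to do, e.g. "download" (assumed value)
    target_peer: str   # with whom
    created_by: str    # the user who created the task

    def to_json(self) -> str:
        record = asdict(self)
        record["kind"] = self.kind.value  # enums are not JSON serializable
        return json.dumps(record, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "Task":
        record = json.loads(text)
        record["kind"] = TaskKind(record["kind"])
        return Task(**record)


task = Task("t-1", TaskKind.REQUEST, False, "download", "peer-b", "alice")
print(task.to_json())
```

Keeping the task as plain metadata means the tracker can store and forward it without understanding the action it wraps.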

Task Format (figure)

GridTorrent Client
– Modular architecture
  – Provides extensibility and flexibility
– Built on a P2P file-sharing protocol
  – Enables efficient use of idle resources
– Provides adequate security
  – Authentication
  – Authorization
– Uses regular and parallel stream connections (other transfer mechanisms could be used)

Security in GridTorrent Client
– Only the Security Module port is public
– Each peer has to be authenticated and authorized (A&A) before starting the download process
– After a successful A&A, the peer receives the data port number and a passkey
– Peers use the passkey for a second verification just before the download process
– If everything is valid and successful, the actual data download starts

Security in GridTorrent Client-I
(flowchart; lanes: PeerA’s Security Module, PeerB’s Security Module, PeerA’s Data Sharing Module)
1. PeerA starts the authentication process.
2. PeerB handles PeerA’s request. Checks: Authorization successful? PeerA in ACL?
   No: reject connection.
   Yes: PeerB gives PeerA the data port number and passkey, and also saves the passkey for further use.
3. PeerA connects to the received data port and sends the passkey to start the download process.
4. Passkey verification.
   Yes: PeerB starts the data transfer process.
   No: reject connection.
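The two-stage check described above (authentication and authorization first, then a passkey presented on the data port) can be sketched from PeerB's side as follows. This is an illustrative sketch under stated assumptions: the slides do not say how peers authenticate, so the shared-secret credential check, the class and method names, and the single-use passkey policy are all hypothetical.

```python
import hmac
import secrets
from typing import Dict, Optional, Set, Tuple


class SecurityModule:
    """PeerB's side of the handshake (hypothetical sketch)."""

    def __init__(self, credentials: Dict[str, str], acl: Set[str], data_port: int) -> None:
        self._credentials = credentials   # peer id -> shared secret (assumed scheme)
        self._acl = acl                   # peers allowed to download
        self._data_port = data_port       # kept private until A&A succeeds
        self._issued: Dict[str, str] = {}  # peer id -> passkey saved for later use

    def authenticate_and_authorize(self, peer_id: str, secret: str) -> Optional[Tuple[int, str]]:
        """Return (data port, passkey) on success, or None to reject."""
        expected = self._credentials.get(peer_id)
        if expected is None or not hmac.compare_digest(expected.encode(), secret.encode()):
            return None  # authentication failed
        if peer_id not in self._acl:
            return None  # authenticated, but not authorized
        passkey = secrets.token_hex(16)
        self._issued[peer_id] = passkey
        return self._data_port, passkey

    def verify_passkey(self, peer_id: str, passkey: str) -> bool:
        """Second verification on the data port; each passkey works once here."""
        issued = self._issued.pop(peer_id, None)
        return issued is not None and hmac.compare_digest(issued.encode(), passkey.encode())


peer_b = SecurityModule({"peerA": "s3cret"}, {"peerA"}, data_port=6881)
grant = peer_b.authenticate_and_authorize("peerA", "s3cret")
if grant is not None:
    port, passkey = grant
    print("transfer allowed:", peer_b.verify_passkey("peerA", passkey))
```

Withholding the data port until the first stage succeeds is what lets only the Security Module port be public.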

Measurements and Analysis
– The set of benchmarks:
  – Performance
  – Overhead
– The PTCP transfer method was used for comparison
– Test beds used in these benchmarks:
  – LAN (Bloomington, IN to Indianapolis, IN)
  – WAN (Bloomington, IN to Tallahassee, FL)

LAN Test Setup (figure: PTCP, GridTorrent)

LAN Test Result (figure)

WAN Test-I Setup (figure: PTCP, GridTorrent)

WAN Test-I Result (figure)

WAN Test-II Setup (figure: PTCP, GridTorrent)

WAN Test-II Result (figure)

Evaluation of Test Results
– GridTorrent provides better or equal performance on the WAN
– PTCP reaches its maximum data transfer speed at 15 streams
– Using PTCP in GridTorrent yields a higher data transfer rate
– Total size of the overhead messages is between KB for transferring a 300 MB file
– Scalability is not an issue because of the bulk data transfer characteristics

Contributions
System research
– A collaborative framework with a P2P-based data-moving technique
– Efficient, scalable, and modular
– Integration with SOA to increase modularity, flexibility, and extensibility
– Strategies for increasing performance and scalability
– Unification of many useful techniques, such as reliable file transfer, third-party transfer, and disk allocation, in a simple but efficient way
– Benchmarks to evaluate GridTorrent performance
System software
– Design and implementation of an infrastructure consisting of the GridTorrent client, the WS-Tracker service, and the Collaborative framework

Future Work
– Using other high-performance, low-level TCP- or UDP-based data transfer protocols in the data layer
– Improving the existing P2P technique
– Adapting the existing system to support dynamic (real-time) content
– Developing and deploying an intelligent source-selection algorithm in the WS-Tracker Service
– Security
  – A security framework for the WS-Tracker Service, if necessary
– Transforming the Collaborative framework into portlets for reusability