Ali Kaplan. Advisor: Prof. Geoffrey C. Fox. 2/02/2009

GridTorrent Framework: A High-performance Data Transfer and Data Sharing Framework for Scientific Computing.
Presentation transcript:


Outline
- Introduction
- Background
- Motivation and Research Issues
- GridTorrent Framework Architecture
- Measurements and Analysis
- Contributions and Future Works

Data, Data, more Data
- Computational science is becoming data intensive
- Scientists are faced with mountains of data that stem from three sources [1]:
  1. New scientific instruments, whose data generation grows monotonically
  2. Simulations, which generate a flood of data
  3. The Internet and the computational Grid, which allow the replication, creation, and recreation of more data [2]

Data, Data, more Data (cont.)
- Scientific discovery is increasingly driven by data collection [3]
  - Computationally intensive analyses
  - Massive data collections
  - Data distributed across networks of varying capability
  - Internationally distributed collaborations
- Data Intensive Science [4]
  - Dominant factor: data growth (1 Petabyte = 1000 TB)
  - 2000: ~0.5 Petabyte
  - 2007: ~10 Petabytes
  - 2013: ~100 Petabytes
  - 2020: ~1000 Petabytes?

Scientific Application Examples
Scientific applications that generate petabytes of data are very diverse:
- Fusion power
- Climate modeling
- Astronomy
- High-energy physics
- Bioinformatics
- Earthquake engineering

Scientific Application Examples (cont.)
Some examples:
- Climate modeling: the Community Climate System Model and other simulation applications generate 1.5 petabytes/year
- Bioinformatics: the Pacific Northwest National Laboratory is building new confocal microscopes that will generate 5 petabytes/year
- High-energy physics: the Large Hadron Collider (LHC) project at CERN will create 100 petabytes/year

Background
Systems for transferring bulk data:
- Network level solutions
- System level solutions
- Application level solutions

Background (cont.)
[Figure: axes labeled Cost and Prevalence]

Network Level Solutions
- Network Attached Storage (NAS)
  - File-level storage system attached to a traditional network
  - Uses higher-level protocols
  - Does not allow direct access to individual storage
  - Simpler and more economical solution than SAN
- Storage Area Network (SAN)
  - Storage devices attached directly to LAN
  - Utilizes low-level network protocols (Fibre Channel)
  - Handles large data transfers
  - Provides better performance

System Level Solutions
- Require modifications to the operating system of the machine, to the network apparatus, or to both
+ Yield very good performance
- Expensive solutions
- Not applicable to every system
Example: Group Transport Protocol for Lambda-Grids (GTP)

Application Level Solutions
+ Use parallel streaming to improve performance
+ Tweak TCP buffer size to improve performance
+ Require no modifications to underlying systems
+ Inexpensive
+ Prevalent use
+/- May require an auxiliary component for data management
- May not be as fast as network/system level solutions
Types of application level solutions: TCP-based solutions and UDP-based solutions

TCP-Based Solutions
+ Harness the good features of TCP
+ Reliability
+/- Built-in congestion control mechanism (TCP window)
+ Require no changes to existing systems
+ Easy to implement
+ Prevalent use
- Not suitable for real-time applications
Examples: GridFTP, GridHTTP, bbFTP, and bbcp
Use mainly FTP or HTTP as the base protocol

UDP-Based Solutions
+ Small segment header overhead (8 bytes vs. 20 bytes for TCP)
- Unreliable
+/- Require additional mechanisms for reliability and congestion control (at application level)
+ May overcome existing problems of TCP
+ May make UDP faster
- Integration with existing systems requires some changes and effort
Examples: SABUL, UDT, FOBS, RBUDP, Tsunami, and UFTP
Mainly use rate-based control mechanisms

Auxiliary Components
- Used for file indexing and discovery
- GridFTP utilizes the Replica Location Service (RLS)
  - Local Replica Catalogs (LRCs)
  - Replica Location Indices (RLIs)
  - LRCs send information about their state to RLIs using soft state protocols
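The soft-state idea mentioned above can be illustrated with a small sketch. This is not the RLS implementation: the class name `ReplicaLocationIndex`, its methods, and the 30-second timeout are assumptions made only for this example. The point it shows is that an index entry survives only as long as the catalog keeps refreshing it.

```python
import time
from typing import Dict, Optional, Set, Tuple


class ReplicaLocationIndex:
    """Toy RLI (hypothetical): remembers which LRC reported which logical names."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        self.timeout_s = timeout_s  # assumed expiry interval, not from the slides
        # lrc_id -> (time of last update, logical names reported by that LRC)
        self._state: Dict[str, Tuple[float, Set[str]]] = {}

    def receive_update(self, lrc_id: str, names: Set[str],
                       now: Optional[float] = None) -> None:
        """An LRC periodically re-sends its full state; each update replaces the old one."""
        now = time.time() if now is None else now
        self._state[lrc_id] = (now, set(names))

    def lookup(self, name: str, now: Optional[float] = None) -> Set[str]:
        """Return the LRCs believed to hold `name`, ignoring expired state."""
        now = time.time() if now is None else now
        self._expire(now)
        return {lrc for lrc, (_, names) in self._state.items() if name in names}

    def _expire(self, now: float) -> None:
        # Soft state: entries that were not refreshed in time are silently dropped.
        stale = [lrc for lrc, (seen, _) in self._state.items()
                 if now - seen > self.timeout_s]
        for lrc in stale:
            del self._state[lrc]
```

Because stale entries simply time out, a catalog that disappears needs no explicit deregistration step.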

Motivation and Research Issues
Problems of existing solutions:
- Built on the client/server model
  - Why not P2P?
- Utilize mainly FTP/HTTP type protocols
  - Suffer from the drawbacks of FTP/HTTP
- Modification is very difficult
- Require some vital services to be built as separate modules
- Use existing system resources inefficiently

Comparison of BitTorrent and GridTorrent's Architecture

| BitTorrent | GridTorrent | Reason |
| P2P data-sharing protocol | No change | |
| Simple HTTP Client | SOA-based Tracker Client | To enable advanced operations exchange with WS-Tracker Service |
| - | Task Manager | To enable execution of advanced operations in Client such as remote sharing and ACL |
| Web Server based Tracker | Advanced SOA-based Tracker | To allow the system to build and to handle complex actions required by scientific community |
| - | Security Manager | To provide authentication and authorization mechanism |
| - | Collaboration and Content Manager | To empower users to control access rights to their content and to start remote sharing, downloading processes and permit interactions between them |
| - | Supporting Multiple Streams | To improve further data transmission performance |


Collaboration and Content Manager
- An interface between users and the system
- Capabilities:
  - Share content
  - Browse content
  - Download content
  - Add/remove group
  - Add/remove users for a particular content (Access Right Controls)
  - Add/remove users for a particular group (Access Right Controls)
- Everything is metadata

WS-Tracker Service component of the GridTorrent Framework Architecture

WS-Tracker Service
- The communication hub of the system
- Loosely coupled, flexible, and extensible
- Delivers tasks to GridTorrent clients
- Updates task status in the database
- Stores and serves .torrent files
[Figure: Database, WS-Tracker Service, and GridTorrent Client, with arrows labeled Ask for tasks, Get Available Tasks, Deliver Task, Deliver .torrent file, Update Records]

Task
- A task is simply metadata (wrapped actions)
  - Request or Response
  - Periodic or Non-periodic
- Instructs a GridTorrent client what to do with whom
- Created by users
- Exchanged between the WS-Tracker service and the GridTorrent client

Task Format

Tasks overview

| No | Task Name | Creator | Source | Destination | Category |
| 1 | Task List Request | GTFC | GTFC | WS-Tracker | request, periodic |
| 2 | Share Content Request | User | WS-Tracker | GTFC | request, nonperiodic |
| 3 | Share Content Response | GTFC | GTFC | WS-Tracker | response, nonperiodic |
| 4 | Download Content Request | User | WS-Tracker | GTFC | request, nonperiodic |
| 5 | Download Content Response | GTFC | GTFC | WS-Tracker | response, periodic |
| 6 | ACL Request | GTFC | GTFC | WS-Tracker | request, periodic |
| 7 | ACL Response | User | WS-Tracker | GTFC | response |
| 8 | Update Status | GTFC | GTFC | WS-Tracker | periodic |
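As a rough illustration of a task as "wrapped metadata", the sketch below models the columns of the overview as a record and the tracker as a simple queue. The names `Task`, `ToyTracker`, `submit`, and `deliver` are hypothetical, and the real task format is not preserved in this transcript, so the field layout is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """One row of the overview; field names are assumed for this sketch."""
    name: str
    creator: str
    source: str
    destination: str
    kind: str        # "request" or "response"
    periodic: bool


class ToyTracker:
    """Stand-in for the WS-Tracker task queue (hypothetical, in memory only)."""

    def __init__(self) -> None:
        self._queue: List[Task] = []

    def submit(self, task: Task) -> None:
        # In the framework, tasks created by users are stored by the tracker.
        self._queue.append(task)

    def deliver(self, destination: str) -> List[Task]:
        """Hand over, once, every queued task addressed to `destination`."""
        ready = [t for t in self._queue if t.destination == destination]
        self._queue = [t for t in self._queue if t.destination != destination]
        return ready


tracker = ToyTracker()
tracker.submit(Task("Share Content Request", creator="User",
                    source="WS-Tracker", destination="GTFC",
                    kind="request", periodic=False))
# A client asking for its tasks receives the request exactly once.
tasks_for_client = tracker.deliver("GTFC")
```

A periodic "Task List Request" from the client would correspond to calling `deliver` on a timer.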

GridTorrent Client component of the GridTorrent Framework Architecture

GridTorrent Client
- Modular architecture
  - Provides extensibility and flexibility
- Built on a P2P file-sharing protocol
  - Makes efficient use of idle resources
- Provides adequate security
  - Authentication
  - Authorization
- Utilizes regular and parallel stream connections (other transfer mechanisms could be used)

Security in GridTorrent Client (flowchart)
Participants: PeerA's Security Module, PeerB's Security Module, PeerA's Data Sharing Module
1. PeerA starts the authentication process.
2. PeerB handles PeerA's request.
3. Authorization successful? If no, reject the connection.
4. PeerA in ACL? If no, reject the connection.
5. PeerB gives PeerA the data port number and a passkey, and also saves the passkey for further use.
6. PeerA connects to the received data port and sends the passkey to start the download process.
7. Passkey verification: if it fails, reject the connection; if it succeeds, PeerB starts the data transfer.

Security in GridTorrent Client
- Only the security port number on which the Security Manager listens is publicly known to other peers
- Each peer has to be authenticated and authorized (A&A) before starting the download process
- After a successful A&A, the peer receives the data port number and a passkey
- Peers use the passkey for a second verification just before the download process
- If everything is valid and successful, the actual data download starts
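The two-step check described above can be sketched as follows. This is a simplified illustration, not the GridTorrent code: `ToySecurityManager` and its method names are hypothetical, and authentication is reduced to a set-membership test where a real system would verify credentials such as certificates.

```python
import hmac
import secrets
from typing import Dict, Optional, Set, Tuple


class ToySecurityManager:
    """Hypothetical stand-in for PeerB's security module."""

    def __init__(self, known_peers: Set[str], acl: Set[str], data_port: int) -> None:
        self.known_peers = known_peers   # assumption: replaces real authentication
        self.acl = acl                   # peers allowed to download the content
        self.data_port = data_port       # kept private until A&A succeeds
        self._passkeys: Dict[str, str] = {}

    def authenticate_and_authorize(self, peer_id: str) -> Optional[Tuple[int, str]]:
        """Step 1: return (data port, passkey) on success, None on rejection."""
        if peer_id not in self.known_peers:
            return None                  # authentication failed
        if peer_id not in self.acl:
            return None                  # authenticated, but not in the ACL
        passkey = secrets.token_hex(16)
        self._passkeys[peer_id] = passkey  # saved for the second verification
        return self.data_port, passkey

    def verify_passkey(self, peer_id: str, passkey: str) -> bool:
        """Step 2: checked on the data port just before the transfer starts."""
        expected = self._passkeys.get(peer_id)
        return expected is not None and hmac.compare_digest(expected, passkey)
```

Keeping the data port unknown until the first step succeeds means an unauthenticated peer only ever sees the security port.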

Measurements and Analysis
- The set of benchmarks
  - Performance
  - Overhead
- The PTCP transfer method was used for comparison
  - Parallel streaming is one of the major performance improvement methods
  - It has a structure similar to GridTorrent's
- Test beds used in these benchmarks
  - LAN (Bloomington, IN to Indianapolis, IN)
  - WAN (Bloomington, IN to Tallahassee, FL)

Modeling of PTCP and GridTorrent
[Figure panels: PTCP with 3 streams; GridTorrent with 3 sources]

LAN Test Setup
[Figure panels: PTCP; GridTorrent]

Theoretical and Practical Limits
- RTT = 0.30 ms
- Theoretical bandwidth = 1000 Mbps
- Maximum TCP bandwidth = 0.9493 × 1000 ≈ 949 Mbps
  - Ethernet's Maximum Transmission Unit = 1500 bytes
  - TCP header = 20 bytes
  - IP header = 20 bytes
  - Ethernet's additional per-frame overhead (preamble, frame header, CRC, inter-frame gap) = 38 bytes
  - $U = \frac{1500 - 20 - 20}{1500 + 38} = \frac{1460}{1538} \approx 0.9493$
- Measured bandwidth with Iperf = 857 Mbps
  - Server side: Iperf -s -w 256k
  - Client side: Iperf -c -w 512k -P
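The 0.9493 factor on this slide is consistent with the ratio of TCP payload bytes to bytes on the wire per Ethernet frame. The short calculation below reproduces it from the slide's numbers; the function name and the reading of the 38 bytes as total per-frame Ethernet overhead are assumptions of this sketch.

```python
def tcp_payload_efficiency(mtu: int = 1500, tcp_header: int = 20,
                           ip_header: int = 20, ethernet_overhead: int = 38) -> float:
    """Fraction of on-the-wire bytes that carry TCP payload in a full-size frame."""
    payload = mtu - tcp_header - ip_header   # 1460 bytes of user data
    on_wire = mtu + ethernet_overhead        # 1538 bytes per frame on the link
    return payload / on_wire


efficiency = tcp_payload_efficiency()
max_tcp_mbps = efficiency * 1000             # link rate from the slide: 1000 Mbps
measured_share = 857 / max_tcp_mbps          # Iperf result relative to that bound

print(f"efficiency = {efficiency:.4f}")
print(f"maximum TCP bandwidth = {max_tcp_mbps:.0f} Mbps")
print(f"measured / maximum = {measured_share:.2f}")
```

Under these assumptions the measured 857 Mbps is roughly 90% of the computed upper bound.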

LAN Test Result (RTT = 0.30 ms)

WAN Test-I Setup
[Figure panels: PTCP; GridTorrent with regular socket]

Theoretical and Practical Limits
- RTT = 50 ms
- Theoretical bandwidth = 1000 Mbps
- Maximum TCP bandwidth = 0.9493 × 1000 ≈ 949 Mbps
- Measured bandwidth with Iperf = 30.2 Mbps
  - Server side: Iperf -s -w 256k
  - Client side: Iperf -c -w 256k -P 50
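One way to see why the WAN figure is so far below the link rate is the window/RTT bound on a single TCP stream. The sketch below is a back-of-the-envelope estimate, not a model of the actual test: it assumes "256k" means 256 × 1024 bytes and that the whole window can be in flight each round trip, and the function names are made up for this example.

```python
def window_limited_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def bandwidth_delay_product_bytes(rate_mbps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_mbps * 1e6 * rtt_s / 8


single_stream_bound = window_limited_mbps(256 * 1024, 0.050)
window_needed = bandwidth_delay_product_bytes(949, 0.050)

print(f"single-stream bound = {single_stream_bound:.1f} Mbps")
print(f"window needed for 949 Mbps = {window_needed / 1e6:.2f} MB")
```

With a 50 ms RTT, a 256 KB window caps one stream at a few tens of Mbps, which is why application level solutions rely on larger buffers or parallel streams.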

WAN Test-I Result (RTT = 50 ms)

WAN Test-II Setup
[Figure panels: PTCP; GridTorrent with 4 parallel sockets]

WAN Test-II Result (RTT = 50 ms)

Evaluation of Test Results
- GridTorrent provides better or equal performance on the WAN
- PTCP reaches its maximum data transfer speed at 15 streams
- Utilizing PTCP in GridTorrent yields a higher data transfer rate
- Total overhead message size for transferring a 300 MB file is in the KB range (exact figures not preserved)
- Scalability is not an issue due to the bulk data transfer characteristic

Characteristics of Participation in Scientific Community
- Number of participants is on the scale of 10s, 100s, or 1000s
- Fully distributed
- Team work
- CERN: The European Organization for Nuclear Research
  - The world's largest particle physics laboratory
  - Supported by twenty European member states
  - Currently the workplace of approximately 2,600 full-time employees
  - Some 7,931 scientists and engineers representing 580 universities and research facilities and 80 nationalities

Advantages of GridTorrent
- More peers, more available services
  - Unlike in the client/server model, more peers mitigate the load on the server
- Optimal resource usage
  - Computing power
  - Storage space
  - Bandwidth
- Very efficient for replica systems
- P2P networks are more scalable than the client/server model
- Reliable file transfer
  - Resume capability when a data transfer is interrupted
- Third-party transfer
- Disk allocation before actual data transfer


Transmission sequence matrix of PTCP
Columns: Time (sec), S-C1, S-C2, S-C3, C1, C2, C3 (column alignment of the cells is not preserved)
1: N1
2: N2 | N1,N2
3: N3 | N1,N2,N3
4: N1
5: N2 | N1,N2
6: N3 | N1,N2,N3
7: N1
8: N2 | N1,N2
9: N3 | N1,N2,N3

Transmission sequence matrix of GridTorrent
Columns: Time (sec), S-C1, S-C2, S-C3, C1-C2, C2-C3, C1-C3, C1, C2, C3 (column alignment of the cells is not preserved)
1: N1
2: N2 | N1 | N2 | N1
3: N3 | N1 | N1,N2 | N1,N3
4: N2 | N1,N2 | N1,N2,N3
5: N3 | N1,N2,N3
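The two matrices run to 9 and 5 seconds respectively. The toy model below illustrates why peer-to-peer exchange shortens distribution. It is not the schedule on the slide: it assumes every node (server or client) uploads at most one piece per time step, that downloads are unlimited, and that senders greedily prefer the rarest piece, so its peer-to-peer step count need not match the slide's. All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def steps_client_server(num_clients: int, num_pieces: int) -> int:
    """Only the server uploads, one piece per step, so every copy costs a step."""
    return num_clients * num_pieces


def steps_p2p(num_clients: int, num_pieces: int) -> int:
    """Server and clients each upload one piece per step (greedy, rarest first)."""
    all_pieces = set(range(num_pieces))
    have: List[Set[int]] = [set() for _ in range(num_clients)]
    steps = 0
    while any(h != all_pieces for h in have):
        steps += 1
        pending: List[Set[int]] = [set() for _ in range(num_clients)]
        # Uploaders: the server first, then clients with what they held
        # at the start of this step.
        uploaders = [set(all_pieces)] + [set(h) for h in have]
        for owned in uploaders:
            best: Optional[Tuple[int, int, int, int]] = None
            for piece in sorted(owned):
                rarity = sum(piece in h for h in have)
                for c in range(num_clients):
                    if piece in have[c] or piece in pending[c]:
                        continue  # client already has it or is receiving it
                    load = len(have[c]) + len(pending[c])
                    key = (rarity, load, piece, c)
                    if best is None or key < best:
                        best = key
            if best is not None:
                _, _, piece, c = best
                pending[c].add(piece)
        # Pieces sent during the step become usable in the next one.
        for c in range(num_clients):
            have[c] |= pending[c]
    return steps


print("client/server steps:", steps_client_server(3, 3))
print("peer-to-peer steps:", steps_p2p(3, 3))
```

In this model the gap widens as clients are added, because each new client also contributes upload capacity.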

Contributions
System research:
- A collaborative framework with a P2P-based data moving technique
  - Efficient, scalable, and modular
  - Integrated with SOA to increase modularity, flexibility, and extensibility
- Strategies for increasing performance and scalability
- Unification of many useful techniques, such as reliable file transfer, third-party transfer, and disk allocation, in a simple but efficient way
- Benchmarks to evaluate GridTorrent performance
System software:
- Design and implementation of an infrastructure consisting of the GridTorrent client, the WS-Tracker service, and the Collaborative framework

Future Works
- Utilizing other high-performance, low-level TCP or UDP based data transfer protocols in the data layer
- Improving the existing P2P technique
- Certification handling service for different certificates
- Adapting the existing system to support dynamic (real-time) content
- Developing and deploying an intelligent source selection algorithm in the WS-Tracker Service
- Security: a security framework for the WS-Tracker Service, if necessary
- Transforming the Collaborative framework into portlets for reusability

References
1. Petascale computational systems, Bell, G.; Gray, J.; Szalay, A., Computer, Volume 39, Issue 1, Jan, Page(s): 110 –
2. Getting Up To Speed, The Future of Supercomputing, Graham, S.L., Snir, M., Patterson, C.A. (eds), NAE Press, 2004, ISBN
3. Overview of Grid Computing, Ian Foster, fp.mcs.anl.gov/~foster/Talks/ResearchLibraryGroupGridsApril2002.ppt, last seen
4. Science-Driven Network Requirements for Esnet, With-Exec-Sum-v5.doc, last seen 2007

Create MyFile.torrent

Upload MyFile.torrent

Join to Tracker

Find and obtain MyFile.torrent

Join Tracker Node

Tracker Node replies with list of peers = {Seed Node}

Download pieces of content