A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer.

Presentation transcript:

A Collaborative Framework for Scientific Data Analysis and Visualization
Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox
Department of Computer Science, Indiana University, Bloomington, IN
CTS-2008, Irvine, California
Slide footer: 11/16/2015, Jaliya Ekanayake - cts2008

Talk Outline
Collaborative Data Analysis
Typical Collaborative Techniques
Proposed Architecture
High Energy Physics Data Analysis
Conclusion

Collaborative Scientific Data Analysis
The final step of data analysis involves human interpretation
The data, the processing power, and the experts in the field are all distributed
Collaboration brings all of these into a single session
– Participants from different geographic locations
– Different interests (active participation or simply observing results)

Collaborative Techniques
Focused on sharing multimedia content
– Audio, video streams
– Desktop sharing
– Collaborative whiteboards, online meetings
– E.g. WebEx, Windows Meeting Place, Anabas, EVO
The Data Turbine and the Real Time Data Viewer (RDV)
– Remote monitoring of events/streams from scientific instruments
– The content dissemination is closely coupled with the architecture

The Proposed Architecture
A Compute Server acts as the gateway for a particular domain of control
Results are shared among the participants
A set of agents manages the sessions and tracks entities in the system
Diagram labels: Session Management, Entity Tracking, Gossip

How does it work?
Diagram labels: Site 1 (Data, C1, R11 ... R1m1) through Site n (Data, Cn, Rn1 ... Rnmn); Compute Client 1 through Compute Client p; Content Dissemination Network; Agents
Time line:
ComputeServers register with an Agent
The Agent keeps track of the ComputeServers
A ComputeClient retrieves details of the ComputeServers
The ComputeClient submits compute jobs
Results reach all the interested entities
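The time line above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the framework's API: the `Agent`, `DisseminationNetwork`, and `run_job` names, the session name, and the summing "job" are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that keeps track of registered ComputeServers."""
    servers: dict = field(default_factory=dict)

    def register(self, name, site):
        self.servers[name] = site

    def list_servers(self):
        return dict(self.servers)


class DisseminationNetwork:
    """Hypothetical in-memory stand-in for the content dissemination network."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, session, callback):
        self.subscribers[session].append(callback)

    def publish(self, session, result):
        # Every entity subscribed to the session receives the result.
        for callback in self.subscribers[session]:
            callback(result)


def run_job(network, session, server, job):
    """Pretend a ComputeServer ran the job, then publish its result."""
    result = {"server": server, "output": sum(job)}
    network.publish(session, result)
    return result


# ComputeServers register with an Agent, which tracks them.
agent = Agent()
agent.register("C1", "Site 1")
agent.register("Cn", "Site n")

# Two clients join the same session.
network = DisseminationNetwork()
received_by_client_1, received_by_client_p = [], []
network.subscribe("session-1", received_by_client_1.append)
network.subscribe("session-1", received_by_client_p.append)

# A client retrieves the server list and submits one job per server.
for server in agent.list_servers():
    run_job(network, "session-1", server, [1, 2, 3])
```

Both clients end up with the same two results, even though only one of them submitted the jobs.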

Collaborative Modes - Shared Events
Supports further processing of data by the receiving end
– Active participation
Push paradigm
Clients can further process the events if necessary
Higher quality data
The Compute server notifies the participating clients of either the results or the location of the results
For small data products, the output can be sent directly to the clients
For larger data products, the outputs can be stored in a file system and the clients can retrieve them via the Compute server
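The choice between sending a result and sending its location could look roughly like the sketch below. The 1024-byte threshold, the notification fields, and the function names are assumptions for illustration. Reading the file directly stands in for retrieval via the Compute server.

```python
import os
import tempfile

SIZE_THRESHOLD = 1024  # bytes; hypothetical cutoff, not taken from the talk


def make_notification(job_id, payload, store_dir):
    """Build the event pushed to clients for a finished job."""
    if len(payload) <= SIZE_THRESHOLD:
        # Small data product: ship the output inside the event itself.
        return {"job": job_id, "kind": "inline", "data": payload}
    # Larger data product: store it and ship only its location.
    path = os.path.join(store_dir, f"{job_id}.out")
    with open(path, "wb") as handle:
        handle.write(payload)
    return {"job": job_id, "kind": "location", "path": path}


def fetch(notification):
    """Client side: obtain the output however it was delivered."""
    if notification["kind"] == "inline":
        return notification["data"]
    # Stand-in for retrieving the stored output via the Compute server.
    with open(notification["path"], "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as store:
    small = make_notification("job-1", b"x" * 10, store)
    large = make_notification("job-2", b"x" * 5000, store)
    small_data, large_data = fetch(small), fetch(large)
```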

Collaborative Modes – Shared Display
One client captures its display and shares it as an image
Suitable for passive participation
Suitable for clients joining with minimal computation capabilities
– E.g. handheld devices
Capability to publish data to the public
May limit further analysis
Less accurate than the shared events

Security and Fault Tolerance
Compute server security
– Authentication via PKI
– Authorization via grid-map file
The Content Dissemination Network provides secure, end-to-end delivery of messages
The Content Dissemination Network is fault tolerant
Multiple sets of agents maintain the state of the system
No single point of failure
A Compute server failure requires a manual restart

High Energy Physics Data Analysis
Large volumes of data
Distributed data
Identify a certain type of data product from a collection of millions of data products
Analyses are fine-tuned iteratively
Same analysis on different data sets
Collaborative interpretation
Diagram labels: Site 1, Data, C1, R11 ... R1m1, Compute Client 1, NaradaBrokering, Agents, ROOT

User Interface
Screenshot labels: Available Clarens Servers; Session Information; Results received & merged; Results received & currently merging; Results not yet received

Results: # Participants vs. Event Propagation Time

Results: Event Rate vs. Communication Latency

Conclusions & Future Work
A Collaborative Framework for Scientific Data Analysis
Processing data across domains of control
Sharing results
– Shared Event
– Shared Display
– Synchronous / Asynchronous
Complete the Agent implementation
Map-reduce style programming model for the Compute Server

Thank You!

Security
The framework spans multiple domains of control
Uses PKI for security
Each entity in the framework owns an X.509 certificate
Communication medium -> Content dissemination framework
Messages carry a signature
Messages from unauthorized entities are discarded
The Agent uses a proxy certificate to submit computation jobs on behalf of the ComputeClient
The framework provides the necessary APIs to generate a proxy certificate
The ComputeServer maps the user's DN to the user account
Computation jobs are executed as user processes
The code that performs the above user account mapping is kept auditable
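A rough sketch of two of these steps, discarding messages that fail the signature check and mapping a DN to a local account, is given below. It deliberately simplifies: an HMAC with a shared key stands in for certificate-based signatures (a real deployment would verify an X.509 signature instead), and the DNs, keys, accounts, and function names are all made up for the example. The grid-map text follows the usual quoted-DN-then-account layout.

```python
import hashlib
import hmac
import shlex

# Hypothetical grid-map contents: "distinguished name" local_account
GRID_MAP = '''
# "distinguished name" local_account
"/O=Example/OU=Physics/CN=Alice Example" alice
"/O=Example/OU=Physics/CN=Bob Example" bob
'''

# Hypothetical shared keys; these stand in for per-entity certificates.
KEYS = {"/O=Example/OU=Physics/CN=Alice Example": b"alice-secret"}


def parse_grid_map(text):
    """Return a dict mapping each DN to its local user account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, account = shlex.split(line)  # handles the quoted DN
        mapping[dn] = account
    return mapping


def sign(key, body):
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def accept(message, keys, grid_map):
    """Return the local account to run the job as, or None to discard."""
    key = keys.get(message["dn"])
    if key is None:
        return None  # unknown entity: discard
    expected = sign(key, message["body"])
    if not hmac.compare_digest(expected, message["signature"]):
        return None  # bad signature: discard
    return grid_map.get(message["dn"])


grid_map = parse_grid_map(GRID_MAP)
alice_dn = "/O=Example/OU=Physics/CN=Alice Example"
body = b"run analysis"
good = {"dn": alice_dn, "body": body,
        "signature": sign(KEYS[alice_dn], body)}
forged = {"dn": alice_dn, "body": body, "signature": "0" * 64}
unknown = {"dn": "/O=Example/CN=Mallory", "body": body, "signature": ""}
```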

Handling Failures 1: ComputeServer
The Agent detects the failure of a ComputeServer
The Agent notifies the ControlConsole about the failure
The user restarts the failed ComputeServers
The ComputeServer keeps the status of the processing jobs in memory
– This simplifies the ComputeServer's functionality
Once the ComputeServer is restarted, the Agent re-submits the incomplete jobs to it
After the restart, the ComputeClient can retrieve the results of the completed computations (even those completed before the failure)
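The recovery path described above might be modeled as follows. The sketch assumes that completed results are kept somewhere that survives a restart (here a plain dict passed in from outside), while job status lives only in the server object. The class and method names and the summing "job" are hypothetical.

```python
class ComputeServer:
    """Hypothetical server: job status is in memory, results are persistent."""

    def __init__(self, result_store):
        self.result_store = result_store  # assumed to survive restarts
        self.jobs = {}                    # in-memory status, lost on failure

    def submit(self, job_id, numbers):
        self.jobs[job_id] = numbers

    def complete(self, job_id):
        numbers = self.jobs.pop(job_id)
        self.result_store[job_id] = sum(numbers)


class Agent:
    """Hypothetical agent that remembers every job it has submitted."""

    def __init__(self):
        self.submitted = {}

    def submit(self, server, job_id, numbers):
        self.submitted[job_id] = numbers
        server.submit(job_id, numbers)

    def recover(self, server):
        """Re-submit every job that has no stored result yet."""
        resubmitted = []
        for job_id, numbers in self.submitted.items():
            if job_id not in server.result_store:
                server.submit(job_id, numbers)
                resubmitted.append(job_id)
        return resubmitted


result_store = {}
server = ComputeServer(result_store)
agent = Agent()

agent.submit(server, "job-1", [1, 2])
server.complete("job-1")
agent.submit(server, "job-2", [3, 4])   # still running when the server fails

# The user restarts the server: in-memory status is gone, results remain.
server = ComputeServer(result_store)
resubmitted = agent.recover(server)
server.complete("job-2")
```

After the restart only `job-2` is re-submitted, and the result of `job-1` is still retrievable.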

Handling Failures 2: Agent
The Master Agent (MA) keeps the status of the entire framework
A set of Buddy Agents (BAs) keeps track of the MA
The MA assigns a unique ID to each BA
The MA sends the status of the framework to the BAs
The BAs detect a failure of the MA
The first BA assumes the duty of the MA
The new MA contacts the ComputeServers and rebuilds the status
Diagram labels: MA, BA 1, BA 2, BA 3
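One way to picture the hand-over is the sketch below. It assumes that "first BA" means the buddy with the lowest ID, leaves out failure detection (for example heartbeats) entirely, and models each ComputeServer as a callable that reports its status. All names are hypothetical.

```python
class BuddyAgent:
    def __init__(self, name):
        self.name = name
        self.buddy_id = None
        self.status = {}


class MasterAgent:
    def __init__(self):
        self.status = {}
        self.buddies = []

    def add_buddy(self, buddy):
        # The MA assigns a unique ID to each BA.
        buddy.buddy_id = len(self.buddies) + 1
        self.buddies.append(buddy)

    def broadcast_status(self):
        # The MA sends the status of the framework to the BAs.
        for buddy in self.buddies:
            buddy.status = dict(self.status)


def take_over(buddies, compute_servers):
    """Promote the lowest-ID BA and rebuild status from the ComputeServers."""
    successor = min(buddies, key=lambda b: b.buddy_id)
    new_master = MasterAgent()
    new_master.status = {name: query()
                         for name, query in compute_servers.items()}
    for buddy in buddies:
        if buddy is not successor:
            new_master.add_buddy(buddy)
    new_master.broadcast_status()
    return successor, new_master


master = MasterAgent()
buddies = [BuddyAgent("BA 1"), BuddyAgent("BA 2"), BuddyAgent("BA 3")]
for buddy in buddies:
    master.add_buddy(buddy)

# Hypothetical ComputeServers that report their own status when asked.
compute_servers = {"C1": lambda: "idle", "Cn": lambda: "running job-2"}

# The MA fails; the BAs notice and the first BA takes over.
successor, new_master = take_over(buddies, compute_servers)
```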

Computation Tasks and the Associated Cost