
1 A Collaborative Framework for Scientific Data Analysis and Visualization
Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox
Department of Computer Science, Indiana University, Bloomington, IN 47404
{jekanaya,spallick,gcf}@indiana.edu
CTS-2008, Irvine, California
11/16/2015, Jaliya Ekanayake - cts2008

2 Talk Outline
Collaborative Data Analysis
Typical Collaborative Techniques
Proposed Architecture
High Energy Physics Data Analysis
Conclusion

3 Collaborative Scientific Data Analysis
The final step of data analysis involves human interpretation
The data, the processing power, and the experts in the field are all distributed
Collaboration brings all of these into a single session
Participants from different geographic locations
Different interests (active participation or simply observing results)

4 Collaborative Techniques
Focused on sharing multimedia content
– Audio, video streams
– Desktop sharing
– Collaborative whiteboards, online meetings
– E.g. WebEx, Windows Meeting Place, Anabas, EVO
The Data Turbine and the Real Time Data Viewer (RDV)
– Remote monitoring of events/streams from scientific instruments
– The content dissemination is closely coupled with the architecture

5 The Proposed Architecture
The Compute Server acts as the gateway for a particular domain of control
Results are shared among the participants
A set of agents manages the sessions and tracks entities in the system
[Figure labels: Session Management, Entity Tracking, Gossip]

6 How does it work?
[Figure: Site 1 (Data, C1, R11 ... R1m1) through Site n (Data, Cn, Rn1 ... Rnmn), Compute Client 1 through Compute Client p, and Agents, connected by the Content Dissemination Network]
Time line:
1. ComputeServers register with an Agent
2. The Agent keeps track of the ComputeServers
3. A ComputeClient retrieves the details of the ComputeServers
4. The ComputeClient submits compute jobs
5. Results reach all the interested entities
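The five steps on this slide can be sketched as a small in-process simulation. This is only an illustration of the flow, not the framework's actual API: the class names mirror the slide's entities, but every method name, the topic-style `DisseminationNetwork`, and the session name are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DisseminationNetwork:
    """Hypothetical stand-in for the content dissemination network (topic pub/sub)."""
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Deliver the message to every entity interested in this topic.
        for callback in self.subscribers.get(topic, []):
            callback(message)


@dataclass
class ComputeServer:
    name: str
    network: DisseminationNetwork

    def submit(self, session, job):
        # Step 4: run the submitted job. Step 5: publish its result.
        self.network.publish(session, (self.name, job()))


@dataclass
class Agent:
    servers: dict = field(default_factory=dict)

    def register(self, server):
        # Steps 1-2: a server registers and the agent keeps track of it.
        self.servers[server.name] = server

    def list_servers(self):
        # Step 3: a ComputeClient retrieves the details of known servers.
        return sorted(self.servers)


network = DisseminationNetwork()
agent = Agent()
for site in ("site-1", "site-n"):
    agent.register(ComputeServer(site, network))

received = []
network.subscribe("session-a", received.append)  # active participant
network.subscribe("session-a", received.append)  # passive observer

for site in agent.list_servers():
    agent.servers[site].submit("session-a", lambda: 42)
```

Each result is delivered once per subscriber, so both participants see the output of both sites without the ComputeClient having to forward anything.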

7 Collaborative Modes - Shared Events
Supports further processing of data by the receiving end
– Active participation
Push paradigm
Clients can further process the events if necessary
Higher quality data
The Compute server sends the participating clients either the results or the location of the results
For small data products, the output can be sent directly to the clients
For larger data products, the outputs can be stored in a file system and the clients can retrieve them via the Compute server
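The choice between sending a result and sending its location could look like the sketch below. The size threshold, the `ResultEvent` type, and the function names are hypothetical; the slide does not say how "small" and "large" are distinguished. For brevity the client reads the stored file directly, whereas in the talk retrieval goes through the Compute server.

```python
import os
from dataclasses import dataclass
from typing import Optional

INLINE_LIMIT_BYTES = 64 * 1024  # assumed threshold, not taken from the talk


@dataclass
class ResultEvent:
    job_id: str
    payload: Optional[bytes] = None   # set for small data products
    location: Optional[str] = None    # set for large data products


def make_event(job_id, data, store_dir, limit=INLINE_LIMIT_BYTES):
    """Build the event a Compute server would push to participating clients."""
    if len(data) <= limit:
        return ResultEvent(job_id, payload=data)
    # Large output: store it and notify clients of the location only.
    path = os.path.join(store_dir, f"{job_id}.out")
    with open(path, "wb") as handle:
        handle.write(data)
    return ResultEvent(job_id, location=path)


def fetch(event):
    """Client side: use the inline payload, or retrieve it from the location."""
    if event.payload is not None:
        return event.payload
    with open(event.location, "rb") as handle:
        return handle.read()
```

Either way the client ends up with the raw data product, which is what allows further processing on the receiving end.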

8 Collaborative Modes - Shared Display
One client captures its display and shares it as an image
Suitable for passive participation
Suitable for clients joining with minimal computation capabilities
– E.g. handheld devices
Capability to publish data to the public
May limit further analysis
Less accurate than the shared events

9 Security and Fault Tolerance
Compute server security
– Authentication via PKI
– Authorization via grid-map file
The Content Dissemination Network provides secure, end-to-end delivery of messages
The Content Dissemination Network is fault tolerant
Multiple sets of agents maintain the state of the system
No single point of failure
A Compute server failure requires a manual restart

10 High Energy Physics Data Analysis
Large volumes of data
Distributed data
Identify a certain type of data product from a collection of millions of data products
Analyses are fine-tuned iteratively
Same analysis on different data sets
Collaborative interpretation
[Figure: Site 1 (Data, C1, R11 ... R1m1), Compute Client 1, NaradaBrokering, Agents, ROOT]

11 User Interface
[Screenshot labels: Available Clarens Servers; Session Information; Results received & merged; Results received & currently merging; Results not yet received]

12 Results: # Participants vs. Event Propagation Time

13 Results: Event Rate vs. Communication Latency

14 Conclusions & Future Work
A Collaborative Framework for Scientific Data Analysis
Processing data across domains of control
Sharing results
– Shared Event
– Shared Display
– Synchronous / Asynchronous
Complete the Agent Implementation
Map-reduce style programming model for the Compute Server

15 Thank You!

16 Security
The framework spans multiple domains of control
Uses PKI for security
Each entity in the framework owns an X.509 certificate
Communication medium: content dissemination framework
Each message carries a signature
Messages from unauthorized entities are discarded
The Agent uses a proxy certificate to submit computation jobs on behalf of the ComputeClient
The framework provides the necessary APIs to generate a proxy certificate
The ComputeServer maps the user's DN to a user account
Computation jobs are executed as user processes
The code that performs this user account mapping is kept auditable
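The DN-to-account mapping can be illustrated with a small grid-map file lookup. This sketch assumes the common grid-mapfile layout of one quoted DN followed by a local account name per line; the sample DNs and account names are invented, and signature verification is left out entirely. It is not the framework's own mapping code.

```python
import shlex

# Invented sample entries in the assumed "quoted DN, then account" layout.
SAMPLE_GRID_MAP = '''
# distinguished name                      local account
"/DC=org/DC=example/CN=Jane Doe" jdoe
"/DC=org/DC=example/CN=John Roe" jroe
'''


def parse_grid_map(text):
    """Return a dict mapping each certificate DN to a local user account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            continue  # skip malformed entries rather than guess
        dn, accounts = parts
        mapping[dn] = accounts.split(",")[0]  # take the first listed account
    return mapping


def authorize(dn, mapping):
    """Return the local account for a DN, or None if the job must be rejected."""
    return mapping.get(dn)


grid_map = parse_grid_map(SAMPLE_GRID_MAP)
```

A request whose DN is absent from the map yields `None`, which corresponds to discarding work from unauthorized entities; a mapped DN gives the account under which the job would run as a user process.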

17 Handling Failures 1: ComputeServer
The Agent detects the failure of a ComputeServer
The Agent notifies the ControlConsole about the failure
The user restarts the failed ComputeServers
The ComputeServer keeps the status of the processing jobs in memory
– This simplifies the ComputeServer's functionality
Once a ComputeServer is restarted, the Agent re-submits the incomplete jobs to it
The ComputeClient can retrieve the results of the completed computations after the restart (even the results of computations that were completed before the failure)
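Because the ComputeServer keeps job status only in memory, the Agent has to remember what it handed out. A minimal sketch of that bookkeeping follows; the class, method names, and job identifiers are all hypothetical.

```python
class JobTrackingAgent:
    """Tracks which jobs each ComputeServer was given and which have finished."""

    def __init__(self):
        self.assigned = {}    # server name -> set of submitted job ids
        self.completed = {}   # server name -> set of finished job ids

    def record_submit(self, server, job_id):
        self.assigned.setdefault(server, set()).add(job_id)

    def record_complete(self, server, job_id):
        self.completed.setdefault(server, set()).add(job_id)

    def incomplete_jobs(self, server):
        done = self.completed.get(server, set())
        return sorted(self.assigned.get(server, set()) - done)

    def on_server_restarted(self, server, resubmit):
        """Re-submit every job that had not finished before the failure."""
        pending = self.incomplete_jobs(server)
        for job_id in pending:
            resubmit(server, job_id)
        return pending


tracker = JobTrackingAgent()
for job in ("job-1", "job-2", "job-3"):
    tracker.record_submit("site-1", job)
tracker.record_complete("site-1", "job-2")  # finished before the failure

resubmitted = []
tracker.on_server_restarted("site-1", lambda server, job: resubmitted.append(job))
```

Only `job-1` and `job-3` are re-submitted; the result of `job-2` is already complete and stays retrievable, matching the last point on the slide.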

18 Handling Failures 2: Agent
The Master Agent (MA) keeps the status of the entire framework
A set of Buddy Agents (BAs) keeps track of the MA
The MA assigns a unique ID to each BA
The MA sends the status of the framework to the BAs
The BAs detect a failure of the MA
The first BA assumes the duty of the MA
The new MA contacts the ComputeServers and builds the status
[Figure: MA with BA 1, BA 2, BA 3]
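The Master/Buddy hand-over can be sketched as below. The slide only says that the "first" BA takes over; this example assumes that means the BA with the lowest ID. Failure detection is omitted, and all class, method, and site names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BuddyAgent:
    agent_id: int
    status: dict = field(default_factory=dict)  # last snapshot sent by the MA


class MasterAgent:
    def __init__(self, status=None):
        self.status = dict(status or {})
        self.buddies = []

    def add_buddy(self):
        # The MA assigns a unique ID to each BA.
        buddy = BuddyAgent(agent_id=len(self.buddies) + 1)
        self.buddies.append(buddy)
        return buddy

    def push_status(self):
        # The MA sends the status of the framework to every BA.
        for buddy in self.buddies:
            buddy.status = dict(self.status)


def fail_over(buddies, query_servers):
    """Promote the lowest-ID BA (an assumption) and rebuild the status."""
    successor = min(buddies, key=lambda b: b.agent_id)
    new_master = MasterAgent(successor.status)
    new_master.status.update(query_servers())  # contact the ComputeServers
    new_master.buddies = [b for b in buddies if b is not successor]
    return new_master


ma = MasterAgent({"site-1": "up"})
for _ in range(3):
    ma.add_buddy()
ma.push_status()

# The MA fails; the remaining BAs elect a replacement.
new_ma = fail_over(ma.buddies, lambda: {"site-2": "up"})
```

Starting from the replicated snapshot and then querying the ComputeServers means the new MA can recover entries that changed after the last status push.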

19 Computation Tasks and the Associated Cost

