DCC Workshop Input from Computing Coordination


Simone Campana, Torre Wenaus (ATLAS Week, 13/03/17)

Introduction

The mandate for this talk: "We thought that this could be a blue skies talk looking at Run 3 and beyond, what does and doesn't scale, event streaming services, etc. How could we reimagine data management and production systems interacting in the future, what granularity will we need to control data processing for HPC workflows, etc."

... and so this is (not) what I will talk about.

Introduction

- We are defining the roadmap for a Run-4 computing model.
- We favour an adiabatic approach, in which new concepts and components are prototyped and evaluated in the context of the existing ecosystem.
- Such components can go into production well before Run 4, or be abandoned. This is how we think R&D should be done.
- Data Curation and Characterization (DCC) plays a central role in the model we are defining.

Axioms (https://en.wikipedia.org/wiki/Axiom)

- The granularity of our data processing is the event: we process one event at a time. We organize events in files because that is what file systems support, files in datasets because it is practical, and datasets in containers to characterize the data.
- Most of our data is "cold": we write it once and access it O(10) times, peaked in time. Treating all data as equally "hot" has a cost; treating all metadata as "hot" has a cost.
- Most of our data is reproducible: RAW data is not, everything else is. "Very complicated, labor intensive, organizationally expensive, error prone to reproduce" == "irreproducible". Treating all data as irreproducible has a cost.
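To make the first axiom concrete, here is a minimal sketch (plain Python, not ATLAS code) of events grouped in files, files in datasets, and datasets in containers; class and field names are illustrative only.

```python
# Minimal sketch of the event/file/dataset/container organization.
# Names are illustrative, not an existing ATLAS data model.
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileRecord:
    lfn: str                       # logical file name
    event_count: int               # events stored in this file
    access_count: int = 0          # "cold" data: expected to stay O(10) over its lifetime


@dataclass
class Dataset:
    name: str
    files: List[FileRecord] = field(default_factory=list)


@dataclass
class Container:
    name: str                      # characterizes the data (campaign, stream, ...)
    datasets: List[Dataset] = field(default_factory=list)

    def total_events(self) -> int:
        return sum(f.event_count for d in self.datasets for f in d.files)
```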

Events

- The Event Service today works with fine-grained data at the event (range) level. Presently this fine granularity is short-lived.
- DCC could take on a scalable, flexible, extensible means of recording such fine-grained information (quasi) persistently.
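As a rough illustration of what recording fine-grained information "(quasi) persistently" could mean, the sketch below keeps event-range records in a local SQLite table. The schema, column names and status values are assumptions, not an existing DCC or Event Service interface.

```python
# Illustrative only: a quasi-persistent store for event-range records.
import sqlite3


def open_event_range_store(path="event_ranges.db"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS event_ranges (
            dataset     TEXT,
            file_lfn    TEXT,
            first_event INTEGER,
            last_event  INTEGER,
            status      TEXT,      -- e.g. 'queued', 'processed', 'merged'
            annotation  TEXT,
            PRIMARY KEY (dataset, file_lfn, first_event, last_event)
        )""")
    return conn


def record_range(conn, dataset, file_lfn, first, last, status, annotation=""):
    # Insert or update the record for one event range.
    conn.execute(
        "INSERT OR REPLACE INTO event_ranges VALUES (?, ?, ?, ?, ?, ?)",
        (dataset, file_lfn, first, last, status, annotation),
    )
    conn.commit()
```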

Event Streaming and Caching

- The Event Streaming Service would be the complement of the Event Service, for asynchronous delivery of data to be processed at fine granularity: events or event collections.
- A first implementation could deliver client-side pre-fetch (illustrated below). In a more sophisticated scenario, a central intelligence mediates the dialog between the WFMS and the DDM systems, complementing the data transfer capabilities of DDM with a server-side data access system.
- It can be complemented with hierarchical caching based on data meta-information and access pattern statistics.
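The simple client-side pre-fetch scenario could look roughly like this: while one event range is being processed, the next is downloaded in the background. fetch_range and process_range are placeholders for whatever the real payload and delivery layer provide.

```python
# Sketch of client-side pre-fetch: overlap fetching of the next event
# range with processing of the current one.
from concurrent.futures import ThreadPoolExecutor


def stream_and_process(ranges, fetch_range, process_range):
    """Process event ranges while pre-fetching the next one in a background thread."""
    if not ranges:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_range, ranges[0])   # prefetch the first range
        for nxt in ranges[1:]:
            data = future.result()                     # wait for the current range
            future = pool.submit(fetch_range, nxt)     # start fetching the next one
            process_range(data)                        # process while the fetch proceeds
        process_range(future.result())                 # last range
```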

The DCC "whiteboard"

- Supplements the capability to "annotate" meta-information.
- 'Data in play' whiteboard: at any given time a subset of datasets/collections is 'in play', in use in the system. As this data is manipulated, replicated etc., knowledge about it could be dynamically cached in a whiteboard at the event-collection level.
- The whiteboard can flexibly receive arbitrary information associated with particular event collections (e.g. via tags). Information can be auto-generated by the system or supplied by users.
- Usage example: physics collections in use by an analysis group could be annotated as such in the whiteboard, with consequent special treatment (see the sketch below).
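A deliberately minimal sketch of such a whiteboard, assuming nothing more than tag-based key/value annotations on event collections; the class, method and collection names are invented for illustration.

```python
# Toy 'data in play' whiteboard: arbitrary annotations per collection, tag lookup.
from collections import defaultdict


class Whiteboard:
    def __init__(self):
        self._notes = defaultdict(dict)   # collection -> {key: value}
        self._tags = defaultdict(set)     # tag -> collections carrying that tag

    def annotate(self, collection, key, value, tags=()):
        """Attach a piece of information (auto-generated or user-supplied)."""
        self._notes[collection][key] = value
        for tag in tags:
            self._tags[tag].add(collection)

    def by_tag(self, tag):
        """Collections currently annotated with a given tag, with their notes."""
        return {c: dict(self._notes[c]) for c in self._tags.get(tag, set())}


# Example: mark a collection used by an analysis group for special treatment.
wb = Whiteboard()
wb.annotate("data17.physics_Main.DAOD_EXAMPLE", "access_pattern", "hot",
            tags=["analysis:example_group", "in_play"])
print(wb.by_tag("analysis:example_group"))
```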

The DCC "whiteboard"

- "Processing in play" whiteboard: tasks currently (or recently) in active processing could also benefit from a whiteboard, e.g. for monitoring purposes. Others could add to task whiteboard entries with annotations and information.
- A "request level" whiteboard would have its uses as well. It could cross-correlate prodsys requests with spreadsheet entries managed by PMG/MC production (via a programmatic API) to automate the refreshing of information on submission and processing status (sketched below).
- Requests associated with a particular analysis group, paper or CP group could be tagged as such, and this tagging propagated through the downstream processing and data products.
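A hypothetical sketch of the request-level cross-correlation: pull the status of prodsys requests and push it back into the spreadsheet rows managed by PMG/MC production. fetch_prodsys_status and update_row stand in for real programmatic APIs, which are not specified here.

```python
# Hypothetical sketch: refresh spreadsheet rows from prodsys request status.

def fetch_prodsys_status(request_id):
    """Placeholder for a call to the production system's programmatic API."""
    # Dummy payload so the sketch runs end to end.
    return {"state": "running", "events_done": 0}


def refresh_spreadsheet(rows, update_row):
    """For each row carrying a prodsys request id, refresh its status columns."""
    for row in rows:
        request_id = row.get("prodsys_request")
        if not request_id:
            continue                                    # row not linked to a request
        status = fetch_prodsys_status(request_id)
        update_row(row["row_number"], {"status": status["state"],
                                       "events_done": status["events_done"]})


# Example usage with an in-memory "spreadsheet".
rows = [{"row_number": 2, "prodsys_request": 12345}]
refresh_spreadsheet(rows, update_row=lambda n, values: print(n, values))
```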

The DCC "whiteboard"

- The definition of "in play" is flexible. Annotations can be made persistent after the entity they refer to is no longer "in play".
- If we accept the concept of hierarchical storage, we have to accept the concept of hierarchical meta-storage (different latencies for different levels, sketched below).
- How many whiteboards do we need, which technology, which architecture? You do not expect me to answer all that, right?
- The "whiteboard" approach is an R&D/exploratory approach that gives an easy means of adding and accessing information, to play with possible uses and quickly prototype an idea.
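One way to picture hierarchical meta-storage, assuming three tiers of increasing latency with promotion on access; the tier backends are plain dicts, purely for illustration.

```python
# Sketch of tiered metadata lookup: hot in-memory, warm store, cold archive.

class TieredMetadataStore:
    def __init__(self, hot, warm, cold):
        self.tiers = [hot, warm, cold]        # ordered from lowest to highest latency

    def get(self, key):
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                if level > 0:                 # promote so repeated access is cheap
                    self.tiers[0][key] = value
                return value
        raise KeyError(key)


store = TieredMetadataStore(hot={}, warm={},
                            cold={"mc16_13TeV.example.sample": {"campaign": "MC16c"}})
print(store.get("mc16_13TeV.example.sample"))   # fetched from the cold tier, cached hot
```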

More ideas and use cases

- DCC as an essential piece in ensuring reproducibility; it needs to be implemented consistently across components.
- DCC as a great opportunity to reduce complexity: simple decisions like "what to call a campaign: MC16c or MC17" are today forced upon us by complexity (information being sometimes dispersed, eventually consistent, or hard coded).
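As a toy example of the kind of information a DCC layer might keep consistently for reproducibility, a minimal provenance record could look like the following; the field names are assumptions, not an existing ATLAS schema.

```python
# Toy provenance record: the minimum needed to regenerate a reproducible
# (non-RAW) dataset after it has been dropped. Field names are assumptions.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    output_dataset: str
    input_datasets: List[str]
    software_release: str                         # release used to produce the data
    transform: str                                # production step / transform name
    configuration: Dict[str, str] = field(default_factory=dict)
    campaign: str = ""                            # e.g. "MC16c": recorded once, not hard coded downstream
```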