The LHC Computing Models and Concepts Continue to Evolve Rapidly: How to Respond?


PhEDEx (CMS): Physics Experiment Data Export
• Pull model: moderate the download rate to match the storage capabilities
• Tier2 sites for download still selected manually
• Download agents communicate via a database ["blackboard"]
• Original assumption: the network is a scarce and fragile resource
• Now need to adapt to higher-speed, more reliable networks
• Desired upgrades (Wildish):
  – Faster switching among source sites
  – CMS-wide scheduling to avoid competition on shared links and end points
  – Wider range of use cases: possibly include downloads directly to desktops
  – Dynamic circuits to ensure and control bandwidth
• Hundreds of petabytes transferred since it entered use
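The pull model and the database "blackboard" can be illustrated with a small sketch, assuming a pre-existing task table; this is a rough illustration, not PhEDEx's actual agents or schema, and the table and column names are assumptions. Each download agent claims pending transfer tasks for its site from the shared table and paces downloads to what the local storage can absorb.

```python
# Minimal sketch of a "blackboard" pull model, assuming an existing SQLite table
# transfer_tasks(id, source_url, size_bytes, dest_site, state, agent).
import sqlite3
import time

def claim_next_task(conn, site):
    """Mark one pending task for this site as active and return it (or None)."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE transfer_tasks SET state = 'active', agent = ? "
        "WHERE id = (SELECT id FROM transfer_tasks "
        "            WHERE state = 'pending' AND dest_site = ? LIMIT 1)",
        (site, site),
    )
    conn.commit()
    cur.execute(
        "SELECT id, source_url, size_bytes FROM transfer_tasks "
        "WHERE state = 'active' AND agent = ? LIMIT 1", (site,))
    return cur.fetchone()

def run_agent(db_path, site, storage_write_bytes_per_s):
    """Pull loop: take one task at a time, pacing downloads to the storage rate."""
    conn = sqlite3.connect(db_path)
    while True:
        task = claim_next_task(conn, site)
        if task is None:
            time.sleep(30)                    # blackboard empty; poll again later
            continue
        task_id, source_url, size_bytes = task
        # A real agent would download source_url here; we only model the pacing
        # that keeps the download rate matched to the storage capability.
        time.sleep(size_bytes / storage_write_bytes_per_s)
        conn.execute("UPDATE transfer_tasks SET state = 'done' WHERE id = ?",
                     (task_id,))
        conn.commit()
```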

Use Cases (Kaushik De)
1. Faster User Analysis
  • Analysis jobs normally go to sites with local data: this sometimes leads to long wait times due to queuing
  • Could use network information to assign work to 'nearby' sites with idle CPUs and good connectivity
2. Cloud Selection
  • Tier2s are connected to Tier1 "clouds" manually by the ops team (a Tier2 may be attached to multiple Tier1s)
  • To be automated using network information: algorithm under test
3. PD2P = PanDA Dynamic Data Placement: asynchronous, usage-based
  • Repeated use of data or a backlog in processing → make additional copies
  • Rebrokerage of queues → new data locations
  • PD2P is perfect for network integration
  • Use network for site selection – to be tested soon
  • Try SDN provisioning since this usually involves large datasets; requires some dedicated network capacity
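As a rough illustration of use case 1, the sketch below scores candidate sites by idle CPUs, queue depth and measured network throughput from the site holding the input data, so work can move to a "nearby" idle site instead of waiting behind a long local queue. The field names and weights are assumptions for the example, not PanDA's actual brokerage code.

```python
# Hedged sketch of network-aware site brokerage for analysis jobs.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    idle_cpus: int          # currently free job slots
    queued_jobs: int        # jobs already waiting at the site
    throughput_mbps: float  # recent measured rate from the data-source site

def brokerage_score(site: Site, w_cpu=1.0, w_queue=2.0, w_net=0.05) -> float:
    """Reward idle CPUs and good connectivity; penalize long queues."""
    return (w_cpu * site.idle_cpus
            - w_queue * site.queued_jobs
            + w_net * site.throughput_mbps)

def choose_site(candidates: list[Site]) -> Site:
    """Pick the best site; jobs need not run only where the data already sits."""
    return max(candidates, key=brokerage_score)

if __name__ == "__main__":
    sites = [
        Site("T2_Local_With_Data", idle_cpus=5,   queued_jobs=800, throughput_mbps=10000),
        Site("T2_Nearby_Idle",     idle_cpus=400, queued_jobs=20,  throughput_mbps=4000),
    ]
    print(choose_site(sites).name)   # the nearby idle site wins despite remote data
```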

CMS: Location-Independent Access: Blurring the Boundaries Among Sites (Maria Girone)
• Once the archival functions are separated from the Tier-1 sites, the functional difference between the Tier-1 and Tier-2 sites becomes small
• Connections and functions of sites are defined by their capability, including the network!
[Diagram: the "Cloud" model linking the Tier-0, CAF, Tier-1 and Tier-2 sites]
• Scale tests ongoing. Goal: 20% of data across the wide area; 200k jobs/day, 60k files/day, O(100 TB)/day
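A minimal sketch of what location-independent access can mean for an application (the redirector hostname and local namespace prefix are hypothetical): open the file from local storage if it is on site, otherwise fall back to a federation-style URL and read it over the wide area.

```python
# Sketch of location-independent access: local path first, remote URL as fallback.
import os

FEDERATION_REDIRECTOR = "root://global-redirector.example.org/"   # hypothetical

def resolve_input(lfn: str, local_prefix: str = "/storage/cms") -> str:
    """Return a path or URL an application could open for this logical file name."""
    local_path = os.path.join(local_prefix, lfn.lstrip("/"))
    if os.path.exists(local_path):
        return local_path                       # data happens to be local
    return FEDERATION_REDIRECTOR + lfn          # read over the WAN instead

if __name__ == "__main__":
    print(resolve_input("/store/data/Run2015/example.root"))
```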

• T2s in several regions are getting ~an order of magnitude more data from BNL than the associated T1s
[Plot of transfer volumes for the CA, DE, UK and FR T1s and T2s]
• 2H 2013 volume was ~twice that of 1H 2012, even without data taking
• Exponential growth in data transfers continues, driven by Tier2 data usage
• Expect new peaks by and during LHC Run 2

ANA-100 Link in Service July 16. Transfer Rates: Caltech Tier2 to Europe, July 17
● Peak upload rate: 26.9 Gbps
● Average upload rate over 1h of manual transfer requests: 23.4 Gbps
● Average upload rate over 2h (1h manual + 1h automatic): 20.2 Gbps
● Peak rate to CNAF alone: 20 Gbps

Transfers Caltech → Europe elevate usage of Internet2 to >40% occupancy on some segments

Internet2 Network Map; AL2S Traffic Statistics
• Traffic peak (in Gbps) on the Phoenix - LA segment observed during these transfers: a possible limiting factor on the traffic received at Caltech
• Microbursts are often not reported by the monitoring clients
• Message: at anywhere near this level of capability, we need to control our network use, to prevent saturation as we move into production.
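One hedged sketch of "controlling our network use": a token-bucket throttle that a bulk-transfer tool could apply so its flows stay under an agreed ceiling instead of saturating shared segments. The 20 Gbps cap in the usage line is an illustrative policy value, not a figure from the slides.

```python
# Simple token-bucket rate limiter for self-throttling bulk transfers.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def consume(self, nbytes: int) -> None:
        """Block until nbytes may be sent without exceeding the configured rate."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Usage: call bucket.consume(len(chunk)) before each socket send of `chunk`.
bucket = TokenBucket(rate_bytes_per_s=20e9 / 8, burst_bytes=64 * 1024 * 1024)
```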

W. Johnston, ESnet Manager (2008), on circuit-oriented network services:
• Traffic isolation; security; deadline scheduling; high utilization; fairness
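Deadline scheduling on a circuit can be illustrated with a toy admission test (not ESnet's or OSCARS' actual scheduler): order the reserved transfers by deadline and admit a new request only if every deadline is still met at the circuit's bandwidth.

```python
# Earliest-deadline-first feasibility check for admitting a bulk transfer
# onto a bandwidth-guaranteed circuit.
from dataclasses import dataclass

@dataclass
class Transfer:
    size_gb: float
    deadline_s: float    # seconds from now

def admissible(existing: list[Transfer], new: Transfer, gbps: float) -> bool:
    """Accept `new` only if all transfers, run in deadline order, finish on time."""
    plan = sorted(existing + [new], key=lambda t: t.deadline_s)
    finish = 0.0
    for t in plan:
        finish += t.size_gb * 8 / gbps      # seconds to move t at full circuit rate
        if finish > t.deadline_s:
            return False
    return True

# Example: a 900 GB transfer due in 1 h already booked; can a 500 GB one due in 2 h fit?
print(admissible([Transfer(900, 3600)], Transfer(500, 7200), gbps=10))   # True
```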

DYNES: Dynamic Circuits Nationwide System. Created by Caltech, led by Internet2
• The DYNES goal is to extend circuit capabilities to ~50 US campuses; this turns out to be nontrivial
• The functionality will be an integral part of the LHCONE point-to-point service: an opportunity, via SDN (OpenFlow and OpenDaylight)
• Partners: I2, Caltech, Michigan, Vanderbilt
• Working with ESnet on dynamic circuit software: extending the OSCARS scope; transition from DRAGON to PSS and OESS
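From a campus client's point of view, requesting a dynamic point-to-point circuit might resemble the sketch below; the endpoint URL, JSON fields and state names are hypothetical placeholders rather than the real OSCARS, OESS or NSI interfaces.

```python
# Hypothetical client for a dynamic-circuit provisioning service: request a
# point-to-point circuit with a bandwidth and time window, then poll until active.
import json
import time
import urllib.request

SERVICE = "https://circuit-service.example.net/api/reservations"   # hypothetical

def reserve_circuit(src: str, dst: str, gbps: float, start: int, end: int) -> str:
    body = json.dumps({"src": src, "dst": dst, "bandwidth_gbps": gbps,
                       "start": start, "end": end}).encode()
    req = urllib.request.Request(SERVICE, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["reservation_id"]

def wait_until_active(reservation_id: str, poll_s: int = 10) -> None:
    """Poll the (assumed) reservation status endpoint until the circuit is up."""
    while True:
        with urllib.request.urlopen(f"{SERVICE}/{reservation_id}") as resp:
            if json.load(resp)["state"] == "ACTIVE":
                return
        time.sleep(poll_s)
```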

HEP Energy Frontier Computing: Decadal Retrospective and Outlook for 2020(+)
• Resources & challenges grow at different rates. Compare Tevatron vs. LHC ( ):
  – Computing capacity/experiment: 30+ X
  – Storage capacity: X
  – Data served per day: 400 X
  – WAN capacity to host lab: 100 X
  – Transatlantic network transfers per day: 100 X
• Challenge: 100+ X the storage (tens of EB) is unlikely to be affordable → need to better use the technology:
  – An agile architecture exploiting globally distributed clouds, grids, specialized (e.g. GPU) & opportunistic resources
  – A services system that provisions all of it, moves the data more flexibly and dynamically, and behaves coherently
  – Co-scheduling of network, CPU and storage
• Snowmass Computing Frontier sessions: challenges shared by sky survey, dark matter and CMB experiments. SKA: 300 - 1500 petabytes per year; several Pbps to the correlators
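Co-scheduling of network, CPU and storage can be sketched as a joint feasibility check; the resource names, units and threshold logic below are illustrative assumptions, not any experiment's scheduler. A task is placed only at a site where all three resources can be reserved together.

```python
# Joint CPU + storage + network placement check.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SiteResources:
    name: str
    free_cpu_slots: int
    free_storage_tb: float
    free_path_gbps: float     # spare bandwidth on the path from the data source

@dataclass
class Task:
    cpu_slots: int
    input_tb: float
    needed_gbps: float        # rate needed to stage input before CPUs go idle

def co_schedule(task: Task, sites: list[SiteResources]) -> Optional[SiteResources]:
    """Return the first site where CPU, storage and network can all be booked."""
    for s in sites:
        if (s.free_cpu_slots >= task.cpu_slots
                and s.free_storage_tb >= task.input_tb
                and s.free_path_gbps >= task.needed_gbps):
            return s
    return None
```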

Key Issue and Approach to a Solution: Next-Generation System for Data-Intensive Research
• Present solutions will not scale
• We need: an agile architecture exploiting globally distributed grid, cloud, specialized (e.g. GPU) & opportunistic computing resources
• A services system that moves the data flexibly and dynamically, and behaves coherently
• Examples do exist, with smaller but still very large scope:
  – A pervasive, agile autonomous agent architecture that deals with complexity
  – Developed by talented system developers with a deep appreciation of networks
[Figures: MonALISA grid job lifelines and grid topology; automated transfers on dynamic networks; the ALICE Grid in MonALISA]
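The autonomous-agent pattern can be sketched as follows; this is an illustration of the pattern only, not MonALISA code, and the registry, metrics and site names are assumptions. Each agent periodically measures local conditions, publishes them to a shared registry, and makes local routing decisions from the global picture without a central controller in the loop.

```python
# Toy autonomous monitoring agents sharing state through a registry.
import random
import time

class MonitoringAgent:
    def __init__(self, site: str, registry: dict):
        self.site = site
        self.registry = registry   # in-memory stand-in for a directory service

    def measure(self) -> dict:
        # Real agents would probe queues, storage and link rates; we fake it.
        return {"load": random.random(), "link_gbps_free": random.uniform(0, 100)}

    def step(self) -> None:
        self.registry[self.site] = self.measure()        # publish local state
        # Local decision from the global picture: prefer to push new work/data
        # toward the least-loaded peer currently advertised in the registry.
        target = min(self.registry, key=lambda s: self.registry[s]["load"])
        print(f"{self.site}: would route new transfers toward {target}")

registry: dict = {}
agents = [MonitoringAgent(s, registry) for s in ("CERN", "Caltech", "BNL")]
for _ in range(2):
    for a in agents:
        a.step()
    time.sleep(0.1)
```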

Message on Science Drivers, Discoveries and Global Impact
• Reaching for the next level of knowledge. New "invariants": (1) data + network intensiveness, (2) global reach
• Instruments with unprecedented reach (energy, intensity; speed & scope of investigations; dealing with complexity; precision)
• Mission focus: envisage and develop the solutions (physics); design and build new detectors, instruments, methods; design and build the systems
• Persistence: programs lasting years; Ph.D. units; decades
• The imperative of new models: vastly greater operational efficiency leads to greater science output … and discoveries
• Key questions for the scientist/designer/discoverer: how to solve the problems; how to bring the discoveries in reach
• Grappling with many fields of science, and many fields of technology
• The art and science of (network) requirements:
  – Bottom up: fixed budgets
  – Top down: evolution and revolution, to the next generation
  – Tracking emerging technologies, capabilities, affordability
  – Asking the right question: maximizing capability within the budget