
Big Data Processing on the Grid: Future Research Directions
A. Vaniachine
XXIV International Symposium on Nuclear Electronics & Computing, Varna, Bulgaria, 9-16 September 2013


A Lot Can Be Accomplished in 50 Years: Nuclear Energy Took 50 Years from Discovery to Use
• 1896: Becquerel discovered radioactivity
• 1951: Reactor at Argonne generated electricity for light bulbs

A Lot Has Happened in 14 Billion Years
Electroweak phase transition
• Everything is a remnant of the Big Bang, including the energy we use:
  – Chemical energy: the scale is eV; stored millions of years ago
  – Nuclear energy: the scale is MeV, a million times higher than chemical; stored billions of years ago
  – Electroweak energy: the scale is 100 GeV, or 100,000 times higher than nuclear; stored right after the Big Bang
  – Can this energy be harnessed in some useful way?
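A quick check of the scale ratios quoted on this slide (a worked arithmetic note, not part of the original slide):

```latex
\[
\frac{E_{\text{nuclear}}}{E_{\text{chemical}}} \sim \frac{1\,\mathrm{MeV}}{1\,\mathrm{eV}} = 10^{6},
\qquad
\frac{E_{\text{electroweak}}}{E_{\text{nuclear}}} \sim \frac{100\,\mathrm{GeV}}{1\,\mathrm{MeV}} = 10^{5}
\]
```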

2012: Higgs Boson Discovery
JHEP 08 (2012) 098
Meta-stability: a prerequisite for energy use

Higgs Boson Study Makes the LHC a Top Priority
• European Strategy
• US Snowmass Study

The LHC Roadmap

Big Data
LHC RAW data per year
• In 2010 the LHC experiments produced 13 PB of data
  – That rate outstripped any other scientific effort going on

Big Data
LHC RAW data per year; WLCG data on the Grid
• In 2010 the LHC experiments produced 13 PB of data
  – That rate outstripped any other scientific effort going on
• LHC RAW data volumes are inflated by storing derived data products, by replication for safety and efficient access, and by the need to store even more simulated data than RAW data

Big Data
LHC RAW data per year; WLCG data on the Grid
• In 2010 the LHC experiments produced 13 PB of data
  – That rate outstripped any other scientific effort going on
• LHC RAW data volumes are inflated by storing derived data products, by replication for safety and efficient access, and by the need to store even more simulated data than RAW data (a rough storage sketch follows this slide)
• Scheduled LHC upgrades will increase RAW data-taking rates tenfold
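A minimal back-of-the-envelope sketch of how RAW volume inflates into total Grid storage, as described on this slide. The multipliers for derived products, replication, and simulation are illustrative assumptions, not numbers from the talk:

```python
def grid_storage_estimate(raw_pb_per_year,
                          derived_factor=2.0,      # assumed ratio of derived data to RAW
                          replication_factor=2.0,  # assumed average number of replicas
                          simulation_factor=1.5):  # assumed simulated-to-RAW ratio
    """Rough estimate of total storage driven by one year of RAW data.

    All factors are illustrative; real LHC numbers vary by experiment,
    data format, and replication policy.
    """
    derived = raw_pb_per_year * derived_factor
    simulated = raw_pb_per_year * simulation_factor
    return (raw_pb_per_year + derived + simulated) * replication_factor

# 2010: ~13 PB of RAW data (from the slide); a tenfold rate increase after upgrades
print(grid_storage_estimate(13))        # storage implied by 13 PB/year of RAW
print(grid_storage_estimate(13 * 10))   # same toy model after a tenfold increase
```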

Big Data
A brute-force approach to scaling up Big Data processing on the Grid for LHC upgrade needs is not an option

Physics Facing Limits
• The demands on computing resources to accommodate the Run 2 physics needs are increasing
  – HEP now risks compromising physics because of a lack of computing resources, which has not been true for ~20 years
  (From the I. Bird presentation at the "HPC and super-computing workshop for Future Science Applications", BNL, June 2013)
• From the US Snowmass Study: "The limits are those of tolerable cost for storage and analysis. Tolerable cost is established in an explicit or implicit optimization of physics dollars for the entire program. The optimum rate of data to persistent storage depends on the capabilities of technology, the size and budget of the total project, and the physics lost by discarding data. There is no simple answer!"
• Physics needs drive future research directions in Big Data processing on the Grid (a toy illustration follows this slide)
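The "optimization of physics dollars" quoted above can be made concrete with a toy model. The cost and physics-value functions below are purely illustrative assumptions, not anything from the Snowmass Study:

```python
import math

def best_rate_to_storage(rates_hz, physics_value, storage_cost_per_hz, budget):
    """Pick the highest-value recording rate that still fits the budget.

    rates_hz            -- candidate rates of data to persistent storage
    physics_value(rate) -- assumed model of physics reach at a given rate
    storage_cost_per_hz -- assumed cost (arbitrary units) per unit rate
    budget              -- total tolerable cost
    """
    affordable = [r for r in rates_hz if r * storage_cost_per_hz <= budget]
    return max(affordable, key=physics_value, default=None)

# Toy example: diminishing physics returns at higher rates (illustrative only)
rates = [200, 400, 600, 800, 1000]
value = lambda r: math.log(r)   # assumed diminishing returns
print(best_rate_to_storage(rates, value, storage_cost_per_hz=1.0, budget=700))
```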


US Big Data Research and Development Initiative
• At the time of the "Big Data Research and Development Initiative" announcement, a $200 million investment in tools to handle the huge volumes of digital data needed to spur U.S. science and engineering discoveries, two examples of successful HEP technologies were already in place:
  – PanDA (Production and Distributed Analysis), a workload management system, and XRootD, high-performance, fault-tolerant software for fast, scalable access to data repositories of many kinds
• Supported by the DOE Office of Advanced Scientific Computing Research, PanDA is now being generalized and packaged, as a workload management system already proven at extreme scales, for wider use by the Big Data community
  – Progress in this project was reported by A. Klimentov earlier in this session

Synergistic Challenges
• As HEP faces Big Data processing challenges ahead of other sciences, it is instructive to look for commonalities in the discovery process across the sciences
  – In 2013 a subcommittee of the US DOE Advanced Scientific Computing Advisory Committee prepared the Summary Report on Synergistic Challenges in Data-Intensive Science and Exascale Computing

Knowledge-Discovery Life-Cycle for Big Data: 1
Data may be generated by instruments, experiments, sensors, or supercomputers

Knowledge-Discovery Life-Cycle for Big Data: 2
(Re)organizing, processing, deriving subsets, reduction, visualization, query analytics, distribution, and other aspects
In LHC experiments, this includes common operations on and derivations from raw data. The output of data processing is used by thousands of scientists for knowledge discovery.

Knowledge-Discovery Life-Cycle for Big Data: 3
Although the discovery process can be quite specific to the scientific problem under consideration, repeated evaluations, what-if scenarios, predictive modeling, correlations, causality, and other mining operations at scale are common in this phase
Given the size and complexity of the data and the need for both top-down and bottom-up discovery, scalable algorithms and software need to be deployed in this phase

Knowledge-Discovery Life-Cycle for Big Data: 4
Insights and discoveries from previous phases determine new simulations, models, parameters, settings, and observations, thereby closing the loop (a schematic sketch follows this slide)
While this represents a common high-level approach to data-driven knowledge discovery, there can be important differences among sciences in how data is produced, consumed, stored, processed, and analyzed
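A schematic sketch of the four-phase closed loop described on these slides. The functions are hypothetical stand-ins for experiment-specific steps, shown only to make the loop structure concrete:

```python
import random

# Hypothetical stand-ins for experiment-specific steps (illustrative only)
def generate_data(settings):          # phase 1: instruments, sensors, simulation
    return [random.gauss(settings["target"], 1.0) for _ in range(1000)]

def process_data(raw):                # phase 2: reduction, derivation, distribution
    return sum(raw) / len(raw)        # reduce to a derived summary quantity

def analyze(derived):                 # phase 3: mining, what-if, predictive modeling
    return {"estimate": derived}

def refine_settings(settings, insights):   # phase 4: close the loop
    return {"target": insights["estimate"]}

def knowledge_discovery_loop(settings, iterations=3):
    """Schematic closed loop: generate -> process -> analyze -> refine."""
    for _ in range(iterations):
        raw = generate_data(settings)
        derived = process_data(raw)
        insights = analyze(derived)
        settings = refine_settings(settings, insights)
    return settings

print(knowledge_discovery_loop({"target": 0.0}))
```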

Data-Intensive Science Workflow
• The Summary Report identified an urgent need to simplify the workflow for data-intensive science
  – Analysis and visualization of increasingly large-scale data sets will require integration of the best computational algorithms with the best interactive techniques and interfaces
  – The workflow for data-intensive science is complicated by the need to simultaneously manage large volumes of data as well as large amounts of computation to analyze the data, and this complexity is increasing at an inexorable rate
• These complications can greatly reduce the productivity of the domain scientist if the workflow is not simplified and made more flexible
  – For example, the workflow should transparently support decisions such as when to move data to computation or computation to data (see the sketch after this list)
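A minimal sketch of the data-versus-computation placement decision mentioned in the last bullet. The cost model and the example numbers are illustrative assumptions; a real workflow system would consider many more factors:

```python
def move_data_or_compute(data_size_tb, wan_bandwidth_tb_per_h,
                         remote_queue_wait_h, local_queue_wait_h):
    """Decide whether to move data to the compute site or send the job to the data.

    Compares the estimated time to transfer the data plus wait in the remote
    queue against simply waiting in the queue at the site holding the data.
    A real system would also model CPU availability, storage quotas, and
    failure rates.
    """
    move_data_time = data_size_tb / wan_bandwidth_tb_per_h + remote_queue_wait_h
    move_compute_time = local_queue_wait_h
    return "move data" if move_data_time < move_compute_time else "move computation"

# Example: 50 TB input, 2 TB/h WAN throughput, 4 h remote wait vs 40 h local wait
print(move_data_or_compute(50, 2.0, 4.0, 40.0))   # -> "move data"
```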

Lessons Learned
• The distributed computing environment for the LHC has proved to be a formidable resource, giving scientists access to huge resources that are pooled worldwide and largely automatically managed
  – However, the scale of operational effort required is burdensome for the HEP community and will be hard to replicate in other science communities. Could the current HEP distributed environments be used as a distributed-systems laboratory to understand how more robust, self-healing, self-diagnosing systems could be created?
• Indeed, Big Data processing on the Grid must tolerate a continuous stream of failures, errors, and faults
  – Transient job failures on the Grid can be recovered by managed retries; however, workflow checkpointing at the level of a file or a job delays turnaround times (a retry sketch follows this list)
• Advances in reliability engineering provide a framework for a fundamental understanding of Big Data processing turnaround time
  – Designing fault-tolerance strategies that minimize the duration of Big Data processing on the Grid is an active area of research
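A minimal sketch of managed retries for transient Grid job failures, as described above. The job interface and the transient/permanent error split are assumptions for illustration, not the actual PanDA mechanism:

```python
import time

TRANSIENT_ERRORS = {"worker node lost", "storage timeout", "network glitch"}  # assumed

def run_with_retries(run_job, max_retries=3, backoff_s=60):
    """Re-submit a job while its failures look transient.

    run_job() is a hypothetical callable returning (ok, error_reason);
    a real workload manager would also track attempt history and
    avoid persistently failing sites.
    """
    for attempt in range(1, max_retries + 1):
        ok, reason = run_job()
        if ok:
            return "done"
        if reason not in TRANSIENT_ERRORS:
            return f"failed permanently: {reason}"
        time.sleep(backoff_s * attempt)   # back off before the managed retry
    return "failed after retries"

# Usage with a fake job that fails once with a transient error, then succeeds
attempts = iter([(False, "storage timeout"), (True, None)])
print(run_with_retries(lambda: next(attempts), backoff_s=0))   # -> "done"
```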

Future Research Direction: Workflow Management
• To significantly shorten the time needed to transform scientific data into actionable knowledge, the US DOE Advanced Scientific Computing Research office is preparing a call that will include:
(From the R. Carlson presentation at the "HPC and super-computing workshop for Future Science Applications", BNL, June 2013)

Maximizing Physics Output through Modeling
• In preparations for LHC data taking, future networking was perceived as a limit
  – The MONARC model serves as an example of how to circumvent a resource limitation: WLCG implemented a hierarchical data flow maximizing reliable data transfers
• Today networking is not a limit, and WLCG abandoned the hierarchy
  – There are no fundamental technical barriers to transporting 10x more traffic within 4 years
• In contrast, future CPU and storage are perceived as a limit
  – HEP now risks compromising physics because of a lack of computing resources. As in the days of MONARC, HEP needs comprehensive modeling capabilities that would enable maximizing physics output within the resource constraints (a toy model follows this slide)
(Picture by I. Bird)
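A toy sketch of the kind of resource-constrained modeling called for above: how many events can be processed per year given CPU and storage budgets. The numbers and the linear scaling are illustrative assumptions only:

```python
def max_events_per_year(cpu_core_hours, storage_pb,
                        cpu_h_per_event=0.01,   # assumed core-hours per event
                        event_size_mb=1.5):     # assumed event size on disk
    """Events/year allowed by the tighter of the CPU and storage constraints."""
    cpu_limit = cpu_core_hours / cpu_h_per_event
    storage_limit = storage_pb * 1e9 / event_size_mb   # 1 PB = 1e9 MB
    return min(cpu_limit, storage_limit)

# Illustrative budgets: 2e9 core-hours and 300 PB of storage
print(f"{max_events_per_year(2e9, 300):.3e} events/year")
```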

Future Research Direction: Workflow Modeling
(From the R. Carlson presentation at the "HPC and super-computing workshop for Future Science Applications", BNL, June 2013)

Conclusions
• The study of Higgs boson properties is a top priority for LHC physics
  – LHC upgrades increase demands for computing resources beyond flat budgets; HEP now risks compromising physics because of a lack of computing resources
• A comprehensive end-to-end solution for the composition and execution of Big Data processing workflows within given CPU and storage constraints is necessary
  – Future research in workflow management and modeling is necessary to provide the tools for maximizing scientific output within given resource constraints
• By bringing Nuclear Electronics and Computing experts together, the NEC Symposium continues to be in a unique position to promote HEP progress, as the solution requires optimization cutting across the Trigger and Computing domains

Extra Slides
