Overview of the Belle II computing


Overview of the Belle II computing
Y. Kato (KMI, Nagoya)

Belle → Belle II → huge computing resource

KEKB → SuperKEKB:
・ ~40 times the instantaneous luminosity (8×10^35 cm^-2 s^-1).
・ ~50 times the integrated luminosity (50 ab^-1).
Detector upgrade: fine segmentation, waveform sampling.

A huge data sample for a large collaboration → a huge computing resource is needed.

Computing resource for Belle II

(Slide figure: projected CPU requirement in kHEPSpec and tape in PB; 1 core ≈ 10 HEPSpec, so the early-phase projection corresponds to ~10^4 cores.)

・ Estimation up to 2021 (~20 ab^-1).
・ At the end of data taking (50 ab^-1), more than
  - 100,000 CPU cores and
  - 100 PB of storage
  are expected to be needed to store and analyze the data in a timely manner.
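The rule of thumb quoted in the figure (1 core ≈ 10 HEPSpec) translates the kHEPSpec scale into a core count; as a worked example using the end-of-data-taking number above,

    N_{\mathrm{cores}} > 10^{5}
      \iff
    \mathrm{CPU} > 10^{5}\ \mathrm{cores} \times 10\ \mathrm{HEPSpec/core}
                 = 10^{6}\ \mathrm{HEPSpec} = 1000\ \mathrm{kHEPSpec}.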

More than 100 PB? "Big data" in 2012 (from https://www.wired.com/2013/04/bigdata):
・ Business e-mail: 3000 PB/year
・ Contents uploaded to Facebook: 180 PB/year
・ Google search index: 100 PB/year
・ Digital health: 30 PB
・ LHC (2012): 15 PB
・ YouTube video: 15 PB/year
・ Climate DB: 6 PB
・ Census Bureau: 3.7 PB
・ NASDAQ: 3 PB
・ Library of Congress: 3 PB

The Belle II data volume is similar to the Google search index, or to the content uploaded to Facebook, per year. It is impossible for a single institute to host it → distributed computing: each institute provides resources, connected by the network.

Belle II computing model (first 3 years)

Belle II computing system

Resources:
・ Grid: relatively big sites; store (and produce) the data/MC samples.
・ Cloud: "commercial" resources; used for MC production.
・ Local computing clusters: relatively small resources at universities.

The interface to these different resource types is DIRAC (Distributed Infrastructure with Remote Agent Control):
・ originally developed by LHCb,
・ open source, with good extensibility,
・ used by many experiments: LHCb, BES III, CTA, ILC, and so on.

What we need on top of that are extensions to meet the experimental requirements (= BelleDIRAC):
・ automation of MC production and raw data processing → production system,
・ user interface,
・ analysis framework, etc.

BelleDIRAC therefore sits on top of DIRAC, which in turn talks to the Grid, cloud and local cluster resources; a small job-submission sketch follows.
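As an illustration of DIRAC as the common interface, here is a minimal job-submission sketch using the standard DIRAC Python client API (Belle II users normally go through the gbasf2 front end instead; the site name and payload are hypothetical, and a configured DIRAC client with a valid grid proxy is assumed):

    # Minimal DIRAC job submission sketch (standard DIRAC client API).
    from DIRAC.Core.Base import Script
    Script.parseCommandLine()                     # initialise the DIRAC client
    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    job = Job()
    job.setName("belle2-example")
    job.setExecutable("/bin/echo", arguments="hello Belle II")  # trivial payload
    job.setDestination("LCG.KEK2.jp")             # hypothetical site name

    result = Dirac().submitJob(job)               # returns an S_OK/S_ERROR dictionary
    if result["OK"]:
        print("Submitted job", result["Value"])
    else:
        print("Submission failed:", result["Message"])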

Production system and its development

A production manager (human) defines a "production":
・ MC production or data processing,
・ event type (BBbar, ττ, ccbar, ...),
・ number of events,
・ software version, etc.

The BelleDIRAC fabrication system then
・ defines the individual jobs,
・ re-defines failed jobs,
・ verifies the output files,
and submits the jobs to the computing sites through the DIRAC job management, which writes the outputs to temporary storage at the sites.

The BelleDIRAC distributed data management system
・ gathers the outputs to the major (destination) storages and distributes them over the world,
・ checks the status of the storages,
・ defines the "transfers", executed through the DIRAC transfer management.

The whole chain (production, distribution, merging) is followed by a monitor; a sketch of a "production" definition is given below.
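Purely to illustrate the parameters listed above (the field names are hypothetical, not the actual BelleDIRAC schema), a production request and its fabrication into jobs could look like this:

    from dataclasses import dataclass

    @dataclass
    class Production:
        kind: str               # "MC production" or "data processing"
        event_type: str         # e.g. "BBbar", "tautau", "ccbar"
        n_events: int           # total number of events requested
        software_version: str   # software release to run

        def split_into_jobs(self, events_per_job=10000):
            """Fabrication-system view: break the request into individual job descriptions."""
            n_jobs = -(-self.n_events // events_per_job)    # ceiling division
            return [
                {"event_type": self.event_type,
                 "n_events": min(events_per_job, self.n_events - i * events_per_job),
                 "release": self.software_version}
                for i in range(n_jobs)
            ]

    # Example: a 1M-event BBbar MC request split into 100 jobs.
    prod = Production("MC production", "BBbar", 1000000, "release-XX-YY-ZZ")
    print(len(prod.split_into_jobs()))              # -> 100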

MC production campaigns

Goals:
・ Test the validity of the computing model/system.
・ Provide simulation samples for sensitivity studies.

(Slide figure: number of jobs per campaign, reaching ~20 kjobs; annotations mark manual job submission for the 1st-3rd campaigns, automatic job submission from the 4th-5th, automatic issue detection from the 5th, and data distribution by DDM for the 6th and the ongoing 7th.)

・ ~50 computing sites join in the latest campaign.
・ More than 20k jobs can be handled now.
・ The production procedure is being gradually automated.
・ Since the 4th campaign, Belle II colleagues take computing shifts as an official service task.

KMI contributions: significant contributions in both resources and development.

Belle II dedicated resources at KMI:
・ 360 (+α) CPU cores.
・ 250 TB of storage.
・ Grid middleware (EMI 3) installed.
・ DIRAC server.
・ Operated by physicists → we learned a lot about operating a computing site.

Development of the monitoring system:
・ to maximize the availability of the resources,
・ automatic detection of problematic sites,
・ operation and development of the shift manual.

Contribution of KMI resources

・ CPU usage during MC7: ~7% of the total.
・ Served as a destination storage for the data transferred during MC7.
・ Executed more than 1×10^5 jobs in MC7.
・ We will purchase ~200 more CPU cores this year (~1.5 times the current capacity).

Monitoring system

・ Many interfaces are involved → we need to identify where a trouble happens.
・ The information of each step is stored and processed in a database.
・ Log files are analyzed to further identify the origin of a problem.
・ If problems are detected, they are shown as a list on the web (automatic issue detector).
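As a rough illustration of such an automatic issue detector (the error patterns and categories below are hypothetical, not the actual Belle II ones), a simple log scan might look like:

    import re

    # Hypothetical mapping of error signatures to issue categories.
    PATTERNS = {
        r"could not contact storage element": "storage unreachable",
        r"proxy.*expired":                    "expired credentials",
        r"no space left on device":           "site disk full",
    }

    def detect_issues(site, log_lines):
        """Return (site, category, offending line) for every known failure pattern."""
        issues = []
        for line in log_lines:
            for pattern, category in PATTERNS.items():
                if re.search(pattern, line, re.IGNORECASE):
                    issues.append((site, category, line.strip()))
        return issues

    # Example: scan the log of a failed job.
    print(detect_issues("LCG.KMI.jp", ["... ERROR: Proxy has expired ..."]))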

Monitoring system (active way)

Site status is also collected actively, by submitting diagnosis jobs:
・ SiteCrawler: checks that the site environment can execute a Belle II job.
・ Job submission check: since DIRAC does not record the failure reason, a job submission is tried and the result is recorded by us.

Our activities maximize the availability of the resources!
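A minimal sketch of such a job submission check, again using the standard DIRAAC-style client API shown earlier (the site names, payload and bookkeeping are hypothetical; in practice this logic lives inside the Belle II monitoring tools):

    from DIRAC.Core.Base import Script
    Script.parseCommandLine()
    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    def submission_check(sites):
        """Try to submit a trivial job to each site and record the outcome ourselves."""
        dirac = Dirac()
        results = {}
        for site in sites:
            job = Job()
            job.setName("submission-check-" + site)
            job.setExecutable("/bin/hostname")    # trivial payload
            job.setDestination(site)              # pin the job to the site under test
            ret = dirac.submitJob(job)
            results[site] = ("OK", ret["Value"]) if ret["OK"] else ("FAILED", ret["Message"])
        return results

    print(submission_check(["LCG.KEK2.jp", "LCG.KMI.jp"]))   # hypothetical site names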

Production progress monitoring

(Slide figure: production progress plot showing the number of jobs in each state: submitted, under job, transfer, finished.)

Future prospects and coming events

・ Continue to improve the system:
  - maximize the throughput,
  - more automated monitoring/operation.
・ Cosmic ray data processing (2017): the first real use case to try the raw data processing workflow.
・ System dress rehearsal (2017), before the Phase 2 runs: try the full-chain workflow from raw data to skims.
・ Start of the Phase 2 run in 2018.

Summary

・ Belle II has adopted the distributed computing model to cope with the required computing resources (the first experiment hosted in Japan to do so).
・ BelleDIRAC is being developed to meet the experimental requirements and is validated in the MC production campaigns.
・ KMI makes a major contribution to the distributed computing:
  - resources (to be upgraded by ~200 CPU cores this year),
  - development of the monitoring system.
・ In 2017, cosmic ray data processing and the system dress rehearsal will be performed.

Backup

Distributed computing

・ Local computer: interactive; data on a local disk.
・ Computer cluster (the Belle case): send a "job" to a homogeneous resource in a single place; data on a shared disk.
・ Distributed computing: send a job to a central service; heterogeneous resources distributed over the world; data distributed over the world.

Network data challenge results

(Slide figure: results of the network data challenge.)

User analysis in Belle II

・ Physics gurus perform the bulk MDST production.
・ The skimming group makes skims from the MDST.
・ Almost all user analyses will originate from skims (data samples > 1 ab^-1 require this).
・ We should plan for at least 200 simultaneous analyses.
・ Skim production will require an automated fabrication system.

Options for physics skims from the bulk MDST:
・ Output skimmed MDST.
・ Output physics objects + MDST (μDST) ← baseline solution:
  - direct copy to the worker node using DIRAC, or
  - stream the data using XRootD/HTTP.
・ Output index files which point to the skimmed events ← under investigation:
  - access the bulk MDST directly via ROOT on a large cluster, or
  - stream from the bulk MDST using XRootD/HTTP to a local cluster/workstation running ROOT.
・ Skims are validated by the physics groups.
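As a small illustration of the XRootD streaming option (the server, file path and tree name below are hypothetical, and ROOT must be built with XRootD support):

    import ROOT

    # Open a remote (μ)DST file directly over the xrootd protocol
    # instead of copying it to the worker node first.
    f = ROOT.TFile.Open("root://some-se.example.org//belle2/mdst/sample.root")
    tree = f.Get("tree")                 # hypothetical tree name
    print("Entries:", tree.GetEntries())
    f.Close()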

(Slide figure: schematic comparison of which events and objects are kept in the MDST versus the μDST.)