HIGUCHI Takeo, Department of Physics, Faculty of Science, University of Tokyo, representing the dBASF Development Team. BELLE/CHEP2000: Distributed BELLE Analysis Framework.

Introduction to the B Factory at KEK
The KEK-B accelerator is an e+ e- asymmetric-energy collider:
  3.5 GeV/c for positrons
  8.0 GeV/c for electrons
Designed luminosity is 1.0 x 10^34 cm^-2 s^-1
Now KEK-B is operated at ~5.0 x 10^33 cm^-2 s^-1
BELLE Experiment
The goal of the BELLE experiment is to study CP violation in B meson decays
The experiment is in progress at KEK

BELLE Detector

BELLE Detector
  SVD: precise vertex detection
  CDC: track momentum reconstruction; particle ID with dE/dx
  ACC: aerogel Cherenkov counter for particle ID
  TOF: particle ID and trigger
  ECL: electromagnetic calorimeter for e- and gamma reconstruction
  KLM: muon and K_L detection
  EFC: electromagnetic calorimeter for luminosity measurement

Current Event Reconstruction
Computing Environments
  Event reconstruction is performed on 8 SMP machines
  UltraEnterprise x 7 servers equipped with 28 CPUs
  Total CPU power is 1,200 SPECint95
  CPUs are shared with user analysis jobs
MC production is done on a PC farm (P3 500 MHz x 4 CPUs x 16 nodes)
Reconstruction Speed
  15 Hz/server
  70 Hz/server with L4 (5.0 x 10^33 cm^-2 s^-1)

Necessity for System Upgrade
In the future we will have more luminosity:
  200 Hz after L4 (1.0 x 10^34 cm^-2 s^-1)
  The data size may increase further, possibly due to background
This causes a lack of computing power:
  We need 10 times the current computing power when DST reproduction and user analysis activities are taken into account

Next Computing System
Low-Cost Solution
  We will build a new computing farm with sufficient computing power
  Computing servers will consist of:
    ~50 units of 4-CPU PC servers running Linux
    ~50 units of 4-CPU SPARC servers running Solaris
  Total CPU power will be 12,000 SPECint95

Configuration of Next System (diagram)
  Sun I/O servers and PC servers connected through switches and a switching hub (Gigabit Ethernet and 100Base-T)
  Tape library (tape I/O: 24 MB/s) and a file server

Current Analysis Framework
BELLE AnalysiS Framework (B.A.S.F.)
  B.A.S.F. supports event-by-event parallel processing on SMP machines, hiding the parallel-processing nature from users
  B.A.S.F. is currently in wide use in BELLE, from DST production to user analysis
We are developing an extension to B.A.S.F. that utilizes many PC servers connected via a network, to be used in the next computing system
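The idea of event-by-event parallelism hidden from the user can be sketched in miniature. B.A.S.F. itself is a C++ framework; the snippet below is an illustrative Python stand-in in which the user only writes a per-event module function and never touches processes or synchronization.

```python
# Illustrative sketch only (B.A.S.F. is C++): an event-by-event parallel
# driver that hides the parallelism behind a simple "apply module to events"
# interface, in the spirit of the framework described above.
from multiprocessing import Pool

def run_parallel(events, module, ncpu=4):
    """Apply a user-supplied event module to every event in parallel.

    The user writes `module(event) -> result` with no knowledge of
    processes, queues, or synchronization; results keep the event order.
    """
    with Pool(processes=ncpu) as pool:
        return pool.map(module, events)

def reconstruct(event):
    # Hypothetical user module: a stand-in for track/vertex reconstruction.
    return sum(event)

if __name__ == "__main__":
    events = [[1, 2], [3, 4], [5, 6]]
    print(run_parallel(events, reconstruct, ncpu=2))  # [3, 7, 11]
```

The same user module could then run unchanged whether the pool lives on one SMP machine or, as in dBASF, is spread over networked nodes.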

New Analysis Framework
The New Framework Should Provide:
  Event-by-event parallel-processing capability over the network
  Resource-usage optimization
    Maximize total CPU usage
    Draw the maximum I/O rate from the tape servers
  Capability of handling purposes other than DST production
    User analysis, Monte Carlo simulation, or anything else
  Applicability to parallel processing at university sites
dBASF (Distributed B.A.S.F.): a super-framework for B.A.S.F.

Link of dBASF Servers (diagram)
  B.A.S.F. and I/O processes run on the SPARC and PC servers
  The servers report their resource usage; B.A.S.F. is initiated/terminated by the job client, and node allocation is changed dynamically

Communication among Servers
Functionality
  Call a function on a remote node by sending a message
  Shared memory expanded over the network
Implementation: NSM (Network Shared Memory)
  A home-grown product, originally used for the BELLE DAQ
  Based on TCP and UDP
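The "call a function on a remote node by sending a message" pattern can be shown in a few lines. This is not the actual NSM protocol (whose wire format is not described in these slides); it is a generic sketch in which a message naming a registered function and its arguments is sent over TCP and dispatched on the receiving node.

```python
# Generic message-based remote-call sketch (NOT the real NSM protocol):
# a message {"func": name, "args": [...]} is sent over TCP, looked up in a
# registry on the serving node, executed, and the result is sent back.
import json, socket, threading

REGISTRY = {}  # functions that remote nodes are allowed to invoke

def register(fn):
    REGISTRY[fn.__name__] = fn
    return fn

@register
def add_load(cpu, net):
    # Hypothetical handler, e.g. combining reported CPU and network load.
    return cpu + net

def serve_one(sock):
    """Accept a single connection, dispatch its message, reply with result."""
    conn, _ = sock.accept()
    with conn:
        msg = json.loads(conn.recv(4096).decode())
        result = REGISTRY[msg["func"]](*msg["args"])
        conn.sendall(json.dumps({"result": result}).encode())

def remote_call(addr, func, *args):
    """Client side: send the message and wait for the reply."""
    with socket.create_connection(addr) as c:
        c.sendall(json.dumps({"func": func, "args": list(args)}).encode())
        return json.loads(c.recv(4096).decode())["result"]

if __name__ == "__main__":
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=serve_one, args=(srv,), daemon=True).start()
    print(remote_call(srv.getsockname(), "add_load", 3, 4))  # 7
```

The real NSM additionally provides the shared-memory view over the network and uses UDP alongside TCP; only the request/dispatch shape is illustrated here.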

Components of dBASF: dBASF Client
User interface; accepts from the user:
  a B.A.S.F. execution script
  the number of CPUs to be allocated for the analysis
Asks the resource manager to allocate B.A.S.F. daemons; the resource manager returns the allocated nodes
Initiates B.A.S.F. execution on the allocated nodes
Waits for completion; notified by the B.A.S.F. daemons when the job ends
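The client's control flow (allocate, initiate, wait) can be sketched with stand-in objects; the real interfaces run over NSM and are not preserved in these slides, so all names below are hypothetical.

```python
# Control-flow sketch of a dBASF-client-like job submission, using stub
# objects in place of the NSM-backed resource manager and daemons.
class StubResourceManager:
    def __init__(self, free_nodes):
        self.free = list(free_nodes)
    def allocate(self, ncpu):
        # Hand out up to ncpu free nodes to the requesting client.
        nodes, self.free = self.free[:ncpu], self.free[ncpu:]
        return nodes

class StubDaemon:
    def initiate(self, script):
        self.script = script      # the real daemon would fork B.A.S.F. here
    def wait_done(self):
        return "done"             # the real client is notified by the daemon

def run_job(script, ncpu, rm, daemons):
    """Allocate nodes, start B.A.S.F. on each, then wait for completion."""
    nodes = rm.allocate(ncpu)
    for n in nodes:
        daemons[n].initiate(script)
    return {n: daemons[n].wait_done() for n in nodes}

if __name__ == "__main__":
    rm = StubResourceManager(["pc01", "pc02", "pc03"])
    daemons = {n: StubDaemon() for n in rm.free}
    print(run_job("analysis.basf", 2, rm, daemons))
```
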

Components of dBASF: Resource Manager
Collects resource usage from the B.A.S.F. daemons through NSM shared memory:
  CPU load
  network traffic rate
Monitors idling B.A.S.F. daemons of each dBASF session
Increases/decreases the number of allocated B.A.S.F. daemons dynamically when a better assignment is discovered

Components of dBASF: B.A.S.F. Daemon
Runs on each computing server
Accepts an 'initiation request' from the dBASF client and forks B.A.S.F. processes
Reports resource usage to the resource manager through NSM shared memory
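The daemon's two duties, forking worker processes on request and publishing resource usage, can be sketched as follows; the process body and the report fields are illustrative stand-ins, not the real daemon's interface.

```python
# Sketch of a B.A.S.F.-daemon-like server: on an initiation request it forks
# worker processes and waits for them; separately it produces a resource
# report of the kind a resource manager could read from shared memory.
import os
from multiprocessing import Process

def basf_worker(script):
    # Stand-in for exec'ing a B.A.S.F. job with the user's script.
    return len(script)

def handle_initiation(script, nproc):
    """Fork nproc worker processes for the given script and wait for them."""
    workers = [Process(target=basf_worker, args=(script,)) for _ in range(nproc)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [w.exitcode for w in workers]   # 0 means clean exit

def report_usage():
    """Illustrative resource report (fields are assumptions, not NSM's)."""
    load1, _, _ = os.getloadavg()
    return {"cpu_load": load1}

if __name__ == "__main__":
    print(handle_initiation("user.basf", 3))  # [0, 0, 0]
```
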

Components of dBASF: I/O Daemon
Reads tapes or disk files and distributes the events over the network to the B.A.S.F. processes running on each node
Collects processed data from B.A.S.F. over the network and writes it to tapes or disk files
For Monte Carlo event generation, the event-generator output is distributed to the B.A.S.F. processes running the detector simulation
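The distribute/collect pattern can be sketched in a few lines. A simple round-robin deal is used here for illustration; the slides do not say how the real daemon chooses the next node (it may well feed whichever node is ready).

```python
# Sketch of an I/O-daemon-like event distributor: deal input events out to
# the computing nodes (round-robin here, as an assumption), then gather the
# processed events back for the output stream.
from itertools import cycle

def distribute(events, nodes):
    """Deal events to nodes round-robin; returns {node: [events...]}."""
    queues = {n: [] for n in nodes}
    for node, ev in zip(cycle(nodes), events):
        queues[node].append(ev)
    return queues

def collect(queues, process):
    """Apply per-event processing on each node's queue and merge the output."""
    return [process(ev) for q in queues.values() for ev in q]

if __name__ == "__main__":
    q = distribute(range(5), ["pc01", "pc02"])
    print(q)                          # {'pc01': [0, 2, 4], 'pc02': [1, 3]}
    print(collect(q, lambda e: e * 2))
```
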

Components of dBASF: Miscellaneous Servers
Histogram server: merges the histogram data accumulated on each node
Output server: collects the standard output of each node and saves it to a file
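Histogram merging across nodes reduces to bin-wise addition, assuming every node books identical binning, which the sketch below takes for granted.

```python
# Sketch of the histogram server's core operation: merge per-node histograms
# by bin-wise addition (identical bin edges on all nodes assumed).
def merge_histograms(per_node):
    """per_node: list of equal-length bin-content lists, one per node."""
    nbins = len(per_node[0])
    return [sum(h[i] for h in per_node) for i in range(nbins)]

if __name__ == "__main__":
    node_a = [1, 0, 2]   # bin contents accumulated on node A
    node_b = [0, 3, 1]   # bin contents accumulated on node B
    print(merge_histograms([node_a, node_b]))  # [1, 3, 3]
```
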

Resource Management: Best Performance
The best performance is achieved when the total I/O rate is at its maximum with the minimum number of CPUs
Dynamic Load Balancing
  CPU bound: increase the number of computing servers so that the I/O speed becomes maximal
  I/O bound: decrease the number of computing servers so as not to change the I/O speed

Resource Management: Load Balancing
When n_now CPUs are assigned to a job, the best number of CPUs to assign, n_new, is given by:
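The equation itself did not survive in this transcript (it was an image on the slide). One plausible form, consistent with the rule on the previous slide that the CPU count should grow when I/O has headroom and shrink when it does not, would scale the allocation by the ratio of the maximum attainable I/O rate to the currently achieved one; this is a reconstruction under that assumption, not the slide's actual formula:

```latex
% Assumed reconstruction: R_now is the job's current I/O rate,
% R_max the maximum rate the tape/disk servers can deliver to it.
n_{\mathrm{new}} = n_{\mathrm{now}} \times \frac{R_{\mathrm{max}}}{R_{\mathrm{now}}}
```

With this form, a CPU-bound job (R_now below R_max) gets more servers until the I/O rate saturates, and an I/O-bound job sheds servers without reducing the I/O rate, matching the stated behavior.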

Resource Management (diagram)
  B.A.S.F. daemons report their resource usage to the resource manager
  If the current allocation is not the best one, nodes are added or removed: B.A.S.F. is initiated on new nodes or terminated on released ones, in cooperation with the job client

Data Flow (diagram)
  Raw data is read by the I/O processes on the SPARC servers and sent over TCP/IP to B.A.S.F. on the PC servers
  Processed data, histograms, and standard output are sent back and collected

Status
A system test is in progress on the BELLE PC farm, consisting of 16 units of P3 550 MHz x 4 servers
The node-to-node communication framework has been developed and is being tested
The resource-management algorithm is under study
Basic speed tests of network data transfer have been finished:
  Fast Ethernet: point-to-point and 1-to-n
  Gigabit Ethernet: point-to-point and 1-to-n
The new computing system will be available in March 2001

Summary
We will build a computing farm of 12,000 SPECint95 with PC Linux and Solaris servers to solve the computing-power shortage we are facing
We have begun developing a management scheme for the computing system by extending the current analysis framework
We have developed the communication framework and are studying the resource-management algorithm