High Energy Physics and Data Grids
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/   avery@phys.ufl.edu
US/UK Grid Workshop, San Francisco, August 4-5, 2001
Essentials of High Energy Physics
➨ Better name: "Elementary Particle Physics"
  - Science: elementary particles and fundamental forces
  - Particles: quarks (u d, c s, t b) and leptons (e, μ, τ and their neutrinos)
  - Forces: strong (gluon), electroweak (γ, W±, Z0), gravity (graviton)
➨ Goal: a unified theory of nature
  - Unification of forces (Higgs, superstrings, extra dimensions, ...)
  - Deep connections to the large-scale structure of the universe
  - Large overlap with astrophysics, cosmology, nuclear physics
HEP Short History + Frontiers
  Milestones:
    1900-       Quantum mechanics, atomic physics
    1940-50     Quantum Electrodynamics
    1950-65     Nuclei, hadrons, symmetries, field theories
    1965-75     Quarks, gauge theories
    1970-83     SPS: electroweak unification, QCD
    1990        LEP: 3 families, precision electroweak
    1994        Tevatron: top quark
    2007        LHC (the next step): Higgs? Supersymmetry? Origin of masses
  Frontier scales (distance, energy, time after Big Bang):
    10^-10 m    ~10 eV                  > 300,000 yr    atomic physics
    10^-15 m    MeV - GeV               ~3 min
    10^-16 m    >> GeV                  ~10^-6 sec
    10^-18 m    ~100 GeV                ~10^-10 sec
    10^-19 m    ~10^2 GeV               ~10^-12 sec     origin of masses (LHC)
    10^-32 m    ~10^16 GeV              ~10^-32 sec     proton decay? Grand Unified Theories? (underground experiments)
    10^-35 m    ~10^19 GeV (Planck)     ~10^-43 sec     quantum gravity? superstrings? the origin of the universe
HEP Research
➨ Experiments are primarily accelerator based
  - Fixed target, colliding beams, special beams
➨ Detectors
  - Small, large, general purpose, special purpose
➨ ... but a wide variety of other techniques
  - Cosmic rays, proton decay, g-2, neutrinos, space missions
➨ Increasing scale of experiments and laboratories
  - Forced on us by ever higher energies
  - Complexity, scale, costs → large collaborations
  - International collaborations are the norm today
  - Global collaborations are the future (LHC)
  - LHC discussed in next few slides
The CMS Collaboration
  1809 physicists and engineers, 31 countries, 144 institutions

                          Scientists    Laboratories
  Member states               1010            58
  Non-member states            448            50
  USA                          351            36
  Total                       1809           144
  Associated institutes         36             5

  Participating countries: Armenia, Austria, Belarus, Belgium, Bulgaria, China, China (Taiwan),
  Croatia, Cyprus, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, India, Italy,
  Korea, Pakistan, Poland, Portugal, Russia, Slovak Republic, Spain, Switzerland, Turkey, UK,
  Ukraine, USA, Uzbekistan, plus CERN.
CERN LHC site
  (Map of the LHC site showing the four experiments: CMS, ATLAS, LHCb, ALICE.)
High Energy Physics at the LHC
  (The "Compact" Muon Solenoid at the LHC (CERN), shown with a "Smithsonian standard man" for scale.)
Collisions at LHC (2007?)
  Bunches per beam        2835
  Protons per bunch       10^11
  Beam energy             7 TeV (7 x 10^12 eV)
  Luminosity              10^34 cm^-2 s^-1
  Crossing rate           40 MHz (every 25 nsec)
  Collision rate          ~10^9 Hz (average ~20 collisions per crossing)
  New physics rate        ~10^-5 Hz
  Selection               1 in 10^13
  (Diagram: partons - quarks and gluons - inside the colliding protons produce final states with leptons and jets.)
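A minimal back-of-envelope sketch (order of magnitude only) of how the collision and new-physics rates above follow from the crossing rate, the pile-up, and the selectivity:

```python
# Back-of-envelope check of the rates quoted above (order of magnitude only).
crossing_rate_hz = 40e6          # bunch crossings per second (every 25 ns)
collisions_per_crossing = 20     # average pile-up per crossing
selection = 1e-13                # fraction of collisions selected as "new physics"

collision_rate_hz = crossing_rate_hz * collisions_per_crossing   # ~8e8, i.e. ~10^9 Hz
new_physics_rate_hz = collision_rate_hz * selection              # ~8e-5, i.e. ~10^-5 Hz

print(f"collision rate   ~ {collision_rate_hz:.0e} Hz")
print(f"new physics rate ~ {new_physics_rate_hz:.0e} Hz "
      f"(roughly one such event every {1 / new_physics_rate_hz / 3600:.1f} hours)")
```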
HEP Data
➨ Scattering is the principal technique for gathering data
  - Collisions of beam-beam or beam-target particles
  - Typically caused by a single elementary interaction
  - ... but background collisions also obscure the physics
➨ Each collision generates many particles: an "event"
  - Particles traverse the detector, leaving an electronic signature
  - Information is collected and put into mass storage (tape)
  - Each event is independent → trivial computational parallelism
➨ Data-intensive science
  - Size of raw event record: 20 KB - 1 MB
  - 10^6 - 10^9 events per year
  - BaBar (SLAC):        0.3 PB per year (2001)
  - CDF, D0 (Fermilab):  1 PB per year (2005)
  - ATLAS, CMS (LHC):    5 PB per year (2007)
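A minimal sketch of the data-volume arithmetic behind these figures; the event counts and sizes passed in below are illustrative round numbers, not values from a specific experiment:

```python
# Annual raw-data volume = events per year x event size.
def annual_volume_pb(events_per_year: float, event_size_bytes: float) -> float:
    """Return raw data volume in petabytes per year (1 PB = 1e15 bytes)."""
    return events_per_year * event_size_bytes / 1e15

# Illustrative round numbers: ~10^9 events/year at ~1 MB/event gives ~1 PB/year,
# the scale quoted above for the Tevatron experiments; the LHC experiments sit
# several times higher.
print(annual_volume_pb(1e9, 1e6))   # 1.0  (PB/year)
print(annual_volume_pb(5e9, 1e6))   # 5.0  (PB/year, LHC scale)
```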
Data Rates: From Detector to Storage
  Detector output               40 MHz     ~1000 TB/sec
    ↓ Level 1 trigger (special hardware)
  75 KHz      75 GB/sec
    ↓ Level 2 trigger (commodity CPUs)
  5 KHz       5 GB/sec
    ↓ Level 3 trigger (commodity CPUs) - physics filtering
  100 Hz      100 MB/sec    raw data to storage
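A small sketch computing the rejection factor each trigger level must supply, using only the rates quoted above:

```python
# Rejection factors implied by the trigger chain above.
stages = [
    ("detector output",                   40e6),   # Hz
    ("after Level 1 (special hardware)",  75e3),
    ("after Level 2 (commodity CPUs)",     5e3),
    ("after Level 3 (commodity CPUs)",     1e2),
]
for (stage_in, rate_in), (stage_out, rate_out) in zip(stages, stages[1:]):
    print(f"{stage_in} -> {stage_out}: keep ~1 in {rate_in / rate_out:,.0f}")
# Overall: 40 MHz -> 100 Hz, i.e. only ~1 crossing in 400,000 reaches storage.
```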
LHC Data Complexity
➨ "Events" resulting from beam-beam collisions:
  - The signal event is obscured by ~20 overlapping, uninteresting collisions in the same crossing
  - CPU time does not scale from previous generations
  (Event displays comparing 2000 and 2007 conditions.)
Example: Higgs Decay into 4 Muons
  40M events/sec; selectivity: 1 in 10^13
LHC Computing Challenges
➨ Complexity of the LHC environment and resulting data
➨ Scale
  - Petabytes of data per year (100 PB by ~2010)
  - Millions of SpecInt95s of CPU
➨ Geographical distribution of people and resources
  - 1800 physicists, 150 institutes, 32 countries
Transatlantic Net WG (HN, L. Price): Tier0 - Tier1 BW Requirements [*]
  [*] Installed BW in Mbps; maximum link occupancy 50%; work in progress
Hoffmann LHC Computing Report 2001: Tier0 - Tier1 Link Requirements
  (1) Tier1 ↔ Tier0 data flow for analysis                 0.5 - 1.0 Gbps
  (2) Tier2 ↔ Tier0 data flow for analysis                 0.2 - 0.5 Gbps
  (3) Interactive collaborative sessions (30 peak)          0.1 - 0.3 Gbps
  (4) Remote interactive sessions (30 flows peak)           0.1 - 0.2 Gbps
  (5) Individual (Tier3 or Tier4) data transfers            0.8 Gbps
      (limit to 10 flows of 5 Mbytes/sec each)
  TOTAL per Tier0 - Tier1 link                              1.7 - 2.8 Gbps
  Corresponds to ~10 Gbps baseline BW installed on the US-CERN link
  Adopted by the LHC experiments (Steering Committee report)
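A minimal sketch verifying that the per-link total is simply the sum of the five components above; the occupancy comment reuses the 50% figure from the previous slide:

```python
# The per-link total is the sum of the five component flows listed above (in Gbps).
components_gbps = {
    "Tier1-Tier0 analysis flow":            (0.5, 1.0),
    "Tier2-Tier0 analysis flow":            (0.2, 0.5),
    "interactive collaborative sessions":   (0.1, 0.3),
    "remote interactive sessions":          (0.1, 0.2),
    "individual Tier3/Tier4 transfers":     (0.8, 0.8),
}
low  = sum(lo for lo, _ in components_gbps.values())
high = sum(hi for _, hi in components_gbps.values())
print(f"total per Tier0 - Tier1 link: {low:.1f} - {high:.1f} Gbps")   # 1.7 - 2.8 Gbps
# At the 50% maximum link occupancy from the previous slide, carrying 2.8 Gbps of
# traffic already implies >5 Gbps of installed capacity, consistent with the
# ~10 Gbps baseline quoted here.
```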
LHC Computing Challenges
➨ Major challenges associated with:
  - Scale of computing systems
  - Network-distribution of computing and data resources
  - Communication and collaboration at a distance
  - Remote software development and physics analysis
➨ Result of these considerations: Data Grids
Global LHC Data Grid Hierarchy
  Tier0    CERN
  Tier1    National laboratory
  Tier2    Regional center (university, etc.)
  Tier3    University workgroup
  Tier4    Workstation
➨ Key ideas:
  - Hierarchical structure
  - Tier2 centers
  - Operate as a unified Grid
Example: CMS Data Grid
  Experiment → ~PBytes/sec → Online System → ~100 MBytes/sec → Tier 0+1: CERN Computer Center (>20 TIPS)
    - Bunch crossing every 25 nsec; 100 triggers per second; each event is ~1 MByte in size
  Tier 0+1 → 2.5 Gbits/sec → Tier 1: national centers (USA, France, Italy, UK)
  Tier 1   → 2.5 Gbits/sec → Tier 2: Tier2 centers
  Tier 2   → ~622 Mbits/sec → Tier 3: institutes (~0.25 TIPS), physics data cache
  Tier 3   → 100 - 1000 Mbits/sec → Tier 4: workstations, other portals
  Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
  CERN/outside resource ratio ~1:2;  Tier0 : (Σ Tier1) : (Σ Tier2) ~ 1:1:1
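As one way to make the hierarchy concrete, a minimal sketch representing the tiers and the uplink bandwidths quoted above as a small data structure (the structure and field names are illustrative, not from any CMS software):

```python
# A tiny data-structure view of the tier hierarchy, with the uplink bandwidths
# quoted above attached to each tier (names and fields are illustrative only).
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    uplink_mbps: float = 0.0              # bandwidth toward the tier above
    children: list["Tier"] = field(default_factory=list)

tier3 = Tier("Tier 3: institute (~0.25 TIPS)", uplink_mbps=622)
tier2 = Tier("Tier 2: Tier2 center", uplink_mbps=2500, children=[tier3])
tier1 = Tier("Tier 1: national center", uplink_mbps=2500, children=[tier2])
tier0 = Tier("Tier 0+1: CERN computer center (>20 TIPS)", children=[tier1])

def walk(tier: Tier, depth: int = 0) -> None:
    """Print the hierarchy with each tier's uplink bandwidth."""
    link = f"  (uplink {tier.uplink_mbps:.0f} Mbps)" if tier.uplink_mbps else ""
    print("  " * depth + tier.name + link)
    for child in tier.children:
        walk(child, depth + 1)

walk(tier0)
```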
Tier1 and Tier2 Centers
➨ Tier1 centers
  - National laboratory scale: large CPU, disk, tape resources
  - High speed networks
  - Many personnel with broad expertise
  - Central resource for a large region
➨ Tier2 centers
  - New concept in the LHC distributed computing hierarchy
  - Size ~ [national lab × university]^(1/2), i.e. the geometric mean (see the sketch after this list)
  - Based at a large university or small laboratory
  - Emphasis on small staff, simple configuration & operation
➨ Tier2 role
  - Simulations, analysis, data caching
  - Serve a small country, or a region within a large country
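A minimal sketch of the Tier2 sizing rule read as a geometric mean; the capacity numbers below are hypothetical placeholders chosen only to illustrate the arithmetic:

```python
# The Tier2 sizing rule read literally: geometric mean of the national-lab and
# university scales. The capacities below are hypothetical placeholders.
import math

national_lab_capacity = 300_000    # e.g. SpecInt95, hypothetical Tier1 scale
university_capacity   = 3_000      # hypothetical university-workgroup scale

tier2_capacity = math.sqrt(national_lab_capacity * university_capacity)
print(f"Tier2 scale ~ {tier2_capacity:,.0f}")   # ~30,000: in between the two
```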
LHC Tier2 Center (2001)
  (Schematic: router linked to the WAN over a hi-speed channel; Fast Ethernet switches and a
   Gigabit Ethernet switch; data server with >1 RAID and tape.)
Hardware Cost Estimates
➨ Buy late, but not too late: phased implementation
  - R&D phase:              2001 - 2004
  - Implementation phase:   2004 - 2007
  - R&D to develop capabilities and the computing model itself
  - Prototyping at increasing scales of capability & complexity
  (Chart with cost curves labeled 1.1 years, 1.2 years, 1.4 years, and 2.1 years.)
HEP Related Data Grid Projects
➨ Funded projects
  - GriPhyN         USA    NSF, $11.9M + $1.6M
  - PPDG I          USA    DOE, $2M
  - PPDG II         USA    DOE, $9.5M
  - EU DataGrid     EU     $9.3M
➨ Proposed projects
  - iVDGL           USA    NSF, $15M + $1.8M + UK
  - DTF             USA    NSF, $45M + $4M/yr
  - DataTag         EU     EC, $2M?
  - GridPP          UK     PPARC, > $15M
➨ Other national projects
  - UK e-Science (> $100M for 2001-2004)
  - Italy, France, (Japan?)
(HEP Related) Data Grid Timeline
  Milestones from Q2 2000 through Q3 2001:
  - Submit GriPhyN proposal, $12.5M; GriPhyN approved, $11.9M + $1.6M
  - Outline of US-CMS Tier plan; Caltech-UCSD install proto-T2
  - EU DataGrid approved, $9.3M
  - 1st Grid coordination meeting
  - Submit PPDG proposal, $12M; PPDG approved, $9.5M
  - Submit DTF proposal, $45M
  - Submit iVDGL preproposal; submit iVDGL proposal, $15M
  - Submit DataTAG proposal, $2M; DataTAG approved
  - 2nd Grid coordination meeting
  - iVDGL approved? DTF approved?
Coordination Among Grid Projects
➨ Particle Physics Data Grid (US, DOE)
  - Data Grid applications for HENP
  - Funded 1999, 2000 ($2M); funded 2001-2004 ($9.4M)
  - http://www.ppdg.net/
➨ GriPhyN (US, NSF)
  - Petascale Virtual-Data Grids
  - Funded 9/2000 - 9/2005 ($11.9M + $1.6M)
  - http://www.griphyn.org/
➨ European Data Grid (EU)
  - Data Grid technologies, EU deployment
  - Funded 1/2001 - 1/2004 ($9.3M)
  - http://www.eu-datagrid.org/
➨ All three: HEP in common; focus on infrastructure development & deployment; international
  scope; now developing a joint coordination framework (GridPP, DTF, iVDGL very soon?)
Data Grid Management
PPDG
  (Diagram: experiment data-management groups - BaBar, D0, CDF, CMS, Atlas, Nuclear Physics -
   coupled to middleware teams and their user communities: Globus, Condor, SRB, HENP GC.)
EU DataGrid Project
PPDG and GriPhyN Projects
➨ PPDG focuses on today's (evolving) problems in HENP
  - Current HEP:  BaBar, CDF, D0
  - Current NP:   RHIC, JLAB
  - Future HEP:   ATLAS, CMS
➨ GriPhyN focuses on tomorrow's solutions
  - ATLAS, CMS, LIGO, SDSS
  - Virtual data, "Petascale" problems (Petaflops, Petabytes)
  - Toolkit, export to other disciplines, outreach/education
➨ Both emphasize
  - Application science drivers
  - CS/application partnership (reflected in funding)
  - Performance
➨ Explicitly complementary
PPDG Multi-site Cached File Access System
  Primary site:        data acquisition, tape, CPU, disk, robot
  Satellite sites:     tape, CPU, disk, robot
  Universities:        CPU, disk, users
  Cross-site services: resource discovery, matchmaking, co-scheduling/queueing,
                       tracking/monitoring, problem trapping + resolution
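A toy sketch of the cached file access idea described above: check the local cache, then satellite sites, then the primary site, and cache whatever had to be fetched. Site names, file names, and the API are invented for illustration.

```python
# Toy multi-site cached file access: try the local cache, then satellite sites,
# then the primary site, and cache whatever had to be fetched. Site names, file
# names, and this API are invented for illustration.
class Site:
    def __init__(self, name: str, files: set[str]):
        self.name = name
        self.files = files

    def has(self, filename: str) -> bool:
        return filename in self.files

def fetch(filename: str, local: Site, satellites: list[Site], primary: Site) -> str:
    """Return the name of the site that served the file, caching it locally."""
    for site in [local, *satellites, primary]:
        if site.has(filename):
            if site is not local:
                local.files.add(filename)    # cache for subsequent requests
            return site.name
    raise FileNotFoundError(filename)

local = Site("university", {"run42.raw"})
satellites = [Site("satellite-A", {"run43.raw"})]
primary = Site("primary-site", {"run42.raw", "run43.raw", "run44.raw"})

print(fetch("run44.raw", local, satellites, primary))   # -> primary-site (now cached locally)
print(fetch("run44.raw", local, satellites, primary))   # -> university (served from the cache)
```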
GriPhyN: PetaScale Virtual-Data Grids
  ~1 Petaflop, ~100 Petabytes
  (Architecture diagram: production teams, individual investigators, and workgroups drive
   interactive user tools; these feed virtual data tools, request planning & scheduling tools,
   and request execution & management tools; transforms sit between the tools and the
   distributed resources - code, storage, CPUs, networks - and raw data sources; resource
   management services, security and policy services, and other Grid services span the system.)
Virtual Data in Action
➨ A data request may
  - Compute locally
  - Compute remotely
  - Access local data
  - Access remote data
➨ Scheduling based on
  - Local policies
  - Global policies
  - Cost (see the sketch below)
  (Figure: an item request moving across major facilities and archives, regional facilities
   and caches, and local facilities and caches.)
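A toy sketch (not GriPhyN code) of that decision: each way of satisfying a request carries a cost estimate and a policy verdict, and the planner picks the cheapest allowed option. All names, costs, and policies below are illustrative.

```python
# Toy planner for a virtual-data request: each way of materializing the item has an
# estimated cost and a policy verdict; pick the cheapest allowed option.
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    cost: float            # e.g. estimated seconds to deliver the item
    allowed: bool = True   # outcome of local/global policy checks

def plan_request(item: str, options: list[Option]) -> Option:
    """Pick the cheapest policy-allowed way of materializing `item`."""
    viable = [opt for opt in options if opt.allowed]
    if not viable:
        raise RuntimeError(f"no policy-allowed way to materialize {item}")
    return min(viable, key=lambda opt: opt.cost)

choice = plan_request("higgs_4mu_ntuple", [
    Option("access local data",   cost=5),
    Option("access remote data",  cost=120),
    Option("compute locally",     cost=600),
    Option("compute remotely",    cost=300, allowed=False),   # vetoed by a global policy
])
print(choice.name)   # -> access local data
```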
GriPhyN Goals for Virtual Data
➨ Transparency with respect to location
  - Caching, catalogs, in a large-scale, high-performance Data Grid
➨ Transparency with respect to materialization
  - Exact specification of algorithm components
  - Traceability of any data product
  - Cost of storage vs. CPU vs. networks
➨ Automated management of computation
  - Issues of scale, complexity, transparency
  - Complications: calibrations, data versions, software versions, ...
➨ Explore the concept of virtual data and its applicability to data-intensive science
Data Grid Reference Architecture