
1 Building the World's Largest Scientific Grid
Jamie Shiers, Database Group, CERN, Geneva, Switzerland
Oracle Tech Day, November 2004

2 Agenda
The Need for a World-Wide Grid
An Overview of the World's Largest Scientific Grid
The role of the Database Group in the above
The role of the CERN openlab for DataGrid Applications
The role of Enterprise Grids
Summary and Conclusions

3 The Requirements: The Large Hadron Collider at CERN (LHC)

4 CERN
The European Organisation for Nuclear Research / The European Laboratory for Particle Physics
Fundamental research in particle physics
Designs, builds & operates large accelerators
Financed by 20 European member states, plus others (Russia, US, Canada, India, …)
~1B CHF budget for operation and new accelerators
2000 staff plus 6000 users (researchers) from all over the world
New accelerator: the Large Hadron Collider (LHC)

5 (Aerial view of the Geneva area: the ~27 km LHC ring, with the airport and the CERN Computer Centre marked)

6 (Image-only slide)

7 The LHC Machine
Two counter-circulating proton beams
Collision energy: 7 + 7 TeV
27 km of magnets with a field of 8.4 Tesla
Superfluid helium cooled to 1.9 K
The world's largest superconducting structure!

8 The ATLAS Detector
The ATLAS collaboration is:
– ~2000 physicists from ~150 universities and labs in ~35 countries
– distributed resources, remote development
The ATLAS detector is:
– 26 m long, stands 20 m high, weighs 7000 tons
– has 200 million read-out channels
One of 4 LHC experiments: ALICE, ATLAS, CMS, LHCb

9 LHC: Higgs Decay into 4 Muons
Selectivity: 1 in 10^13
– 1 person in a thousand world populations
– a needle in 20 million haystacks

10 Data Handling and Computation for Physics Analysis
(Diagram, les.robertson@cern.ch, CERN: the detector feeds the event filter (selection & reconstruction), producing raw data; reconstruction yields event summary data; event reprocessing and event simulation feed the processed data, from which analysis extracts analysis objects by physics topic for batch and interactive physics analysis)

11 Data Hierarchy
RAW (~2 MB/event): triggered events recorded by DAQ; detector digitisation. 10^9 events/yr × 2 MB = 2 PB/yr
ESD (~100 kB/event): reconstructed information; pattern recognition output: clusters, track candidates
AOD (~10 kB/event): analysis information; physical quantities: transverse momentum, association of particles, jets, (best) id of particles
TAG (~1 kB/event): classification information; relevant information for fast event selection
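As a quick sanity check of the tiers above, here is a minimal back-of-the-envelope sketch in Python, using only the per-event sizes and the ~10^9 events/yr quoted on the slide:

```python
# Back-of-the-envelope check of the annual volume per data tier,
# assuming the slide's figure of ~10^9 triggered events per year.
EVENTS_PER_YEAR = 1e9

EVENT_SIZE_BYTES = {          # approximate per-event sizes from the slide
    "RAW": 2e6,               # ~2 MB/event
    "ESD": 100e3,             # ~100 kB/event
    "AOD": 10e3,              # ~10 kB/event
    "TAG": 1e3,               # ~1 kB/event
}

for tier, size in EVENT_SIZE_BYTES.items():
    volume_tb = EVENTS_PER_YEAR * size / 1e12   # bytes -> TB (decimal)
    print(f"{tier}: {volume_tb:,.0f} TB/yr")
# RAW: 2,000 TB/yr (= 2 PB), ESD: 100 TB/yr, AOD: 10 TB/yr, TAG: 1 TB/yr
```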

12 Event Data
Complex data models: ~500 structure types
References to describe relationships between event objects
– unidirectional
– need to support transparent navigation
Need ultimate resolution on selected events:
– need to run specialised algorithms
– work interactively
Not affordable if uncontrolled
(Diagram: an Event navigating from Phys candidates through Rec tracks and coordinates down to Raw Velo/Calo data, with RAW/ESD/AOD versions; private user data (MyTrk) referencing collaboration AOD data)
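To make the "unidirectional references" and "transparent navigation" points concrete, here is a toy sketch; the class and store names are invented, and this is not the experiments' actual object model (POOL or otherwise):

```python
# A toy sketch (not the experiments' actual object model) of unidirectional
# references between event objects, with navigation resolved lazily so that
# ESD/RAW payloads are only fetched when an analysis really needs them.
from dataclasses import dataclass, field


@dataclass
class Ref:
    """One-way reference: the target never points back at the source."""
    store: dict               # stand-in for a persistent object store
    key: str

    def resolve(self):
        return self.store[self.key]   # fetch on demand ("transparent navigation")


@dataclass
class Track:                  # reconstructed object (ESD level)
    momentum: float
    raw_hits: Ref             # points "down" to RAW, never the other way


@dataclass
class Candidate:              # physics object (AOD level)
    mass: float
    tracks: list = field(default_factory=list)   # Refs to Track objects


store = {"raw/42": [b"...digitised hits..."]}
store["esd/7"] = Track(momentum=45.3, raw_hits=Ref(store, "raw/42"))
cand = Candidate(mass=91.2, tracks=[Ref(store, "esd/7")])

# Navigate AOD -> ESD -> RAW only for this selected event:
track = cand.tracks[0].resolve()
hits = track.raw_hits.resolve()
```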

13 LHC Data (ALICE, ATLAS, CMS, LHCb)
40 million collisions per second
After filtering, 100-200 collisions of interest per second
1-10 Megabytes of data digitised for each collision = recording rate of 0.1-1 Gigabytes/sec
10^10 collisions recorded each year = ~15 Petabytes/year of data
Scale:
1 Megabyte (1 MB) – a digital photo
1 Gigabyte (1 GB) = 1000 MB – a DVD movie
1 Terabyte (1 TB) = 1000 GB – world annual book production
1 Petabyte (1 PB) = 1000 TB – annual production of one LHC experiment
1 Exabyte (1 EB) = 1000 PB – world annual information production
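The slide's rates can be reproduced in a few lines; the mid-range values below are illustrative choices, not official figures:

```python
# Reproducing the slide's back-of-the-envelope rates (illustrative only).
trigger_rate_hz = 150          # 100-200 collisions of interest per second
event_size_mb = 3.0            # 1-10 MB digitised per collision

rate_gb_s = trigger_rate_hz * event_size_mb / 1000
print(f"recording rate ~{rate_gb_s:.2f} GB/s")       # within the 0.1-1 GB/s band

events_per_year = 1e10         # recorded per year, per the slide
avg_event_mb = 1.5
volume_pb = events_per_year * avg_event_mb / 1e9     # MB -> PB
print(f"yearly volume ~{volume_pb:.0f} PB")          # ~15 PB/year
```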

14 Agenda
The Need for a World-Wide Grid
An Overview of the World's Largest Scientific Grid
The role of the Database Group in the above
The role of the CERN openlab for DataGrid Applications
The role of Enterprise Grids
Summary and Conclusions

15 The Solution: The LHC Computing Grid (LCG)

16 LCG Project Goals
To prepare, deploy and operate the computing environment for the experiments to analyse the data from the LHC detectors
– applications development environment, common tools and frameworks
– build and operate the LHC computing service
The Grid is just a tool towards achieving this goal

17 LCG-2/EGEE-0 Status, 24-09-2004
Total: 78 sites, ~9000 CPUs, 6.5 PB of storage
(Map of participating sites, including Cyprus)

18 Collaborating Computer Centres
Building a Grid: the virtual LHC Computing Centre
(Diagram: collaborating computer centres joined by the Grid into per-experiment views, e.g. the ATLAS and CMS Virtual Organisations)

19 Data Hierarchy
RAW (~2 MB/event): triggered events recorded by DAQ; detector digitisation. 10^9 events/yr × 2 MB = 2 PB/yr
ESD (~100 kB/event): reconstructed information; pattern recognition output: clusters, track candidates
AOD (~10 kB/event): analysis information; physical quantities: transverse momentum, association of particles, jets, (best) id of particles
TAG (~1 kB/event): classification information; relevant information for fast event selection

20 Event Data
Complex data models: ~500 structure types
References to describe relationships between event objects
– unidirectional
– need to support transparent navigation
Need ultimate resolution on selected events:
– need to run specialised algorithms
– work interactively
Not affordable if uncontrolled
(Diagram: as on slide 12)

21 LHC Computing Model
Tier-0 – CERN:
– filter raw data; reconstruction → summary data (ESD)
– record raw data and ESD
– distribute raw data and ESD to the Tier-1 centres
Tier-1 – data-heavy analysis; re-processing raw → ESD; national and regional support
(RAL, IN2P3, BNL, FZK, CNAF, PIC, ICEPP, FNAL)
Tier-2 – end-user analysis, batch and interactive; small centres, desktops, portables
(USC, NIKHEF, Krakow, CIEMAT, Rome, Taipei, TRIUMF, CSCS, Legnaro, UB, IFCA, IC, MSU, Prague, Budapest, Cambridge, IFIC)
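A toy sketch of the Tier-0 role described above, fanning RAW and ESD out to the Tier-1 centres; the functions are placeholders, not LCG software:

```python
# A toy sketch (names and flow simplified) of the tiered model: Tier-0 at CERN
# records RAW + ESD and fans both out to the Tier-1 centres.
TIER1_SITES = ["RAL", "IN2P3", "BNL", "FZK", "CNAF", "PIC", "ICEPP", "FNAL"]


def reconstruct(raw_event: bytes) -> bytes:
    """Stand-in for reconstruction: RAW -> event summary data (ESD)."""
    return raw_event[:100]            # pretend ESD is a small summary


def tier0_process(raw_event: bytes) -> dict:
    esd = reconstruct(raw_event)
    archive = {"CERN": (raw_event, esd)}          # record at Tier-0
    for site in TIER1_SITES:                      # distribute to each Tier-1
        archive[site] = (raw_event, esd)
    return archive


replicas = tier0_process(b"\x00" * 2_000_000)     # one ~2 MB RAW event
print(sorted(replicas))   # CERN plus the eight Tier-1 centres hold copies
```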

22 (Map of the sites as on the previous slide, annotated with data distribution of ~70 Gbits/sec and processing capacity in millions of SI2000 units)

23 2004 Data Challenges
Large-scale tests of the experiments' computing models, processing chains, grid technology readiness and operating infrastructure
The big challenge for this year – data:
– file catalogue
– replica management
– database access
– integrating mass storage
Grid Operations Centre at RAL; User Support Centre at FZK; planning for a second operations & support centre in Taipei

24 Experiences during the Data Challenges

25 Data Challenges – ALICE
Phase I:
– 120k Pb+Pb events produced in 56k jobs
– 1.3 million files (26 TB) in Castor at CERN
– total CPU: 285 MSI2k hours (one 2.8 GHz PC working for 35 years)
– ~25% produced on LCG-2
Phase II (underway):
– 1 million jobs, 10 TB produced, 200 TB transferred, 500 MSI2k hours of CPU
– ~15% on LCG-2

26 Data Challenges – ATLAS
Phase I:
– 7.7 million events fully simulated (Geant4) in 95,000 jobs
– 22 TB
– total CPU: 972 MSI2k hours
– >40% produced on LCG-2 (used LCG-2, Grid3, NorduGrid)

27 Data Challenges – CMS
~30 M events produced; 25 Hz reached (only once, for a full day)
RLS, Castor, control systems, T1 storage, …
Not a CPU challenge, but a full-chain demonstration
Pre-challenge production in 2003/04:
– 70 M Monte Carlo events produced (30 M with Geant4)
– classic and grid (CMS/LCG-0, LCG-1, Grid3) productions

28 Data Challenges – LHCb
Phase I:
– 186 M events, 61 TB
– total CPU: 424 CPU-years (43 LCG-2 and 20 DIRAC sites)
– up to 5600 concurrent jobs running in LCG-2 – 5-6 times what was possible at CERN alone
(Chart: production rate over time – ~1.8×10^6 events/day with DIRAC alone, 3-5×10^6 events/day with LCG in action, with dips where LCG paused and restarted)

29 Data Challenges – Summary
Probably the first time such a set of large-scale grid productions has been done
– significant effort invested on all sides – very fruitful collaborations
– middleware is actually quite stable now
– but the single largest issue is the lack of stable operations
Close to 500 TB (half a PB…) of data stored

30 Preparing for 7,000 boxes in 2008

31 LCG Summary
The LHC Computing Grid is real and is running production work
From a 'single world-wide Grid' to a 'federation of Grids'
See also The Economist, October 7th 2004

32 Agenda
The Need for a World-Wide Grid
An Overview of the World's Largest Scientific Grid
The role of the Database Group in the above
The role of the CERN openlab for DataGrid Applications
The role of Enterprise Grids
Summary and Conclusions

33 The Role of the Database Group

34 CERN Database Group
Provides support for Oracle-based solutions across the whole spectrum of the laboratory's activities:
– internal e-business applications (uses, inter alia, Oracle HR)
– technical infrastructure for the laboratory: accelerator and detector design, construction and operation
– physics-related services: will be used in real-time mode for detector monitoring and calibration, and for storing some fraction of the scientific data
CERN has been an Oracle customer for more than 20 years!
http://cern.ch/db/

35 DB Group – Physics Activities
Develop and maintain physics-related applications:
– POOL persistency framework for storing physics data
– Conditions DB for the conditions of the massive detectors themselves
– these activities are part of the LCG Applications Area
General database and application server services:
– currently at the level of 10-20 TB
– essentially all based on Intel / Linux
Core Grid services:
– includes the LCG File Catalog
– used to schedule jobs (where the data is) and for running jobs to access the data (see the sketch below)
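For the scheduling point above, a hypothetical sketch of the idea behind a file catalogue: map a logical file name (LFN) to its replicas, then send the job where the data already is. This is not the actual LCG File Catalog API, and the LFNs, sites and slot counts are invented:

```python
# Hypothetical sketch of catalogue-driven scheduling; NOT the real LCG
# File Catalog interface. An LFN maps to the sites holding a replica.
CATALOGUE = {
    "lfn:/grid/atlas/dc2/evts_0001.root": ["CERN", "RAL", "FZK"],
    "lfn:/grid/atlas/dc2/evts_0002.root": ["BNL", "CNAF"],
}

SITE_FREE_SLOTS = {"CERN": 0, "RAL": 120, "FZK": 40, "BNL": 300, "CNAF": 10}


def schedule(lfn: str) -> str:
    """Pick, among the sites holding a replica, the one with most free CPUs."""
    replicas = CATALOGUE[lfn]
    return max(replicas, key=lambda site: SITE_FREE_SLOTS[site])


print(schedule("lfn:/grid/atlas/dc2/evts_0001.root"))   # RAL
```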

36 Physics Activities – Futures
Re-engineering all DB services for physics on Oracle 10g RAC
Goals:
– isolation: 10g 'services' and / or physical separation
– scalability: in both database processing power and storage
– reliability: automatic failover in case of problems
– manageability: significantly easier to administer than now
Will revisit this under 'Enterprise Grids' later… (connection sketch below)
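As a sketch of what binding to a named '10g service' looks like from a client, here is a minimal example with the cx_Oracle Python driver; the host and service names are invented, and on a RAC the listener places the session on an available instance:

```python
# Minimal sketch of connecting to a named database service on a RAC cluster
# using the cx_Oracle driver. Host, port and service name are invented.
import cx_Oracle

# EZConnect-style DSN: host:port/service_name. The application binds to the
# service, not to a specific instance - the basis of isolation and failover.
dsn = "physics-rac.example.cern.ch:1521/atlas_conditions"

conn = cx_Oracle.connect("reader", "secret", dsn)
cur = conn.cursor()
cur.execute("SELECT sysdate FROM dual")    # trivial round-trip check
print(cur.fetchone()[0])
conn.close()
```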

37 CERN & Oracle
Share a common vision regarding the future of high-performance computing:
– widespread use of commodity dual-processor PCs running Linux
– focus on Grid computing
CERN has managed to influence the Oracle product – Oracle 10g features:
– support for native IEEE floats & doubles
– support for "Ultra-Large" Databases (ULDB)
– cross-platform transportable tablespaces

38 Agenda
The Need for a World-Wide Grid
An Overview of the World's Largest Scientific Grid
The role of the Database Group in the above
The role of the CERN openlab for DataGrid Applications
The role of Enterprise Grids
Summary and Conclusions

39 (Image-only slide)

40 CERN openlab
The CERN openlab for DataGrid applications is a framework for evaluating and integrating cutting-edge technologies or services in partnership with industry, focusing on potential solutions for the LCG
The openlab invites members of industry to join and contribute systems, resources or services, and to carry out with CERN large-scale, high-performance evaluations of their solutions in an advanced integrated environment
CERN – Oracle focus:
– areas that will lead to tangible benefits in the short to medium term
– also look at longer-term, strategic issues
Not limited to physics-specific problems! Solutions preferably of general interest!

41 openlab – Achievements
Data Guard:
– typically viewed as 'disaster protection' (which does happen)
– also suitable for handling scheduled interventions: the primary cause of interventions in our Grid is O/S security patches, and we cannot afford for a critical Grid component to be down (it impacts the whole Grid!)
Streams:
– often viewed as 'some sort of replication technique' (true)
– has great potential for handling upgrades in a quasi-transparent manner
– an openlab fellow has demonstrated transparent upgrades from one version of Oracle to another (e.g. 9i to 10g) and from one platform to another (e.g. from Solaris to Intel)
We have sufficient confidence in this technique that we will be using it in production for critical services, e.g. the network DB, together with RAC

42 Data Handling and Computation for Physics Analysis
(Same data-flow diagram as slide 10)

43 Physics Analysis
(Same data-flow diagram as slide 10)

44 (Diagram: annual data volumes from Tier-0 to users – RAW 1 PB/yr (1 PB/s prior to reduction!), ESD 100 TB/yr, AOD 10 TB/yr, TAG 1 TB/yr; access is sequential at the RAW/Tier-0 end and random at the TAG/user end)

45 openlab – Future Focus
Ultra-large scientific databases for end-user analysis:
– an area that is not well understood in the current LCG
– exploitation of native floats; low-selectivity server-side queries (see the hedged sketch below)
– joint work with other openlab partners!
World-wide monitoring / deployment:
– extensive use of Enterprise Grid Control to handle deployment of core DB and iAS services
– monitoring, capacity planning, patch deployment, backup / recovery, etc.
Further development of quasi non-stop services using Oracle Streams, 10g Data Guard + 10g RAC, etc.
All with a focus on early production deployment of successes!
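To illustrate the 'low-selectivity server-side query' idea, a hedged sketch follows: the selection predicate runs inside the database, so only matching TAG rows cross the network. The table and column names are invented; BINARY_FLOAT is Oracle 10g's native IEEE float type mentioned on slide 37:

```python
# Hedged sketch of server-side event selection on a TAG table. Table and
# column names are invented; the point is that the cut runs in the database
# (on native IEEE BINARY_FLOAT columns, an Oracle 10g feature) and only the
# few matching event identifiers travel back to the client.
import cx_Oracle

conn = cx_Oracle.connect("physicist", "secret",
                         "physics-rac.example.cern.ch:1521/analysis")
cur = conn.cursor()

cur.execute(
    """
    SELECT event_id
      FROM event_tags              -- hypothetical TAG table, ~1 kB/event
     WHERE n_muons >= :nmu         -- selection pushed to the server
       AND pt_max  >  :ptcut       -- pt_max stored as BINARY_FLOAT
    """,
    nmu=4, ptcut=20.0,
)

selected = [row[0] for row in cur]      # only selected events come back
print(f"{len(selected)} events pass the cut")
conn.close()
```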

46 (Image-only slide)

47 Agenda
The Need for a World-Wide Grid
An Overview of the World's Largest Scientific Grid
The role of the Database Group in the above
The role of the CERN openlab for DataGrid Applications
The role of Enterprise Grids
Summary and Conclusions

48 Enterprise Grids: The Role of Enterprise Grids in Scientific Grids

49 Grid – Component Services
A Grid such as the LCG is built upon a large number of component services and applications
Traditional wisdom:
– hand-tailor each service according to its specific needs
– hard limits in terms of scalability and capacity
– a maintenance nightmare!
Alternative approach:
– build services in a standard way out of common building blocks
– layer them upon an Enterprise Grid
– scalable, configurable, manageable

50 CERN DB Physics Services
Currently being re-engineered on an Enterprise Grid:
– 24 PCs now, expanding to 36-48 by end 2005 (dual processor, 4 GB memory, mirrored system disk, RHEL 3.0)
– redundant (dual) 64-port SAN infrastructure
– some 50 TB of mirrored SAN storage
Based on Oracle 10g RAC and 10g Services
– hardware on order; installation expected before Christmas 2004
– watch this space!

51 Summary and Conclusions

52 What Are Grids All About?
Grids are about sharing and pooling resources
We all know that when we can and do work together, we achieve much more than if we work alone
– CERN is a classic example of this on a world-wide scale!
Two other examples (from CHEP '04 in Interlaken, CH):
1. Resilience – security of valuable data; continuity in case of major disruption
2. Expedience – access to additional resources; engagement of distributed communities

53 The Grid – Disruptive Technology?
From the OracleWorld San Francisco press panel, September 2003:
[Work on LHC computing started around 1992]
“What would happen if something came along that would change everything? Like the Web. We would simply have to take it into account.”
“We believe that thing has come along, and that thing is the grid.”
“We are actively involved in making it happen, and it is the underlying cornerstone of our computing model.”

54 The Grid is unstoppable...

