LHCb-GRID
T. Bowcock, University of Liverpool
11 September 2000

Successes
Issues
Improving the system
Comments

Architecture
[Diagram: master node, external Ethernet connection, MAP slave nodes, two hubs (switches in '00), 100BaseT links]

Successes
Many events generated for:
– LHCb
– H1
– DELPHI
– ATLAS
– (CDF)
Software:
– Fault tolerant on transfers (see the sketch below)
– Can distribute MCs, update the system, etc.
Prototyped a big system
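As an illustration of the kind of fault tolerance the transfer software provides, here is a minimal sketch of retrying a file copy between a slave node and the master until it succeeds or a retry budget runs out. This is not the actual MAP-FCS code; the host names, paths and use of scp are hypothetical.

```python
import subprocess
import time

def copy_with_retries(src, dest, max_attempts=5, delay_s=30):
    """Copy a file, retrying on failure (illustrative only, not MAP-FCS).

    src/dest are scp-style 'host:path' strings; both are hypothetical examples.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(["scp", "-q", src, dest])
        if result.returncode == 0:
            return True                      # transfer succeeded
        time.sleep(delay_s)                  # back off before trying again
    return False                             # give up after max_attempts

# Example: pull Monte Carlo output from a slave node back to the master.
ok = copy_with_retries("map-slave17:/data/mc/run042.dat",
                       "map-master:/store/mc/run042.dat")
```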

Issues
Ease of use
– Expert system
– Robustness
OS (RH 5.1 and 6.2)
Storage
Up-time since Jan '00 only ~90%
– Development + recovery
– Typically run with 95% of the CPUs in MAP

Issues
COMPASS nodes
– Do not appear as a single volume: 10 disks mounted as /scsi1, /scsi2, ... (see the sketch below)
User unfriendly? Yup. Do we care?
– Who is our user?
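Because each COMPASS node exposes its disks as separate mount points rather than one volume, any tool that wants to locate a dataset has to walk all of them. A minimal sketch, assuming mount points named /scsi1 ... /scsi10 and simple file-name matching; both the naming and the example dataset path are assumptions, not the actual COMPASS layout.

```python
import glob
import os

def find_dataset(pattern, n_disks=10):
    """Search every /scsiN mount on a COMPASS node for files matching pattern.

    The /scsi1../scsi10 naming and the glob-style pattern are illustrative
    assumptions; the real node layout may differ.
    """
    hits = []
    for i in range(1, n_disks + 1):
        mount = "/scsi%d" % i
        if os.path.isdir(mount):                    # skip disks that are absent
            hits.extend(glob.glob(os.path.join(mount, pattern)))
    return hits

# Example: find all files of a hypothetical LHCb MC production run.
files = find_dataset("lhcb_mc/run042/*.dat")
```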

Bigger Issues
Hardware failures
– NICs (20%): low-cost but annoying; replacement is fast
– Disks
– Power supplies: problem with sleeve-bearing fans, leading to a higher failure rate. Higher-quality fans will be installed (Sept '00); expected down-time of about 1 week.
– Hubs replaced by switches

Improvements needed
Multi-redundant masters
– Plan 6-fold redundancy for security and simplicity of operation (Oct '00); a failover sketch follows below
MAP-FCS
– Need to bullet-proof it and make it 'idiot proof'
Interface to user
– Grid or remote login needs development
Storage & transfer
– How do disks appear to the outside world?
Data-analysis capability
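One simple way a client (slave node or submission tool) can exploit 6-fold master redundancy is to try a fixed list of masters in order and use the first one that responds. The following is only a sketch of that idea, with hypothetical host names and a TCP health check on an assumed port; it is not the planned MAP implementation.

```python
import socket

# Hypothetical names for the six redundant MAP masters.
MASTERS = ["map-master%d" % i for i in range(1, 7)]

def pick_master(port=8080, timeout_s=2.0):
    """Return the first master that accepts a TCP connection on `port`.

    The port number and host names are illustrative assumptions.
    """
    for host in MASTERS:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return host                      # this master is alive
        except OSError:
            continue                             # dead or unreachable, try the next one
    raise RuntimeError("no MAP master reachable")

# Example: a slave node decides where to report job status.
master = pick_master()
```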

Software Improvements
Sept '00
– Complete upgrade to RH 6.2
– Complete upgrade of MAP-FCS
Oct-Dec '00
– Reduce system vulnerability (large improvement expected from the above)
– GLOBUS interface?

Expanding MAP's role
MAP was conceived as an MC engine
– Throwaway MC (or ntuples)
Keeping MC involves moving data to COMPASS
– Done
How do we (re-)analyse larger chunks of data?
– Assuming we want to do this...

Challenge - Example (a)
– LHCb: want to produce 10^6 events, reprocess them once or twice and then analyse them for a while. Optimistically, 10^6 events is about 1 TByte of space (see the arithmetic below).
– Solution: increase the disk store on each MAP node and store the data there. Analysis/reprocessing possible.
– But this implies we now need resource management: disks can fill up. Who gets to play?!
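The 1 TByte figure is consistent with roughly 1 MByte per stored event, which is an assumption on my part rather than a number quoted on the slide. A worked version of that arithmetic, plus the resulting per-node requirement if the sample is spread evenly over the farm (the node count is also an illustrative assumption):

```python
# Worked arithmetic for Example (a); the per-event size and node count
# are illustrative assumptions, not numbers from the slide.
n_events      = 1e6           # events to produce and keep
event_size_mb = 1.0           # assumed ~1 MByte per stored event
n_nodes       = 300           # assumed number of MAP nodes

total_tb    = n_events * event_size_mb / 1e6     # ~1 TByte in total
per_node_gb = total_tb * 1e3 / n_nodes           # a few GBytes per node

print("total sample: %.1f TByte" % total_tb)     # 1.0 TByte
print("per node:     %.1f GByte" % per_node_gb)  # ~3.3 GByte with 300 nodes
```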

Challenge - Example (b)
– CDF: want to import data from FNAL and analyse it at Liverpool.
– Solution (none yet!): importing data is tricky. Rely on transfer from tape to a staging area (e.g. COMPASS).
– At 5 MBytes/s (Fast Ethernet), 1 TByte would take ~2x10^5 s to get onto the nodes (2 days). Using 6 COMPASS nodes, about 8 hrs (worked through below).
Installation of Gbit
– Expensive, but reduces this to about 1 hr.
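The transfer times follow from dividing the sample size by the available bandwidth. A worked version, assuming the transfer parallelises cleanly across the staging nodes and that a Gbit link delivers roughly 100 MBytes/s (both assumptions; real tape and network overheads would lengthen all of these):

```python
# Transfer-time arithmetic for Example (b). Clean parallelism and the
# ~100 MByte/s effective rate for a Gbit link are illustrative assumptions.
def transfer_hours(data_tb, mb_per_s, n_links=1):
    """Hours to move data_tb TBytes at mb_per_s MBytes/s over n_links links."""
    seconds = data_tb * 1e6 / (mb_per_s * n_links)
    return seconds / 3600.0

print(transfer_hours(1.0, 5.0))               # one Fast Ethernet link: ~55.6 h (~2.3 days)
print(transfer_hours(1.0, 5.0, n_links=6))    # 6 COMPASS nodes in parallel: ~9.3 h ("about 8 hrs")
print(transfer_hours(1.0, 100.0))             # one Gbit link: ~2.8 h
```

The slide's "about 1 hr" for the Gbit case presumably assumes more than one such link, or several staging nodes importing in parallel.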

Architecture Modification
Currently (MC mode): a 6-fold redundant master can control the PCs
Split into subfarms to increase the I/O bandwidth (a partitioning sketch follows below)
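Splitting the farm into subfarms is essentially a partitioning of the node list, with one of the redundant masters fronting each partition so that I/O no longer funnels through a single machine. A minimal sketch of such a partitioning; the host names and farm size are hypothetical, and a real assignment would presumably follow the physical switch layout.

```python
# Assign MAP slave nodes round-robin to subfarms, one subfarm per master.
# Host names and the 300-node farm size are illustrative assumptions.
masters = ["map-master%d" % i for i in range(1, 7)]
slaves  = ["map-node%03d" % i for i in range(1, 301)]

subfarms = {m: [] for m in masters}
for idx, node in enumerate(slaves):
    subfarms[masters[idx % len(masters)]].append(node)

for master, nodes in subfarms.items():
    print("%s serves %d nodes" % (master, len(nodes)))   # 50 nodes each here
```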

Subfarm Solution
Possible
– but requires substantial development of the system
MC is the biggest problem
– But we still need to analyse the data
– Where is the balance?

... so
Suggest the following steps:
– Complete installation of the 6 MAP masters (COMPASS nodes), 0.5 TBytes each. Output from jobs can be directed there.
– Increase the disk capacity on the existing nodes: purchase additional GByte disks (about 50 kCHF), hopefully by Oct 1 (total disk capacity of 6 TBytes on MAP, 3 on the COMPASS nodes).
– Allow users to create persistent stores on MAP
– Business (bazaar style) as usual...

Further Improvements
Make MAP accessible!
– Globus
– Care required: more users, more storage, more management...
– Package the software for distribution. But does anybody want it?
Hardware upgrades
– More nodes

Comments
Can any one system provide all the facilities and capabilities?
– CPU, storage, data access, I/O?
How do institutes/regional centres really fit in?
– Balance of politics and effectiveness
Lessons for 2004...