1
Open Science Grid at Condor Week
Ruth Pordes, Fermilab, April 25th 2006
2
Outline
OSG goals and organization
Drivers and use today
Middleware
Focus and roadmap
3
What and Who is the Open Science Grid?
High Throughput Distributed Facility: shared opportunistic access to existing clusters, storage and networks; owner-controlled resources and usage policies.
Supports Science: funded by NSF and DOE projects; common technologies & cyber-infrastructure.
Open & Inclusive: a collaboration of users, developers, grid technologists & facility administrators; training & help for existing & new administrators and users.
Heterogeneous.
4
OSG Organization
5
OSG Organization
Executive Director: Ruth Pordes
Facility Coordinator: Miron Livny (Condor project)
Application Coordinators: Torre Wenaus & fkw
Resource Managers: P. Avery & A. Lazzarini
Education Coordinator: Mike Wilde
Engagement Coordinator: Alan Blatecky
Middleware Coordinator: Alain Roy (Condor project)
Operations Coordinator: Leigh Grundhoefer
Security Officer: Don Petravick
Liaison to EGEE: John Huth
Liaison to TeraGrid: Mark Green
Council Chair: Bill Kramer
6
OSG Drivers: LIGO (gravitational wave physics); STAR (nuclear physics); CDF, D0 (high energy physics); SDSS (astrophysics); GADU (bioinformatics); Nanohub; etc.
Research groups transitioning from & extending (legacy) systems to Grids: evolution of iVDGL, GriPhyN, PPDG.
US LHC Collaborations: contribute to & depend on the milestones, functionality and capacity of OSG; commitment to general solutions, sharing resources & technologies.
Application computer scientists (NMI, Condor, Globus, SRM): contribute to technology, integration and operation; excited to see technology put to production use!
Federations with Campus Grids (GLOW, FermiGrid, GROW, Crimson, TIGRE): bridge & interface local and wide-area grids.
Interoperation & partnerships with national/international infrastructures (EGEE, TeraGrid, INFNGrid): ensure transparent and ubiquitous access; work towards standards.
7
LHC Physics gives schedule and performance stress:
Beam starts in 2008: the distributed system must serve 20 PB of data, stored across 30 PB of disk distributed over roughly 100 sites worldwide, to be analyzed by 100 MSpecInt2000 of CPU. Service Challenges give the steps to the full system. Data rates of order 1 GigaByte/sec.
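As a rough, back-of-the-envelope check of those numbers (assuming, purely for illustration, that the 20 PB is served over a single year), the sustained rate comes out at the same order as the 1 GigaByte/sec figure:

```python
# Back-of-the-envelope check: serving 20 PB over one year (an assumed period)
# versus the ~1 GigaByte/sec callout on the slide.
PETABYTE = 1e15                      # bytes, decimal petabyte
SECONDS_PER_YEAR = 365 * 24 * 3600

data_volume = 20 * PETABYTE          # 20 PB of LHC data
average_rate = data_volume / SECONDS_PER_YEAR

print(f"Sustained average rate: {average_rate / 1e9:.2f} GB/s")
# -> about 0.6 GB/s sustained; peaks sit higher, so ~1 GB/s is the right order.
```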
8
Priority to many other stakeholders
New science enabled by opportunistic use of resources. E.g., from the OSG proposal: LIGO, with an annual science run collecting roughly a terabyte of raw data per day, sees it as critical to transparently carry out LIGO data analysis on the opportunistic cycles available on other VOs' hardware.
Opportunity to share use of a "standing army" of resources. E.g., from OSG news: the Genome Analysis and Database Update (GADU) system uses OSG resources to run BLAST, Blocks and Chisel over all publicly available genome sequence data, used by over 2,400 researchers worldwide. GADU processes 3.1 million protein sequences in about 500 batch jobs per day for several weeks, repeated monthly.
Interface existing computing and storage facilities and Campus Grids to a common infrastructure. E.g., the FermiGrid strategy: allow opportunistic use of otherwise dedicated resources; save effort by implementing shared services; work coherently to move all applications and services to run on the Grid.
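For a feel of the GADU numbers above (a sketch only: the four-week campaign length is an assumption, since the slide says just "several weeks"):

```python
# Rough sizing of the GADU workload quoted above.
# The 28-day campaign length is an assumption; the slide says "several weeks".
total_sequences = 3_100_000          # protein sequences per monthly pass
jobs_per_day = 500
campaign_days = 28                   # assumed ~4 weeks

total_jobs = jobs_per_day * campaign_days
sequences_per_job = total_sequences / total_jobs

print(f"{total_jobs} jobs, roughly {sequences_per_job:.0f} sequences per job")
# -> 14000 jobs, roughly 221 sequences per job under these assumptions.
```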
9
OSG Use - the Production Grid
10
OSG Sites Today
23 VOs; more than 18,000 batch slots registered… but only about 15% of them are used via the grid interfaces that are monitored, with a large fraction of local rather than grid use. Not all registered slots are available to grid users; not all available slots are available to every grid user; not all slots in use are monitored.
11
Use - Daily Monitoring 04/23/2006
Snapshot annotations: "bridged" GLOW jobs; genome analysis (GADU); about 2,000 running jobs and 500 waiting jobs.
12
Use - 1 Year View
Plot annotations: deployment of OSG Release 0.4.0; LHC; OSG "launch".
13
Common Middleware provided through Virtual Data Toolkit
Domain science requirements and technology from Globus, Condor, EGEE etc. (including the Condor project) feed joint projects between OSG stakeholders and middleware developers. The cycle: test on a "VO-specific grid"; integrate into a VDT release; deploy on the OSG integration grid; include in an OSG release and deploy to OSG production.
14
Middleware Services and Grid Site Interfaces
Grid Access: X509 user certificate extended using EGEE-VOMS; Condor-G job submission; jobs may place initial data.
Middleware Services - VO Management: EGEE-VOMS extends the certificate with attributes based on role.
Grid Site Interfaces:
Monitoring & Information.
Processing Service - Job Submission: GRAM (pre-WS or WS); overhead mitigations: Condor Gridmonitor and a Condor-managed fork queue; site- and VO-based authorization and account mapping.
Data Movement: GridFTP.
Identity & Authorization: GUMS certificate-to-account mapping.
Data Storage and Management: Storage Resource Management (SRM) or local environment variables ($APP, $GRID_WRITE etc.); shared file systems; site- and VO-based authorization and access mapping.
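To make the access path concrete, here is a minimal sketch of a Condor-G submission to a pre-WS GRAM gatekeeper. The gatekeeper hostname and file names are hypothetical, and it assumes a VOMS-extended proxy has already been created (e.g. with voms-proxy-init); it illustrates the pattern above rather than an official OSG recipe.

```python
# Minimal Condor-G sketch: submit a job to a (hypothetical) OSG gatekeeper
# via pre-WS GRAM, using an existing VOMS-extended X.509 proxy.
import subprocess

submit_description = """\
universe                = grid
grid_resource           = gt2 gatekeeper.example.edu/jobmanager-condor
executable              = analyze.sh
arguments               = input.dat
transfer_input_files    = input.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = job.out
error                   = job.err
log                     = job.log
x509userproxy           = /tmp/x509up_u1000
queue
"""

with open("grid_job.sub", "w") as f:
    f.write(submit_description)

# Hand the description to the local Condor-G scheduler.
subprocess.run(["condor_submit", "grid_job.sub"], check=True)
```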
15
OSG allows you to SURF the Grid
Providing a distributed facility that is Secure, Usable, Reliable and Flexible.
16
Security - Management, Operational and Technical Controls
Management: risk assessment and security planning; service auditing and checking.
Operational: incident response; awareness and training; configuration management.
Technical: authentication and revocation; auditing and analysis.
End-to-end trust in the quality of code executed on remote CPUs.
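As one small illustration of the technical controls (authentication and revocation), a hedged sketch of inspecting a grid certificate's identity and validity window with the Python cryptography package; the file path is just the conventional Globus location used as an example, and a real deployment would also enforce CA trust roots and revocation lists.

```python
# Illustrative only: inspect a grid certificate's subject and lifetime.
# Real OSG sites also check CA policy and certificate revocation lists (CRLs).
from datetime import datetime
from cryptography import x509

with open("/etc/grid-security/hostcert.pem", "rb") as f:   # example path
    cert = x509.load_pem_x509_certificate(f.read())

print("Subject:", cert.subject.rfc4514_string())
print("Expires:", cert.not_valid_after)

if cert.not_valid_after < datetime.now():
    print("WARNING: certificate has expired")
```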
17
Usability - Throughput, Scaling, Fault Diagnosis
OSG users need high throughput: the focus is on effective utilization.
VO control of use, for example: batch-system priority based on VO activity; write authorization and quotas based on VO role; data-transfer priority based on the role of the user.
Usability goals include:
Minimize the entry threshold for resource owners: minimal software stack, minimal support load.
Minimize the entry threshold for users: feature-rich software stack, excellent user support.
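A small sketch of how VO-based priority might be expressed, using Condor's convention that effective user priority is the real (usage-based) priority multiplied by a per-user priority factor, with lower values served first. The VO names and numbers here are invented for illustration, not OSG settings.

```python
# Illustration of VO-based priority using Condor's effective-priority convention:
# effective priority = real (usage-based) priority * priority factor, lower served first.
# VO names and values are made up for the example.
priority_factor = {"uslhc": 1.0, "ligo": 10.0, "gadu": 100.0}
real_priority   = {"uslhc": 500.0, "ligo": 5.0, "gadu": 0.5}   # rises with recent usage

effective = {vo: real_priority[vo] * priority_factor[vo] for vo in priority_factor}

# The negotiator would consider users in increasing effective-priority order.
for vo, eup in sorted(effective.items(), key=lambda item: item[1]):
    print(f"{vo:6s} effective priority = {eup:8.1f}")
```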
18
Reliability - Central Operations Activities
Daily Grid Exerciser: automated validation of basic services and site configuration.
Configuration of the head node and storage to reduce errors: remove the dependence on a shared file system; Condor-managed GRAM fork queue.
Scaling tests of WS-GRAM and GridFTP.
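In the spirit of the Grid Exerciser (a minimal sketch, not the actual OSG tool), a probe like the following could loop over gatekeepers and confirm that GRAM authentication and job execution work. The site contact strings are hypothetical, and it assumes the Globus client tools are installed and a valid grid proxy exists.

```python
# Minimal site-probe sketch in the spirit of the daily Grid Exerciser.
# Assumes globus-job-run is on the PATH and a grid proxy has been created.
import subprocess

gatekeepers = [                      # hypothetical site contact strings
    "ce1.example.edu/jobmanager-fork",
    "ce2.example.org/jobmanager-fork",
]

for gk in gatekeepers:
    try:
        # Run a trivial command through pre-WS GRAM to validate auth + execution.
        result = subprocess.run(
            ["globus-job-run", gk, "/bin/hostname"],
            capture_output=True, text=True, timeout=120,
        )
        status = "OK" if result.returncode == 0 else f"FAIL ({result.returncode})"
    except Exception as exc:
        status = f"ERROR ({exc})"
    print(f"{gk:45s} {status}")
```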
19
Flexibility
Multiple user interfaces.
Support for multiple versions of the middleware releases.
Heterogeneous resources.
Sites retain local control of the use of their resources, including storage and space (no global file system).
Distributed support model through VO and Facility Support Centers, which contribute to global operations.
20
Finally - the OSG Roadmap
Sustain the users, the facility, and the experts. Continually improve effectiveness and throughput. Capabilities and schedule are driven by the science stakeholders: maintain the production system while integrating new features. Enable new sites, campuses and grids to contribute while maintaining self-control.

Release   | Planned      | Actual / content
OSG 0.2   | Spring 2005  | July 2005
OSG 0.4.0 | Dec 2005     | January 2006
OSG 0.4.1 | April 2006   | May 2006: WS-GRAM; Condor-based fork queue; accounting V0; interoperability with LCG using the RB
OSG 0.6.0 | July 2006    | planned content: edge services; Condor-C; test of the gLite WMS
OSG 0.8.0 | End of 2006  |
OSG 1.0   | 2007         |