Published by Dulcie Stevens. Modified over 8 years ago.
1
Russ Miller & Mark Green
Center for Computational Research & Computer Science & Engineering, SUNY-Buffalo
Hauptman-Woodward Medical Inst
GT’04 Panel: Storage Considerations for Grid Computing Environments
University at Buffalo, The State University of New York
NSF, NIH, DOE, NYS
2
University at Buffalo, The State University of New York
CCR: Center for Computational Research
Major CCR Resources (12 TF & 290 TB)
Apex Bioinformatics System: Sun V880 (3), Sun 6800, Sun 280R (2), Intel PIIIs; Sun 3960: 7 TB Disk Storage
HP/Compaq SAN: 75 TB Disk; 190 TB Tape
64 Alpha Processors (400 MHz); 32 GB RAM; 400 GB Disk
IBM RS/6000 SP: 78 Processors
Sun Cluster: 80 Processors
SGI Intel Linux Cluster: 150 PIII Processors (1 GHz); Myrinet
Dell Linux Cluster (#22, #25, #38): 600 P4 Processors (2.4 GHz); 600 GB RAM; 40 TB Disk; Myrinet
Dell Linux Cluster (#187, #368, off): 4036 Processors (PIII 1.2 GHz); 2 TB RAM; 160 TB Disk; 16 TB SAN
IBM BladeCenter Cluster: 532 P4 Processors (2.8 GHz); 5 TB SAN
SGI Origin3700 (Altix): 64 Processors (1.3 GHz ITF2); 256 GB RAM; 2.5 TB Disk
SGI Origin3800: 64 Processors (400 MHz); 32 GB RAM; 400 GB Disk
3
Advanced CCR Data Center (ACDC) Computational Grid Overview
Joplin (Compute Cluster): 300 Dual Processor 2.4 GHz Intel Xeon; RedHat Linux 7.3; 38.7 TB Scratch Space
Nash (Compute Cluster): 75 Dual Processor 1 GHz Pentium III; RedHat Linux 7.3; 1.8 TB Scratch Space
Crosby (Compute Cluster): SGI Origin 3800, 64 - 400 MHz IP35; IRIX 6.5.14m; 360 GB Scratch Space
Mama (Compute Cluster): 9 Dual Processor 1 GHz Pentium III; RedHat Linux 7.3; 315 GB Scratch Space
Young (Compute Cluster): 16 Dual Sun Blades, 47 Sun Ultra5; Solaris 8; 770 GB Scratch Space
ACDC (Grid Portal): 4 Processor Dell 6650, 1.6 GHz Intel Xeon; RedHat Linux 9.0; 66 GB Scratch Space
Fogerty (Condor Flock Master): 1 Dual Processor 250 MHz IP30; IRIX 6.5
School of Dental Medicine: 9 Single Processor Dell P4 Desktops
Hauptman-Woodward Institute: 13 Various SGI IRIX Processors
Computer Science & Engineering: 25 Single Processor Sun Ultra5s
CCR: 19 IRIX, RedHat, & WINNT Processors
Expanding: RedHat, IRIX, Solaris, WINNT, etc.
T1 Connection
Note: Network connections are 100 Mbps unless otherwise noted.
4
BCOEB Medical/Dental Network Connections
5
ACDC Data Grid Overview (Grid-Available Data Repositories)
Joplin (Compute Cluster): 300 Dual Processor 2.4 GHz Intel Xeon; RedHat Linux 7.3; 38.7 TB Scratch Space
Nash (Compute Cluster): 75 Dual Processor 1 GHz Pentium III; RedHat Linux 7.3; 1.8 TB Scratch Space
ACDC (Grid Portal): 4 Processor Dell 6650, 1.6 GHz Intel Xeon; RedHat Linux 9.0; 66 GB Scratch Space
Crosby (Compute Cluster): SGI Origin 3800, 64 - 400 MHz IP35; IRIX 6.5.14m; 360 GB Scratch Space
Mama (Compute Cluster): 9 Dual Processor 1 GHz Pentium III; RedHat Linux 7.3; 315 GB Scratch Space
Young (Compute Cluster): 16 Dual Sun Blades, 47 Sun Ultra5; Solaris 8; 770 GB Scratch Space
Storage (per resource): 182 GB, 100 GB, 56 GB, 100 GB, 70 GB, 136 GB
Network Attached Storage: 1.2 TB
Storage Area Network: 75 TB
CSE Multi-Store: 2 TB
Note: Network connections are 100 Mbps unless otherwise noted.
6
ACDC Data Grid Overview
Joplin (Compute Cluster): 300 Dual Processor 2.4 GHz Intel Xeon; RedHat Linux 7.3; 38.7 TB Scratch Space
Nash (Compute Cluster): 75 Dual Processor 1 GHz Pentium III; RedHat Linux 7.3; 1.8 TB Scratch Space
ACDC (Grid Portal): 4 Processor Dell 6650, 1.6 GHz Intel Xeon; RedHat Linux 9.0; 66 GB Scratch Space
Crosby (Compute Cluster): SGI Origin 3800, 64 - 400 MHz IP35; IRIX 6.5.14m; 360 GB Scratch Space
Mama (Compute Cluster): 9 Dual Processor 1 GHz Pentium III; RedHat Linux 7.3; 315 GB Scratch Space
Young (Compute Cluster): 16 Dual Sun Blades, 47 Sun Ultra5; Solaris 8; 770 GB Scratch Space
Storage (per resource): 182 GB, 100 GB, 56 GB, 100 GB, 70 GB, 136 GB
Network Attached Storage: 480 GB
Storage Area Network: 75 TB
CSE Multi-Store: 2 TB
Note: Network connections are 100 Mbps unless otherwise noted.
7
ACDC-Grid Browser view of “miller” group files published by user “rappleye”
8
ACDC-Grid Administration
9
Grid-enabling Application Templates
Structural Biology
Earthquake Engineering
Pollution Abatement
Geographic Information Systems & BioHazards
10
Grid-enabling Application Templates
Structural Biology: The Shake-and-Bake (SnB) algorithm for molecular structure determination has been used routinely to solve difficult atomic-resolution structures. The running time of SnB varies widely as a function of the size of the structure, the quality of the data, the space group, and the choice of critical input parameters, including the size of the Fourier grid, the number of reflections, the number and type of invariants, and the number of cycles of the procedure used per trial structure, to name a few. SnB is being augmented with a data repository that stores information for every SnB run, regardless of where the job is run. This information is then mined automatically to tune 17 key SnB parameters, with the aim of optimizing the procedure for solving previously unknown structures.
11
Grid-enabling Application Templates
Earthquake Engineering: In our effort to develop disaster-resilient communities, there is a need to model, understand, and ultimately direct the behavior of a wide variety of complex multiscale systems, including the many engineering systems that shape our physical environment. Two needs of the structural system are addressed: multiscale tools for evaluating the progressive collapse of structures, and the use of such tools in a new general framework for aseismic design and retrofit based upon evolutionary methodologies.
12
Grid-enabling Application Templates
Geographic Information Systems and Biohazards: A project for developing a transport model capable of predicting the movement of a harmful algal bloom in a lake is currently underway. The models will be developed for each of the three main lakes in the Monitoring and Event Response for Harmful Algal Blooms (MERHAB) study: Ontario, Erie, and Champlain. To accomplish the goal of predicting bloom movement, we propose to maintain a near real-time database for water velocity fields in the lakes and to provide short-term predictions of lake circulations. This will require a combination of hydrodynamic and transport modeling, along with linkages to various data sources, including regional weather stations, water monitoring stations, and satellite data.
13
Grid-enabling Application Templates
Pollution Abatement: With increasing population growth and reliance on groundwater sources for drinking water, the protection of groundwater supplies has become increasingly important. Current projects will lead to the release of software that can be used for a variety of applications, such as the development of statewide strategies for groundwater protection, the assessment of the regional impact of waste disposal facilities, and modeling the impact of global climate change on groundwater supplies.
14
Grid-enabling Application Templates
Geophysical Mass Flows: The risk of volcanic eruptions and associated mass flows is a problem that public safety authorities throughout the world face several times a year. Flow models are useful for forecasting the movement of volcanic materials on or above the surface. Applications of such models include pre-crisis understanding of hazards and development of risk maps, real-time crisis assistance and management, and post-crisis reconstruction and distribution of aid.
15
ACDC-Grid Collaborations
Advanced Computational Data Center – Grid (ACDC-Grid)
Innovative Laboratory Prototype
Grid3+ Collaboration
High-Performance Networking Infrastructure
HP Labs Collaboration
IBM (under discussion)
Open Science Grid (Future)
16
ACDC-Grid Collaborations
Advanced Computational Data Center – Grid (ACDC-Grid): These resources will be shared by researchers from several departments working on a diverse suite of problems. Prototype grid-enabled applications: Shake-and-Bake, evolutionary passively-damped structures, Great Lakes Princeton Ocean Model (POM), geophysical mass flows, harmful algal bloom particle tracking applications, etc.
Innovative Laboratory Prototype: Research on providing a low-cost, environmentally friendly laboratory model for teaching and researching grid technologies, and a unique team-learning model for collaboration among people with diverse skills and educational backgrounds.
17
ACDC-Grid Collaborations
Grid3+ Collaboration: The ACDC Job Monitoring system has been deployed on Grid2003 resources and is designed to be a lightweight, non-intrusive tool for monitoring applications and resources on computational grids. It provides a real-time and historical view of the utilization of such resources, which can be used to track efficiency, adjust grid-based scheduling, and perform a predictive assignment of applications to resources.
High-Performance Networking Infrastructure: Determination of LAN/MAN/WAN-wide performance metrics of distributed computing services, including disaster-tolerant storage and HA distributed/grid environments.
18
ACDC-Grid Collaborations
HP Labs Collaboration: Bi-weekly meetings between the research center and key scientists and engineers at HP are ongoing, for development of a lightweight version of the Globus Toolkit and campus-wide grid-enabling tools.
Group-to-group collaboration infrastructure (the Access Grid): Research into determining the attributes necessary in a rapid-deployment AG node. Development/benchmarking of new AG shared services, including shared presentations and shared digital video environments.
Open Science Grid Collaboration: Future
19
ACDC-Grid Cyber-Infrastructure
Predictive Scheduler: Define quality-of-service estimates of job completion by better estimating job runtimes through profiling users.
Data Grid: Automated data file migration based on profiling users.
High-performance Grid-enabled Data Repositories: Develop automated procedures for dynamic data repository creation and deletion.
Dynamic Resource Allocation: Develop automated procedures for dynamic computational resource allocation.
20
Middleware
Globus Toolkit 2.2.4 (direct upgrade: WSRF)
Condor 6.6.0
Network Weather Service 2.6
Apache2 HTTP Server
PHP 4.3.0
MySQL 3.23
phpMyAdmin 2.5.1
21
Biomedical Advances
PSA Test (screen for Prostate Cancer)
Avonex: Interferon Treatment for Multiple Sclerosis
Artificial Blood
Nicorette Gum
Fetal Viability Test
Implantable Pacemaker
Edible Vaccine for Hepatitis C
Timed-Release Insulin Therapy
Anti-Arrhythmia Therapy: Tarantula venom
Direct Methods Structure Determination: Listed on “Top Ten Algorithms of the 20th Century”; Vancomycin; Gramicidin A
High Throughput Crystallization Method: Patented
NIH National Genomics Center: Northeast Consortium
Howard Hughes Medical Institute: Center for Genomics & Proteomics
22
Bioinformatics in Buffalo: A $360M Initiative
New York State: $121M
Federal Appropriations: $13M
Corporate: $146M
Foundation: $15M
Grants & Contracts: $64M
23
Bioinformatics Partners
Lead Institutions: University at Buffalo (UB); Hauptman-Woodward Medical Research Inst.; Roswell Park Cancer Institute
Corporate Partners:
Amersham Pharmacia, Beckman Coulter, Bristol Myers Squibb, General Electric, Human Genome Sciences, Immco, Invitrogen, Pfizer Pharmaceutical, Wyeth Lederle, Zeptometrix
Dell, HP, SGI, Stryker, Sun
AT&T, Sloan Foundation
InforMax, Q-Chem, 3M, Veridian
BioPharma Ireland, Confederation of Indian Industries
24
Center for Computational Research: 1999-2004 Snapshot
(Raptor image)
High-Performance Computing and High-End Visualization: 110 Research Groups in 27 Depts; 13 Local Companies; 10 Local Institutions
External Funding: $111M External Funding ($13.5M as lead; $97.5M in support); $41.8M Vendor Donations; $360M Bioinformatics Initiative
Deliverables: 350+ Publications; Software, Media, Algorithms, Consulting, Training, CPU Cycles…
25
CCR Visualization Resources
Fakespace ImmersaDesk R2: Portable 3D Device
Tiled-Display Wall: 20 NEC projectors (15.7M pixels); screen is 11’ × 7’; Dell PCs with Myrinet2000
Access Grid Node: Group-to-Group Communication; commodity components
SGI Reality Center 3300W: Dual Barco’s on 8’ × 4’ screen
VREX VR-4200 Stereo Imaging Projector: Portable projector works with PC
26
CCR Visualization Resources
Fakespace ImmersaDesk R2: Portable 3D Device
Tiled-Display Wall: 20 NEC projectors (15.7M pixels); screen is 11’ × 7’; Dell PCs with Myrinet2000
Access Grid Node: Group-to-Group Communication; commodity components
SGI Reality Center 3300W: Dual Barco’s on 8’ × 4’ screen
27
Medical/Dental BCOEB Network Connections (New)
28
WNY Grid Highlights
Heterogeneous Computational & Data Grid
Currently in Beta with Shake-and-Bake
WNY Release in 2H04
Bottom-Up General Purpose Implementation
Ease-of-Use: User Tools; Administrative Tools
Back-End Intelligence: Backfill Operations; Prediction and Analysis of Resources to Run Jobs (Compute Nodes + Requisite Data)
29
X-Ray Crystallography
Objective: Provide a 3-D mapping of the atoms in a crystal.
Procedure:
1. Isolate a single crystal.
2. Perform the X-ray diffraction experiment.
3. Determine the molecular structure that agrees with the diffraction data.
30
X-Ray Data & Corresponding Molecular Structure
X-ray data (reciprocal space) and molecular structure (real space) are related by FFT and FFT⁻¹.
The underlying atomic arrangement is related to the reflections by a 3-D Fourier transform.
Phases are lost during the crystallographic experiment.
Phase Problem: Determine the phases of the reflections.
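The loss of phase information can be illustrated with a small sketch. It assumes a hypothetical 1-D "crystal" of three equal point atoms (the coordinates are made up) and computes structure factors directly from their definition rather than with an FFT library. Two copies of the same structure, placed at different origins, give identical intensities but different phases, which is why the measured intensities alone do not fix the phases.

```python
import cmath
import math

def structure_factors(positions, n_reflections):
    """F_h = sum_j exp(2*pi*i*h*x_j) for h = 0..n_reflections-1 (equal point atoms, 1-D)."""
    return [
        sum(cmath.exp(2j * math.pi * h * x) for x in positions)
        for h in range(n_reflections)
    ]

# Hypothetical fractional coordinates of a toy 1-D structure.
atoms = [0.10, 0.35, 0.80]
# The same structure described from a different origin.
shifted = [(x + 0.25) % 1.0 for x in atoms]

F_a = structure_factors(atoms, 6)
F_b = structure_factors(shifted, 6)

# What the diffraction experiment records: intensities |F_h|^2 only.
intensities_a = [abs(f) ** 2 for f in F_a]
intensities_b = [abs(f) ** 2 for f in F_b]

# What the experiment does not record: the phase angle of each F_h.
phases_a = [cmath.phase(f) for f in F_a]
phases_b = [cmath.phase(f) for f in F_b]

for h in range(6):
    print(h, round(intensities_a[h], 6), round(intensities_b[h], 6),
          round(phases_a[h], 3), round(phases_b[h], 3))
```

The intensity columns agree for every reflection while the phase columns do not, so recovering the atomic arrangement requires determining the phases by other means.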
31
Shake-and-Bake Method: Dual-Space Refinement (DeTitta, Hauptman, Miller, Weeks)
[Flowchart labels: Trial Structures; Structure Factors; Trial Phases; Reciprocal Space (“Shake”): Phase Refinement (Tangent Formula, Parameter Shift); FFT and FFT⁻¹; Real Space (“Bake”): Density Modification (Peak Picking, LDE); Solutions?]
32
Ph8755: SnB Histogram
Atoms: 74; Phases: 740; Space Group: P1; Triples: 7,400
Trials: 100; Cycles: 40; Rmin range: 0.243 - 0.429
33
Phasing and Structure Size
[Chart: Number of Atoms in Structure (axis marks 0; 100; 1,000; 10,000; 100,000) for each phasing method: Conventional Direct Methods; Shake-and-Bake; Multiple Isomorphous Replacement; Se-Met; Se-Met with Shake-and-Bake. Labeled points: Vancomycin; 190 kDa; two “?” markers.]
34
Molecular Structure Determination
SnB Software by UB/HWI
“Top Algorithms of the Century”
Critical to Rational Drug Design
Important Link in Structural Biology
Current Effort: Grid; Collaboratory; Intelligent Learning
35
Antibiotics & Supercomputers
Result: New, better drugs in shorter time
Vancomycin solved with SnB (UB/HWI)
SnB: “Top Algorithms of the Century”
“Antibiotic of Last Resort”
Original molecular structure required 5 months
(Re)solved in a single day on CCR’s supercomputers
Current Efforts: Grid, Collaboratory, Intelligent Learning
36
Photograph of Crystal
37
Useful Relationships for Multiple Trial Phasing
Tangent Formula
Parameter Shift Optimization
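The equations on this slide did not survive extraction. For orientation, the forms below are the ones commonly given in the direct-methods literature for the two relationships named in the title; they are supplied here as an assumption about what the slide showed, not as a transcription. Here the E are normalized structure-factor magnitudes, the φ are phases, N is the number of (equal) atoms in the unit cell, and I₀, I₁ are modified Bessel functions.

```latex
% Tangent formula: estimate the phase of reflection H from pairs (K, H-K).
\[
\tan\varphi_{H} =
  \frac{\sum_{K} |E_{K} E_{H-K}| \,\sin(\varphi_{K} + \varphi_{H-K})}
       {\sum_{K} |E_{K} E_{H-K}| \,\cos(\varphi_{K} + \varphi_{H-K})}
\]

% Minimal function: the quantity that parameter-shift optimization reduces
% by adjusting one phase at a time.
\[
R(\varphi) =
  \frac{\sum_{H,K} A_{HK}
        \left[\cos\Phi_{HK} - \dfrac{I_{1}(A_{HK})}{I_{0}(A_{HK})}\right]^{2}}
       {\sum_{H,K} A_{HK}},
\qquad
A_{HK} = \frac{2}{\sqrt{N}}\,|E_{H} E_{K} E_{H+K}|,
\qquad
\Phi_{HK} = \varphi_{H} + \varphi_{K} + \varphi_{-H-K}
\]
```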
38
Structure of SnB
[Diagram labels: SnB Process; Trials; Histogram; Visualization]
39
Vancomycin Crystal Structure Views (courtesy of P. Loll & P. Axelsen)
40
Computing Platforms
Workstations: SGI, Sun, DEC/Alpha, Linux
Parallel Computers: Cray T3D/E, TMC CM-5, IBM SP2, HP-Convex Exemplar, SGI Origin2/3000 & Onyx 2/3
IBM SP – heterogeneous Linux Clusters
Sun Cluster
Condor Flock
Computational Grid
42
Vancomycin Crystal (courtesy of P. Loll)
43
The Diffraction Pattern
Experiment yields: reflections; associated intensities.
Phase angles are lost in experiment.
44
The Phase Problem
Experiment yields: reflections; associated intensities.
Phase angles are lost in experiment.
Underlying atomic arrangement is related to the reflections by a 3-D Fourier transform.
Phase Problem: Determine the set of phases corresponding to the reflections.
45
Conventional Direct Methods
[Flowchart labels: Trial Phases; Reciprocal Space: Phase Refinement (Tangent Formula); FFT; Real Space: Density Modification (Peak Picking); Solutions?]
46
Ph8755: Trace of SnB Solution
Atoms: 74; Space Group: P1; SnB Cycles: 40
47
Vancomycin
Interferes with formation of bacterial walls
Last line of defense against deadly streptococcal and staphylococcal bacteria strains
Vancomycin resistance exists (Michigan)
Can’t just synthesize variants and test
Need structure-based approach to predict
Solution with SnB (Shake-and-Bake): Pat Loll; George Sheldrick
48
Grid Server Console (Vancomycin)
49
Data Grid Motivation & Goal
Motivation: Large data collections are emerging as important community resources. Data grids inherently complement computational grids, which manipulate data. A data grid denotes a large network of distributed storage resources, such as archival systems, caches, and databases, which are linked logically to create a sense of global persistence.
Goal: To design and implement transparent management of data distributed across heterogeneous resources, such that the data is accessible via a uniform web interface.
50
Data Grid Summary
544 GB Storage: located on 6 heterogeneous ACDC-Grid resources
480 GB Storage: located on 1 dual-processor Dell PowerVault server
75,000 GB Storage (10/03): served by 4 – 16 processor HP GS1280 servers
2,000 GB Storage: served by Sun Ultra-60 servers
78,024 GB Total Data Grid Storage available and accessible from the ACDC-Grid Portal
Diagram labels: 182 GB, 100 GB, 56 GB, 100 GB, 70 GB, 136 GB Storage; Storage Area Network 75 TB; Network Attached Storage 480 GB; CSE Multi-Store 2 TB
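The total on this slide is the sum of the four storage tiers. A minimal check, using the figures as stated on the slide (the dictionary keys are descriptive labels chosen for this sketch):

```python
# Storage tiers in GB, as listed on the Data Grid Summary slide.
tiers_gb = {
    "ACDC-Grid resources (6 heterogeneous hosts)": 544,
    "Dell PowerVault server": 480,
    "HP GS1280 servers (10/03)": 75_000,
    "Sun Ultra-60 servers": 2_000,
}

total_gb = sum(tiers_gb.values())
print(f"{total_gb:,} GB")
```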
51
Grid-Based SnB Objectives
Install Grid-Enabled Version of SnB
Job Submission and Monitoring over Internet
SnB Output Stored in Database
SnB Output Mined through Internet-Based Integrated Querying Tool
Serve as Template for Chem-Grid & Bio-Grid
Experience with Globus and Related Tools
52
Grid Enabled SnB
Problem Statement: Use all available resources in the ACDC-Grid for determining a single molecular structure.
Grid Enabling Criteria:
All heterogeneous resources in the ACDC-Grid are capable of executing the SnB application.
All job results obtained from the ACDC-Grid resources are stored in a corresponding molecular structure database.
There are three modes of operation. Continue submitting SnB application jobs:
1. until the grid-enabled SnB application determines a solution has been found, or
2. until “X” number of trials have been evaluated, or
3. indefinitely (the grid job owner determines when a solution has been found).
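The three modes of operation amount to three stopping rules around the same submission loop. The sketch below is only an illustration: `submit_batch`, the figure-of-merit threshold used to decide that "a solution has been found", and the owner callback are all hypothetical stand-ins, not part of the ACDC-Grid software.

```python
from enum import Enum

class StopMode(Enum):
    SOLUTION_FOUND = "application decides a solution has been found"
    MAX_TRIALS = "X trials have been evaluated"
    OWNER_DECIDES = "job owner decides"

def run_until_done(mode, submit_batch, rmin_threshold=None,
                   max_trials=None, owner_says_stop=None):
    """Keep submitting batches of trials until the selected stopping rule fires.

    submit_batch() returns a list of per-trial figures of merit (lower is better).
    Returns every figure of merit collected so far.
    """
    results = []
    while True:
        results.extend(submit_batch())
        if mode is StopMode.SOLUTION_FOUND and min(results) <= rmin_threshold:
            break
        if mode is StopMode.MAX_TRIALS and len(results) >= max_trials:
            break
        if mode is StopMode.OWNER_DECIDES and owner_says_stop(results):
            break
    return results

def make_fake_batch(values, batch_size=2):
    """Stand-in for a grid submission: hands back canned figures of merit."""
    it = iter(values)
    def submit_batch():
        return [next(it) for _ in range(batch_size)]
    return submit_batch

canned = [0.45, 0.44, 0.46, 0.25, 0.43, 0.44]
by_solution = run_until_done(StopMode.SOLUTION_FOUND, make_fake_batch(canned),
                             rmin_threshold=0.30)
by_count = run_until_done(StopMode.MAX_TRIALS, make_fake_batch(canned),
                          max_trials=6)
by_owner = run_until_done(StopMode.OWNER_DECIDES, make_fake_batch(canned),
                          owner_says_stop=lambda seen: len(seen) >= 2)
print(len(by_solution), len(by_count), len(by_owner))
```

Keeping the stopping rule separate from the submission call means the same loop serves all three modes; only the predicate changes.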
53
Grid Services and Applications (adapted from Ian Foster and Carl Kesselman)
[Layer diagram]
Applications: Shake-and-Bake; Apache; MySQL
High-level Services and Tools: Globus Toolkit; globusrun; MPI; MPI-IO; NWS; C, C++, Fortran, PHP
Core Services: Metacomputing Directory Service; GRAM; Globus Security Interface; GASS
Local Services: LSF; Condor; PBS; Maui Scheduler; MPI; TCP; UDP; Solaris; Irix; WINNT; RedHat Linux
Also shown: Oracle; Stork
ACDC-Grid Computational Resources; ACDC-Grid Data Resources
54
Notes
Apache: web portal server.
PHP: used by the Apache server for dynamic web portal pages.
MDS: it is traditional to use MDS with LDAP, but we use MDS with a MySQL grid portal database to keep information on available resources (we poll every 15 minutes).
GRAM: Globus Resource Allocation Manager; API for requesting computational jobs.
GASS: Global Access to Secondary Storage; API for accessing files stored on various platforms.
Stork: Condor module for transporting job files within a flock.
55
Grid Enabled SnB: Required Layered Grid Services
Grid-enabled Application Layer: Shake-and-Bake application; Apache web server; MySQL database
High-level Service Layer: Globus, NWS, PHP, Fortran, and C
Core Service Layer: Metacomputing Directory Service, Globus Security Interface, GRAM, GASS
Local Service Layer: Condor, MPI, PBS, Maui, WINNT, IRIX, Solaris, RedHat Linux
56
Required Grid Services: Grid Implementation as a Layered Set of Services
[Layer diagram]
Applications
High-level Services and Tools: DUROC; globusrun; MPI; Nimrod/G; MPI-IO; CC++; GlobusView; Testbed Status
Core Services: Metacomputing Directory Service; GRAM; Globus Security Interface; Heartbeat Monitor; Nexus; Gloperf; GASS
Local Services: LSF; Condor; MPI; NQE; Easy; TCP; UDP; Solaris; Irix; AIX
Services used for grid-enabled SnB:
Application Layer: Shake-and-Bake; Apache web server; MySQL database
High-level Services: Globus, PHP, Fortran, C
Core Services: Metacomputing Directory Service, Globus Security Interface, GRAM, GASS
Local Services: Condor, MPI, PBS, Maui, WINNT, IRIX, Solaris, RedHat Linux
57
Grid Enabled SnB Execution
User:
defines the Grid-enabled SnB job using the Grid Portal or SnB;
supplies the location of data files from the Data Grid;
supplies the SnB mode of operation.
Grid Portal:
assembles the required SnB data and supporting files, execution scripts, and database tables;
determines available ACDC-Grid resources.
ACDC-Grid job management includes:
automatic determination of appropriate execution times, number of trials, and number/location of processors;
logging/status of concurrently executing resource jobs;
automatic incorporation of SnB trial results into the molecular structure database.
58
ACDC-Grid Portal
59
ACDC-Grid Portal Login
Grid Portal login screen
60
Data Grid Capabilities
Browser view of “mlgreen” user files stored in the Data Grid
61
Data Grid Capabilities
Browser view of “miller” group files published by user “rappleye”
62
Data Grid Capabilities
Browser view of “public” user files published by user “miller”
63
Data Grid Capabilities
64
Data Grid Capabilities
65
Grid Portal Job Status
Grid-enabled jobs can be monitored dynamically using the Grid Portal web interface.
Charts are based on: total CPU hours, total jobs, or total runtime.
Usage data for: running jobs or queued jobs.
Individual or all resources.
Grouped by: group, user, or queue.
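The chart options listed above correspond to a filter (job state, resource), a grouping key (group, user, or queue), and a metric (CPU hours, job count, or runtime). A minimal sketch of that aggregation follows; the record layout and field names are hypothetical, not the portal's actual schema.

```python
from collections import defaultdict

# Hypothetical job records; field names and values are illustrative only.
jobs = [
    {"user": "miller", "group": "ccr", "queue": "normal", "resource": "joplin",
     "state": "running", "cpu_hours": 12.0, "runtime_hours": 3.0},
    {"user": "mlgreen", "group": "ccr", "queue": "normal", "resource": "nash",
     "state": "running", "cpu_hours": 4.0, "runtime_hours": 2.0},
    {"user": "rappleye", "group": "miller", "queue": "debug", "resource": "joplin",
     "state": "queued", "cpu_hours": 0.0, "runtime_hours": 0.0},
    {"user": "miller", "group": "ccr", "queue": "normal", "resource": "joplin",
     "state": "running", "cpu_hours": 8.0, "runtime_hours": 2.0},
]

def summarize(jobs, group_by, metric, state=None, resource=None):
    """Total a metric per group key, optionally restricted by state and resource.

    metric is "cpu_hours", "runtime_hours", or "jobs" (a simple count).
    """
    totals = defaultdict(float)
    for job in jobs:
        if state is not None and job["state"] != state:
            continue
        if resource is not None and job["resource"] != resource:
            continue
        totals[job[group_by]] += 1.0 if metric == "jobs" else job[metric]
    return dict(totals)

cpu_by_user = summarize(jobs, group_by="user", metric="cpu_hours", state="running")
queued_by_queue = summarize(jobs, group_by="queue", metric="jobs", state="queued")
joplin_runtime_by_group = summarize(jobs, group_by="group", metric="runtime_hours",
                                    resource="joplin")
print(cpu_by_user, queued_by_queue, joplin_runtime_by_group)
```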
66
Grid Portal Job Status
67
ACDC-Grid Portal Condor Flock
CondorView integrated into ACDC-Grid Portal
68
ACDC-Grid Portal User Management
User based
Administrator based
69
ACDC-Grid Portal Resource Management
Administrator grants a user access to ACDC-Grid resources, software, and web pages.
70
ACDC-Grid Administration
71
Grid Enabled Data Mining
Problem Statement: Use all available resources in the ACDC-Grid for executing a data mining genetic algorithm optimization of SnB parameters for molecular structures having the same space group.
Grid Enabling Criteria:
All heterogeneous resources in the ACDC-Grid are capable of executing the SnB application.
All job results obtained from the ACDC-Grid resources are stored in a corresponding molecular structure database.
72
Grid Enabled Data Mining
There are two modes of operation and two sets of stopping criteria.
Modes of operation: Data mining jobs can be submitted in a dedicated mode (time critical), where jobs are queued on ACDC-Grid resources, or in a back fill mode (non-time critical), where jobs are submitted to ACDC-Grid resources that have unused cycles available.
Stopping criteria: Continue submitting SnB data mining application jobs until the grid-enabled SnB application determines optimal parameters have been found, or indefinitely (the grid job owner determines when optimal parameters have been found).
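The difference between the two submission modes can be sketched as a choice of target resources. This is an illustration under stated assumptions: the resource records, the `idle_cpus` field, and the rule that back fill targets only resources with idle processors are hypothetical simplifications of "unused cycles available".

```python
def select_targets(resources, mode):
    """Return the names of resources a data mining job would be sent to.

    dedicated: queue on every resource, busy or not (time critical).
    backfill:  use only resources that currently have idle processors.
    """
    if mode == "dedicated":
        return [r["name"] for r in resources]
    if mode == "backfill":
        return [r["name"] for r in resources if r["idle_cpus"] > 0]
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical snapshot of resource state; the idle counts are made up.
resources = [
    {"name": "joplin", "idle_cpus": 0},
    {"name": "nash", "idle_cpus": 12},
    {"name": "crosby", "idle_cpus": 4},
]

dedicated_targets = select_targets(resources, "dedicated")
backfill_targets = select_targets(resources, "backfill")
print(dedicated_targets, backfill_targets)
```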
73
Grid Enabled Data Mining
[Diagram labels: Grid Portal; Workflow Job Manager; ACDC-Grid Computational Resources; Molecular Structure Database; Data Mining Criteria; ACDC-Grid Data Grid]
74
SnB Molecular Structure Database
75
Grid Enabled Data Mining Execution Scenario
The user defines a Grid-enabled data mining SnB job using the Grid Portal web interface, supplying:
which molecular structures’ parameter sets to optimize,
data file metadata,
the Grid-enabled SnB mode of operation (dedicated or back fill), and
the Grid-enabled SnB stopping criteria.
The Grid Portal assembles the required SnB application data and supporting files, execution scripts, and database tables, and submits jobs for parameter optimization based on the current database statistics.
ACDC-Grid job management includes:
automatic determination of appropriate execution times, number of trials, and number of processors for each available resource,
logging and status of all concurrently executing resource jobs,
automatic incorporation of SnB trial results into the molecular structure database, and
post-processing of the updated database for subsequent job submissions.
76
ACDC Data Grid Database Schema
[Diagram label: ACDC-Grid Data Grid]
77
Grid Portal Job Status
[Diagram label: ACDC-Grid Computational Resources]
78
Data Grid Overview
Enable the transparent migration of data between various resources while preserving uniform access for the user.
Maintain metadata information about each file and its location in a global database table. Currently using MySQL tables.
Periodically migrate files between machines for more optimal usage of resources.
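The idea of a global metadata table that hides a file's physical location can be sketched as follows. The sketch uses Python's built-in `sqlite3` as a stand-in for the MySQL tables mentioned on the slide; the table name `file_management` appears later in the deck, but the columns, helper functions, and host names used here are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE file_management (
        file_id      INTEGER PRIMARY KEY,
        owner        TEXT    NOT NULL,
        logical_name TEXT    NOT NULL,   -- the name the user always sees
        host         TEXT    NOT NULL,   -- where the bytes currently live
        size_bytes   INTEGER NOT NULL,
        UNIQUE (owner, logical_name)
    )
    """
)

def register(conn, owner, logical_name, host, size_bytes):
    """Record a newly uploaded file and its current location."""
    with conn:
        conn.execute(
            "INSERT INTO file_management (owner, logical_name, host, size_bytes) "
            "VALUES (?, ?, ?, ?)",
            (owner, logical_name, host, size_bytes),
        )

def migrate(conn, owner, logical_name, new_host):
    """Update only the location; the logical name is unchanged for the user."""
    with conn:
        conn.execute(
            "UPDATE file_management SET host = ? WHERE owner = ? AND logical_name = ?",
            (new_host, owner, logical_name),
        )

def locate(conn, owner, logical_name):
    """Resolve a logical name to the host that currently stores the file."""
    row = conn.execute(
        "SELECT host FROM file_management WHERE owner = ? AND logical_name = ?",
        (owner, logical_name),
    ).fetchone()
    return row[0] if row else None

register(conn, "mlgreen", "vancomycin/run01.out", "acdc", 1_048_576)
before = locate(conn, "mlgreen", "vancomycin/run01.out")
migrate(conn, "mlgreen", "vancomycin/run01.out", "joplin")
after = locate(conn, "mlgreen", "vancomycin/run01.out")
print(before, "->", after)
```

Because every lookup goes through the table, a periodic migration only has to move the bytes and update one row; the user's view of the file does not change.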
79
Data Grid Functionality
Implement basic file management functions accessible via a platform-independent web interface. Features include:
User-friendly menus/interface.
File upload/download to and from the Data Grid Portal.
Simple web-based file editor.
Efficient search utility.
Logical display of files for a given user in three divisions (user/group/public), in either a hierarchical or a list-based view.
Sorting capability based on file metadata, e.g. filename, size, modification time.
80
Data Grid Functionality
- Support multiple access to files in the data grid: implement basic locking and synchronization primitives for version control.
- Integrate security into the data grid: implement basic authentication and authorization of users; decide and enforce policies for data access and publishing.
81
Data Grid File Migration
Migration Algorithm: file migration depends upon a number of factors:
- User access time
- Network capacity at time of migration
- User profile
- User disk quotas on various resources
82
Data Grid File Migration
We need to mine log files in order to determine:
- How much data should be migrated in one migration cycle?
- What is an appropriate migration cycle length?
- What is a user's access pattern for files?
- What is the overall access pattern for particular files?
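The last two questions above amount to counting accesses per user and per file. A minimal sketch of that counting step, assuming a hypothetical log format of one access per line with three whitespace-separated fields (timestamp, user, filename); the actual ACDC-Grid log format is not given in the slides.

```python
from collections import Counter

def access_patterns(log_lines):
    """Count accesses per (user, file) pair and per file overall.

    Assumes each line is '<timestamp> <user> <filename>' (hypothetical format).
    Lines that do not have exactly three fields are skipped.
    """
    per_user_file = Counter()
    per_file = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        _timestamp, user, filename = parts
        per_user_file[(user, filename)] += 1
        per_file[filename] += 1
    return per_user_file, per_file
```

The per-user counts address a single user's access pattern, and the per-file counts address the overall pattern for a file across all users.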
83
Data Grid File Aging
Global File Aging vs. Local File Aging
User aging attribute:
- Indicative of a user's access across their own files.
- Attribute of a user's profile.
- At migration time, this attribute determines which user's files should be migrated off of the grid portal onto a remote resource.
- Function of (file age, global file aging, resource usage).
84
Data Grid File Aging
File aging attribute:
- Indicative of overall access to, and migration activity of, a particular file.
- Attribute in the file_management table.
- Scale: 0 to 1, the probability of whether or not to migrate the file.
- file_aging_local_param is initialized to 1.
- At migration time, after a user has been chosen, this attribute helps determine which of the user's files to migrate, e.g. migrate a maximum of the top 5% of the user's files in any one cycle.
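The "top 5%" cap can be sketched as a ranking step. This is an illustration under assumptions the slide does not state: that a larger file_aging_local_param marks a stronger migration candidate, and that the 5% cap rounds down. The function name and the `max_percent` parameter are hypothetical.

```python
def select_files_to_migrate(files, max_percent=5):
    """Pick at most max_percent of a user's files for one migration cycle.

    files: list of (filename, file_aging_local_param) tuples.
    Assumption: a higher parameter value means a stronger candidate.
    """
    # Integer arithmetic keeps the cap exact; it rounds down.
    limit = len(files) * max_percent // 100
    ranked = sorted(files, key=lambda f: f[1], reverse=True)
    return [name for name, _ in ranked[:limit]]
```

With rounding down, a user who has fewer than 20 files would have none selected in a cycle; a real policy might set a minimum of one file instead.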
85
Data Grid File Aging
- For a given user, the average of the file_aging_local_param attributes of all files should be close to 1.
- The operating tolerance before action is taken is the range 0.9 to 1.1.
- In this way, the user's file_aging_global_param can be a function of this average.
- If the average file_aging_local_param is greater than 1.1, the user's files are being held too long before being migrated, and the file_aging_global_param value should be decreased.
- If the average file_aging_local_param is less than 0.9, the user's files are being accessed at a higher frequency than the file_aging_global_param value reflects, and the file_aging_global_param value should be increased.
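The feedback rule above can be written as a small function. This is a sketch only: the slide gives the direction of each adjustment but not its size, so the fixed `step` of 0.05 is an assumption, as are the function and parameter names.

```python
def adjust_global_param(local_params, global_param, step=0.05, low=0.9, high=1.1):
    """Return an updated file_aging_global_param for one user.

    local_params: file_aging_local_param values of all the user's files.
    step: adjustment size (assumed; not specified in the slides).
    """
    if not local_params:
        return global_param  # no files, nothing to average
    average = sum(local_params) / len(local_params)
    if average > high:
        # Files are held too long before migration: decrease.
        return global_param - step
    if average < low:
        # Files are accessed more often than the global value reflects: increase.
        return global_param + step
    # Within the operating tolerance: take no action.
    return global_param
```

A proportional step, scaled by how far the average lies outside the tolerance band, would be an equally plausible design.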
86
Data Grid Resource Info
87
Data Grid Resource Info
Both platforms have reduced bandwidth available for additional transfers.
88
Data Grid File Management Table
89
Data Grid File Age
File age, access time, and resource id denote, respectively: the amount of time since a file was last accessed, when the file was accessed, and where the file currently resides.
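File age follows directly from the access time. A minimal sketch, assuming access times are stored as timestamps and that age is reported in days; the unit and the function name are assumptions, since the slides do not specify them.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400.0

def file_age_days(access_time, now):
    """Time elapsed since the file was last accessed, in days (assumed unit)."""
    return (now - access_time).total_seconds() / SECONDS_PER_DAY
```

For example, `file_age_days(datetime(2004, 1, 1), datetime(2004, 1, 11))` gives the age of a file last accessed ten days before the migration run.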
90
Data Grid Summary
The Data Grid algorithms are continually evolving to minimize network traffic and maximize disk space utilization on a per-user basis by data mining user usage and disk space requirements.
[Figure: storage resources, labeled 182 GB, 100 GB, 56 GB, 100 GB, 70 GB, and 136 GB Storage; Storage Area Network, 75 TB; Network Attached Storage, 480 GB; CSE Multi-store, 2 TB]
91
ACDC-Grid Development/Maintenance
Development Requirements:
- 7 person-months for Grid Services Coordinator, including Grid and database conceptual design and implementation
- 5 person-months for Grid Services Programmer: web portal programming
- 5 person-months for System Administrator: Globus, NWS, MDS, etc. installations
- 3 person-months for Database Administrator: Grid Portal database implementation
Minimum Maintenance Requirements:
- 1 Grid Services Coordinator, 100% level of effort
- 1 Grid Services Programmer, 100% level of effort
- 1 System Administrator, 50% level of effort
- 1 Database Administrator, 10% level of effort
92
Acknowledgments
Steve Gallo, Jason Rappleye, Jeff Tilson, Martins Innus, Cynthia Cornelius
National Science Foundation, National Institutes of Health, Oishei Foundation, Wendt Foundation, Sloan Foundation, Verizon, NYS Gov Pataki, Congressman Reynolds, Senator Clinton, Senator Schumer, Congressman Quinn
George DeTitta, Herb Hauptman, Charles Weeks, Steve Potter
93
www.ccr.buffalo.edu
94
Contact Information
miller@buffalo.edu
www.ccr.buffalo.edu