Russ Miller & Mark Green Center for Computational Research & Computer Science & Engineering SUNY-Buffalo Hauptman-Woodward Medical Inst GT’04 Panel: Storage.

1 Russ Miller & Mark Green, Center for Computational Research & Computer Science & Engineering, SUNY-Buffalo; Hauptman-Woodward Medical Inst. GT’04 Panel: Storage Considerations for Grid Computing Environments. University at Buffalo, The State University of New York. NSF, NIH, DOE, NYS.

2 Major CCR Resources (12TF & 290TB). University at Buffalo, The State University of New York. CCR: Center for Computational Research.
Apex Bioinformatics System
- Sun V880 (3), Sun 6800
- Sun 280R (2)
- Intel PIIIs
- Sun 3960: 7 TB Disk Storage
HP/Compaq SAN
- 75 TB Disk
- 190 TB Tape
- 64 Alpha Processors (400 MHz)
- 32 GB RAM; 400 GB Disk
IBM RS/6000 SP: 78 Processors
Sun Cluster: 80 Processors
SGI Intel Linux Cluster
- 150 PIII Processors (1 GHz)
- Myrinet
Dell Linux Cluster: #22 → #25 → #38
- 600 P4 Processors (2.4 GHz)
- 600 GB RAM; 40 TB Disk; Myrinet
Dell Linux Cluster: #187 → #368 → off
- 4036 Processors (PIII 1.2 GHz)
- 2 TB RAM; 160 TB Disk; 16 TB SAN
IBM BladeCenter Cluster
- 532 P4 Processors (2.8 GHz)
- 5 TB SAN
SGI Origin3700 (Altix)
- 64 Processors (1.3GHz ITF2)
- 256 GB RAM
- 2.5 TB Disk
SGI Origin3800
- 64 Processors (400 MHz)
- 32 GB RAM; 400 GB Disk

3 Advanced CCR Data Center (ACDC) Computational Grid Overview
[Network diagram; node labels recovered below.]
- Joplin (Compute Cluster): 300 Dual Processor 2.4 GHz Intel Xeon, RedHat Linux 7.3, 38.7 TB Scratch Space
- Nash (Compute Cluster): 75 Dual Processor 1 GHz Pentium III, RedHat Linux 7.3, 1.8 TB Scratch Space
- Crosby (Compute Cluster): SGI Origin 3800, 64 - 400 MHz IP35, IRIX 6.5.14m, 360 GB Scratch Space
- Mama (Compute Cluster): 9 Dual Processor 1 GHz Pentium III, RedHat Linux 7.3, 315 GB Scratch Space
- Young (Compute Cluster): 16 Dual Sun Blades, 47 Sun Ultra5, Solaris 8, 770 GB Scratch Space
- ACDC (Grid Portal): 4 Processor Dell 6650, 1.6 GHz Intel Xeon, RedHat Linux 9.0, 66 GB Scratch Space
- Fogerty (Condor Flock Master): 1 Dual Processor 250 MHz IP30, IRIX 6.5
- School of Dental Medicine: 9 Single Processor Dell P4 Desktops
- Hauptman-Woodward Institute: 13 Various SGI IRIX Processors
- Computer Science & Engineering: 25 Single Processor Sun Ultra5s
- CCR: 19 IRIX, RedHat, & WINNT Processors
- Expanding: RedHat, IRIX, Solaris, WINNT, etc.
- T1 Connection
Note: Network connections are 100 Mbps unless otherwise noted.

4 BCOEB Medical/Dental Network Connections

5 ACDC Data Grid Overview (Grid-Available Data Repositories)
[Network diagram; node labels recovered below.]
- Joplin (Compute Cluster): 300 Dual Processor 2.4 GHz Intel Xeon, RedHat Linux 7.3, 38.7 TB Scratch Space
- Nash (Compute Cluster): 75 Dual Processor 1 GHz Pentium III, RedHat Linux 7.3, 1.8 TB Scratch Space
- ACDC (Grid Portal): 4 Processor Dell 6650, 1.6 GHz Intel Xeon, RedHat Linux 9.0, 66 GB Scratch Space
- Crosby (Compute Cluster): SGI Origin 3800, 64 - 400 MHz IP35, IRIX 6.5.14m, 360 GB Scratch Space
- Mama (Compute Cluster): 9 Dual Processor 1 GHz Pentium III, RedHat Linux 7.3, 315 GB Scratch Space
- Young (Compute Cluster): 16 Dual Sun Blades, 47 Sun Ultra5, Solaris 8, 770 GB Scratch Space
Data repositories shown:
- 182 GB Storage
- 100 GB Storage
- 56 GB Storage
- 100 GB Storage
- 70 GB Storage
- 136 GB Storage
- Network Attached Storage: 1.2 TB
- Storage Area Network: 75 TB
- CSE Multi-Store: 2 TB
Note: Network connections are 100 Mbps unless otherwise noted.

6 ACDC Data Grid Overview
[Network diagram; node labels recovered below.]
- Joplin (Compute Cluster): 300 Dual Processor 2.4 GHz Intel Xeon, RedHat Linux 7.3, 38.7 TB Scratch Space
- Nash (Compute Cluster): 75 Dual Processor 1 GHz Pentium III, RedHat Linux 7.3, 1.8 TB Scratch Space
- ACDC (Grid Portal): 4 Processor Dell 6650, 1.6 GHz Intel Xeon, RedHat Linux 9.0, 66 GB Scratch Space
- Crosby (Compute Cluster): SGI Origin 3800, 64 - 400 MHz IP35, IRIX 6.5.14m, 360 GB Scratch Space
- Mama (Compute Cluster): 9 Dual Processor 1 GHz Pentium III, RedHat Linux 7.3, 315 GB Scratch Space
- Young (Compute Cluster): 16 Dual Sun Blades, 47 Sun Ultra5, Solaris 8, 770 GB Scratch Space
Data repositories shown:
- 182 GB Storage
- 100 GB Storage
- 56 GB Storage
- 100 GB Storage
- 70 GB Storage
- 136 GB Storage
- Network Attached Storage: 480 GB
- Storage Area Network: 75 TB
- CSE Multi-Store: 2 TB
Note: Network connections are 100 Mbps unless otherwise noted.

7 ACDC-Grid Browser view of “miller” group files published by user “rappleye”

8 ACDC-Grid Administration

9 Grid-enabling Application Templates
- Structural Biology
- Earthquake Engineering
- Pollution Abatement
- Geographic Information Systems & BioHazards

10 Grid-enabling Application Templates: Structural Biology
- The Shake-and-Bake (SnB) algorithm for molecular structure determination has been used routinely to solve difficult atomic resolution structures.
- The running time of SnB varies widely as a function of the size of the structure, the quality of the data, the space group, and choices of critical input parameters, including the size of the Fourier grid, the number of reflections, the number and type of invariants, and the number of cycles of the procedure used per trial structure, to name a few.
- SnB is being augmented with a data repository that stores information for every SnB run, regardless of where the job is run.
- This information is then mined in an automated fashion to tune 17 key SnB parameters, with the aim of improving the procedure for solving previously unknown structures.

11 Grid-enabling Application Templates: Earthquake Engineering
- In our effort to develop disaster-resilient communities, there is a need to model, understand, and ultimately direct the behavior of a wide variety of complex multi-scale systems, including the many engineering systems that shape our physical environment.
- Two needs for structural systems:
  - multiscale evaluation tools for the progressive collapse of structures, and
  - use of such evaluation tools in a new general framework for aseismic design and retrofit, based upon evolutionary methodologies.

12 Grid-enabling Application Templates: Geographic Information Systems and Biohazards
- A project for developing a transport model capable of predicting the movement of a harmful algal bloom in a lake is currently underway.
- The models will be developed for each of the three main lakes in the Monitoring and Event Response for Harmful Algal Blooms (MERHAB) study: Ontario, Erie and Champlain.
- To accomplish the goal of predicting bloom movement, we propose to maintain a near real-time database for water velocity fields in the lakes and to provide short-term predictions of lake circulations.
- This will require a combination of hydrodynamic and transport modeling, along with linkages to various data sources, including regional weather stations, water monitoring stations, and satellite data.

13 Grid-enabling Application Templates: Pollution Abatement
- With increasing population growth and reliance on groundwater sources for drinking water, the protection of groundwater supplies has become increasingly important.
- Current projects will lead to the release of software that can be used for a variety of applications, such as
  - the development of statewide strategies for groundwater protection,
  - the assessment of the regional impact of waste disposal facilities, and
  - modeling the impact of global climate change on groundwater supplies.

14 Grid-enabling Application Templates: Geophysical Mass Flows
- The risk of potential volcanic eruptions and associated mass flows is a problem that public safety authorities throughout the world face several times a year.
- Flow models are useful to forecast the movement of volcanic materials on or above the surface.
- Applications of such models include:
  - pre-crisis understanding of hazards and developing risk maps,
  - real-time crisis assistance and management, and
  - post-crisis reconstruction and distribution of aid.

15 ACDC-Grid Collaborations
- Advanced Computational Data Center – Grid (ACDC-Grid)
- Innovative Laboratory Prototype
- Grid3+ Collaboration
- High-Performance Networking Infrastructure
- HP Labs Collaboration
- IBM (under discussion)
- Open Science Grid (Future)

16 ACDC-Grid Collaborations
Advanced Computational Data Center – Grid (ACDC-Grid):
- These resources will be shared by researchers from several departments working on a diverse suite of problems.
- Prototype grid-enabled applications: Shake-and-Bake, evolutionary passively-damped structures, Great Lakes Princeton Ocean Model (POM), geophysical mass flows, harmful algal bloom particle tracking applications, etc.
Innovative Laboratory Prototype:
- Research on providing a low-cost, environmentally friendly laboratory model for teaching and researching grid technologies, and a unique team-learning model for collaboration among people with diverse skills and educational backgrounds.

17 ACDC-Grid Collaborations
Grid3+ Collaboration:
- The ACDC Job Monitoring system has been deployed on Grid2003 resources and is designed to be a light-weight and non-intrusive tool for monitoring applications and resources on computational grids. It provides a real-time and historical view of the utilization of such resources, which can be used to track efficiency, adjust grid-based scheduling, and perform a predictive assignment of applications to resources.
High-Performance Networking Infrastructure:
- Determination of LAN/MAN/WAN-wide performance metrics of distributed computing services, including disaster-tolerant storage and HA distributed/grid environments.

18 ACDC-Grid Collaborations
HP Labs Collaboration:
- Bi-weekly meetings between the research center and key scientists and engineers at HP are ongoing, for development of a light-weight version of the Globus Toolkit and campus-wide grid-enabling tools.
Group-to-group collaboration infrastructure (The Access Grid):
- Research into determination of the attributes necessary in a rapid-deployment AG node.
- Development/benchmarking of new AG shared services, including shared presentations and shared digital video environments.
Open Science Grid Collaboration: Future

19 ACDC-Grid Cyber-Infrastructure
Predictive Scheduler
- Define quality of service estimates of job completion, by better estimating job runtimes by profiling users.
Data Grid
- Automated Data File Migration based on profiling users.
High-performance Grid-enabled Data Repositories
- Develop automated procedures for dynamic data repository creation and deletion.
Dynamic Resource Allocation
- Develop automated procedures for dynamic computational resource allocation.
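
The predictive-scheduler idea above (estimating job runtimes by profiling users) can be sketched as follows. This is a minimal illustration and not the ACDC-Grid implementation: the history format, the user names, and the choice of a median actual-to-requested ratio are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def build_user_profiles(history):
    """Map each user to the median ratio of actual to requested runtime.

    history: iterable of (user, requested_seconds, actual_seconds) tuples.
    The tuple layout is hypothetical, chosen only for this sketch.
    """
    ratios = defaultdict(list)
    for user, requested_s, actual_s in history:
        if requested_s > 0:
            ratios[user].append(actual_s / requested_s)
    return {user: median(values) for user, values in ratios.items()}


def predict_runtime(profiles, user, requested_s, default_ratio=1.0):
    """Scale the requested walltime by the user's historical ratio.

    Users with no history fall back to their own request (ratio 1.0).
    """
    return requested_s * profiles.get(user, default_ratio)


# Hypothetical accounting records: alice habitually over-requests walltime.
history = [
    ("alice", 3600, 900),
    ("alice", 7200, 1800),
    ("alice", 3600, 1080),
    ("bob", 3600, 3600),
]
profiles = build_user_profiles(history)
print(predict_runtime(profiles, "alice", 4000))
```

A scheduler could feed such estimates into its completion-time quotes; a production version would presumably also weight recent jobs and distinguish applications, which this sketch does not attempt.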

20 Middleware
- Globus Toolkit 2.2.4 → direct upgrade WSRF
- Condor 6.6.0
- Network Weather Service 2.6
- Apache2 HTTP Server
- PHP 4.3.0
- MySQL 3.23
- phpMyAdmin 2.5.1

21 Biomedical Advances
- PSA Test (screen for Prostate Cancer)
- Avonex: Interferon Treatment for Multiple Sclerosis
- Artificial Blood
- Nicorette Gum
- Fetal Viability Test
- Implantable Pacemaker
- Edible Vaccine for Hepatitis C
- Timed-Release Insulin Therapy
- Anti-Arrhythmia Therapy
  - Tarantula venom
- Direct Methods Structure Determination
  - Listed on “Top Ten Algorithms of the 20th Century”
  - Vancomycin
  - Gramicidin A
- High Throughput Crystallization Method: Patented
- NIH National Genomics Center: Northeast Consortium
- Howard Hughes Medical Institute: Center for Genomics & Proteomics

22 Bioinformatics in Buffalo: A $360M Initiative
- New York State: $121M
- Federal Appropriations: $13M
- Corporate: $146M
- Foundation: $15M
- Grants & Contracts: $64M

23 Bioinformatics Partners
Lead Institutions
- University at Buffalo (UB)
- Hauptman-Woodward Medical Research Inst.
- Roswell Park Cancer Institute
Corporate Partners
- Amersham Pharmacia, Beckman Coulter, Bristol Myers Squibb, General Electric, Human Genome Sciences, Immco, Invitrogen, Pfizer Pharmaceutical, Wyeth Lederle, Zeptometrix
- Dell, HP, SGI, Stryker, Sun
- AT&T, Sloan Foundation
- InforMax, Q-Chem, 3M, Veridian
- BioPharma Ireland, Confederation of Indian Industries

24 Center for Computational Research: 1999-2004 Snapshot
[Raptor image]
High-Performance Computing and High-End Visualization
- 110 Research Groups in 27 Depts
- 13 Local Companies
- 10 Local Institutions
External Funding
- $111M External Funding
  - $13.5M as lead
  - $97.5M in support
- $41.8M Vendor Donations
- $360M Bioinformatics Initiative
Deliverables
- 350+ Publications
- Software, Media, Algorithms, Consulting, Training, CPU Cycles…

25 CCR Visualization Resources
Fakespace ImmersaDesk R2
- Portable 3D Device
Tiled-Display Wall
- 20 NEC projectors: 15.7M pixels
- Screen is 11’ × 7’
- Dell PCs with Myrinet2000
Access Grid Node
- Group-to-Group Communication
- Commodity components
SGI Reality Center 3300W
- Dual Barcos on 8’ × 4’ screen
VREX VR-4200 Stereo Imaging Projector
- Portable projector works with PC

26 CCR Visualization Resources
Fakespace ImmersaDesk R2
- Portable 3D Device
Tiled-Display Wall
- 20 NEC projectors: 15.7M pixels
- Screen is 11’ × 7’
- Dell PCs with Myrinet2000
Access Grid Node
- Group-to-Group Communication
- Commodity components
SGI Reality Center 3300W
- Dual Barcos on 8’ × 4’ screen

27 Medical/Dental BCOEB Network Connections (New)

28 WNY Grid Highlights
- Heterogeneous Computational & Data Grid
- Currently in Beta with Shake-and-Bake
- WNY Release in 2H04
- Bottom-Up General Purpose Implementation
  - Ease-of-Use User Tools
  - Administrative Tools
- Back-End Intelligence
  - Backfill Operations
  - Prediction and Analysis of Resources to Run Jobs (Compute Nodes + Requisite Data)

29 X-Ray Crystallography
Objective: Provide a 3-D mapping of the atoms in a crystal.
Procedure:
1. Isolate a single crystal.
2. Perform the X-Ray diffraction experiment.
3. Determine the molecular structure that agrees with the diffraction data.

30 X-Ray Data & Corresponding Molecular Structure
[Diagram: X-Ray Data (Reciprocal Space) and Molecular Structure (Real Space), linked by FFT and FFT⁻¹.]
- Underlying atomic arrangement is related to the reflections by a 3-D Fourier transform.
- Phases lost during the crystallographic experiment.
- Phase Problem: Determine phases of the reflections.

31 Shake-and-Bake Method: Dual-Space Refinement
[Flow diagram; labels recovered below.]
- Trial Structures; Structure Factors; Trial Phases
- Reciprocal Space (“Shake”): Phase Refinement (Tangent Formula, Parameter Shift)
- Real Space (“Bake”): Density Modification (Peak Picking) (LDE)
- FFT and FFT⁻¹
- Solutions?
Shake-and-Bake: DeTitta, Hauptman, Miller, Weeks

32 Ph8755: SnB Histogram
- Atoms: 74
- Phases: 740
- Space Group: P1
- Triples: 7,400
- Trials: 100
- Cycles: 40
- Rmin range: 0.243 - 0.429

33 Phasing and Structure Size
[Chart: phasing methods versus Number of Atoms in Structure (axis ticks: 0, 100, 1,000, 10,000, 100,000). Methods shown: Conventional Direct Methods; Shake-and-Bake; Multiple Isomorphous Replacement; Se-Met; Se-Met with Shake-and-Bake. Labels: Vancomycin; 190kDa; “?” markers.]

34 Molecular Structure Determination
- SnB Software by UB/HWI
  - “Top Algorithms of the Century”
- Critical to Rational Drug Design
- Important Link in Structural Biology
- Current Effort
  - Grid
  - Collaboratory
  - Intelligent Learning

35 Antibiotics & Supercomputers
Result: New, better drugs in shorter time
- Vancomycin solved with SnB (UB/HWI)
  - SnB: “Top Algorithms of the Century”
  - “Antibiotic of Last Resort”
  - Original molecular structure required 5 months
  - (Re)solved in a single day on CCR’s supercomputers
  - Current Efforts: Grid, Collaboratory, Intelligent Learning

36 Photograph of Crystal

37 Useful Relationships for Multiple Trial Phasing
- Tangent Formula
- Parameter Shift Optimization
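
The equations on this slide did not survive in the transcript. For orientation, the two relationships are commonly written as below in the direct-methods literature; the notation (normalized structure-factor magnitudes $|E_H|$, phases $\varphi_H$, $N$ atoms in the cell) is assumed here and may differ from what the slide showed.

```latex
% Tangent formula: estimate one phase from pairs of other phases.
\[
\tan\varphi_H =
  \frac{\sum_K |E_K E_{H-K}| \,\sin(\varphi_K + \varphi_{H-K})}
       {\sum_K |E_K E_{H-K}| \,\cos(\varphi_K + \varphi_{H-K})}
\]

% Minimal function, the quantity reduced by parameter-shift optimization.
\[
R(\varphi) =
  \frac{\sum_{H,K} A_{HK}\left[\cos\Phi_{HK} - \dfrac{I_1(A_{HK})}{I_0(A_{HK})}\right]^2}
       {\sum_{H,K} A_{HK}},
\qquad
\Phi_{HK} = \varphi_H + \varphi_K + \varphi_{-H-K},
\quad
A_{HK} = \frac{2}{\sqrt{N}}\,|E_H E_K E_{H+K}|
\]
```

Here $I_0$ and $I_1$ are modified Bessel functions. In parameter-shift optimization each phase is shifted in turn and the shift that lowers $R(\varphi)$ is kept.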

38 Structure of SnB
- SnB Process
- Trials
- Histogram
- Visualization

39 Vancomycin Crystal Structure Views (courtesy of P. Loll & P. Axelsen)

40 Computing Platforms
Workstations
- SGI, Sun, DEC/Alpha
- Linux
Parallel Computers
- Cray T3D/E, TMC CM-5, IBM SP2
- HP-Convex Exemplar
- SGI Origin2/3000 & Onyx 2/3
- IBM SP – heterogeneous
- Linux Clusters
- Sun Cluster
- Condor Flock
- Computational Grid

41 Molecular Structure Determination
- SnB Software by UB/HWI
  - “Top Algorithms of the Century”
- Critical to Rational Drug Design
- Important Link in Structural Biology
- Current Effort
  - Grid
  - Collaboratory
  - Intelligent Learning

42 Vancomycin Crystal (courtesy of P. Loll)

43 The Diffraction Pattern
Experiment yields:
- reflections
- associated intensities
Phase angles are lost in experiment.

44 The Phase Problem
Experiment yields:
- reflections
- associated intensities
Phase angles are lost in experiment.
Underlying atomic arrangement is related to the reflections by a 3-D Fourier transform.
Phase Problem: determine the set of phases corresponding to the reflections.
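
The Fourier relationship stated above can be written out explicitly. The following is the standard textbook form, given as a reminder rather than taken from the slide; the symbols ($f_j$ for atomic scattering factors, $\mathbf{r}_j$ for atomic positions, $V$ for the unit-cell volume) are assumed notation.

```latex
% Structure factor of reflection H for N atoms at positions r_j.
\[
F_{\mathbf{H}} = |F_{\mathbf{H}}|\, e^{i\varphi_{\mathbf{H}}}
  = \sum_{j=1}^{N} f_j \exp\!\left(2\pi i\, \mathbf{H}\cdot\mathbf{r}_j\right)
\]

% Electron density recovered by the inverse transform.
\[
\rho(\mathbf{r}) = \frac{1}{V} \sum_{\mathbf{H}}
  |F_{\mathbf{H}}|\, e^{i\varphi_{\mathbf{H}}}
  \exp\!\left(-2\pi i\, \mathbf{H}\cdot\mathbf{r}\right)
\]
```

The experiment measures intensities proportional to $|F_{\mathbf{H}}|^2$, so the magnitudes are known but the phases $\varphi_{\mathbf{H}}$ are not; without them the second sum cannot be evaluated, which is the phase problem.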

45 Conventional Direct Methods
[Flow diagram; labels recovered below.]
- Trial Phases
- Reciprocal Space: Phase Refinement (Tangent Formula)
- FFT
- Real Space: Density Modification (Peak Picking)
- Solutions?

46 Ph8755: Trace of SnB Solution
- Atoms: 74
- Space Group: P1
- SnB Cycles: 40

47 Vancomycin
- Interferes with formation of bacterial walls
- Last line of defense against deadly streptococcal and staphylococcal bacteria strains
- Vancomycin resistance exists (Michigan)
- Can’t just synthesize variants and test
- Need structure-based approach to predict
- Solution with SnB (Shake-and-Bake)
  - Pat Loll
  - George Sheldrick

48 Grid Server Console (Vancomycin)

49 Data Grid Motivation & Goal
Motivation:
- Large data collections are emerging as important community resources.
- Data Grids inherently complement Computational Grids, which manipulate data.
- A data grid denotes a large network of distributed storage resources such as archival systems, caches, and databases, which are linked logically to create a sense of global persistence.
Goal:
- To design and implement transparent management of data distributed across heterogeneous resources, such that the data is accessible via a uniform web interface.

50 Data Grid Summary
- 544 GB Storage: located on 6 heterogeneous ACDC-Grid resources
- 480 GB Storage: located on 1 dual processor Dell PowerVault server
- 75,000 GB Storage (10/03): served by 4 – 16 processor HP GS1280 servers
- 2,000 GB Storage: served by Sun Ultra-60 servers
- 78,024 GB Total Data Grid Storage available and accessible from the ACDC-Grid Portal
Diagram labels: 182 GB Storage; 100 GB Storage; 56 GB Storage; 100 GB Storage; 70 GB Storage; 136 GB Storage; Storage Area Network 75 TB; Network Attached Storage 480 GB; CSE Multi-Store 2 TB
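
The total quoted above follows directly from the four tier sizes. A small tally, using only the figures on the slide (the dictionary keys are descriptive labels chosen for this sketch):

```python
# Storage tiers in GB, as listed in the Data Grid Summary.
tiers_gb = {
    "ACDC-Grid resources (6 heterogeneous hosts)": 544,
    "Network Attached Storage (Dell PowerVault)": 480,
    "Storage Area Network (HP GS1280 servers)": 75_000,
    "CSE Multi-Store (Sun Ultra-60 servers)": 2_000,
}

total_gb = sum(tiers_gb.values())
san_share = 100 * tiers_gb["Storage Area Network (HP GS1280 servers)"] / total_gb

print(f"total: {total_gb:,} GB")
print(f"SAN share: {san_share:.1f}%")
```

The tally makes the shape of the data grid visible: almost all of the capacity sits in the Storage Area Network, with the per-resource disks contributing well under one percent.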

51 Grid-Based SnB Objectives
- Install Grid-Enabled Version of SnB
- Job Submission and Monitoring over Internet
- SnB Output Stored in Database
- SnB Output Mined through Internet-Based Integrated Querying Tool
- Serve as Template for Chem-Grid & Bio-Grid
- Experience with Globus and Related Tools

52 Grid Enabled SnB
Problem Statement
- Use all available resources in the ACDC-Grid for determining a single molecular structure.
Grid Enabling Criteria
- All heterogeneous resources in the ACDC-Grid are capable of executing the SnB application.
- All job results obtained from the ACDC-Grid resources are stored in a corresponding molecular structure database.
- There are three modes of operation. Continue submitting SnB application jobs:
  - until the grid-enabled SnB application determines a solution has been found, or
  - until “X” number of trials have been evaluated, or
  - indefinitely (grid job owner determines when a solution has been found).
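
The three modes of operation amount to three stopping rules around the same submission loop. A minimal sketch under stated assumptions: `run_batch` stands in for submitting one batch of trials and collecting their Rmin values, and the "solution found" test (any Rmin at or below a cutoff) is a placeholder, not the criterion the grid-enabled SnB actually applies.

```python
def submit_until_done(run_batch, mode, max_trials=None,
                      owner_says_stop=None, solution_rmin=0.30):
    """Submit batches of trials until the selected stopping rule fires.

    run_batch: callable returning a list of per-trial Rmin values (hypothetical).
    mode: "auto" (application decides), "max_trials" (X trials evaluated),
          or "owner" (job owner decides).
    """
    rmin_values = []
    while True:
        batch = run_batch()
        rmin_values.extend(batch)
        if mode == "auto":
            # Assumed solution test: some trial reached a low enough Rmin.
            if any(r <= solution_rmin for r in batch):
                break
        elif mode == "max_trials":
            if len(rmin_values) >= max_trials:
                break
        elif mode == "owner":
            if owner_says_stop(rmin_values):
                break
        else:
            raise ValueError(f"unknown mode: {mode}")
    return rmin_values


def make_runner():
    """Return a canned batch source so the example is deterministic."""
    batches = iter([[0.42, 0.41], [0.40, 0.39], [0.25, 0.43], [0.44, 0.41]])
    return lambda: next(batches)


print(len(submit_until_done(make_runner(), "auto")))
print(len(submit_until_done(make_runner(), "max_trials", max_trials=4)))
print(len(submit_until_done(make_runner(), "owner",
                            owner_says_stop=lambda seen: len(seen) >= 2)))
```

Keeping the stopping rule separate from the submission loop means the same job-management code can serve all three modes.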

53 Grid Services and Applications
[Layered diagram, adapted from Ian Foster and Carl Kesselman; labels recovered below.]
- Applications: Shake-and-Bake, Apache, MySQL, Oracle
- High-level Services and Tools: Globus Toolkit, globusrun, MPI, NWS, MPI-IO, C, C++, Fortran, PHP
- Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, GASS
- Local Services: LSF, Condor, MPI, TCP, UDP, Solaris, Irix, WINNT, RedHat Linux, PBS, Maui Scheduler, Stork
- ACDC-Grid Computational Resources
- ACDC-Grid Data Resources

54 Notes
- Apache: web portal server.
- PHP: used by the Apache server for dynamic web portal pages.
- MDS: MDS is traditionally used with LDAP, but we use MDS with the MySQL grid portal database to keep information on available resources (we poll every 15 minutes).
- GRAM: Globus Resource Allocation Manager; API for requesting computational jobs.
- GASS: Global Access to Secondary Storage; API for accessing files stored on various platforms.
- Stork: Condor module for transporting job files within a flock.

55 Grid Enabled SnB: Required Layered Grid Services
- Grid-enabled Application Layer
  - Shake-and-Bake application
  - Apache web server
  - MySQL database
- High-level Service Layer
  - Globus, NWS, PHP, Fortran, and C
- Core Service Layer
  - Metacomputing Directory Service, Globus Security Interface, GRAM, GASS
- Local Service Layer
  - Condor, MPI, PBS, Maui, WINNT, IRIX, Solaris, RedHat Linux

56 Required Grid Services: Grid Implementation as a Layered Set of Services
[Layered diagram; labels recovered below.]
- Applications
- High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status
- Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS
- Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX
Services required for grid-enabled SnB:
- Application Layer
  - Shake-and-Bake
  - Apache web server
  - MySQL database
- High-level Services
  - Globus, PHP, Fortran, C
- Core Services
  - Metacomputing Directory Service, Globus Security Interface, GRAM, GASS
- Local Services
  - Condor, MPI, PBS, Maui, WINNT, IRIX, Solaris, RedHat Linux

57 Grid Enabled SnB Execution
- User
  - defines Grid-enabled SnB job using Grid Portal or SnB
  - supplies location of data files from Data Grid
  - supplies SnB mode of operation
- Grid Portal
  - assembles required SnB data and supporting files, execution scripts, database tables.
  - determines available ACDC-Grid resources.
- ACDC-Grid job management includes:
  - automatic determination of appropriate execution times, number of trials, and number/location of processors,
  - logging/status of concurrently executing resource jobs, &
  - automatic incorporation of SnB trial results into the molecular structure database.
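
One way the "automatic determination of number of trials" per resource could work is a proportional split. The sketch below is an illustration only: it assumes processor count times clock rate as a crude capacity measure (the slides do not say how ACDC-Grid weighs resources), and it takes processor counts and clock rates from the cluster descriptions earlier in the deck.

```python
def allocate_trials(total_trials, resources):
    """Split trials across resources in proportion to an assumed capacity.

    resources: {name: (processors, clock_ghz)}. Capacity = processors * GHz
    is a stand-in metric; it ignores architecture, memory, and load.
    Largest-remainder rounding keeps the total exact.
    """
    weights = {name: procs * ghz for name, (procs, ghz) in resources.items()}
    total_weight = sum(weights.values())
    exact = {name: total_trials * w / total_weight for name, w in weights.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total_trials - sum(allocation.values())
    # Hand the remaining trials to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda n: exact[n] - allocation[n], reverse=True)
    for name in by_fraction[:leftover]:
        allocation[name] += 1
    return allocation


# Processor counts and clock rates as listed for these clusters.
clusters = {
    "joplin": (600, 2.4),   # 300 dual-processor 2.4 GHz nodes
    "nash": (150, 1.0),     # 75 dual-processor 1 GHz nodes
    "mama": (18, 1.0),      # 9 dual-processor 1 GHz nodes
    "crosby": (64, 0.4),    # 64 x 400 MHz
}
plan = allocate_trials(1000, clusters)
print(plan)
```

A production scheduler would presumably replace the static weights with measured throughput per resource, for example from the SnB run database.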

58 ACDC-Grid Portal

59 ACDC-Grid Portal Login: Grid Portal login screen

60 Data Grid Capabilities: Browser view of “mlgreen” user files stored in the Data Grid

61 Data Grid Capabilities: Browser view of “miller” group files published by user “rappleye”

62 Data Grid Capabilities: Browser view of “public” user files published by user “miller”

63 Data Grid Capabilities

64 Data Grid Capabilities

65 Grid Portal Job Status
Grid-enabled jobs can be monitored dynamically using the Grid Portal web interface.
- Charts are based on:
  - total CPU hours, or
  - total jobs, or
  - total runtime.
- Usage data for:
  - running jobs, or
  - queued jobs.
- Individual or all resources.
- Grouped by:
  - group, or
  - user, or
  - queue.
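
The chart options above map onto one aggregation with three knobs: a metric, a grouping key, and optional filters on job state and resource. A minimal sketch, assuming job records are dictionaries with the hypothetical fields shown; the portal's real schema is not given in the slides.

```python
from collections import defaultdict


def summarize(jobs, metric="cpu_hours", group_by="user", state=None, resource=None):
    """Aggregate job records for a status chart.

    metric: "cpu_hours", "jobs", or "runtime".
    group_by: "group", "user", or "queue".
    state / resource: optional filters; None means "all".
    """
    totals = defaultdict(float)
    for job in jobs:
        if state is not None and job["state"] != state:
            continue
        if resource is not None and job["resource"] != resource:
            continue
        if metric == "jobs":
            value = 1
        elif metric == "cpu_hours":
            value = job["cpus"] * job["runtime_h"]
        elif metric == "runtime":
            value = job["runtime_h"]
        else:
            raise ValueError(f"unknown metric: {metric}")
        totals[job[group_by]] += value
    return dict(totals)


# Hypothetical job records.
jobs = [
    {"user": "alice", "group": "lab", "queue": "long", "resource": "joplin",
     "state": "running", "cpus": 8, "runtime_h": 2.0},
    {"user": "bob", "group": "chem", "queue": "short", "resource": "nash",
     "state": "running", "cpus": 4, "runtime_h": 1.5},
    {"user": "alice", "group": "lab", "queue": "short", "resource": "nash",
     "state": "queued", "cpus": 2, "runtime_h": 0.0},
]
print(summarize(jobs, metric="cpu_hours", group_by="user", state="running"))
```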

66 Grid Portal Job Status

67 ACDC-Grid Portal Condor Flock: CondorView integrated into ACDC-Grid Portal

68 ACDC-Grid Portal User Management
- User based
- Administrator based

69 ACDC-Grid Portal Resource Management
Administrator grants a user access to ACDC-Grid
- resources,
- software, and
- web pages.

70 ACDC-Grid Administration

71 Grid Enabled Data Mining
Problem Statement
- Use all available resources in the ACDC-Grid for executing a data mining genetic algorithm optimization of SnB parameters for molecular structures having the same space group.
Grid Enabling Criteria
- All heterogeneous resources in the ACDC-Grid are capable of executing the SnB application.
- All job results obtained from the ACDC-Grid resources are stored in corresponding molecular structure databases.

72 Grid Enabled Data Mining
There are two modes of operation and two sets of stopping criteria.
- Modes of operation: data mining jobs can be submitted
  - in a dedicated mode (time critical), where jobs are queued on ACDC-Grid resources, or
  - in a back fill mode (non-time critical), where jobs are submitted to ACDC-Grid resources that have unused cycles available.
- Stopping criteria: continue submitting SnB data mining application jobs
  - until the grid-enabled SnB application determines optimal parameters have been found, or
  - indefinitely (grid job owner determines when optimal parameters have been found).
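
The difference between the two submission modes can be sketched as a planning step. This is an assumed simplification: dedicated mode queues a request on every resource, while back fill mode targets only resources that currently report idle CPUs and never asks for more than is idle. The resource records and idle counts are hypothetical.

```python
def plan_submissions(resources, mode, cpus_wanted):
    """Return (resource name, submission kind, cpus requested) tuples.

    resources: list of dicts with "name", "total_cpus", and "idle_cpus".
    mode: "dedicated" or "backfill".
    """
    plan = []
    for res in resources:
        if mode == "dedicated":
            # Time critical: wait in the queue for the full request.
            plan.append((res["name"], "queued", min(cpus_wanted, res["total_cpus"])))
        elif mode == "backfill":
            # Non-time critical: use only cycles that are idle right now.
            if res["idle_cpus"] > 0:
                plan.append((res["name"], "backfill", min(cpus_wanted, res["idle_cpus"])))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return plan


# Hypothetical snapshot of resource state.
snapshot = [
    {"name": "joplin", "total_cpus": 600, "idle_cpus": 0},
    {"name": "nash", "total_cpus": 150, "idle_cpus": 12},
    {"name": "crosby", "total_cpus": 64, "idle_cpus": 40},
]
print(plan_submissions(snapshot, "dedicated", 32))
print(plan_submissions(snapshot, "backfill", 32))
```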

73 Grid Enabled Data Mining
[Workflow diagram; labels: Grid Portal; Workflow Job Manager; Data Mining Criteria; ACDC-Grid Computational Resources; ACDC-Grid Data Grid; Molecular Structure Database.]

74 SnB Molecular Structure Database

75 Grid Enabled Data Mining Execution Scenario
- User defines a Grid-enabled data mining SnB job using the Grid Portal web interface, supplying:
  - which molecular structure parameter sets to optimize,
  - data file metadata,
  - Grid-enabled SnB mode of operation (dedicated or back fill mode), and
  - Grid-enabled SnB stopping criteria.
- The Grid Portal assembles the required SnB application data and supporting files, execution scripts, database tables, and submits jobs for parameter optimization based on the current database statistics.
- ACDC-Grid job management includes:
  - automatic determination of appropriate execution times, number of trials, and number of processors for each available resource,
  - logging and status of all concurrently executing resource jobs,
  - automatic incorporation of SnB trial results into the molecular structure database, and
  - post processing of updated database for subsequent job submissions.

76 ACDC Data Grid Database Schema
[Diagram label: ACDC-Grid Data Grid]

77 Grid Portal Job Status
[Diagram label: ACDC-Grid Computational Resources]

78 Data Grid Overview
Enable the transparent migration of data between various resources while preserving uniform access for the user.
• Maintain metadata about each file and its location in a global database table.
  - Currently using MySQL tables.
• Periodically migrate files between machines to make better use of resources.
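
A rough sketch of how a global metadata table can keep access uniform while files move. The presentation uses MySQL; this example substitutes Python's built-in `sqlite3` so it runs without a server. The table name `file_management` appears later in the slides, but the columns shown here and the helper functions are assumptions made for illustration.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory stand-in for the global metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE file_management (
               logical_name  TEXT PRIMARY KEY,  -- name the user always sees
               owner         TEXT NOT NULL,
               resource_id   TEXT NOT NULL,     -- where the file lives now
               physical_path TEXT NOT NULL)"""
    )
    return conn


def register(conn, logical_name, owner, resource_id, physical_path):
    """Record a new file and its current location."""
    conn.execute(
        "INSERT INTO file_management VALUES (?, ?, ?, ?)",
        (logical_name, owner, resource_id, physical_path),
    )


def locate(conn, logical_name):
    """Return (resource_id, physical_path), or None if the file is unknown."""
    return conn.execute(
        "SELECT resource_id, physical_path FROM file_management "
        "WHERE logical_name = ?",
        (logical_name,),
    ).fetchone()


def migrate(conn, logical_name, new_resource, new_path):
    """Update only the location; the logical name stays the same."""
    conn.execute(
        "UPDATE file_management SET resource_id = ?, physical_path = ? "
        "WHERE logical_name = ?",
        (new_resource, new_path, logical_name),
    )
```

Because users address files only by `logical_name`, a migration changes a row in the table rather than anything the user has to know about.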

79 Data Grid Functionality
Implement basic file management functions accessible via a platform-independent web interface. Features include:
• User-friendly menus and interface.
• File upload and download to and from the Data Grid Portal.
• Simple web-based file editor.
• Efficient search utility.
• Logical display of files for a given user in three divisions (user/group/public).
  - Hierarchical vs. list-based
  - 3 divisions: (user/group/public)
• Sorting capability based on file metadata, e.g. filename, size, modification time.

80 Data Grid Functionality
Support multiple access to files in the data grid.
• Implement basic locking and synchronization primitives for version control.
Integrate security into the data grid.
• Implement basic authentication and authorization of users.
• Decide and enforce policies for data access and publishing.

81 Data Grid File Migration
Migration Algorithm
• File migration depends upon a number of factors:
  - User access time
  - Network capacity at time of migration
  - User profile
  - User disk quotas on various resources

82 Data Grid File Migration
We need to mine log files in order to determine:
• How much data to migrate in one migration cycle?
• What is an appropriate migration cycle length?
• What is a user’s access pattern for files?
• What is the overall access pattern for particular files?
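
As an illustration of the last two questions only, the sketch below counts accesses per (user, file) pair and per file from an access log. The log format, one `<epoch_seconds> <user> <logical_file>` record per line, is an assumption; the presentation does not describe the actual log layout.

```python
from collections import Counter
from typing import Iterable, Tuple


def access_patterns(log_lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count file accesses per (user, file) pair and per file overall.

    Assumed record format: '<epoch_seconds> <user> <logical_file>'.
    Lines that do not have exactly three fields are skipped.
    """
    per_user_file = Counter()
    per_file = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # malformed record
        _timestamp, user, name = parts
        per_user_file[(user, name)] += 1
        per_file[name] += 1
    return per_user_file, per_file
```

The other two questions (how much data to move per cycle, and how long a cycle should be) would need transfer sizes and timings, which this assumed log format does not carry.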

83 Data Grid File Aging
Global File Aging vs. Local File Aging
• User aging attribute
  - Indicative of a user’s access across their own files.
  - Attribute of a user’s profile.
  - At migration time, this attribute determines which user’s files should be migrated off the grid portal onto a remote resource.
  - Function of (file age, global file aging, resource usage).

84 Data Grid File Aging
File aging attribute
• Indicative of overall access to, and migration activity of, a particular file.
• Attribute in the file_management table.
• Scale: 0 to 1 probability of whether or not to migrate the file.
  - file_aging_local_param is initialized to 1.
• At migration time, after a user has been chosen, this attribute helps determine which of the user’s files to migrate.
  - i.e., migrate a maximum of the top 5% of the user’s files in any one cycle.
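
One way to read the "top 5%" rule is sketched below. It assumes that a higher `file_aging_local_param` marks a stronger migration candidate, which the slide does not state outright, and the function name and the `percent` parameter are hypothetical.

```python
from typing import Dict, List


def pick_files_to_migrate(aging: Dict[str, float], percent: int = 5) -> List[str]:
    """Return at most `percent` percent of a user's files, highest aging first.

    aging maps a file name to its file_aging_local_param value.
    Assumption: a higher value means the file should be migrated sooner.
    """
    limit = len(aging) * percent // 100  # integer cap, rounded down
    ranked = sorted(aging, key=lambda name: aging[name], reverse=True)
    return ranked[:limit]
```

Rounding the cap down means a user with fewer than 20 files has nothing migrated in a cycle under this reading; a real policy might set a minimum of one file instead.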

85 Data Grid File Aging
For a given user, the average of the file_aging_local_param attributes of all files should be close to 1.
• No action is taken while the average stays within the operating tolerance of 0.9 to 1.1.
In this way, the user’s file_aging_global_param can be a function of this average.
• If the average file_aging_local_param attribute > 1.1, then the user’s files are being held too long before being migrated.
  - The file_aging_global_param value should be decreased.
• If the average file_aging_local_param attribute < 0.9, then the user’s files are being accessed at a higher frequency than the file_aging_global_param value reflects.
  - The file_aging_global_param value should be increased.
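
The adjustment rule on this slide amounts to a simple feedback loop, sketched below. The slide gives only the direction of each change, so the fixed `step` size is an assumption, as is the function name.

```python
from typing import Sequence


def adjust_global_param(global_param: float,
                        local_params: Sequence[float],
                        step: float = 0.05,
                        low: float = 0.9,
                        high: float = 1.1) -> float:
    """Nudge a user's file_aging_global_param toward the 0.9 to 1.1 band.

    local_params holds the file_aging_local_param of each of the user's files.
    step is a hypothetical adjustment size, not a value from the slides.
    """
    if not local_params:
        return global_param  # no files, nothing to average
    average = sum(local_params) / len(local_params)
    if average > high:
        return global_param - step  # files held too long: decrease
    if average < low:
        return global_param + step  # files accessed more often: increase
    return global_param  # within tolerance: no action
```

Calling this once per migration cycle for each user would let the global parameter drift until the user's average settles back inside the tolerance band.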

86 Data Grid Resource Info

87 Data Grid Resource Info
Both platforms have reduced bandwidth available for additional transfers.

88 Data Grid File Management Table

89 Data Grid File Age
File age, access time, and resource id denote, respectively:
• the amount of time since a file was accessed,
• when the file was accessed, and
• where the file currently resides.
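
The relationship between the three fields can be shown in a few lines. This is a sketch under assumed names: `FileRecord` and its attribute names are hypothetical, and file age is taken to be the current time minus the access time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    name: str
    access_time: datetime  # when the file was accessed
    resource_id: str       # where the file currently resides


def file_age(record: FileRecord, now: datetime) -> timedelta:
    """Amount of time since the file was accessed."""
    return now - record.access_time
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the calculation repeatable when a whole migration cycle should be evaluated against one timestamp.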

90 Data Grid Summary
The Data Grid algorithms are continually evolving to minimize network traffic and maximize disk space utilization on a per-user basis by data mining user usage and disk space requirements.
Figure labels: storage of 182 GB, 100 GB, 56 GB, 100 GB, 70 GB, and 136 GB; Storage Area Network 75 TB; Network Attached Storage 480 GB; CSE Multi-store 2 TB

91 ACDC-Grid Development/Maintenance
Development Requirements
• 7 person-months for Grid Services Coordinator
  - Including Grid and Database conceptual design and implementation
• 5 person-months for Grid Services Programmer
  - Web portal programming
• 5 person-months for System Administrator
  - Globus, NWS, MDS, etc. installations
• 3 person-months for Database Administrator
  - Grid Portal Database implementation
Minimum Maintenance Requirements
• 1 Grid Services Coordinator
  - 100% level of effort
• 1 Grid Services Programmer
  - 100% level of effort
• 1 System Administrator
  - 50% level of effort
• 1 Database Administrator
  - 10% level of effort
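
For convenience, the figures on this slide can be tallied as below. The totals are derived here by simple addition and are not stated in the presentation; the variable names are illustrative.

```python
# Development effort in person-months, as listed on the slide.
development_person_months = {
    "Grid Services Coordinator": 7,
    "Grid Services Programmer": 5,
    "System Administrator": 5,
    "Database Administrator": 3,
}

# Minimum maintenance effort as a percentage of one full-time position.
maintenance_effort_percent = {
    "Grid Services Coordinator": 100,
    "Grid Services Programmer": 100,
    "System Administrator": 50,
    "Database Administrator": 10,
}

total_development = sum(development_person_months.values())
total_maintenance_fte = sum(maintenance_effort_percent.values()) / 100

print(f"Development: {total_development} person-months")
print(f"Maintenance: {total_maintenance_fte} full-time equivalents")
```

That comes to 20 person-months of development and 2.6 full-time equivalents of ongoing maintenance.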

92 Acknowledgments
Steve Gallo, Jason Rappleye, Jeff Tilson, Martins Innus, Cynthia Cornelius
National Science Foundation, National Institutes of Health, Oishei Foundation, Wendt Foundation, Sloan Foundation, Verizon, NYS Gov Pataki, Congressman Reynolds, Senator Clinton, Senator Schumer, Congressman Quinn
George DeTitta, Herb Hauptman, Charles Weeks, Steve Potter

93 www.ccr.buffalo.edu

94 Contact Information
miller@buffalo.edu
www.ccr.buffalo.edu

