
Ames Research Center Division 1
Grids for Dummies Featuring Earth Science Data Mining Application
Thomas H. Hinke, NASA Ames Research Center, Moffett Field, California, USA

Ames Research Center Division 2
Outline
Use of grids for applications
– What are grids
– Grids from a user's perspective
– Grid support for Earth Science applications such as data mining
Global Grid Forum
– Background
– Organization
– Current work

Ames Research Center Division 3
What Are Grids?
"Grids are persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations."

Middleware Makes the Grid
[Figure: two processors, X and Y, connected by a network, each running the same layered stack: applications, application web services, grid services packaged as web services, grid common functions, and network services.]

Ames Research Center Division 5
Characteristics Usually Found in Grids
An underlying security infrastructure, such as the Grid Security Infrastructure (GSI), which is based on public-key technology
– Protection for at least authentication information as it flows from resource to resource
Single sign-on
A seamless processing environment
An infrastructure that is scalable to a large number of resources
The ability for grid components to cross administrative boundaries
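As a minimal sketch of GSI single sign-on (the host names below are illustrative, not from the original slides), a user of a Globus-based grid such as the IPG typically creates one short-lived proxy credential and then runs jobs on several machines without further password prompts:

  # Create a 12-hour proxy credential from the user's long-term certificate;
  # this single sign-on step replaces separate logins to each grid resource.
  grid-proxy-init -hours 12

  # Inspect the proxy (subject, remaining lifetime).
  grid-proxy-info

  # Later grid commands authenticate with the proxy automatically, e.g.:
  globus-job-run node1.example.nasa.gov /bin/hostname
  globus-job-run node2.example.nasa.gov /bin/hostname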

Ames Research Center Division 6
Why Are Grids Important?
Computing and data grids are emerging as the infrastructure for 21st-century science, engineering, and high-performance applications and systems
– Grids provide a common way of managing distributed computing, data, instrument, and human resources
Grids facilitate collaboration by providing the glue for large-scale science and engineering
– A common way to access and use shared data and simulations
– A common security model to facilitate the interaction of many different people from many different institutions
Grids provide a middleware environment that eases the development of complex systems
– Grids can facilitate the development of large-scale science, engineering, and operational applications that are widely distributed and processing- and/or data-intensive

Ames Research Center Division 7
How the User Sees a Grid
A set of grid functions that are available as
– Application programmer interfaces (APIs)
– Command-line functions
After authentication, these functions can be used to
– Spawn jobs on different processors with a single command
– Access data on remote systems
– Move data from one processor to another
– Support communication between programs executing on different processors
– Discover the properties of computational resources available on the grid using the grid information service
– Use a broker to select the best place for a job to run and then negotiate the reservation and execution (coming soon)
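A hedged sketch of what a few of these command-line functions look like on a Globus-based grid; the host names, port, and file paths are illustrative assumptions, not part of the original slides:

  # Run a job on a remote grid processor (globusrun with an RSL string,
  # the same command used later in this talk to stage the mining agent).
  globusrun -w -r compute.example.nasa.gov '&(executable=/bin/date)'

  # Move a file between two grid resources with GridFTP.
  globus-url-copy gsiftp://archive.example.nasa.gov/data/granule.hdf \
                  gsiftp://compute.example.nasa.gov/scratch/granule.hdf

  # Query the grid information service (MDS) for resource properties.
  grid-info-search -x -h giis.example.nasa.gov -p 2135 -b "mds-vo-name=local, o=grid"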

Ames Research Center Division 8
What Will Grids Provide?
Support for collaboration
– Common authentication and security infrastructure
– Common mechanisms to share data
– Common mechanisms to access computing resources
– Management of community databases
Uniform data access
– Standardized mechanisms for accessing archival datasets
– Common mechanisms for managing metadata
Support for building systems
– Very few applications use a single computer
– At least some of the resources needed to solve one's problem invariably reside elsewhere
– Grids will supply the core capabilities common to most applications, so that application developers do not have to re-implement this core capability with each application

Ames Research Center Division 9
Web Access to the Grid Is Available
Some web portals exist for accessing grids
– LaunchPad
  Developed as part of the NASA Information Power Grid project
  Uses Java Server Pages and JavaBeans
  Built using the Grid Portal Development Kit
– GridPort
  Developed at the San Diego Supercomputer Center
  Uses Perl

Ames Research Center Division 10
How an Application Developer Sees a Grid
A set of grid functions
A set of grid functions packaged as web services
– The interface is defined through WSDL (Web Services Description Language)
– The standard access protocol is SOAP (Simple Object Access Protocol)
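To make the web-services view concrete, here is a minimal, purely illustrative sketch of invoking a grid-hosted service over SOAP; the endpoint, operation name, and namespace are hypothetical assumptions and are not taken from the slides:

  # Hypothetical SOAP request to a grid-based mining web service (illustrative only).
  curl -s -X POST https://grid.example.nasa.gov/services/MiningService \
    -H 'Content-Type: text/xml; charset=utf-8' \
    -H 'SOAPAction: "submitMiningPlan"' \
    --data '<?xml version="1.0"?>
  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <submitMiningPlan xmlns="urn:example:mining">
        <planId>plan-001</planId>
      </submitMiningPlan>
    </soap:Body>
  </soap:Envelope>'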

Ames Research Center Division 11
What a User Gains by Using a Grid
As a direct user
– Can easily
  Execute jobs at one or more remote sites
  Move data between sites
  All with single sign-on security
As a user of a grid-enabled application
– Will not see the grid
– Will see an application whose development was eased with grid functions or grid-based web services
– Ease of development should result in more applications or faster availability of applications

Ames Research Center Division 12
What Application Developers Gain by Using Grids
Application web services can be built by re-using capabilities provided by existing grid-enabled web services
Applications can also be built by using grid functions
Grid functions/services handle the distributed management of tasks and data
– The developer can focus on the logic of the application and not on the logic of distributed interaction

Ames Research Center Division 13
Grids Support Various Communities of Use
Scientists, domain problem solvers, and other users
– They will use the applications and services that the grid facilitates. They need to be able to express a problem or experiment in application domain-specific terms, specify the drivers (initial conditions, live data sources, etc.), request that the solution be obtained, and manage the resulting graphics, data, etc.
Model builders and computational scientists
– They will use the grid directly to realize their models and simulations. They combine knowledge of the real world with theoretical models of it to produce simulations or models that can produce a "complete" representation of the observables
Application developers
– They will use the grid directly to realize applications that require high-performance computing or a large number of distributed processors
– They will use the models and simulations as components
Service builders
– They will build the frameworks that allow application developers to
  Build grid services that can be used directly, or
  Use services as building blocks to more easily develop more complex services targeting specific application areas

Ames Research Center Division 14
Summary of What the User Gains
Users can focus on solving the domain issues of their problem rather than on the computer-science issues of distributed computing

Ames Research Center Division 15
Most Grids Are Built on the Globus Toolkit
NASA's Information Power Grid (IPG) is one such example
The Globus project involves research and development personnel from
– Argonne National Laboratory
– The University of Southern California's Information Sciences Institute
– NASA's Ames Information Power Grid team
– The National Science Foundation PACI (Partnerships for Advanced Computational Infrastructure) programs at the National Center for Supercomputing Applications (NCSA) and the San Diego Supercomputer Center
– A number of universities

Ames Research Center Division 16
Data Mining on the Grid
What is data mining?
Why mine on the grid?
The Grid Miner developed for NASA's Information Power Grid (IPG)
A proposed IPG mining service

Ames Research Center Division 17
What Is Data Mining?
"Data mining is the process by which information and knowledge are extracted from a potentially large volume of data using techniques that go beyond a simple search through the data." [NASA Workshop on Issues in the Application of Data Mining to Scientific Data, Oct 1999]

Ames Research Center Division 18
Grid Miner
Developed as one of the early applications on the IPG
– Helped debug the IPG
– Provided the basis for satisfying one of two major IPG milestones last year
Provides the basis for what could be an ongoing Grid Mining Service

Ames Research Center Division 19
Example: Mining for Mesoscale Convective Systems
[Image: results from mining SSM/I data]

Ames Research Center Division 20
Example of Data Being Mined
75 MB for one day of global data from the Special Sensor Microwave/Imager (SSM/I)
Much higher-resolution data exists, with significantly higher volume

Grid Miner Operations
[Figure: the Grid Miner processing pipeline, from input through preprocessing and analysis to output.
Input formats: HDF, HDF-EOS, GIF, PIP-2, SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Level 1B, SSM/I MSFC brightness temperature, US Rain, Landsat, ASCII, GRASS vectors (ASCII text), Intergraph raster, others.
Preprocessing: selection and sampling (subsetting, subsampling, select by value, coincidence search), grid manipulation (grid creation, bin aggregate, bin select, grid aggregate, grid select, find holes), image processing (cropping, inversion, thresholding), others.
Analysis: clustering (K-means, Isodata, maximum), pattern recognition (Bayes classifier, minimum-distance classifier), image analysis (boundary detection, co-occurrence matrix, dilation and erosion, histogram operations, polygon circumscript, spatial filtering, texture operations), genetic algorithms, neural networks, others.
Output: preprocessed data, translated data, patterns/models, results such as GIF images, HDF-EOS, HDF raster images, HDF SDS, polygons (ASCII, DXF), SSM/I MSFC brightness temperature, TIFF images, others.
Figure thanks to the Information and Technology Laboratory at the University of Alabama in Huntsville.]

Ames Research Center Division 22
Why Use a Grid for This Application?
NASA has a large volume of data stored in its archives
– For example, in the Earth science area, the Earth Observing System Data and Information System (EOSDIS) holds large volumes of data at multiple archives
Data archives are not designed to support user processing
Grids, coupled to archives, could provide such a computational capability for users

Ames Research Center Division 23
Mining on the Grid
[Figure: satellite data archives X and Y feeding several grid processors, each running a grid mining agent.]

Ames Research Center Division 24
Grid Miner Architecture
[Figure: IPG mining agents running on IPG processors, coordinated by a mining daemon and a Control Database, drawing operators from a Mining Operations Repository and configuration from a Mining Config Info server, and reading data from satellite data archives X and Y.]

Ames Research Center Division 25
Proposed Mining on the IPG
The user accesses a mining portal to
– Develop a mining plan
– Identify data to be mined and check file names into the Control Database
– Identify the nature of the resources required to perform the mining
– Invoke the mining system
The mining portal stages N mining agents to IPG resources

Ames Research Center Division 26
Proposed Mining on the IPG
The mining agent
– Acquires configuration information from the Mining Config Info server
– Acquires the mining plan from the mining portal
– Acquires the mining operations needed to support the mining plan, using just-in-time acquisition
– Acquires URLs of the data to be mined from the Control Database
– Transfers the data, using just-in-time acquisition
– Mines the data
– Sends the results to the specified IPG site
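A minimal sketch of what the agent's just-in-time data acquisition might look like; next_granule_path is a hypothetical helper that returns the archive path of the next file recorded in the Control Database, mine_granule stands in for the ADaM-based mining executable, and the host and directory names are illustrative:

  # Pull and mine one granule at a time (just-in-time acquisition).
  while path=$(next_granule_path); [ -n "$path" ]; do
      gsincftpget archive.example.nasa.gov /scratch/miner "$path"   # fetch one file from the archive
      mine_granule "/scratch/miner/$(basename "$path")"             # apply the mining plan to it
  done
  # Results under /scratch/miner are then shipped to the specified IPG site,
  # for example with globus-url-copy.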

Ames Research Center Division 27
Mining Operator Acquisition
The vision is a number of source directories for
– Public mining operations contributed by practitioners
– For-fee mining operations from a future mining.com
– Private mining operations available to a particular mining team

Ames Research Center Division 28
Starting Point for Grid Miner
Grid Miner reused code from the object-oriented ADaM data mining system
– Developed under a NASA grant at the University of Alabama in Huntsville
– Implemented in C++ as a stand-alone, object-oriented mining system
  Runs on NT, IRIX, and Linux
– Has been used to support research personnel at the Global Hydrology and Climate Center and a few other sites
The object-oriented nature of ADaM provided an excellent base for the enhancements that transformed ADaM into Grid Miner

Ames Research Center Division 29
Transforming a Stand-Alone Data Miner into Grid Miner
The original stand-alone miner had 459 C++ classes
– Only 5 classes needed small modifications, and 3 new classes were added
Grid commands were added for
– Staging the miner agent to remote sites
– Moving data to the mining processor

Ames Research Center Division 30
Staging the Data Mining Agent to a Remote Processor
globusrun -w -r target_processor '&(executable=$(GLOBUSRUN_GASS_URL) # path_to_agent)(arguments=arg1 arg2 … argN)(minMemory=500)'
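For example (a hedged illustration; the host name, agent path, and argument are hypothetical, not from the original slides), staging the agent to an IPG node might look like:

  globusrun -w -r node07.ipg.example.nasa.gov \
    '&(executable=$(GLOBUSRUN_GASS_URL) # /home/miner/grid_miner_agent)(arguments=plan.cfg)(minMemory=500)'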

Ames Research Center Division 31
Moving Data to Be Mined
gsincftpget remote_processor local_directory remote_file
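For instance (illustrative values only; the archive host and paths are assumptions), pulling one SSM/I granule into a local scratch directory might look like:

  # GSI-enabled ncftp get: host, local destination directory, remote file path.
  gsincftpget archive.example.nasa.gov /scratch/miner /data/ssmi/1999/f13_19991012.hdf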

Ames Research Center Division 32
What Can Grids Do to Support the Earth Science Community?
Couple processing to data and data to processing
Bring data and processing to users
Support services of value to significant portions of the Earth science community
– A mining service
– A subsetting service
– A data transformation service, converting from one storage format to another

Ames Research Center Division 33
What Needs to Happen for This to Become a Reality?
Data archives need to be grid-enabled
– Connected to the grid
– Providing controlled access to data on tertiary storage, e.g., by using a system such as the Storage Resource Broker developed at the San Diego Supercomputer Center
Some early-adopter scientists need to be found to begin using the grid
Grid-enabled tools need to be made available
Sites could pool computational and data resources and form an Earth Science Grid

Ames Research Center Division 34
SRB Is an Existing Tool for a Grid-Enabled Archive
The San Diego Supercomputer Center's Storage Resource Broker (SRB)
– Permits grid access to data on tertiary storage
– Supports GSI (Grid Security Infrastructure)
– Provides Unix-like commands for manipulating and accessing data
  Grid Miner uses: Sget -A "RESOURCE='srbresource'" pathwithfile destdir
– Datasets have logical names that are independent of location
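As a hedged illustration of the SRB command-line interface (the collection path, resource name, and file names below are hypothetical), a session might look like this: Sinit starts an SRB session, Sls lists a collection by its logical name, Sget retrieves a file regardless of where it is physically stored, and Sexit ends the session.

  Sinit                                   # start an SRB session using the user's MDAS environment
  Sls /home/eosdis.collection/ssmi/1999   # list a collection by its logical name
  Sget -A "RESOURCE='hpss-sdsc'" /home/eosdis.collection/ssmi/1999/f13_19991012.hdf /scratch/miner
  Sexit                                   # end the session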

Ames Research Center Division 35
More on SRB
SRB will support replication of a logical dataset located at different physical locations
Uses the Metadata Catalog (MCAT) to hold data about the data stored in the SRB
Supports the following storage systems:
– Unix file systems
– Archival storage systems such as UniTree and HPSS
– Large objects managed by various DBMSs, including DB2 and Oracle

Ames Research Center Division 36
Grid Funding
NASA is putting approximately $7 million per year into grids
DOE's Office of Science is putting at least $7M/yr into grid software development, deployment of the DOE Science Grid, and several major grid application integration projects (high-energy physics, earth sciences, fusion energy)
NSF is putting $10-20M/yr into grid software development and several major grid application integration projects, e.g.
– The National Earthquake Engineering Systems Grid (bringing all major US earthquake engineering instruments onto a grid)
– The National Virtual Observatory (a grid application to provide uniform access to all major astronomy datasets)
NSF is putting $50M/yr into its new grid-based supercomputer centers (the Distributed Terascale Facility)
The UK eScience Grid is building a UK-wide science grid ($50M/yr)
The European Union Data Grid (high-energy physics) is at $7M/yr, EU GridLab (numerical relativity) at $3M/yr, plus others

Ames Research Center Division 37
Global Grid Forum
Where did it come from?
What is it?
Why is it important to this community?

Ames Research Center Division 38
Global Grid Forum History
Grew out of a series of workshops and meetings
– Five Grid Forum workshops held between June 1999 and October 2000 in North America
  The first workshop was held at NASA Ames Research Center
– European Grid Forum (eGrid)
  Two European Grid (eGrid) workshops held, in April 2000 and August 2000
– SC'98 and SC'99 Birds of a Feather meetings
– A middleware workshop held at Northwestern University in December 1998, with participation by Grid and Internet experts
– "Grids '98: Designing, Building, and Using a National-Scale Grid", held in Chicago, July 27-28, 1998, brought together for the first time representatives of the various national grid efforts

Ames Research Center Division 39
Global Grid Forum Now
Represents the merger of the grid technical communities in North America, Europe, and Asia-Pacific
Meets three times per year, alternating between North America and Europe, and soon Asia-Pacific
Modeled after the IETF (Internet Engineering Task Force), which sets Internet standards
Now 450 people from 35 countries working on grid technology and standards
GGF5 meets in July 2002 in Edinburgh, Scotland, UK

Ames Research Center Division 40
Global Grid Forum
Supports a mechanism for the formal review, approval, and release of
– Best-practices guides
– Grid standards
Organized into two types of groups
– Research Groups, which coordinate research on future grid needs
– Working Groups, which are expected to produce best-practices documents and standards

Ames Research Center Division 41
GGF Working Groups
– Grid Object Specification (GOS)
– Grid Notification Framework (GNF)
– Metacomputing Directory Services (MDS)
– Grid Security Infrastructure (GSI)
– Grid Certificate Policy (GCP)
– Advanced Reservation
– Scheduling and Resource Management
– Scheduling Dictionary
– Scheduler Attributes
– Grid Monitoring Architecture
– Network Monitoring
– JINI
– NPI
– OGSI
– GridFTP

Ames Research Center Division 42
GGF Research Groups
– Relational Database Information Services (RDIS)
– Grid Protocol Architecture (GPA)
– Accounting Models (ACCT)
– Data Replication
– Persistent Archives
– Applications & Test Beds (APPS)
– Grid User Services (GUS)
– Grid Computing Environments (GCE)
– Advanced Programming Models (APM)
– Advanced Collaborative Environments

Ames Research Center Division 43
Applications & Test Beds Research Group
"The GGF Applications Research Group seeks to provide a bridge between the wider application community and the developers and directors of grid policies, standards and infrastructures." [APPS web site]
This would be one place where the Earth science community could inject Earth-science-unique requirements into the evolving grid development efforts.

Ames Research Center Division 44
Why Is the Global Grid Forum Important to the Earth Science Community?
It will result in grid standards
– It will encourage commercial products, since there will be standards that the products can meet
– Products that meet accepted standards should be more marketable
It provides a forum for getting Earth-science-specific requirements interjected into the grid development efforts