Science Gateways and their tremendous potential for science Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways San Diego Supercomputer Center.

Presentation transcript:

1 Science Gateways and their tremendous potential for science Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways San Diego Supercomputer Center wilkinsn@sdsc.edu

2 Overview What are Science Gateways? What is TeraGrid? Why TeraGrid and Gateways? Examples of Success How Does This Help Me?

3 Phenomenal Impact of the Internet on Scientific Research Only 15 years since the release of Mosaic! Very rapid changes in how science is conducted –1988, National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today –1992 Mosaic web browser developed –1995 “International Protein Data Bank Enhanced by Computer Browser” –2004 TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program Ensuing explosion of digital information –Need for analysis in a growing number of scientific areas

4 Very Rapid Changes in Web Usability First generation –Static Web pages Second generation –Dynamic, database interfaces, CGI –Lacked the ease of use of desktop applications Third generation –True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web. –These new applications will enable remarkable new uses of the Web in the organizational workplace and on the Internet Fourth generation –Web 2.0 –Source: Screen Porch White Paper, The University of Western Ontario (1998)

5 Gateways are a Natural Extension of Internet Developments 3 common types of gateway –Web portal with users in front and services in back –Client-server model where application programs run on users' machines (i.e., workstations and desktops) and access services –Bridges across multiple grids, allowing communities to utilize both community-developed grids and shared grids Continued rapid changes ahead, must be adaptable, gateways can provide some nimbleness

6 Arden Bement Senate Testimony, April 19, 2007 “Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.” “In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.” Gateways are a terrific example of interfaces that can support transformative science

7 Gateway Idea Resonates with Scientists Capabilities provided by the Web are easy to envision because we use them in everyday life Researchers can imagine scientific capabilities provided through a familiar interface Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities –But also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities –Scientists know they can undertake more complex analyses and that’s all they want to focus on But this seamless access doesn’t come for free. It all hinges on very capable developers

8 Tremendous Opportunities Using the Largest Shared Resources - Challenges too! What’s different when the resource doesn’t belong just to me? –Resource discovery –Accounting –Security –Proposal-based requests for resources (peer-reviewed access) Code scaling and performance numbers Detailed justification of resource request Citations, metrics of success Tremendous benefits at the high end, but even more work for the developers Potential impact on science is huge –Small number of developers can impact thousands of scientists –But need a way to train and fund those developers and provide them with appropriate tools

9 What is the TeraGrid? NSF-funded facility to offer high end compute, data and visualization resources to the nation’s academic researchers 300+ Teraflops Computation 20+ Petabytes Storage Dedicated cross-country network Visualization

10 TeraGrid Resources Available to Academic Researchers at No Cost TeraGrid creates integrated, persistent, and pioneering computational resources that significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems Proposal-based access, researchers can use resources at no cost –Targeted support available as well

11 Implementing Common Gateway Requirements Web Services –GT4 deployment, identification of remaining capabilities –Information services, WebMDS Auditing –Need to retrieve job usage info on production resources –GRAM audit deployed in test mode in September, inclusion in CTSSv4 Community Accounts –Policy finalized, security approaches being tested by RPs –Attribute-based authentication testing Allocations –Changes in allocation procedures, the mechanisms used to evaluate science impact, and models for identity management, authentication and authorization that are more tuned to virtual organizations. Scheduling –Metascheduling RAT –On-demand via SPRUCE framework Outreach –Talks, Schools/workshops (NVO, GISolve), major project demonstrations (LEAD) –SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science Workflows and On Demand Computing for Geosciences Workshop Primer –Living document in wiki, provides up-to-date overview and instructions for new gateway developers (“how to make your portal a TeraGrid science gateway”)
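
The auditing and community-account items above can be illustrated with a small sketch. It assumes a gateway submits every job under one shared community account and keeps its own record of which portal user asked for each job, so usage can later be attributed per person. The record fields and the function name are hypothetical and are not taken from GRAM audit or any TeraGrid service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One finished job as a gateway might log it (hypothetical fields)."""
    job_id: str
    community_account: str  # shared login the job ran under on the resource
    portal_user: str        # gateway-side identity of the person who submitted it
    cpu_hours: float


def usage_by_portal_user(records):
    """Sum CPU hours per portal user, regardless of the shared account used."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.portal_user] += rec.cpu_hours
    return dict(totals)


if __name__ == "__main__":
    # Made-up records for illustration only.
    jobs = [
        JobRecord("j1", "gateway_acct", "alice", 1.5),
        JobRecord("j2", "gateway_acct", "bob", 4.0),
        JobRecord("j3", "gateway_acct", "alice", 2.5),
    ]
    print(usage_by_portal_user(jobs))
```

The point of the sketch is only that the resource sees a single account while the gateway retains the per-user mapping; a production gateway would also need the security and policy pieces listed on the slide.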

12 Gateways are growing in numbers Success in a variety of domains 10 initial projects as part of TG proposal >20 Gateway projects today No limit on how many gateways can use TG resources –Prepare services and documentation so developers can work independently Open Science Grid (OSG) Special PRiority and Urgent Computing Environment (SPRUCE) National Virtual Observatory (NVO) Linked Environments for Atmospheric Discovery (LEAD) Computational Chemistry Grid (GridChem) Computational Science and Engineering Online (CSE-Online) GEON(GEOsciences Network) Network for Earthquake Engineering Simulation (NEES) SCEC Earthworks Project Network for Computational Nanotechnology and nanoHUB GIScience Gateway (GISolve) Biology and Biomedicine Science Gateway Open Life Sciences Gateway The Telescience Project Grid Analysis Environment (GAE) Neutron Science Instrument Gateway TeraGrid Visualization Gateway BIRN Gridblast Bioinformatics Gateway Earth Systems Grid Astrophysical Data Repository (Cornell) Many others interested –SID Grid –HASTAC

13 Mapping Tool Used on Large Data Sets to Spot Brain Disorders Large Deformation Diffeomorphic Metric Mapping (LDDMM), developed at the Center for Imaging Science at Johns Hopkins Computes a mathematical description of which shapes are similar and different by computing metric distances in the space of anatomical images Source: SDSC Headlines, Paul Tooby "Using TeraGrid resources at multiple sites, this research has been able to successfully distinguish diagnostic categories such as Alzheimer's and Semantic Dementia from control subjects," said Anthony Kolasny, JHU. "This can potentially lead to a powerful new cyberinfrastructure tool clinicians can use to make earlier, more accurate diagnoses."

14 BIRN uses SSHFS to mount TeraGrid filesystems locally 220TB through CIS portal using autofs, samba, smbwebclient. CIS has 87TB of local storage. /cis/net lists network drives. Source: Anthony Kolasny, Johns Hopkins University

15 What is SSHFS and how can it help? SSHFS allows you to mount data through an ssh connection. –http://fuse.sourceforge.net/sshfs.html –http://wikipedia.org/wiki/SSH_Filesystem Simple command line –sshfs remoteuser@remotehost:/path/to/remote_dir local_dir Performance is as fast as your ssh connection. Performance tuning possible. Allows you to use local applications on remote data. –using ParaView to look at data processed on the TeraGrid and stored on the GPFS-WAN. Directly accessing the remote file. Your changes are seen by everyone. Source: Anthony Kolasny, Johns Hopkins University
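
The command line on this slide can be wrapped in a short script. The following is a minimal sketch, assuming sshfs and FUSE are installed on a Linux client; the helper names are illustrative, and the user, host and paths are the placeholders from the slide, not real endpoints. The `-o reconnect` option is a standard sshfs option that is not mentioned on the slide.

```python
import shlex
import subprocess
from pathlib import Path


def build_sshfs_command(remote_user, remote_host, remote_dir, local_dir, reconnect=True):
    """Return the argv list for mounting remote_dir at local_dir via sshfs."""
    cmd = ["sshfs", f"{remote_user}@{remote_host}:{remote_dir}", str(local_dir)]
    if reconnect:
        # "-o reconnect" asks sshfs to re-establish a dropped ssh connection.
        cmd += ["-o", "reconnect"]
    return cmd


def mount(remote_user, remote_host, remote_dir, local_dir):
    """Create the mount point if needed, then run sshfs."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_sshfs_command(remote_user, remote_host, remote_dir, local_dir),
        check=True,
    )


def unmount(local_dir):
    """Unmount a FUSE mount on Linux (other systems may use `umount`)."""
    subprocess.run(["fusermount", "-u", str(local_dir)], check=True)


if __name__ == "__main__":
    # Only prints the command; nothing is mounted by running this file.
    argv = build_sshfs_command("remoteuser", "remotehost", "/path/to/remote_dir", "local_dir")
    print(shlex.join(argv))
```

Once mounted, local applications open files under `local_dir` as if they were local, which is the use the slide describes.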

16 TeraGrid Life Science Gateway Application services for bio-informaticians Ability for end-users to apply the large scale resources of the TeraGrid to their problems, while leveraging local resources. Featured apps –InterProScan, version 4.2 –InterProScan Data version 12.0 –hmmer, version 2.3.2 –Blastall (from InterProScan) version 2.2.6 Plans to engage Bioinformatics Research Centers (BRC) –Eight BRCs sponsored by the National Institute of Allergy and Infectious Diseases (NIAID) –Funded to display sequencing and annotation data, comparative analysis, genome polymorphisms, gene expression, proteomics, host/pathogen interactions and pathways for the NIAID list of Category A-C priority pathogens and other pathogens causing emerging and re-emerging diseases.

17 TeraGrid Bioportal Access to over 140 computational tools and many biological data sets Collaborative workspace, simplified access to diverse set of tools Database searching, alignment and phylogeny, pattern searching, DNA/RNA analysis, and protein analysis EMBOSS (European Molecular Biology Open Software Suite), GLIMMER (Gene Locator and Interpolated Markov Modeler), HMMER (Hidden Markov Modeler), the NCBI (National Center for Biotechnology Information) toolkit and PHYLIP (PHYLogeny Inference Package). Standard databases include NCBI Aggregate, PDB, Prints, RepBase, UniProt, PFam, ProSite, and TransFac

18 GEON Developing cyberinfrastructure in support of an environment for integrative geoscience research IT advances can significantly impact how geoscientists conduct their daily research activities –Web/grid services, TeraGrid –Semantic data integration –Information management and ontologies Tremendous opportunities to conduct novel and efficient research in many areas of the geosciences SYNSEIS – SYNthetic SEISmogram generation tool –Helps seismologists calculate synthetic 3D regional seismic waveforms –Accesses distributed data centers and large computational clusters –Users only need to have access to the Internet and a browser. The entire system is web-based and is accessible from the GEONgrid portal web page.

19 GEON: LiDAR (Light Detection And Ranging) data Capable of generating digital elevation models (DEMs) more than an order of magnitude more accurate than those currently available Opportunity for geologists to study the processes that shape the earth’s surface at resolutions not previously possible. Distribution, interpolation and analysis of large LiDAR datasets, which frequently exceed a billion data points, present significant computational challenges. GEON tools begin with a user-defined subset of data and end with download and visualization of interpolated surfaces and derived products.

20 Linked Environments for Atmospheric Discovery (LEAD) Providing tools that are needed to make accurate predictions of tornados and hurricanes Meteorological data Forecast models Analysis and visualization tools Data exploration and Grid workflow

21 LEAD Inspires Students “Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!” Eric (email, March 2007)

22 nanoHUB Explosive User Growth nanoHUB attracts thousands of users Over 2M hits in last month In past 12 months –Over 21,000 users –Almost 175,000 simulation runs Very full-featured –Simulation tools –Research proceedings –Curricula content –Collaboration spaces nanoHUB is used to complete coursework by undergraduate and graduate students in dozens of courses at 10 universities.

23 GridChem - a desktop application gateway Computational Chemistry Grid (CCG) science gateway GridChem has been using TeraGrid in production since April 2006 Currently services over 100 users and has delivered hundreds of thousands of CPU hours Many paper publications resulting from GridChem use

24 CReSIS (Center for Remote Sensing of Ice Sheets) Awarded CI-TEAM funding to build a Polar Gateway –International Polar Year 2007-2008 CReSISGrid –Build a TeraGrid Science Gateway –Provide broad-based educational and training activity in Cyberinfrastructure for remote sensing and ice sheet dynamics –MSI impact through leadership of Linda Hayden, Elizabeth City State University

25 Tremendous Potential for Gateways In only 15 years, the Web has fundamentally changed human communication Science Gateways can leverage this amazingly powerful tool to: –transform the way scientists collaborate to tackle the toughest problems independent of location –impact the amount of science that can result from each project –influence the public’s perception of science High end resources can have a profound impact The future is very exciting! –Web 2.0 –Application Hosting –Gateway-in-a-box

26 Would development of a gateway help your research? Researchers using defined sets of tools in different ways –Same executables, different input –Datasets –Workflow creation Common data formats Large shared datasets gateways@teragrid.org mailing list –Email majordomo@teragrid.org – in body Biweekly telecons to get advice from others www.teragrid.org –Details about current gateways –Materials from June full day tutorial at TG07
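
The "same executables, different input" pattern is the case a gateway automates most directly. As a rough sketch, assuming a hypothetical command-line tool that takes `--name value` arguments, a gateway back end might expand one user request into a list of independent jobs like this. The executable name and parameter names are invented for illustration.

```python
from itertools import product


def expand_sweep(executable, parameters):
    """Build one argv list per combination of parameter values.

    `parameters` maps a parameter name to the list of values to try.
    Names are sorted so the job order is reproducible.
    """
    names = sorted(parameters)
    jobs = []
    for values in product(*(parameters[name] for name in names)):
        argv = [executable]
        for name, value in zip(names, values):
            argv += [f"--{name}", str(value)]
        jobs.append(argv)
    return jobs


if __name__ == "__main__":
    # Hypothetical tool and parameters; each printed line is one job to submit.
    for argv in expand_sweep("simulate", {"temp": [300, 400], "steps": [10]}):
        print(" ".join(argv))
```

Each resulting command is independent of the others, which is what makes this kind of workload a natural fit for submission to shared resources through a gateway.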

27 Thank you for your attention Any questions? Nancy Wilkins-Diehr wilkinsn@sdsc.edu

