Data, Knowledge, Gateways Science in the Digital Age: The Science Gateways Institute and NSF Software Investments Nancy Wilkins-Diehr San Diego Supercomputer.

Slides:



Advertisements
Similar presentations
DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
Advertisements

Supporting Research on Campus - Using Cyberinfrastructure (CI) Public research use of ICT has rapidly increased in the past decade, requiring high performance.
NG-CHC Northern Gulf Coastal Hazards Collaboratory Simulation Experiment Integration Sandra Harper 1, Manil Maskey 1, Sara Graves 1, Sabin Basyal 1, Jian.
Joint CASC/CCI Workshop Report Strategic and Tactical Recommendations EDUCAUSE Campus Cyberinfrastructure Working Group Coalition for Academic Scientific.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Presentation at WebEx Meeting June 15,  Context  Challenge  Anticipated Outcomes  Framework  Timeline & Guidance  Comment and Questions.
1 Performance Assessment An NSF Perspective MJ Suiter Budget, Finance and Awards NSF.
SAN DIEGO SUPERCOMPUTER CENTER The Integration of 2 Science Gateways: CyberGIS + OpenTopography Choonhan Youn, Nancy Wilkins-Diehr, SDSC Christopher Crosby,
Integrating Educational Technology into the Curriculum
GENI: Global Environment for Networking Innovations Larry Landweber Senior Advisor NSF:CISE Joint Techs Madison, WI July 17, 2006.
EInfrastructures (Internet and Grids) US Resource Centers Perspective: implementation and execution challenges Alan Blatecky Executive Director SDSC.
Science Gateways Marlon Pierce Science Gateway Group Indiana University.
Tom Sheridan IT Director Gas Technology Institute (GTI)
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Open Library Environment Designing technology for the way libraries really work November 19, 2008 ~ ASERL, Atlanta Lynne O’Brien Director, Academic Technology.
The "Earth Cube” Towards a National Data Infrastructure for Earth System Science Presentation at WebEx Meeting July 11, 2011.
TC2-Computer Literacy Mr. Sencer February 4, 2010.
Social and behavioral scientists building cyberinfrastructure David W. Lightfoot Assistant Director, National Science Foundation Social, Behavior & Economic.
Milos Kobliha Alejandro Cimadevilla Luis de Alba Parallel Computing Seminar GROUP 12.
National Science Foundation: Transforming Undergraduate Education in Science, Technology, Engineering, and Mathematics (TUES)
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
April 2009 OSG Grid School - RDU 1 Open Science Grid John McGee – Renaissance Computing Institute University of North Carolina, Chapel.
Information and Communication Technologies in the field of general education in Armenia NATIONAL CENTER OF EDUCATIONAL TECHNOLOGIES.
Annual SERC Research Review - Student Presentation, October 5-6, Extending Model Based System Engineering to Utilize 3D Virtual Environments Peter.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
Computing in Atmospheric Sciences Workshop: 2003 Challenges of Cyberinfrastructure Alan Blatecky Executive Director San Diego Supercomputer Center.
TeraGrid Gateway User Concept – Supporting Users V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai Oak Ridge National Laboratory.
US NITRD LSN-MAGIC Coordinating Team – Organization and Goals Richard Carlson NGNS Program Manager, Research Division, Office of Advanced Scientific Computing.
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
TeraGrid Science Gateways Nancy Wilkins-Diehr Area Director for Science Gateways TeraGrid 09 Education Program, June 22, 2009.
Open Science Grid For CI-Days Internet2: Fall Member Meeting, 2007 John McGee – OSG Engagement Manager Renaissance Computing Institute.
Unidata Policy Committee Meeting Bernard M. Grant, Assistant Program Coordinator for the Atmospheric and Geospace Sciences Division May 2012 NSF.
CI Days: Planning Your Campus Cyberinfrastructure Strategy Russ Hobby, Internet2 Internet2 Member Meeting 9 October 2007.
ISpheres Project. Project Overview iSpheresCore iSpheresImage Demonstration References.
Updates from EOSDIS -- as they relate to LANCE Kevin Murphy LANCE UWG, 23rd September
SAN DIEGO SUPERCOMPUTER CENTER NUCRI Advisory Board Meeting November 9, 2006 Science Gateways on the TeraGrid Nancy Wilkins-Diehr TeraGrid Area Director.
Transformation of Research and Education in the 21 st Century Edward Seidel Director, Office of Cyberinfrastructure National Science Foundation
Through the development of advanced middleware, Grid computing has evolved to a mature technology in which scientists and researchers can leverage to gain.
Open Science Grid For CI-Days Elizabeth City State University Jan-2008 John McGee – OSG Engagement Manager Manager, Cyberinfrastructure.
Cyberinfrastructure Planning at NSF Deborah L. Crawford Acting Director, Office of Cyberinfrastructure HPC Acquisition Models September 9, 2005.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
Top Issues Facing Information Technology at UAB Sheila M. Sanders UAB Vice President Information Technology February 8, 2007.
NanoHUB.org and HUBzero™ Platform for Reproducible Computational Experiments Michael McLennan Director and Chief Architect, Hub Technology Group and George.
Russ Hobby Program Manager Internet2 Cyberinfrastructure Architect UC Davis.
Interoperability Grids, Clouds and Collaboratories Ruth Pordes Executive Director Open Science Grid, Fermilab.
National Center for Supercomputing Applications Barbara S. Minsker, Ph.D. Associate Professor National Center for Supercomputing Applications and Department.
Commodity Grid Kits Gregor von Laszewski (ANL), Keith Jackson (LBL) Many state-of-the-art scientific applications, such as climate modeling, astrophysics,
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
HPC Centres and Strategies for Advancing Computational Science in Academic Institutions Organisers: Dan Katz – University of Chicago Gabrielle Allen –
Cyberinfrastructure What is it? Russ Hobby Internet2 Joint Techs, 18 July 2007.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Ruth Pordes November 2004TeraGrid GIG Site Review1 TeraGrid and Open Science Grid Ruth Pordes, Fermilab representing the Open Science.
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
| nectar.org.au NECTAR TRAINING Module 2 Virtual Laboratories and eResearch Tools.
Development of e-Science Application Portal on GAP WeiLong Ueng Academia Sinica Grid Computing
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Cyberinfrastructure: Many Things to Many People Russ Hobby Program Manager Internet2.
Fire Emissions Network Sept. 4, 2002 A white paper for the development of a NSF Digital Government Program proposal Stefan Falke Washington University.
2005 GRIDS Community Workshop1 Learning From Cyberinfrastructure Initiatives Grid Research Integration Development & Support
Digital Data Collections ARL, CNI, CLIR, and DLF Forum October 28, 2005 Washington DC Chris Greer Program Director National Science Foundation.
The Claromentis Digital Workplace An Introduction
May 2010 GGIM, New York City The National System for Coordination of Territorial Information SNIT NSDI of Chile.
Science Gateways What are they and why are they having such a tremendous impact on science? Nancy Wilkins-Diehr
Building PetaScale Applications and Tools on the TeraGrid Workshop December 11-12, 2007 Scott Lathrop and Sergiu Sanielevici.
TeraGrid’s Process for Meeting User Needs. Jay Boisseau, Texas Advanced Computing Center Dennis Gannon, Indiana University Ralph Roskies, University of.
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
Data Sources & Using VIVO Data Visualizing Science VIVO provides network analysis and visualization tools to maximize the benefits afforded by the data.
Accessing the VI-SEEM infrastructure
Campus Cyberinfrastructure
MATLAB Distributed, and Other Toolboxes
Presentation transcript:

Data, Knowledge, Gateways Science in the Digital Age: The Science Gateways Institute and NSF Software Investments Nancy Wilkins-Diehr San Diego Supercomputer Center

Data, Knowledge, Insight Data sharing and analysis challenges common themes among speakers at last month’s XSEDE conference Kaitlin Thaney – Director, Mozilla Science Lab “Science is really ripe for disruption. A lot of the practices are really rooted in their analog beginnings.” The web has fundamentally transformed how we interrogate, how we interact with content, how we discover information, our work environments, our agility. Mozilla helping to use the power of the open web to transform science’s future Reuse of output (more powerful than citations) Ability to link to code and output from a journal article so results can be reproduced Data sharing – the low hanging fruit has been picked – need to share across disciplines to solve the really tough problems Related conundrum

Terrence Sejnowski – UCSD, Salk – Member of advisory committee for BRAIN “With 100 billion neurons connected by a quadrillion synapses, it’s like the world’s most powerful supercomputer on steroids. To top it all off, it runs on only 20 watts of power, about as much as the light in your refrigerator” If we could record data from every neuron in a circuit responsible for behavior, we could understand the algorithms the brain uses Need fundamentally new technologies and transformative tools to do this – In electron microscopy, optogenetics, image segmentation via machine learning, computational geometry, crowd sourcing President Obama’s $110M BRAIN initiative Brain Research through Application of Innovative Neurotechnologies

John Greally – Albert Einstein College of Medicine DNA sequencing ( ) – 2x10**3 base pairs per experiment DNA sequencing ( ) – 4x10**5 base pairs per experiment Today – 5x10**11 base pairs per experiment Increased use of sequencing – $3600 to sequence a human genome at the NY Genome Center – Geneticists, physicians, molecular biologists rely on others for insights due to lack of programming skills Disruptive Genomics Technologies Einstein Genome Gateway – Automated workflows, metadata capture, vis tools – Running for 4 years, 350 registered users on 4 continents Challenges now downstream data analysis – Data must sit where computation is Community, open source approach – System of linked components

Gateways: A natural result of the impact of the internet on worldwide communication and information retrieval Implications on the conduct of science are still evolving – 1980’s, Early gateways, National Center for Biotechnology Information BLAST server, search results sent by , still a working portal today – 1989 World Wide Web developed at CERN – 1992 Mosaic web browser developed – 1995 “International Protein Data Bank Enhanced by Computer Browser” – 2004 TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program – Today, Web 3.0 and programmatic exchange of data between web pages Simultaneous explosion of digital information – Growing analysis needs in many, many scientific areas – Sensors, telescopes, satellites, digital images, video, genome sequencers – #1 machine on Top500 today over 10,000x more powerful than all combined entries on the first list in 1993 Only 20 years since the release of Mosaic !

Interfaces to supercomputers haven’t changed much in the last 20 years vt100 in the 1980s and a login window on the Ranger system at TACC today My experiences Science Gateways and high performance computing

Why are gateways worth the effort? Increasing range of expertise needed to tackle the most challenging scientific problems – How many details do you want each individual scientist to need to know? PBS, RSL, Condor Coupling multi-scale codes Assembling data from multiple sources Collaboration frameworks #! /bin/sh #PBS -q dque #PBS -l nodes=1:ppn=2 #PBS -l walltime=00:02:00 #PBS -o pbs.out #PBS -e pbs.err #PBS -V cd /users/wilkinsn/tutorial/exercise_3../bin/mcell nmj_recon.main.mdl +( &(resourceManagerContact="tg- login1.sdsc.teragrid.org/jobmanager- pbs") (executable="/users/birnbaum/tutorial/bin/ mcell") (arguments=nmj_recon.main.mdl) (count=128) (hostCount=10) (maxtime=2) (directory="/users/birnbaum/tutorial/exerci se_3") (stdout="/users/birnbaum/tutorial/exercise _3/globus.out") (stderr="/users/birnbaum/tutorial/exercise _3/globus.err") ) ======= # Full path to executable executable=/users/wilkinsn/tutor ial/bin/mcell # Working directory, where Condor-G will write # its output and error files on the local machine. initialdir=/users/wilkinsn/tutorial/ exercise_3 # To set the working directory of the remote job, we # specify it in this globus RSL, which will be appended # to the RSL that Condor-G generates globusrsl=(directory='/users/wilk insn/tutorial/exercise_3') # Arguments to pass to executable. arguments=nmj_recon.main.md l # Condor-G can stage the executable transfer_executable=false # Specify the globus resource to execute the job globusscheduler=tg- login1.sdsc.teragrid.org/jobman ager-pbs # Condor has multiple universes, but Condor-G always uses globus universe=globus # Files to receive sdout and stderr. output=condor.out error=condor.err # Specify the number of copies of the job to submit to the condor queue. queue 1

Did you know? NSF supercomputers can be accessed through user- designed web interfaces as well as from the command line In % of XSEDE users came through gateways

Today, there are over 25 gateways using XSEDE

Cyberinfrastructure for Phylogenetic Research (CIPRES) Most popular science gateway in TeraGrid – ~25% of all XSEDE users In use on 6 continents Cited in major journals (Cell, Nature, PNAS) Used at major research institutions (Stanford, Harvard, Yale) Used by 57 researchers for curriculum delivery Used in 80% of EPSCoR states Recently used by a 15-year-old high school student who won the Massachusetts state science fair with no support from gateway staff 10

Arman Bilge, Lexington High School wins MA state science fair using CIPRES

Application of high-end cyberinfrastructure to GIS Influence on multiple domains Improved decision support Spatial joins, layers of multiple datasets at different resolutions Goal is core set of composable, interoperable, manageable, and reusable software elements Collaborative geospatial problem solving environment CyberGIS Software Integration for Sustained Geospatial Innovation

Samples from researchers all over the world – Some (Germany, Australia) have their own ultracentrifuges and use only the analysis capabilities, others send samples to UT to spin Spin the samples at high speeds, learn about macromolecule properties Monte Carlo simulations Observations are electronically digitized and stored for further mathematical analysis Analytical Ultracentrifugation: Emerging computational tool for the study of proteins 13 The Center for Analytical Ultracentrifugation of Macromolecular Assemblies, UT Health Sciences, Borries Demeler, PI

Ultrascan provides a comprehensive data analysis environment Management of analytical ultracentrifugation data for single users or entire facilities Support for storage, editing, sharing and analysis of data – HPC facilities used for 2-D spectrum analysis and genetic algorithm analysis XSEDE Technische University of Munich Juelich Supercomputing Center Portable graphical user interface MySQL database backend for data management Over 30 active institutions

Gateways in the marketplace Kids control telescopes and share images “In seconds my computer screen was transformed into a live telescopic view” – “Slooh's users include newbies and professional astronomers in 70 countries” Observatories in the Canary Islands and Chile, Australia coming soon 5000 images/month since 2003 Increases public support for investment in these facilities

Science is all about connections – Instruments, sensor networks, HPC facilities, campus laboratories, visualization facilities, data stores – Connections are often made through software A critical, but often overlooked component NSF vision for cyberinfrastructure in the 21st century Software is critical to today’s scientific advances

1.Multi-level, long term support (individual, team, institute) 2.Responsibility for verification, validation, reproducibility 3.Consistent policy on open source 4.Collaborations across divisions, agencies and industry 5.Use of ACCI to obtain community input on priorities NSF CI Advisory Committee commissions 6 task forces (2009) Software task force recommends to NSF:

Scientific Software Elements (SSE) – Small groups create software that advances one or more area Scientific Software Integration (SSI) – Larger interdisciplinary teams, software frameworks Scientific Software Innovation Institutes (S2I2) Software vision implemented in 2010 Software Infrastructure for Sustained Innovation (SI2) program

Institutes: Long term hubs of excellence Serve a research community of substantial size and disciplinary breadth Expertise, processes, architectures, resources and implementation mechanisms to transform research practices and productivity Support, outreach, workforce development, proactive approach to diversity Pathways to community involvement

Simultaneous NSF study identifies limitations to short-lived science portals or gateways Characteristics of short funding cycles – Build exciting prototypes with input from scientists – Work with early adopters to extend capabilities – Tools are publicized, more scientists interested – Funding ends – Scientists who invested their time to use new tools are disillusioned Less likely to try something new again – Start again on new short-term project Need to break this cycle and fund for long-term success Science Gateway Institute conceptualization award in 2012

After reading stacks of reports, we wanted to capture our findings in a memorable way science-gateways-to-future-success/reports/

science gateway /sī′ əns gāt′ wā′/ n. 1.an online community space for science and engineering research and education. 2.a Web-based resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline. We are building an institute to serve you—and others like you—with resources, services, experts, and ideas for creating and sustaining science gateways. Sign up to join the conversation: Are you building websites that serve your science discipline? Do you wish you could connect with and learn from others who are doing the same thing?

Millions of dollars are spent on gateways, but developers face several challenges: They often work in isolation even though development can be quite similar across domain areas. They bridge cyberinfrastructure—locally, campus-wide, nationally, and sometimes internationally. They need foundational building blocks so they can focus on higher-level, grand-challenge functionality. They struggle to secure sustainable funding because gateways span the worlds of research and infrastructure. Through Summer 2014, we will be engaging interested members of the science and engineering community in a planning process for a Science Gateway Institute (SGW-I). The goal of the institute would be to provide coordinating activities across the National Science Foundation, offering several services and resources to support the gateway development community: An incubator service offering consultation and documentation about business planning and software development. An extended support team to build gateways and share their expertise. A forum to connect members of the development community. A modular, layered framework that supports community contributions and allows developers to choose components. Workforce development to help train the next generation for careers in this cross-disciplinary area. Sharing expertise about technologies and strategies would allow developers to concentrate on the novel, challenging, and cutting-edge development needed by their specific user communities. First, we want to hear from you. If you would like to participate in this planning process or stay informed about our progress, please contact us.

Business plan development and review Development environment, consulting, documentation and software recommendations Software repositories Software engineering facilities Software assessment services – like Open Source Software Advisory Service, Apache assessment service, Software Sustainability Institute (UK) Build-and-test facilities Hosting service Offering gateways expertise in the following areas: – Usability assessment – Licensing – Sustainability – Project management – Security Incubator Service Assist with the entire lifecycle of a gateway:

Gateway-building Support Institute staff assigned to a project for months, up to a year – Assist with gateway development or implementation of advanced features Workflows, fault tolerance, sensor feeds, HPC simulations – Teach research teams what it takes to build, enhance, operate, and maintain gateways after support ends – Peer-reviewed request process open to all

Gateway Forum Gathering place for scientific web developers across NSF directorates, agencies, and international boundaries Social forums, white papers, blogs, testimonials and user stories Annual conference Broad and engaging symposium series Gateway training program – Synchronous and asynchronous, video tutorials – Best practices, case studies Showcase of successful projects Environment that enables continuous community feedback

Gateway Framework Modular, layered approach – Supports community contributions – Grocery store approach allows developers to pick and choose the components they need Tiered architecture 1.Value-added services Publication channel for delivering content to a wider audience Information repositories for good design practices Information/code samples for best practices in user-interface and user- experience design 2.Core web framework which includes hosted site creation and content management 3.Platform API to provide a cohesive set of RESTful web services upon which the previous two layers rely 4.Systems layer where the hardware and low-level middleware reside Clouds and cloud services, HPC systems, grid middleware, data warehouses, databases, instrumentation, and distributed data stores

Figure 1. High-level architecture of software offerings and value-added services provided by the institute.

Workforce Development Terrific opportunities for students and IT professionals – Much science gateway development currently done by campus IT Gateway building training – Web development is a natural interest area for students Very visual, see results of programming instantly – Builds cross-disciplinary communication skills Talk to scientists, construct a gateway that meets their needs – Utilize existing programming opportunities such as Google Summer of Code Opportunities to proactively address diversity

Next steps Final focus group session mid-October Larger survey due out in January Submit S2I2 implementation phase proposal when that call comes out Very interested in a collaborative organization that involves many gateway perspectives More info at Thank you!