Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President,

Slides:



Advertisements
Similar presentations
21 st Century Science and Education for Global Economic Competition William Y.B. Chang Director, NSF Beijing Office NATIONAL SCIENCE FOUNDATION.
Advertisements

Research Councils ICT Conference Welcome Malcolm Atkinson Director 17 th May 2004.
Supporting Research on Campus - Using Cyberinfrastructure (CI) Public research use of ICT has rapidly increased in the past decade, requiring high performance.
U.S. Department of Energy’s Office of Science Basic Energy Sciences Advisory Committee Dr. Daniel A. Hitchcock October 21, 2003
Presentation at WebEx Meeting June 15,  Context  Challenge  Anticipated Outcomes  Framework  Timeline & Guidance  Comment and Questions.
High-Performance Computing
Research CU Boulder Cyberinfrastructure & Data management Thomas Hauser Director Research Computing CU-Boulder
Skolkovo Institute of Science and Technology A new Opportunity for Collaboration in Research Education and Innovation Mats Nordlund Vice President of Research.
The Changing Research Data Paradigm One agency’s response Changes to Implementation of NSF’s Data Sharing Policy NOAA’s second annual Environmental Data.
GENI: Global Environment for Networking Innovations Larry Landweber Senior Advisor NSF:CISE Joint Techs Madison, WI July 17, 2006.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation,
Science of Science and Innovation Policy (SciSIP) Presentation to: SBE Advisory Committee By: Dr. Kaye Husbands Fealing National Science Foundation November.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CF21) IRNC Kick-Off Workshop July 13,
The NSF Cyberinfrastructure for the 21 st Century Program CIF21 Rob Pennington Program Director Office of Cyberinfrastructure National Science Foundation.
The "Earth Cube” Towards a National Data Infrastructure for Earth System Science Presentation at WebEx Meeting July 11, 2011.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation, Integration.
Cyberinfrastructure: Framing the Issues on Your Campus What is it? Why do we care? What do we do about it now? 11 Peter M. Siegel CIO and Vice Provost,
April 2009 OSG Grid School - RDU 1 Open Science Grid John McGee – Renaissance Computing Institute University of North Carolina, Chapel.
Data- and Compute-Driven Transformation of Modern Science Edward Seidel Assistant Director, Mathematical and Physical Sciences, NSF (Director, Office of.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
V. Chandrasekar (CSU), Mike Daniels (NCAR), Sara Graves (UAH), Branko Kerkez (Michigan), Frank Vernon (USCD) Integrating Real-time Data into the EarthCube.
Computing in Atmospheric Sciences Workshop: 2003 Challenges of Cyberinfrastructure Alan Blatecky Executive Director San Diego Supercomputer Center.
National Center for Supercomputing Applications University of Illinois at Urbana–Champaign 21 st Century Research and Education Major Challenges for Universities.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation,
Open Science Grid For CI-Days Internet2: Fall Member Meeting, 2007 John McGee – OSG Engagement Manager Renaissance Computing Institute.
Presenter: Karla Strieb Assistant Executive Director Transforming Research Libraries June 3, 2010 Supporting E-science: Progress at Research Institutions.
Data Infrastructures Opportunities for the European Scientific Information Space Carlos Morais Pires European Commission Paris, 5 March 2012 "The views.
Designing the Microbial Research Commons: An International Symposium Overview National Academy of Sciences Washington, DC October 8-9, 2009 Cathy H. Wu.
CI Days: Planning Your Campus Cyberinfrastructure Strategy Russ Hobby, Internet2 Internet2 Member Meeting 9 October 2007.
Partnerships and Broadening Participation Dr. Nathaniel G. Pitts Director, Office of Integrative Activities May 18, 2004 Center.
The Materials Genome Initiative and Materials Innovation Infrastructure Meredith Drosback White House Office of Science and Technology Policy September.
National Center for Supercomputing Applications Observational Astronomy NCSA projects radio astronomy: CARMA & SKA optical astronomy: DES & LSST access:
Students Becoming Scientists in the World: Integrating Research and Education for Sustainable Development Dr. James P. Collins Directorate for the Biological.
13 September 2012 The Libraries’ Role in Research Data Management: A Case Study from the University of Minnesota Meghan Lafferty, Chemistry, Chemical Engineering,
Session Chair: Peter Doorn Director, Data Archiving and Networked Services (DANS), The Netherlands.
Federal R&D Funding: A SIAM Perspective Mel Ciment Senior Advisor SIAM, Washington Office Tel: May 6, 2001.
Physics Steven Gottlieb, NCSA/Indiana University Lattice QCD: focus on one area I understand well. A central aim of calculations using lattice QCD is to.
Transformation of Research and Education in the 21 st Century Edward Seidel Director, Office of Cyberinfrastructure National Science Foundation
Changing Science and Engineering: the impact of HPC Sept 23, 2009 Edward Seidel Assistant Director, Mathematical and Physical Sciences, NSF (Director,
Open Science Grid For CI-Days Elizabeth City State University Jan-2008 John McGee – OSG Engagement Manager Manager, Cyberinfrastructure.
Cyberinfrastructure Planning at NSF Deborah L. Crawford Acting Director, Office of Cyberinfrastructure HPC Acquisition Models September 9, 2005.
Research Recommendations for the Broadband Taskforce Agenda November 23, 2009.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
David Mogk Dept. of Earth Sciences Montana State University April 8, 2015 Webinar SAGE/GAGE FACILITIES SUPPORTING BROADER EDUCATIONAL IMPACTS: SOME CONTEXTS.
Hassan A. Karimi Geoinformatics Laboratory School of Information Sciences University of Pittsburgh 3/27/20121.
HPC Centres and Strategies for Advancing Computational Science in Academic Institutions Organisers: Dan Katz – University of Chicago Gabrielle Allen –
Cyberinfrastructure What is it? Russ Hobby Internet2 Joint Techs, 18 July 2007.
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
GEOSCIENCE NEEDS & CHALLENGES Dogan Seber San Diego Supercomputer Center University of California, San Diego, USA.
Computational Science & Engineering meeting national needs Steven F. Ashby SIAG-CSE Chair March 24, 2003.
Three Critical Matters in Big Data Projects for e- Science Kerk F. Kee, Ph.D. Assistant Professor, Chapman University Orange, California
National Strategic Computing Initiative
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Cyberinfrastructure: Many Things to Many People Russ Hobby Program Manager Internet2.
Digital Data Collections ARL, CNI, CLIR, and DLF Forum October 28, 2005 Washington DC Chris Greer Program Director National Science Foundation.
Understanding Collaboration Using Social Network Analysis Diana Rhoten Office of Cyberinfrastructure National Science Foundation.
1 Why is Digital Curation Important for Workforce and Economic Development? Alan Blatecky Office of Cyberinfrastructure Symposium on Digital Curation in.
Faculty Councils Brad Whittaker Director, Research Services and Industry Liaison Strategic Research Plan.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI strategy and Grand Vision Ludek Matyska EGI Council Chair EGI InSPIRE.
The Global Scene Wouter Los University of Amsterdam The Netherlands.
Connecting Users, Data & Data Repositories Simon J. Goring ORCID: John W. Williams doi: /m9.figshare Distinguished Lecture.
1 Kostas Glinos European Commission - DG INFSO Head of Unit, Géant and e-Infrastructures "The views expressed in this presentation are those of the author.
European Perspective on Distributed Computing Luis C. Busquets Pérez European Commission - DG CONNECT eInfrastructures 17 September 2013.
1 Cyberinfrastructure for the 21 st Century (CIF21) NSF Data Strategy and EarthCube 9 th e-Infrastructure Concertation Meeting Sept 23, 2011 Rob Pennington.
Engineering (Richard D. Braatz and Umberto Ravaioli)
Electron Ion Collider New aspects of EIC experiment instrumentation and computing, as well as their possible impact on and context in society (B) COMPUTING.
Brian Matthews STFC EOSCpilot Brian Matthews STFC
Bird of Feather Session
BCoN Data Integration Workshop, University of Kansas, Feb 13-14, 2018
Presentation transcript:

Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology 1

2 Part 1: We are in a period of unprecedented change in Science and Society…the crises & opportunities this creates

3 3 Profound Transformation of Science Collision of Two Black Holes 1972: Hawking. 1 person, no computer 50 KB 1994: 10 people, NCSA Cray Y-MP, 50MB 1998: 15 people, NCSA Origin, 50GB

Community Einstein Toolkit “Einstein Toolkit : open software for astrophysics to enable new science, facilitate interdisciplinary research and use emerging petascale computers and advanced CI.”  Consortium: 92 members, 46 sites, 15 countries  Whole consortium engaged in directions, support, development  Simulation: Luciano Rezzolla, Max Planck Institut für Gravitationsphysik (AEI) 4 Community + software + algorithms + hardware + … Many groups can do this: field explodes! Major triumph of Computational Science---solve EEs!

New Frontiers: Relativistic Matter  Nuclear equations of state  Collab. with astrophysicists  General relativistic magnetohydrodynamics  Some groups have ideal MHD  Radiation Transport (neutrinos/photons)  Expensive and complicated!  Requires opacities/emissivities  Chemical reactions (thermonuclear, chemical)  SN community!  Computation:  Multiphysics!!  GRMHD: petascale problem  Radiation transport beyond this Schnetter et al, PetaScale Computing: Algorithms and Applications, 2007 Zoom in to just this part: post BH formation and evolution of jet Zoom in to just this part: post BH formation and evolution of jet

Schnetter, et al Post BH Formation/Evolution of Jet  Multiphysics  GR, Neutrinos, MHD, Nuclear EOS  Computer Science  10 level AMR, optimized for Blue Waters  Science  200s evolution, analyze it all  Blue Waters  6M sustained = 70days! 6 Multiphyiscs framework needed for fluids in astrophysics and porous media… Rezzolla, et al

7 Theory and Observation of Universe  Gravitational Waves!  Complex problems in relativistic astrophysics  Relativity, hydrodynamics, nuclear physics, radiation, neutrinos, magnetic fields: globally distributed collab!  Observe (PB), compute (PF) signals  Gravity and general relativity are transformed  4 centuries of small science, small data culture  2-3 decades of radical change in both data (factors of 1000 per~5 years) and collaboration LIGO/VIRGO/GEO New era of science after a century! Data- and compute- dominated gravitational wave astronomy!

US Council on Competitiveness  Ping Golf  Moved from workstation to Cray, now make prototype only at last stage of design  Too effective: had to simulate “less effective” design!  Proctor & Gamble  Pringles “flying” off manufacturing line causing significant lost product and revenue. Using CFD codes that Boeing uses, airflow over the Pringle modeled, design so Pringles did not “lift off” 8

9 Part 1a: The Growth of Data Data Tsunami “I’m still here…” But I’m your new baby big brother… With millions of processors…

Going Beyond a Community Transient & Multi-Messenger Astronomy 10  New era: seeing events as they occur  Here now  ALMA, EVLA in radio  Ice Cube neutrinos  On horizon  24-42m optical?  LSST = SDSS (40TB) every night!  SKA = exabytes  Simulations integrate all physics  Data-intensive = compute-intensive Astronomy was passive. No longer! Communities need to share data, software, knowledge, in real time Will require integration across disciplines, end-to- end

Big Data vs The Long Tail of Science  Many “Big Data” projects are “special”  Tend to be highly organized, have singular sources of data, professionally curated, a lot attention paid to them  What about the “Long Tail” (the other 99%)?  Thousands of biologists sequencing communities of organisms  Thousands of chemist and materials scientists developing a “materials genome”  Millions of people “Tweeting”…  Characteristics: Heterogeneous, perhaps hand generated Not curated, reused, served, etc… 11 How do we harness the power of this long tail? News Flash! NYT 6/3/13: Drug side effects discovered by mining web logs: paroxetine + pravastatin = high blood sugar!

12 Grand Challenge Communities Combine it All... Where is it going to go? 12 Same CI useful for black holes, hurricanes

13 Grand Challenge Communities for Complex Problems  Require many disciplines, all scales of collaborations  Individuals, groups, teams, communities  Multiscale Collaborations: Beyond teams  Are dynamic and highly multidisciplinary  Time domain astronomy, emergency forecasting, metagenomincs, materials genome…  Drive sharing technologies and methodologies  Researchers collaborate, work by sharing data. Places requirements on eInfrastructre:  Software, networks, collaborative environments, data, sharing, computing, etc  Scientific culture, reproducibility, access, university structures  “Publications.” What is a modern publication? 13 Social, behavioral and economic sciences will be critical in helping us understand these issues…

Scenarios like this in all fields 14 NEON+GIS

Framing the Challenge: Science and Society Transformed by Data  Modern science  Data- and compute- intensive  Integrative, multiscale  4 centuries of constancy, 4 decades change!  Multi-disciplinary Collaborations  Individuals (Galileo!)  Groups, teams, Grand Challenge Communities  Big Data + Long Tail  Sea of Data  Age of Observation 15 We still think like this… …But such radical change cannot be adequately addressed with (current) incremental approach! Students take note!

Part 2: Crises, Challenges, Opportunities Computing Data Software End-to-end Networks Organizational structures Education No, we are not… Cybe r Instruments & Facilities

17 Five Crises “ CDSE” Community needs to address  Computing Technology  Multicore: processor is new transistor  Programming model, fault tolerance, etc  New models: clouds, grids, GPUs, …  Data, provenance, and visualization  How do we create “data scientists”?  What is an international data infrastructure?  Software treated as e-Infrastructure  Complex applications on coupled compute- data-networked environments, tools needed  Modern apps: lines, many groups contribute, take decades

18 Five Crises  Organization for Multidisciplinary & Computational Science  “Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science”  “Itself a discipline, computational science advances all science…inadequate/outmoded structures within Federal government and the academy do not effectively support this critical multidisciplinary field”  Education  The CI environment is running away from us!  How do we develop a workforce to work effectively in this world?  How do universities transition?

Scientific Computing and Imaging Institute, University of Utah Data Crisis: Information Big Bang PCAST Digital Data NSF Experts Study Wired, Nature Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report “there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future.” 80% respondents: >50 yrs; 68% > 100 yrs Industry

The Shift Towards a “Sea of Data” Implications  Science & society are now data-dominated  Experiment, computation, theory  US mobile phone traffic exceeded 1 exabyte!  Classes of data  Collections, observations, experiments, simulations  Software  Publications  Totally new methodologies  Algorithms, mathematics, culture  Data become the medium for  Multidisciplinarity, communication, publication, science, economic development… 20 How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world? What is a business model for OA? Fundamental questions become focused around data: What to curate, how to remove boundaries? How to incentivize sharing? IP?

21 Part 2a: Recommendations

Software ACCI Task Force Reports  Final recommendations presented to the NSF Advisory Committee on Cyberinfrastructure Dec 2010  More than 25 workshops and Birds of a Feather sessions, 1300 people involved  Final reports on-line “Permanent programmatic activities in Computational and Data-Enabled Science & Engineering (CDS&E) should be established within NSF.” Grand Challenges Task Force “NSF should establish processes to collect community requirements and plan long-term software roadmaps.” Software Task Force “ Higher education should adopt criteria for tenure and promotion that reward…the production of digital artifacts of scholarship. Such artifacts include widely used data sets, scholarly services delivered online, and software. ” Campus Bridging Task Force 22 Campus Bridging Data & Viz Grand Challenge HPC Learning

Recommendation of NSF Advisory Committee on Cyberinfrastructure ACCI "The National Science Foundation should create a program in Computational and Data-Enabled Science and Engineering (CDS&E), based in and coordinated by the NSF Office of Cyberinfrastructure. The new program should be collaborative with relevant disciplinary programs in other NSF directorates and offices." 23

24 Part 3: Universities attempt to respond We have to do all this and revolutionize the state/national economy?

S koltech: Example of a 21 st Century University in the Making

Skoltech at a Glance A unique Russian institution in international context – This decade: a community of 200 faculty, 300 post- docs, 1200 graduate students Focused on science, engineering and technology – Addressing problems and issues in IT, Energy, Biomedicine, Space and Nuclear Interdisciplinary by design; no departments – 15 centers organized around complex problems With strong programs in support of innovation and entrepreneurship – Creating a culture of innovation in every student, professor, staff member Important part of the Skolkovo innovation ecosystem Integrated data, compute, instrumentation infrastructure and policy under development for Interdisciplinary research Accelerating discovery Economic development Integrated data, compute, instrumentation infrastructure and policy under development for Interdisciplinary research Accelerating discovery Economic development

27 Part 3a: You can help lead this revolution Kathryn Gray

Modern Research & Education Ecosystem 28 SoftwareSoftware SoftwareSoftware Track 2 CampusCampusCampusCampusCampusCampus CampusCampus CampusCampusCampusCampusCampusCampusCampusCampus DataData DataData DataData DataData XSEDE Education Crisis: I need all of this to start to solve my problem! Blue Waters

The Opportunity (US picture)!  Now have emerging national Integrated, High Performance Research Architecture  Blue Waters and beyond towards exascale: high end Extraordinary science continues lead at cutting edge Traditional and novel large data applications Few places can house, field, or drive such a facility  XSEDE architecture can connect… Campus Bridging: campus to national CI… –Campus Assets: MRI, Instruments, DNA sequencers… –Facilities: Supercomputers, telescopes, accelerators, light sources, NEON … »”More silicon than Steel” –Networks: end-to-end connectivity »Where are those optical network apps? 29

Much to do to build CDSE on this Background: address the “5 Crises”  Education  Many new opportunities and challenges CSE already has its struggles Now data: what is a “data scientist”? CDSE emerges  Data opportunities for education and citizen science  Faculty development, curriculum development  Needed on every campus  Talk to NSF, DOE, EC, your national agencies  Recommendations of ACCI, MPSAC, etc  New programs needed: See NSF CDS&E, CI TraCS, CAREER, “LWD”, etc  You can help make this happen 30

Key Messages  Astounding rate of change of the “Triple Helix” of Research, Education, and Innovation  Computing and Data radically change methods  Culture of collaboration around complex problems  These create many crises and opportunities  From technology to methodology to culture…  Deep integration required for science  Emergence of Computational and Data- enabled Science and Engineering as a discipline and your role!  A key part of the paradigm shift 31

32 & Data