Data- and Compute-Driven Transformation of Modern Science
Edward Seidel, Assistant Director, Mathematical and Physical Sciences, NSF (Director, Office of Cyberinfrastructure)
Profound Transformation of Science: Gravitational Physics
- Galileo and Newton usher in the birth of modern science, c. 1600
- Problem: a single "particle" (an apple) in a gravitational field (the general two-body problem was already too hard)
- Methods:
  - Data: notebooks (kilobytes)
  - Theory: driven by data
  - Computation: calculus by hand (1 Flop/s)
- Collaboration: one brilliant scientist, one or two students
Profound Transformation of Science: Collision of Two Black Holes
- Science result: the "Pair of Pants"
  - Year: 1972; team size: 1 person (S. Hawking)
  - Computation: by hand, at Flop/s scale
  - Data produced: ~kilobytes (text, a hand-drawn sketch)
  - 400 years later… the same methods!
- Science result: the "Pair of Pants," computed numerically
  - Year: 1994; team size: ~10
  - Data produced: ~50 MB
Move to 3D: 1000x More Data!
- 3D collision science result
- Year: 1998; team size: ~15
- Data produced: ~50 GB
Just Ahead: the Complexity of the Universe (LHC, Gamma-Ray Bursts)
- GR is now soluble: complex problems in relativistic astrophysics are within reach
- Gamma-ray bursts: all the energy the Sun emits in its lifetime bursts out in a few seconds. What are they? Colliding BH-NS binaries? Supernovae?
- They demand GR, hydrodynamics, nuclear physics, radiation transport, neutrinos, and magnetic fields: a globally distributed collaboration!
- Scalable algorithms, complex AMR codes, visualization, PFlop/s-weeks of computing, and PB of output (a back-of-envelope sketch of that scale follows)
- LHC: the Higgs particle? ~10K scientists, 33+ countries, 25 PB of data
- A planetary lab for scientific discovery, used as a remote instrument
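To give a feel for the scale behind "PFlop/s-weeks of computing, PB of output," here is a back-of-envelope sketch in Python. All numbers are illustrative assumptions, not figures taken from any particular simulation:

```python
# Back-of-envelope sketch of "PFlop/s-weeks, PB output"
# (illustrative numbers, not from any specific simulation).
SECONDS_PER_WEEK = 7 * 24 * 3600            # 604,800 s

sustained_flops = 1e15                      # assume 1 PFlop/s, sustained
total_ops = sustained_flops * SECONDS_PER_WEEK   # ~6.0e20 operations per run

output_bytes = 1e15                         # assume ~1 PB of simulation output
print(f"Total operations per run: {total_ops:.1e}")
print(f"Bytes written per operation: {output_bytes / total_ops:.1e}")
```

Even at these assumed rates, a single run performs on the order of 10^20 operations while writing only about a microbyte per operation, which is why analysis and storage, not just flops, become the bottleneck.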
Grand Challenge Communities: Combine It All…
- Where is it going to go?
- The same CI is useful for black holes and for hurricanes
Framing the Question: Science Is Radically Revolutionized by CI
- Modern science is data- and compute-intensive, integrative, and multiscale
- Collaborations for complexity: from individuals and groups to teams and communities
- The NSF CI approach must transition to support integrative, multiscale science
- Four centuries of constancy, then a factor of 10^9 to 10^12 change in four decades (10^12 over 40 years is roughly a doubling every year, since 2^40 ≈ 10^12)
- …But such radical change cannot be adequately addressed with our current incremental approach! We still think like this… students, take note!
Data Crisis: the Information Big Bang
- Warnings from many quarters: the PCAST report on digital data, NSF expert studies, Wired, Nature
- Industry: the Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report warns that "there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future."
- 80% of survey respondents need data retained for more than 50 years; 68% for more than 100 years
(Image credit: Scientific Computing and Imaging Institute, University of Utah)
Explosive Trends in Data Growth
- Comparative metagenomics: DNA sequencing of entire families of organisms; already hundreds of TB and thousands of users
- HD collaborations and OptIPortals: multichannel HD, gigapixel visualizations
- Petascale-to-exascale simulations: they generate petabytes to exabytes per simulation!
- Square Kilometre Array: 3,000 radio receivers with ~1 km² of collecting area, 19 countries! Possibly beginning in 201X, operational in 202X
  - Data: an exabyte per week! Analysis: exaflops! (a sketch of the arithmetic follows)
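The SKA line "an exabyte per week" implies a striking sustained network and storage load. A minimal sketch of the arithmetic, assuming a uniform data rate (the weekly volume is the only input taken from the slide):

```python
# Back-of-envelope sketch of the SKA figure "an exabyte per week"
# (assumes a uniform, sustained data rate; purely illustrative).
EXABYTE = 1e18                              # bytes
SECONDS_PER_WEEK = 7 * 24 * 3600            # 604,800 s

rate_bytes = EXABYTE / SECONDS_PER_WEEK     # ~1.7e12 B/s sustained
rate_tbits = rate_bytes * 8 / 1e12          # ~13 Tbit/s of network capacity

print(f"Sustained ingest: {rate_bytes:.2e} B/s (~{rate_tbits:.0f} Tbit/s)")
```

Roughly 13 terabits per second, around the clock, is why end-to-end networking appears later in these slides as a critical part of the CI ecosystem.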
Provenance in Science
- Provenance is as important as the result
- Not a new issue: lab notebooks have served this role for a long time
- What is new? Large volumes of data and complex, computational analyses. Writing notes by hand is no longer an option…
- Grand Challenge communities require open, sharable data, standards, and metadata
[Figure: a page from Joshua Lederberg's lab notebook on DNA recombination, annotated to show when it was written, the observed data, and his annotations]
Source: Juliana Freire, U of Utah
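As a concrete illustration of what machine-captured provenance can look like, here is a minimal Python sketch that records what a lab notebook once recorded by hand: which script ran, on which inputs, with which parameters, when, and on what system. The schema and field names are hypothetical, not any community standard:

```python
# Minimal provenance-record sketch (hypothetical schema):
# capture enough metadata to verify and re-run an analysis later.
import hashlib, json, platform, sys
from datetime import datetime, timezone
from pathlib import Path

def sha256(path: Path) -> str:
    """Content hash so the exact files used can be verified later."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def provenance_record(script: Path, inputs: list[Path], params: dict) -> dict:
    """Bundle who/what/when/how metadata to store alongside a result."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "script": {"path": str(script), "sha256": sha256(script)},
        "inputs": [{"path": str(p), "sha256": sha256(p)} for p in inputs],
        "parameters": params,
        "environment": {"python": sys.version, "platform": platform.platform()},
    }

if __name__ == "__main__":
    record = provenance_record(
        Path(sys.argv[0]),           # this script itself, as a demo
        inputs=[],                   # a real analysis would list its data files
        params={"resolution": 256},  # hypothetical run parameter
    )
    Path("provenance.json").write_text(json.dumps(record, indent=2))
```

A record like this, stored next to each result, is what makes the later questions of validation and reproducibility answerable.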
"Data Deluge" Drives Change at NSF
- Data issues resonate the most across NSF units!
- DataNet: a $100M investment in Sustainable Archive & Access Partners; development of a widely accessible network of interactive data archives, driven by today's grand challenges and integrating multiple disciplines
- INTEROP: community-led interoperability; interdisciplinary, community approaches to combining and re-using data in ways not envisioned by their creators
- Data-intensive computing: the SDSC Gordon facility
- NSF data policy: the "Data Working Group" (an NSF-wide group of Program Directors) is working to ensure that data are shared within and across disciplines
- There is a major shift in science toward data-intensive methods. NSF is responding…
NSF Vision and a National CI Blueprint
[Diagram: a national cyberinfrastructure blueprint linking Track 1 and Track 2 computing centers, campuses, DataNet archives, software, networks, and education]
- Crisis: "I need all of this to start to solve my problem!"
- Science is becoming unreproducible in this environment. Validation? Provenance? Reproducibility?
The Shift Towards Data: Implications
- All science is becoming data-dominated: experiment, computation, and theory
- Totally new methodologies: algorithms and mathematics
- All disciplines, from science and engineering to the arts and humanities
- End-to-end networking becomes a critical part of the CI ecosystem. Campuses, please note!
- How do we train "data-intensive" scientists?
- Data policy becomes critical!
Recent NSF Activities on Data Policy and Implementation
Fundamental Points on Data and Publication Policy
- Publicly funded scientific data and publications should be available, and science benefits when they are
- There has to be a place to keep data, and a way to access it
- There needs to be an affordable, sustainable cost model for this
- Open questions:
  - Who pays? NSF? The institution? What is the cost model, and what is reasonable?
  - What data must be made available? Raw data? Peer-reviewed data?
  - When is it available? After 6 months? 1 year? After publication?
  - Where is it placed? The author's web site? A library? NSF sites?
  - How long is it made available? How do we enforce it post-award?
- There is great variability in requirements across science communities: peer review can help guide this process.
Changes Coming for Data!
- Long-standing NSF policy on data (Proposal & Award Policies & Procedures Guide): "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data... created or gathered in the course of work under NSF grants"
- NSF will soon require a Data Management Plan (DMP), subject to peer review and a criterion for award
- The DMP will take the form of a 2-page supplementary document to the proposal
- It will not be possible to submit proposals without such a document
- Customization by discipline and program is necessary
Upcoming Implementation of NSF Data Policy: Directorate-Specific Issues and Peer Review
- Many details are implemented and enforced via peer review and Program Officer discretion, including things like the embargo period, standards, etc.
- The challenge at NSF is that no one size fits all, so each Directorate will be responsible for its own recommendations for DMP content, appropriate institutional repositories, etc.
- This does not address Open Access as applied to publications
Electronic Access to Scientific Publications
Why Is This Important?
- Science requires it
  - Scientific progress is accelerated by making publications available and searchable
  - Results in one community need to propagate easily to another for multidisciplinary, complex problem solving
  - Search technologies can be brought to bear
  - Publications need to be associated with rich information: videos of simulations, supporting data, simulation and analysis codes…
- Equality and broadening participation
  - Without it, young scientists at smaller universities are at a needless disadvantage; they may lose journal access. This hurts science and puts talent at risk
- The US Administration's focus on transparency and accountability
Current Activities
- We have begun serious discussions within NSF on these issues
- A National Science Board Committee on Data has started, with goals similar to those outlined above for data
- We have had numerous visits from funding agencies around the world; the primary topic: what is NSF doing on OA?
- Discussions with various publishers and libraries to explore options
- The quality of science relies on the peer-review systems of the best journals; we need a way to support OA
On Working with Publishers
- Quality of science and identification of talented scientists: we rely on the peer-review systems of the best journals
- NSF receives an assurance that the work done on a grant meets a standard
- Universities use impact factors as part of their tenure and promotion process
- "I believe it is in the interests of science, and hence the public interest, to help journals find a viable OA business model." (Bernard Schutz, presentation to NSF, May 2009)
Final Remarks
- Science is becoming collaborative and data-dominated
- We are accelerating efforts to advance NSF in all aspects of data
- Science requires that data be open and accessible; we are working to achieve this
- All forms of data are important and must be more tightly connected in the future: collections, software, publications
- Time is of the essence