The NSF Cyberinfrastructure for the 21st Century Program (CIF21)
Rob Pennington, Program Director
Office of Cyberinfrastructure, National Science Foundation
The Shift Towards a “Sea of Data”: Implications
- All science is becoming data-dominated
  - Experiment, computation, theory
  - Fourth paradigm
- Classes of data
  - Collections, observations, experiments, simulations
  - Software
  - Publications
- Totally new methodologies
  - Algorithms, mathematics, culture
- Data become the medium for
  - Multidisciplinarity, communication, publication…science
- Fundamental questions become focused around data:
  - How to remove boundaries?
  - How to incentivize sharing?
  - How do we attribute credit for this new publication form?
  - How are data peer reviewed?
  - What is a publication in the modern data-rich world?
Scientific Data Challenges
[Figure: data scale in bytes per day, from gigabytes through terabytes and petabytes to exabytes. Sources shown: Genomics; LHC; TeraGrid, Blue Waters; Square Kilometer Array; Climate, Environment; LSST; many smaller datasets. Other labels: Volume, Useful Lifetime, Distribution, Data Access, DataNet.]
CIF21 and Transforming Research
[Figure: CIF21 at the center of a "Sea of Data", drawing on Software; Analytic Tools; Compute, Modeling; Communities (expertise, research); and Networks. Outcomes: science, innovation, discovery, economic competitiveness.]
- Grand Challenges: EarthCube, Understanding the Phenome, Clean Energy, Climate prediction, Social networking, Complex networks, Health records, Cybersecurity, Matter-by-design, Disaster recovery, etc.
- Multi-disciplinary & multi-scale integration
NSF CIF21 Major Areas
[Figure: major areas arranged around a core of Discovery, Collaboration, Education. Program labels: Advanced Computational Infrastructure, Data Infrastructure Program.]
- Organizations
  - Universities, schools
  - Government labs, agencies
  - Research and Medical Centers
  - Libraries, Museums
  - Virtual Organizations
  - Communities
- Expertise
  - Research and Scholarship
  - Education
  - Learning and Workforce Development
  - Interoperability and operations
  - Cyberscience
- Networking
  - Campus, national, international networks
  - Research and experimental networks
  - End-to-end throughput
  - Cybersecurity
- Computational Resources
  - Supercomputers
  - Clouds, Grids, Clusters
  - Visualization
  - Compute services
  - Data Centers
- Data
  - Databases, Data repositories
  - Collections and Libraries
  - Data Access: storage, navigation, management, mining tools, curation, privacy
- Scientific Instruments
  - Large Facilities, MREFCs, telescopes
  - Colliders, shake tables
  - Sensor Arrays: ocean, environment, weather, buildings, climate, etc.
- Software
  - Applications, middleware
  - Software development and support
  - Cybersecurity: access, authorization, authentication
Broad Principles to Lead CIF21
- Builds national infrastructure for S&E
- Leverages common methods, approaches, and applications, with a focus on interoperability
- Catalyzes other CI investments across NSF
- Provides focus and is a vehicle for coordinating efforts and programs
- Based upon a shared governance model involving all parts of NSF
- Managed as a coherent program by OCI
- Spiral development methodology
Evolution of CIF21 and NSF Data Programs
[Figure: inputs from the ACCI Task Force, NSB, DataNet Awards, and Community Input feed into the NSF CIF21 Data Programs, with on-going input. Base label: Science & Engineering Research + Cyberinfrastructure.]
Data Related Context
- National Science and Technology Council (NSTC)
  - comments-access-federally-funded-scientific-research-results
- Networking and Information Technology Research and Development (NITRD)
- National Science Board Data Policies Task Force
- Advisory Committee for Cyberinfrastructure (ACCI)
NSTC RFIs for Public Comment - Context
- Two Requests for Information (RFIs), Nov 2011
  - Public Access to Digital Data Resulting from Federally Funded Scientific Research
    - Preservation, Discovery and Access
    - Standards for Interoperability, Re-Use and Re-Purposing
  - RFI for Scholarly Publications
  - information-public-access-digital-data-and-scientific-publications
- Comment period closed on 12 Jan 2012
  - Digital Data: 118 responses
  - Scholarly Publications: 377 responses
  - Individual and institutional responses
NSB Data Policy Task Force - Context
Dec 2011: NSB Recommendations
- #1: Provide leadership … in the development and implementation of digital research data policies …
- #2: … require grantees to make both the data and the methods and techniques used in the creation and analysis of the data accessible … Data should be shared using persistent electronic identifiers …
- #3: Continue to expand the support of computational and data-enabled science and engineering …
- #4: Convene a panel … to explore and develop a range of viable long-term business models …
- #5: Further the expansion of sustainable data management, including preservation and curation of pre-existing and newly generated long-lived data …
NSF Advisory Committee for Cyberinfrastructure (ACCI) Task Force - Context
- Task forces: Grand Challenges, HPC (High Performance Computing), Data/Viz, Software, Campus Bridging, Cyberlearning
- More than 25 workshops and Birds of a Feather sessions, with more than 1300 people involved
- Final reports: orces/
ACCI Data Task Force Recommendations
- Recognize data infrastructure and services as essential research assets fundamental to today’s science and as long-term investments in national prosperity
- Create new citation models in which data and software tool providers are credited with their data contributions
- Develop and publish realistic cost models to underpin institutional/national business plans for research repositories/data services
- Identify and share best practices for the critical areas of data management
CIF21 and Data Enabled Science
- Provide critical tools and services for data mining, integration, analysis, modeling and visualization.
- Overcome barriers to scaling, synthesis, and interoperability to promote effective use of large scale, shared data resources.
- Strategic investments that concentrate tools, resources and expertise in support of compelling grand challenge science questions.
Data Infrastructure: A Multi-tiered and Multi-Disciplinary Landscape
[Figure: tiers labeled Data Storage, Data Content, and Data-enabled Science, spanning Observational Communities; Modeling and Simulation Communities; and Population, Climate, Environment Communities. Legend: DataNet supported.]
CIF21: Data-Enabled Science
- Data-intensive Science Program (knowledge)
  - Intensive disciplinary efforts, multi-disciplinary discovery and innovation
- Data Analysis and Tools Program (information)
  - Data mining, manipulation, modeling, visualization, decision-making systems
- Data Services Program (data)
  - Provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline
"Dumped On by Data: Scientists Say a Deluge Is Drowning Research"
Data Curation
- Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context.
- Overcome challenges of culture change, policy development and implementation, sustainable operations, quality and usability control.
- Strategic awards that address heterogeneity in formats, complexity, and semantics of data collections that are valued by science communities of significant breadth.
- Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability.
Data Storage
- National storage infrastructure for scientific data
- Accommodate scale and heterogeneity through robust, open, and broadly accepted standards
- Business model implemented with governmental, academic, non-profit, and commercial stakeholders
- Make strategic investments that:
  - Leverage existing resources in XSEDE, commercial clouds, federal data centers
  - Meet growing capacity needs at optimum cost
  - Provide coordinating and integrative functions for integrity, access control, availability, persistence
  - Catalyze a national data infrastructure
Cross Cutting Challenges
- Balancing research into next-generation infrastructure with operation & maintenance of current capacity
- Sustainability through technical design, development of business models, and integration with the research cycle
- Integration
  - Vertical: linking low-level bit storage infrastructure to data collections, and to applications
  - Horizontal: achieving connectivity and interoperability between activities that vary in scale, disciplinarity, and funding source
Summary
- CIF21 is focused on effective ways to approach and respond to the challenges
- Critical concepts and goals
  - Realistic and innovative
  - Spiral process with strong, on-going feedback
- Structure for longevity
  - Scalable, open, inclusive governance
  - Long-term business models
  - International collaborations and programs