Published by Joan Lilian Barker. Modified over 9 years ago.
1
Fran Berman
National and International Efforts in Research Data Access and Sharing
Dr. Francine Berman
Chair, Research Data Alliance / US
Edward P. Hamilton Distinguished Professor in Computer Science, RPI
2
Research Data Driving Solutions to Complex Scientific and Societal Challenges
– Who is most at risk of contracting asthma?
– How can we increase wheat yields?
– How accurate is the Standard Model of Physics? (Image: Lucas Taylor)
– How can we best address energy needs and sustain the environment? (Image: Ceinturion, Wikipedia)
3
Data Infrastructure Needed to Explore Solutions
Needs:
– Data use and re-use
– Data discovery and data sharing
– Research dissemination and reproducibility
– Data access (now) and preservation (later)
Infrastructure components:
– Data discoverability tools
– Data access via portals, science gateways, etc.
– Database and data collection systems
– Data services to support use and re-use
– Data analysis algorithms
– Data-driven models and simulations
– Data visualization tools
– Semantic frameworks
– Data management systems
– Data storage
– …
4
Social, Organizational, and Human Infrastructure Are Equally Important
Social and organizational infrastructure:
– Policy
– Sustainable economics
– Common standards
– Community practice
Human infrastructure / workforce:
– Data-focused curriculum and training
– Data scientists
(Sources: McKinsey Global Institute 2011 report; traffic image: Mike Gonzalez)
5
Today’s Presentation: Emerging Efforts in the Development of Effective Research Data Infrastructure
– Global data infrastructure: How do we accelerate open-access data sharing and exchange?
– National data infrastructure: How do we support stewardship and preservation of publicly accessible research data?
6
Data Sharing Driving Discovery Across Sectors and Communities
7
Worldwide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access, and Use
– Science, humanities, and arts communities
– E-infrastructure professionals, data analysts, data center staff, …
– Data scientists
– Libraries, archives, repositories, and museums
8
Research Data Alliance Created to Accelerate Development of Research Data-Sharing Infrastructure Worldwide
RDA is an emerging, global, community-driven organization created to accelerate the development of research data-sharing infrastructure worldwide.
RDA community efforts focus on building social, organizational, and technical infrastructure to:
– reduce barriers to data sharing and exchange
– accelerate the development of coordinated global data infrastructure
9
RDA Approach: CREATE, ADOPT, USE
RDA members come together as:
– Working Groups: 12–18-month efforts to build, adopt, and use specific pieces of infrastructure
– Interest Groups: longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified
Working Group efforts focus on the development and use of data-sharing infrastructure:
– Code, policy, infrastructure, standards, or best practices that communities adopt and use to enable data sharing
– “Harvestable” efforts for which 12–18 months of work can eliminate a roadblock
– Efforts that have substantive applicability to groups within the data community, but may not apply to everyone
– Efforts on which working scientists and researchers can start today
10
The RDA Community Today: Over 1,600 members from 70+ countries (as of 15 March 2014)
[Map of membership by region, partially recovered: Austral-Pacific 4%, Asia 4%, Africa 2%, South America 1%. Map courtesy traveltip.org]
11
Community Growth
– First RDA organizational telecon: August 2012
– Global Data Planning Meeting: October 2012
– RDA Plenary 1 / Launch, Gothenburg, Sweden, March 2013: 240 participants; first Working Groups and Interest Groups
– RDA Plenary 2, Washington, DC, September 2013: 380 participants from 22 countries; first “neutral space” community meeting (Data Citation Summit); first organizational partner meet-up; first BOFs
– RDA Plenary 3, Dublin, Ireland, March 2014: 497 participants; first Organizational Assembly; 6 co-located events; 14 BOFs, 12 Working Groups, 22 Interest Groups
– RDA Plenary 4, Amsterdam, September 2014: first Working Group exchange meeting
12
RDA Interest Groups (IG) and Working Groups (WG) by Focus (as of 15 March 2014)
Domain-science focused:
– Toxicogenomics Interoperability IG
– Structural Biology IG
– Biodiversity Data Integration IG
– Agricultural Data Interoperability IG
– Digital History and Ethnography IG
– Defining Urban Data Exchange for Science IG
– Marine Data Harmonization IG
– Materials Data Management IG
Data-stewardship focused:
– Research Data Provenance IG
– Certification of Digital Repositories IG
– Preservation e-infrastructure
– Long-tail of Research Data IG
– Publishing Data IG
– Domain Repositories IG
– Global Registry of Trusted Data Repositories and Services IG
Base-infrastructure focused:
– Data Foundations and Terminology WG
– Metadata Standards WG
– Practical Policy WG
– PID Information Types WG
– Data Type Registries WG
– Metadata IG
– Big Data Analytics IG
– Data Brokering IG
Reference- and sharing-focused:
– Data Citation IG
– Data Categories and Codes WG
– Legal Interoperability IG
Community-needs focused:
– Community Capability Model IG
– Engagement IG
– Clouds in Developing Countries IG
13
First RDA Infrastructure Deliverables Coming This Fall
– Data Type Registries WG
  Deliverables: system of data type registries, formal model for describing types, working model of a registry
  Initial adopters and users: CNRI, International DOI Foundation, Deep Carbon Observatory
– Practical Code Policies
  Deliverables: survey of policies in production use, testbed of machine-actionable policies, deployment of 5 policy sets, policy starter kits
  Initial adopters and users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT
– Persistent Identifier Information Types
  Deliverables: minimal set of PID types, API
  Initial adopters and users: Data Conservancy, DKRZ
– Language Codes
  Deliverables: operationalization of ISO language categories for repositories
  Initial adopters and users: Language Archive, Paradisec
– Data Foundations and Terminology
  Deliverables: common vocabulary for data terms, formal definitions and open registry for data terms
  Initial adopters and users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS
– Metadata Standards
  Deliverables: use cases and prototype directory of current metadata standards, starting from the DCC directory
  Initial adopters and users: JISC, DataONE
14
RDA/US: Collaborate Globally, Contribute Locally
RDA/US goals:
– Contribute to RDA “international” efforts and leadership
– Bring US efforts to the broader RDA community
– Build the RDA community within the US
– Leverage and implement RDA deliverables in the US to amplify impact
– Collaborate closely with other RDA “regions” on key programs and initiatives
NSF-supported RDA/US initiatives:
– Outreach (between RDA and RDA/US)
– RDA deliverables amplification
– Student / early-career engagement
RDA/US Steering Committee: Fran Berman, RPI; Larry Lannom, CNRI; Mark Parsons, RPI; Beth Plale, IU
15
RDA/US Opportunities for Students and Early-Career Professionals
RDA/US Interns
– $5K for a summer of work and mentorship with an RDA Interest or Working Group
– Interns attend the Fall Plenary ($2,500 participant support) and present a poster on their project
– Interns attend a kick-off meeting at the beginning of the summer
RDA/US Fellows
– Fellows engage with an RDA WG/IG and attend 3 Plenaries ($2.5K in participant costs per Plenary)
– First Plenary: identify a group to work with
– Second and Third Plenaries: present interim and final progress on common efforts
16
Sustainable Stewardship to Support Data-Driven Innovation
– Global data infrastructure: How do we accelerate open-access data sharing and exchange?
– National data infrastructure: How do we support stewardship and preservation of publicly accessible research data?
17
Increasing R&D Agency Requirements for Data Access and Management: Research Data Infrastructure Is Particularly Important
18
Publicly Accessible Data Has to Live Somewhere
Public access, use, and re-use of data now and in the future presuppose sustainable stewardship today.
– Stewardship and preservation are critical: “homeless” data ceases to exist.
– Economically sustainable data infrastructure is necessary to support:
  – federally mandated data management plans
  – public access to research data
  – use and re-use
  – reproducibility
The bigger, longer-lived, more complex, or more valuable the data, the more important sustainable data stewardship and infrastructure become.
19
It’s Not Just “Big Data,” and It’s Not Just the Cost of Storage
Data management, stewardship, and use incur continuing infrastructure costs.
– The most valuable data is replicated.
– As research collections grow, storage capacity must stay ahead of demand.
[Figure: SDSC Data Storage Growth, ’97–’09. Information courtesy of Richard Moore, SDSC]
Costs include:
– Resources and resource refresh
– Maintenance and upkeep
– Software tools and packages
– Utilities (power, cooling)
– Space
– Networking
– Security and failover systems
– People (expertise, help, infrastructure management, development)
– Training, documentation
– Monitoring, auditing
– Reporting
– Compliance with regulation, etc.
20
Economics of Public Access: Who Pays the Data Bill?
Article: Science, August 9, 2013. Free public access link at http://www.cs.rpi.edu/~bermaf/
21
Op-Ed Recommendations: Partner Across Sectors to Distribute Preservation and Stewardship Responsibilities
Sectors: private sector, public sector, individuals, academia
– Evolve research culture to take advantage of what works in the private sector
– Create sustainable university library and repository stewardship solutions
– Clarify public-sector stewardship commitments: articulate what data will and won’t be supported
– Facilitate private-sector stewardship of public-access research data as a public good
(Images: Charleston Ballet blog, http://allianceblog.org/tag/charleston-ballet/ ; iTunes gift card)
22
Value Proposition: Why Data Infrastructure Is Important
The research landscape is changing:
– Data is accelerating innovation and discovery
– There is a greater need for access to, ease of use of, and interoperability of data
– Traditional modes of research recognition are evolving: new approaches to collaboration and competition, publication, citation, and analysis all involve digital data
The educational landscape is changing:
– University curricula are becoming more data-driven
– On-line and on-site options, supported by data infrastructure, are increasingly integrated
– More digital monitoring, tracking, and accountability are needed, along with more policy and regulation involving digital data
The workforce is changing:
– More data literacy is required from everyone
– More data science is embedded in everything
– Data scientists are increasingly critical for competitiveness and leadership
(Image: CAIDA Internet visualization; article: HBR, October 2012)
23
Your Part: Things You Can Do on Monday Morning
Small steps:
1. If you don’t have one, create a data management plan for your current project covering a reasonable, fixed period of time.
2. Make your data available to the community (as appropriate) by curating it and ingesting it into a publicly accessible repository.
3. Cite and publish your data when you write about your results.
4. Join the RDA and get involved in (or start) an Interest Group or Working Group that will help you develop needed data infrastructure.
24
Thank You!
25
Infrastructure Investments Are Often a Hard Sell …
– Quantifying return on investment is a challenge
– Infrastructure is hard to “market” compared to more urgent competing priorities
– The business model must be sustainable and address infrastructure refresh and evolution
Stephanie A. Miner, the Syracuse mayor, said [infrastructure is] too often overlooked when politicians want to spend money on economic development. “You don’t cut ribbons for new water mains, but that’s really what matters.” (NY Times, February 15, 2014)