New Requirements for Dataset Metadata: a perspective from CODATA Simon Hodson Executive Director CODATA

Slides:



Advertisements
Similar presentations
GEOSS Data Sharing Principles. GEOSS 10-Year Implementation Plan 5.4 Data Sharing The societal benefits of Earth observations cannot be achieved without.
Advertisements

Near East Plant Protection Network for Regional Cooperation & Knowledge Sharing Food and Agriculture Organization of the United Nations An Overview on.
European Clearing-House Mechanism Portal Toolkit Expert Group Meeting
The Oviedo Convention from the Perspective of DG Research Dr. Lino PAULA European Commission DG Research, Governance and Ethics Unit.
GEO SB-01 Oceans and Society: Blue Planet An Integrating Oceans Task of GEO GEO-IX Plenary November 2012 Foz do Iguaçu, Brazil on behalf of the Blue.
(The Global Programme of Research On Climate Change Vulnerability, Impacts and Adaptation) Adaptation Knowledge Day V: Climate Change Adaptation Gaps BONN,
CODATA Activities and Plans Simon Hodson Executive Director CODATA 9th Meeting of the Board on Research.
Co-funded by the European Union under FP7-ICT Co-ordinated by aparsen.eu #APARSEN Welcome to the Conference !! Juan Bicarregui Chair, APA Executive.
The JISC vision of research information management Dr Malcolm Read Executive Secretary, JISC.
The Science Council and Sustainability.
CODATA Report and CODATA and World Data System Conference SciDataCon 2014 November 2-5, 2014 New Delhi, India BRDI Meeting, March 11, 2014 Sara Graves,
Presented by: Charles Pallandt Title: Managing Director EMEA Academic & Governmental Markets Date: April 28 th, Turkey “Driving Research Excellence.
Bologna and the Third Cycle Anthony J Vickers UK Bologna Expert.
A New Initiative on Earth System Research for Global Sustainability
EEN [Canada] Forum Shelley Borys Director, Evaluation September 30, 2010 Developing Evaluation Capacity.
THE JOINED UP WORLD OF E-RESEARCH Professor Neil McLean National Technical Standards Adviser to the Department of Education Science and Training (DEST)
Facilitate Open Science Training for European Research Where Librarians can learn and teach Open Science for European Researchers LIBER 2015 London,
Strengthening International Science for the Benefit of Society Celebrating 75 years:
Agricultural Biotechnology Network for Regional Collaboration and Knowledge Sharing Food and Agriculture Organization of the United Nations An Overview.
CODATA Capacity Building Activities Simon Hodson Executive Director CODATA GEO Work Plan Symposium,
STRENGTHENING the AFRICA ENVIRONMENT INFORMATION NETWORK An AMCEN initiative A framework to support development planning processes and increase access.
GEO Work Plan Symposium 2012 ID-05 Resource Mobilization for Capacity Building (individual, institutional & infrastructure)
1 CODATA John Broome CODATA Treasurer. 2 What is CODATA? CODATA, the “Committee on Data for Science and Technology”, is an independent interdisciplinary.
Data Infrastructures Opportunities for the European Scientific Information Space Carlos Morais Pires European Commission Paris, 5 March 2012 "The views.
The European Science Foundation is a non-governmental organisation based in Strasbourg, France.
David Baker Ed Simons Josh Brown The various aspects of Interoperability A strategic partnership driving interoperability in research information through.
Managing Research Data – The Organisational Challenge at Oxford James A J Wilson Friday 6 th December,
UNDP-GEF Adaptation 0 0 Impact of National Communications on Process of Integrating Climate Change into National Development Policies UNFCCC Workshop on.
The AIACC Project Assessments of Impacts and Adaptations to Climate Change in Multiple Regions & Sectors UNFCCC Workshop Bonn 9 June 2003.
1 Collaborations and Partnerships John Broome CODATA-International.
SCIENCE, RESEARCH DATA, AND PUBLISHING Stewart Wills Editorial Director, Web & New Media, Science 26 February 2013.
Out of Cite, Out of Mind: USNC Work on Data Citation Practices and Standards BRDI 23 September 2013 Bonnie C. Carroll Information International Associates,
Towards a European network for digital preservation Ideas for a proposal Mariella Guercio, University of Urbino.
What is a Business Analyst? A Business Analyst is someone who works as a liaison among stakeholders in order to elicit, analyze, communicate and validate.
ESIP Federation Air Quality Cluster Partner Agencies.
GEO Work Plan Symposium 2012 ID-03: Science and Technology in GEOSS ID-03-C1: Engaging the Science and Technology (S&T) Community in GEOSS Implementation.
1 Services and Infrastructutre John Broome CODATA-International.
David Carr The Wellcome Trust Data management and sharing: the Wellcome Trust’s approach Economic & Social Data Service conference.
SHARE (SHared Access Research Ecosystem) Tyler Walters Co-Chair, SHARE Steering Group (a joint committee of the ARL, the AAU, and the APLU) Eric Celeste.
Committee Meeting, June 9, 2008 Strategic Institutional Research Plan.
Symposium on Global Scientific Data Infrastructures Panel Two: Stakeholder Communities in the DWF Ann Wolpert, Massachusetts Institute of Technology Board.
Data Citation: framing the discussion and global context Dr Simon Hodson Executive Director, CODATA Referencing data in publications: principles,
Consultant Advance Research Team. Outline UNDERSTANDING M&E DATA NEEDS PEOPLE, PARTNERSHIP AND PLANNING 1.Organizational structures with HIV M&E functions.
ARL Workshop on New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe September 26-27, 2006 ARL Prue.
Fire Emissions Network Sept. 4, 2002 A white paper for the development of a NSF Digital Government Program proposal Stefan Falke Washington University.
The Data Sharing Working Group 24 th meeting of the GEO Executive Committee Geneva, Switzerland March 2012 Report of the Data Sharing Working Group.
EDLproject WP3 “Developing the European Digital Library” LIBER – EBLIDA workshop Digitisation of Library Material in Europe Copenhagen, October.
DRIVER Action plan for an International Repository Organisation Dale Peters OAI6 Breakout Session Joining up Repositories 18 June 2009.
19-20 October 2010 IT Directors’ Group meeting 1 Item 6 of the agenda ISA programme Pascal JACQUES Unit B2 - Methodology/Research Local Informatics Security.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI strategy and Grand Vision Ludek Matyska EGI Council Chair EGI InSPIRE.
Association of Enterprise Architects International Committee on Enterprise Architecture Standards Jan 23, Collaborative Expedition Workshop #57aeajournal.org.
Creating Global Information Commons for Science Presentation at the EUROCRIS meeting September 2006, Brussels by Jean-Jacques Royer CODATA Treasurer.
Joint Declaration of Data Citation Principles (Overview) The Data Citation Synthesis Group Joint Declaration.
Open Science (publishing) as-a-Service Paolo Manghi (OpenAIRE infrastructure) Institute of Information Science and Technologies Italian Research Council.
RCUK International Funding Name Job title Research Councils UK.
GEOSS Data Sharing: Plans for 2016 and beyond GEO Work Programme Symposium WMO, Geneva, 2 May 2016 Robert Chen on behalf of the DSWG co-chairs (GEO Foundational.
The European Science Foundation is a non-governmental organisation based in Strasbourg, France.
ODIN – ORCID and DATACITE Interoperability Network ODIN: Connecting research and researchers Sergio Ruiz - DataCite Funded by The European Union Seventh.
Capacity Building in: GEO Strategic Plan 2016 – 2025 and Work Programme 2016 Andiswa Mlisa GEO Secretariat Workshop on Capacity Building and Developing.
NRF Open Access Statement
Priorities for International Development of e-Infrastructure and Data Management in Global Change Research Presentation by Robert Gurney, University of.
EOSC MODEL Pasquale Pagano CNR - ISTI
Summit 2017 Breakout Group 2: Data Management (DM)
GEOSS Data Sharing Principles
Access  Discovery  Compliance  Identification  Preservation
Support for the AASHTO Committee on Planning (COP) and its Subcommittees in Responding to the AASHTO Strategic Plan Prepared for NCHRP 8-36, TASK 138.
THE CODATA STRATEGIC PLAN
Towards Excellence in Research: Achievements and Visions of
Common Solutions to Common Problems
Bird of Feather Session
Presentation transcript:

New Requirements for Dataset Metadata: a perspective from CODATA Simon Hodson Executive Director CODATA

New Requirements: dataset metadata  How do euroCRIS stakeholders see new and emerging requirements for dataset metadata.  How may CERIF evolve in the light of this.  What cooperation/collaboration may be possible.  CODATA is a strategic partner of euroCRIS:  New ED and a new Strategic Plan:  Overview of CODATA activities.  Sample of initiatives that may have some implications for dataset metadata.

Some general points…  Huge diversity of metadata standards.  Importance of generic issues: time (for various functions), geographic locations/coverage, linking to publication, licensing, access control.  Some examples from initiatives in ‘data publishing’ – providing a place for datasets linked to publications or a means of ‘publishing’ datasets.  But requirements of datasets are not the same as those of publications.  Importance of ‘assessability’ – provides a need for contextual information, provenance.  Policy issues are significant drivers. But so is a significant desire on behalf of researchers to get credit for datasets.  All have metadata as an important component of what is being done, but not the whole story: cultural aspects, business processes and issues of sustainability.

What is CODATA?  Mission: To strengthen international science for the benefit of society by promoting improved scientific and technical data management and use.  An Interdisciplinary Body of ICSU, the International Council for Science.  CODATA has been working at the forefront of data science since  Not-for-profit, membership organisation (national members, scientific unions, affiliate members, members-at-large). CODATA is…  An international community and network of expertise on data issues.  An influential and authoritative voice in national and international policy regarding scientific data management.  A focal point for international, cross- disciplinary collaboration and communication on key scientific data issues.

What does CODATA bring to the table? Unique position comprises national members’ committees, International Union members, Task Groups, strategic initiatives, close relationship to ICSU.  Genuinely international in reach 23 National Members, many with very active national committees. Model can encourage activity at national and international levels. Recent new members: Mongolia, Finland, Czech Republic.  Helps address data issues in international science programmes Close association with ICSU (and WDS), supports strategic initiatives. 16 International Scientific Union members. Provides engagement with data challenges in particular disciplines and in transdisciplinary programmes.  A community and forum for data science Affiliate members and members at large. Active Executive Committee International Conference, Data Science Journal, Working Groups and Task Groups.

Elements of Strategic Plan Policy frameworks for data: take the lead in defining a policy agenda for scientific data. First step is to establish Data Policy Committee. Provide focus and expertise. Support ICSU forum on Open Access. Forum on data principles for publicly funded and international science. 2.Frontiers in data science and technology: coordinate work in key frontiers of data science and interdisciplinary application areas; capacity building activities. CODATA workshop series on Frontiers of Data Science and Technology. Important to partner other organisations, stakeholders; build on work of Task Groups. Identify and support addressing of data issues in coordination with International Unions. Current activities: nanotechnology, data for sustainable development, approaches to data recovery. Expand CODATA’s capacity building activities, education and training activities including curriculum development. Initiative to encourage Early Career Data Scientists.

Elements of Strategic Plan Frontiers in data science and technology: (cont.) International Science Data Conference, with WDS, New Delhi 2-5 Nov Reinvigorate the Data Science Journal. Task Groups includes Groups working on data citation, data challenges in microbiology, anthropometry, materials science and earth and space science; preservation and availability of data in developing countries; open data for global roads. 3.Data strategies for international science: support major ICSU scientific programmes to address data management needs (including infrastructure, policies, processes, standards). Promote a coordinated data strategy for Future Earth Promote a coordinated data strategy for Integrated Research on Disaster Risk

CODATA Task Groups TGs are approved for two-years by the CODATA General Assembly; seed funding provided; benefit from community endorsement, linkages. 1.Advancing Informatics for Microbiology 2.Anthropometric Data and Engineering 3.Data Citation Standards and Practices 4.Data at Risk 5.Earth and Space Science Data Interoperability 6.Exchangeable Materials Data Representation to support Scientific Research and Education 7.Fundamental Physical Constants: 2010 CODATA constants Global Information Commons for Science Initiative 9.Global Roads Data Development 10.Linked Open Data for Global Disaster Risk Research 11.OCTOPUS: Mining Space and Terrestrial Data for Improved Weather, Climate and Agricultural Predictions 12.Preservation of and Access to Scientific and Technical Data in/for/with Developing Countries

CODATA Nanomaterials  CODATA longstanding series of Task Groups on materials data.  ICSU and CODATA sponsored Workshop in February 2012  Representatives from 13 Unions, ISO 229, and other communities in attendance  Recommended new project to establish multi-disciplinary, multi-user groups to establish requirements of nanomaterials description system  CODATA and VAMAS (Versailles Project on Advanced Materials and Standards) established a Joint Working Group  Invited 14 Unions and many nanomaterials experts.  Two-year work plan includes international workshops.  Participation in FP7 Future Nano Needs Project.  Results to be given to ISO and other standards groups Slide credit, John Rumble

CODATA Nanomaterials  Part of a large European project: Future Nano Needs.  Developing unified information model for describing nanomaterials.  CODATA provides liaison role with international standards bodies and scientific unions to disseminate work and promote uptake of the information models and other outputs.  Team working on the information model comes from the CODATA community. Need to describe properties (interaction) as well as measurements. Criteria for uniqueness and equivalency. Different requirements of different disciplines: Chemistry Physics Materials science Food science and technology Nutrition science Toxicology Environmental and ecology science Medicine Biology Slide credit, John Rumble

Royal Society Science as an Open Enterprise Report, 2012  ‘how the conduct and communication of science needs to adapt to this new era of information technology’.  Intelligent Openness: data should be accessible, assessable, intelligible, usable.  ‘As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database. We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable.’  Royal Society June 2012, Science as an Open Enterprise, ce-public-enterprise/report/

ICSU Consultation on Open Access and Metrics  ICSU (International Council of Science) consultation on Open access to scientific data and literature and assessment of research by metrics.  Contribute towards a report which will provide providing 'an analysis of the current situation and thinking on open access and the use of metrics and a statement of ICSU’s overall policies' with regard to these things.’  ICSU feels the need to clarify position, work with scientific unions on this.  Concerns about the Gold OA business model (for data and publications) and where this leaves researchers without grant funding.  What type of of metrics, to what end?  Consultation workshop on 25 September involving ICSU members and to which CODATA is contributing.

Data Citation, Standards and Practices  Co-Chairs: Christine Borgman, Jan Brase, Sarah Callaghan; Consultant: Paul Uhlir; see  Involvement of a range of key organisations and experts.  Major Report Out of Cite, Out of Mind to be released in September 2013  Forceful set of ‘First Principles’ for data citation: 1.Status of Data: Data citations should be accorded the same importance in the scholarly record as the citation of other objects. 2.Attribution: Citations should facilitate giving scholarly credit and legal attribution to all parties responsible for those data. 3.Persistence: Citations should be as durable as the cited objects. 4.Access: Citations should facilitate access to data by humans and by machines. 5.Discovery: Citations should support the discovery of data and their documentation. 6.Provenance: Citations should facilitate the establishment of provenance of data. 7.Granularity: Citations should support the finest grained description necessary to identify the data. 8.Verifiability: Citations should contain information sufficient to identify the data unambiguously. 9.Metadata Standards: Citations should employ widely accepted metadata standards. 10.Flexibility: Citation methods should be sufficiently flexible to accommodate the variant practices among communities.

Vision ‘Open Data and the Social Contract of Scientific Publishing’ The credibility and effectiveness of the research enterprise is due in large part to the social contract behind scholarly publishing. Researchers disclose their work to their peers in return for professional credit. In so doing, they also expose their findings to be confirmed or refuted, and enable other researchers to build upon their results. Dryad seeks to extend this social contract to research data by providing a model for how a disciplinary repository can motivate researchers to disclose the data that is of the greatest value for scientific reuse, that associated with publications, and realize the manifold benefits of free access to scientific data in perpetuity.

Dryad Data Repository  Dryad Data Repository:  Provides a home for the data underpinning research articles.  Exploring a depositor pays business model:  Relying on low cost, low curation, high throughput business model.  Emphasises researcher responsibility to ensure quality of data and metadata. Research Funder Research Project Gold OA Fee/APC JournalDryad

Dryad Data Repository  Encourages citation of the data package as well as the article.  Submission encourages: scientific names (species), spatial coverage, temporal coverage.  Keen to explore ways of enhancing the service and this will rely on enhanced metadata: collections relating to particular subjects, data types, methods, funders etc.  In part this rests assessability and making connections, on being able to draw in significant amounts of contextual information, identifiers and other metadata, from a ‘research information ecosystem’.

Policies / Standards / Repositories  Journal data availability policies:  JorD project reports that nearly 50% of journals sampled have a data availability policy of some sort (though only 25% of these can be characterised as ‘strong’).  Journal policies generally not clear or specific about repository, standards etc.  Interest in aligning journal policies with funder policies, help researchers comply.  Clarify relations between Policies > Standards > Data repositories  What are the policy requirements from funders, what is said if anything about standards and repositories?  What standards have community uptake, are used in repositories?  What repositories have accreditation, employ given standards, are recommended in policies.  Build combined information resource:  Contact: Susanna Sansone and Rebecca Lawrence

Understanding Data Publication Processes: PREPARDe Project  Examined and modeled a number of workflows for data publication (publishing data associated with research publications).  Report on publication processes.  Data repository accreditation.  White paper of principles of repository accreditation to be released.  Scientific review of datasets.  White paper of principles and recommendations on data peer review.  Cross-linking between repositories and data publishers.  Requirements for a third party broker to facilitate multi-directional linking between datasets and literature.  See and

NPG Scientific Data  Open Access, online-only platform containing data descriptors that describe and explain datasets, supported by an APC model.  Why this approach? Drivers:  Researchers: credit for depositing and describing their data, helps researchers meet funder requirements.  Funders: helps realise objectives for data availability and reuse (RoI).  Scholarly communications: enhanced product, metadata approach hopefully makes more interesting to market.  Data and metadata to be CC0, narrative has options ranging from CC BY to CC BY-NC- SA  Data descriptor provides detailed information about a dataset held in a third party repository (like Dryad or Figshare, but not limited to these).  Data descriptor will be peer reviewed.

NPG Scientific Data: Data Descriptors  Data Descriptors use ISA Tab (Investigation, Study, Assay)  Increasing use in life sciences, biomedical and environmental sciences.  ‘Investigation’ (the project context), ‘Study’ (a unit of research) and ‘Assay’ (analytical measurement).  Importance of metadata standards that capture research processes to some extent: e.g. DDI, ISA, CIF.  Enrich with contextual metadata, methods, workflows, DMP info…

Crystallographic Data  Raises the question of which data should be available and described.  Raw data (few MB-few GB)  Reduced data (tens of kB-few MB)  Structure data (few kB – ~1 MB)  Established workflow and format for sharing structure data; but don’t always share derived data, and less frequently raw data.

Why Publish Raw Crystallographic Data?  Increasing use of validation for structure data: found, retrospectively that over 100 fraudulent structures had been published in Acta Crys. E from : Molecules.pdf Molecules.pdf  Publishing raw data allows validation and checking.  Above all has benefits for improving techniques for reduction and structure identification.  Helliwell: publishing raw data allows improved refined software use, new results and wider uptake of improved approach:  IUCr Diffraction Data Deposition Working Group working towards a recommendations on data deposit and for a federation of repositories.  Information challenges likely to include: federated search portals, extraction of subsets of large data sets, establishment of automated procedures for expiring data sets, linking to publications, sorting by different criteria... Slide credist: Brian McMahon, John Helliwell, Michael Hoyland.

Federations of Data Repositories  ICSU-World Data System:  Prototype data portal.  Working group on how to enhance this metadata: in particular by linking to other sources of data and information.  Concept paper in preparation.  ANDS Research Data Australia  Now has nearly 90,000 collections!  Uses RIF-CS as interchange format.  Catalogue of data holdings in universities and data centres/labs.

ReCollect App for Eprints Data Repositories Metadata approach from Research Essex Project

Metadata Schema for Institutional Data Repositories

Thank You! Contact Simon Hodson Executive Director CODATA Tel (Office): | Tel (Cell): CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, Paris, FRANCE