Open data and data curation

Open data and data curation
Hamish James, Statistics New Zealand

Outline
- Setting the scene
- Open data
- How open data and data curation are related

Quick definitions
[Diagram labels: information (structured or unstructured, digital or analogue); data; open data; data curation]

Defining data
Data consists of sets of structured values that can be organised, analysed and manipulated by a software application or some other means of calculation. This includes data collected directly through surveys and administrative systems, as well as data created or compiled by aggregating or reanalysing other sources. A defining characteristic of data is that it is machine-readable.

This definition places no restrictions on the topics data may refer to, or on the purposes for which data may be used. Data may be collected about anything. Examples include sets of measurements captured by scientific instruments, responses extracted from questionnaires and forms, and structured catalogue entries about physical or digital objects, such as books and artworks. The scope of what may be treated as data is also expanding as modern information technology and management techniques provide ways of creating structure in previously unstructured information sources. With such a wide definition, readers should bear in mind that their understanding of the term may differ from that of others.
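To make "machine-readable" concrete, here is a minimal Python sketch, assuming a small hypothetical survey extract in CSV form (the column names and figures are invented for illustration). Because each value sits in a known row and column, software can organise and analyse it without a person having to interpret the layout.

```python
import csv
import io

# Hypothetical survey extract: one record per row, one variable per column.
RAW = """region,respondents,employed
North,120,84
South,80,60
"""


def employed_share_by_region(text: str) -> dict:
    """Parse structured CSV text and return the employed share (%) per region."""
    shares = {}
    for row in csv.DictReader(io.StringIO(text)):
        respondents = int(row["respondents"])
        employed = int(row["employed"])
        shares[row["region"]] = round(100 * employed / respondents, 1)
    return shares


print(employed_share_by_region(RAW))  # {'North': 70.0, 'South': 75.0}
```

The same figures typed into a paragraph of prose would be information, but not data in the sense defined above, until someone gave them this kind of structure.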

Open data, data curation
Open data is a philosophy based on the idea that data is more valuable if more people can use it, and that technology has made the cost of sharing data negligible.
Data curation is a field of research and work focusing on the long-term management of data, built on the argument that the opportunity cost of losing data is high.
Open data highlights benefits; data curation worries about costs.

[Diagram: data, knowledge, value]

Focus of open data activities
- Data collected and held by governments
- Data collected or generated through publicly funded research
http://wiki.opengovdata.org/index.php?title=Open DataPrinciples

Reasons to make data open
The underlying purposes of making publicly funded data more accessible are to:
- inform decision making by government, businesses and communities
- increase transparency and accountability in government decision making
- assist informed participation by the public in government decision making
- promote economic development through the innovative application of data collected for one purpose to other tasks
- gain greater value from research data

Barriers to reuse of government data
- Agency culture (reluctance or hostility to data sharing)
- Funding constraints
- Ensuring data confidentiality
- Shared ownership
- Poor dissemination practices

Last year we did what government departments do best: we produced a paper, Barrier to reuse of government held data. The paper drew on a review of published international and national commentary and evidence, along with the experience of a small number of New Zealand government agencies that are already making data available for reuse.

Government agencies may be uninterested in, or actively opposed to, the release of information and data. Although most evidence relates to access to official information or freedom of information laws, it seems likely that a culture of restricting access to information will also extend to restricting access to data.

Data reuse initiatives can require significant funding. The costs of developing and maintaining data discovery and access systems are much easier to measure than the benefits that flow from data reuse, which can make it difficult to construct strong business cases for investment.

Concerns that allowing data reuse may breach privacy or confidentiality can make agencies reluctant to allow reuse. General legislation, such as the Privacy Act, and specific legislation, such as the Statistics Act, impose requirements on government agencies to protect data.

A lack of attention to how data is disseminated can greatly complicate reuse. Adopting standard machine-readable formats, consistent discovery metadata and simple licensing arrangements makes it easier for others to reuse data.

Barriers to reuse can emerge where government agencies do not have sole ownership of the data they wish to make available. Data produced through a public-private partnership can be difficult to make publicly available for reuse because of the potential threat to the commercial interests of the private partner. Data may have direct sale value, or may embody valuable intellectual property. Maori could view a range of government-held datasets as holding customary knowledge and expect access to be restricted.

Open Government Data Principles
Government data shall be considered open if it is made public in a way that complies with the principles below:
- Complete. All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
- Primary. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
- Timely. Data is made available as quickly as necessary to preserve the value of the data.
- Accessible. Data is available to the widest range of users for the widest range of purposes.
- Machine processable. Data is reasonably structured to allow automated processing.
- Non-discriminatory. Data is available to anyone, with no requirement of registration.
- Non-proprietary. Data is available in a format over which no entity has exclusive control.
- License-free. Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
OECD Principles and Guidelines for Access to Research Data from Public Funding

Characteristics of open data
- Free and open access to the data
- Freedom to redistribute the data
- Freedom to reuse the data
- No restriction of the above based on who someone is (e.g. their nationality) or their field of endeavour (e.g. commercial or non-commercial)
cf. http://www.okfn.org/about/

Creative Commons licence conditions
- Attribution
- Share-alike
- No derivative works
- Non-commercial
NZGOAL (New Zealand Government Open Access and Licensing Framework) advocates the use of Creative Commons licences.
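The four licence conditions can be checked against the characteristics of open data listed earlier. This is a minimal Python sketch; the table and function names are hypothetical. It treats attribution and share-alike as compatible with openness, and "no derivative works" and "non-commercial" as incompatible (the first limits reuse, the second restricts by field of endeavour). That split is consistent with the Open Knowledge Foundation's Open Definition, under which CC BY and CC BY-SA are conformant licences and the NC and ND variants are not.

```python
# Conditions attached to each Creative Commons licence, keyed by short name
# (a hypothetical lookup table built from the four conditions on the slide).
CC_CONDITIONS = {
    "CC BY": {"attribution"},
    "CC BY-SA": {"attribution", "share-alike"},
    "CC BY-ND": {"attribution", "no-derivatives"},
    "CC BY-NC": {"attribution", "non-commercial"},
    "CC BY-NC-SA": {"attribution", "non-commercial", "share-alike"},
    "CC BY-NC-ND": {"attribution", "non-commercial", "no-derivatives"},
}

# "No derivatives" limits the freedom to reuse; "non-commercial" restricts
# use by field of endeavour. Both conflict with the open data characteristics.
RESTRICTIVE = {"no-derivatives", "non-commercial"}


def is_open_licence(licence: str) -> bool:
    """True if the licence carries none of the restrictive conditions."""
    return not (CC_CONDITIONS[licence] & RESTRICTIVE)


for name in CC_CONDITIONS:
    print(f"{name:<12} open: {is_open_licence(name)}")
```

Under this rule only CC BY and CC BY-SA come out as open; every NC or ND variant fails.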

Linked data
Linked data uses semantic web approaches (especially RDF) to describe data and make it accessible to machines: a web of linked data.
RDF 'triples' are used to describe things: subject – predicate – object, e.g. Hamish – is a – presenter.
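The subject–predicate–object idea can be sketched with plain Python tuples. This is a simplification under stated assumptions: in real RDF, subjects and predicates are identified by URIs, and a library such as rdflib (queried with SPARQL) would normally handle storage and lookup. The `Triple` class and `match` function here are hypothetical; the first triple is the slide's own example, and the other two show how the object of one triple can be the subject of another, which is what makes a web of linked data.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The slide's example triple, plus two illustrative ones about related things.
graph = [
    Triple("Hamish", "is a", "presenter"),
    Triple("Hamish", "works for", "Statistics New Zealand"),
    Triple("Statistics New Zealand", "is a", "government agency"),
]


def match(graph: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


print(match(graph, subject="Hamish"))
print([t.subject for t in match(graph, predicate="is a")])  # ['Hamish', 'Statistics New Zealand']
```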

Linking Open Data dataset cloud

What is missing?

46
Loss of knowledge about the dataset is the key problem we face.
Datasets are easy to break apart or put together. It is not enough to be able to find a dataset; you also need to be able to drill down into the dataset and examine individual records or variables. This number means nothing without context. For datasets, that context is provided by creating metadata for the dataset.

Data needs context
Simple examples of metadata for a data value. The loss of any one piece of this metadata could render the data useless.
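One way to see why each piece of metadata matters is to keep the context attached to the value itself. This is a minimal Python sketch; the field names are hypothetical choices, since the slide's own examples are not reproduced here. A bare 46, like the number on the earlier slide, reports every piece of context as missing.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataValue:
    """A single data value plus the context needed to interpret it.
    The metadata fields are illustrative, not a standard schema."""
    value: float
    measure: Optional[str] = None           # what was measured or counted
    unit: Optional[str] = None              # e.g. count, percent, dollars
    population: Optional[str] = None        # who or what the value describes
    reference_period: Optional[str] = None  # when the value applies
    source: Optional[str] = None            # who produced it, and how


def missing_context(dv: DataValue) -> List[str]:
    """Names of metadata fields that are empty for this value."""
    return [f.name for f in fields(dv)
            if f.name != "value" and getattr(dv, f.name) is None]


bare = DataValue(46)
print(missing_context(bare))
# ['measure', 'unit', 'population', 'reference_period', 'source']
```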

Examples
“Which town or city in the UK has the highest proportion of students?”
“Which town or city in the UK is home to one or more university campuses whose registered full- or part-time (non-distance) students, divided by the local population, give the largest percentage?”
http://digitalcuration.blogspot.com/2010/03/linked-data-and-reality.html
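The second wording turns the question into a formula, percentage = 100 × students ÷ local population, with each term pinned down. A minimal Python sketch with invented towns and figures (not real UK data) shows the calculation; the answer is only as trustworthy as the documented definitions of "students" and "local population" behind each number.

```python
# Hypothetical towns and figures, for illustration only (not real UK data).
# town: (registered full- or part-time non-distance students, local population)
TOWNS = {
    "Town A": (30_000, 150_000),
    "Town B": (12_000, 40_000),
    "Town C": (5_000, 60_000),
}


def student_percentage(students: int, population: int) -> float:
    """Students as a percentage of the local population."""
    return 100 * students / population


ranked = sorted(TOWNS, key=lambda town: student_percentage(*TOWNS[town]), reverse=True)
for town in ranked:
    print(town, round(student_percentage(*TOWNS[town]), 1))
```

With the first, looser wording, two analysts could count different students or different populations and still both claim to have answered the question.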

[Diagram: re/use depends on technology to render the data (hardware, standards, formats, software) and documentation to explain it (standards, meaning, interpretation)]

[Diagram: data, knowledge, value; technology to render data; documentation to explain]

What is missing? Context
- Data is not self-describing
- Who provides the description?
- What does it cost to provide the description?
- How much of the description is held as tacit knowledge?
  - Expert's personal knowledge
  - Rules and meaning encoded into the data and software

I'm not sure exactly what it was that you are thinking of. However, here are just a few cases where I had to do some extra work when the information was not available:

Firstly, in the Ag Economic Survey I had to trawl through correspondence in off-site boxes to 'cement' together a list of the codes and categories that related to the pre-1989 local government areas used in the valuation rolls, which was not in cars. (That took about 10–15 hours' work spread over weeks.) The problem was that often you had parts of lists, but not a list with codes against descriptions. At least this information could be used for the older Ag Production later.

The farm type lists at the front of the Ag Production publications in the 1970s and 80s were not always what was in the actual data or published tables, but by doing SAS frequency analysis I reconciled totals against the published tables to confirm the codes. Similar analysis was done against published tables for the breeds data in some years.

The Agriculture Production (Annual Census of Farms Year Ended 30 June 1979) study had no questionnaire or other proxy for linecodes, so we did 1980 and 1978 first, then did frequency analysis on all three and confidently confirmed most linecodes using published tables, with the straddling years as reality checks. This alone took about 20 hours' work. Then the work was as per other similar ingests.

General comment: when you are not familiar with data, it takes a lot of analysis to try to 'validate' linecode and category descriptions. It gets easier after the initial investment of time if there are common variables across years, but then suddenly they all change and you have to slow down and get familiar with a new set of linecodes.

I have attached the 1979 DDI for now. Let me know if you need any more, cheers
(See attached file: 5081_MD.xml)
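The reconciliation described in the email (done there with SAS frequency analysis) can be sketched in Python: count how often each code occurs in the unit records, then compare each count with the published total for the category that code is believed to represent. All codes, labels and counts below are hypothetical, and `reconcile` is an illustrative helper, not a reconstruction of the original SAS work.

```python
from collections import Counter

# Hypothetical unit records: the farm type code found on each record.
records = ["01", "01", "02", "03", "03", "03", "02", "01"]

# Hypothetical published table: number of farms per farm type.
published = {"Dairy": 3, "Sheep": 2, "Beef": 3}


def reconcile(records, published, code_labels):
    """For each code, compare its frequency in the records with the published
    total for the label it is believed to carry."""
    counts = Counter(records)
    report = {}
    for code, count in sorted(counts.items()):
        label = code_labels.get(code)
        expected = published.get(label)
        report[code] = (label, count, expected, count == expected)
    return report


# A candidate code list pieced together from correspondence.
candidate = {"01": "Dairy", "02": "Sheep", "03": "Beef"}
for code, row in reconcile(records, published, candidate).items():
    print(code, row)

# Swapping two labels makes the totals disagree, flagging the list as suspect.
swapped = {"01": "Dairy", "02": "Beef", "03": "Sheep"}
print(reconcile(records, published, swapped)["02"])  # ('Beef', 2, 3, False)
```

Matching totals support a code list but do not prove it: here Dairy and Beef both total 3, so swapping codes 01 and 03 would pass unnoticed. Cross-checking against the straddling years, as the email describes, is what narrows such ambiguities down.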

Data curation
Data curation involves:
- Data management
- Adding value to data
- Data sharing for re-use
- Data preservation for later re-use
[Slide legend marks each activity as relating to open data or to data curation]
http://www.dcc.ac.uk/news/what-makes-data-curation

Digital Curation Centre

DDI Alliance
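DDI (Data Documentation Initiative) is the metadata standard maintained by the DDI Alliance, and the email above attaches a DDI file for the 1979 study. As a minimal sketch of how a reconstructed code list can be recorded alongside the data, the Python below builds a DDI-Codebook-style fragment with the standard library. The element names (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `dataDscr`, `var`, `labl`, `catgry`, `catValu`) follow DDI Codebook naming, but the fragment omits the namespace and many required elements, so it is not a valid DDI instance; the variable name and codes are hypothetical.

```python
import xml.etree.ElementTree as ET


def codebook_fragment(title, var_name, var_label, categories):
    """Build a simplified DDI-Codebook-style fragment for one variable.
    Namespaces and many required elements are deliberately omitted."""
    code_book = ET.Element("codeBook")
    stdy_dscr = ET.SubElement(code_book, "stdyDscr")
    citation = ET.SubElement(stdy_dscr, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title

    data_dscr = ET.SubElement(code_book, "dataDscr")
    var = ET.SubElement(data_dscr, "var", name=var_name)
    ET.SubElement(var, "labl").text = var_label
    # One catgry per code: the stored value and what it means.
    for value, label in categories.items():
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = value
        ET.SubElement(catgry, "labl").text = label
    return code_book


root = codebook_fragment(
    "Agriculture Production (Annual Census of Farms Year Ended 30 June 1979)",
    "FARMTYPE",                        # hypothetical variable name
    "Farm type",
    {"01": "Dairy", "02": "Sheep"},    # hypothetical codes
)
print(ET.tostring(root, encoding="unicode"))
```

Once codes and their meanings live in a machine-readable file like this, the next person to ingest the data does not have to repeat the hours of detective work.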

Open data brings benefits and risks
- more users
- highlights data curation failures
- justifies data curation costs
- pressure for more user support
- expands expert community
- increases risk of poor analysis

Complementary ideas
Actively curated data will:
- remain technologically accessible
- be easier to understand (and therefore use)
Data curation will benefit from data being made more open:
- data that is in active use tends to remain usable
- widely used data is better understood than isolated data

Thank you
Contact details:
Hamish James
Manager, Information Management
hamish.james@stats.govt.nz
04 931 4237