Data Archiving and Networked Services. DANS is an institute of KNAW and NWO. Census data, CEDAR and the future of Digital Archiving: changing ideas, challenges & opportunities

Presentation transcript:

Census data, CEDAR and the future of Digital Archiving: changing ideas, challenges & opportunities
Peter Doorn, Data Archiving and Networked Services
CEDAR Mini Symposium, Amsterdam, 31st March 2014

Contents
– Two slides about DANS
– Why digitize historical censuses?
– History of the census digitization projects
– Results: CD-ROMs, websites, publications
– Digital preservation of the first “digitally born” census of 1960
– Projects and activities since 2006
– Challenges for the years to come

What is DANS?
– Institute of the Dutch Academy and the Research Funding Organisation (KNAW & NWO) since 2005
– First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive since 1989
– Mission: promote and provide permanent access to digital research information

Our services
– EASY: Electronic Archiving System for self-deposit
– NARCIS: Gateway to scholarly information in the Netherlands
– Data Seal of Approval
– Persistent Identifier URN:NBN resolver

Why digitize historical censuses?
– Important source for statistics and research
– Limited number of census books
– Preservation of 19th and 20th century originals
– Digital archiving
– Target audience: researchers, students, local governments, amateur historians, education

Systematic digitization of Dutch census books
– 1995/96: possibility raised in talks between CBS and the Steinmetz Archive
– 1996: small pilot by CBS and the Netherlands Historical Data Archive
  – Selection of material
  – How to digitize?
  – How to store?
  – How to publish?
  – Project plan for continuation project

Digitization in three projects:
– Microfilming and scanning 200 books, 42,500 pages
– Data entry 10,000 pages Census – March 2004:
– Checking and correction censuses and 1930
– Archiving digitally born censuses 1960 and 1971
March 2003 – July 2006: Life Courses in Context
– First project in the humanities funded by NWO “large investments”
– In collaboration with the Historical Sample of the Netherlands (Kees Mandemakers, IISG)
– Data entry censuses
– Scanning handwritten tables 1947 and OCR tests
– Documentation, harmonisation, “linking”, access, research

Digitizing censuses: division of tasks
– Collaboration project of CBS and NHDA/NIWI/DANS since early 1997
– Subsidized by NWO and KNAW
CBS:
– data entry of tables, Census 1899
– StatLine publication
NIWI:
– scanning of the censuses
– OCR of the Introduction to Census 1899
– first website, Census 1899

Results 1999
– Set of 5 CD-ROMs: images of censuses (200 books, c. 42,500 pages)
– Set of 2 CD-ROMs: database Census 1899, 27 books – pages > 17,000,000 numbers/characters
– Introduction to Census 1899 (also as website)
– StatLine publication: tables of 1899
– Images 1899
– Conference & book with analyses of the Census 1899 (2001)

CD-ROM publications in September 1999

Book publications [related projects: Historical GIS, HASH, HDNG]

Website of the Introduction to the Census, launched in September 1999

Census 1899 also published in CBS StatLine

The 1960 census: the first born-digital census in the Netherlands
– First computer at CBS: Electrologica X1
– 1969: punch cards transferred to the Steinmetz Archive
– Kf. 100 needed for reconstructing files
– Bitrot, data input errors and more…
W B’(‘N3=‘)’5ZD,10B SC2+NSC3); ‘,/’)’); B’(‘N3=‘)’5ZD,10B 1790

The size of the problem
Persons   Missing persons   Persons too many
Men       183,970           254,100
Women     182,755             7,661
Total     366,725           261,761
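The slide does not say how these counts were obtained. One reading, offered here only as an assumption, is that person records recovered from the 1960 files were compared with the published totals area by area: an area with fewer records than published adds to "missing persons", an area with more adds to "persons too many", which would explain why both columns are large at the same time. A minimal Python sketch of that tally; the `AreaCount` type, area names and counts are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class AreaCount:
    area: str            # hypothetical area identifier
    published: int       # total printed in the census publication
    reconstructed: int   # person records recovered from the digital files


def tally_differences(counts):
    """Return (missing, too_many) summed over all areas.

    Each area contributes to one side only, so shortfalls in one area
    are not netted against surpluses in another.
    """
    missing = 0
    too_many = 0
    for c in counts:
        diff = c.reconstructed - c.published
        if diff < 0:
            missing += -diff
        else:
            too_many += diff
    return missing, too_many


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    sample = [
        AreaCount("area_1", published=1000, reconstructed=950),
        AreaCount("area_2", published=800, reconstructed=830),
        AreaCount("area_3", published=500, reconstructed=500),
    ]
    print(tally_differences(sample))  # (50, 30)
```

The published table is internally consistent with such a split by sex: 183,970 + 182,755 = 366,725 missing persons and 254,100 + 7,661 = 261,761 persons too many.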

Launch button for the completely renewed website, launched in November 2004

Web statistics
– visitors (3,300 per month)
– 2 million page views
– 0.5 TB of data downloaded

Projects and activities since 2006
– Digitization of “transparencies” and collotypes
– NLGIS: historical GIS
– Checking and correction
– Harmonisation
– Archiving in EASY
– Scanning historical data at CBS & CBS website
– HISTEL project
– CEDAR project

Digitization of “transparencies” and collotypes (early photocopies)
[Table: total overview of collotypes/transparencies by census (BDT, BRT, VT, WT), giving volumes, pages, table contents, printed stub column, blank cells, totals and characters per page; remarks note microfiches, material already or partly digitally available, and VT 1971 printed from digital files. Totals are given overall and excluding material already digitally available.]

2006: Scanning and OCR of transparencies
– Scan record attempt, February 2005: Census 1947, C pages scanned in one day

Manual data entry of the 1947 Census
– Templates prepared for each table type
– Data entry carried out by Xerox (India)
– Supervision by Jan Jonker
– Archived in and available from DANS EASY

Project idea, June 2009: new portal for historical population data

Checking and correction
– Most underestimated task of the project
– Ongoing work since 1999…
– Distinction between data-entry/conversion errors and source errors
– Data-entry errors are corrected
– Error detection method based on differences between calculated and given row and column totals
– Source errors are indicated with notes…
– Tom Vreugdenhil is the hero of error checking and correction
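The error detection idea on this slide, comparing calculated row and column sums with the totals printed in the source, can be sketched in a few lines of Python. This is an illustration, not the project's actual tooling; the function names and the sample table are hypothetical. A cell where a failing row crosses a failing column is the first place to look, but whether it is a data-entry error (to be corrected) or a source error (to be annotated) still has to be decided against the scan.

```python
def check_totals(cells, row_totals, col_totals):
    """Compare calculated sums with the totals printed in the source table.

    cells      -- table body as a list of rows of integers
    row_totals -- printed total for each row
    col_totals -- printed total for each column
    Returns (kind, index, calculated, given) for every mismatch.
    """
    mismatches = []
    for i, row in enumerate(cells):
        calculated = sum(row)
        if calculated != row_totals[i]:
            mismatches.append(("row", i, calculated, row_totals[i]))
    for j, given in enumerate(col_totals):
        calculated = sum(row[j] for row in cells)
        if calculated != given:
            mismatches.append(("column", j, calculated, given))
    return mismatches


def suspect_cells(mismatches):
    """Cells where a failing row crosses a failing column."""
    rows = [m[1] for m in mismatches if m[0] == "row"]
    cols = [m[1] for m in mismatches if m[0] == "column"]
    return [(r, c) for r in rows for c in cols]


if __name__ == "__main__":
    # Hypothetical table: the value 15 was keyed in as 51 in row 1, column 1.
    body = [[10, 20, 30],
            [5, 51, 25]]
    printed_rows = [60, 45]
    printed_cols = [15, 35, 55]
    found = check_totals(body, printed_rows, printed_cols)
    print(found)                 # [('row', 1, 81, 45), ('column', 1, 71, 35)]
    print(suspect_cells(found))  # [(1, 1)]
```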

Harmonisation
Three key variables:
– Occupations
– Municipalities
– Religious denomination

Harmonizing occupations
– Occupations available for 1849, 1889, 1899, 1909, 1920, 1930 and 1947
– Coded according to the Historical International Standard Classification of Occupations (HISCO)
– Results:
  – Coded occupations and exact content and context of each table with unique occupational titles (Excel & Access)
  – Total of all unique occupational titles in the censuses (Excel & Access)
  – Excel workbook Lookup tool to code occupations automatically
  – Excel workbook HISCO toolbar to search for codes, occupational titles and descriptions of occupations in the HISCO database
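The slide mentions an Excel Lookup tool that codes occupational titles automatically. The same idea, an exact match on a cleaned-up title with unmatched titles left for manual coding, might look like this in Python. The normalization rules, function names and the codes in the example are hypothetical placeholders, not actual HISCO codes or the rules of the original workbook.

```python
import re
import unicodedata


def normalize_title(title):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def build_lookup(pairs):
    """Build a lookup from (occupational title, code) pairs, keyed on normalized titles."""
    return {normalize_title(title): code for title, code in pairs}


def code_occupations(titles, lookup):
    """Return (coded, uncoded): title -> code, and titles left for manual coding."""
    coded, uncoded = {}, []
    for title in titles:
        code = lookup.get(normalize_title(title))
        if code is None:
            uncoded.append(title)
        else:
            coded[title] = code
    return coded, uncoded


if __name__ == "__main__":
    # Placeholder codes for illustration; these are NOT real HISCO codes.
    lookup = build_lookup([("Landbouwer", "CODE-1"), ("Smid", "CODE-2")])
    coded, uncoded = code_occupations(["landbouwer", "Smid.", "Timmerman"], lookup)
    print(coded)    # {'landbouwer': 'CODE-1', 'Smid.': 'CODE-2'}
    print(uncoded)  # ['Timmerman']
```

Keeping unmatched titles in a separate list, rather than forcing a fuzzy match, leaves the ambiguous cases to a human coder.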

Harmonizing municipalities
– Based on the work by Onno Boonstra and Ad van der Meer, “Repertorium van Nederlandse gemeenten”
– New standard code (“Amsterdam code”) for all Dutch municipalities that have ever existed
– Database tool to code municipalities in the censuses
Table columns: ID, amsterdamse_code, begindatum (start date), einddatum (end date), gemeente_prov, gemeente (municipality), provincie (province)
Example entries: Almenum (Friesland), Zuidlaren (Drenthe), Tynaarlo (Drenthe), Zeddam (Gelderland), Zijpe (Noord-Holland), Opsterland (Friesland), Ureterp (Friesland)
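Because municipalities were merged, split and renamed, a name alone does not identify a unit; start and end dates make it possible to pick the code valid on a census date. A minimal Python sketch of such a lookup, assuming a repertory with the amsterdamse_code, gemeente, provincie, begindatum and einddatum columns listed on the slide; the `code_for` function, the municipality "Voorbeelddorp", its codes and its dates are all hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Municipality:
    amsterdamse_code: str       # standard municipality code
    gemeente: str               # municipality name
    provincie: str              # province
    begindatum: date            # first day the municipality existed
    einddatum: Optional[date]   # last day it existed; None if it still exists


def code_for(name, province, on, repertory):
    """Return the code of the municipality `name` in `province` on date `on`, or None."""
    for m in repertory:
        if m.gemeente == name and m.provincie == province:
            if m.begindatum <= on and (m.einddatum is None or on <= m.einddatum):
                return m.amsterdamse_code
    return None


if __name__ == "__main__":
    # Hypothetical example: "Voorbeelddorp" is dissolved in a merger and a new
    # municipality with the same name receives its own code.
    repertory = [
        Municipality("10001", "Voorbeelddorp", "Utrecht", date(1811, 1, 1), date(1923, 12, 31)),
        Municipality("10002", "Voorbeelddorp", "Utrecht", date(1924, 1, 1), None),
    ]
    print(code_for("Voorbeelddorp", "Utrecht", date(1899, 12, 31), repertory))  # 10001
    print(code_for("Voorbeelddorp", "Utrecht", date(1930, 12, 31), repertory))  # 10002
    print(code_for("Voorbeelddorp", "Utrecht", date(1800, 1, 1), repertory))    # None
```

Matching on name and province together avoids confusing different municipalities that shared a name in different provinces.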

CBS Historical Collection website: 19th and 20th century publications

HISTEL project
Umbrella project to oversee the various census activities that are going on, supervised by René van Horik:
– Transfer of data, website
  – new agreement between CBS and DANS
  – publish as extended data guide / paper in new DANS data journal
– “Anonymous open access” to the census data in EASY
– Archiving of existing data and newly scanned tables in EASY
– Version management, updating corrected tables
– Liaison with CEDAR
www.volkstellingen.nl

Archiving everything in EASY

Why a CEDAR project?
– Great examples of LOD projects on new census data
  – Are they applicable to historical tables?
– The historical censuses are stored in numerous containers in an archival silo
  – Can we open up the containers and silos to connect the data?
  – Can we make the data comparable over time?
  – Can we link it to outside sources?
– Is it viable to publish the whole DANS archive as LOD?
  – Provide insight into the possibilities for more data collections
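One way to open up a census table as Linked Open Data is to turn every cell into an observation. The slide does not name a vocabulary; the sketch below assumes the W3C RDF Data Cube vocabulary (qb:Observation, qb:dataSet), a common choice for statistical tables, and writes N-Triples using only the Python standard library. The example.org namespace, the `observation_triples` function, the dimension names and the value are hypothetical.

```python
from urllib.parse import quote

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://example.org/census/"  # hypothetical namespace


def iri(value):
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def observation_triples(dataset, obs_id, dimensions, measure, value):
    """Return N-Triples lines describing one table cell as a qb:Observation."""
    subject = iri(f"{BASE}obs/{quote(obs_id, safe='')}")
    triples = [
        (subject, iri(RDF_TYPE), iri(QB + "Observation")),
        (subject, iri(QB + "dataSet"), iri(f"{BASE}dataset/{quote(dataset, safe='')}")),
    ]
    for prop, code in dimensions.items():
        # Each dimension (e.g. sex, municipality) points to a code IRI.
        triples.append((subject,
                        iri(f"{BASE}dimension/{quote(prop, safe='')}"),
                        iri(f"{BASE}code/{quote(str(code), safe='')}")))
    # The cell value itself, typed as an integer literal.
    triples.append((subject,
                    iri(f"{BASE}measure/{quote(measure, safe='')}"),
                    f'"{int(value)}"^^{iri(XSD_INTEGER)}'))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in observation_triples("VT1899-table1", "row3-col2",
                                    {"sex": "male", "municipality": "10001"},
                                    "population", 1234):
        print(line)
```

In this sketch, linking across censuses would come from reusing the same code IRIs for harmonized municipalities and occupations in every year's dataset.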

Lots of challenges left…
– CEDAR: publishing the historical censuses as LOD
  – First priority for linking: linking the census data over time
  – Further harmonization is a prerequisite for this
  – LOD offers new insight into the extent of the harmonization problem and a systematic solution (we expect ;-)
– Archiving LOD
  – PRELIDA (PREserving LInked DAta) project offers insight into the requirements and options
  – Storing the RDF is only part of the answer
– Lots of images of historical census tables left to turn into figures
– Preserving the census services: no longer supported, NLGIS tool already gone (www.volkstellingen.nl)
– Wish for 2020: a user-friendly tool to link historical census data over time and to external sources

Thank you for your attention