1
http://www.dans.knaw.nl Dirk Roorda, coordinator infrastructure
3
Overview
Part 1: The rising role of data
Part 2: The free use of data
Part 3: The care for data
Part 4: The re-use of data
4
Part 1: The rising role of data
http://en.wikipedia.org/wiki/Exabyte
Internet size (May 2009):
500 EB
500,000 PB
500 million TB
500 million fat USB disks
500 billion memory cards of 1 GB
about 70 memory cards per person
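The conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal SI prefixes and a 2009 world population of about 6.8 billion; the population figure is an assumption and does not appear on the slide.

```python
# Unit sizes in bytes, using decimal SI prefixes.
EB = 10**18
PB = 10**15
TB = 10**12
GB = 10**9

# Figure quoted on the slide for May 2009.
internet_size_bytes = 500 * EB

# Assumption: world population in 2009 of about 6.8 billion (not from the slide).
WORLD_POPULATION_2009 = 6_800_000_000

petabytes = internet_size_bytes // PB
terabytes = internet_size_bytes // TB
gigabyte_cards = internet_size_bytes // GB
cards_per_person = gigabyte_cards / WORLD_POPULATION_2009

print(f"{petabytes:,} PB")
print(f"{terabytes:,} TB")
print(f"{gigabyte_cards:,} memory cards of 1 GB")
print(f"roughly {cards_per_person:.1f} cards per person")
```

With this population estimate the last figure comes out slightly above 70, which is consistent with the rounded number on the slide.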
5
Data deluge
http://www.datadeluge.com/
http://en.wikipedia.org/wiki/File:Tree_of_life_SVG.svg
http://tolweb.org/tree/
6
Where does it come from?
Instruments: satellites, sensors, DNA sequencing
Records: administrations, censuses, surveys
Digitisation: the analog legacy
Hobby: pictures, movies, genealogy
Integration: better interoperability of existing data
7
The driving force
Information and Communication Technology
Babbage Analytical Engine, 1870
8
A datacenter
Genealogy: 2.5 PB, 5328 servers, 1.12 MW
http://blog.familytreemagazine.com/insider/Inside+Ancestrycoms+TopSecret+Data+Center.aspx
http://www.ancestry.com/
9
A closer look
Linguistics: text corpora, automatic translation
Philology: how to read a million books?
History: historical census data
Archaeology: archive law, commercial research
10
Linguistics and Philology
A chronometric approach to Indian alchemical literature
Assessing frequency changes in multistage diachronic corpora
Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
A Corpus Study of the Rigveda
Dictionary generation for less-frequent language pairs using WordNet
An exercise in non-ideal authorship attribution: the mysterious Maria Ward
http://llc.oxfordjournals.org/
11
History http://www.volkstellingen.nl/nl/
12
http://www.volkstellingen.nl/en/
13
Archaeology http://edna.itor.org/nl/intern/upload_directory/a00002/downloads/IMG0013.tif
14
Archaeology (2) http://edna.itor.org/nl/oai/oai_addi/oai_addi/OAI:EVALMA:a00002.xml/
15
Part 2: The free use of data
16
Open Access
Data is information
Information is knowledge
Knowledge is power
Why share it?
17
Open Access
Shared knowledge is double knowledge
Without free sharing of knowledge, scientific progress will halt
Tensions between sharing and not sharing remain, though
18
A good example
http://www.ploscompbiol.org/home.action
21
Work to do
organise your data
let your data work together with those of others (colleagues, future scientists, the public)
ask new questions of the data, because there is so much of it
create new (virtual) data collections
22
Part 3: The care for data
23
Research Data
recycling existing data
collecting by experiments, surveys
primary research data
verifying results by others
preserving unique data from experiments
compilation, aggregation, annotation
databanks
data mining, analysis, visualisation
new data as research input
24
Challenge: Software
Operating systems (DOS, Windows 95, ...)
Programming languages (Basic, Pascal)
File formats (WordPerfect, dBase)
Applications (address books, websites)
Old data may be locked up in old software.
25
Meeting the challenge
To prevent the problem in the future:
Backward compatibility
Open Standards
Open Source Applications
Modular software engineering: keep data separated from interface and business logic
To remedy the problems of the past:
Emulation
Migration
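Migration, in its simplest form, means reading records out of a legacy layout and writing them to an open format. The sketch below assumes a hypothetical fixed-width export (the field layout and the sample records are invented for illustration) and converts it to CSV with a header row.

```python
import csv
import io

# Hypothetical fixed-width layout of a legacy export: (field name, start, end).
LAYOUT = [("name", 0, 20), ("year", 20, 24), ("place", 24, 40)]


def parse_record(line):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def migrate(lines):
    """Convert fixed-width lines to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[name for name, _, _ in LAYOUT],
        lineterminator="\n",
    )
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_record(line))
    return out.getvalue()


# Invented sample records, padded to the assumed column widths.
legacy = [
    "Jansen, Pieter".ljust(20) + "1899" + "Leiden".ljust(16),
    "de Vries, Anna".ljust(20) + "1909" + "Den Haag".ljust(16),
]

csv_text = migrate(legacy)
print(csv_text)
```

The parsing step is kept apart from the writing step, so the same records could later be written to another open format without touching the code that understands the old layout.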
26
Challenge: Human organisation
Forgotten jargon
Forgotten knowledge
No metadata
Websites with broken links
27
Jargon
II.17. Posterior berry aneurysm with subarachnoid bleed.
II.18. Subarachnoid bleed with extension into the ventricles.
II.19. Ruptured berry aneurysm at the end of the internal carotid artery, with obstructive hydrocephalus. Morgagni found the rupture.
II.22. Subarachnoid hemorrhage.
http://www.pathguy.com/morgagni.htm
28
Meeting the challenge
Persistent Identifiers
Enough Metadata
Codification of knowledge and practices
Wikipedia
Data management early on
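What "enough metadata" means differs per archive, but the idea can be sketched as a record with a persistent identifier plus a check for empty fields. The field names below follow Dublin Core element names; the set of required fields, the identifier, and the sample values are assumptions made for illustration, not the requirements of any particular repository.

```python
import json

# Assumption: a small set of required fields, named after Dublin Core elements.
REQUIRED_FIELDS = ("identifier", "title", "creator", "date", "format", "rights")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


record = {
    "identifier": "urn:example:dataset-0001",  # placeholder, not a real persistent identifier
    "title": "Historical census tables (sample)",
    "creator": "Example Research Group",
    "date": "2009-05-01",
    "format": "text/csv",
    "rights": "",  # left empty on purpose to show the check
}

print("missing:", missing_fields(record))
print(json.dumps(record, indent=2))
```

Running such a check when a dataset is created, rather than years later, is one practical reading of "data management early on".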
29
Part 4: The re-use of data
30
Data management
Use common infrastructure rather than private means
Use open formats rather than proprietary formats
Use open source software rather than closed software
Use standard ways of documenting data: taxonomies, ontologies, metadata schemes
31
Common Infrastructure
Local file shares
University repository
DANS
European Infrastructures
32
DANS http://easy.dans.knaw.nl/dms
33
EASY
34
Dataset
35
Datafiles
36
Metadata
38
linguists make their technology accessible: resources, algorithms, techniques
humanities and social sciences: they are the target users
40
Geleerdenbrieven = Circulation of Knowledge
Archiving = circulation of information
44
Keep imagining