Dirk Roorda, coordinator infrastructure.

Presentation on theme: "Dirk Roorda, coordinator infrastructure."— Presentation transcript:

1 http://www.dans.knaw.nl Dirk Roorda, coordinator infrastructure

2

3 Overview Part 1: The rising role of data; Part 2: The free use of data; Part 3: The care for data; Part 4: The re-use of data

4 Part 1: The rising role of data http://en.wikipedia.org/wiki/Exabyte Internet size (May 2009): 500 EB; 500,000 PB; 500 million TB; 500 million fat USB disks; 500 billion memory cards of 1 GB; 70 memory cards per person
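The conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units and a world population of roughly 6.8 billion in 2009; the population figure is an assumption and does not appear on the slide.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

# Figure quoted on the slide (May 2009).
internet_size = 500 * EB

# Assumption, not from the slide: roughly 6.8 billion people in 2009.
WORLD_POPULATION_2009 = 6_800_000_000

petabytes = internet_size // PB
terabytes = internet_size // TB
gigabyte_cards = internet_size // GB
cards_per_person = gigabyte_cards / WORLD_POPULATION_2009

print(f"{petabytes:,} PB")
print(f"{terabytes:,} TB")
print(f"{gigabyte_cards:,} memory cards of 1 GB")
print(f"about {cards_per_person:.0f} cards per person")
```

Under that population assumption the per-person figure lands in the low seventies, which is consistent with the slide's rounded value of 70.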

5 Data deluge http://www.datadeluge.com/ http://en.wikipedia.org/wiki/File:Tree_of_life_SVG.svg http://tolweb.org/tree/

6 Where does it come from? Instruments: satellites, sensors, DNA sequencing. Records: administrations, censuses, surveys. Digitisation: the analog legacy. Hobby: pictures, movies, genealogy. Integration: better interoperability of existing data.

7 The driving force Information and Communication Technology; Babbage Analytical Engine, 1870

8 A datacenter Genealogy: 2.5 PB, 5328 servers, 1.12 MW http://blog.familytreemagazine.com/insider/Inside+Ancestrycoms+TopSecret+Data+Center.aspx http://www.ancestry.com/

9 A closer look Linguistics: text corpora, automatic translation. Philology: how to read a million books? History: historical census data. Archaeology: archive law, commercial research.

10 Linguistics and Philology A chronometric approach to Indian alchemical literature; Assessing frequency changes in multistage diachronic corpora; Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets; A Corpus Study of the Rigveda; Dictionary generation for less-frequent language pairs using WordNet; An exercise in non-ideal authorship attribution: the mysterious Maria Ward http://llc.oxfordjournals.org/

11 History http://www.volkstellingen.nl/nl/

12 http://www.volkstellingen.nl/en/

13 Archaeology http://edna.itor.org/nl/intern/upload_directory/a00002/downloads/IMG0013.tif

14 Archaeology (2) http://edna.itor.org/nl/oai/oai_addi/oai_addi/OAI:EVALMA:a00002.xml/

15 Part 2: The free use of data

16 Open Access Data is information. Information is knowledge. Knowledge is power. Why share it?

17 Open Access Shared knowledge is double knowledge. Without free sharing of knowledge, scientific progress will halt. Tensions between sharing and not sharing remain, though.

18 A good example http://www.ploscompbiol.org/home.action

19

20

21 Work to do Organise your data; let your data work together with those of others (colleagues, future scientists, the public); ask new questions of the data, because there is so much of it; create new (virtual) data collections

22 Part 3: The care for data

23 Research Data Recycling existing data; collecting by experiments, surveys; primary research data; verifying results by others; preserving unique data from experiments; compilation, aggregation, annotation; databanks; data mining, analysis, visualisation; new data as research input

24 Challenge: Software Operating systems (DOS, Windows 95, ...); programming languages (Basic, Pascal); file formats (WordPerfect, dBase); applications (address book, websites). Old data may be locked up in old software.

25 Meeting the challenge To prevent the problem in the future: backward compatibility; open standards; open source applications; modular software engineering (keep data separated from interface and business logic). To remedy the problems of the past: emulation; migration.
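Migration, the second remedy on this slide, can be sketched as follows: records from a legacy fixed-width file in an old DOS code page are decoded and rewritten as CSV text that can be stored as UTF-8. This is a minimal illustration only; the field layout, the sample record and the choice of code page 437 are hypothetical and are not taken from the presentation.

```python
import csv
import io

# Hypothetical fixed-width layout of a legacy export: (field name, start, end).
LAYOUT = [("name", 0, 12), ("city", 12, 24), ("year", 24, 28)]


def migrate(legacy_bytes: bytes, source_encoding: str = "cp437") -> str:
    """Decode a fixed-width legacy file and return its content as CSV text."""
    text = legacy_bytes.decode(source_encoding)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row taken from the layout, so the CSV is self-describing.
    writer.writerow([name for name, _, _ in LAYOUT])
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        writer.writerow([line[start:end].strip() for _, start, end in LAYOUT])
    return out.getvalue()


# Build a sample record the way an old DOS program might have stored it.
sample = ("Müller".ljust(12) + "Leiden".ljust(12) + "1987").encode("cp437") + b"\r\n"

csv_text = migrate(sample)
print(csv_text)
```

The point of the design is that the result is plain text in an open format with named columns, so reading it no longer depends on the original application.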

26 Challenge: Human organisation Forgotten jargon; forgotten knowledge; no metadata; websites with broken links

27 Jargon II.17. Posterior berry aneurysm with subarachnoid bleed. II.18. Subarachnoid bleed with extension into the ventricles. II.19. Ruptured berry aneurysm at the end of the internal carotid artery, with obstructive hydrocephalus. Morgagni found the rupture. II.22. Subarachnoid hemorrhage. http://www.pathguy.com/morgagni.htm

28 Meeting the challenge Persistent identifiers; enough metadata; codification of knowledge and practices; Wikipedia; data management early on
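The idea behind persistent identifiers is a level of indirection: citations carry a stable name, and a resolver maps that name to wherever the resource currently lives, so a moved file does not become a broken link. The sketch below illustrates only that principle. The `Resolver` class, the identifier and the URLs are hypothetical and do not describe any particular identifier service.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Maps stable identifiers to the current location of a resource."""

    locations: dict = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        """Record where a newly identified resource can be found."""
        self.locations[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        """Update the location; the identifier itself never changes."""
        if pid not in self.locations:
            raise KeyError(f"unknown identifier: {pid}")
        self.locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        """Return the current location for an identifier."""
        return self.locations[pid]


resolver = Resolver()
# Hypothetical identifier and URLs, for illustration only.
resolver.register("example:dataset-0001", "https://old-server.example.org/data/1")
resolver.move("example:dataset-0001", "https://archive.example.org/datasets/0001")

print(resolver.resolve("example:dataset-0001"))
```

A publication that cites `example:dataset-0001` stays valid after the move, because only the resolver's table had to change.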

29 Part 4: The re-use of data

30 Data management Use common infrastructure rather than private means. Use open formats rather than proprietary formats. Use open source software rather than closed software. Use standard ways of documenting data: taxonomies, ontologies, metadata schemes.
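As an illustration of a standard way of documenting data, the sketch below writes a small descriptive record using Dublin Core element names. Dublin Core is chosen here as one widely used metadata scheme; the slides do not say which scheme is meant, and every field value in the example is made up.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe(fields: dict) -> str:
    """Build an XML record with one Dublin Core element per field."""
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical dataset description, for illustration only.
xml_text = describe(
    {
        "title": "Example census extract",
        "creator": "A. Researcher",
        "date": "2009",
        "format": "text/csv",
        "identifier": "example:dataset-0001",
    }
)
print(xml_text)
```

Because the element names come from a shared scheme, another archive or a harvester can interpret the record without knowing anything about the project that produced it.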

31 Common Infrastructure Local file shares; university repository; DANS; European infrastructures

32 DANS http://easy.dans.knaw.nl/dms

33 EASY

34 Dataset

35 Datafiles

36 Metadata

37

38 Linguists make their technology accessible: resources, algorithms, techniques. Humanities and social sciences: they are the target users.

39

40 Geleerdenbrieven = Circulation of Knowledge Archiving = circulation of information

41

42

43

44 Keep imagining

