Dirk Roorda, coordinator infrastructure.

Presentation transcript:

Dirk Roorda, coordinator infrastructure

Overview
- Part 1: The rising role of data
- Part 2: The free use of data
- Part 3: The care for data
- Part 4: The re-use of data

Part 1: The rising role of data
Internet size (May 2009): 500 EB
- = 500,000 PB
- = 500 million TB
- = 500 million fat USB disks
- = 500 billion memory cards of 1 GB
- = 70 memory cards per person
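The chain of equivalents on this slide is plain unit arithmetic. The sketch below redoes it with decimal (SI) units, starting from the 500 EB figure on the slide. The world population value is an assumption (roughly 6.8 billion in 2009) and is not stated in the presentation; the function and variable names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

# Figure quoted on the slide for May 2009.
internet_size = 500 * EB

# Assumption, not from the slides: about 6.8 billion people in 2009.
WORLD_POPULATION_2009 = 6_800_000_000


def convert(size_bytes: int) -> dict:
    """Express a size in bytes in the units used on the slide."""
    cards = size_bytes // GB  # number of 1 GB memory cards
    return {
        "petabytes": size_bytes // PB,
        "terabytes": size_bytes // TB,
        "gigabyte_cards": cards,
        "cards_per_person": cards / WORLD_POPULATION_2009,
    }


result = convert(internet_size)
for unit, value in result.items():
    print(unit, value)
```

Under that population assumption the result is a little over 70 cards per person, which is consistent with the rounded figure on the slide.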

Data deluge

Where does it come from?
- Instruments: satellites, sensors, DNA sequencing
- Records: administrations, censuses, surveys
- Digitisation: the analog legacy
- Hobby: pictures, movies, genealogy
- Integration: better interoperability of existing data

The driving force
Information and Communication Technology
Babbage Analytical Engine 1870

A datacenter
- Genealogy
- 2.5 PB
- 5328 servers
- 1.12 MW

A closer look
- Linguistics: text corpora, automatic translation
- Philology: how to read a million books?
- History: historical census data
- Archaeology: archive law, commercial research

Linguistics and Philology
- A chronometric approach to Indian alchemical literature
- Assessing frequency changes in multistage diachronic corpora
- Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
- A Corpus Study of the Rigveda
- Dictionary generation for less-frequent language pairs using WordNet
- An exercise in non-ideal authorship attribution: the mysterious Maria Ward

History

Archaeology

Archaeology (2)

Part 2: The free use of Data

Open Access
- Data is information
- Information is knowledge
- Knowledge is power
- Why share it?

Open Access
- Shared knowledge is double knowledge
- Without free sharing of knowledge, scientific progress will halt
- Tensions between sharing and not sharing remain, though

A good Example

Work to do
- organise your data
- let your data work together with those of others (colleagues, future scientists, the public)
- ask new questions of the data, because there is so much of it
- create new (virtual) data collections

Part 3: The care for data

Research Data
- Recycling existing data
- collecting by experiments, surveys
- primary research data
- verifying results by others
- preserving unique data from experiments
- compilation, aggregation, annotation
- databanks
- data mining, analysis, visualisation
- new data as research input

Challenge: Software
- Operating system (DOS, Windows 95, ...)
- Programming languages (Basic, Pascal)
- File formats (WordPerfect, dBase)
- Applications (address book, websites)
Old data may be locked up in old software.

Meeting the challenge
To prevent the problem in the future:
- Backward compatibility
- Open Standards
- Open Source Applications
- Modular software engineering: keep data separated from interface and business logic
To remedy the problems of the past:
- Emulation
- Migration
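As an illustration of migration, the sketch below converts a small legacy export into an open, current form. Everything specific here is an assumption made for the example and not taken from the presentation: the sample bytes, the semicolon-separated layout, and the choice of DOS code page 437 as the source encoding. It uses only the Python standard library.

```python
import csv
import io

# Hypothetical legacy export: semicolon-separated text in DOS code page 437.
# In cp437, byte 0x82 is "é" and byte 0x81 is "ü".
legacy_bytes = b"name;city\r\nRen\x82;Z\x81rich\r\n"


def migrate(raw: bytes, source_encoding: str = "cp437") -> str:
    """Decode a legacy export and re-serialise it as comma-separated text."""
    text = raw.decode(source_encoding)
    rows = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


migrated = migrate(legacy_bytes)
# Store the result as UTF-8, an open and widely supported encoding.
utf8_bytes = migrated.encode("utf-8")
print(migrated)
```

The data itself is unchanged; only the encoding and delimiter move to conventions that current software reads without the original application.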

Challenge: Human organisation
- Forgotten jargon
- Forgotten knowledge
- No metadata
- Websites with broken links

Jargon
- II.17. Posterior berry aneurysm with subarachnoid bleed.
- II.18. Subarachnoid bleed with extension into the ventricles.
- II.19. Ruptured berry aneurysm at the end of the internal carotid artery, with obstructive hydrocephalus. Morgagni found the rupture.
- II.22. Subarachnoid hemorrhage.

Meeting the challenge
- Persistent Identifiers
- Enough Metadata
- Codification of knowledge and practices
- Wikipedia
- Data management early on
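Persistent identifiers address the broken-link problem through one level of indirection: citations carry the identifier, and a resolver maps it to wherever the data currently lives. The sketch below shows only that idea. The identifier, the URLs, and the function names are hypothetical, and it does not represent any particular identifier system.

```python
# Hypothetical resolver table: persistent identifier -> current location.
# The identifier and both URLs are made up for illustration.
resolver = {
    "pid:example-0001": "https://old.example.org/data/0001",
}


def resolve(pid: str) -> str:
    """Return the current location registered for a persistent identifier."""
    return resolver[pid]


def relocate(pid: str, new_location: str) -> None:
    """Record that the data has moved; the identifier itself never changes."""
    resolver[pid] = new_location


before = resolve("pid:example-0001")
relocate("pid:example-0001", "https://archive.example.org/datasets/0001")
after = resolve("pid:example-0001")
print(before, "->", after)
```

A publication that cites `pid:example-0001` keeps working after the move, because only the resolver entry had to be updated.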

Part 4: The re-use of data

Data management
- Use common infrastructure rather than private means
- Use open formats rather than proprietary formats
- Use open source software rather than closed software
- Use standard ways of documenting data: taxonomies, ontologies, metadata schemes
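A standard metadata scheme makes it possible to check documentation mechanically. The slides do not name a scheme, so the sketch below uses the fifteen Dublin Core elements as a widely used example. The sample record and the helper name are hypothetical, and which elements an archive actually requires is a policy choice that this sketch does not settle.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def missing_elements(record: dict) -> list:
    """List the Dublin Core elements that are absent or empty in a record."""
    return [
        name for name in DUBLIN_CORE_ELEMENTS
        if not str(record.get(name, "")).strip()
    ]


# Hypothetical dataset description, for illustration only.
record = {
    "title": "Historical census data",
    "creator": "Example, A.",
    "date": "2009",
    "format": "text/csv",
    "identifier": "pid:example-0001",
    "rights": "Open Access",
}

gaps = missing_elements(record)
print(gaps)
```

Running such a check when a dataset is created, rather than years later, is one practical reading of "data management early on".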

Common Infrastructure
- Local file shares
- University repository
- DANS
- European Infrastructures

DANS

EASY

Dataset

Datafiles

Metadata

linguists make their technology accessible - resources, algorithms, techniques
humanities and social sciences - they are the target users

Geleerdenbrieven = Circulation of Knowledge
Archiving = circulation of information

Keep imagining