1
Janez Štebe DDI Experience in ADP (2002) Arhiv družboslovnih podatkov (ADP) University of Ljubljana E-mail: arhiv.podatkov@uni-lj.si URL: http://www.adp.fdv.uni-lj.si MOST (UNESCO) and GESIS workshop, Berlin, 22-24 February 2002
3
Topics of the presentation: A brief history of technical standards and their influence on the organisation of data archives; The adoption of DDI in 1999; Advantages and disadvantages of using an existing but still emerging standard; What are XML and DDI? Quick look inside the DDI DTD document structure; DDI XML codebook production line in ADP; Discussion
4
A brief history of data archives' technical standards (Tanenbaum, Taylor 1990). Late 1950s: IBM cards, easily reproduced and recycled; the advent of data archives (DA). 1960s: electronic computers; the end of storage standards. The task of data conversion and interchange; DA matured.
5
Beginning of the WWW era in the early 1990s (DDI Committee, 2001): CSSDA electronic codebook specification; OSIRIS Codebook Dictionary (SRC, ICPSR); Standard study description. But a lack of coordination resulted in incompatible catalogues.
6
“Midwife function” (Scheuch, 1990): The role of ZA in the late 1960s, when 5 new archives were established in Europe: “offers to share experiences, especially of past errors”; “technical information on data storage and retrieval”
7
Situation in 1997, when ADP was established: “Multiplicity of classificatory languages, search techniques and standards for documenting data” (DDI Committee, 2001). Every organisation adopted its own dialect of existing standards. The CESSDA IDC was the lone example of a still-active integration effort.
8
But... DDI was under discussion. March 1999: the DDI Beta version became operable. ADP applied for a grant which secured six months of intensive learning and practice in producing its own XML codebooks. Results: 1. Successful implementation of the first ten XML codebooks. 2. An enhanced production line for routine codebook production.
9
2000-2001: Preparation of our own XSL for presenting XML codebooks on the internet. March 2000: DDI DTD Version 1.0 was published. Machine conversion of DDI DTD Beta XML codebooks into version 1.0. Continuing production of XML codebooks.
10
NESSTAR: Meanwhile, the NESSTAR tool was being refined in parallel, promising to add functionality to a growing collection of XML codebooks. End of 2001: configuration of the ADP NESSTAR server catalogue.
12
Advantages and disadvantages of using an existing but still emerging standard. (+) No need to (re)invent local cataloguing rules. (+) Cooperation in document production (sharing documents between sites). (-) A danger of being left alone if others do not adopt the same standard. (-) Less scope to add specific emphasis according to local needs.
13
+/-: (+) Use of existing and emerging software tools suited to the standard environment: virtual catalogue; conversion tools from SPSS and CAI software files. (-) Dependency on others' timetables for tool production, e.g. NESSTAR was late in fully adopting the UTF-8 convention, which was crucial for us.
14
What is XML? “XML is to a document’s intellectual content what HTML is to the physical structure of that document” (Thomas, Block 2001)
15
Why XML? XML can be written without professional or expert knowledge (user-friendly). It is ready for preparing presentations in multiple formats, e.g. printed book, internet, etc. It can be filled in by different authors, each with specialist knowledge of their subject area. All obey the same content structure.
16
DDI DTD <> XML? DTD = XML Document Type Definition. DDI DTD = a specific Data Documentation Initiative definition of an XML codebook. A codebook XML document must be “well-formed” and “valid”.
17
Well-formed: Any XML document, e.g. XHTML, can be well-formed, i.e. in accordance with the XML syntax. Main features: every element must be closed; names are case-sensitive (“UPPER–lower”); only one root element per document.
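The well-formedness rules on this slide (closed elements, case-sensitive names, a single root) can be tried out with any XML parser. The following is a minimal sketch using Python's standard library; the sample strings are invented for illustration and are not taken from an ADP codebook.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element, every element closed, consistent case.
good = "<codeBook><stdyDscr>Example study</stdyDscr></codeBook>"

# Opening and closing names differ in case.
bad_case = "<codeBook><stdyDscr>Example study</STDYDSCR></codeBook>"

# The stdyDscr element is never closed.
unclosed = "<codeBook><stdyDscr>Example study</codeBook>"

# Two top-level elements instead of a single root.
two_roots = "<codeBook/><codeBook/>"

for sample in (good, bad_case, unclosed, two_roots):
    print(is_well_formed(sample), sample)
```

This only tests syntax; it says nothing about whether the document conforms to the DDI DTD.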
18
Valid = well-formed + conforms to a specific DTD. Example: an underlined path calls...
19
... a file "CONFIG10/CODEBOOK-EN.DTD"> (content of the file): ... <!ELEMENT codeBook (docDscr*, stdyDscr+, fileDscr*, dataDscr*, otherMat*)> ...
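The content model in the `<!ELEMENT codeBook ...>` declaration reads: any number of docDscr, then at least one stdyDscr, then any number of fileDscr, dataDscr and otherMat, in that order. As a rough illustration of what a validating parser checks at this one level, the sketch below rewrites that model as a regular expression over child element names. It is not a DTD validator, and the function name `root_matches_model` is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# The quoted content model, expressed over a sequence of child names,
# each name followed by a single space.
CODEBOOK_MODEL = re.compile(
    r"(docDscr )*(stdyDscr )+(fileDscr )*(dataDscr )*(otherMat )*"
)


def root_matches_model(xml_text):
    """Check only the direct children of codeBook against the model."""
    root = ET.fromstring(xml_text)
    if root.tag != "codeBook":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return CODEBOOK_MODEL.fullmatch(sequence) is not None


ok = "<codeBook><docDscr/><stdyDscr/><dataDscr/></codeBook>"
no_study = "<codeBook><docDscr/></codeBook>"
wrong_order = "<codeBook><dataDscr/><stdyDscr/></codeBook>"

for sample in (ok, no_study, wrong_order):
    print(root_matches_model(sample), sample)
```

A real validation run would hand the whole DTD to a validating parser or XML editor rather than re-encode one declaration by hand.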
20
What does it all mean? You do not have to look in the “machine-readable” “codebook.DTD” file to fill in an XML codebook. An XML editor helps to check well-formedness and document validity. It helps in choosing appropriate elements in accordance with the DTD while editing. A “human-readable” Tag Library consists of element definitions with practical examples. It gives you guidance on the type and form of information.
21
Let’s look inside the DDI DTD document structure...
22
Integrates different levels of information in the same document: docDscr (XML document and sources description); stdyDscr (overall study + study-level references); fileDscr (physical data files); dataDscr (variables); otherMat (additional material for variables documentation).
23
It specifies both... the content of the catalogue: suitable as input to a virtual catalogue of different sites, produced on various platforms; and the content of the codebook (variables description): suitable as input to a “virtual library of all individual measurements in the studies in a collection”.
24
A dilemma of Library vs. Data service concept (Scheuch, 1990). Library: the unit of storage is the “study”. Data service: the unit of storage is the variable.
25
In a DDI DTD XML codebook you can integrate meta-information about... Intellectual content of a study Its scope Methodological details Retrieval and dissemination policies File location and format
26
(+) References to accompanying documents, e.g. Reports on methodology, Publications, Classifications lists, Questionnaires and similar, Computer syntax files, Tables of results, etc.
27
(+) Hyperlink cross-references inside and outside document The use of ID and IDRefs attributes The use of URI attributes
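Cross-references by ID and IDREF only help if every reference points at an identifier that exists in the document. A validating parser enforces this for attributes the DTD declares as ID/IDREF; the sketch below imitates that check by hand. The attribute names are assumptions for the example: `ID` as the identifier attribute and `sdatrefs` as a space-separated reference attribute; the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Assumed attribute names for this sketch only.
ID_ATTR = "ID"
REF_ATTRS = ("sdatrefs",)


def dangling_references(xml_text):
    """Return referenced identifiers that no element in the document declares."""
    root = ET.fromstring(xml_text)
    known_ids = {el.get(ID_ATTR) for el in root.iter() if el.get(ID_ATTR)}
    missing = []
    for el in root.iter():
        for attr in REF_ATTRS:
            for ref in el.get(attr, "").split():
                if ref not in known_ids:
                    missing.append(ref)
    return missing


sample = """<codeBook>
  <stdyDscr><sumDscr ID="SD1"/></stdyDscr>
  <dataDscr>
    <var name="v1" sdatrefs="SD1"/>
    <var name="v2" sdatrefs="SD1 SD9"/>
  </dataDscr>
</codeBook>"""

print(dangling_references(sample))
```

Here `SD9` is reported because nothing in the sample carries that identifier.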
28
To sum up: XML is similar to HTML in that it is easy to use, broadly accessible and hyper-textual. In addition, it has a computer- and human-readable and understandable structure of document content.
29
DDI XML codebook production line in ADP. First step: 1. Basic information about the new data set file, the depositor, and accompanying material is first entered in the ADP inventory book (an Access database). 2. After choosing the best-suited predefined XML DDI codebook template, we extract the information from the Access database into the draft XML codebook. 3. The resulting codebook is moved to an internet catalogue for quick information about the new study; viewing is supported by the referenced XSL through IE5 or later.
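The template-filling part of this first step might look roughly like the sketch below. It is not ADP's actual tooling: the record is a plain dictionary standing in for a row exported from the inventory database, the field names (`study_id`, `title`, `depositor`) and the tiny template are hypothetical, and the element paths are an assumed mapping chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced codebook template.
TEMPLATE = """<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl/><IDNo/></titlStmt>
      <distStmt><depositr/></distStmt>
    </citation>
  </stdyDscr>
</codeBook>"""

# Hypothetical inventory record (stand-in for a database row).
record = {
    "study_id": "EXAMPLE01",
    "title": "Example survey",
    "depositor": "Example depositor",
}

# Assumed mapping from inventory fields to template elements.
FIELD_TO_PATH = {
    "title": "stdyDscr/citation/titlStmt/titl",
    "study_id": "stdyDscr/citation/titlStmt/IDNo",
    "depositor": "stdyDscr/citation/distStmt/depositr",
}


def fill_template(template, record):
    """Parse the template and copy each record field into its element."""
    root = ET.fromstring(template)
    for field, path in FIELD_TO_PATH.items():
        root.find(path).text = record[field]
    return root


draft = fill_template(TEMPLATE, record)
print(ET.tostring(draft, encoding="unicode"))
```

Building the draft through a parser rather than string substitution keeps special characters in titles escaped correctly.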
30
Second step: Full study description. 1. The depositor is requested to fill in an MS Word form containing elements corresponding to the DDI DTD study description section. 2. The draft XML codebook from the previous step is edited with the XMetaL® XML editor. Missing pieces of information are added manually.
31
Third step: Codebook data description generated from the SPSS data file. 1. The final SPSS data file, if fully labelled, is converted with the NSD XML Generator® into the XML data description section of the DDI codebook and integrated with the previous study description.
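To give an idea of what such a conversion produces, the sketch below turns variable and value labels into a data description section. It does not reproduce the NSD XML Generator; the input list is a hypothetical stand-in for the labels a fully labelled SPSS file would provide, and only a few elements (var, labl, catgry, catValu) are emitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable metadata, as a labelled data file might supply it.
variables = [
    {"name": "v1", "label": "Sex of respondent",
     "values": {"1": "Male", "2": "Female"}},
    {"name": "v2", "label": "Age in years", "values": {}},
]


def build_data_description(variables):
    """Create a dataDscr element with one var per variable."""
    data_dscr = ET.Element("dataDscr")
    for meta in variables:
        var = ET.SubElement(data_dscr, "var", name=meta["name"])
        ET.SubElement(var, "labl").text = meta["label"]
        # One category per labelled value code.
        for code, text in meta["values"].items():
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = code
            ET.SubElement(catgry, "labl").text = text
    return data_dscr


data_dscr = build_data_description(variables)
print(ET.tostring(data_dscr, encoding="unicode"))
```

Unlabelled variables or values would come out with empty descriptions, which is why the slide stresses "if fully labelled".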
32
Step 4: Codebook data description with full question text. 1. For the most important data sets, the full question text is entered into the dataDscr section from the original questionnaire text file, or 2. by using a conversion tool from CAI computer-readable files to DDI XML files.
33
Finally: NESSTAR®. The final two documents, the Slovene and English language DDI XML codebooks, are converted into a NESSTAR-compliant format and, together with the data file, published in a NESSTAR catalogue.
34
[Diagram: ADP codebook production flow. Labels: Original paper documents; Free-text documents; stdyDscr form filled in by depositor; SPSS data + labels, CAI questionnaires; Conversion tools; Codebook.xml (XML editor) with docDscr, stdyDscr, fileDscr, dataDscr, otherMat...; Codebook.dtd; Tag Library; Codebook.xsl; IE view; Printed codebook; NESSTAR Catalogue + Data Explorer. Readability labels: Computer readable; Human + computer readable; Human readable.]
35
Common issues in DDI XML codebook production: 1. XML editors do not necessarily support Unicode. 2. The use of entities in an XML document helps to standardise document production and makes it faster and easier to translate into English.
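An entity lets a recurring phrase be declared once and reused, so a translated edition only needs a different declaration. A minimal sketch follows, with a hypothetical entity name `archive` and a toy document that is not valid against the DDI DTD; the parser expands the entity wherever it is referenced.

```python
import xml.etree.ElementTree as ET

# The entity is declared once in the internal DTD subset and used twice.
DOCUMENT = """<!DOCTYPE codeBook [
  <!ENTITY archive "Arhiv družboslovnih podatkov (ADP)">
]>
<codeBook>
  <docDscr>&archive;</docDscr>
  <stdyDscr>&archive;</stdyDscr>
</codeBook>"""

root = ET.fromstring(DOCUMENT)
for section in root:
    print(section.tag, "->", section.text)
```

The non-ASCII characters in the entity value also show why Unicode support in the editor matters for Slovene codebooks.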
36
Conclusions: The DDI DTD receives growing attention in the community, which guarantees the production of new tools for enhancing its use. Despite continuing developments and overlapping archival standards, DDI 1.0 as today’s technology promises the longevity of XML Codebook 1.0 documents. The Slovene ADP has taken its experience with DDI as guidance for its organisation.
37
Main references
DDI Committee (2001): The Data Documentation Initiative (DDI) Version 1.1: The New Specification for Social Science Metadata. Project Description.
Data Documentation Initiative. A Project of a Social Science Community. (2002) http://www.icpsr.umich.edu/DDI
Scheuch, Erwin K. (1990): From a data archive to an infrastructure for the social sciences. International Social Science Journal, No. 123, pp. 93-111.
Tanenbaum, Eric and Marcia Taylor (1990): Developing social science archives. International Social Science Journal, No. 124?.
Thomas, Wendy L. and William C. Block (2001): An Introduction to the Data Documentation Initiative (DDI). ICPSR OR Meeting 2001. http://www.icpsr.umich.edu/DDI/PAPERS/