PRIPRAVA PODATKOV IN DOKUMENTACIJE

Slides:



Advertisements
Similar presentations
DDI for the Uninitiated ACCOLEDS /DLI Training: December 2003 Ernie Boyko Statistics Canada Chuck Humphrey University of Alberta.
Advertisements

DLI Training Nesstar Workshop
Data Documentation Initiative (DDI) Workshop Carol Perry Ernie Boyko April 2005 Kingston Ontario.
RSS. March HB/The Data Archive. The RSS Working Group on Data preservation and sharing: standards for documenting data for preservation and secondary.
Useful tools for ESRC Research Centres
Foundational Objects. Areas of coverage Technical objects Foundational objects Lessons learned from review of Use Case content Simple Study Simple Questionnaire.
Metadata 8/7/2012 Katie Moss Digital Metadata Technician, Digital Library Services
An Leabharlann UCD Órna Roche UCD James Joyce Library Metadata Documenting your data
Documenting the Resource Malcolm Polfreman
Arja Kuula: The DDI and Qualitative data IASSIST2001 Amsterdam, May 2001 Finnish Social Science Data Archive.
Reusable!? Or why DDI 3.0 contains a recycling bin.
Metadata: An Introduction By Wendy Duff October 13, 2001 ECURE.
IASSIST Conference 2006 – Ann Arbor, May Metadata as report and support A case for distinguishing expected from fielded metadata Reto Hadorn S I.
Codebook Centric to Life-Cycle Centric In the beginning….
Data Management: Documentation & Metadata Types of Documentation.
ISO as the metadata standard for Statistics South Africa
Data Documentation Initiative (DDI): Goals and Benefits Mary Vardigan Director, DDI Alliance.
Data Exchange Tools (DExT) DExT PROJECTAN OPEN EXCHANGE FORMAT FOR DATA enables long-term preservation and re-use of metadata,
Curating and Managing Research Data for Re-Use Review & Processing Jared Lyle.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
6. Implications for Analysis: Data Content. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2.
DLI Training April 2004 Kingston Ontario. DDI What, Why, How?
A Metadata Application Profile for the DRIADE Project Sarah Carrier, Jed Dube, Jane Greenberg March 13, 2007 _____________________.
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
Data documentation and metadata for data archiving and sharing Managing research data well workshop London, 30 June 2009 Manchester, 1 July 2009.
BLAISE to DDI Vipavc Irena, ADP, Slovenia CESSDA - Seminar, September, 2004.
Introduction to Metadata, the DDI and the Metadata Editor Presentation to the SERPent project team by Margaret Ward 3 March 2010.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, September, 2011 Documentation and Cataloguing in Data.
Metadata management: DDI and Nesstar at the Czech Social Science Data Archive Jindrich Krejci & Yana Leontiyeva Data without Boundaries, Ljubljana 24 &
Metadata Management and Tools August 1, 2013 Data Curation Course.
Gateway to Global Aging Data September 17 th, 2014 APRU Data Workshop Drystan Phillips.
A centre of expertise in digital information managementwww.ukoln.ac.uk DCMI Affiliates: Implications for Institutions Rosemary Russell UKOLN University.
Looking into the future… Providing Social Science Data Services Jim Jacobs.
COMMON COMMUNICATION FORMAT (CCF). Dr.S. Surdarshan Rao Professor Dept. of Library & Information Science Osmania University Hyderbad
Sharing Digital Scores: Will the Open Archives Initiative Protocol for Metadata Harvesting Provide the Key? Constance Mayer, Harvard University Peter Munstedt,
Santi Thompson - Metadata Coordinator Annie Wu - Head, Metadata and Bibliographic Services 2013 TCDL Conference Austin, TX.
HETUS Pilot Group 8 Privacy procedures and ethical issues Kimberly Fisher, Centre for Time Use Research – co-ordinator External consultant Kai Ludwigs.
METADATA ORGANISATION ESDS APPROACHES AND RESOURCES …………………………………………
Ingest – Workflow Irena Vipavc Brvar ADP SEEDS Workshop I Belgrade, October.
TIPI PODATKOV. Načrt Najprej je potrebno dobro premisliti o problemu Katere podatke hranimo, kako podatke razporediti v tabele, kakšne vrste podatkov.
Ingest – Acquisition and deposit Irena Vipavc Brvar ADP SEEDS Workshop I Belgrade, October.
Report Writing Lecturer: Mrs Shadha Abbas جامعة كربلاء كلية العلوم الطبية التطبيقية قسم الصحة البيئية University of Kerbala College of Applied Medical.
Metadata requirements for archiving structured data Alice Born Statistics Canada Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (9-11 April.
Installfest delavnica mag. Aleš Košir Lugos
Metadata standards Using DDI to Inform, Organize, and Drive Survey Data Production.
Division of HIV/AIDS Managing Questionnaire Development for a National HIV Surveillance Survey, Medical Monitoring Project Jennifer L Fagan, Health Scientist/Interview.
Metadata models to support the statistical cycle: IMDB
Professional Development Programme: Design and Development of Institutional Repository Using DSpace Nipul G Shihora INFLIBNET Centre Gandhinagar
Using secondary data.
Active Data Management in Space 20m DG
Data Management: Documentation & Metadata
Cataloging Tips and Tricks
DDI for the Uninitiated
Application of Dublin Core and XML/RDF standards in the KIKERES
Arhiv družboslovnih podatkov:
Binarna logistična regresija
ARHIV DRUŽBOSLOVNIH PODATKOV DRUGI DOSTOPI DO PODATKOV
Uporaba programa NESSTAR za sprotno pregledovanje podatkov
Irena Vipavc Brvar, Sonja Bezjak
Informacijska varnost v Oracle okolju
Updates on the XSLT stylesheets for DDI
Secondary Data Analysis Lec 10
Datasets in CRM Site Proposal
Simona Šabić, Association DrogArt Addictions 2017,
The role of metadata in census data dissemination
Indicator 3.05 Interpret marketing information to test hypotheses and/or to resolve issues.
Data Collection and Experimental Design
STEPS Site Report.
The Role of Metadata in Census Data Dissemination
Presentation transcript:

PRIPRAVA PODATKOV IN DOKUMENTACIJE STRATEGIJA PRIPRAVE PODATKOVNIH DATOTEK in DOKUMENTACIJE Irena Vipavc Brvar Oktober 2013

Dokumentacija Priprava kakovostne tehnične dokumentacije ali tako imenovane kodirne knjige je lahko časovno izredno zahtevna. Tukaj govorimo tako o ustrezni pripravi podatkovne datoteke (mikro podatkov) kot spremljajočega gradiva (metapodatkov) Podrobni vodiči so dostopni na spletnih straneh angleškega (1, 2) in ameriškega arhiva. V nadaljevanju pa sledi nekaj krajših napotkov.

$ € £ + + + + Raziskovalno osebje Primarni raziskovalec Sodelujoči <DDI 3.0> Questions Instrument <DDI 3.0> Variables Physical Stores + <DDI 3.0> Purpose Concepts Universe Geography People/Orgs <DDI 3.0> Funding Revisions + + + <DDI 3.0> Data Collection Data Processing $ € £ Data Arhiv/ Repozitorij Podan predlog projekta Publikacije 3

KAJ HRANITI ? KAKO HRANITI ? STRATEGIJA KAJ HRANITI ? KAKO HRANITI ? Pomembno: Pripravi varnostne kopije pomembnih datotek Nudi ustrezno pretvorbo formatov - Zagotovi avtentičnost in kontroliran dostop do hranjenih in delovnih datotek - Zagotovi kontrolo verzij - Nudi primerni pomnilnik - Zagotovi varnost Kaj hraniti: podatke, spremljajočo dokumentacijo, informacije o vzorčenju,... podatke, ki se lahko zgubijo. Spremljajoča dokumentacija naj vsebuje informacije kot izvor podatkov; kaj je bil osnovni namen zbiranja; kdo so bili avtorji in naročniki oz. sponzorji; kako so bili podatki zbrani; kakšni so pravni pogoji uporabe podatkov; opis spremenljivk; kako so bili podatki združeni – kodirna shema; v kakšnem formatu je hranjena računalniško berljiva podatkovna datoteka; na katerem mediju je hranjena.......

PRIPRAVA PODATKOVNE DATOTEKE 1 Tvorimo vnašalnik, ki bo omogočal preverjanje vrednosti vnosov. Programsko izvozimo celotne opise spremenljivk iz vprašalnika v podatkovno datoteko (se izognemo številnim napakam, ki bi nastala pri ročnem vnosu) Preverimo kvaliteto vnosa (naključnih 10%, prvih in zadnjih 5%) Ločimo osebe, ki kodirajo z osebami, ki vnašajo podatke. Zahtevna kodiranja (poklicne kode) opravijo izkušeni posamezniki. Kar lahko naredi računalnik ne dejmo sami.

PRIPRAVA PODATKOVNE DATOTEKE 2 Imena spremenljivk Opisi spremenljivk Združevanje skupin spremenljivk Kode in kodiranje Manjkajoče vrednost Variable Names Devise standard variable names and choose methods of construction and length of variable names. Variable Labels Devise labels to include item or questions number, indication of variable content, and whether constructed or derived from other items. Variable Groups Variable groups and corresponding variable group lists in the codebook are an effective way of organising a dataset and are especially recommended if a collection contains a large number of variables. Codes and Coding Codes should assure that all statistical software packages will be able to handle the data and should promote greater measurement comparability. Principles are available in various guides for most coding situations (see, for example, the ICPSR Guide (pp. 8-10)). Missing Data Careful planning on methods for handling and identifying missing data should be made at the very start, to ensure that imputation and missing data handling will be possible during analysis.

Pomembno je razlikovati med različnimi oblikami manjkajočih vrednosti: MANJKAJOČE VREDNOSTI Če je le mogoče ne uporabljamo praznih polj ampak definiramo vrednosti. Le v primeru če manjka večina odgovorov pri enoti/ ali odgovor na spremenljivko v ponovljivi raziskavi. Pomembno je razlikovati med različnimi oblikami manjkajočih vrednosti: Zavnitev /neodgovor Ne vem Napaka v postopku Neustrezen Imputacija manjkajočih vrednosti! (original + izpeljana) Koda za manjkajoče vrednosti (97,98,99 / -1, -9) – problem s št. mest . Refusal/no answer. The subject explicitly refused to answer the question or did not answer the question when he or she should have. 2. Don’t know. The subject was unable to answer the question, either because he or she had no opinion or because the required information was not available (e.g., a respondent could not provide family income in dollars for the previous year). 3. Processing error. For some reason, there is no answer to the question, although the subject provided one. This can result from interviewer error, incorrect coding, machine failure, or other problems. 4. Not applicable. The subject was never asked the question for one reason or another. Sometimes this results from “skip patterns” that occur, for example, when subjects who are not working are not asked questions about job characteristics. Other examples are sets of items asked only of random subsamples and items asked of one member of a household but not another. 5. No match. This situation may arise when data are drawn from different sources (for example, a survey questionnaire and an administrative database), and information from one source cannot be located.

ČIŠČENJE PODAKTOVNE DATOTEKE Uporaba sintakse! – sledljivost sprememb GET FILE='pb1101.sav_vaja' /drop= [seznam spremenljivk, ki jih takoj na začetku izločim iz datoteke]. MISSING VALUES q1 q2 q481b w q695b q695a q702 q696 q704 (3). VARIABLE LABELS omr 'Omrezna skupina'. VALUE LABELS omr 1 '01 - Ljubljana' 2 '02 - Maribor, Murska Sobota, Ravne na Koroskem' 3 '03 - Celje, Trbovlje' 4 '04 - Kranj' 5 '05 - Koper, Nova Gorica, Postojna' 7 '07 - Novo Mesto, Krsko'. EXECUTE . format wskup (F1.0). COMPUTE anketa = 1 . COMPUTE anketa = DATE.DMY(15,1,2004) . formats anketa(moyr8). SAVE OUTFILE='\pb0501.sav' keep= [seznam spremenljivk, ki jih ohranim; v vrstenm redu kot ga želim v datoteki] . Variable Names Devise standard variable names and choose methods of construction and length of variable names. Variable Labels Devise labels to include item or questions number, indication of variable content, and whether constructed or derived from other items. Variable Groups Variable groups and corresponding variable group lists in the codebook are an effective way of organising a dataset and are especially recommended if a collection contains a large number of variables. Codes and Coding Codes should assure that all statistical software packages will be able to handle the data and should promote greater measurement comparability. Principles are available in various guides for most coding situations (see, for example, the ICPSR Guide (pp. 8-10)). Missing Data Careful planning on methods for handling and identifying missing data should be made at the very start, to ensure that imputation and missing data handling will be possible during analysis.

VERZIRANJE IN AVTENTIFIKACIJA Odločimo se koliko in katere verzije bomo hranili, kako dolgo in kako naj organiziramo verziranje. Enolično poimenovanje z uporabo sistematičnega principa. Hranimo verzijo in status datoteke, npr osnutek, vmesni dokument, končna verzija, interni dokument. Dokumentiramo spremembe med verzijami datotek. Dokumentiramo povezavo med členi ali datotekami, kadar je to potrebno. Dokument o lokacijah hrambe datotek, kadar so te na različnih mestih. Redno sinhroniziramo datoteke na različnih lokacijah. Ohranimo eno glavno datoteko v ustreznem formatu, da se izognemo problemom paralelnega dela na datoteki. Indetificiramo eno lokacijo za hrambo glavnih in mejnih verzij. Točka 2: uniquely identify files using a systematic naming convention.

Koristi doslednega označevanje datotek, so: Podatkovne datoteke so razlikujejo med seboj. -Ustrezno poimenovanje datotek preprečuje zmedo, ko več ljudi dela na skupnih datotekah. -Podatkovne datoteke je lažje iskati. -Podatkovne datoteke lahko prikliče ne le ustvarjalec, ampak tudi drugi uporabniki. -Podatkovne datoteke lahko razporedimo v logično zaporedje. -Podatkovnih datotek ni mogoče pomotoma prepisati ali izbrisati. -Mogoče je prepoznati različne različice datotek. -Če se podatkovne datoteke preseli v drugo platformo za shranjevanje bodo njihova imena ohranila koristne informacije.

Koristi doslednega označevanje datotek, so: Pri imenovanju in označevanju datotek raziskav je potrebno misliti predvsem na naslednje tri kriterije: • Organizacija - pomembno za prihodnji dostop do gradiv in nj. priklic • Okvir - to lahko vključuje vsebinsko specifične ali opisne informacije, neodvisno od kje so podatki shranjeni • Doslednost - izberite poimenovanje in zagotovite, da se pravila sledijo in da se sistematično vključujejo iste informacije (kot so datum in čas), v istem vrstnem redu (npr. DDMMLLLL).

DOKUMENTACIJA - Metapodatki DOKUMENTIRAJ PODATKE Z UPORABO STRUKTIRANIH OBLIK Večina CESSDA arhivov danes spodbuja pripravo strukturirane dokumentacije v standardu DDI. Tako zapisane informacije omogočajo enostaven izpis v več oblikah in prenos med različnimi institucijami, ki dokumentacijo uporabljajo v nadaljnjem procesu. Dokumentacijo poimenujmo Metapodatki. Metapodatke lahko definiramo kot vse informacije potrebne za obveščanje in procesiranje statističnih struktur (Grossmann v Vipavc in Klep, 2003).

DDI (Data Documentation Initiative ) STANDARDI DDI (Data Documentation Initiative ) MARC, Dublin Core (bibliographic standards) SDMX (Statistical Data and Metadata Exchange) ISO 11179 (Metadata Registries) FGDC (Digital Geospatial Metadata) ISO 19115 (Geographic Information Metadata) PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission)

Enostaven Dublin Core (15 elementov) Title Format Creator Identifier Subject Source Description Language Publisher Relation Contributor Coverage Date Rights Type

OSNOVNA RAZDELITEV DDI ELEMENTOV DDI vključuje sledeče pomembne elemente Opis dokumenta Opis raziskave Opis podatkovne datoteke Opis spremenljivke Opis dodatnih in zunanjih, povezanih gradiv

DDI

DDI

DDI 3.0 Life Cycle Orientation DDI 3.0 documents all stages in the life cycle of a data collection: pre-production production post-production secondary use DDI 2.0 new research effort

Kontrolni seznam ravnanja s podatki Ste pridobili pismeno privolitev? Ste prepričani o tem kdo je lastnik vaših podatkov? Uporabljate standardizirane in konsistentne merske enote in postopke za pridobivanje in preverjanje podatkov? Ste sekundarnim uporabnikom priskrbeli zadostne informacije o metodologiji raziskave, pridobivanju in obdelavi podatkov? So podatki in spremljajoče datoteke označene ustrezno? Ali uporabljate priporočene in sodobne datotečne formate za shranjevanje podatkov? Ali je potrebno vaše podatke anonimizirati? Ali so kopije podatkov – digitalne in fizične – shranjene na varni lokaciji? Ste naredili varnostno kopijo datotek? Ali veste katera različica datoteke s podatki je originalna? Ste vključili tudi informacije za analizo, npr. izpeljane spremenljivke ali povzetke intervjujev?

Reference in predlogi za branje van den Eynden, V., Corti, L., Woollard, M., Bishop, L. and Horton, L. (2011). Managing Research Data:. Best Practice for Researchers. Colchester: UK Data Archive, University of Essex. [www.data-archive.ac.uk/media/2894/managingsharing.pdf] Borgman, Christine L. (2010). Research Data: Who will share what, with whome, when, and why? China-North Americal Library Conference. http://works.bepress.com/borgman/238/] National Science Foundation (2010). Data Management & Sharing FAQ. NSF. [http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp] Australian Government, National Health and Medical Research Council, Australian Reseach Council (2007): Australian Code for the Responsible Conduct of Research [http://www.nhmrc.gov.au/_files_nhmrc/publications/attachments/r39.pdf] Digital Curation Centre – DCC(2010). Data management plans. [http://www.dcc.ac.uk/resources/data-management-plans] Jones, Sarah (2011). Develop a Data Management and Sharing Plan. Digital Curation Centre. [http://www.dcc.ac.uk/resources/how-guides/develop-data-plan] MIT Libraries (2010). Data management and publishing: organizing your files. [http://libraries.mit.edu/guides/subjects/data-management/organizing.html]