Published by Miranda Houston. Modified over 9 years ago.

Data Management (1)
“Application of Information and Communication Technology to Production and Dissemination of Official Statistics”, 10 May – 11 July 2006
M Q Hasan, Lecturer/Statistician
UN Statistical Institute for Asia and the Pacific (UN-SIAP), Chiba, Japan
Email: hasan@unsiap.or.jp

Overview
– Data management
– Data management planning
– Data management procedures
– Data management software
– Hands-on experience
– References

Data management and the national statistics office (NSO)
– Data management during production: individual cases
– Data management after production: individual cases
– Data management for all cases: long term

Data management
– Management of data files
– Management of files during analysis
– Management of files afterwards

Data management: management of data files
– Labeling data files
– Documentation

Data management: management of files during analysis
– Version management
– Subsetting data
– Arranging files in separate folders
– Index files
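
Version management and subsetting lend themselves to simple automation. The Python sketch below is one possible approach, not a prescribed tool: it assumes a hypothetical naming convention `<base>_vNN.csv` and data held as a list of dictionaries, and the helper names `next_version_path` and `save_subset` are illustrative only.

```python
import csv
import re
from pathlib import Path


def next_version_path(folder: Path, base: str, ext: str = ".csv") -> Path:
    """Return the path for the next version of a file named <base>_vNN<ext>.

    Assumes the hypothetical convention base_v01.csv, base_v02.csv, ...
    """
    pattern = re.compile(rf"^{re.escape(base)}_v(\d{{2}}){re.escape(ext)}$")
    versions = [
        int(m.group(1))
        for p in folder.iterdir()
        if (m := pattern.match(p.name))
    ]
    return folder / f"{base}_v{max(versions, default=0) + 1:02d}{ext}"


def save_subset(rows, columns, folder: Path, base: str) -> Path:
    """Write only the selected columns of `rows` (a list of dicts) as a new version."""
    path = next_version_path(folder, base)
    with path.open("w", newline="") as f:
        # extrasaction="ignore" drops the columns that are not in the subset
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Because each call picks the next free version number, saving a subset twice under the same base name produces `subset_v01.csv` and then `subset_v02.csv`; earlier versions are never overwritten.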

Data management: management of files afterwards
– Pass them to the system administrator for future reference

DATA MANAGEMENT

These will lead to:
– Production of credible data
– Design of a robust, efficient, flexible, and accessible storage system
– Efficient procedures for sharing data with others

Data management before and during data processing

Planning during data processing (DP):
– Define the relevant aspects of a dataset.
– Formulate a data preservation strategy.
– Design an access procedure.

Defining the relevant aspects of a dataset
– File format and file structure
– Naming files
– Creation and naming of variables
– Variable labels
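
A simple automated check can enforce a variable-naming convention and make sure every variable has a label. The sketch below assumes a hypothetical convention (lowercase letters, digits, and underscores, starting with a letter, at most 8 characters, a limit some older statistical packages imposed); the pattern should be replaced by the organization's own rules.

```python
import re

# Hypothetical convention: lowercase, starts with a letter, at most 8 characters.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]{0,7}$")


def check_variables(labels: dict[str, str]) -> list[str]:
    """Return a list of problems found in a {variable name: label} mapping."""
    problems = []
    for name, label in labels.items():
        if not NAME_PATTERN.match(name):
            problems.append(f"{name}: name does not follow the naming convention")
        if not label.strip():
            problems.append(f"{name}: missing variable label")
    return problems


variables = {
    "hhid": "Household identification number",
    "age": "Age of child in completed years",
    "WorkLastYear": "Did the child work last year?",
    "sex": "",
}
for problem in check_variables(variables):
    print(problem)
```

Running the check before data entry starts is cheaper than renaming variables once programs and documentation already refer to them.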

Defining the relevant aspects of a dataset (continued)
– Choose the file structure according to the available computing resources and the experience of the data processors.

Defining the relevant aspects of a dataset: documentation
– Assign responsibility for logging all processing activities
– Problems encountered
– How problems are to be solved
– Major decisions taken
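
A processing log is easiest to keep when every entry has the same fields. The sketch below is a minimal illustration, assuming a JSON Lines file (one JSON object per line) and hypothetical field names that mirror the items to be logged: the activity, the problem encountered, how it was solved, and any decision taken.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_activity(log_file: Path, processor: str, activity: str,
                 problem: str = "", solution: str = "", decision: str = "") -> dict:
    """Append one processing-log entry as a JSON line and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "processor": processor,
        "activity": activity,
        "problem": problem,
        "solution": solution,
        "decision": decision,
    }
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def read_log(log_file: Path) -> list[dict]:
    """Read all entries back, oldest first."""
    with log_file.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending rather than rewriting keeps the log in chronological order, and the file can be read by any JSON-aware tool.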

DP: Documentation
– Can be time-consuming.
– Should contain all information about the data, such as the survey method, sample information, time of collection, information about variables, and missing values.
– Should start well before actual data processing begins.
– Should follow standards.
– Preferably kept in one file that references the other files.

DP: Documentation
– Title: Child labour in Portugal: Social characterization of school-age children and their families, 1998.
– Subtitle: Child labour in Portugal, 1998.
– Alternative title: SIMPOC Portugal survey, 1998.
– Parallel title: Trabalho Infantil em Portugal: Caracterização social dos menores em idade escolar e suas famílias, 1998.

DP: Documentation
– Keywords. For example: national survey, child, economic activity, child labour, household, household chores.
– Abstract. Purpose, nature, and scope of the child labour data collection; special characteristics of the contents.
– Time period covered. If the data were collected in 1999 and one question was “Did you work last year?”, the time period covered should be 1998-99.

DP: Documentation
– Date of collection. Date(s) when the data were collected.
– Country. Name of the country where the survey was conducted.
– Geographic coverage. Total geographic scope of the data.
– Geographic unit. Lowest level of geographic aggregation covered by the data, for example province, state, or district.
– Unit of analysis. For most child labour surveys, the basic unit of analysis or observation is the individual person.

DP: Documentation
– Time method. Panel, cross-sectional, trend, time series, etc.
– Data collector. The entity responsible for administering the questionnaire or interview, or for compiling the data, e.g. the NSO.
– Frequency of data collection. For example, a first-time collection.
– Sampling procedure. Reference to the sampling documents.

DP: Documentation
– Mode of data collection. CAPI (computer-assisted personal interviewing), CATI (computer-assisted telephone interviewing), etc.
– Type of research instrument. Structured, semi-structured, or open-ended questions, etc.
– Actions to minimize losses. E.g. follow-up visits, supervisory checks, historical matching.
– Control operations. Methods used to facilitate data control.

DP: Documentation
– Weighting. Reference to the appropriate document.
– Cleaning operations. E.g. consistency checking, wild code checking.
– Response rate. Percentage of sample members who provided information.
– Estimates of sampling error. An indication of how precisely a population value can be estimated from the given sample.
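
The response rate and a basic sampling-error estimate can be computed as below. This is a sketch under stated assumptions: the standard-error formula is the one for a proportion under simple random sampling, ignoring the finite population correction and any design effect from stratification or clustering, so a real survey should use the estimator given in its sampling and weighting documents.

```python
import math


def response_rate(respondents: int, eligible: int) -> float:
    """Percentage of eligible sample members who provided information."""
    if eligible <= 0:
        raise ValueError("eligible sample size must be positive")
    return 100 * respondents / eligible


def se_proportion(p: float, n: int) -> float:
    """Standard error of an estimated proportion under simple random sampling
    (assumption: no finite population correction, no design effect)."""
    return math.sqrt(p * (1 - p) / n)


def ci95(p: float, n: int) -> tuple[float, float]:
    """Approximate 95% confidence interval using the normal approximation."""
    half_width = 1.96 * se_proportion(p, n)
    return p - half_width, p + half_width


rate = response_rate(1840, 2000)   # 92.0
se = se_proportion(0.20, 1600)     # approximately 0.01
```

With these illustrative numbers, an estimated 20% of children working would be reported as roughly 18% to 22% at the 95% level.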

DP: Documentation
– Location. Where the data are currently stored (e.g. a national statistics office).
– Availability status. A statement of data availability.
– Extent of data. Number of physical files in the dataset.
– Completeness of dataset. Whether any items of collected information were not included in the data file.

DP: Documentation
– Access authority. Contact person or organization that controls access to the data collection.
– Data use statement. Reference to the terms of use for the data collection, if any.
– Citation requirement. Any text that should be cited in publications based on analysis of the data.
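
Keeping the study-level fields in a structured, machine-readable form makes them easy to check for completeness and to export. The sketch below uses a Python dataclass serialized to JSON; the field names are hypothetical and cover only a subset of the items listed, filled in with values from the Portugal child labour example where the slides give them and left empty otherwise.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StudyDescription:
    """Hypothetical subset of study-level documentation fields."""
    title: str
    subtitle: str = ""
    alternative_title: str = ""
    parallel_title: str = ""
    keywords: list[str] = field(default_factory=list)
    country: str = ""
    time_period_covered: str = ""
    unit_of_analysis: str = ""
    data_collector: str = ""
    mode_of_collection: str = ""
    access_authority: str = ""
    citation_requirement: str = ""

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty and must be completed before release."""
        return [name for name, value in asdict(self).items() if not value]


study = StudyDescription(
    title="Child labour in Portugal: Social characterization of school-age "
          "children and their families, 1998",
    subtitle="Child labour in Portugal, 1998",
    alternative_title="SIMPOC Portugal survey, 1998",
    keywords=["national survey", "child", "economic activity", "child labour"],
    country="Portugal",
)
print(json.dumps(asdict(study), indent=2, ensure_ascii=False))
print("Still to document:", study.missing_fields())
```

The `missing_fields` list doubles as a to-do list for the documentation team.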

DP: Documentation
– File contents. Short description of the file(s).
– File structure. E.g. hierarchical, rectangular, or relational.
– Record or record group. The record groupings for hierarchical or relational files.
– Label (of record). Detailed information for each record group.
– Dimensions (of record). Physical characteristics of the record, such as the number of variables per record and the number of cases.

DP: Documentation
– Overall case count. Number of cases or observations.
– Overall variable count. Number of variables.
– Data format. Delimited, free format, software-dependent, etc.
– Missing data. Information on missing data, e.g. whether missing-value codes are standardized across the collection, or whether missing data result from merging.
– Software. The software used to create the file, including its version number.
– Version statement. Version statement for the data file.
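
Several of these file-level items can be computed directly from the data rather than typed by hand. The sketch below assumes a delimited (CSV) file with a header row; which values count as missing is passed in as `missing_codes`, and the default of treating only empty cells as missing is an assumption.

```python
import csv
from pathlib import Path


def describe_file(path: Path, missing_codes=("",)) -> dict:
    """Compute the overall case count, overall variable count, and the number
    of missing values per variable for a CSV data file."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        variables = list(reader.fieldnames or [])
        missing = {name: 0 for name in variables}
        cases = 0
        for row in reader:
            cases += 1
            for name in variables:
                if row.get(name) in missing_codes:
                    missing[name] += 1
    return {
        "file": path.name,
        "data_format": "delimited (CSV)",
        "overall_case_count": cases,
        "overall_variable_count": len(variables),
        "missing_by_variable": missing,
    }
```

For example, `describe_file(Path("persons.csv"), missing_codes=("", "99"))` would count both empty cells and a hypothetical code 99 as missing.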

DP: Documentation
A list of variables, giving for each:
– whether the variable is a weight and, if not, which weight variable applies to it;
– the question ID for the variable;
– the format used (e.g. SAS, SPSS);
– the number of decimal places in the variable;
– whether the values are discrete or continuous;
– which record type the variable belongs to.
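
Each entry in the variable list can be held as a small record and checked automatically, for instance that every non-weight variable points to a weight that is itself documented. The class and rule set below are a hypothetical sketch, not a standard schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableEntry:
    """One row of a hypothetical variable list in the documentation."""
    name: str
    question_id: str
    is_weight: bool
    weight_variable: Optional[str]   # weight that applies if this is not a weight
    source_format: str               # e.g. "SAS" or "SPSS"
    decimals: int
    measure: str                     # "discrete" or "continuous"
    record_type: str


def validate(entries: list[VariableEntry]) -> list[str]:
    """Check the documentation rules for a list of variable entries."""
    weights = {e.name for e in entries if e.is_weight}
    problems = []
    for e in entries:
        if not e.is_weight and e.weight_variable not in weights:
            problems.append(
                f"{e.name}: weight variable {e.weight_variable!r} is not a documented weight")
        if e.measure not in ("discrete", "continuous"):
            problems.append(f"{e.name}: measure must be 'discrete' or 'continuous'")
        if e.decimals < 0:
            problems.append(f"{e.name}: negative number of decimal places")
    return problems
```

A variable whose weight reference is missing, such as `VariableEntry("hours", "Q15", False, None, "SPSS", 1, "continuous", "person")`, would be reported by `validate`.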

DP: Conversion of data files to other formats as required
– Data files are usually generated in a package-specific format.
– Convert the data into other formats, if possible.
– Convert the data into ASCII and generate a codebook.
– Reload the ASCII data using the same codebook.
– Recheck the data.
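
The convert, reload, and recheck cycle can be automated. The sketch below assumes a fixed-width ASCII layout in which the codebook gives only each variable's name and column width and all values are integers; a real codebook would also carry labels, value codes, and decimal places.

```python
def to_fixed_width(rows, codebook):
    """Write rows (list of dicts of ints) as fixed-width ASCII text.

    codebook: list of (variable name, column width) pairs, in column order.
    """
    lines = []
    for row in rows:
        fields = []
        for name, width in codebook:
            text = str(row[name])
            if len(text) > width:
                raise ValueError(f"{name}={text} does not fit in {width} columns")
            fields.append(text.rjust(width))
        lines.append("".join(fields))
    return "".join(line + "\n" for line in lines)


def from_fixed_width(text, codebook):
    """Reload fixed-width ASCII text using the same codebook."""
    rows = []
    for line in text.splitlines():
        row, start = {}, 0
        for name, width in codebook:
            row[name] = int(line[start:start + width])  # int() ignores padding spaces
            start += width
        rows.append(row)
    return rows


def recheck(original, reloaded):
    """Return the indices of cases that changed in the round trip."""
    if len(original) != len(reloaded):
        raise ValueError("case counts differ")
    return [i for i, (a, b) in enumerate(zip(original, reloaded)) if a != b]
```

An empty list from `recheck` means the ASCII file and its codebook reproduce the original data exactly.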

DATA MANAGEMENT: Storage of all files
Possible list/types of files:
– Data in a package-specific format
– Data in ASCII with the necessary data dictionary
– Public use data
– Public use data in ASCII with the necessary data dictionary
– Final documentation
– Questionnaire

DATA MANAGEMENT: Storage of all files
Possible list/types of files (continued):
– Logical rules for consistency checks
– Computer program files
– Interviewer and/or supervisor instruction manuals
– Coding file(s)
– Sampling and weight files

DATA MANAGEMENT: Storage of all files
– Group files by version, type, etc.
– Create an index file for each sub-directory.
– In the index file, add a short description of each file according to its contents.
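
Index files can be regenerated whenever files are added. The sketch below writes a tab-separated `INDEX.txt` in every sub-directory; the file name `INDEX.txt` and the lookup of descriptions by bare file name are assumptions, and the second would need relative paths if the same file name occurs in several folders.

```python
from pathlib import Path


def write_index_files(root: Path, descriptions: dict[str, str],
                      index_name: str = "INDEX.txt") -> list[Path]:
    """Write an index file in every sub-directory of `root`, listing each file
    with a short description looked up by file name."""
    written = []
    # Materialize the folder list first so new index files do not affect the walk.
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        files = sorted(p.name for p in folder.iterdir()
                       if p.is_file() and p.name != index_name)
        lines = ["File\tDescription"]
        lines += [f"{name}\t{descriptions.get(name, '(no description yet)')}"
                  for name in files]
        index = folder / index_name
        index.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(index)
    return written
```

Files still marked "(no description yet)" show at a glance where the documentation is incomplete.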

DATA MANAGEMENT: Formulating a data preservation strategy
– Hardware
– Automation software
– Directory structure
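
A fixed directory structure is what makes automation possible: if every survey uses the same layout, the same scripts work for all of them. The sketch below derives one folder per type of stored file from the list of possible files; the folder names, the `README.txt` convention, and the survey identifier are all hypothetical.

```python
from pathlib import Path

# Hypothetical folder layout, one folder per type of file to be stored.
LAYOUT = {
    "data/package_format": "Data in a package-specific format",
    "data/ascii": "Data in ASCII with data dictionary",
    "data/public_use": "Public use data (package format and ASCII)",
    "documentation": "Final documentation and questionnaire",
    "processing/consistency_rules": "Logical rules for consistency checks",
    "processing/programs": "Computer program files",
    "processing/coding": "Coding files",
    "manuals": "Interviewer and supervisor instruction manuals",
    "sampling": "Sampling and weight files",
}


def create_survey_tree(root: Path, survey_id: str) -> Path:
    """Create the standard folder tree for one survey and return its root."""
    base = root / survey_id
    for relative, purpose in LAYOUT.items():
        folder = base / relative
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "README.txt").write_text(purpose + "\n", encoding="utf-8")
    return base
```

Because `mkdir` is called with `exist_ok=True`, running the function again on an existing survey is harmless.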

DATA MANAGEMENT: Designing an access procedure
Access policy:
– Safekeeping person: system administrator
– Contact person: supervisor
– Content-modifying authority: supervisor
– Finalize the access conditions for each file
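
Once the access condition for each file is finalized, it can be written down as data and checked by a function. The sketch below is a hypothetical illustration: the role names, file paths, and the deny-by-default rule for files without a finalized condition are assumptions, while the supervisor as sole content-modifying authority follows the policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCondition:
    """Hypothetical access condition for one stored file."""
    readers: frozenset[str]
    modifiers: frozenset[str]


STAFF = frozenset({"supervisor", "data_processor", "system_administrator"})

# Only the supervisor may modify contents; the system administrator keeps the files safe.
POLICY = {
    "data/package_format/persons.sav": AccessCondition(
        readers=STAFF, modifiers=frozenset({"supervisor"})),
    "data/public_use/persons.csv": AccessCondition(
        readers=STAFF | {"public"}, modifiers=frozenset({"supervisor"})),
}


def is_allowed(role: str, action: str, path: str) -> bool:
    """Return True if `role` may perform `action` ("read" or "modify") on `path`.
    Files without a finalized access condition are denied by default."""
    condition = POLICY.get(path)
    if condition is None:
        return False
    if action == "read":
        return role in condition.readers
    if action == "modify":
        return role in condition.modifiers
    raise ValueError(f"unknown action: {action}")
```

Denying access to unlisted files by default forces every file to receive an explicit access condition before anyone can use it.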

DATA DISSEMINATION: Data types
– Microdata
– Aggregate tables
– Executive summary
– Reports

DATA DISSEMINATION: Methods
– Online: direct access through the internet in real time
– Offline: available on request

DATA MANAGEMENT: Designing an access procedure
Backup policy:
– During data processing: the data processors' responsibility
– After finalization of the data and documentation: the system administrator's responsibility
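
A backup is only useful if the copy is known to be intact. The sketch below copies a file into a dated backup folder, verifies the copy with a SHA-256 checksum, and reports who is responsible according to the backup policy; the dated-folder layout and checksum verification are assumptions, not part of the policy itself.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def responsible_party(finalized: bool) -> str:
    """Who is responsible for backups, following the backup policy."""
    return "system administrator" if finalized else "data processor"


def sha256(path: Path) -> str:
    """SHA-256 checksum of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_root: Path, finalized: bool) -> Path:
    """Copy `source` into a dated backup folder and verify the copy by checksum."""
    target_dir = backup_root / date.today().isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256(source) != sha256(target):
        raise OSError(f"backup of {source} failed verification")
    print(f"{source.name} backed up to {target} "
          f"(responsible: {responsible_party(finalized)})")
    return target
```

Storing the checksum alongside the backup would also allow the system administrator to re-verify archived files later.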

END