Experiences of managing Birth Cohort Data at CLS
Jon Johnson (Senior Database Manager)

1 Experiences of managing Birth Cohort Data at CLS
Jon Johnson (Senior Database Manager)
CLS is an ESRC Resource Centre based at the Institute of Education

2 Contents
1 Introduction
2 (Pre) History
3 Centralised Computing
4 Semi-centralised Computing
5 Personal Computing
6 Consequences
7 Survey Data ‘production line’
8 Requirements
9 Potential database strategies
10 Staffing and skills

3 Introduction
CLS has been an ESRC Resource Centre since 2005. We are responsible for three of the four British Birth Cohort studies:
NCDS (1958)
BCS70 (1970)
MCS (2000)
The fourth, NSHD (1946), is funded by the MRC at UCL.
www.cls.ioe.ac.uk

4 (Pre) History
NCDS has its origins in the Perinatal Mortality Survey. Sponsored by the National Birthday Trust Fund, it was designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the children born in Great Britain in one week of 1958. It was a ‘follow-up’ to the 1946 study, with a similar scope.
BCS70 began as the British Births Survey (BBS), sponsored by the National Birthday Trust Fund in association with the Royal College of Obstetricians and Gynaecologists, to follow up the 1958 study.
MCS was specifically designed as a longitudinal survey to follow on from the three previous birth surveys.

5 Centralised Computing
“If one had coded and tried to use all the information received from the 68 questions it is calculated that the results could have been expressed in a vast number of permutations probably in the region of 10 to the 480th power” (Perinatal Mortality, 1963)
Four years after the data collection, the tabulations were finally completed. Things got faster...
“The first batch of coded forms were sent for punching in October 1970... 113,994 punch cards there being a minimum of 6 cards per case. The punching was completed in November 1971”
Researchers were reliant on data processing (DP) and computer professionals to generate tabulations.

6 Semi-Centralised Computing
In the mid-1970s, as first SPSS and then other statistical packages became available, researchers had the opportunity to analyse the data themselves on the central computer, using data prepared and marshalled by the DP staff and computer scientists. Most users still relied on computer professionals to retrieve and tabulate data.

7 Personal Computing (c. 1984)
With a powerful 386 computer on their desk and a copy of SPSS, researchers could take the raw data and manipulate it for their own purposes. By the mid-1990s this process had accelerated to the point where all the data from a survey could easily be handled on a single machine, and the need for database professionals could be circumvented.

8 Consequences
Each study became a series of snapshots, one per survey, making it cumbersome and inefficient to manage as a longitudinal resource
Data fragmentation, as derivations became disconnected from the original data
Longitudinal linkage discrepancies, e.g. in partnership and fertility histories
Coding frame discrepancies
Responsibility for data security moved from IT to individuals
Metadata was viewed as being separate from the data
With the introduction of dependent interviewing, these problems would increase further.

9 Survey Data ‘production line’

10 Requirements
Migrate and restructure the data back into a database to restore integrity and clean discrepancies
Re-derive variables
Integrate metadata into the data
Create longitudinal checking algorithms
Ability to manipulate data in situ
Log of changes and version control
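As a rough illustration of what a "longitudinal checking algorithm" might look like, the sketch below compares each cohort member's answers across successive sweeps and flags values that cannot both be true. The record layout, the field names (`sex`, `children_ever_born`), the case identifiers and the sweep years are all hypothetical and are not taken from the CLS datasets; a real check would run inside the database against the actual variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepRecord:
    """One cohort member's answers at one sweep (hypothetical layout)."""
    case_id: str
    sweep_year: int
    sex: str
    children_ever_born: int


def longitudinal_checks(records):
    """Return (case_id, message) pairs for inconsistencies between sweeps."""
    by_case = {}
    for rec in records:
        by_case.setdefault(rec.case_id, []).append(rec)

    problems = []
    for case_id, recs in sorted(by_case.items()):
        recs.sort(key=lambda r: r.sweep_year)
        # Compare each sweep with the next one for the same cohort member.
        for earlier, later in zip(recs, recs[1:]):
            if earlier.sex != later.sex:
                problems.append((
                    case_id,
                    f"sex differs: {earlier.sex!r} ({earlier.sweep_year}) "
                    f"vs {later.sex!r} ({later.sweep_year})",
                ))
            # A cumulative count should never go down over time.
            if later.children_ever_born < earlier.children_ever_born:
                problems.append((
                    case_id,
                    f"children_ever_born falls from "
                    f"{earlier.children_ever_born} ({earlier.sweep_year}) "
                    f"to {later.children_ever_born} ({later.sweep_year})",
                ))
    return problems


# Illustrative data only: made-up cases and sweep years.
example = [
    SweepRecord("A001", 1991, "F", 1),
    SweepRecord("A001", 2000, "F", 0),
    SweepRecord("B002", 1991, "M", 0),
    SweepRecord("B002", 2000, "F", 2),
    SweepRecord("C003", 1991, "F", 2),
    SweepRecord("C003", 2000, "F", 3),
]
for case_id, message in longitudinal_checks(example):
    print(case_id, message)
```

Checks of this kind only work when all sweeps for a case are held together, which is one argument for moving the data back into a single database rather than keeping per-survey files.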

11 Potential database strategies

12 Staffing and Skills
At CLS we chose to use SIR as our main database and SQL for holding metadata (DDI 2.0 model):
Existing SIR experience
Easy to cross-train from SPSS
Migration of data from SPSS is straightforward
Security is very configurable
Version control and change log are easy to implement
Derivations and manipulations are done in one place
3 FTE (mix of skills: data management, DBA)
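The idea of a change log kept alongside the data can be sketched in a few lines. This is not how SIR implements it: the example uses Python's built-in `sqlite3` module as a stand-in, and the table names (`response`, `change_log`), column names and the `update_value` helper are assumptions made purely for illustration. It shows every edit to a value being recorded with its old value, new value, reason and editor in the same transaction as the edit itself.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a data table and a change log (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE response (
            case_id  TEXT NOT NULL,
            variable TEXT NOT NULL,
            value    TEXT,
            PRIMARY KEY (case_id, variable)
        );
        CREATE TABLE change_log (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            case_id    TEXT NOT NULL,
            variable   TEXT NOT NULL,
            old_value  TEXT,
            new_value  TEXT,
            reason     TEXT NOT NULL,
            changed_by TEXT NOT NULL,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
    """)
    return conn


def update_value(conn, case_id, variable, new_value, reason, changed_by):
    """Set a value and log the change; both succeed or both are rolled back."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT value FROM response WHERE case_id = ? AND variable = ?",
            (case_id, variable),
        ).fetchone()
        old_value = row[0] if row else None
        conn.execute(
            "INSERT OR REPLACE INTO response (case_id, variable, value) "
            "VALUES (?, ?, ?)",
            (case_id, variable, new_value),
        )
        conn.execute(
            "INSERT INTO change_log "
            "(case_id, variable, old_value, new_value, reason, changed_by) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (case_id, variable, old_value, new_value, reason, changed_by),
        )


# Illustrative usage with made-up identifiers and codes.
conn = open_store()
update_value(conn, "A001", "marital_status", "1", "initial load", "loader")
update_value(conn, "A001", "marital_status", "2", "corrected after query", "editor")
for entry in conn.execute(
    "SELECT case_id, variable, old_value, new_value, reason FROM change_log ORDER BY id"
):
    print(entry)
```

Because the log lives in the same store as the data, the history of any value can be reconstructed without relying on individual researchers' copies of the files.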

13 Any questions?
Institute of Education, University of London
20 Bedford Way, London WC1H 0AL
Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email info@ioe.ac.uk
Web www.ioe.ac.uk

