Life Cycle Models & Principles Jake Carlson Associate Professor of Library Science Data Services Specialist Purdue University Libraries
What will be Covered An introduction to terms and concepts relating to data lifecycles. An understanding of the purpose of lifecycle models. Coverage of some life cycle models and principles, and how they may relate to each other. An introduction to ICPSR’s lifecycle model, as a loose framework for this workshop.
Data Science “Data science enables the creation of data products.” “We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” – Loukides, M. (2011) What is Data Science? /2010/06/what-is-data-science.html
Data Curation “…the active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities.” - UIUC GSLIS “… the value-added activities and features that stewards of content engage in to make the content useful.” - Nancy McGovern, ICPSR
What is a Lifecycle? The continuous sequence of changes undergone by an organism from one primary form, as a gamete, to the development of the same form again. Graphic:
Data Lifecycles Primer on Data Management DataONE_BP_Primer_ pdf
Why Use Life Cycle Models? Help define and explain complex processes (graphically). Help identify important components, roles, responsibilities, milestones, etc. Demonstrate connections and relationships between parts and the whole. Provide a framework to develop services and support.
Limitations of Lifecycle Models “All models are wrong, but some are useful” George E.P. Box, Statistician, 1976 – Models generally reflect the interests, perspectives (and biases) of the agencies that created them. – Models mask complexity. – Models tend to overlook heterogeneity / diversity. – Models are often presented as orderly and linear. – Models depict the ideal.
Aspects of Lifecycle Models Subject Based – Scholarly Communication – Research – Data – Curation Source Based – Individual – Organizational – Community
Scholarly Communication Lifecycles
Gettysburg College Library Graphic: uides/scientific_information/index.dot
Research Lifecycles Loughborough University Library (UK) Graphic:
Scholarly Communication Lifecycles Microsoft Research Graphic:
Research Lifecycle: Project The Research360 Project will develop technical and human infrastructure for research data management at the University of Bath… Focus in particular on issues and challenges that arise from private sector partnerships and research collaborations; rch360/about/
Research Lifecycles: Specialized Cross-Cultural Surveys Institute of Social Research Graphic:
Research Lifecycle: Funding Wayne State University, Division of Research Graphic:
Connecting Research & Data Lifecycles “How JISC is Helping Researchers” chelp.aspx
Data Lifecycles Chuck Humphrey (2006) “e-Science and the lifecycles of Research”
A Data Curation Profile contains: Information about an individual data set, including its data lifecycle. Current management practice. Unmet needs.
Individual Data Lifecycles are Unique
Individual Data Lifecycles can be Complex
Data Lifecycle Model: UVA Data Mining Data Curation & Preservation Publication Rights & Restrictions DMP Consulting Grant Writing & Planning DM Planning Metadata & Documentation Data Processing HPC/Visualization Tool Development Data Storage Data Search Image: University of Virginia Libraries Scientific Data Consulting Group:
Data Lifecycle Model for ICPSR 1. Proposal and Planning 2. Project Start Up 3. Data Collection 4. Data Analysis 5. Preparing Data for Sharing 6. Deposit ICPSR’s Guide to Social Science Data Preparation and Archiving: eposit/guide/
Common Elements in Data Lifecycle Collect / Generate Process Analyze Finalize / Summarize for Publication
Curation Lifecycle Neil Beagrie (2004) “The Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC)” D-Lib Magazine.
Curation Lifecycle: DCC curation-lifecycle-model
OAIS Reference Model: Preservation
ICPSR Pipeline Process management/lifecycle/oais.html
Deposit Inputs – Materials to Deposit: Data Documentation Data Form (Description) Outputs – SIP: Deposited Files Metadata from the Deposit Signed Deposit Form
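The deposit step above can be read as a completeness check before a Submission Information Package (SIP) is accepted. The following is a minimal sketch of that idea, not ICPSR's actual procedure: the `Deposit` class, the `missing_for_sip` function, and the material names are all hypothetical, chosen only to mirror the inputs and outputs listed on the slide.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three materials named on the slide.
REQUIRED_MATERIALS = ("data", "documentation", "deposit_form")


@dataclass
class Deposit:
    """Illustrative container for what a depositor hands over."""
    materials: dict[str, list[str]] = field(default_factory=dict)
    form_signed: bool = False


def missing_for_sip(deposit: Deposit) -> list[str]:
    """List what is still needed before the deposit could count as a SIP."""
    problems = [name for name in REQUIRED_MATERIALS
                if not deposit.materials.get(name)]
    if not deposit.form_signed:
        problems.append("signature on deposit form")
    return problems


complete = Deposit(
    materials={
        "data": ["survey.csv"],
        "documentation": ["codebook.pdf"],
        "deposit_form": ["form.pdf"],
    },
    form_signed=True,
)
incomplete = Deposit(
    materials={"data": ["survey.csv"], "deposit_form": ["form.pdf"]},
)

complete_gaps = missing_for_sip(complete)
incomplete_gaps = missing_for_sip(incomplete)
```

An empty list means every listed input is present and the form is signed; anything else names what the depositor would still need to supply.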
Ingest Actions: Processing Plan Assign a Study Number Formatting for Access and Preservation Outputs – AIP: Data Documentation Set Up Files Processing History
Archival Storage Actions: Migrations Checking integrity - checksums Making, storing and synching redundant copies at various locations Outputs – Curated AIP
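Integrity checking with checksums can be illustrated with a short sketch: record a digest when the file is ingested, then recompute it for each redundant copy and compare. This assumes SHA-256 and local file paths purely for illustration; the slide does not say which algorithm or storage layout ICPSR uses, and the function and file names here are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(recorded: str, copies: list[Path]) -> dict[str, bool]:
    """Compare each redundant copy against the checksum recorded at ingest."""
    return {str(copy): sha256_of(copy) == recorded for copy in copies}


# Throwaway files stand in for copies held at two locations.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "study_data.csv"
    original.write_bytes(b"id,value\n1,42\n")
    recorded_checksum = sha256_of(original)

    good_copy = Path(tmp) / "copy_site_a.csv"
    good_copy.write_bytes(original.read_bytes())
    bad_copy = Path(tmp) / "copy_site_b.csv"
    bad_copy.write_bytes(b"id,value\n1,43\n")  # simulated corruption

    report = verify_copies(recorded_checksum, [good_copy, bad_copy])
    intact = [Path(name).name for name, ok in report.items() if ok]
    damaged = [Path(name).name for name, ok in report.items() if not ok]
```

A copy that fails the comparison would be replaced from one that passes, which is why the slide pairs checksums with redundant copies.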
Data Management Actions: Populating, maintaining, and making the descriptive information accessible Outputs: Compliant Metadata
Access Actions: Data set is indexed, searchable and made available. Outcome – DIP: Data and document files Bibliography file Study description file Terms of use file File Manifest
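Taken together, the pipeline slides pair each stage with the output it produces. The sketch below simply restates that mapping as a lookup table, using the OAIS package abbreviations (SIP, AIP, DIP: Submission, Archival, and Dissemination Information Package). The `Stage` enum and `describe_pipeline` function are hypothetical names for illustration, and the stage order follows the slide order rather than any required execution order.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages in the order the slides present them."""
    DEPOSIT = "Deposit"
    INGEST = "Ingest"
    ARCHIVAL_STORAGE = "Archival Storage"
    DATA_MANAGEMENT = "Data Management"
    ACCESS = "Access"


# Output named on each stage's slide.
STAGE_OUTPUT = {
    Stage.DEPOSIT: "SIP",
    Stage.INGEST: "AIP",
    Stage.ARCHIVAL_STORAGE: "Curated AIP",
    Stage.DATA_MANAGEMENT: "Compliant Metadata",
    Stage.ACCESS: "DIP",
}


def describe_pipeline() -> list[str]:
    """Return one 'stage -> output' line per stage, in slide order."""
    return [f"{stage.value} -> {STAGE_OUTPUT[stage]}" for stage in Stage]


summary = describe_pipeline()
for line in summary:
    print(line)
```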
Common Elements in Curation Lifecycle Deposit / Ingest Storage Document / Describe Discover / Access / Use Manage Preserve
Lifecycle Models & Data Services Need for developing your organizational model – based on community models and informed by individual lifecycles. Need for alignment between data lifecycles and curation lifecycles – informed by research and scholarly communication lifecycles
Alignment Between Lifecycles Proposal Development & DMP Project Start-up Data Collection & File Creation Data Analysis Preparing Data for Sharing Ingest Data Mgmt Archival Access Research Scholarly Communication Access Storage Ingest Storage Archival Storage
Example of Lifecycle Alignment Image: Green, Ann G., and Myron P. Gutmann. (2007). “Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives, 23: