IASSIST 2007 Montreal, Quebec
Data Preservation Alliance for the Social Sciences: A Model for Collaboration
The Odum Institute
- The Odum Institute is the oldest institute at America’s first public university.
- Established in 1924, Odum is a multidisciplinary institute for research in the social sciences.
- The Institute maintains one of the country’s largest archives of computer-readable social science data.
- Holdings include national and international economic, electoral, demographic, financial, health, public opinion, and other types of data to meet a variety of research and teaching needs.
12/8/2018
Odum Data-PASS Focus Areas
- Virtual Data Center Adoption
- National Network of State Polls (NNSP)
- Harris Polls
- Private Research Organizations
- Distributed Storage
Partnering with the Harvard-MIT Data Center in adopting VDC
- Spires/OpenText > DDI/XML
- Coordinated question-level search modification
- Currently testing the next generation, called the Dataverse Network
National Network of State Polls & Harris Polls
- Working with polling agencies to close gaps in the historical collection
- Assisting NNSP members with the ingest process
- Building on existing relationships
- Partnering with Harris Interactive to find missing surveys in the Louis Harris Data Center at the Odum Institute
Private Research Organizations
The late 1940s and 1950s witnessed the rise of private organizations and firms that deal almost exclusively in the production and analysis of information, knowledge, and public policy. These organizations are potentially a major source of social science research on important public policy issues. Organizations such as Research Triangle Institute (RTI International), the National Opinion Research Center (NORC), Westat, and Abt Associates have played important roles in the advancement of scientific research in the social sciences.
Saving the Kennedy Assassination Study
- Following 9/11, NORC researchers wanted to replicate questions from the 1963 assassination study
- Data could not be found until an old card catalog turned up that pointed to holdings of boxes in storage
- After six weeks, ten boxes of punched cards were retrieved
- Cards were hand-delivered to a New York firm
- Card reader was refurbished
- Data/documentation needed to be interpreted
- Multi-punched to single-punched conversion
- With persistent effort a clean data file emerged and was archived

Example of the “Data Rescue” Process

Identify
Just after the tragic events of September 11, 2001, Tom Smith was reminded of the study NORC completed just after another American tragedy, the John F. Kennedy assassination. He remembered the detailed questions gauging how Americans were coping in 1963 and wondered how those strategies compared to the situation in 2001. He and his colleagues set out to replicate these questions in the National Tragedy Study in 2001. First, he needed that early survey.

Locate
The archives didn’t have that 1963 study. Internal databases of records were consulted, with no indication that the data or the punched cards existed in the 20,000+ cubic feet of storage. A retired NORC librarian was called in who thought the cards were in storage. Old hard-copy inventories of the materials in bulk storage were reviewed, but they could neither definitively confirm nor rule out the existence of the cards. A second former employee was contacted who recalled the existence of an older card catalog listing the holdings of some of the boxes in storage. After six weeks, ten boxes were retrieved from an off-site storage facility in Chicago.

Data Conversion
The data had to be interpreted and converted to a current machine-readable medium.
- The 38-year-old cards had to be read
- The data had to be converted from multiple-punched data to single-punched data
- National Data Conversion Institute in NYC could read the cards
- Only a single set of the cards existed, so the cards were hand-delivered to NY
Complications arose in reading the “near perfect” card collection.
- The card reader needed refurbishing
- The first test file was corrupted
- The firm didn’t know how to spread multiple-punched data

Preservation
Ultimately, with senior NORC staff working diligently with the Institute, a clean data file emerged. With the data fully recovered, NORC created a final SPSS system file with detailed labels and archived it with the Roper Center. If Tom Smith had not worked with these data in the mid-1970s, the data would have remained a hidden treasure: no finding aids pointed researchers to this valuable dataset. Smith and Forstrom concluded “…it took persistent efforts, the assistance of two ex-employees, and a bit of serendipity to unearth the data. Moreover, once recovered we had data on a medium that was so antiquated that it took four months of extensive efforts to convert it to a modern, user-friendly format.”
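The slides do not describe how the multiple-punched data were "spread". As a rough illustration only, and not the procedure NORC or the conversion firm actually used, one common way to think about it is to turn each 12-row card column into twelve single-valued 0/1 fields, one per punch position. The names `ROWS`, `spread_column`, and `spread_card` are hypothetical, and the input format (one set of punched row labels per column) is an assumption.

```python
# Row labels of a standard 12-row punched-card column, top to bottom.
ROWS = ("12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9")


def spread_column(punches):
    """Turn the rows punched in one column into twelve 0/1 flags.

    `punches` is a set of row labels (assumed input format). A column with
    several punches becomes several flags set to 1, so every output field
    holds exactly one value.
    """
    unknown = set(punches) - set(ROWS)
    if unknown:
        raise ValueError(f"not a card row: {sorted(unknown)}")
    return [1 if row in punches else 0 for row in ROWS]


def spread_card(card):
    """Spread every column of a card; `card` is a list of punch sets."""
    flat = []
    for punches in card:
        flat.extend(spread_column(punches))
    return flat
```

A column punched in rows 1 and 3, for instance, would come out as twelve flags with only the positions for rows 1 and 3 set, which a modern statistical package can read as ordinary single-valued variables.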
Data-PASS Efforts
- Roper is negotiating with the National Opinion Research Center (NORC) to preserve valuable datasets
- Odum has been working with RTI International, a private research organization located in Research Triangle Park, NC, to develop a strategy for requesting data from PROs nationwide
Roper Center
- Archive of public opinion survey data
- Established in 1947 at Williams College
- Core historical data collections:
  - Gallup Polls, 1936-present
  - Fortune Magazine surveys, 1938-1949
  - American Soldiers Surveys, 1941-1946
- Data files for over 15,000 surveys
Roper Center – NARA
Objective of collaboration: USIA Data Collection
- Recover, preserve, document and make accessible the United States Information Agency Office of Research surveys, 1952-1999
USIA Data Collection
- Estimated at over 2,000 surveys
- Survey results contributed to formulation of US foreign and defense policy
- Some surveys are the only opinion surveys available from certain countries
Roper Center – NARA
Leveraging relative strengths (NARA)
NARA provides:
- structure for working with the State Department in the context of its mandate to preserve federal electronic records
- standards for appraising, cataloging and preserving electronic records
- permanent storage and file-level access for all materials related to the collection
- access to additional USIA records, reports and related federal government records
Roper Center – NARA
Leveraging relative strengths (Roper)
Roper Center provides:
- potential flexibility in communications and approach (federal agency-to-agency protocols may not be as flexible as required for a project of this type)
- experience working with a variety of organizations to acquire data resources
- active migration and management of data
- more streamlined access to data-based materials
- access to related public opinion survey data from the private sector and non-federal public sector
Benefits to Cooperation
- Preservation of valuable datasets
- Many researchers gain access to additional material
- PROs gain electronic access to previous work
- PROs receive digital curation assistance
- Potential to reduce PROs’ storage costs
Barriers to Preserving Data
- Contract restrictions
- High ingest costs
- Poor metadata
- Labor-intensive operations
- Uniqueness requires custom solutions
- PROs’ lost opportunity costs
- Overhead associated with building relationships
Questions from PRO Business Offices
- If datasets are assets, what is their value to our PRO?
- Can they be used to leverage existing research or identify new areas of interest?
- Do datasets have value to other organizations that might be willing to pay for them?
- What legal and technical issues are involved?
- What costs are associated with dataset archival?
- How can we build a business case for preservation at our PRO?
Approaching PROs
- Do background research
- Build on existing relationships
- Assess different “PRO” business models:
  - Contractors for hire
  - PROs with their own research agenda
Early research data life cycle intervention
- Assist researchers and PROs with preservation requirements during the proposal process
- Preparing for preservation at the proposal level can ensure that:
  - Datasets are “born digital”, making preservation affordable
  - PRO business models are more affordable
Preserving Future Studies
- Funding agencies are key
  - Ultimate owners of research data
  - Could request and enforce archival of data
  - Issues are easily addressed early in the life cycle
  - Single point of contact for many PROs
Digital Curation
Keys to Dataset Collection Development
- Knowing the producers’ & consumers’ needs
- Educating producers on preservation requirements
- Early involvement in the research data life cycle
- Building and maintaining relationships
Thank You