Meta/Data As If Research Depends On It Roy Tennant #usetda2016
Why Share Data?
Mandates
Facilitates data re-use
Brings more attention to the research it is based upon
Because it’s the right thing to do
Why Metadata?
The chained library of Hereford Cathedral
Photo by https://www.flickr.com/photos/bluefootedbooby/ CC BY-NC-ND 2.0
Photo by U Mich Library https://www.flickr.com/photos/mlibrary/ CC BY 2.0
Photo by Karin Dalziel https://www.flickr.com/photos/nirak/ CC BY-NC 2.0
http://i1167.photobucket.com/albums/q626/bsloths1/worldcatdiscoveryinterface.jpg
Guess what? Data doesn’t describe itself. Not even close.
Umm…what am I looking at?
Data Dictionary
Codebook
Survey Instrument
User Guide
HOW Metadata?
A Few Questions…
What is it?
Who collected it?
How was it captured?
When?
What can I legally do with it?
…That Lead to Many More
What are all of the data elements?
What do they mean? That is, how exactly were they collected?
Were the requirements of working with human subjects met?
Etc.
At Least Three Types
Descriptive: what is needed for discovery and selection
Technical and Structural: what is needed to describe technical aspects of the data, such as file format, and how the data is structured
Administrative: what is needed to manage or use the object (e.g., rights)
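As a rough illustration of the three types, a record for a shared dataset could keep them in separate groups so that each can be managed on its own. This is only a sketch: the class names, field names, and sample values are hypothetical and do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Descriptive:
    """What is needed for discovery and selection."""
    title: str
    creator: str
    subjects: list[str] = field(default_factory=list)


@dataclass
class Technical:
    """File format and how the data is structured."""
    file_format: str
    structure: str


@dataclass
class Administrative:
    """What is needed to manage or use the object, such as rights."""
    rights: str


@dataclass
class DatasetRecord:
    descriptive: Descriptive
    technical: Technical
    administrative: Administrative


# Hypothetical sample values, for illustration only.
record = DatasetRecord(
    descriptive=Descriptive(
        title="Household survey, wave 1",
        creator="Example Research Group",
        subjects=["households", "income"],
    ),
    technical=Technical(file_format="text/csv", structure="one row per respondent"),
    administrative=Administrative(rights="CC BY 4.0"),
)

# asdict() flattens the nested dataclasses into plain dictionaries.
record_as_dict = asdict(record)
```

Keeping the groups apart means, for example, that a rights statement can be updated without touching the descriptive fields used for discovery.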
Using Which Standard?
How hard can this be? Photo by https://www.flickr.com/photos/h-k-d/ CC BY-ND 2.0
380 million records
Discovery
Evaluation
Identifiers
245 Title Statement
100 Personal Name
650 Subject
700 Personal Name
260 Publication Statement
300 Physical Description
Evaluation
500 Notes
This is incredibly difficult to get right
Standards alone don’t solve your problem
Basic Metadata Workflow
Authentication & Authorization
Capture
Validation
Reporting and Remediation
Protection & Backup
Basic Principles
Capture everything that you can
Whenever possible, validate it upon entry
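Validation upon entry can be as simple as checking each submitted record before it is saved and returning the problems to the person entering it. The sketch below assumes a record is a plain dictionary; the field names, the controlled list of rights statements, and the function name `validate_record` are all hypothetical.

```python
from datetime import date

# Hypothetical required fields and controlled vocabulary.
REQUIRED_FIELDS = ("title", "creator", "collected_on", "rights")
ALLOWED_RIGHTS = {"CC0 1.0", "CC BY 4.0", "All rights reserved"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []

    # Every required field must be a non-blank string.
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name}: missing or empty")

    # A date that is present must parse as an ISO 8601 calendar date.
    collected = record.get("collected_on")
    if isinstance(collected, str) and collected.strip():
        try:
            date.fromisoformat(collected.strip())
        except ValueError:
            problems.append("collected_on: not an ISO 8601 date (YYYY-MM-DD)")

    # A rights statement that is present must come from the controlled list.
    rights = record.get("rights")
    if isinstance(rights, str) and rights.strip() and rights not in ALLOWED_RIGHTS:
        problems.append(f"rights: {rights!r} is not in the controlled list")

    return problems
```

Rejecting a record at this point is usually cheaper than finding and remediating the same problem later across many records.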
Granularity is key for machine processing
Granularity Example
Gabriel Garcia Márquez
<IndividualName>
  <FirstGiven>Gabriel</FirstGiven>
  <LastFamily>Garcia Márquez</LastFamily>
</IndividualName>
“Gabriel Garcia Márquez” or “Garcia Márquez, Gabriel”
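To make the point concrete, here is a small sketch that reads the granular name from the example and produces both display forms. It uses Python's standard `xml.etree.ElementTree` module; the function name `display_forms` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The granular name record from the example above.
NAME_XML = """<IndividualName>
  <FirstGiven>Gabriel</FirstGiven>
  <LastFamily>Garcia Márquez</LastFamily>
</IndividualName>"""


def display_forms(xml_text: str) -> tuple[str, str]:
    """Return (direct order, inverted order) built from the name parts."""
    root = ET.fromstring(xml_text)
    given = root.findtext("FirstGiven", default="").strip()
    family = root.findtext("LastFamily", default="").strip()
    return f"{given} {family}", f"{family}, {given}"


direct, inverted = display_forms(NAME_XML)
```

Going the other way is much harder: given only the string "Gabriel Garcia Márquez", a program cannot reliably tell whether "Garcia" belongs to the given name or the family name.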
Strive for consistency
Employ constant vigilance & use remediation techniques
UNT Metadata Quality Assurance Mechanisms & Tools…
2. Metadata Analysis Tools
NULL Values
List/Browse All Values (by each qualifier and element)
List Authorities Values
Graphical reports and other fun stuff
Clickable Maps by Institution and Collection
Word Clouds by element
Records added over time and other graphical reports
“Enhancing the Quality of Metadata: Modular Approach to Digital Resource Lifecycle Management” http://www.library.unt.edu/events/digital-projects-unit/enhancing-quality-metadata-modular-approach-digital-resource-lifecycle
University of North Texas
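Two of the simplest reports of this kind, counting empty values and listing every distinct value of an element, can be sketched in a few lines. This is not UNT's tooling, only an illustration of the idea; it assumes records are plain dictionaries, and the function names and sample records are hypothetical.

```python
from collections import Counter


def is_blank(value) -> bool:
    """Treat None and whitespace-only strings as empty."""
    return value is None or (isinstance(value, str) and not value.strip())


def null_report(records: list[dict], elements: list[str]) -> dict[str, int]:
    """For each element, count the records where it is missing or blank."""
    return {
        element: sum(1 for record in records if is_blank(record.get(element)))
        for element in elements
    }


def value_counts(records: list[dict], element: str) -> Counter:
    """Count each distinct non-blank value of one element."""
    return Counter(
        str(record[element]).strip()
        for record in records
        if not is_blank(record.get(element))
    )


# Hypothetical sample records, for illustration only.
sample_records = [
    {"creator": "Garcia Márquez, Gabriel", "language": "spa"},
    {"creator": "Gabriel Garcia Márquez", "language": ""},
    {"creator": None, "language": "spa"},
]

nulls = null_report(sample_records, ["creator", "language"])
creators = value_counts(sample_records, "creator")
```

Browsing the distinct values is often the quickest way to spot inconsistency: here the same person appears under two different forms of the name.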
Recent research
…both disciplines wanted context about the data producer’s research methods and were able to get enough detail to be able to reuse the data…Archaeologists relied on bibliographies to facilitate data discovery, whereas quantitative social scientists used bibliographies to facilitate reuse decisions. Data reusers in both disciplines also relied on intermediaries. Quantitative social scientists, particularly novices, relied on faculty advisors…whereas archaeologists relied on colleagues and museum curators to locate data and associated context.
It’s unlikely that novice social science researchers (NSSRs) will require more metadata for data sets, but given that NSSRs relied on advice from faculty advisors, we should consider how to enable this type of “human scaffolding”.
Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge, data modeling, and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and offering services that help ensure that their research data will be accessible if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects—and the data creators will have the satisfaction of knowing that their data has benefitted others. Karen Smith-Yoshimura, April 18, 2016
Concluding thoughts
Libraries are increasingly providing end-to-end research data services
The culture of sharing data is more advanced in some disciplines than others
https://www.facebook.com/OakHillFarm1/photos
It’s an employment growth sector
It remains unclear just how much, and in which disciplines, these data resources will be re-used
But… it’s the right thing to do
Roy Tennant Senior Program Officer OCLC Research tennantr@oclc.org @rtennant facebook.com/roytennant