A Focused Analysis of MARC Records in HathiTrust

Colleen Fallaw, Sarah Yarrito, Erik Radio, MJ Han, Tim Cole

Indexing

Records become statements.
HTRC currently uses Solr.
Statements take a key => value form, for example: title => Emma
Alternatives being explored include an RDF triplestore.
Processing makes assumptions about what information is where.
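The key => value form described above can be illustrated with a minimal sketch. It assumes a simplified, already-parsed record represented as a Python dict; the function name `record_to_statements` and the author and language values are hypothetical and chosen only for illustration. It does not represent HTRC's actual indexing pipeline.

```python
def record_to_statements(record):
    """Flatten a simplified record (a dict) into (key, value) statements.

    Assumption: each key maps to either a single string or a list of strings.
    """
    statements = []
    for key, values in record.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            statements.append((key, value))
    return statements


# Illustrative record; only "title => Emma" comes from the poster text.
record = {"title": "Emma", "author": "Austen, Jane", "language": ["eng"]}
statements = record_to_statements(record)

for key, value in statements:
    print(f"{key} => {value}")
```

Each statement stands alone, which is why processing has to assume where a given piece of information lives in the source record.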
Scholar Needs

Clear, rich, accurate description
Workset
Compound queries
Matching
Concerns Found

Ambiguous: Current vs. original language is not consistently distinguished for translated works.
Distributed: Concepts such as geographical coverage information are spread through multiple fields.
Inconsistent: Punctuation and formatting vary for the author, title, publisher, and date fields relied on for matching.
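To make the "Inconsistent" concern concrete, here is a minimal sketch of one way field values might be normalized before matching. It is an assumption for illustration, not a method stated on the poster: the function name `normalize_for_matching` is hypothetical, and the rules (strip diacritics, lowercase, replace punctuation with spaces, collapse whitespace) are one plausible choice among many.

```python
import re
import unicodedata


def normalize_for_matching(value):
    """Return a simplified form of a field value for comparison.

    Assumed rules: remove diacritics, lowercase, replace punctuation
    with spaces, and collapse runs of whitespace.
    """
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    no_punct = re.sub(r"[^\w\s]", " ", without_marks.lower())
    return " ".join(no_punct.split())


# Variants of the same title that differ only in punctuation and case.
variants = ["Emma /", "Emma.", "EMMA"]
normalized = {normalize_for_matching(v) for v in variants}
print(normalized)
```

Normalization of this kind trades precision for recall: it lets punctuation variants match, but it can also merge values that a cataloger meant to keep distinct.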
Information            Control Fields               Descriptive Fields
Language               008 positions 35-37          041 $a
Date of Publication    008 positions 07-10, 11-14   260/264 $c
Place of Publication   008 positions 15-17          260/264 $a
Geographic Coverage    n/a                          651 $a, 650 $z, 043 $a

[Figure: overlap of geographic coverage information among 651 $a (18%), 650 $z (25%), and 043 $a (25%). Region values as extracted: 6%, 7%, 3%, 2%, 12%, 5%, 60%.]

* 651 $a: Subject Added Entry, Geographic Name; geographic name
* 650 $z: Subject Added Entry, Topical Term; geographic subdivision
* 043 $a: Geographic Area Code; geographic area code
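The 008 character positions listed above can be read with simple string slicing. The following is a minimal sketch that assumes the 008 field is available as a 40-character string (positions 00-39); the function name `parse_008` and the sample value are hypothetical and constructed only for illustration.

```python
def parse_008(field_008):
    """Extract date, place, and language codes from a MARC 008 string.

    Positions follow the table above: 07-10 and 11-14 (dates),
    15-17 (place of publication), 35-37 (language).
    """
    if len(field_008) != 40:
        raise ValueError("expected a 40-character 008 field")
    return {
        "date1": field_008[7:11],
        "date2": field_008[11:15],
        "place": field_008[15:18].strip(),
        "language": field_008[35:38],
    }


# Illustrative 008 value assembled piece by piece so positions are visible.
sample_008 = (
    "850101"    # 00-05: date entered on file
    + "s"       # 06: type of date
    + "1816"    # 07-10: date 1
    + "    "    # 11-14: date 2 (blank here)
    + "enk"     # 15-17: place of publication
    + " " * 17  # 18-34: other material-specific positions
    + "eng"     # 35-37: language
    + " "       # 38: modified record
    + "d"       # 39: cataloging source
)

parsed = parse_008(sample_008)
print(parsed)
```

Because these values are positional, a record with a malformed or short 008 field yields wrong codes silently unless the length is checked, which is why the sketch rejects anything that is not 40 characters.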
Implications

Link
Enhance
Correct