Can Bilateral Digitization Tear Down the Wall Between Institutions and the Public?
Ben Brumfield
Digital Frontiers 2012

“You know Ben that it really stinks that I can't get access to the original. My grandfather Jeremiah wrote the diary so that I could read about his daily life happenings. My grandfather Edward used to own it and if he had known that I would be so interested in it I'm sure he would have kept it and given it to me instead of the university.”
Alan Williams,

Walls
Professionally conserved
Publicly accessible
Catalogued
1000 miles away
Reading room restrictions
“Permission-to-publish” agreements
Costly scanning fees

Penetrating the Walls
Digitization
Collaboration

Shallow Digitization (Institutional Version)
“Scan-and-dump” facsimiles
–Limited metadata
–No transcripts
–Not crawlable

Shallow Digitization (Amateur Version)
Full transcripts
–No facsimiles
–No provenance
–No metadata on sources
–Invisible editorial decisions
Cut-and-paste replication
–No attribution

Deep Digitization
Institutional Challenges
–Funding
–Manpower
Non-institutional Challenges
–Standards
–Access to sources

Crowdsourcing
Who are the volunteers?
What can they do?
OldWeather.org
Zenas Matthews
Harry Ransom Center Fragments

Accuracy
Individual transcriptions are about 97% accurate
Of 1000 transcribed logbook entries:
–3 will be lost because of transcription errors
–10 will be illegible
–At least 3 will be errors in the logs

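One way to reconcile 97% individual accuracy with only about 3 lost entries per 1000 is to assume that each entry is keyed independently by several volunteers and the majority reading wins. The slide does not say this; the three-transcriber setup and the function name `majority_failure_rate` are assumptions made here for illustration. A minimal sketch of the arithmetic:

```python
from math import comb


def majority_failure_rate(p_correct: float, n: int) -> float:
    """Probability that more than half of n independent transcriptions are wrong.

    This is an upper bound on losing the entry: it treats any wrong
    majority as a loss, even if the wrong readings disagree with each other.
    """
    p_wrong = 1.0 - p_correct
    needed = n // 2 + 1  # smallest number of wrong readings that forms a majority
    return sum(
        comb(n, k) * p_wrong**k * p_correct ** (n - k)
        for k in range(needed, n + 1)
    )


# Assumption: three independent transcribers per entry, each 97% accurate.
rate = majority_failure_rate(0.97, 3)
lost_per_1000 = 1000 * rate
print(f"Entries lost per 1000 under this assumption: {lost_per_1000:.2f}")
```

Under these assumptions the result rounds to 3 entries per 1000, in line with the figure on the slide.
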
OldWeather Participation
More than 1.6 million weather observations.
16,000 volunteers.
1 million log pages transcribed.
Mean contribution of 100 transcriptions per user – but this statistic is worthless!

Power-law Distribution
Most contributions are made by a core of well-informed enthusiasts.
True regardless of project size.
What are the implications?

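To see why a mean of 100 transcriptions per user says little under a power-law distribution, the sketch below spreads 1.6 million transcriptions across 16,000 volunteers in proportion to 1/rank. The 1/rank shape is an assumption chosen for illustration, not OldWeather's measured distribution, and all variable names are hypothetical.

```python
from statistics import mean, median

TOTAL_TRANSCRIPTIONS = 1_600_000  # figure from the OldWeather slide
VOLUNTEERS = 16_000               # figure from the OldWeather slide

# Assumption: the volunteer ranked r contributes in proportion to 1/r.
weights = [1.0 / rank for rank in range(1, VOLUNTEERS + 1)]
scale = TOTAL_TRANSCRIPTIONS / sum(weights)
contributions = [w * scale for w in weights]  # sorted, largest first

mean_contribution = mean(contributions)
median_contribution = median(contributions)

# Share of all work done by the most active 1% of volunteers.
top_count = VOLUNTEERS // 100
top_share = sum(contributions[:top_count]) / TOTAL_TRANSCRIPTIONS

print(f"mean:   {mean_contribution:.1f}")
print(f"median: {median_contribution:.1f}")
print(f"top 1% share of all transcriptions: {top_share:.0%}")
```

With this assumed shape the mean is still 100, but the median volunteer contributes far less, and the most active 1% account for more than half of the total.
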
One “Well-Informed Enthusiast”
In 14 days,
–Entire diary transcribed
–250 revisions to 43 pages
–Two dozen footnotes

Crowdsourcing’s Virtuous Circle
Volunteers
Deep digitization
Findability
More Volunteers!

One Volunteer’s Story
Nat Wooding
–Retired data analyst
–100 pages of Julia Brumfield’s diaries transcribed and indexed in six months
–No relation to diarist
–Great-uncle was diarist’s letter carrier, also named Nat Wooding

Non-institutional Digitization

The Invisible Archive
Private collections
Family archivists (filing cabinets)
–or their heirs (boxes in the attic)
Non-notable subjects
Flickr

The Standards Problem
“We can't overemphasize the potential futility of citing websites, any websites, but especially non-institutional websites.”
–Diggitt McLaughlin (H-SHEAR )

The Standards Problem
“Needless to say, amateurs will continue to put out poorly edited versions of documents in print which we, as professionals, will continue to eschew using.”
–Christopher L. Miller (H-OIEAHC list, )

Solutions
Collaboration
Participation by professionals in amateur projects
FreeREG/FreeCEN

Solutions
Community
Flickr
RootsTech

Solutions
Software Platforms
Suggested rigor
Graceful degradation

Thanks!
Ben Brumfield
Slides and transcript to be posted at