Published by Bernard Harrington. Modified over 9 years ago.
1
An Arizona Model for Capturing and Describing Documents on the Web
  Richard Pearce-Moses
  Director of Digital Government Information
  Arizona State Library, Archives and Public Records
  rpm at lib.az.us
2
What Does WWW Stand For?
  They both abbreviate to WWW
  Rugged Individualism
  Lack of standards ~ Lawlessness
  [Collage of Robert Conrad as James West in the Wild, Wild West removed to avoid violation of copyright.]
3
The Dream
  To collect, manage, preserve, and make useful the enormous amount of digital information our culture is now producing
4
The Reality
  Two Approaches
    Bibliocentric (Item-by-Item)
    Tech-centric (Capture-It-All)
  Emphasis on Software Tools and Technology
  Limited Assistance from Content Providers
5
Library of Congress & NDIIPP
University of Illinois at Urbana-Champaign School of Library and Information Science
OCLC
Content Providers
Tufts University Perseus Project
Michigan State University Library
State libraries: Arizona, Connecticut, Illinois, North Carolina, Wisconsin
UIUC partners: NCSA, WILL-AM/FM/TV, Information Management Services
6
Digital Archives
  Libraries
    Artificial collections
    Item-Level Control
  Archives
    Provenance
    Original Order
    Hierarchy
    Aggregate Control
7
Websites as Archival Collections
  Documents of Common Provenance
  Organized into Directories (Archival Series)
  Publications v. Records
8
The Art and Craft of Building a Collection
  What we do remains the same
  How we do it will change
  Identification/Selection
  Acquisition
  Description
  Reference
  Preservation
9
Identification: Where Do We Look?
  Finding the Forest
    az.gov
    state.az.us
  Domain Tool
    Identifies all distinct domains
    Reports new sites since previous spider
    Reports when sites disappear
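The three Domain Tool behaviors named on this slide can be illustrated with a small sketch. It is not the Arizona tool itself: it assumes each spider run yields a plain list of URLs, and the function names and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def distinct_domains(urls):
    """Return the set of host names found in an iterable of URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased; None if the URL has no host
        if host:
            hosts.add(host)
    return hosts


def compare_spiders(previous_urls, current_urls):
    """Report domains that appeared or disappeared between two spider runs."""
    previous = distinct_domains(previous_urls)
    current = distinct_domains(current_urls)
    return {
        "new": sorted(current - previous),
        "disappeared": sorted(previous - current),
    }


# Hypothetical URLs, for illustration only.
last_run = ["http://www.az.gov/index.html", "http://water.az.gov/reports/a.pdf"]
this_run = ["http://www.az.gov/index.html", "http://drought.az.gov/plan.html"]
report = compare_spiders(last_run, this_run)
print(report)
```

Comparing sets of host names, rather than individual pages, keeps the report at the "forest" level the slide describes.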
10
Selection: Which Collections Do We Harvest?
  Collection-Level Analysis
    Macro appraisal sets priorities
    Materials appraised as series
  Content Providers Taxonomy Tool
    Names
    Administrative history
    Relationships
    Subjects
    Functions
11
Selection: Which Documents Do We Harvest?
  Identify Series
    Aggregate selection
    Set frequency of harvests
  Site Analysis Tool
    Display structure
    Harmonize physical and intellectual structure
    Identify inaccessible content
    Show what's new
    Show significant changes
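One way to picture "show what's new" and "show significant changes" is to compare two harvests of the same site. The sketch below is an assumption about how such a comparison could work, not a description of the Site Analysis Tool: each harvest is modeled as a mapping from path to captured text, and the 0.9 similarity cutoff is an arbitrary illustrative value.

```python
import difflib


def compare_harvests(previous, current, threshold=0.9):
    """Classify paths as new, removed, or significantly changed.

    `previous` and `current` map a path to the text captured at that path.
    `threshold` is an assumed similarity cutoff, not a value from the slides.
    """
    new = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = []
    for path in sorted(set(previous) & set(current)):
        ratio = difflib.SequenceMatcher(None, previous[path], current[path]).ratio()
        if ratio < threshold:  # texts differ by more than the assumed tolerance
            changed.append(path)
    return {"new": new, "removed": removed, "changed": changed}


# Hypothetical harvest snapshots, for illustration only.
earlier = {
    "/reports/2004.html": "Annual water report for 2004",
    "/about.html": "About the agency",
}
later = {
    "/reports/2004.html": "Annual water report for 2004",
    "/about.html": "Contact information has moved to a new page",
    "/reports/2005.html": "Annual water report for 2005",
}
diff = compare_harvests(earlier, later)
print(diff)
```

A similarity ratio is only one possible definition of "significant"; a production tool might instead ignore navigation chrome or compare extracted text only.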
12
Description
  To be able to locate documents
    when the creator or provenance is known
    when the subject is known
  and to aid in selection as to character
  Series Description
    Make directory name a meaningful title
    Scope and contents note
    High-level subject headings
    Recorded in site analysis tool database
  Document Description
    Creator: taxonomy, internal metadata
    Title: from internal metadata, noun phrases
    Subject: from series metadata, internal metadata
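The document-description rules above (title from internal metadata, subjects from series metadata plus internal metadata) can be sketched as follows. This is a simplified illustration under stated assumptions: "internal metadata" is taken to mean the HTML title and meta elements, the series record stands in for the taxonomy as the source of a fallback creator, and noun-phrase extraction is left out. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta.setdefault(attrs["name"].lower(), []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def describe_document(html, series):
    """Build a document description, falling back on the series record."""
    parser = MetadataParser()
    parser.feed(html)
    subjects = list(series.get("subjects", []))  # inherited from the series
    for subject in parser.meta.get("subject", []):
        if subject not in subjects:
            subjects.append(subject)
    return {
        "creator": parser.meta.get("creator", [series.get("creator", "")])[0],
        "title": parser.title.strip() or series.get("title", ""),
        "subjects": subjects,
    }


# Hypothetical page and series record, for illustration only.
page = ('<html><head><title> Drought Plan </title>'
        '<meta name="Subject" content="drought"></head><body></body></html>')
series_record = {
    "creator": "Governor's Drought Task Force",
    "title": "Reports",
    "subjects": ["water conservation"],
}
description = describe_document(page, series_record)
print(description)
```

Inheriting from the series record means a document with no internal metadata at all still receives a usable creator and subject.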
13
Access
  Finding Aids
    A valuable bird’s-eye view for archivists
    Of limited value to patrons...
    Unless they’re transformed into topic maps
  Full Text Search Engines
    Ranking Algorithms
  Categorization / Packaging Results
    Based on series-level metadata
    Based on autoclassification
14
Description and Access
  Series-Level Description
    name=“Creator”: Governor’s Drought Task Force; Rural Watershed Alliance
    name=“Subject”: reservoirs; ground water
    name=“Subject”: drought; water conservation
    name=“Subject”: potable water; agriculture
    name=“Type”: planning; reports
  Categorized Results
    Your search for water, Phoenix
    Found documents in the following categories
      water (500+)
      water conservation (357)
      Salt River Project (210)
      drought (110)
      flood control (98)
      xeriscape (25)
    Found documents from the following agencies
      Water Resources (135)
      Governor's Drought Task Force (102)
      Phoenix (87)
      Maricopa County (84)
      Corporation Commission (35)
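Category and agency counts like those in the mock-up could be produced by tallying metadata over the matching documents. The sketch below assumes each search hit already carries an agency and a list of subjects (for example, inherited from its series); the field names and sample hits are hypothetical and the counts are unrelated to the figures on the slide.

```python
from collections import Counter


def categorize(results):
    """Count search hits per subject and per agency, most frequent first."""
    subjects = Counter()
    agencies = Counter()
    for doc in results:
        # dict.fromkeys drops repeated subjects within one document, keeping order
        for subject in dict.fromkeys(doc.get("subjects", [])):
            subjects[subject] += 1
        agencies[doc["agency"]] += 1
    return subjects.most_common(), agencies.most_common()


# Hypothetical search hits, for illustration only.
hits = [
    {"agency": "Water Resources", "subjects": ["water", "drought"]},
    {"agency": "Water Resources", "subjects": ["water"]},
    {"agency": "Phoenix", "subjects": ["water", "xeriscape"]},
]
by_subject, by_agency = categorize(hits)
for name, count in by_subject:
    print(f"{name} ({count})")
for name, count in by_agency:
    print(f"{name} ({count})")
```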
15
Administration / Curation / Stewardship
  Systematic
    Regular Workflows
    Not Idiosyncratic
  Collaborative
    Consensual, Not Idiosyncratic
    Avoid Redundant Efforts
  Quality Control
    Need for Good Metrics
    Need for Regular Audits
16
Stay Tuned....