Published by Willa Wilkerson. Modified over 6 years ago.
1
Problems and Issues in Selecting, Harvesting, and Cataloging Web Resources
Joanne Archer
University of Maryland Libraries
2
Web Harvesting Jargon
Crawler
Seed
Crawl
Harvest
3
Wayback Machine
4
Options for Web Harvesting
In-house program (e.g. Pandora, Web Curator Tool)
Pro: flexibility
Con: $$$
Third-party subscription (e.g. Web Archiving Service, Archive-It)
Pro: ease of use
Con: $
Off-the-shelf software (e.g. HTTrack, Adobe Web Capture)
Pro: inexpensive
Con: not scalable
5
Key Questions for Harvesting Projects
uniqueness
ephemerality
research value
harvest frequency
scope
6
Maryland’s Pilot Harvests (2008-2010)
Maryland State Documents
Historic Preservation
7
Why harvest these areas?
Builds on existing strengths in print collections
Collections are unique
Large amount of material migrating to the web
8
Key Questions for Harvesting Projects
uniqueness
ephemerality
research value
harvest frequency
scope
9
Harvesting
10
Harvesting Challenges:
JavaScript
Streaming media
Form- and database-driven content
Password-protected sites
robots.txt files
Multiple hosts/subdomains
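To illustrate the robots.txt challenge, here is a minimal sketch of how a crawler that honors robots.txt decides whether a page may be harvested. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `example-crawler` user agent, and the `www.example.org` URLs are all hypothetical and are not taken from any site in the pilot harvests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.org/index.html",
        "https://www.example.org/private/report.pdf",
    ):
        print(url, is_allowed(ROBOTS_TXT, "example-crawler", url))
```

A harvest that obeys these rules would skip everything under `/private/`, which is one way content can go missing from an archived copy of a site.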
11
Single host = www.preservemd.org
Multiple hosts =
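The single-host versus multiple-host distinction can be sketched as a scope check that a crawler applies to each discovered URL. This is only an illustration using Python's standard `urllib.parse`. The host `www.preservemd.org` comes from the slide; the extra host `media.example.org` is a hypothetical placeholder, not the slide's own multiple-host example.

```python
from typing import AbstractSet
from urllib.parse import urlparse


def in_scope(url: str, allowed_hosts: AbstractSet[str]) -> bool:
    """Return True if the URL's host is one of the hosts the harvest covers."""
    host = urlparse(url).hostname  # already lowercased; None if no host
    return host is not None and host in allowed_hosts


# Single-host scope, as on the slide.
SINGLE_HOST = {"www.preservemd.org"}

# Multiple-host scope; the second host is a hypothetical placeholder.
MULTIPLE_HOSTS = {"www.preservemd.org", "media.example.org"}

if __name__ == "__main__":
    url = "http://media.example.org/video/tour.html"
    print(in_scope(url, SINGLE_HOST))     # outside a single-host harvest
    print(in_scope(url, MULTIPLE_HOSTS))  # inside once the host is added
```

When a site spreads its content across several hosts or subdomains, a harvest scoped to the single seed host leaves that content out unless each extra host is added to the scope.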
12
End-User Access
13
End-User Access
general material designation
collection note
URLs
subject heading
uniform title
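As a rough sketch, the record elements listed above could be modeled as a simple data structure. The class name, field names, and every sample value below are hypothetical and chosen only for illustration; they do not reproduce an actual catalog record from the Maryland pilot.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HarvestedSiteRecord:
    """Hypothetical container for the access points listed on the slide."""
    uniform_title: str
    general_material_designation: str
    collection_note: str
    subject_headings: List[str] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)


# All values are placeholders, not real catalog data.
record = HarvestedSiteRecord(
    uniform_title="Example agency website",
    general_material_designation="electronic resource",
    collection_note="Archived copy captured for a hypothetical web collection.",
    subject_headings=["Historic preservation"],
    urls=[
        "https://www.example.org/",          # live site (placeholder)
        "https://archive.example.org/site/",  # archived copy (placeholder)
    ],
)

if __name__ == "__main__":
    print(record.uniform_title, f"[{record.general_material_designation}]")
    for url in record.urls:
        print("URL:", url)
```

Keeping the URLs in a list reflects that a record for a harvested site may need to point to both the live site and its archived copy.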
14
Conclusions
Challenges:
Start-up costs
What to collect
Metadata creation
BUT we are well prepared to meet the challenges