Latin American Government Documents Archive, LAGDA

A joint project of the Latin American Network Information Center (LANIC), the Nettie Lee Benson Latin American Collection, and the University of Texas Libraries. Web crawling is provided by the Internet Archive’s subscription service, Archive-It.

LANIC got involved with Web archiving as a partner on the CRL Political Communications project, where we also got to know Michele Kimpton at the Internet Archive and the work they were doing. LARRP had identified presidential messages as a needed area of collection, and the Benson had just launched a project to manually download presidential and ministerial documents from Web sites to complete and continue their serials collection. LANIC was looking at building a crawler based on IA’s Heritrix for this type of collecting. In this process of looking at how best to capture Latin American Web-based government documents, we became interested in participating in the Archive-It pilot.

The Archive-It pilot ran from September to November 2005. Based on the results of that pilot, the University of Texas Libraries decided to subscribe to Archive-It at its launch in February 2006 for further development and collection for the Latin American Government Documents Archive (LAGDA). LAGDA is a joint effort of the University of Texas Libraries, the Nettie Lee Benson Latin American Collection, and LANIC, using services provided by the Internet Archive’s Archive-It.

The focus of LAGDA is on government Web sites containing the annual reports (informes/memorias) and the plans and programs of the presidents and the major ministries in all Latin American and Spanish-speaking Caribbean countries.

Collection management is accessed from the Archive-It homepage through the “Member Login.”

Under the Archive-It subscription, a partner is allowed up to 300 seed urls and a total of 10 million documents captured per year. The subscription charge is $10,000 per year, which includes ongoing storage. This cost is being borne by UT Libraries.
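As a rough illustration of what these limits imply, the sketch below checks a seed count against the 300-url cap and works out average per-seed figures. The function name and the idea of a per-seed average are assumptions made for planning purposes only; nothing here says Archive-It allocates documents per seed.

```python
# Limits and cost as stated for the subscription described above.
MAX_SEEDS = 300
MAX_DOCS_PER_YEAR = 10_000_000
ANNUAL_COST_USD = 10_000


def subscription_budget(seed_count):
    """Return remaining seed headroom and per-seed averages (hypothetical helper)."""
    if not 0 < seed_count <= MAX_SEEDS:
        raise ValueError(f"seed_count must be between 1 and {MAX_SEEDS}")
    return {
        "seeds_remaining": MAX_SEEDS - seed_count,
        # Average only; real crawls will vary widely from site to site.
        "avg_docs_per_seed": MAX_DOCS_PER_YEAR // seed_count,
        "cost_per_seed_usd": round(ANNUAL_COST_USD / seed_count, 2),
    }


# The LAGDA collection uses 279 seed urls.
budget = subscription_budget(279)
print(budget)
```

With 279 seeds, this leaves room for 21 more urls under the cap.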

Selection of the seed urls was made by Don Gibbs, the Latin American bibliographer at the Benson Collection, based on ongoing collection needs. Don provided an Excel spreadsheet of 279 urls, which we pasted into the “Add Seeds” page, a sample of which you can see in the “Take a Tour” on the Archive-It homepage. This list shows the results of the first crawl.

The dates of each crawl are given. Clicking on any date will give the site as captured on that date. Once a site is captured and in the Wayback Machine, it cannot be removed.

Capture includes text, images, and audio.

The collection manager can set frequency and change out urls, as well as set the depth of the crawl.

Catalog records are based on Dublin Core.
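To make the idea of a Dublin Core based record concrete, here is a minimal sketch that serializes a few elements from the Dublin Core element set as XML. Which elements Archive-It actually records is not stated here, so the field choice, the `record` wrapper element, and the values are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values (hypothetical helper)."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; not taken from an actual LAGDA catalog record.
example = dublin_core_record({
    "title": "Ministerio de Defensa",
    "coverage": "Argentina",
    "language": "es",
})
print(example)
```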

The end user can access collections through the Archive-It interface. Here the user is searching for the term “mujer” in the LAGDA collection of the University of Texas.

Here are the search results for mujer in the LAGDA collection.

To improve usability, LANIC developed an interface to the LAGDA collection.

Browse Full Collection shows the list of the 279 seed urls in the collection. Each is linked to the list of capture dates in the Wayback Machine.

Clicking on the Ministerio de Defensa under Argentina brings the user to this page. The asterisk by the date denotes when the site was updated. The user can select any of the dates.
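Capture dates in the Wayback Machine are conventionally expressed as 14-digit timestamps (YYYYMMDDhhmmss) embedded in the capture's address. The sketch below parses such a timestamp and assembles a capture address. The base address and the seed url are hypothetical placeholders, and the exact address layout used for this collection is an assumption.

```python
from datetime import datetime


def capture_date(timestamp):
    """Parse a 14-digit Wayback-style timestamp into a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def capture_url(base, timestamp, original_url):
    """Join a Wayback base address, a timestamp, and the original url.

    The base/timestamp/url layout is assumed here for illustration.
    """
    return f"{base.rstrip('/')}/{timestamp}/{original_url}"


# Hypothetical values for illustration.
when = capture_date("20060715120000")
link = capture_url("https://wayback.example.org/collection/",
                   "20060715120000",
                   "http://www.example.org/")
print(when.date(), link)
```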

The captured site for the date selected.

Documents can be downloaded from the captured site.

Archive-It provides full text search of the captured site. Here the term Fox has been entered.

Search results.

Site selected.

LANIC is developing a value-added service that locates documents for the user under Presidential Messages and Ministerial Documents to aid discovery.

Specific documents are located on the sites and linked by country.

Sample document.

The same is done for ministerial documents.

Direct link to page.

And document selected.

It is also possible to integrate this type of collecting into library collections. The Benson is developing a site to pull out the specific documents for its ongoing serial collections and is creating catalog records for them. For this value-added service, the document is downloaded from the archived Web site and hosted on the library server. This also soothes anxieties about where these digital documents reside. We believe this is part of the shift from collecting for an individual library’s holdings to collecting for a research community.

In managing the Archive-It collection we divided tasks into two areas of focus. One is quality of capture, as monitored and managed through the backend: seeding the crawls, reviewing crawl status, and verifying completeness of the crawl. Problems in capture can arise from robots.txt blocking, JavaScript, and other structural impediments. LANIC is managing this area.

The other is to verify capture of identified collection documents. In other words, did we get the documents we want? If not, we will adjust or add a seed url as needed, or it may be a matter of contacting the Webmaster about an exemption to a robots.txt block. This is a large task, but it does lend itself to collaboration. Don Gibbs has been in touch with the LARRP Official Documents Working Group about having bibliographers responsible for assigned countries check the archived Web sites to see that basic documents have been captured and to suggest other agencies for inclusion.

Another way to collaborate through the Archive-It service is to divide collection subject areas among institutions. UT is in discussions with another library to have them subscribe to capture another set of 300 urls to complement and fill out the LAGDA collection.

For the scholar, we have not only captured individual documents, we have captured the context surrounding those documents. We have also captured an artifact, the Web site itself.
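The robots.txt blocking mentioned above can be checked before contacting a Webmaster. The following is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent string, the sample robots.txt rules, and the example urls are all assumptions for illustration, not details of the actual LAGDA crawls.

```python
from urllib.robotparser import RobotFileParser


def blocked_seeds(robots_txt, seed_urls, user_agent="archive.org_bot"):
    """Return the urls that the given robots.txt text disallows.

    The default user_agent is an assumed crawler name; substitute the one
    the crawler in use actually announces.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in seed_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks a reports directory for all crawlers.
sample_rules = """\
User-agent: *
Disallow: /informes/
"""

seeds = [
    "http://www.example.org/index.html",
    "http://www.example.org/informes/2006.pdf",
]

blocked = blocked_seeds(sample_rules, seeds)
print(blocked)
```

A url that appears in the result is a candidate either for a different seed or for an exemption request to the site's Webmaster.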