Latin American Government Documents Archive, LAGDA


1 Latin American Government Documents Archive, LAGDA
A joint project of the Latin American Network Information Center (LANIC), the Nettie Lee Benson Latin American Collection, and the University of Texas Libraries. Web crawling is provided by the Internet Archive’s subscription service, Archive-It.

LANIC got involved with Web archiving as a partner on the CRL Political Communications project, where we also got to know Michele Kimpton at the Internet Archive and the work they were doing. LARRP had identified presidential messages as a needed area of collection, and the Benson had just launched a project to manually download presidential and ministerial documents from Web sites to complete and continue its serials collection. LANIC, meanwhile, was looking at building a crawler based on IA’s Heritrix for this type of collecting. In looking at how best to capture Latin American Web-based government documents, we became interested in participating in the Archive-It pilot.

The Archive-It pilot ran from September to November. Based on the results of that pilot, the University of Texas Libraries decided to subscribe to Archive-It at its launch in February 2006 for further development and collection for the Latin American Government Documents Archive (LAGDA). LAGDA is a joint effort of the University of Texas Libraries, the Nettie Lee Benson Latin American Collection, and LANIC, using services provided by the Internet Archive’s Archive-It.

2 The focus of LAGDA is on government Web sites containing the annual reports (informes/memorias) and the plans and programs of the presidents and the major ministries in all Latin American and Spanish-speaking Caribbean countries.

3 Collection management is accessed from the Archive-It homepage through the “Member Login.”

4 Under the Archive-It subscription, a partner is allowed up to 300 seed URLs and a total of 10 million captured documents per year. The subscription charge is $10,000 per year, which includes ongoing storage. This cost is being borne by UT Libraries.
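As a rough illustration of what those limits mean for a collection of this size, the sketch below works through the arithmetic. The limits and the annual charge come from the slide, and the figure of 279 seeds is the number reported for the first LAGDA crawl; the function name and the idea of dividing the document budget evenly across seeds are assumptions made for illustration only, not how Archive-It allocates captures.

```python
# Figures quoted on the slide for one Archive-It subscription year.
SEED_LIMIT = 300
DOCUMENT_LIMIT = 10_000_000
ANNUAL_COST_USD = 10_000


def quota_summary(seeds_used):
    """Summarize headroom under the subscription limits (hypothetical helper)."""
    return {
        # Seed URLs that could still be added before reaching the limit.
        "seeds_remaining": SEED_LIMIT - seeds_used,
        # Average document budget per seed if captures were spread evenly.
        "documents_per_seed": DOCUMENT_LIMIT // seeds_used,
        # Annual subscription charge divided across the seeds in use.
        "cost_per_seed_usd": round(ANNUAL_COST_USD / seeds_used, 2),
    }


summary = quota_summary(seeds_used=279)
print(summary)
```

With 279 seeds, there is room for 21 more URLs before the seed limit is reached.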

5 Selection of the seed URLs was made by Don Gibbs, the Latin American bibliographer at the Benson Collection, based on ongoing collection needs. Don provided an Excel spreadsheet of 279 URLs, which we pasted into the “Add Seeds” page, a sample of which you can see in the “Take a Tour” on the Archive-It homepage. This list shows the results of the first crawl.

6 The dates of each crawl are given. Clicking on any date will give the site as captured on that date. Once a site is captured and in the Wayback Machine, it cannot be removed.
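Each dated capture corresponds to a URL that combines a timestamp with the original address. A minimal sketch of that addressing scheme follows, assuming the public Wayback Machine pattern of a 14-digit timestamp followed by the original URL; the prefix used for an Archive-It collection may differ, and the site address and capture date shown are hypothetical.

```python
from datetime import datetime

# Assumption: the public Wayback Machine prefix. An Archive-It collection
# may be served from a different host or path.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, captured_at):
    """Build the address of a capture from its date and original URL."""
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")  # e.g. 20060301120000
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Hypothetical site and capture date, for illustration only.
example = wayback_url("http://www.example.gov/", datetime(2006, 3, 1, 12, 0, 0))
print(example)
```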

7 Capture includes text, images, and audio.

8 The collection manager can set the crawl frequency, change out URLs, and set the depth of the crawl.
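To make "depth of the crawl" concrete, the sketch below treats depth as the number of link hops followed from a seed. It walks a small in-memory link graph rather than the live Web, and it is not a description of how Heritrix or Archive-It implement crawling; the page names and the `crawl_to_depth` function are hypothetical.

```python
from collections import deque


def crawl_to_depth(link_graph, seed, max_depth):
    """Return pages reachable from seed within max_depth link hops."""
    seen = {seed}
    visited = []
    queue = deque([(seed, 0)])
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for linked in link_graph.get(page, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return visited


# Hypothetical site structure: a seed page linking down to documents.
LINKS = {
    "seed": ["informes", "ministerios"],
    "informes": ["informe-2005.pdf"],
    "ministerios": ["defensa"],
    "defensa": ["memoria-2005.pdf"],
}

shallow = crawl_to_depth(LINKS, "seed", 1)
deeper = crawl_to_depth(LINKS, "seed", 2)
print(shallow)
print(deeper)
```

In this toy graph, a depth of 1 reaches only the section pages, while a depth of 2 also picks up the first document. The trade-off for a collection manager is similar: a shallow crawl can miss documents that sit several links below the seed.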

9 Catalog records are based on Dublin Core.
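For readers unfamiliar with Dublin Core, a record is a flat set of simple elements such as title, creator, language, and identifier. The sketch below serializes a few of those elements as XML using the standard Dublin Core element namespace. The `record` wrapper element, the field values, and the helper name are hypothetical and do not reproduce the Archive-It catalog format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical values, for illustration only.
xml_text = dublin_core_record({
    "title": "Ministerio de Defensa",
    "creator": "Argentina. Ministerio de Defensa",
    "language": "es",
    "type": "InteractiveResource",
    "identifier": "http://www.example.gov/",
})
print(xml_text)
```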

10 The end user can access collections through the Archive-It interface. Here the user is searching for the term “mujer” in the LAGDA collection of the University of Texas.

11 Here are the search results for “mujer” in the LAGDA collection.

12 To improve usability, LANIC developed an interface to the LAGDA collection.

13 “Browse Full Collection” shows the list of the 279 seed URLs in the collection. Each is linked to the list of capture dates in the Wayback Machine.

14 Clicking on the Ministerio de Defensa under Argentina brings the user to this page. The asterisk by the date denotes when the site was updated. The user can select any of the dates.

15 The captured site for the date selected.

16 Documents can be downloaded from the captured site.

17 Archive-It provides full-text search of the captured site. Here the term “Fox” has been entered.

18 Search results.

19 Site selected.

20 LANIC is developing a value-added service by locating documents for the user under Presidential Messages and Ministerial Documents to aid discovery.

21 Specific documents are located on the sites and linked by country.

22 Sample document.

23 The same is done for ministerial documents.

24 Direct link to page.

25 And document selected.

26 It is also possible to integrate this type of collecting into library collections. The Benson is developing a site to pull out the specific documents for its ongoing serial collections and is creating catalog records for them. For this value-added service, the document is downloaded from the archived Web site and hosted on the library server. This also soothes anxieties about where these digital documents reside. We believe this is part of the shift from collecting for an individual library’s holdings to collecting for a research community.

In managing the Archive-It collection, we divided tasks into two areas of focus. One is quality of capture, as monitored and managed through the backend: seeding the crawls, reviewing crawl status, and verifying completeness of the crawl. Problems in capture can arise from robots.txt blocking, JavaScript, and other structural impediments. LANIC is managing this area.

The other is to verify capture of identified collection documents. In other words, did we get the documents we want? If not, we will adjust or add a seed URL as needed, or it may be a matter of contacting the Webmaster about an exemption to a robots.txt block. This is a large task, but it does lend itself to collaboration. Don Gibbs has been in touch with the LARRP Official Documents Working Group about having bibliographers responsible for assigned countries check the archived Web sites to see that basic documents have been captured and to suggest other agencies for inclusion.

Another way to collaborate through the Archive-It service is to divide collection subject areas among institutions. UT is in discussions with another library to have them subscribe to capture another set of 300 URLs to complement and fill out the LAGDA collection.

For the scholar, we have not only captured individual documents, we have captured the context surrounding those documents. We have also captured an artifact, the Web site itself.
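The robots.txt blocking mentioned in the notes above can be checked before or after a crawl. The sketch below uses Python's standard `urllib.robotparser` module to test whether a given crawler may fetch a document, and shows how an exemption for one crawler changes the answer. The crawler name `example_archive_bot`, the site address, and the rules themselves are hypothetical; the user agent actually sent by the Archive-It crawler is not stated in the presentation.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the robots.txt rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules that keep all crawlers out of the documents directory.
BLOCKING_RULES = """\
User-agent: *
Disallow: /documentos/
"""

# The same rules with an exemption granted to one (hypothetical) crawler.
RULES_WITH_EXEMPTION = """\
User-agent: example_archive_bot
Allow: /

User-agent: *
Disallow: /documentos/
"""

DOCUMENT = "http://www.example.gov/documentos/informe.pdf"

blocked_before = is_blocked(BLOCKING_RULES, "example_archive_bot", DOCUMENT)
blocked_after = is_blocked(RULES_WITH_EXEMPTION, "example_archive_bot", DOCUMENT)
print(blocked_before, blocked_after)
```

A check like this only explains captures missed because of robots.txt rules; documents missed because of JavaScript navigation or other structural impediments would still need to be found by reviewing the archived site.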

