ERIKA: Eesti Ressursid Internetis Kataloogimine ja Arhiveerimine (Estonian Resources in Internet, Indexing and Archiving)



Web Archiving

Experience:
- Selection based on criteria (Collection Development Department)
- Very narrow approach: traditional bibliographic view of a "publication"
- Document-centered: only certain files, not the site as a whole
- Shortcomings of this approach: see slide 5
- Software used or tested: httrack, nedlib, webzip

Future ways:
- Whole ".ee" domain (crawler)
- Selection based on criteria centered on the concept of a "website", which could be a subdomain, a server or a directory (Collection Development Department)

Current situation
- Database with title, URL, comment, dates
- HTTrack Website Copier
- Archive on a hard drive
- Website information

Current situation
- Registration database
- Httrack website copier
- Database of httrack logs
- Website files not indexed, not packed (50 GB)
- Simple web interface
- Record in OPAC links to the original URL
- Freeware used for everything
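A registration database of this kind can be sketched with SQLite. This is only an illustration: the slides name the fields (title, URL, comment, dates) but not the actual schema, so the table name, column names and helper functions below are assumptions.

```python
import sqlite3

# Hypothetical schema: only the field names (title, url, comment, dates)
# come from the slides; everything else is assumed for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS registration (
    id                INTEGER PRIMARY KEY,
    title             TEXT NOT NULL,
    url               TEXT NOT NULL UNIQUE,
    comment           TEXT,
    registered_on     TEXT NOT NULL,  -- ISO date
    last_harvested_on TEXT            -- ISO date, NULL until first harvest
);
"""


def open_db(path=":memory:"):
    """Open (or create) the registration database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register(conn, title, url, comment, registered_on):
    """Add one website to the registration database and return its row id."""
    cur = conn.execute(
        "INSERT INTO registration (title, url, comment, registered_on) "
        "VALUES (?, ?, ?, ?)",
        (title, url, comment, registered_on),
    )
    conn.commit()
    return cur.lastrowid


def urls_for_harvest(conn):
    """Return all registered URLs, e.g. to write a URL list for a website copier."""
    return [row[0] for row in conn.execute("SELECT url FROM registration ORDER BY id")]
```

The list returned by `urls_for_harvest` could then be written to a text file and handed to the website copier.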

Current situation

Problems:
- Manual labour
- Missing web publications
- Missing the context of publications (a user may prefer to browse the archived website rather than search by title and author of a certain file)
- Excluding resources that are not "publications" but are important for the national memory (e.g. websites of political parties)

Advantages:
- Take only what we need
- Extensive bibliographic descriptions
- Everything is under control

software
- httrack: takes a list of URLs and saves each website as a mirror; all internal links are converted to point to the local mirror
- Earlier tests:
  - nedlib harvester: downloads files and meta-info, but does not save the internal structure
  - webzip: similar to httrack, saves a mirror; problem: licensed software
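The idea of converting internal links to point to a local mirror can be illustrated with a short sketch. This is not how httrack itself is implemented; the path layout (`host/path`), the regular expression and the function names are assumptions chosen for brevity, and query strings are ignored.

```python
import posixpath
import re
from urllib.parse import urljoin, urlsplit


def local_path(url):
    """Map a URL to a path inside the mirror: host/path (query strings are ignored)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directories get an index file
    return parts.netloc + path


# Simplified: only double-quoted href/src attributes are recognised.
LINK = re.compile(r'(href|src)="([^"]*)"')


def rewrite_links(html, page_url, mirrored_host):
    """Rewrite links to pages on mirrored_host as relative paths inside the mirror."""
    page_dir = posixpath.dirname(local_path(page_url))

    def replace(match):
        attr, target = match.group(1), match.group(2)
        absolute = urljoin(page_url, target)
        if urlsplit(absolute).netloc != mirrored_host:
            return match.group(0)  # external link: still points at the live web
        relative = posixpath.relpath(local_path(absolute), page_dir)
        return f'{attr}="{relative}"'

    return LINK.sub(replace, html)
```

For a page saved from `http://www.example.ee/a/page.html`, a link to `/b/doc.pdf` becomes `../b/doc.pdf`, while links to other hosts are left unchanged.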

software

access
- httrack logs are read into a database
- Access script:
  - redirects URLs
  - handles content type (problem: x.php?file=id is saved as HTML while in reality it is a PDF or something else)
- OPAC records will be updated to point to the archive
- Full-text index?
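One possible way for an access script to deal with the content-type problem is to look at the first bytes of the saved file instead of trusting its extension. The sketch below assumes a hypothetical lookup table from original URL to saved file path (the format of the httrack log database is not given in the slides) and checks only a few well-known file signatures.

```python
def sniff_content_type(data, declared="text/html"):
    """Guess the content type from the leading bytes of a saved file.

    Falls back to the declared type when no known signature matches.
    """
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"PK\x03\x04"):
        return "application/zip"
    return declared


def resolve(original_url, index, read_bytes):
    """Redirect an original URL to its archived copy.

    index      -- hypothetical dict: original URL -> saved file path
    read_bytes -- callable returning the (leading) bytes of a saved file
    Returns (saved path, content type) or None if the URL is not archived.
    """
    path = index.get(original_url)
    if path is None:
        return None
    return path, sniff_content_type(read_bytes(path))
```

With this approach, a file saved as `x.html` that actually starts with `%PDF-` would be served as `application/pdf`.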

Perspectives
- DB registration
- Harvester
- Files stored locally
- Indexing
- Access to files
- Check for correct saving
- Website information

Perspectives
- Test IIPC software (netpreserve.org)
- Use a crawler to collect (heritrix)
- Cooperation with neti.ee: Nuhk (Spy) crawler?
- Scope: as much as possible
- Strict criteria
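A crawl limited to the ".ee" domain needs some scope rule. A minimal sketch of a hostname-based rule follows; the function name and the choice to test only the hostname suffix are assumptions, and this is not Heritrix configuration.

```python
from urllib.parse import urlsplit


def in_scope(url, suffix=".ee"):
    """Return True if the URL's hostname lies under the given domain suffix."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host == suffix.lstrip(".") or host.endswith(suffix)
```

A rule like this would accept `http://www.nlib.ee/` and reject `http://example.com/`; Estonian sites hosted under other top-level domains would need additional criteria.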

Perspectives
- Heritrix (http://crawler.archive.org/), Nutchwax, WERA: crawl + index + pack + web interface
- Neti.ee: own crawler; database of the current situation; only "Estonian" IP addresses; buffer includes only text, no images or other formats

Perspectives

Criteria-based system, more automated. Steps:
1) Log in to a web interface
2) Add the link(s) of a webpage, some comments, and an update period
3) The system gets the page and saves it to the archive (packed or unpacked)
4) Check the webpage saved in the archive
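The steps above can be sketched as a small scheduling loop. All names here are hypothetical: the slides do not describe the data model, so the `Registration` record, the update period in days and the `fetch` callback are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass
class Registration:
    """One entry added through the web interface (step 2); fields are assumed."""
    url: str
    comment: str
    update_period_days: int
    last_saved: Optional[date] = None
    checked: bool = False  # set by a person after reviewing the saved copy (step 4)


def is_due(reg: Registration, today: date) -> bool:
    """A page is due if it was never saved or its update period has elapsed."""
    if reg.last_saved is None:
        return True
    return today - reg.last_saved >= timedelta(days=reg.update_period_days)


def harvest_due(registrations: List[Registration], today: date,
                fetch: Callable[[str], None]) -> List[str]:
    """Step 3: fetch and save every page that is due; return the URLs saved."""
    saved = []
    for reg in registrations:
        if is_due(reg, today):
            fetch(reg.url)       # e.g. hand the URL to the harvester
            reg.last_saved = today
            reg.checked = False  # a new copy needs a new manual check
            saved.append(reg.url)
    return saved
```

Keeping the manual check as a separate flag reflects step 4: the system saves automatically, but a person still confirms that the page was saved correctly.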

Perspectives: ERIKA and DIGAR
- DIGAR includes only objects with a controlled structure, in PDF format.
- ERIKA can contain any format you get from the internet.