OpenWeb: Expanding access to Digital Collections Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University

Presentation transcript:

OpenWeb: Expanding access to Digital Collections Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University Redefining Libraries: Web 2.0 and other Challenges May 2007 Xiamen, China

The Invisible Web
A great amount of information cannot be found on the Web since it is locked inside databases.
Search engines are getting better at unlocking database content, but it involves help from site administrators
Goal: Move TV News from the Invisible Web to the Open Web

History and Background of the Archive
Conceived by Nashville insurance executive Paul Simpson
Established by Vanderbilt University in August 1968, initially as a 3-month experiment that grew into a permanent institution.

A Unique Archive
The largest, most comprehensive collection of national broadcast news available to the general public.
Vanderbilt has systematically archived complete news programs since Aug 5, 1968
A large amount of unique material
Material available at costs affordable by scholars and researchers
Only resource where researchers can search across all the major national news networks.

An extensive collection
Over 825,000 abstracts in our news database
~30,000 hours of regular nightly news programs
~10,000 hours of special news broadcasts

Videotape loan service
Lend material from the collection on VHS format
Compilation of requested news segments
Duplications of complete programs
Service fees based on affiliation: Vanderbilt & Sponsors, Educational, Others
Material provided for viewing only. Tapes must be returned.
All licensing of materials borrowed must be negotiated with the original network

Open Web Project

Problem:
Hard to find us on the Web unless users already know we exist
Searches on news content terms do not lead searchers to our Web site
Content in our TV-NewsSearch trapped in a closed database

Project Goals
Provide better service
Expand use of the collection
Increase Web site activity
Boost service fee income

Considerations and constraints
Maintain value of paid subscription product.
Keep control of database content. Valuable intellectual property built through 35 years of manual labor.

Business Model
TV News Archive must operate on a sustainable business plan
Mandate to eliminate VU subsidy
Income:
–Institutional subscription to online service: CNN streaming video
–Stipend from the Library of Congress
–Service fees for videotape loan service

Project strategy
Increase discovery through managed exposure of metadata to the Internet search engines
Initial focus on Google since it represents the majority of Web search activity.

Troubling statistic
Where do you typically begin your search for information on a particular topic? College students' responses:
89% Search engines (Google 62%)
2% Library Web site (total respondents -> 1%)
2% Online database
1%
1% Online news
1% Online bookstores
0% Instant messaging / online chat
OCLC. Perceptions of Libraries and Information Resources (2005) p

Library Discovery Model
Diagram labels: Web; Library Web Site / Catalog; Library as search destination

Web Discovery
Diagram labels: Web; TV News Web site; TV-NewsSearch Database (search and retrieval + e-commerce request system)
Successful search terms: “tv news”, “vanderbilt tv archive”, “vanderbilt television news archive”, “news archives”

OpenWeb Strategy
Diagram labels: Web; TV News Web site; OpenWeb Mirror Site; TV-NewsSearch Database (search and retrieval + e-commerce request system); Generate 805,000+ static pages
Successful search terms: all words and phrases in the TV-NewsSearch Database

Implementation Details
Create OpenWeb mirror site
–Static Web page for each database record
–Design each page to maximize content terms exposed to Google
–Funnel users to existing site
–Not meant to be an alternative interface

Generating the Open Web
Perl script to systematically query each record and generate html page
Create browse page to link all the record pages
Processes entire database in about 2 hours
Refresh weekly
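
The generation step can be sketched roughly as follows. This is a minimal illustration in Python, not the Archive's Perl script: the record fields (`id`, `title`, `abstract`), the file naming scheme, and the `search_url` parameter are all hypothetical. It renders one static page per record, links each page back to the existing site, and builds a browse page that links every record page.

```python
import html
from pathlib import Path


def render_record_page(record, search_url):
    """Return static HTML for one record (field names are assumed)."""
    title = html.escape(record["title"])
    abstract = html.escape(record["abstract"])
    # Link back to the main site so visitors are funnelled to it.
    link = html.escape(f"{search_url}?id={record['id']}")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<p>{abstract}</p>\n"
        f'<p><a href="{link}">View this record on the main site</a></p>\n'
        "</body></html>\n"
    )


def render_browse_page(page_names):
    """Return an HTML page that links to every generated record page."""
    items = "\n".join(
        f'<li><a href="{html.escape(n)}">{html.escape(n)}</a></li>'
        for n in page_names
    )
    return f"<!DOCTYPE html>\n<html><body><ul>\n{items}\n</ul></body></html>\n"


def write_site(records, out_dir, search_url):
    """Write one page per record plus browse.html; return the page names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for record in records:
        name = f"record-{record['id']}.html"  # hypothetical naming scheme
        (out / name).write_text(
            render_record_page(record, search_url), encoding="utf-8"
        )
        names.append(name)
    (out / "browse.html").write_text(render_browse_page(names), encoding="utf-8")
    return names
```

Putting the title and abstract text directly in the page body is what exposes the content terms to a crawler; the single link back keeps the mirror from becoming an alternative interface.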

Helping out Googlebot
Google SiteMap protocol
XML index that tells Google about your site
Limit of 50,000 links per index
Multiple sitemaps can be tied together in a sitemap index
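
As a rough sketch of how the 50,000-link limit and the sitemap index fit together, the Python below splits a list of page URLs into sitemap files and builds an index that points at them. The function names and example URLs are hypothetical, and the namespace shown is the one from the current sitemaps.org protocol, which may differ from what Google accepted in 2005.

```python
from math import ceil
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap limit cited in the slide


def sitemaps_needed(page_count, max_urls=MAX_URLS):
    """Number of sitemap files required for page_count URLs."""
    return ceil(page_count / max_urls)


def build_sitemaps(urls, max_urls=MAX_URLS):
    """Split urls into sitemap documents of at most max_urls entries."""
    docs = []
    for start in range(0, len(urls), max_urls):
        chunk = urls[start:start + max_urls]
        entries = "\n".join(
            f"  <url><loc>{escape(u)}</loc></url>" for u in chunk
        )
        docs.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
        )
    return docs


def build_sitemap_index(sitemap_urls):
    """Build the index document that ties the sitemap files together."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )
```

With roughly 805,000 record pages, `sitemaps_needed(805_000)` gives 17, so a site of this size needs a sitemap index rather than a single sitemap file.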

Google Webmaster’s account
Provides an interface to:
–Submit sitemaps
–Register sitemaps
–Monitor googlebot’s access to sitemaps
–Monitor how Google indexes your site
–Monitor how users access your site through Google
–Statistics, etc
–Constantly evolving functionality.

OpenWeb Progress
Initial planning: Jun 2005
Generate pages: Jul 2005
Submit html index to Google: Jul 2005
Submit XML sitemap: Aug 2005

Monitoring Activity
Analysis of Public Web logs
Analysis of OpenWeb logs
Impact on searching?
Impact on videotape requests?
Write a script to trace each request to determine origin.
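
One way such a tracing script might work is to read the referrer field of each Web server log line and bucket requests by where they came from. The sketch below is an assumption, not the Archive's script: it presumes logs in the common "combined" format, uses a hypothetical OpenWeb host name, and only recognises google.com hosts (country-specific Google domains would fall under "other").

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the tail of a combined-format log line:
# "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify_origin(referrer, openweb_host="openweb.example.org"):
    """Bucket a referrer URL as direct, openweb, google, or other."""
    host = (urlparse(referrer).hostname or "").lower()
    if not host:
        return "direct"  # empty or "-" referrer
    if host == openweb_host:  # hypothetical mirror-site host name
        return "openweb"
    if host == "google.com" or host.endswith(".google.com"):
        return "google"
    return "other"


def count_origins(lines, openweb_host="openweb.example.org"):
    """Tally request origins across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[classify_origin(match.group("referrer"), openweb_host)] += 1
    return counts
```

Running `count_origins` over the log lines for videotape request pages would give a first estimate of how many requests arrived by way of the OpenWeb pages or a Google search.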

Google Analytics
Full-featured Web site use analysis utility
Specializes in measuring site goals and conversions
Depends on data sent to Google via JavaScript rather than Web server logs

Results
Significant improvement in the interest in the archive and in the use of the collection

Google Web Manager’s Account

Loan service income

New User Registration

Questions / Discussion
For further information contact:
Marshall Breeding Director for Innovative Technology and Research