Music Video Redundancy and Half-Life in YouTube Matthias Prellwitz and Michael L. Nelson TPDL 2011 Berlin,

Presentation transcript:

Music Video Redundancy and Half-Life in YouTube Matthias Prellwitz and Michael L. Nelson TPDL 2011 Berlin, Germany 9/26/11

Slide 2: Linking to a particular copy: “Rolling Stones - Satisfaction”

Slide 3: Metadata lost when YouTube video disappears
‣ video title: The Rolling Stones – Satisfaction
‣ url

Slide 4: Metadata hard to recover from Search Engines

Slide 5: But nearly 300 copies remain in YouTube

Slide 6: Linking music-related URIs
‣ Transparent URI semantics
‣ Opaque URI semantics

Slide 7: Popular Music: US Top 40 Singles Charts of 9/25/10

Slide 8: Popular Music: Selected Music Blogs

Slide 9: Popular Music: The 500 Greatest Songs of all Time

Slide 10: Total Result Size Range: US Top 40 Singles Charts of 9/25/10
123,239 83,298 43,
‣ Lady Gaga - Alejandro
‣ Selena Gomez & The Scene - A Year Without Rain

Slide 11: Total Result Size Range: Selected Music Blogs
264, , ,936
‣ Lady Gaga - Bad Romance
‣ Mariah Carey featuring Juelz Santana & Bone Thugs-n-Harmony - Don't Forget About Us

Slide 12: Total Result Size Range: The 500 Greatest Songs of all Time
174, , ,076
‣ Michael Jackson - Billie Jean
‣ The Isley Brothers - That Lady (Part 1 and 2)

Slide 13: URI Unavailability: Rooted from a selected collection

Slide 14: URI Unavailability: Expected Half-life
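
The transcript does not say how the expected half-life was computed. As a rough illustration only, the sketch below assumes that the set of available URIs shrinks by simple exponential decay; that model, the function name `half_life`, and its parameters are assumptions introduced here, not the authors' method.

```python
import math


def half_life(elapsed_days: float, fraction_remaining: float) -> float:
    """Estimate a half-life in days from one observation.

    Assumes exponential decay: fraction_remaining = 0.5 ** (elapsed / half_life).
    This model is an illustrative assumption, not taken from the slides.
    """
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must be strictly between 0 and 1")
    # Solve fraction_remaining = 0.5 ** (elapsed_days / t_half) for t_half.
    return elapsed_days * math.log(2) / -math.log(fraction_remaining)
```

Under that assumption, a collection in which a quarter of the URIs are still available after 100 days would have gone through two half-lives.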

Slide 15: URI Publication and Removal Rate

Slide 16: Lifetimes of unavailable videos (axes: Years, Month)

Slide 17: Reasons for no unavailable videos

Slide 18: When a YouTube video disappears
‣ video title: The Rolling Stones - Satisfaction
‣ url
‣ Published :44 Removed (300 days online)
HTTP/ Not Found
Content-Type: text/html; charset=utf-8

Slide 19: Metadata purged from YouTube Databases
‣ Video feed
curl -I "
HTTP/ Not Found
Content-Type: text/html; charset=UTF-8
Private video
‣ Related videos
curl -I "
HTTP/ Not Found
Content-Type: text/html; charset=UTF-8
Parent Video not found
‣ Video comments
curl -I "
HTTP/ OK
Content-Type: application/atom+xml; charset=UTF-8
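
The `curl -I` checks on the slide above issue a HEAD request and inspect the status line. A minimal Python sketch of the same kind of check follows. The request URIs are missing from the transcript, so the caller must supply one; the helper names `head_status` and `is_gone`, and the choice to treat 404 and 410 as "gone", are assumptions made here for illustration.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request (like `curl -I`) and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return error.code


def is_gone(status: int) -> bool:
    """Treat 404 Not Found and 410 Gone as 'resource unavailable' (assumption)."""
    return status in (404, 410)
```

A caller would combine the two, for example `is_gone(head_status(feed_uri))`, where `feed_uri` stands for whichever feed URI is being probed.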

Slide 20: Metadata Normalization
Dereferencing ASIN via amazon.com Webservice:
Artist: Michael Jackson
Title: Billie Jean (Single Version)

Slide 21: Availability of music-related metadata
‣ parsed out only the first time a URI showed up in the result list
  ‣ YouTube crawling restrictions
‣ Remaining portion
  ‣ query video title against music-related services via search engines
  ‣ Google/Yahoo! with site parameter

Slide 22: Retrieving and preserving a video’s metadata
‣ Active preservation attempt once a video copy is available
‣ Parse the HTML for structured music-related metadata
  ‣ YouTube-generated metadata
  ‣ AmazonMP3 affiliate link
‣ Query search engines with the free-form video title against music-related websites
‣ Preserve metadata in the public web infrastructure
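
The HTML parsing step listed above could be sketched roughly as follows. The transcript does not show which elements of a YouTube page carried the metadata, so this example simply collects `<meta>` tags from an HTML string; the class and function names, and the sample markup in the usage note, are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagCollector(HTMLParser):
    """Collect <meta name=... content=...> and <meta property=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key] = content


def extract_meta(html: str) -> Dict[str, str]:
    """Return all meta name/property -> content pairs found in the HTML."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.meta
```

For instance, `extract_meta('<meta name="title" content="The Rolling Stones - Satisfaction">')` yields a dictionary with a single `title` entry, which could then be preserved while the video copy is still available.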

Slide 23: Preservation Prototype

Slide 24: Metadata preservation: Example: Twitter

Slide 25: Pointing to a Resolver service
‣ UYc/
‣ Author-side approach
  ‣ content creator points directly to a resolver service
‣ Server-side approach
  ‣ Plugin/Renderer class automatically rewrites YouTube video watch URIs to resolver service
‣ Client-side approach
  ‣ Web browser plugin intercepts clicks on YouTube video watch URIs and redirects to resolver service
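
The server-side and client-side approaches above both amount to rewriting a YouTube watch URI into a resolver URI. A minimal sketch of that rewrite is shown below. The resolver's real address is truncated in the transcript, so `RESOLVER_BASE` is a hypothetical placeholder, as are the function names and the sample video ID in the usage note.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlparse

# Hypothetical resolver endpoint; the real one is not recoverable from the slides.
RESOLVER_BASE = "https://resolver.example/yt/"


def extract_video_id(uri: str) -> Optional[str]:
    """Return the `v` parameter of a YouTube watch URI, or None otherwise."""
    parsed = urlparse(uri)
    host = (parsed.hostname or "").lower()
    if host in ("youtube.com", "www.youtube.com") and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def rewrite_to_resolver(uri: str) -> str:
    """Point watch URIs at the resolver; leave every other URI untouched."""
    video_id = extract_video_id(uri)
    if video_id is None:
        return uri
    return RESOLVER_BASE + quote(video_id, safe="")
```

With these definitions, `rewrite_to_resolver("https://www.youtube.com/watch?v=abc123")` returns `https://resolver.example/yt/abc123`, while non-YouTube links pass through unchanged.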

Slide 26: YouTube Resolver service
HTTP Status Code
‣ HTTP/ OK: redirect
‣ HTTP/ Not Found: search for preserved metadata
  ‣ in list of designated accounts
  ‣ query YouTube API with those
‣ HTTP/ See Other *: provide (and evaluate) alternative copies
*) exact best available granularity
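
One possible reading of the resolver flow on this slide, sketched as a pure decision function. The diagram is flattened in the transcript, so the ordering of steps, the 302 used for the "video still available" case, and the 404 fallback are assumptions; the metadata lookup and the YouTube API query are passed in as hypothetical callables rather than implemented.

```python
from typing import Callable, List, Optional, Tuple


def resolve(
    video_uri: str,
    youtube_status: int,
    find_preserved_title: Callable[[str], Optional[str]],
    search_copies: Callable[[str], List[str]],
) -> Tuple[int, List[str]]:
    """Return (response status, target URIs) for a resolver request."""
    if youtube_status == 200:
        # Video is still available: redirect straight to it (302 is assumed).
        return 302, [video_uri]
    # Video is gone: look for metadata preserved in the public web.
    title = find_preserved_title(video_uri)
    if title is None:
        return 404, []
    # Use the preserved metadata to query for alternative copies.
    copies = search_copies(title)
    if not copies:
        return 404, []
    # 303 See Other pointing at the alternative copies.
    return 303, copies
```

Keeping the lookups as injected callables makes the decision logic testable without network access; a real service would plug in its own preserved-metadata search and API client.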

Slide 27: Future Work
‣ Evaluation of preservation and retrieval quality of chosen services
  ‣ exchange services
‣ additional automation of preservation process
  ‣ once YT URI was passed for resolving
‣ Evaluation of retrieved available copies
  ‣ redirect to best copy instead of returning a list to choose
‣ Consider international requesters
  ‣ taking requester’s location (country) into account

Slide 28: Summary
‣ Pointing to a specific YouTube video copy by its URI carries a risk of disappearance
  ‣ alternative copies become available over time
‣ YouTube URIs unlikely to be cached once gone
‣ YouTube metadata only reliable for available URIs
  ‣ active preservation attempt
‣ Introducing a level of indirection: Resolver service
  ‣ check URI status and location header
  ‣ search the public web for injected metadata
  ‣ query for alternative copies