A bad case of content reuse. Validator: website to validate license violations. The validator only requires the URI of the site to check for a license violation.

Presentation transcript:

Policy Aware Content Reuse on the Web

Oshani Seneviratne
Advisor: Tim Berners-Lee
Please send your comments to Oshani Seneviratne.

The Problem

Reusing content saves resources and fosters creativity. However, reusing a piece of content without honoring the license expressed with it may violate the original content creator's rights. This can happen for several reasons. The person reusing the content may be:
• too lazy to check for the licenses hidden in the XHTML
• weary of the multi-step operations required to embed the license metadata
• unaware of what each of the licenses means
At the same time, the original content creator would also be interested in knowing whether someone has violated his or her license terms.

[Figure: A bad case of content reuse]

How much of a problem is this?

Flickr has over 100 million Creative Commons licensed images. Given a sample of web pages which embed such images, how many of these images are properly attributed as specified in their licenses? A simple experiment was conducted to assess this, and the results are as follows:
• Sample 1 (67 sites, 426 images): properly attributed images = 28, misattributed images = 333, misattribution = 78%
• Sample 2 (70 sites, 241 images): properly attributed images = 8, misattributed images = 194, misattribution = 80%
• Sample 3 (70 sites, 466 images): properly attributed images = 6, misattributed images = 439, misattribution = 94%

[Figure: Results of the experiment summarized]

The Solution

Build Policy Aware Systems, such as:
• Validators that tell users what information is missing or inaccurate
• Systems that seamlessly integrate metadata by detecting the licenses and assisting in embedding them
• Systems that notify users if their content is used in an inappropriate manner

Background

Policies are pervasive in web applications, as they play a crucial role in enhancing the security, privacy and usability of services offered on the Web. Use of Creative Commons licenses is the widely accepted method of expressing the rights of the original content creators when it comes to digital multimedia content on the Web. The DRM alternative is often too prohibitive, and has a central point of failure from a policy perspective. Therefore, rather than applying an enforcement model, the focus is on building a framework based on open standards and protocols which enables users to reuse content easily in a policy-aware manner.

How can you extract license metadata?
1. Through APIs which expose the licenses, e.g. Flickr
2. Through RDFa (Resource Description Framework in Attributes)

FlickrCC Attribution License Violations Validator

A simple scenario which illustrates a rights violation of a content creator: check whether a particular site has any embedded Flickr images which are not properly attributed as specified in the Creative Commons license.

• Spider: A site crawler which follows all the links in a given seed site, using a breadth-first search algorithm, to find any embedded Flickr images.
• License Checker: Extracts the photo id from the Flickr image URI. All the information related to the photo is then obtained through the Flickr API. Based on this information, the DOM of the page is checked for the proper attribution.
• Notification System: Pretty-prints and reports the images with missing attributions in a Web interface. The user can then use the missing information in his or her own work to be license compliant.
• User Checker (optional): Can be used to send actual notifications to the original content creators for any violations, if the system is linked to some user base.

[Figure: Validator Website to Validate License Violations]
Validator: only requires the URI of the site to check for a license violation.

Try it out! validator.py
More Information

Exchange

Semantic Clipboard

Goal

Enable transfer of content between Web applications with minimal effort in a policy-aware manner, i.e. when content is copied, the license metadata is also copied and pasted appropriately in the target application.

Components

• RDFa Extractor: Extracts all the semantic information, in the form of RDF attributes, embedded in the HTML page the user browses.
• RDF Store: Indexes and stores all the RDF data from the pages that the user has visited in a given browser session.
• Semantic Clipboard: Acts as the control panel to coordinate the copy and paste operations.
• Database: Makes the data persistent across browser sessions.
• Composer: Reasons whether the content can be used, based on the source and the destination license terms. Prepares the content and the license metadata in a suitable form to be pasted directly into the target.

[Figure: Architecture of the Semantic Clipboard and the interactions between each of the modules]

All of these components are implemented in the Tabulator, a Semantic Web browser which can be installed as a Firefox extension. The Semantic Clipboard can be turned on or off through a menu option. When using the application, content can be collected from a variety of sources. A browser-based editor is used to demo how an application could call the Clipboard for the content, and either embed it with the license metadata or warn if the target document's license is incompatible with the source license.

Try it out!
More Information

Contributions

• An assessment of the level of policy-awareness on the Web
• A platform to use the data exposed on the Semantic Web
• A License Violations Validator for Flickr images: to check for license violations, and to use the information given by the validator to be policy-compliant
• Semantic Clipboard: to detect reusable content while browsing, and to seamlessly integrate such content along with its metadata

Future Work

• Assess the level of violations with regard to other types of licenses, such as 'no commercial use', 'share alike' and 'no derivatives'
• Assess the level of license violations on other types of media
• Extend to licenses embedded in free-floating content
• Explore new and efficient ways of detecting license violations
• Improve the user interfaces of the CC license violations validator and the Semantic Clipboard

More Information
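The misattribution percentages reported for the three samples can be reproduced from the reported counts. The sketch below is a minimal check, assuming that "misattribution" means misattributed images as a percentage of all images in the sample. The poster does not state the formula, but this reading matches all three reported figures. The names `SAMPLES`, `misattribution_rate` and `summarize` are illustrative only.

```python
# Counts as reported in the experiment:
# (sites, images, properly attributed images, misattributed images)
SAMPLES = {
    "Sample 1": (67, 426, 28, 333),
    "Sample 2": (70, 241, 8, 194),
    "Sample 3": (70, 466, 6, 439),
}


def misattribution_rate(images: int, misattributed: int) -> float:
    """Assumed formula: misattributed images as a percentage of all sampled images."""
    return 100.0 * misattributed / images


def summarize(samples=SAMPLES) -> dict:
    """Return the rounded misattribution percentage for each sample."""
    return {
        name: round(misattribution_rate(images, misattributed))
        for name, (_sites, images, _proper, misattributed) in samples.items()
    }


if __name__ == "__main__":
    for name, pct in summarize().items():
        print(f"{name}: {pct}% misattributed")
```

In every sample the properly attributed and misattributed counts add up to less than the image total (for example 28 + 333 = 361 of 426). The poster does not say how the remaining images were classified, so the sketch leaves them out of the calculation.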
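The Spider and the first step of the License Checker described in this transcript (a breadth-first crawl from a seed site, then extracting the photo id from each Flickr image URI) can be sketched as follows. This is not the code of `validator.py`, which is not shown in the source. It assumes Flickr static image URLs are served from a host under `staticflickr.com` or `static.flickr.com` and end in `{photo-id}_{secret}` with an optional size suffix. The `fetch` parameter is a hypothetical caller-supplied function that returns the HTML of a URL, so the example runs without network access. The Flickr API lookup and the DOM attribution check are left out.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: hosts that serve Flickr static images.
FLICKR_HOSTS = ("staticflickr.com", "static.flickr.com")
# Assumption: path ends in "/{photo-id}_{secret}[_{size}].{ext}".
PHOTO_ID_RE = re.compile(r"/(\d+)_[0-9a-zA-Z]+(?:_[a-z0-9]+)?\.(?:jpg|png|gif)$")


def flickr_photo_id(image_url):
    """Return the photo id embedded in a Flickr image URL, or None."""
    parts = urlparse(image_url)
    host = parts.hostname or ""
    if not any(host == h or host.endswith("." + h) for h in FLICKR_HOSTS):
        return None
    match = PHOTO_ID_RE.search(parts.path)
    return match.group(1) if match else None


class LinkAndImageParser(HTMLParser):
    """Collects the href of every <a> and the src of every <img>."""

    def __init__(self):
        super().__init__()
        self.links, self.images = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def crawl(seed, fetch, max_pages=50):
    """Breadth-first crawl from `seed`, staying on the seed's host.

    `fetch` (hypothetical) maps a URL to HTML text, or None if unavailable.
    Returns {page_url: [Flickr photo ids embedded on that page]}.
    """
    seed_host = urlparse(seed).hostname
    queue, seen, found = deque([seed]), {seed}, {}
    visited = 0
    while queue and visited < max_pages:
        page = queue.popleft()  # FIFO order gives breadth-first traversal
        visited += 1
        html = fetch(page)
        if html is None:
            continue
        parser = LinkAndImageParser()
        parser.feed(html)
        ids = [pid for src in parser.images
               if (pid := flickr_photo_id(urljoin(page, src)))]
        if ids:
            found[page] = ids
        for href in parser.links:
            link = urljoin(page, href).split("#")[0]
            if urlparse(link).hostname == seed_host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


if __name__ == "__main__":
    # Tiny in-memory "site" standing in for real HTTP requests.
    site = {
        "http://example.org/": '<a href="/a.html">A</a>',
        "http://example.org/a.html":
            '<img src="http://farm4.static.flickr.com/3001/1234567890_abcdef1234_m.jpg">',
    }
    print(crawl("http://example.org/", site.get))
```

Each photo id returned by `crawl` would then be looked up through the Flickr API to obtain the owner and license, and the page checked for a matching attribution. Passing `fetch` in as a parameter keeps the traversal logic testable on its own, and a real validator could supply an HTTP-backed function instead.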