

[Figure: License Violations Validator, a website to validate license violations. It only requires the URI of the site to check.]
[Figure: A bad case of content reuse]

This work by Oshani Seneviratne is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. Please send your comments to [address missing].

Policies
Content Reuse

Oshani Seneviratne
Decentralized Information Group, MIT

Reusing content saves resources and fosters creativity. However, reusing a piece of content without honoring the license expressed with it may violate the original content creator's rights. There are several reasons this might happen. The person who is reusing the content may be:
- too lazy to check for the licenses hidden in the XHTML
- weary of the multi-step operations required to embed the license metadata
- ignorant of what each of the licenses means

At the same time, original content creators are also interested in knowing whether someone has violated their license terms.

Flickr has over 100 million Creative Commons licensed images. Given a sample of web pages which embed such images, how many of those images are properly attributed as specified in their licenses?
Policies are pervasive in web applications, as they play a crucial role in enhancing the security, privacy and usability of services offered on the Web.

Build Policy Aware Systems, such as:
- Validators that tell users what information is missing or inaccurate
- Tools that seamlessly integrate metadata by detecting the license information and assisting in embedding it
- Notifications to users if their content is used in an inappropriate manner

- Assessment of the level of policy-awareness on the Web
- A platform to use the data exposed on the Semantic Web
- A License Violations Validator for Flickr images: to check for any license violations, and to use the information given by the validator to become policy-compliant
- Semantic Clipboard: to detect reusable content while browsing, and to seamlessly integrate such content along with its metadata

- Assess the level of violations with regard to other types of license terms such as 'no commercial use', 'share alike' and 'no derivatives'
- Assess the level of license violations on other types of media
- Extend to licenses embedded in free-floating content
- Explore new and efficient ways of detecting license violations
- Improve the user interfaces of the CC license violations validator and the Semantic Clipboard

Results of the experiment summarized

An experiment was conducted to check how many Creative Commons licensed Flickr images embedded in web pages are properly attributed. Samples of sites were randomly generated from the Technorati cosmos, which can be used to retrieve sites linking to a given base URI (in this case, a Flickr farm URI). Attribution was then checked for each of the images embedded in those sites. The results from the 3 samples are as follows:

Sample    Sites  Images  Properly attributed  Misattributed  Misattribution
Sample 1  67     426     28                   333            78 %
Sample 2  70     241     8                    194            80 %
Sample 3  70     466     6                    439            94 %

[Figure: Screenshots of the results from the experiment]
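The misattribution percentages reported for the three samples can be reproduced from the raw counts. The short Python sketch below assumes the rate is the number of misattributed images divided by the total number of images in the sample; that denominator is inferred from the reported figures rather than stated on the poster, and the names `SAMPLES` and `misattribution_rate` are illustrative only.

```python
# Counts copied from the three samples reported in the experiment:
# (label, sites, images, properly attributed, misattributed)
SAMPLES = [
    ("Sample 1", 67, 426, 28, 333),
    ("Sample 2", 70, 241, 8, 194),
    ("Sample 3", 70, 466, 6, 439),
]


def misattribution_rate(misattributed: int, total_images: int) -> float:
    """Return the misattribution rate in percent.

    Assumption: the denominator is the total image count of the sample.
    """
    return 100.0 * misattributed / total_images


if __name__ == "__main__":
    for label, _sites, images, _proper, misattributed in SAMPLES:
        rate = misattribution_rate(misattributed, images)
        print(f"{label}: {rate:.0f} % of {images} images misattributed")
```

Dividing by the sum of properly attributed and misattributed images instead would not match the reported percentages, which is why the total image count is used here.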
Use of Creative Commons licenses is the widely accepted method of expressing the rights of original content creators when it comes to digital multimedia content on the Web. The Digital Rights Management alternative is often too prohibitive and has a central point of control, and thus a central point of failure from a policy perspective. Therefore, rather than applying an enforcement model, the focus is on building a framework based on open standards and protocols which enables users to reuse content in a policy aware manner with ease.

Extracting License Metadata

1. Through APIs which expose the licenses. For example, Flickr allows users to specify the license associated with their images. This license information can then be queried through the Flickr API.
2. Through RDFa (Resource Description Framework in Attributes). Creative Commons licenses can be expressed in machine readable form using RDFa. The content creator and the consumer can use RDFa for rights expression and rights/policy compliance respectively.

[Figure: A simple scenario which illustrates a rights violation]

Validator goal: check whether a particular site has any embedded Flickr images which are not properly attributed as specified in the Creative Commons license.

Spider: This is essentially a site crawler which searches all the links embedded in the given seed site, using a breadth-first search algorithm, to find any embedded images. The crawler does not stray outside the site; it simply digs down into a single web page. If it detects any embedded Flickr images, it extracts the photo id from the Flickr URI. Using this photo id, all the information related to the photo can be obtained through the Flickr API.

License Checker: If a photo has a CC license attached then, according to the CC 2.5 specification, the photo should be given proper attribution regardless of the purpose it is used for.
The License Checker also determines which Flickr user the photo belongs to, by querying the Flickr API with the photo id, and then constructs the Flickr user URI to check for attribution.

Notification System: This pretty-prints and reports, in a Web interface, the images which are missing attributions. The user can then add the missing information to his or her own work to become license compliant.

User Checker (optional): This module can be used to send actual notifications to the original content creators about any violations, if the system is linked to some user base.

[Figure: The components of the FlickrCC License Violations Validator]

Try it out! validator.py

WSRI-Exchange

Semantic Clipboard goal: enable transfer of content between Web applications with minimal effort in a policy aware manner, i.e. when content is copied, the license metadata is also copied and pasted appropriately in the target application.

Install Tabulator and try it out!

[Figure: Architecture of the Semantic Clipboard and the interactions between each of the modules]

RDFa Extractor: Extracts all the semantic information in the form of RDF attributes embedded in the HTML page the user browses.

RDF Store: Indexes and stores all the RDF attributes from the pages that the user has visited in a given browser session.

Semantic Clipboard: Acts as the control panel to coordinate the copy and paste operations.

Database: Implemented using the Firefox SQLite 'Storage Connection' API; persists data across browser sessions.

All of these components are implemented in the Tabulator, a Semantic Web browser which can be installed as a Firefox extension. The Semantic Clipboard can be turned on or off through a menu option. When using the application, content can be collected from a variety of sources. Once the user selects the content to be reused, it is made persistent in a database.
A browser-based editor is used to demonstrate how an application can call the Clipboard for the content, and then either embed it along with the license metadata or warn the user if the target document's license is incompatible with the source license.

Composer: Reasons about whether the content can be used, based on the source and the destination license terms. Prepares the content and the license metadata in a suitable form to be pasted directly into the target DOM.
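As a rough illustration of the kind of reasoning the Composer performs, the Python sketch below compares a source license with a destination license and, if they are compatible, emits an HTML fragment carrying license and attribution metadata. Everything in it is an assumption made for illustration: the `License` model, the three compatibility rules in `can_reuse`, the `compose` helper and the example URIs are hypothetical, are far simpler than real Creative Commons compatibility, and are not taken from the Tabulator code. The RDFa attributes follow the common Creative Commons markup pattern (`rel="license"`, `cc:attributionName`, `cc:attributionURL`).

```python
from dataclasses import dataclass
from html import escape
from typing import Tuple


@dataclass(frozen=True)
class License:
    """A license reduced to its URI and three boolean terms (assumed model)."""
    uri: str
    non_commercial: bool = False
    share_alike: bool = False
    no_derivatives: bool = False


def can_reuse(source: License, destination: License) -> Tuple[bool, str]:
    """Apply simplified, assumed compatibility rules between two licenses."""
    if source.no_derivatives:
        # Assumption: pasting into another document counts as a derivative.
        return False, "source license forbids derivative works"
    if source.share_alike and source.uri != destination.uri:
        return False, "share-alike content needs the same license on the target"
    if source.non_commercial and not destination.non_commercial:
        return False, "non-commercial content needs a non-commercial target"
    return True, "compatible"


def compose(image_uri: str, creator_name: str, creator_uri: str,
            source: License, destination: License) -> str:
    """Return an HTML fragment with RDFa license metadata, or raise a warning."""
    allowed, reason = can_reuse(source, destination)
    if not allowed:
        raise ValueError("cannot paste: " + reason)
    return (
        '<div xmlns:cc="http://creativecommons.org/ns#" about="{img}">'
        '<img src="{img}" alt="" /> '
        '<a rel="cc:attributionURL" property="cc:attributionName" '
        'href="{who}">{name}</a> '
        '<a rel="license" href="{lic}">license</a>'
        '</div>'
    ).format(img=escape(image_uri), who=escape(creator_uri),
             name=escape(creator_name), lic=escape(source.uri))


# Two example licenses used to exercise the rules.
BY = License("http://creativecommons.org/licenses/by/3.0/")
BY_NC_SA = License("http://creativecommons.org/licenses/by-nc-sa/3.0/",
                   non_commercial=True, share_alike=True)

if __name__ == "__main__":
    print(can_reuse(BY, BY_NC_SA))
    print(can_reuse(BY_NC_SA, BY))
    print(compose("http://example.org/photo.jpg", "Alice",
                  "http://example.org/alice", BY, BY_NC_SA))
```

Raising an exception for an incompatible pair stands in for the warning the editor would show; a real implementation would more likely return the reason to the user interface.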