Using copy-detection and text comparison algorithms for cross-referencing multiple editions of literary works
A. Zaslavsky, Alejandro Bia, K. Monostori


School of Computer Science & Software Engineering, Monash University, Australia
Miguel de Cervantes Digital Library, University of Alicante, Alicante, Spain
European Conference on Digital Libraries (ECDL), Darmstadt, 2001

Overview
Copy-detection, plagiarism and comparative literary analysis; text processing in DLs and humanities research; tools and approaches; MatchDetectReveal architecture; Cervantes's Quijote DL and MDR; conclusion

Introduction
Problems: intellectual property, plagiarism, search results
Copy-prevention: special hardware, active documents
Copy-detection: Plagiarism.org, SCAM, Koala, sif
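Copy-detection tools of this kind check a document against previously registered documents for shared content. One widely used family of techniques, sketched here purely as an illustration and not as the method of any tool named above, hashes overlapping word n-grams ("shingles") and measures how many are shared. The function names and the default n = 3 are assumptions.

```python
import hashlib
import re


def shingles(text: str, n: int = 3) -> set[str]:
    """Return hashes of all overlapping word n-grams in text (lower-cased)."""
    words = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so accented words survive
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    # A short, stable hash keeps fingerprints compact; MD5 is used here for speed, not security.
    return {hashlib.md5(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def containment(suspect: str, original: str, n: int = 3) -> float:
    """Fraction of the suspect text's shingles that also occur in the original."""
    a, b = shingles(suspect, n), shingles(original, n)
    return len(a & b) / len(a) if a else 0.0


def resemblance(x: str, y: str, n: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (symmetric)."""
    a, b = shingles(x, n), shingles(y, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Containment suits the plagiarism case (is this short text copied from a longer one?), while resemblance suits comparing two texts of similar length, such as two editions.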

Copy-detection
Digital watermarking: codewords, line-shift coding, word-shift coding, feature coding
String comparison
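Line-shift coding, one of the watermarking schemes listed above, marks each distributed copy of a document by moving selected text lines slightly up or down. The sketch below models a page as a list of line baseline positions and is a simplified illustration only: it assumes uniform original line spacing, and the function names and the one-unit shift are hypothetical.

```python
from typing import List


def embed_line_shift(baselines: List[float], bits: List[int], delta: float = 1.0) -> List[float]:
    """Shift every second line down (bit 1) or up (bit 0) by delta.

    Lines at even indices are left untouched and act as control lines.
    """
    marked = list(baselines)
    targets = range(1, len(baselines) - 1, 2)  # lines with a control line above and below
    if len(bits) > len(targets):
        raise ValueError("not enough lines to carry the codeword")
    for bit, i in zip(bits, targets):
        marked[i] += delta if bit else -delta
    return marked


def extract_line_shift(marked: List[float], nbits: int) -> List[int]:
    """Recover bits by comparing each shifted line's gap to its two control lines."""
    bits = []
    for i in list(range(1, len(marked) - 1, 2))[:nbits]:
        gap_above = marked[i] - marked[i - 1]
        gap_below = marked[i + 1] - marked[i]
        # y grows downwards: a line moved down sits farther from the line above it.
        bits.append(1 if gap_above > gap_below else 0)
    return bits
```

Decoding compares gaps between neighbouring lines rather than absolute positions, which is why the unshifted control lines are needed.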

Copy-detection architecture
Registration module, comparison module, parsing module
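The registration, parsing and comparison modules can be illustrated with a toy in-memory pipeline. This is a sketch under stated assumptions (fixed-size word chunks, an inverted index held in a dict); the class and method names are hypothetical and do not describe the MDR implementation.

```python
import re
from collections import defaultdict


class CopyDetector:
    """Toy registration / parsing / comparison pipeline (illustrative only)."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.index = defaultdict(set)  # chunk -> ids of registered documents containing it

    def parse(self, text: str) -> list:
        """Parsing module: normalise the text and split it into overlapping word chunks."""
        words = re.findall(r"\w+", text.lower())
        k = self.chunk_size
        return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]

    def register(self, doc_id: str, text: str) -> None:
        """Registration module: add every chunk of a document to the index."""
        for chunk in self.parse(text):
            self.index[chunk].add(doc_id)

    def compare(self, text: str) -> dict:
        """Comparison module: count the distinct chunks each registered document shares with text."""
        counts = defaultdict(int)
        for chunk in set(self.parse(text)):
            for doc_id in self.index.get(chunk, ()):
                counts[doc_id] += 1
        return dict(counts)
```

For example, after registering an edition that opens "En un lugar de la Mancha, de cuyo nombre no quiero acordarme", comparing the query "en un lugar de la Mancha" reports three shared four-word chunks for that edition.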

MatchDetectReveal (MDR) architecture
[Diagram labels: Internet; MDR users; MDR customizer; matching engine; format converter; search engine; visualiser; local repository; matching rule DB; indexes; similarity & overlap rule interpreter; IEEE DL; ACM DL; local cluster; global resources; base document set generator]
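The slide does not say how the matching engine locates overlaps. As a hedged illustration only, the function below finds every maximal run of at least `min_len` words shared by two texts, in whatever order the runs appear in each text, using a hash index of word n-grams. MDR itself may rely on a different data structure (suffix trees are a common choice for this problem); `common_segments` and `min_len` are hypothetical names.

```python
import re
from collections import defaultdict


def tokens(text: str) -> list:
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def common_segments(a: str, b: str, min_len: int = 3) -> list:
    """Return (start_in_a, start_in_b, length) for every maximal run of at least
    min_len consecutive words shared by a and b, wherever it occurs in either text."""
    x, y = tokens(a), tokens(b)
    k = min_len
    positions = defaultdict(list)  # k-word sequence in b -> its start positions
    for j in range(len(y) - k + 1):
        positions[tuple(y[j:j + k])].append(j)
    found = []
    for i in range(len(x) - k + 1):
        for j in positions.get(tuple(x[i:i + k]), []):
            # Not left-maximal: the longer run starting one word earlier is reported instead.
            if i > 0 and j > 0 and x[i - 1] == y[j - 1]:
                continue
            n = k
            while i + n < len(x) and j + n < len(y) and x[i + n] == y[j + n]:
                n += 1  # extend the match word by word
            found.append((i, j, n))
    return found
```

Because every position in `a` is looked up independently, a passage moved to a different place in the other edition is still found, at the cost of quadratic work on highly repetitive texts.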

Example screen dump

Conclusion
Comparative analysis of editions; cleaning up OCR output; performance; text ordering not necessary; fine granularity of overlap detection
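One application named in the conclusion is cleaning up OCR output. Assuming a reliable reference edition of the same text is available, a word-level alignment can flag likely recognition errors for review. This sketch uses Python's standard `difflib` module; the function name is hypothetical, and difflib's sequential alignment is a stand-in chosen for brevity, not the MDR method.

```python
import difflib


def ocr_discrepancies(ocr_text: str, reference_text: str) -> list:
    """List word-level differences between an OCR transcription and a reference edition.

    Each entry is (tag, ocr_words, reference_words), where tag is difflib's
    'replace', 'delete' or 'insert'. Matching stretches are omitted.
    """
    ocr, ref = ocr_text.split(), reference_text.split()
    matcher = difflib.SequenceMatcher(a=ocr, b=ref, autojunk=False)
    return [
        (tag, " ".join(ocr[i1:i2]), " ".join(ref[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For instance, an OCR line reading "En un lugar de la Mancba, de cuyo nombre" aligned against the reference yields a single `replace` entry pairing "Mancba," with "Mancha,".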

Future work
Similar blocks of text; XML output; rules for overlap and similarity; visualisation of results
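XML output is listed as future work. A minimal sketch of what serialising overlap records might look like, using the standard `xml.etree.ElementTree` module, follows; the `(start_a, start_b, length)` record shape, the function name and all element and attribute names are hypothetical, and no MDR schema is implied.

```python
import xml.etree.ElementTree as ET


def overlaps_to_xml(doc_a: str, doc_b: str, segments: list) -> str:
    """Serialise (start_a, start_b, length) overlap records as an XML string.

    Element and attribute names are hypothetical; no MDR schema is implied.
    """
    root = ET.Element("comparison", {"source": doc_a, "target": doc_b})
    for start_a, start_b, length in segments:
        ET.SubElement(root, "overlap", {
            "sourceStart": str(start_a),
            "targetStart": str(start_b),
            "length": str(length),
        })
    return ET.tostring(root, encoding="unicode")
```

Emitting plain positional records keeps the output independent of any particular visualiser, which could then render the overlaps however it likes.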