A DIgital Library Infrastructure on Grid ENabled Technology SAPIR – Search in Audio Visual Content.

Presentation transcript:

Digital Libraries Powered by the Grid
The SAPIR Project 1/2
The searchable space created by the massive amount of existing video and multimedia files greatly exceeds the area searched by today's major engines. Traditional search engines are limited to searching the text and metadata associated with multimedia content, so the current method falls short whenever content providers describe their multimedia files unclearly or tag them inaccurately.
SAPIR's goal is to provide search over huge quantities of multimedia objects, using both text and multimedia features and exploiting similarity and search-by-example queries. Centralized search engines prove neither efficient nor scalable for this task.

The SAPIR Project 2/2
SAPIR is based on a scalable, completely decentralized, largely self-organizing P2P system in which peers act as both clients and servers and users produce audio-visual content using multiple devices.
The project aims to prove in practice the theoretical advantages shown by multimedia P2P systems such as MCAN and MChord.
This technology can give the European community a significant advantage over existing, centralized, text-only search engines and can be applied to fields such as government services, tourism, healthcare, and more.

Challenge
- To run meaningful tests on the proposed infrastructure, the project needs a huge amount of data.
- Starting in June, we began extracting metadata from the Flickr archive (
- The data collected so far represents the world’s largest multimedia metadata collection available.
- Target collection: 100 million images
- Already processed: 40 million images
- For each image we keep text data and 3 MPEG-7 features → 160 million processed features.
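The 160 million figure follows from counting four descriptors per image (the text data plus 3 MPEG-7 features) over the 40 million images already processed. A minimal arithmetic check, with illustrative variable names that do not come from the project:

```python
# Figures taken from the slide; variable names are illustrative only.
images_processed = 40_000_000
images_target = 100_000_000
descriptors_per_image = 1 + 3  # text data plus 3 MPEG-7 features

total_descriptors = images_processed * descriptors_per_image
progress = images_processed / images_target

print(f"{total_descriptors:,} descriptors, {progress:.0%} of target")
```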

Data Challenge: Procedures
The Data Challenge is being conducted as follows:
- Each job is a pilot job. When it starts, it
  - downloads from a SAPIR server and installs a package that contains
    - the configuration files
    - the software needed to run the application
    - the SAPIR application
  - starts the SAPIR application
- The SAPIR application
  - downloads from the SAPIR server the ID range of the photos to download
  - starts a series of threads
- Each thread
  - downloads an image from the ID range
  - extracts the text and MPEG-7 features
  - uploads the features to the SAPIR database
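The application-side part of this procedure (fetch an ID range, start threads, then download, extract, and upload per image) can be sketched roughly as follows. This is only an illustration under stated assumptions: every function here is a hypothetical stand-in, nothing contacts Flickr or a SAPIR server, and an in-memory dictionary plays the role of the SAPIR database.

```python
from concurrent.futures import ThreadPoolExecutor

# All functions below are hypothetical stand-ins; the real SAPIR
# application talks to Flickr and to the SAPIR server instead.

def fetch_id_range():
    """Stand-in for asking the SAPIR server which photo IDs to process."""
    return range(1000, 1010)

def download_image(photo_id):
    """Stand-in for downloading one photo; returns fake image bytes."""
    return f"image-{photo_id}".encode()

def extract_features(image_bytes):
    """Stand-in for extracting the text data and 3 MPEG-7 features."""
    return {"text": image_bytes.decode(), "mpeg7": [len(image_bytes)] * 3}

def process_photo(photo_id, database):
    """Work done for each photo: download, extract, upload."""
    image = download_image(photo_id)
    features = extract_features(image)
    database[photo_id] = features  # stand-in for the upload step
    return photo_id

def run_application(num_threads=4):
    """Fetch the ID range, then process its photos on a pool of threads."""
    database = {}
    ids = fetch_id_range()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        done = list(pool.map(lambda pid: process_photo(pid, database), ids))
    return database, done

database, done = run_application()
print(f"processed {len(done)} photos")
```

A thread pool is used here because the per-image work is dominated by network transfers in the real setting; whether the actual application uses a pool or manually managed threads is not stated in the slides.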

Challenge
[Figure: diagram labeled Flickr, CE, SAPIR Pilot Job, SAPIR, Config, IDs]

Data Challenge: images
- Preparation and porting of the SAPIR application to the grid
  - Starting date: 16/06
  - Computing resources: non-grid nodes and 3 PPS sites managed by DILIGENT (PPS-SNS, PPS-CNR, PPS-ESRIN)
  - Total number of images processed: 5 M
- Data Challenge
  - Starting date: 16/07
  - Total number of images processed up to 26/09: > 36 M

Data Challenge: jobs
- Number of jobs submitted:
  - Preparation phase: 250 jobs a day
  - Data Challenge
    - From 16/07 to 29/ jobs a day
    - From 20/07 till now: 1000 jobs a day
- Variable number of images processed per job
  - Preparation phase: 250 images per job
  - Data Challenge: 1000 images per job
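Multiplying jobs per day by images per job gives a nominal daily throughput for each phase. The sketch below uses only the figures that are stated in full on this slide and assumes every job succeeds and processes its full quota, so the results are upper bounds rather than measured rates. The names are illustrative.

```python
# Figures from the slide; names are illustrative only.
phases = {
    "Preparation phase": {"jobs_per_day": 250, "images_per_job": 250},
    "Data Challenge": {"jobs_per_day": 1000, "images_per_job": 1000},
}

def nominal_images_per_day(jobs_per_day, images_per_job):
    """Upper bound on daily throughput, assuming no job fails."""
    return jobs_per_day * images_per_job

throughput = {name: nominal_images_per_day(**p) for name, p in phases.items()}
for name, images in throughput.items():
    print(f"{name}: up to {images:,} images/day")
```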

Data Challenge: performance
- Total number of jobs: 52 k
- Total number of jobs processed successfully: 38.2 k
- Total number of jobs failed: 13.8 k
- Failure rate: 26%
- Total number of images expected: M
- Total number of images processed: 31.8 M (4.2 TB)

Phase | Jobs submitted | Jobs processed | Images expected | Images processed
Preparation | | | M | 1.3 M
DC 1st part | | | M | 2.5 M
DC 2nd part | | | M | 28 M
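The totals on this slide can be cross-checked against each other. The sketch below recomputes the failure rate from the job counts and sums the per-phase image counts; it uses only numbers that appear on the slide, and the names are illustrative. The computed failure rate is about 26.5%, which the slide reports as 26%.

```python
# Figures from the slide; names are illustrative only.
jobs_total = 52_000
jobs_ok = 38_200
jobs_failed = 13_800
images_per_phase_m = {  # images processed, in millions
    "Preparation": 1.3,
    "DC 1st part": 2.5,
    "DC 2nd part": 28.0,
}

# Successful and failed jobs should account for every submitted job.
jobs_consistent = (jobs_ok + jobs_failed == jobs_total)

failure_rate = jobs_failed / jobs_total
total_images_m = sum(images_per_phase_m.values())

print(f"job counts consistent: {jobs_consistent}")
print(f"failure rate: {failure_rate:.1%}")
print(f"images processed: {total_images_m:.1f} M")
```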