Enabling Access to Sound Archives through Integration, Enrichment and Retrieval WP3 – Retrieval systems.


12 Month Review Meeting Project #
Introduction to Workpackage: overview
- Objectives:
  - To provide retrieval systems offering the ability to search by various musical similarity measures.
  - To search for spoken words or phrases.
  - To search across different media for associated content.
  - Queries may be text-based, spoken or audio examples.
- Tasks:
  - T3.1: Music retrieval
  - T3.2: Speech retrieval
  - T3.3: Cross-media retrieval
  - T3.4: Vocal query interface

Introduction to Workpackage: participants & schedule
- Participants and their contributions:
  - QMUL: 22 mm – music & cross-media retrieval
  - DIT: 5 mm – music retrieval
  - ALL: 24 mm – speech retrieval & vocal queries
  - LFUI: 3 mm – integration of retrieval engines
  - NICE: 12 mm – retrieval for attributes of speech
- Schedule:
  - T3.1 Music retrieval: month 9 – month 20
  - T3.2 Speech retrieval: month 8 – month 20
  - T3.3 Cross-media retrieval: month 1 – month 6, month 21 – month 26
  - T3.4 Vocal query interface: month 7 – month 20

Introduction to Workpackage: Task - Music retrieval
- Searching and organizing music collections:
  - Adequate representation for the audio in the query
  - Textual and keyword: e.g. author, title, date, genre, etc.
  - Automatic feature extraction:
    - Low-level acoustic similarity measures
    - Mid-level features: characterize the rhythmic structure
    - High-level features:
      - Musically relevant parameters
      - Visualisation of key events along the assets' timeline

Introduction to Workpackage: Task - Speech retrieval
- Retrieving the content of speech corpora in English and Hungarian
- Building test corpora
- Levels of recognition:
  - Phoneme-level recognition
  - Pronunciation dictionary filter or morphological analysis
  - Text-corpus-based language model
- Phoneme- and word-level indexing
- Fast retrieval
- Improving performance

Introduction to Workpackage: Task - Cross-media retrieval
- Searching media in various formats (audio recordings, video recordings, notated scores, images):
  - Using metadata
  - Feature extraction
  - Similarity measures
  - Optimised multidimensional search methods
  - Video analysis
- The user enters a piece of media as a query and may retrieve an entirely different type of media

Introduction to Workpackage: Task - Vocal query interface
- Voice-initiated media retrieval (without natural language processing):
  - Recording of query
  - Phoneme-level recognition
  - Pronunciation-dictionary-based word(s) identification
  - Speaker adaptation

Deliverables
- D3.1 Report outlining retrieval system functionality and specification (Month 6)
- D3.2 Prototype on speech and music retrieval systems with vocal query interface (Month 20)
- D3.3 Prototype on cross-media retrieval system (Month 26)

Deliverable D3.1 - Report outlining retrieval system functionality and specification
- Topics described:
  - User requirements
  - Relations to other work packages
  - Music retrieval
  - Speech retrieval
  - Indexing and speaker retrieval
  - Cross-media retrieval
  - Vocal query interface
  - Retrieval system integration and knowledge management: role of ontology
  - Example user interfaces
- Contributors: ALL, DIT, NICE, QMUL, RSAMD

12 Month Review Meeting Project # Milestones  M3.1 Initial vocal query system tested, initial speech and music retrieval algorithms developed (Month 8)  M3.2 Vocal query is fully-functional, speech and music retrieval implemented, cross-media retrieval method finalized (Month 14)  M3.3 Vocal query finished, speech and music retrieval systems established, basic cross-media retrieval implemented (Month 20)  M3.4 Cross-media retrieval fully functional, further work is only refinement and optimization (Month 26)

Milestones: M3.1 - Speech retrieval & vocal query
- Vocal query system:
  - Ready for demonstration in Hungarian and in English
  - Phoneme-level recognition implemented and tested; performance improvement in progress
  - Hungarian triphone management is under development
  - English pronunciation dictionary embedded, with multiple pronunciation variants
  - Morphological analyzer implemented for Hungarian pronunciation
  - The Hungarian version performs better than the English one; the reason is under investigation
- Speech retrieval:
  - See above
  - Language model established for the Hungarian version; for English we are looking for a good text corpus
  - Word-level recognition implemented and under testing; performance depends on the content/domain of the speech
  - Phoneme-based search finished; word-based search is being implemented

Milestones: M3.1 - Music retrieval
- Music retrieval:
  - Extractors for tempo, key changes and mode detection have been implemented as VAMP plugins.
  - SoundBite similarity retrieval is fully functional and available as a Mac OS application.
  - A segmenter based on SoundBite has been implemented as a VAMP plugin.
  - A framework for the automatic extraction of audio features has been built. It uses VAMP plugins and outputs descriptors directly in RDF format, allowing easy integration with the ontology.

12 Month Review Meeting Project # Workpackage Progress  Parallel development of separate retrieval engines is in good progress  Results are according to the schedule of the technical annex  We aim to demonstrate our results by running software modules  Scientific research induce risk when targeting improvement of performance  Integration of different retrieval engines into a common architecture and user interface will be challenging  Utilization of the power of the ontology centric approach

Workpackage Progress: Music retrieval - 1
- Music retrieval system:
  - Searches assets according to their relevance to music-related queries, using various metadata and automatically extracted features to produce a ranked list of audio files.
- Search methods:
  - Textual and keyword: e.g. author, title, date, genre, etc.
  - Similarity based on automatically extracted low-level features
  - Music-related parameters using automatically extracted descriptors: instrument, orchestration, tempo, key, etc.
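
The slide does not say how the ranked list is computed. As a minimal sketch only, assuming each asset is summarised by a fixed-length feature vector, ranking can be done by sorting on a distance to the query vector. The feature values, asset names and the function `rank_by_similarity` are hypothetical and are not taken from the EASAIER software.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, assets):
    """Return (asset_id, distance) pairs sorted from most to least similar."""
    scored = [(asset_id, euclidean(query, vec)) for asset_id, vec in assets.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical three-dimensional feature vectors, for illustration only.
ASSETS = {
    "asset_a": [0.10, 0.80, 0.30],
    "asset_b": [0.90, 0.10, 0.50],
    "asset_c": [0.15, 0.75, 0.35],
}

ranking = rank_by_similarity([0.12, 0.79, 0.31], ASSETS)
```

Euclidean distance is used here only because it is simple; any other distance could be passed in its place without changing the ranking logic.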

Workpackage Progress: Music retrieval - 2 - Music Analysis module
[Figure: Music Analysis module of the archive application (musical audio). Diagram labels: Input Audio File (PCM); Manual Entry Tags/Data; De-Noising / Restoration; Source Separation; Mid-Level Feature Extractors; High-Level Feature Extractors; Compression; Reliability Metric; High-level features (parametric search); Mid-level descriptors (similarity search); Optimal source separation & denoising parameters; Manual Tags & Manual High Level Features; To PCM & compressed audio assets repository; To Metadata Repository.]

Workpackage Progress: Music retrieval - 3
- Music Analysis module:
  - Musically relevant descriptors are automatically extracted by a module running a series of "VAMP" plugins.
  - Descriptors returned by the plugins can be classified as:
    - Mid-level features, used by the retrieval system to search for similar audio assets (e.g. timbre profile).
    - High-level features, which enable the user to search for audio assets using musically relevant parameters. High-level descriptors are also used for the visualisation of key events along the assets' timeline (e.g. position of beats, bars, key changes and instruments), providing a considerable aid in the analysis of a piece of music.

Workpackage Progress: Music retrieval - 4
- Similarity search:
  - Based on the SoundBite algorithm, which allows simultaneous segmentation, thumbnailing and modelling of an audio asset

Workpackage Progress: Speech retrieval - 1
- Research and testing are carried out on well-prepared corpora:
  - Text and corresponding speech needed hours
  - Quality is very important:
    - Low noise
    - Mixed sound sources omitted
    - Accent sensitive
    - Field interviews, phone conversations
    - Quality of the recording
  - Silence-to-silence segmentation, automated and manual
  - Half of the corpus for training
  - Half of the corpus for testing
- Hungarian corpus:
  - Hungarian radio station 'Kossuth', broadcast quality
  - More than 20 hours, segmentation enhanced manually
- English corpus:
  - TIMIT for research purposes
  - US Supreme Court recordings
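
The slides do not describe how the automated silence-to-silence segmentation works. One common approach, shown here purely as an assumption, is an energy threshold over short frames: a segment opens at the first non-silent frame and closes after a run of silent frames. The function name and all parameters (`frame_len`, `threshold`, `min_silence_frames`) are hypothetical.

```python
def silence_to_silence_segments(samples, frame_len, threshold, min_silence_frames):
    """Return (start, end) sample indices of speech segments delimited by silence.

    A frame counts as silent when its mean absolute amplitude is below `threshold`.
    A segment is closed once `min_silence_frames` consecutive silent frames follow it.
    Samples beyond the last complete frame are ignored.
    """
    segments = []
    start = None       # sample index where the current segment began
    silent_run = 0     # consecutive silent frames seen inside the current segment
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(abs(s) for s in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first frame of the silent run.
                segments.append((start, (i - silent_run + 1) * frame_len))
                start, silent_run = None, 0
    if start is not None:
        # The signal ended inside a segment; trim any trailing silent frames.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments

# Toy signal: silence, speech, silence, short speech burst, silence.
toy = [0, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 5, 5, 0, 0]
toy_segments = silence_to_silence_segments(toy, frame_len=2, threshold=1, min_silence_frames=2)
```

In practice the output of such a detector would still need the manual checking that the slide mentions, since a fixed threshold is sensitive to background noise.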

Workpackage Progress: Speech retrieval - 2
- Preparation for speech recognition:
  - Training on segmented corpora
  - Importance of same accent and same speech content domain
- Layers of speech recognition:
  - Phoneme-level recognition -> acoustic score
  - Pronunciation dictionary / morphological-analysis-based recognition -> word exists or not
  - Language model -> final probability
- Mixed phoneme- and word-based indexing, keeping probabilities
- Index-based fast retrieval with score value
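
To illustrate what "index-based fast retrieval with score value" could look like, here is a minimal sketch of an inverted index that stores a score with every hit. The class `ScoredIndex`, the segment identifiers, the scores and the phoneme notation are all hypothetical; the slides state that the real system relies on its own binary index format.

```python
from collections import defaultdict

class ScoredIndex:
    """Inverted index from a term (word or phoneme string) to scored segment hits."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, term, segment_id, score):
        """Record that `term` was recognised in `segment_id` with the given score."""
        self._postings[term].append((segment_id, score))

    def search(self, term):
        """Return the hits for `term`, highest score first."""
        return sorted(self._postings.get(term, []), key=lambda hit: hit[1], reverse=True)

index = ScoredIndex()
index.add("court", "seg_001", 0.62)          # word-level entries
index.add("court", "seg_007", 0.91)
index.add("k ao r t", "seg_007", 0.55)       # phoneme-level entry, made-up notation
hits = index.search("court")
```

Keeping word and phoneme entries in the same structure is one simple way to read "mixed" indexing: a query can fall back to the phoneme string when the word is not in the vocabulary.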

Workpackage Progress: Speech retrieval - 3
- Phoneme-level recognition:
  - ALL spent many months on research to refine its algorithms
  - Successful results:
    - Using Gaussian mixture models in HMM nodes
    - Triphone and allophone identification and management
    - Speaker clustering
  - Our phoneme-level recognition exceeded a 60% hit rate
  - On its own, this was not good enough to build reliable speech retrieval on
- The workflow:
  - Input: silence-to-silence wave segments
  - Output: probable phoneme sequences with acoustic scores

Workpackage Progress: Speech retrieval - 4
- Dictionary level:
  - Filtering of feasible phoneme sequences by:
    - a pronunciation dictionary in English (custom-based pronunciation)
    - a morphological analyzer in Hungarian (rule-based pronunciation)
  - The workflow:
    - Input: phoneme sequences with acoustic scores
    - Output: feasible word sequences, keeping the acoustic scores
- Language level:
  - Word sequences are ranked by probability on the basis of large text corpora
  - The solution performs much better on domain-specific speech (legal, medical domain)
  - The workflow:
    - Input: word sequences with acoustic scores
    - Output: word sequences with modified acoustic scores
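
The two workflows above can be sketched as a small pipeline. This is an illustration under strong assumptions, not the project's implementation: the dictionary, the probabilities and the scores are made up, scores are treated as log values, and a unigram table stands in for the language model, whereas the slide describes ranking whole word sequences.

```python
import math

# Hypothetical pronunciation dictionary: phoneme tuple -> word.
PRONUNCIATIONS = {
    ("k", "ao", "r", "t"): "court",
    ("k", "ao", "t"): "caught",
}

# Hypothetical unigram probabilities standing in for a real language model.
WORD_PROB = {"court": 0.03, "caught": 0.01}

def dictionary_filter(candidates):
    """Keep only phoneme sequences that spell a dictionary word; keep their acoustic score."""
    return [(PRONUNCIATIONS[tuple(ph)], score)
            for ph, score in candidates if tuple(ph) in PRONUNCIATIONS]

def language_rescore(words, lm_weight=1.0):
    """Add a weighted log language-model probability to each log acoustic score."""
    rescored = [(w, score + lm_weight * math.log(WORD_PROB[w])) for w, score in words]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Candidate phoneme sequences with log acoustic scores (made-up values).
candidates = [
    (["k", "ao", "t"], -4.0),
    (["k", "ao", "r", "t"], -4.5),
    (["g", "ao", "r", "t"], -3.9),   # best acoustic score, but not a dictionary word
]
words = dictionary_filter(candidates)
best_word, best_score = language_rescore(words)[0]
```

In this toy run the dictionary removes the acoustically best candidate because it is not a word, and the language model then promotes "court" over "caught" even though "caught" had the better acoustic score.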

Workpackage Progress: Speech retrieval - 5
- Integration of speech retrieval into the EASAIER architecture:
  - Speech retrieval works as a black box
  - Relies on binary indexes
  - Business logic layer needed
  - Temporal RDF triples generated
  - User-initiated retrieval performed
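
The slide does not show what the temporal RDF triples look like. The sketch below assumes one possible shape: each recognised word becomes a resource linked to its asset and to a start and end time, serialised as N-Triples. The namespace, the property names (`inAsset`, `word`, `startsAt`, `endsAt`) and the function are hypothetical and are not the EASAIER ontology.

```python
EX = "http://example.org/easaier-sketch#"   # hypothetical namespace, not the project's ontology
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def word_hit_triples(hit_id, asset_uri, word, start_s, end_s):
    """Return N-Triples lines describing one recognised word and its time span in seconds."""
    subject = f"<{EX}{hit_id}>"
    # Escape backslashes and double quotes so the literal stays valid N-Triples.
    literal = word.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"{subject} <{EX}inAsset> <{asset_uri}> .",
        f'{subject} <{EX}word> "{literal}" .',
        f'{subject} <{EX}startsAt> "{start_s:.2f}"^^<{XSD_DECIMAL}> .',
        f'{subject} <{EX}endsAt> "{end_s:.2f}"^^<{XSD_DECIMAL}> .',
    ]

triples = word_hit_triples("hit1", "http://example.org/asset/42", "court", 12.5, 13.0)
```

A business logic layer of the kind the slide mentions could produce such lines from the black-box engine's results and pass them to the metadata repository.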

Workpackage Progress: Cross-media retrieval - 1
- Cross-media retrieval:
  - The CM retrieval engine and its functionalities were specified in Deliverable 3.1
  - The video analysis modules necessary for CMR are specified in the internal EASAIER software modules document and consist of:
    - Video Transcoding Module
    - Shot Detection and Key-Frame Extraction Module
    - Low-Level Feature Extraction Module

Workpackage Progress: Cross-media retrieval - 2 - Video Analysis module
[Figure: Video Analysis module. Diagram labels: Input Video File; Compression; Audio Stream Extraction; Video Segmentation and Keyframe extraction; Keyframe Analysis; Manual Annotation; Audio stream analysis; Keyframes; PCM; Original video file; Streaming video file (e.g. MPEG-4); Multimedia assets repository; KF temporal data; Video segments temporal data; Metadata; Features; Temporal data; Video segments metadata; KF Extracted Features; Metadata repository.]

Workpackage Progress: Cross-media retrieval - 3
- For transcoding purposes we chose and tested the ffmpeg software
- A first version of the Shot Detection and Key-Frame Extraction module was developed at QMUL and is ready for integration
- The MPEG-7 eXperimentation Model (XM) will be integrated for the purpose of Low-Level Feature Extraction.
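
The slides do not describe the QMUL shot detection method. As a generic illustration only, a widely used baseline compares grey-level histograms of consecutive frames and declares a shot boundary when the difference exceeds a threshold; the middle frame of each shot can then serve as its key frame. The frame representation (a flat list of 0-255 pixel values), the bin count and the threshold are assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalised grey-level histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // max_value, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i starts a new shot (large histogram change from i-1)."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # The L1 distance between two normalised histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries

def key_frames(frames, boundaries):
    """Pick the middle frame index of every shot as its key frame."""
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]

# Toy clip: three dark frames followed by two bright frames.
dark, bright = [10] * 16, [240] * 16
clip = [dark, dark, dark, bright, bright]
cuts = detect_shot_boundaries(clip)
keys = key_frames(clip, cuts)
```

A hard threshold like this only catches abrupt cuts; gradual transitions such as fades would need a different test.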

Workpackage Progress: Vocal query
- Technology based on the first two layers of speech recognition (see above):
  - Phoneme-level recognition
  - Pronunciation dictionary or morphological analyzer
- Process:
  - Phoneme-level recognition with acoustic score
  - Matching to the dictionary
  - The solution is the item with the best acoustic score that is also found in the dictionary
- Technology will be demonstrated:
  - Speed has to be tuned
  - Performance under evaluation
  - Performance difference between English and Hungarian
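
The selection rule in the process above (best acoustic score among candidates found in the dictionary) can be sketched as follows. The dictionary entries, the phoneme symbols and the scores are invented for illustration, and the handling of several pronunciation variants per word is an assumption about how the variants mentioned elsewhere in the slides might be stored.

```python
# Hypothetical dictionary with several pronunciation variants per word.
DICTIONARY = {
    "either": [("iy", "dh", "er"), ("ay", "dh", "er")],
    "ether": [("iy", "th", "er")],
}

def build_lookup(dictionary):
    """Invert word -> pronunciations into pronunciation -> word.

    Homophones would collide here; a fuller version would map to a list of words.
    """
    return {pron: word for word, prons in dictionary.items() for pron in prons}

def answer_vocal_query(candidates, lookup):
    """Return (word, score) for the best-scoring candidate found in the dictionary, or None."""
    in_dictionary = [(lookup[tuple(ph)], score)
                     for ph, score in candidates if tuple(ph) in lookup]
    if not in_dictionary:
        return None
    return max(in_dictionary, key=lambda pair: pair[1])

lookup = build_lookup(DICTIONARY)
query_candidates = [
    (["ay", "dh", "ah"], -2.0),   # best acoustic score, but not in the dictionary
    (["ay", "dh", "er"], -2.5),
    (["iy", "th", "er"], -3.0),
]
answer = answer_vocal_query(query_candidates, lookup)
```

Because no language model is involved, the answer depends entirely on the acoustic score and on dictionary coverage, which fits the isolated-word queries used in the demonstration.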

Contributions and Connections with Other Workpackages

Upcoming Work Plan Months - Music related
- Up to month 20:
  - Music retrieval: prototype available; the remaining time will be spent on integration, testing, and continued development of music retrieval based on other similarity measures.
- Month 20:
  - Deliver D3.2
- Between month 20 and 26:
  - Music retrieval: the only further work is refinement based on user studies (WP7)
- Month 26:
  - Deliver D3.3

Upcoming Work Plan Months - Speech related
- Before D3.2 (Month 20):
  - Phoneme-level recognition:
    - Speed tuning
    - Language-specific performance improvement
    - Triphone implementation in English
    - Speaker clustering
  - Dictionary level:
    - More pronunciation variants in English
  - Language model:
    - Bigger text corpora in English and in Hungarian
  - Indexing and retrieval:
    - Reimplementation for mixed phoneme and word search
- After Month 20:
  - Testing and refinement
  - Performance tuning

Upcoming Work Plan Months - Cross-media related
- All cross-media retrieval effort will be focused on coding and testing the software.
- Material produced for working documents (System Architecture, Software Modules, Metadata) specifies the software to be developed and integrated, and how this will be done.
- The required routines have for the most part been developed at a 'proof-of-concept' level (ontological relationships between different media, feature extraction for images and video, key frame display for retrieved video)
- Month 26:
  - M3.4 Cross-media retrieval fully functional, further work is only refinement and optimization.
  - D3.3 Prototype on cross-media retrieval system
- Between month 20 and 26, QMUL will work closely with SILOGIC to integrate cross-media retrieval into the EASAIER system, which we intend to use as the prototype for this.

Demonstration: Overview
- Retrieval engines and tools are demonstrated separately
- Speech-related demo:
  - Segmentation user interface
  - Vocal query (isolated word search) in Hungarian
  - Vocal query (isolated word search) in English
- Music-related demo:
  - SoundBite: timbre-based music similarity embedded in iTunes

Demonstration: Segmentation
- User interface for manual segmentation:
  - Synchronizing silence-to-silence speech segments to the text
  - Checking automated silence-to-silence segmentation
  - Synchronizing word boundaries to the text
  - Synchronizing phoneme boundaries to the text

[Figure: Demonstration, Segmentation – screen shot]

Demonstration: Vocal query
- Searching for a spoken word in a dictionary:
  - Recording input
  - Phoneme-level recognition
  - Matching probable phoneme sequences to the content of the pronunciation dictionary
  - Display of the most probable solution from the dictionary

[Figure: Demonstration, Vocal query – screen shot]

[Figure: Demonstration, Music retrieval – screen shot]