Enabling Access to Sound Archives through Integration, Enrichment and Retrieval WP3 – Retrieval systems.

12 Month Review Meeting Project #033902 Introduction to Workpackage overview  Objectives:  To provide retrieval systems offering the ability to search by various musical similarity measures.  To search for spoken words or phrases.  To search across different media for associated content.  Queries may be text-based, spoken or audio examples.  Tasks:  T3.1: Music retrieval  T3.2: Speech retrieval  T3.3: Cross-media retrieval  T3.4: Vocal query interface

Introduction to Workpackage participants & schedule  Participants and their contributions  QMUL: 22 mm – music & cross-media retrieval  DIT: 5 mm – music retrieval  ALL: 24 mm – speech retrieval & vocal queries  LFUI: 3 mm – integration of retrieval engines  NICE: 12 mm – retrieval for attributes of speech  Schedule  T3.1 Music retrieval: month 9 – month 20  T3.2 Speech retrieval: month 8 – month 20  T3.3 Cross-media retrieval: month 1 – month 6, month 21 – month 26  T3.4 Vocal query interface: month 7 – month 20

Introduction to Workpackage Task 3.1 - Music retrieval  Searching and organizing music collections:  Adequate representation for the audio in the query  Textual and keyword: e.g. author, title, date, genre, etc.  Automatic feature extraction  Low-level acoustic similarity measures  Mid-level features – characterize the rhythmic structure  High-level features  musically relevant parameters  visualisation of key events along the assets' timeline

Introduction to Workpackage Task 3.2 - Speech retrieval  Retrieving the content of speech corpora in English and Hungarian  Building test corpora  Levels of recognition  Phoneme level recognition  Pronunciation dictionary filter or morphological analysis  Text corpus based language model  Phoneme and word level indexing  Fast retrieval  Improve the performance

Introduction to Workpackage Task 3.3 - Cross-media retrieval  Searching media in various formats (audio recordings, video recordings, notated scores, images)  Using metadata  Feature extraction  Similarity measures  Optimised multidimensional search methods  Video analysis  The user enters a piece of media as a query and may retrieve an entirely different type of media

Introduction to Workpackage Task 3.4 - Vocal query interface  Voice initiated media retrieval (without natural language processing)  Recording of query  Phoneme level recognition  Pronunciation dictionary based identification of word(s)  Speaker adaptation

Deliverables  D3.1 Report outlining retrieval system functionality and specification (Month 6)  D3.2 Prototype on speech and music retrieval systems with vocal query interface (Month 20)  D3.3 Prototype on cross-media retrieval system (Month 20)

Deliverable D3.1 – Report outlining retrieval system functionality and specification  Topics described:  User requirements  Relations to other work packages  Music retrieval  Speech retrieval  Indexing and speaker retrieval  Cross-media retrieval  Vocal query interface  Retrieval system integration and knowledge management – role of ontology  Example user interfaces  Contributors: ALL, DIT, NICE, QMUL, RSAMD

12 Month Review Meeting Project #033902 Milestones  M3.1 Initial vocal query system tested, initial speech and music retrieval algorithms developed (Month 8)  M3.2 Vocal query is fully-functional, speech and music retrieval implemented, cross-media retrieval method finalized (Month 14)  M3.3 Vocal query finished, speech and music retrieval systems established, basic cross-media retrieval implemented (Month 20)  M3.4 Cross-media retrieval fully functional, further work is only refinement and optimization (Month 26)

Milestones M3.1 – Speech retrieval & Vocal query  Vocal query system:  Is ready for demonstration in Hungarian and in English  Phoneme-level recognition implemented and tested, performance improvement in progress  Hungarian triphone management is under development  English pronunciation dictionary embedded, with multiple pronunciation variants  Morphological analyzer implemented for the Hungarian pronunciation  The performance of the Hungarian version is better than that of the English version; the reason is under investigation  Speech retrieval:  See above  Language model established for the Hungarian version; for English we are looking for a good text corpus  Word-level recognition implemented and under testing; performance depends on the content/domain of the speech  Phoneme-based search finished; word-based search is being implemented

Milestones M3.1 – Music retrieval  Music retrieval:  Extractors for tempo, key changes and mode detection have been implemented as VAMP plugins.  SoundBite similarity retrieval is fully functional and available as a Mac OS application.  A segmenter based on SoundBite has been implemented as a VAMP plugin.  A framework for the automatic extraction of audio features has been built. It uses VAMP plugins and outputs descriptors directly in RDF format, allowing easy integration with the ontology.

12 Month Review Meeting Project #033902 Workpackage Progress  Parallel development of separate retrieval engines is in good progress  Results are according to the schedule of the technical annex  We aim to demonstrate our results by running software modules  Scientific research induce risk when targeting improvement of performance  Integration of different retrieval engines into a common architecture and user interface will be challenging  Utilization of the power of the ontology centric approach

Workpackage Progress Music retrieval - 1  Music retrieval system  Searches assets according to their relevance to music-related queries, using various metadata and automatically extracted features to produce a ranked list of audio files.  Search methods:  Textual and keyword: e.g. author, title, date, genre, etc.  Similarity based on automatically extracted low-level features  Music-related parameters using automatically extracted descriptors: instrument, orchestration, tempo, key, etc.
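
The similarity-based search method on this slide can be pictured with a minimal sketch. It assumes every asset is already described by a fixed-length feature vector and uses cosine similarity as a stand-in measure. The function names, the toy vectors and the choice of measure are illustrative assumptions, not the project's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_assets(query_features, collection):
    """Return (asset name, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_features, features))
        for name, features in collection.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical feature vectors; real ones would come from the extractors.
collection = {
    "asset_a.wav": [1.0, 0.0, 0.0],
    "asset_b.wav": [0.9, 0.1, 0.0],
    "asset_c.wav": [0.0, 1.0, 0.0],
}
ranking = rank_assets([1.0, 0.0, 0.0], collection)
```

The ranked list is what a user interface would display. Metadata filters (author, title, genre) could be applied to `collection` before ranking.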

Workpackage Progress Music retrieval – 2 – Music Analysis module  [Block diagram; labels recovered from the slide: Input Audio File (PCM); Manual Entry Tags/Data; De-Noising / Restoration; Source Separation; Mid-Level Feature Extractors; High-Level Features Extractors; Compression; Reliability Metric; High-level features (parametric search); Mid-level descriptors (similarity search); Optimal source separation & denoising parameters; Manual Tags & Manual High Level Features; To PCM & compressed audio assets repository; To Metadata Repository; Archive application (musical audio)]

Workpackage Progress Music retrieval - 3  Music Analysis module  Musically relevant descriptors are automatically extracted by a module running a series of “VAMP” plugins.  Descriptors returned by the plugins can be classified as:  Mid-level features, used by the retrieval system to search for similar audio assets (e.g. timbre profile).  High-level features, which enable the user to search for audio assets using musically relevant parameters. High-level descriptors are also used for the visualisation of key events along the assets' timeline (e.g. position of beats, bars, key changes and instruments), providing a considerable aid in the analysis of a piece of music.

Workpackage Progress Music retrieval - 4  Similarity Search  Based on the SoundBite algorithm, which allows simultaneous segmentation, thumbnailing and modelling of an audio asset

Workpackage Progress Speech retrieval - 1  Research and testing are carried out on well-prepared corpora  Text and corresponding speech needed: 10-20 hours  Quality is very important  Low noise  Mixed sound sources omitted  Accent sensitive  Field interviews, phone conversations  Quality of the recording  Silence-to-silence segmentation - automated and manual  Half of the corpus for training  Half of the corpus for testing  Hungarian corpus  Hungarian radio station ‘Kossuth’ broadcast quality  More than 20 hours - segmentation enhanced manually  English corpus  TIMIT for research purposes  US Supreme Court recordings
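
A minimal sketch of the half-training, half-testing split, assuming the corpus is a list of segment identifiers. The shuffling and the `seed` parameter are assumptions added for reproducibility; the slide does not say how the two halves were chosen.

```python
import random


def split_corpus(segments, seed=0):
    """Shuffle the segments reproducibly and cut the list in half."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical segment identifiers.
segment_ids = [f"segment_{i:03d}" for i in range(10)]
training, testing = split_corpus(segment_ids)
```

With an odd number of segments the testing half receives the extra one.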

Workpackage Progress Speech retrieval - 2  Preparation for speech recognition  Training on segmented corpora  Importance of same accent and same speech content domain  Layers of speech recognition  Phoneme level recognition -> acoustic score  Pronunciation dictionary/morphological analysis based recognition -> word exists or not  Language model -> final probability  Mixed, phoneme and word based indexing keeping probabilities  Index based fast retrieval with score value
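
The last two bullets, a mixed phoneme and word index that keeps probabilities and returns scored hits, can be sketched as a small inverted index. The class name, the `(level, text)` key layout and the example scores are hypothetical; the slides do not describe the actual index format.

```python
from collections import defaultdict


class ScoredIndex:
    """Inverted index from a recognised unit to scored occurrences."""

    def __init__(self):
        # key: (level, text), e.g. ("word", "court") or ("phoneme", "k ao r t")
        # value: list of (segment id, start time in seconds, score)
        self._postings = defaultdict(list)

    def add(self, unit, segment_id, start_s, score):
        self._postings[unit].append((segment_id, start_s, score))

    def search(self, unit, min_score=0.0):
        """Return occurrences of the unit, best score first."""
        hits = [h for h in self._postings.get(unit, []) if h[2] >= min_score]
        return sorted(hits, key=lambda h: h[2], reverse=True)


index = ScoredIndex()
index.add(("word", "court"), "seg1", 2.4, 0.81)
index.add(("word", "court"), "seg7", 0.3, 0.93)
index.add(("phoneme", "k ao r t"), "seg3", 5.0, 0.55)
hits = index.search(("word", "court"))
```

Because the recognition work is done at indexing time, a query is reduced to a dictionary lookup and a sort, which is what makes retrieval fast.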

Workpackage Progress Speech retrieval - 3  Phoneme level recognition  ALL spent many months on research to refine its algorithms  Successful results  Using Gaussian mixture models in HMM nodes  Triphone and allophone identification and management  Speaker clustering  Our phoneme-level recognition exceeded a 60% hit rate  On its own, this was not good enough to build reliable speech retrieval on  The workflow:  Input: silence-to-silence wave segments  Output: probable phoneme sequences with acoustic score

Workpackage Progress Speech retrieval - 4  Dictionary level  Filtering of feasible phoneme sequences by  a pronunciation dictionary in English (custom-based pronunciation)  a morphological analyzer in Hungarian (rule-based pronunciation)  The workflow:  Input: phoneme sequences with acoustic score  Output: feasible word sequences with the acoustic score kept  Language level  Using large text corpora, we rank the word sequences by probability  The solution performs much better on domain-specific speech (legal, medical domain)  The workflow:  Input: word sequences with acoustic score  Output: word sequences with modified acoustic score
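
The language-level step (word sequences with an acoustic score in, word sequences with a modified score out) can be sketched with a bigram model. This is only an illustration under stated assumptions: acoustic scores are treated as log-probabilities, the model uses add-one smoothing, and the two-sentence corpus, the function names and `lm_weight` are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def language_log_prob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(words)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rescore(hypotheses, unigrams, bigrams, lm_weight=1.0):
    """Add the weighted language score to each acoustic score, best first."""
    rescored = [
        (words, acoustic + lm_weight * language_log_prob(words, unigrams, bigrams))
        for words, acoustic in hypotheses
    ]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


unigrams, bigrams = train_bigram_counts(
    ["the court is in session", "the court will rise"]
)
hypotheses = [(["the", "caught"], -4.0), (["the", "court"], -4.2)]
rescored = rescore(hypotheses, unigrams, bigrams)
```

Here "the caught" has the better acoustic score, but "the court" wins after rescoring because the bigram occurs in the text corpus. The same effect explains why a domain-specific corpus helps on domain-specific speech.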

Workpackage Progress Speech retrieval - 5  Integration of speech retrieval into the EASAIER architecture  Speech retrieval works as a black box  Relies on binary indexes  Business logic layer needed  Temporal RDF triples generated  User initiated retrieval performed
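
As a rough illustration of what generating temporal RDF triples for a retrieval hit could look like, the sketch below serialises one recognised word with its start and end time as N-Triples lines. The `http://example.org/speech#` namespace and every property name in it are placeholders invented for this example; the actual EASAIER ontology terms are not given in the slides.

```python
# Placeholder namespace and datatype IRI; not the project's vocabulary.
EX = "http://example.org/speech#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def hit_to_ntriples(hit_id, asset_uri, word, start_s, end_s):
    """Describe one retrieval hit as four N-Triples lines."""
    subject = f"<{EX}{hit_id}>"
    return [
        f"{subject} <{EX}inAsset> <{asset_uri}> .",
        f'{subject} <{EX}word> "{escape_literal(word)}" .',
        f'{subject} <{EX}startSeconds> "{start_s:.3f}"^^<{XSD_DECIMAL}> .',
        f'{subject} <{EX}endSeconds> "{end_s:.3f}"^^<{XSD_DECIMAL}> .',
    ]


lines = hit_to_ntriples("hit1", "http://example.org/assets/a1", "court", 12.5, 13.1)
```

A business logic layer could produce such triples from the binary index on demand, so that the speech engine itself stays a black box.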

Workpackage Progress Cross media retrieval - 1  Cross-media retrieval  The CM retrieval engine and its functionalities were specified in Deliverable 3.1  Video analysis modules necessary for CMR are specified in the internal EASAIER software modules document and consist of:  Video Transcoding Module  Shot Detection and Key-Frame Extraction Module  Low-Level Feature Extraction Module

Workpackage Progress Cross media retrieval – 2 – Video Analysis module  [Block diagram; labels recovered from the slide: Input Video File; Compression; Audio Stream Extraction; Video Segmentation and Keyframe extraction; Keyframe Analysis; Manual Annotation; Audio stream analysis; Keyframes; PCM; Original video file; Streaming video file (e.g. MPEG-4); Multimedia assets repository; KF temporal data; Video segments temporal data; Video segments metadata; KF Extracted Features; Metadata; Features; Temporal data; Metadata repository]

Workpackage Progress Cross media retrieval - 3  For transcoding purposes we chose and tested ffmpeg  A first version of the Shot Detection and Key-Frame Extraction module was developed at QMUL and is ready for integration  The MPEG-7 eXperimentation Model (XM) will be integrated for the purpose of Low-Level Feature Extraction.
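
The slides do not say how the QMUL shot detection module works. The sketch below shows one widely used approach, comparing grey-level histograms of consecutive frames, purely as an illustration of the idea. Frames are assumed to be flat lists of 8-bit grey values, and the threshold, bin count and middle-frame key-frame rule are arbitrary choices made for this example.

```python
def histogram(frame, bins=8):
    """Normalised grey-level histogram of a frame (flat list of 0..255 values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in counts]


def detect_shot_boundaries(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous one."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Half the L1 distance lies between 0 (identical) and 1 (disjoint).
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


def key_frames(frames, boundaries):
    """Pick the middle frame of every shot as its key frame."""
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]


dark, bright = [10] * 16, [240] * 16
frames = [dark, dark, dark, bright, bright]
boundaries = detect_shot_boundaries(frames)
selected = key_frames(frames, boundaries)
```

The key frames selected this way are the ones that would be passed on to low-level feature extraction.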

Workpackage Progress Vocal query  Technology based on the first two layers of speech recognition (see above)  Phoneme level recognition  Pronunciation dictionary or morphological analyzer  Process  Phoneme level recognition with acoustic score  Matching to the dictionary  The solution is the item with the best acoustic score that is also found in the dictionary  Technology will be demonstrated  Speed has to be tuned  Performance under evaluation  Performance difference between English and Hungarian
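
The selection rule in the process above (the best-scoring candidate that is also found in the dictionary) can be sketched as follows. The phoneme symbols, the scores and the dictionary entries are made up for illustration, and a higher score is assumed to be better.

```python
def best_dictionary_match(candidates, pronunciations):
    """Return (word, score) for the best-scoring candidate found in the dictionary.

    candidates: list of (phoneme sequence, acoustic score) from the recogniser.
    pronunciations: word -> list of phoneme sequences (pronunciation variants).
    """
    lookup = {}
    for word, variants in pronunciations.items():
        for phonemes in variants:
            # Keep the first word registered for a given pronunciation.
            lookup.setdefault(tuple(phonemes), word)

    best = None
    for phonemes, score in candidates:
        word = lookup.get(tuple(phonemes))
        if word is not None and (best is None or score > best[1]):
            best = (word, score)
    return best


pronunciations = {
    "tomato": [["t", "ah", "m", "ey", "t", "ow"], ["t", "ah", "m", "aa", "t", "ow"]],
    "potato": [["p", "ah", "t", "ey", "t", "ow"]],
}
candidates = [
    (["t", "ah", "m", "ey", "d", "ow"], -3.0),  # best score, but not a word
    (["t", "ah", "m", "aa", "t", "ow"], -3.5),
    (["p", "ah", "t", "ey", "t", "ow"], -6.0),
]
match = best_dictionary_match(candidates, pronunciations)
```

The top acoustic candidate is rejected because it is not in the dictionary, so the second candidate is returned. Storing several pronunciation variants per word raises the chance that a correct candidate survives this filter.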

Contributions and Connections with Other Workpackages

Upcoming Work Plan Months 12-24 – Music related  Up to month 20  Music retrieval - prototype available; remaining time will be spent on integration, testing, and continued development of music retrieval based on other similarity measures.  Month 20  Deliver D3.2  Between month 20 and 26  Music retrieval - only further work is refinement based on user studies (WP7)  Month 26  Deliver D3.3

Upcoming Work Plan Months 12-24 – Speech related  Before D3.2 – Month 20  Phoneme level recognition  Speed tuning  Language specific performance improvement  Triphone implementation in English  Speaker clustering  Dictionary level  More pronunciation variants in English  Language model  Bigger text corpora in English and in Hungarian  Indexing and retrieval  Reimplementation for mixed phoneme and word search  After Month 20  Testing and refinement  Performance tuning

Upcoming Work Plan Months 12-24 – Cross media related  All cross-media retrieval effort will be focused on coding and testing the software.  Material produced for working documents (System Architecture, Software Modules, Metadata) specifies the software to be developed and integrated, and how this will be done.  The required routines have for the most part been developed at a 'proof-of-concept' level (ontological relationships between different media, feature extraction for images and video, key frame display for retrieved video)  Month 26  M3.4 – Cross-media retrieval fully functional, further work is only refinement and optimization.  D3.3 Prototype on cross-media retrieval system  Between month 20 and 26, QMUL will work closely with SILOGIC to integrate cross-media retrieval into the EASAIER system, which we intend to use as the prototype for this.

Demonstration Overview  Retrieval engines and tools are demonstrated separately  Speech related Demo  Segmentation user interface  Vocal query (isolated word search) in Hungarian  Vocal query (isolated word search) in English  Music related Demo  SoundBite – timbre-based music similarity embedded in iTunes

Demonstration Segmentation  User interface for manual segmentation  Synchronizing silence-to-silence speech segments to the text  Checking automated silence-to-silence segmentation  Synchronizing word boundaries to the text  Synchronizing phoneme boundaries to the text
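
For readers unfamiliar with automated silence-to-silence segmentation, here is a minimal energy-threshold sketch of the idea. The frame length, threshold and minimum silence length are arbitrary illustrative values, and the slides do not state which method the automated segmenter actually uses.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full frame (a trailing partial frame is dropped)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_by_silence(samples, frame_len=4, threshold=0.01, min_silence_frames=2):
    """Return (start sample, end sample) pairs for stretches of speech.

    A segment ends only after min_silence_frames consecutive quiet frames,
    so short pauses inside a phrase do not split it.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None     # first frame of the segment currently being built
    silent_run = 0   # consecutive quiet frames seen since the last loud frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # frame just after the last loud frame
                segments.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:
        end = len(energies) - silent_run
        segments.append((start * frame_len, end * frame_len))
    return segments


# Toy signal: silence, speech with one short pause, silence, speech.
samples = [0.0] * 8 + [0.5] * 8 + [0.0] * 4 + [0.5] * 4 + [0.0] * 8 + [0.5] * 4
segments = segment_by_silence(samples)
```

A manual interface such as the one demonstrated is then used to check and correct boundaries that a threshold-based pass gets wrong, for example in noisy recordings.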

Demonstration Segmentation – screen shot

Demonstration Vocal query  Searching for a spoken word in a dictionary  Recording input  Phoneme level recognition  Matching probable phoneme sequences to content of the pronunciation dictionary  Display of the most probable solution from the dictionary

Demonstration Vocal query – screen shot

Demonstration Music retrieval – screen shot
