IPSOM Indexing, Integration and Sound Retrieval in Multimedia Documents.

1 IPSOM Indexing, Integration and Sound Retrieval in Multimedia Documents

2 Outline
Objectives (slide 3)
Problems to be addressed (slide 4)
Research team (slide 5)
Background (slide 6)
Work plan (slide 7)
Dissemination of results (slide 8)

3 Objectives
Improved access to spoken information
–spoken interface (accessible by the visually impaired)
–detection and indexing of units in spoken books: words, sentences, topics
Development of multimedia spoken books
–broaden the usage of spoken books (didactic applications, etc.)
–multimedia interfaces for access and retrieval

4 Problems to be Addressed
Spoken books offer the visually impaired community a powerful source of information and leisure
However:
–information is sequentially stored in analogue form (30,000 hours, 2,000 books)
–information retrieval is extremely slow and difficult: error-prone, on a trial-and-error basis, not structured

5 Research Team
Speech Processing Group of INESC Lisboa
–António Serralheiro (PhD)
–Isabel Trancoso (PhD)
–Carlos Teixeira (PhD)
–Diamantino Caseiro (MSc)
–Rui Amaral (MSc)
–Hugo Amorim (UG)*
Large Scale Informatics Laboratory of Faculdade de Ciências de Lisboa
–Nuno Guimarães (PhD)
–Teresa Chambel (MSc)
National Library
–José Borbinha (MSc)

6 Background
Previous work on speech recognition and synthesis, development of spoken corpora and alignment tools
Current work on topic detection for broadcast news recognition (ALERT project)
Previous work on video segmentation and indexing
Current work on techniques and methods for integrating digital video with text (UNIBASE project)
Collaboration with the NISO efforts for the “Talking Book” standard
Collaboration in the DAISY (world-wide initiative) project for digital talking books

7 Work Plan
Duration: 36 months
Manpower: 193 person-months

8 Dissemination of Results
Conferences & Workshops
–National (e.g. RECPAD, PROPOR)
–International (e.g. AACE, ICASSP, ASRU, EUROSPEECH, ICSLP, ECDL)
Final Workshop
–specially aimed at the visually impaired community
Web Site
–dissemination of didactic or other multimedia applications
Browsing tools for spoken books
Digitally stored and indexed spoken books (distributed through BN)
–invaluable resource for data-driven prosody modelling and unit selection for text-to-speech synthesis

9 Budget
Overall funding

10 Budget
Requested Funding by Institution

11 Spoken Corpora Alignment
Generation of an N-gram framework for topic detection
Generation of phonetic transcriptions for the spoken book texts
–To be done automatically using the grapheme-to-phone module of the DIXI+ project
–Pronunciation of specific proper names or technical terms may eventually need to be manually corrected
Generation of speaker-dependent acoustic models, adapted to the reader of the spoken book
–Context-dependent models are the state of the art for LVCSR tasks
–Especially important to model the intra- and inter-word vowel reduction phenomena that characterise European Portuguese
Labelling of the spoken corpora
–Initial segmentation stage
–Knowledge sources derived from the above subtasks
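
The slide does not say how the N-gram framework for topic detection would be built. As an illustration only, the sketch below trains one add-one-smoothed N-gram count model per topic and assigns a passage to the topic whose model gives it the highest log-probability. The function names, the whitespace tokenisation and the default of unigrams (n=1) are assumptions made for this example, not the project's design.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_topic_models(labelled_texts, n=1):
    """Count n-grams per topic; labelled_texts yields (topic, text) pairs."""
    models = {}
    for topic, text in labelled_texts:
        counts = models.setdefault(topic, Counter())
        counts.update(ngrams(text.lower().split(), n))
    return models


def detect_topic(text, models, n=1):
    """Return the topic whose smoothed n-gram model scores the text highest."""
    grams = ngrams(text.lower().split(), n)
    # Shared vocabulary of all n-grams seen in training (hypothetical choice).
    vocab = set().union(*models.values())
    best_topic, best_score = None, -math.inf
    for topic, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing, with one extra slot reserved for unseen n-grams.
        denom = total + len(vocab) + 1
        score = sum(math.log((counts[g] + 1) / denom) for g in grams)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic
```

With two toy topics trained on one sentence each, a call such as detect_topic("a ship on the sea", models) picks the topic whose training text shares the most words with the passage. A real system would need far more training text per topic and tokenisation suited to Portuguese.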

12 Multimedia Applications for Indexing and Retrieval
Search and retrieval models
–keyword, combined indexes, topics, and standardised metadata
Access interface, visualisation and navigation on the "spoken books" base
–Search
–Query
–Retrieval
–Navigation
Integration of the tools/applications for "browsing" and "retrieval" with the phonographic base
–Access interfaces and performance issues have to be designed
Usability testing and evaluation
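
To make the keyword retrieval model concrete, the sketch below shows one way a keyword index over time-aligned spoken books could work: each word carries the start and end times produced by the labelling stage, and an inverted index maps a keyword to the playback positions where it is spoken. The AlignedWord type, the book identifiers and the function names are hypothetical and used only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class AlignedWord(NamedTuple):
    """One word of a spoken book with its (assumed) alignment times."""
    word: str
    start: float  # seconds from the beginning of the recording
    end: float


def build_index(aligned_books):
    """Map each lower-cased word to the (book_id, start_time) pairs where it occurs."""
    index = defaultdict(list)
    for book_id, words in aligned_books.items():
        for w in words:
            index[w.word.lower()].append((book_id, w.start))
    return index


def search(index, keyword):
    """Return playback positions for a keyword, sorted by book and then by time."""
    return sorted(index.get(keyword.lower(), []))
```

A player could then jump straight to each returned start time instead of scanning the recording sequentially. Combined indexes, topics and metadata would need further fields beyond this word-level example.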
