11th ECESS Meeting: College Language Resources


1 Language Resources College, 11th ECESS meeting
11th ECESS Meeting: College Language Resources
0. Minute-taking for College ‘Language Resources’
1. Goal of meeting
2. Status of members of College
3. Interests and acceptance of associated members and observers
4. Acceptance of College minutes of last meeting
5. College-Action List of 10th meeting
6. Status of partners
  Pronunciation lexica (Pool Lex1, Pool Lex2)
  Acoustic data for TTS voices (Pool Voice1, Pool Voice2)
  Text Corpora (Pool Text1, Pool Text2)

2 7. The current state of LR specification
  Accepting specification for Text Corpora (Pool Text1, Pool Text2)
  Accepting specification for Acoustic data for TTS voices (minimal requirements, Pool Voice2)
8. Further plans of partners
9. Discussion: General issues
  ECESS LR specification documents (public webpage)
  LR distribution (internal page)
  Splitting LR
10. Discussion: Further directions of LR College
  Extension of LR collection (new languages)
  Specification for new types of Pools
  Publications, promotion of ECESS LR
11. New Action List of College

3 1. Goal of Meeting
Status and further plans of partners
Interests and acceptance of members, associated members and observers
Accepting the specification for Text Corpora (Pool Text1, Pool Text2)
Finalizing the specification for Acoustic data for TTS voices (Pool Voice2)
ECESS LR specification documents (public and internal page)
Extension of LR collection

4 2. Status of members of College
Current members of LR College
  AMU (Coordinator) Grażyna Demenko
  Siemens (Ute Ziegenhain)
  Middle East Technical University, Ankara (Tolga Çiloğlu)
  CAS (Jinhua Tao)
  Uni Munich (Uwe Reichel)
Associated partners and Observers
  Nokia (Imre Kiss)
  Microsoft Portugal (Daniela Braga)
  University of Bielefeld (Dafydd Gibbon)
  CNRS Aix en Provence (Daniel Hirst)

5 3. Interests and acceptance of associated members and observers
Voting on new members of the LR College
  CNRS, Aix en Provence (Daniel Hirst)
  University of Bielefeld (Dafydd Gibbon)
Others potentially interested in LR?

6 4. Acceptance of College minutes of last meeting
introduction of the agenda
Dafydd Gibbon (Uni Bielefeld) wants to contribute (MBROLA diphone voice, German lexicon)
CNRS wants to become a member of the LR College
present resources: UK lexicon, UK baseline voice, Mandarin lexicon, Mandarin voice, Polish lexicon (extended format), Catalan (UK baseline voice and Polish lexicon still have to be validated)
POS tagging still has to be specified (size of text, domains, tokenisation problems, tag set, format of POS tags, validation)
minimal requirements for recording voice (Hartmut Pfitzinger)
plans of partners (table of supported languages)

7 discussion, general issues:
  settled documents are on the public web page; documents which are still under discussion will be only on the internal page
  agreed specifications will be renamed as ECESS version, not TC-STAR anymore
  splitting LRs, for instance phonetic lexicon: proper names should be put in a separate lexicon, because they are task specific, may confuse the OOV routines, and increase production costs
  in college "tools", Maribor acts as a distributor of tools needed for evaluation
  promotion of ECESS LR (LREC 2008)
  extension of LR collection (new pools, languages)

8 5. College-Action List of 10th meeting
Finalizing specifications for Text Corpora POS: PT1, PT2
Finalizing specifications for Acoustic data for TTS voices (PV2)
Lexicons PL1, PL2: final documentation, reports of validation to be published on the internal ECESS pages
Extension of LR collection (new types of Pools, e.g. speaker characterization/emotional/pathological voices/speech)

9 6. Status of partners
Pronunciation lexica (Pool Lex1, Pool Lex2)
Acoustic data for TTS voices (Pool Voice1, Pool Voice2)
Text Corpora (Pool Text1, Pool Text2)

10

11 7. The current state of LR specification
Accepting the specification for Text Corpora (Pool Text), Ute Ziegenhain, SIEMENS
Tagged text corpora (end of Sept.)
Finalizing the specification for Acoustic data for TTS voices (Pool Voice2), IPDS Kiel
Preparing Polish lexicon (extended version) for validation

12 8. Further plans of partners

13 Uni Bielefeld: Input for ECESS
The topics proposed so far by the Bielefeld partner are based on current Bielefeld activities and need to be adapted to ECESS needs. After further discussion, it is suggested that the top priority should be in the area of lexicon design, i.e. formal specification and XML model for a flexible lexicon format which will permit extension in the following areas:
  a) Multilingual lexicon for speech synthesis
  b) Integrated lexicon for multimodal speech synthesis (e.g. gesture sublexicons)
  c) Integrated lexicon for NLP and synthesis components.
A demonstration core lexicon for German is being prepared.

14 9. Discussion. General issues
ECESS LR specification documents (public page): The language-independent specification is public and should be accessible from the public web page.
LR distribution (internal webpage): contact information
LSPs specifications (internal page): The language-specific data (LSP, language-specific peculiarities) is part of the LR dedicated to a pool. The LSPs have to be approved by the LR College and be located on the internal webpage of ECESS (College LR).
Splitting LR: The data in the lexicon pool could be divided into a lexicon of common words and a lexicon of proper names; partners interested only in parts of the lexica could then choose what they want to deliver and exchange. Advantage: some partners may only want to deliver/get certain parts of a particular language; production costs for the different parts are more comparable.
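The proposed split of a lexicon pool into common words and proper names can be illustrated with a minimal Python sketch. It is not the ECESS lexicon format: the `LexiconEntry` fields, the `is_proper_name` flag, and the sample transcriptions are assumptions made up for illustration, and a real pool would follow the agreed specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LexiconEntry:
    """Hypothetical entry layout; the real pool format is defined by the ECESS specification."""
    orthography: str
    transcription: str      # illustrative placeholder, not validated data
    is_proper_name: bool    # assumed flag marking proper names


def split_lexicon(entries: Iterable[LexiconEntry]) -> Tuple[List[LexiconEntry], List[LexiconEntry]]:
    """Return (common_words, proper_names), preserving the input order in each part."""
    common: List[LexiconEntry] = []
    proper: List[LexiconEntry] = []
    for entry in entries:
        # Route each entry to exactly one of the two sub-lexica.
        (proper if entry.is_proper_name else common).append(entry)
    return common, proper


# Made-up sample entries, for illustration only.
sample = [
    LexiconEntry("house", "h aU s", False),
    LexiconEntry("London", "l V n d @ n", True),
    LexiconEntry("water", "w O: t @", False),
]

common_words, proper_names = split_lexicon(sample)
```

Keeping the two parts as separate lists mirrors the idea that a partner could deliver or request only one of them.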

15 10. Discussion. Further directions of LR College
Extension of LR collection
  New types of Pools (e.g. acoustic databases for speaker characterization, emotional databases, special databases with pathological voices/speech) depending on interests and needs of ECESS.
  Inclusion of new languages.
Specification for new types of Pools: preliminary remarks
Promotion of ECESS LR, publications: SASR, Poland 2008, update the publication list

16 11. New Action List of College
Make available to partners, end of Sept. decide on Ute specifications
promotion of ECESS activities SASR Workshop, Poland 2008 (flyers, presentation) (AW)
LR – publications/SASR/Poland’2008 (AW)
emotional databases (exchange the information) (IH)
Specifications for the acoustic data, make the info available (Hartmut), (AW)
lexicon (PL) evaluation (AW)
Availability of lexica (splitting) (AW)
Collect info about lexica for inflected languages (adding new specification) (ZK)

