SPEEch on the griD (SPEED)


Virtual team project SPEEch on the griD (SPEED)
Ladislav Hluchý, Milan Rusko
Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia
http://www.ui.sav.sk https://wiki.egi.eu/wiki/VT_SPEED http://www.slovakgrid.sk
06/08/2018 EGI Community Forum 27 March 2012 Leibniz

Almost every step in automatic speech recognition (ASR) and text-to-speech (TTS) training and testing could benefit from data parallelism, that is, from splitting a computing task among many execution nodes of a computer cluster. Feature extraction, initialization and forward-backward training of the acoustic models, re-alignment of the speech data and alignment for clustering, tree-based clustering, and estimation of linear transforms for acoustic model adaptation are all tasks in which large amounts of data quickly make ASR development extremely time-consuming. Data parallelism can shorten this development and applies at several levels: per-subset training, and the Hidden Markov Model state, stream, speaker, and even sentence levels.

The training process has many input parameters, e.g. the type of features, the number of forward-backward iterations, where to apply re-alignment of the speech data, when to apply tree clustering, what type of linear transform to choose, and so on. The set of input parameters depends on the available speech data and on the domain (the type of data, such as broadband or telephone speech). Some steps can be language-dependent as well. Finally, validation and evaluation criteria increase the number of variable input parameters, which increases the computing power needed for optimization.
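The per-subset training mentioned above can be sketched as a map/reduce over data shards. The example below is a minimal illustration, not the project's implementation: it estimates the mean and variance of a single Gaussian from one-dimensional features by letting each worker accumulate sufficient statistics for its shard and then summing them. The function names are hypothetical, and a local thread pool stands in for the execution nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def accumulate(shard):
    """Map step: sufficient statistics (count, sum, sum of squares) of one shard."""
    return (len(shard), sum(shard), sum(x * x for x in shard))


def merge(a, b):
    """Reduce step: statistics from two shards add component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def estimate(stats):
    """Turn the merged statistics into a maximum-likelihood mean and variance."""
    n, s, ss = stats
    mean = s / n
    return mean, ss / n - mean * mean


def split(data, n_shards):
    """Deal the data out round-robin into n_shards lists."""
    return [data[i::n_shards] for i in range(n_shards)]


def parallel_estimate(data, n_shards=4):
    """Estimate mean and variance with one worker per non-empty shard."""
    if not data:
        raise ValueError("data must not be empty")
    shards = [s for s in split(data, n_shards) if s]
    # Assumption: threads stand in for cluster nodes; on a grid each shard
    # would be a separate job and only the small statistics tuples travel back.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(accumulate, shards))
    return estimate(reduce(merge, partial))
```

Because only the small statistics tuples are merged, the result matches a single-node estimate up to floating-point rounding. The same accumulate-then-merge pattern is what makes forward-backward training amenable to splitting by speaker or by sentence.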

Motivation
Automatic speech processing is computationally demanding in the training, optimization and testing phases, so optimization is done "part by part". However, the optimization of one part of the recognizer is not independent of the settings of the other parts. The optimization process should therefore be holistic, taking into account the influence of as many parameters as possible.

Output
- Making the power of GRID computing available to a wider community of researchers who deal with speech processing in their everyday work
- Developing methods for holistic optimization and diagnostics in speech processing, and tools implementing these methods on the grid platform
- Other outputs to be added by the consortium

The holistic optimization
An ASR system consists of many modules. The architectural features of such a module, together with the values/settings of its control characteristics, can be taken as components of a “settings-vector”, i.e. the input vector of that module. Ideally, the GRID should enable optimization of all input vector values of the system, yielding a globally optimized setting of an ASR system for a given purpose.
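To make the settings-vector idea concrete, the sketch below enumerates every combination of a few training parameters and contrasts an exhaustive (holistic) search with a part-by-part search that tunes one parameter at a time. Everything here is illustrative: the parameter names and values are assumptions loosely based on the parameters listed earlier, and `toy_score` is a made-up function with a deliberate interaction between two parameters, not a measured result.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative assumptions.
SEARCH_SPACE = {
    "feature_type": ["mfcc", "plp"],
    "fb_iterations": [2, 4, 8],
    "linear_transform": ["none", "mllr"],
}


def settings_vectors(space):
    """Yield every settings-vector (one value per parameter) as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def holistic_search(space, score):
    """Score every settings-vector and return the best one."""
    return max(settings_vectors(space), key=score)


def part_by_part_search(space, score, start):
    """Tune one parameter at a time, keeping the others at their current values."""
    current = dict(start)
    for name, values in space.items():
        current[name] = max(values, key=lambda v: score({**current, name: v}))
    return current


def toy_score(s):
    """Made-up score: 'plp' only pays off when combined with 'mllr'."""
    total = {2: 0.0, 4: 0.5, 8: 0.4}[s["fb_iterations"]]
    if s["feature_type"] == "mfcc":
        total += 1.0
    if s["feature_type"] == "plp" and s["linear_transform"] == "mllr":
        total += 3.0
    return total
```

With this toy score, the part-by-part search started from `{"feature_type": "mfcc", "fb_iterations": 2, "linear_transform": "none"}` settles on `mfcc` first and never finds the `plp` plus `mllr` combination, while the holistic search does. The price is that the number of vectors to evaluate is the product of the value counts (12 here). This growth is what motivates running the evaluations on a grid.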

Holistic optimization
The aim is to find clusters of the input parameters vector space that increase the system performance for the different training/testing tasks.

The aim of the Virtual Team Project is to optimize the training and testing processes of speech technologies by finding clusters in the input parameter vector space that increase system performance for the different training/testing tasks. The multi-language virtual team could test the impact of language on the optimization processes. A computer cluster can help identify the impact of run-time diagnosis on the optimization process, and facilitate the search for input parameter subsets that increase diagnosis performance.
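The slides do not say how such clusters would be found. One simple reading, sketched below purely as an assumption, is to keep the settings-vectors whose score reaches a threshold and group those that differ in at most one parameter (single-linkage grouping by Hamming distance). The function names are hypothetical, and the scores in the usage note are made-up values, not results.

```python
def hamming(a, b):
    """Number of positions in which two settings-vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_good_settings(scored, threshold, max_dist=1):
    """Group high-scoring settings-vectors that are close to each other.

    scored: list of (settings_tuple, score) pairs.
    Vectors scoring below threshold are dropped; the rest are merged into
    clusters whenever they lie within max_dist of any member of a cluster.
    """
    good = [vector for vector, score in scored if score >= threshold]
    clusters = []
    for vector in good:
        # Existing clusters that this vector is linked to.
        linked = [c for c in clusters
                  if any(hamming(vector, member) <= max_dist for member in c)]
        merged = [vector]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters
```

For example, given the made-up scores `[(("mfcc", 4, "none"), 0.80), (("mfcc", 8, "none"), 0.82), (("plp", 4, "mllr"), 0.90), (("plp", 2, "none"), 0.30)]` and a threshold of 0.75, the two `mfcc` vectors form one cluster and the `plp`/`mllr` vector forms another, suggesting two distinct regions of the parameter space worth exploring further.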

Tasks:
1. Establishment of contacts, investigation of the state of the art, formation of a consortium
2. Methodology development for
   a) holistic optimization
      i. ASR (may include speaker identification, speaker recognition and language recognition)
      ii. Text to Speech (TTS) systems
   b) holistic diagnostics
      i. ASR
      ii. TTS
3. Implementation aspects
   a) porting the computations in the Automatic Speech Processing domain to the Grid platform
   b) solving particular domain-dependent problems of using GRID computing in automatic speech processing
      i. The need for large data transfers and its influence on GRID computing speed
      ii. Data security and program security
4. Storage possibilities for large databases in GRID
5. Porting commercial applications to GRID

The consortium is looking for partners with expertise in automatic speech processing, natural language processing, speech and language resources, speech and language modelling, optimization, and high-performance computing, preferably with their own available computational capacity.

Members:
NGIs:
- Slovakia:
  Milan Rusko (IISAS - Institute of Informatics of the Slovak Academy of Sciences (Leader)), Speech processing group
  Ladislav Hluchy (IISAS - Institute of Informatics of the Slovak Academy of Sciences (NIL)), Grid computing group
  Jozef Juhar (Tech. Univ. Košice), Speech processing
- Ireland:
  David O'Callaghan (Trinity College Dublin)
- Switzerland:
  Milos Cernak (Idiap Research Institute)
- UK:
  Martin Wynne (University of Oxford)
  John Coleman (Phonetics Laboratory at Oxford University)
  Claire Devereux (STFC)
EGI.eu:
  Nuno Ferreira
  Gergely Sipos
  Karolis Eigelis
We are looking for other partners.
Mailing List: vt-speech-processing@mailman.egi.eu

Virtual Team Project SPEEch on the griD (SPEED)
Thank you for your attention. Any questions?