This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement n° Reproducible.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement n°
Reproducible Automatic Speech Recognition workflows
David Risinamhodzi – North-West University – South Africa
e-Research Summer Hackfest – Catania (Italy)

Slide 2: Introduction & Overview
Automatic Speech Recognition (ASR) training
Definition: Automatic speech recognition (ASR) can be defined as the independent, computer-driven transcription of spoken language into readable text in real time (Stuckless, 1994).
Popular in the human language technology (HLT) community in South Africa (SA)
Done for the 11 official languages of SA
Available speech corpora:
  50 GB of high-quality NCHLT speech data (16 kHz)
  4.3 GB of Lwazi telephony data (8 kHz)

Slide 3: Introduction & Overview (ASR)
Speech recognition requirements:
  Sufficient storage space for large audio and text datasets
  High Performance Computing (HPC):
    Many CPUs, as most training and recognition is performed in parallel
    GPUs to train deep neural networks (DNNs)
  High-speed internet to move data between compute nodes
  A mechanism to manage datasets

Slide 4: Speech recognition tools
HTK – Hidden Markov Model Toolkit
  Bundled into “ASR_template”
LIBSVM – Support Vector Machine library
  Delivered to endpoints with CODE-RADE

Slide 5: Scientific problem
Lack of collaboration
Lack of exploitation of the available distributed computing
Long hours of training systems on personal computers
Research questions to be answered:
  Are speech recognition models reproducible?
  How do speech recognition models vary according to different dictionaries and training?
Corollary issues:
  Provenance and publication of models
  Ease of exploring ASR models: access for researchers
  Make access to national language resources easier
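
One way to approach the reproducibility question above is to give every training run a deterministic fingerprint derived from its inputs, so that two runs claiming the same dictionary, corpus, and parameters can be matched and their resulting models compared. The sketch below is illustrative only: the function names (`file_digest`, `run_fingerprint`), the parameter names, the sample inputs, and the choice of SHA-256 are assumptions, not part of the project described in these slides.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def run_fingerprint(dictionary: bytes, corpus_manifest: bytes, params: dict) -> str:
    """Combine input digests and training parameters into one run identifier.

    Hypothetical helper: identical inputs always give the same fingerprint,
    so a repeated run can be matched against an earlier one.
    """
    record = {
        "dictionary": file_digest(dictionary),
        "corpus": file_digest(corpus_manifest),
        "params": params,
    }
    # sort_keys makes the serialisation independent of dict insertion order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example with made-up inputs (not real NCHLT or Lwazi data)
lexicon = b"hello h eh l ow\nworld w er l d\n"
manifest = b"utt001.wav\nutt002.wav\n"
first = run_fingerprint(lexicon, manifest, {"mixtures": 8, "states": 3})
second = run_fingerprint(lexicon, manifest, {"states": 3, "mixtures": 8})
changed = run_fingerprint(lexicon, manifest, {"mixtures": 16, "states": 3})
print(first == second)   # same inputs, given in a different key order
print(first == changed)  # one parameter was changed
```

Storing such a fingerprint next to each published model would also support the provenance point above, since the identifier records exactly which inputs produced the model.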

Slide 6: Computing and data model

Slide 7: TODO
ASR tools are working, but a lot of integration is still necessary to build a researcher environment:
1) Data ingestion into an open-access repository
   1) Dictionaries
   2) Corpora
2) Web interface:
   1) Select dictionary, corpus
   2) Select task
   3) Select specific parameters, or a range
   4) Task submission
3) Validation and comparison
4) Publication of the new analysis (model, accuracy, etc.)
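
The "select specific parameters, or a range" step above amounts to expanding each range into the full set of combinations and submitting one training task per combination. A minimal sketch of that expansion follows; the dictionary and corpus labels, the task name, the parameter names, and the `expand_jobs` helper are all hypothetical, and the actual submission to a gateway or grid is left out.

```python
from itertools import product


def expand_jobs(dictionary: str, corpus: str, task: str, param_ranges: dict) -> list:
    """Build one job description per combination of parameter values.

    `param_ranges` maps a parameter name to the list of values to try;
    a single fixed value is given as a one-element list.
    """
    names = sorted(param_ranges)  # fixed order so the job list is repeatable
    jobs = []
    for values in product(*(param_ranges[name] for name in names)):
        jobs.append({
            "dictionary": dictionary,
            "corpus": corpus,
            "task": task,
            "params": dict(zip(names, values)),
        })
    return jobs


# Hypothetical selection: three values for one parameter, two for another
jobs = expand_jobs(
    dictionary="example-dictionary",
    corpus="example-corpus",
    task="train",
    param_ranges={"mixtures": [4, 8, 16], "states": [3, 5]},
)
for job in jobs:
    print(job["params"])
```

Each resulting dictionary is a self-contained task description, which keeps the later validation and comparison step simple: results can be grouped by the parameters that produced them.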

Slide 8: Implementation strategy
Web interface to schedule training jobs using specific datasets and recipes, encompassing:
  Application ported to the South African National Grid
    It boasts Lengau, the fastest computer in Africa
  Open-access repository used to share data and results
    The OneData platform is a good option to manage this
  Science Gateway used for job submission
    The Kepler workflow management system may be a good option

Slide 9: Summary and conclusions
The goal is to encourage collaboration
Fully exploit available resources and technologies
Automate the process as much as possible
Encourage use of Indigo cloud services in the near future

Thank you!
sci-gaia.eu