BUT Speech@FIT: 18 years of research in speech data mining
Pavel Matejka / Honza Černocký
Brno University of Technology, BUT Speech@FIT, Czech Republic
11/2016
Speech@FIT – people
  Founded in 1997
  Grew from 1 person to >20: faculty, researchers, grad and pre-grad students, support staff
BUT Speech@FIT Honza Cernocky 05/2016
Our work
  Data mining from spontaneous conversational speech
    Transcription
    Speaker identification
    Language identification
    Spoken term detection and keyword spotting
Achievements
  Performance proven in international evaluations
    NIST Language recognition since 2003
    NIST Speaker recognition since 2006
    + projects, SITW, MediaEval, …
  EU (FP*, Horizon 2020) and U.S. Government-funded projects (DARPA, IARPA)
  Strong presence in prestigious events (Johns Hopkins University workshop)
  Open source software used worldwide: phnrec, STK, TNet, KALDI.
Security/Defense sector
  Narrow the search space
  Gender ID
  Language ID
  Speaker ID
  Age estimation
  Speech quality measures
What for II: Call Centers, Lectures
Current projects
  DARPA RATS
    September 2010 – June 2014, extended for 2015-16
    BBN, BUT, JHU, others
    VAD, SID, LID, KWS in very bad channels
  IARPA BABEL
    March 2012 – July 2016
    BBN, BUT, JHU, MIT, NWU
    Speech recognition in many languages with limited resources and a short, and shrinking, time for system development
  Czech MoI "DRAPAK"
    Information mining in speech acquired by distant microphones
    Oct 2015 – Sep 2020, BUT and Phonexia
    From telephone close-talk speech to far-field microphones, mainly for security and defense applications.
  FP7 A-PiMod "Applying Pilot Models for Safer Aircraft"
    September 2013 – August 2016
    DLR (the German NASA), BUT, Honeywell Brno, others
    Multimodal cockpit.
Current projects II
  TAČR Meeting assistant "MINT"
    BUT, Phonexia, Lingea, Tovek: 10/2014 - 12/2017
    Recognition and analysis of meetings
    Follow-up to M4, AMI, AMIDA, this time with a real commercial output.
  EU Horizon 2020 BISON
    Phonexia, BUT, Telefonica I&D, MyForce, U. Bologna, contact centers
    Big speech data mining for contact centers, incl. business output and (lots of!) legal and ethical issues.
  DARPA Lorelei
    University of Southern California in L.A., BUT and others, 2015-2019
    Situational awareness by identifying elements of information in foreign language and English sources, such as topics, names, events, sentiment and relationships
Thank you for your attention!
http://speech.fit.vutbr.cz/