Deep Exploration and Filtering of Text (DEFT)

Denys Katerenchuk
Speech Lab, Queens College, CUNY

DEFT
DARPA, Department of Defense
Excess of data
Important data may be implicit
Efficiency
Goal: Develop automated deep NLP technology for efficient information processing and understanding

DEFT
“Overwhelmed by deadlines and the sheer volume of available foreign intelligence, analysts may miss crucial links, especially when meaning is deliberately concealed or otherwise obfuscated”
Bonnie Dorr, DARPA program manager

Overview
Introduction
Innovation
Algorithms and Tools
Results
Future work

Introduction
NLP has a range of tools to discover target data:
Information Retrieval
Automatic Speech Recognition
Machine Translation
Etc.
Robust combination is needed!

Innovation
Information-Rich Relational Analysis for Spoken Data

Hypothesis
Combine rich speech representations and annotations with prosodic features to improve the F-measure score
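The F-measure named in the hypothesis can be illustrated with a minimal sketch. It assumes entities are compared by exact match on hypothetical (start, end, type) tuples; the slides do not say which matching scheme or which value of beta the project uses, so both are assumptions here.

```python
def f_measure(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical data: entities as (start_token, end_token, type) tuples.
gold = {(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "GPE")}
predicted = {(0, 2, "PER"), (5, 6, "GPE")}

precision, recall, f1 = f_measure(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

With exact matching, the second predicted entity counts as an error because its type differs from the gold annotation, even though the span is the same.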

Motivation
Speech is different from text
Speech may contain disfluencies, and its transcript depends on ASR performance
Named entities are often out-of-vocabulary (OOV)
Prosodic information is ignored

ASR Hypothesis Representation
Lattice
Confusion network (CN)

Lattice vs CN
CN:
Much smaller in size, with less processing time
Better alignment (improves WER)
Contains posterior probabilities
Lattice:
Contains complete hypotheses
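To make the CN side of the comparison concrete, here is a minimal sketch of a confusion network as a sequence of slots, each mapping competing words to posterior probabilities. The words, the probabilities, and the `<eps>` skip symbol are all hypothetical; this is not Kaldi's own data format.

```python
EPS = "<eps>"  # assumed symbol for an empty (skip) arc

# Hypothetical confusion network: one dict of word -> posterior per slot.
confusion_network = [
    {"deaf": 0.3, "deft": 0.7},
    {EPS: 0.6, "a": 0.4},
    {"program": 0.8, "problem": 0.2},
]


def one_best(cn):
    """Pick the highest-posterior word in every slot, dropping skip arcs."""
    words = []
    confidence = 1.0
    for slot in cn:
        word, posterior = max(slot.items(), key=lambda item: item[1])
        confidence *= posterior
        if word != EPS:
            words.append(word)
    return words, confidence


best_words, best_confidence = one_best(confusion_network)
print(" ".join(best_words), round(best_confidence, 3))
```

Because every slot is independent, the best path is found slot by slot. That is one reason a CN is cheaper to process than a full lattice, at the cost of losing the lattice's complete hypotheses.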

Automatic Speech Recognition
KALDI
Powerful open-source tool for ASR
Written in C++ and can be easily modified
CN and lattice support
Complete recipes for some common corpora
Supports Grid Engine

Named Entity Recognition
Blender Name Tagger
Simple to use
Works with ACE data (corpus)
Uses the MALLET machine learning toolkit

Prosody Analysis
AuToBI
The must-have tool for extracting prosodic features
No need for introduction

Algorithm
[Pipeline diagram. Labels: Speech; AuToBI; KALDI (ASR); Prosodic features; Text; New Representation; Clusters; Blender Name Tagger (NER); Model]

ASR Results
KALDI
Training time: ~78 hours
WSJ
Models:
Triphone
SGMM (better, but not ready yet)
1-best triphone recognizer performance: % (WER)
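For reference, WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes whitespace tokenization and uses made-up sentences; it is not Kaldi's scoring script and does not reproduce any figure from this slide.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```

Note that WER can exceed 100% when the hypothesis contains many insertions, since the denominator is the reference length only.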

Results

Future work
Improve current ASR model
Improve current CN approach
Add n-best and oracle models
Extend feature set
Event Recognition

Thank you!
