Deep Exploration and Filtering of Text (DEFT)

Deep Exploration and Filtering of Text (DEFT) Denys Katerenchuk Speech Lab, Queens College, CUNY

DEFT: DARPA, Department of Defense. Excess of data. Important data may be implicit. Efficiency. Goal: develop automated deep NLP technology for efficient information processing and understanding.

DEFT: “Overwhelmed by deadlines and the sheer volume of available foreign intelligence, analysts may miss crucial links, especially when meaning is deliberately concealed or otherwise obfuscated.” (Bonnie Dorr, DARPA program manager)

Overview: Introduction; Innovation; Algorithms and Tools; Results; Future work

Introduction: NLP has a range of tools to discover target data: Information Retrieval, Automated Speech Recognition, Machine Translation, etc. A robust combination is needed!

Innovation: Information-Rich Relational Analysis for Spoken Data

Hypothesis: Combining a rich speech representation and annotations with prosodic features improves the F-measure.
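For readers unfamiliar with the metric named in the hypothesis, here is a minimal sketch of how an F-measure is computed from counts of correct and incorrect predictions. The function name, its parameters, and the example counts are illustrative assumptions, not values or code from the presentation.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score computed from raw counts (beta=1 gives F1)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against empty prediction or reference sets.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 8 entities found correctly, 2 spurious, 2 missed.
    print(f_measure(8, 2, 2))
```

With beta equal to 1, precision and recall are weighted equally, which is the usual choice when scoring named entity recognition.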

Motivation: Speech is different from text. Speech may contain disfluencies, and results depend on ASR performance. Named entities are often out-of-vocabulary (OOV). Prosodic information is ignored.

ASR Hypothesis Representation: Lattice; Confusion Network (CN)

Lattice vs CN. CN: much smaller in size and less processing time; better alignment (improves WER); contains posterior probabilities. Lattice: contains the complete hypotheses.
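To make the confusion network idea concrete, the sketch below models a CN as a list of slots, each mapping competing words to posterior probabilities, and reads off the highest-posterior path. This is a toy representation chosen for illustration: the `EPSILON` marker, the `best_path` function, and the example posteriors are assumptions, and the structure is not KALDI's actual file format or API.

```python
EPSILON = "<eps>"  # hypothetical marker for "no word in this slot"


def best_path(confusion_network):
    """Pick the highest-posterior word in each slot of a confusion network.

    Returns the resulting word sequence (epsilon entries dropped) and the
    product of the chosen posteriors as a simple path confidence.
    """
    words = []
    confidence = 1.0
    for slot in confusion_network:
        word, posterior = max(slot.items(), key=lambda item: item[1])
        confidence *= posterior
        if word != EPSILON:
            words.append(word)
    return words, confidence


if __name__ == "__main__":
    # Made-up posteriors; each slot's alternatives sum to 1.
    cn = [
        {"the": 0.9, "a": 0.1},
        {EPSILON: 0.6, "new": 0.4},
        {"york": 0.7, "work": 0.3},
    ]
    print(best_path(cn))
```

Because every slot keeps its alternatives and their posteriors, a downstream tagger can still consider lower-ranked words such as "new" in the example, which the 1-best output alone would discard.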

Automatic Speech Recognition: KALDI (http://kaldi.sourceforge.net/). Powerful open-source tool for ASR. Written in C++ and easy to modify. CN and lattice support. Complete recipes for some common corpora. Supports Grid Engine.

Named Entity Recognition: Blender Name Tagger. Simple to use. Works with ACE data (corpus). Uses the MALLET machine learning toolkit.

Prosody Analysis: AuToBI. The must-have tool for extracting prosodic features. No need for introduction.

Algorithm [pipeline diagram; components: Speech, AuToBI, KALDI (ASR), Prosodic features, Text, New Representation, Clusters, Blender Name Tagger (NER), Model]

ASR Results: KALDI. Training time: ~78 hours. WSJ models: triphone; SGMM (better, but not ready yet). 1-best triphone recognizer performance: 67.37% (WER).
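Since the result above is reported as a word error rate, a short sketch of the standard WER calculation may help: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions; this is not the scoring script used in the presentation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, because the denominator counts only reference words.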

Results

Future work: Improve current ASR model. Improve current CN approach. Add n-best and oracle models. Extend feature set. Event recognition.

Thank you!