Towards Separating Trigram-generated and Real Sentences with SVM
Jerry Zhu, CALD KDD Lab, 2001/4/20

Domain: Speech Recognition
- A large portion of recognition errors is due to over-generation by trigram language models.
- If we can detect trigram-generated sentences, we can improve accuracy.
- Examples of trigram-generated output:
  - when the director of the many of your father and so the
  - the monster and here is obviously a very very profitable business
  - his views on today's seek level
  - thanks very much for being with us
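For concreteness, fake sentences like those above can be produced by sampling from a trigram model trained on real text. The slides do not spell out the generation procedure, so this is only a minimal illustrative sketch in Python:

    import random
    from collections import defaultdict

    def train_trigram(corpus):
        """Collect trigram continuations: (w1, w2) -> list of observed w3."""
        cont = defaultdict(list)
        for sent in corpus:
            toks = ["<s>", "<s>"] + sent + ["</s>"]
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                cont[(a, b)].append(c)
        return cont

    def generate(cont, max_len=30):
        """Sample one sentence left to right; duplicates in the continuation
        lists make random.choice proportional to trigram counts."""
        w1, w2, out = "<s>", "<s>", []
        while len(out) < max_len:
            w3 = random.choice(cont[(w1, w2)])
            if w3 == "</s>":
                break
            out.append(w3)
            w1, w2 = w2, w3
        return " ".join(out)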

A two-class classification problem
- 'Fake' (trigram-generated) or real sentence?
- Data: 100k fake and 100k real long (> 7 words) sentences.
- 'Fake' sentences don't look right (bad syntax) and don't make sense (bad semantics).
- The task boils down to finding good features. Semantic coherence has been explored [Eneva et al.], but syntactic features, and the combination of the two, have not.
- Use the SVM margin as a stand-in for class probabilities.
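A minimal sketch of this classification setup, using modern scikit-learn as a stand-in for whatever SVM implementation was actually used; the feature vectors here are random placeholders for the sentence features explored below:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))       # placeholder sentence feature vectors
    y = rng.integers(0, 2, size=200)     # 1 = real, 0 = trigram-generated

    clf = SVC(kernel="linear")           # linear kernel, one of the kernels tried
    clf.fit(X, y)
    margins = clf.decision_function(X)   # signed distance to the separating hyperplane
    print("training accuracy:", clf.score(X, y))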

Previous work: semantic features
- Around 70 semantic features; most interestingly:
  - content word co-occurrence statistics
  - content word repetition
- Decision tree + boosting: around 80% accuracy.
- We hope the combination with syntactic features will significantly improve accuracy.
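As an illustration, content word repetition can be reduced to a single rate feature like the one below; the exact feature definitions of Eneva et al. are not given in the slides, so this rendering (including the stopword list) is hypothetical:

    from collections import Counter

    STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "with", "on"}  # tiny illustrative list

    def repetition_rate(tokens):
        """Fraction of content-word tokens that repeat an earlier content word."""
        content = [t.lower() for t in tokens if t.lower() not in STOPWORDS]
        if not content:
            return 0.0
        counts = Counter(content)
        return sum(c - 1 for c in counts.values()) / len(content)

    print(repetition_rate("the monster and here is obviously a very very profitable business".split()))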

Exploring syntactic features
- Bag-of-words features (raw counts, frequency, or binary; linear or polynomial kernel): 57%
- Tag with part-of-speech (39 POS tags): when/WRB the/DT director/NN of/IN the/DT many/NN of/IN your/PRP$ father/NN
- Bag-of-POS: 56%
- Sparse sequences of POS: any k tags in that order, weighted by the span they cover; 39^k possible features, e.g. a coordinate for WRB-IN-DT with value 5.
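A sketch of the sparse-sequence features over POS tags, in the spirit of string subsequence kernels. The slides say the subsequences are "weighted by the span"; a per-span decay factor is one common choice for that weighting, so the exact scheme below is an assumption:

    from itertools import combinations
    from collections import defaultdict

    def sparse_pos_features(tags, k=3, max_span=8, decay=0.5):
        """Span-weighted counts of all ordered k-subsequences of POS tags
        whose span (last index - first index + 1) is at most max_span."""
        feats = defaultdict(float)
        for idx in combinations(range(len(tags)), k):
            span = idx[-1] - idx[0] + 1
            if span > max_span:
                continue
            feats["-".join(tags[i] for i in idx)] += decay ** span
        return feats

    tags = "WRB DT NN IN DT NN IN PRP$ NN".split()
    print(sparse_pos_features(tags)["WRB-IN-DT"])   # one WRB-IN-DT subsequence, span 5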

Exploring syntactic features (cont.)
- Sparse sequences work well on letters for text categorization; on POS tags: 58% (k=3, max span=8)
- Keep stopwords verbatim and replace content words with their POS: WRB the NN of the many of your NN
- Sparse sequences on stopwords & POS: 57%
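A sketch of the stopwords&POS mapping, using NLTK's tagger as a modern stand-in; the stopword list is a tiny illustrative one chosen to reproduce the slide's example (note the slides treat 'when' as a content word):

    import nltk  # requires the 'averaged_perceptron_tagger' model to be downloaded

    STOP = {"the", "of", "many", "your", "and", "so", "a", "is", "on"}  # illustrative only

    def stopwords_and_pos(tokens):
        """Keep stopwords verbatim; replace every other word by its POS tag."""
        return [w if w.lower() in STOP else tag for w, tag in nltk.pos_tag(tokens)]

    print(" ".join(stopwords_and_pos("when the director of the many of your father".split())))
    # expected: "WRB the NN of the many of your NN" (the slide's example)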

Exploring syntactic features (cont.)
- Stopwords & POS 4-grams: novelty rate; count-distribution likelihood ratio; min, max, median, and mean counts
- These combined with semantic features: 75%
- Semantic features alone: 77%
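A sketch of the 4-gram count features, measured against a table of training-corpus counts; the likelihood-ratio feature is omitted because its exact form is not given in the slides:

    from collections import Counter
    from statistics import mean, median

    def ngrams(tokens, n=4):
        return list(zip(*(tokens[i:] for i in range(n))))

    def fourgram_features(tokens, train_counts):
        """Novelty rate and count-distribution statistics of a sentence's
        stopwords&POS 4-grams, measured against training-corpus counts."""
        counts = [train_counts.get(g, 0) for g in ngrams(tokens, 4)]
        return {
            "novelty_rate": sum(c == 0 for c in counts) / len(counts),  # unseen fraction
            "min": min(counts), "max": max(counts),
            "median": median(counts), "mean": mean(counts),
        }

    train_counts = Counter(ngrams("WRB the NN of the NN of your NN".split()))
    print(fourgram_features("WRB the NN of the many of your NN".split(), train_counts))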

SVM margin
- Empirically, the margin distribution has a 'good shape'.
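The slides do not say how the margin would be turned into probabilities; Platt scaling (fitting a sigmoid to held-out margins) is the standard recipe, sketched here with scikit-learn:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def platt_scale(margins, labels):
        """Fit P(real | margin) = sigmoid(A * margin + B) on held-out data."""
        lr = LogisticRegression()
        lr.fit(np.asarray(margins).reshape(-1, 1), labels)
        return lr

    margins = np.array([-2.0, -1.0, -0.5, 0.3, 1.2, 2.5])   # toy held-out margins
    labels = np.array([0, 0, 0, 1, 1, 1])                   # 1 = real
    lr = platt_scale(margins, labels)
    print(lr.predict_proba(np.array([[0.0]]))[:, 1])        # P(real) at the decision boundary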

Summary
- Now we know these features don't work…
- SVM wasn't a wise choice for a large amount of data with a lot of noise…

Future?
- Parsing
- Logistic regression instead of SVM