Classifying Movie Scripts by Genre Alex Blackstock Matt Spitz 6/9/08.

Presentation transcript:

Classifying Movie Scripts by Genre Alex Blackstock Matt Spitz 6/9/08

Overview
  Motivation
    classifying movie scripts may identify box office flops and successes before they're even produced!
  Data
    freely-available movie scripts (DailyScripts.com, etc)
    IMDB genres (several labels/movie)
  Tools
    Lucene
    MEMM from PA3
    jBNC (naïve Bayes classifier)
    Stanford Named Entity Recognizer
    Stanford Part-Of-Speech Tagger

Processing Scripts

Features
  Non-NLP
    dialogue shape
    character information
  NLP
    POS ratios
    Named Entity appearances
  Character-Based NLP
    analyze individual characters
    exclamations
    main vs. secondary
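The slides do not say how the POS ratios were computed. As a minimal sketch, assuming a ratio means the fraction of a script's tokens that carry a given tag, the feature could look like the following. The function name `pos_ratios` and the hand-tagged example line are hypothetical; in the project the tags would come from the Stanford Part-Of-Speech Tagger.

```python
from collections import Counter


def pos_ratios(tagged_tokens):
    """Return the fraction of tokens carrying each POS tag.

    tagged_tokens is a list of (word, tag) pairs. Defining a "ratio" as
    tag count / total token count is an assumption, not the authors' stated method.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical, hand-tagged line of dialogue (Penn Treebank tags).
example = [
    ("Run", "VB"), ("to", "TO"), ("the", "DT"), ("old", "JJ"),
    ("bridge", "NN"), ("now", "RB"), ("!", "."),
]
ratios = pos_ratios(example)
```

Each ratio can then be used as one numeric feature per script, so scripts of different lengths stay comparable.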

Evaluation Metrics
  Example output:
    Blade II (gold labels: Action, Thriller, Horror)
    guessed labels: Action, Adventure, Horror, Thriller,...
  F1 Score
    per genre
    weighted-average over all genres
    # of guesses allowed = # of gold labels
  Partial Credit Score
    allows for some error
    # guesses allowed = # of gold labels * 1.5
    penalized for guesses that are beyond # gold labels, but still get points
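The two metrics above can be sketched roughly as follows. The slides give only the guess budgets, so several details here are assumptions rather than the authors' definitions: genres are weighted by how often they appear as gold labels, the 1.5x budget is rounded up, and a correct guess beyond the gold-label count earns half credit (`extra_weight=0.5`). All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def weighted_f1(examples):
    """Per-genre F1, averaged with weights proportional to gold-label frequency.

    examples is a list of (ranked_guesses, gold_labels) pairs. Each script
    is allowed as many guesses as it has gold labels.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    support = defaultdict(int)
    for ranked, gold in examples:
        gold = set(gold)
        guessed = set(ranked[:len(gold)])
        for genre in gold:
            support[genre] += 1
        for genre in guessed & gold:
            tp[genre] += 1
        for genre in guessed - gold:
            fp[genre] += 1
        for genre in gold - guessed:
            fn[genre] += 1
    total = sum(support.values())
    if total == 0:
        return 0.0
    score = 0.0
    for genre, n in support.items():
        denom = 2 * tp[genre] + fp[genre] + fn[genre]
        f1 = 2 * tp[genre] / denom if denom else 0.0
        score += f1 * n / total
    return score


def partial_credit(ranked, gold, extra_weight=0.5):
    """Score one script with a budget of 1.5x as many guesses as gold labels.

    Correct guesses within the first len(gold) ranks earn full credit; correct
    guesses in the extra slots earn extra_weight (an assumed penalty scheme).
    """
    gold = set(gold)
    n = len(gold)
    if n == 0:
        return 0.0
    budget = math.ceil(1.5 * n)  # rounding up is an assumption
    credit = 0.0
    for rank, genre in enumerate(ranked[:budget]):
        if genre in gold:
            credit += 1.0 if rank < n else extra_weight
    return credit / n


# Blade II example from the slide, using only the four guesses shown there.
gold = ["Action", "Thriller", "Horror"]
ranked = ["Action", "Adventure", "Horror", "Thriller"]
f1 = weighted_f1([(ranked, gold)])
pc = partial_credit(ranked, gold)
```

Under these assumptions, the strict F1 budget cuts off Thriller at rank four, while the partial-credit budget still rewards it at a reduced weight.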

Conclusions
  Success!
    best feature set: basic NLP & POS tagging
    PC Score:
    F1 Score:
  Classifier comparison (jBNC)
  N-way classification problem
    22 genres
    average of 3.02 genres/datum
  Dataset Issues
    consistency
    diversity
    size