1
Classifying Movie Scripts by Genre
Alex Blackstock, Matt Spitz
6/9/08
2
Overview
- Motivation: classifying movie scripts by genre may help identify box-office flops and successes before they're even produced!
- Data:
  - freely available movie scripts (DailyScripts.com, etc.)
  - IMDB genres (several labels per movie)
- Tools:
  - Lucene
  - MEMM from PA3
  - jBNC (naïve Bayes classifier)
  - Stanford Named Entity Recognizer
  - Stanford Part-Of-Speech Tagger
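The slides list jBNC's naïve Bayes classifier but do not say how several genre labels per movie were handled. One common setup, assumed here, is one binary naïve Bayes model per genre whose log-odds scores are used to rank all genres for a script. The sketch below is pure Python, is not jBNC's API, and the class name `OneVsRestNaiveBayes` and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter


class OneVsRestNaiveBayes:
    """Hypothetical sketch: one binary multinomial naive Bayes model per genre.

    Not jBNC's API; a stand-in to illustrate ranking genres for a script
    represented as a list of tokens.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, label_sets):
        self.vocab = {w for doc in docs for w in doc}
        self.genres = sorted({g for labels in label_sets for g in labels})
        n = len(docs)
        self.models = {}
        for genre in self.genres:
            pos, neg = Counter(), Counter()
            n_pos = 0
            for doc, labels in zip(docs, label_sets):
                if genre in labels:
                    pos.update(doc)
                    n_pos += 1
                else:
                    neg.update(doc)
            # Smoothed log prior odds of "has this genre" vs. "does not".
            prior = math.log((n_pos + self.alpha) / (n - n_pos + self.alpha))
            self.models[genre] = (prior, self._log_probs(pos), self._log_probs(neg))
        return self

    def _log_probs(self, counts):
        # Smoothed log P(word | class) over the shared vocabulary.
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        return {w: math.log((counts[w] + self.alpha) / total) for w in self.vocab}

    def score(self, doc, genre):
        # Log-odds that the script belongs to `genre`; unseen words are skipped.
        prior, lp_pos, lp_neg = self.models[genre]
        return prior + sum(lp_pos[w] - lp_neg[w] for w in doc if w in self.vocab)

    def rank(self, doc):
        # Genres ordered from most to least likely.
        return sorted(self.genres, key=lambda g: self.score(doc, g), reverse=True)
```

To emit as many guesses as a movie has gold labels, take `model.rank(tokens)[:k]` with `k = len(gold)`.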
3
Processing Scripts
4
Features
- Non-NLP:
  - dialogue shape
  - character information
- NLP:
  - POS ratios
  - named entity appearances
- Character-based NLP:
  - analyze individual characters
  - exclamations
  - main vs. secondary characters
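As an illustration of the non-NLP and character-based features, here is a minimal sketch that assumes each script has already been parsed into (character, utterance) pairs. The function name `character_features`, the feature names, and the rule that the `n_main` most talkative characters count as "main" are hypothetical choices, not the authors' definitions. POS ratios and named-entity counts are omitted because they require the Stanford taggers.

```python
from collections import Counter


def character_features(dialogue, n_main=2):
    """Hypothetical character-based features from (character, utterance) pairs.

    Assumes the script has already been split into speaker-tagged dialogue;
    `n_main` (how many top speakers count as "main") is an assumed parameter.
    """
    if not dialogue:
        raise ValueError("dialogue must be non-empty")
    lines_per_char = Counter(name for name, _ in dialogue)
    main = {name for name, _ in lines_per_char.most_common(n_main)}
    feats = {
        "num_characters": len(lines_per_char),
        # A crude "dialogue shape" feature: mean words per line of dialogue.
        "avg_words_per_line": sum(len(u.split()) for _, u in dialogue) / len(dialogue),
        # Fraction of all dialogue lines spoken by the main characters.
        "main_share_of_lines": sum(lines_per_char[c] for c in main) / len(dialogue),
    }
    for group, members in (("main", main), ("secondary", set(lines_per_char) - main)):
        utterances = [u for name, u in dialogue if name in members]
        exclamations = sum(u.count("!") for u in utterances)
        feats[f"{group}_exclamations_per_line"] = (
            exclamations / len(utterances) if utterances else 0.0
        )
    return feats
```

Splitting exclamation counts by main vs. secondary characters is one way to let a classifier tell, say, an action lead's terse shouting apart from background chatter.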
5
Evaluation Metrics
- Example output: Blade II
  - gold labels: Action, Thriller, Horror
  - guessed labels: Action, Adventure, Horror, Thriller, ...
- F1 score:
  - computed per genre, then weighted-averaged over all genres
  - number of guesses allowed = number of gold labels
- Partial Credit (PC) score:
  - allows for some error
  - number of guesses allowed = number of gold labels × 1.5
  - guesses beyond the number of gold labels are penalized but still earn points
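The F1 metric can be sketched as follows: each movie keeps only its top `len(gold)` ranked guesses, true/false positives and false negatives are tallied per genre, and per-genre F1 scores are averaged. The slides do not say how the average is weighted; this sketch assumes weights proportional to how often each genre appears in the gold labels (the same convention as scikit-learn's `average="weighted"`). The function name `weighted_f1` is hypothetical, and the Partial Credit score is not sketched because its penalty is not specified.

```python
from collections import Counter


def weighted_f1(ranked_guesses, gold_sets):
    """Per-genre F1, averaged with weights = gold-label frequency (assumed)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for ranking, gold in zip(ranked_guesses, gold_sets):
        # Number of guesses allowed = number of gold labels for this movie.
        guessed = set(ranking[:len(gold)])
        for genre in guessed:
            if genre in gold:
                tp[genre] += 1
            else:
                fp[genre] += 1
        for genre in gold - guessed:
            fn[genre] += 1

    support = Counter(g for gold in gold_sets for g in gold)
    total = sum(support.values())
    score = 0.0
    for genre, n in support.items():
        guessed_count = tp[genre] + fp[genre]
        precision = tp[genre] / guessed_count if guessed_count else 0.0
        recall = tp[genre] / n  # n == tp + fn for this genre
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1
    return score
```

For the Blade II example alone, the three allowed guesses are Action, Adventure, and Horror: Action and Horror score F1 = 1, the missed Thriller scores 0, so the weighted average is 2/3.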
6
Conclusions
- Success!
- Best feature set: basic NLP & POS tagging
  - PC score: 0.601
  - F1 score: 0.551
- Classifier comparison (jBNC)
- N-way classification problem
  - 22 genres
  - average of 3.02 genres per datum
- Dataset issues:
  - consistency
  - diversity
  - size