Published by Rolf Hubbard. Modified over 9 years ago.
1
Moving Ahead: Creative Feature Extraction and Error Analysis Techniques
Carolyn Penstein Rosé
Carnegie Mellon University
Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division
2
Outline
* New Feature Creation
* Error Analysis
3
New Feature Creation
4
Why create new features?
* You may want to generalize across sets of related words
  Color = {red,yellow,orange,green,blue}
  Food = {cake,pizza,hamburger,steak,bread}
* You may want to detect contingencies
  The text must mention both cake and presents in order to count as a birthday party
* You may want to combine these
  The text must include a color and a food
5
Why create new features by hand?
* Hand-built features are more likely to capture meaningful generalizations
* They build in knowledge, so you can get by with less training data
6
Rule Language
* ANY() is used to create lists
  COLOR = ANY(red,yellow,green,blue,purple)
  FOOD = ANY(cake,pizza,hamburger,steak,bread)
* ALL() is used to capture contingencies
  ALL(cake,presents)
* More complex rules
  ALL(COLOR,FOOD)
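To make the semantics of ANY() and ALL() concrete, here is a minimal Python sketch. It is not TagHelper's implementation: the helper names `tokens` and `match` are hypothetical, and it assumes a rule is evaluated against the set of lowercased words in a text.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of word tokens (assumed matching unit)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def match(part, toks):
    """A part is either a plain word or a nested rule (a callable)."""
    return part(toks) if callable(part) else part in toks

def ANY(*parts):
    """Rule that matches when at least one part matches."""
    return lambda toks: any(match(p, toks) for p in parts)

def ALL(*parts):
    """Rule that matches only when every part matches."""
    return lambda toks: all(match(p, toks) for p in parts)

# The rules from the slide, written with the sketch above.
COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")
COLOR_AND_FOOD = ALL(COLOR, FOOD)

if __name__ == "__main__":
    print(COLOR_AND_FOOD(tokens("She ate a green cake")))         # color and food present
    print(BIRTHDAY(tokens("There was cake and lots of presents")))  # both words present
    print(BIRTHDAY(tokens("There was cake")))                     # only one word present
```

Because rules are plain callables, they nest the same way the slide's ALL(COLOR,FOOD) does.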
7
Group Project: Make a rule that will match against questions but not statements
Question: Tell me what your favorite color is.
Statement: I tell you my favorite color is blue.
Question: Where do you live?
Statement: I live where my family lives.
Question: Which kinds of baked goods do you prefer?
Statement: I prefer to eat wheat bread.
Question: Which courses should I take?
Statement: You should take my applied machine learning course.
Question: Tell me when you get up in the morning.
Statement: I get up early.
8
Possible Rule
ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
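The rule can be checked against the ten group-project examples with a short Python sketch. Two assumptions are made here: BOL_WDT and BOL_WRB are read as "the line begins with a wh-determiner" and "the line begins with a wh-adverb", and the small word lists `WDT_WORDS` and `WRB_WORDS` are hypothetical stand-ins for a real part-of-speech tagger.

```python
import re

# Hypothetical word lists standing in for POS tags (a real tagger would decide these).
WDT_WORDS = {"which", "whichever"}            # wh-determiners
WRB_WORDS = {"where", "when", "why", "how"}   # wh-adverbs

def words(text):
    """Return the lowercased word tokens of a text, in order."""
    return re.findall(r"[a-z']+", text.lower())

def is_question(text):
    """Sketch of ANY(ALL(tell,me),BOL_WDT,BOL_WRB)."""
    w = words(text)
    if not w:
        return False
    tell_me = "tell" in w and "me" in w   # ALL(tell,me)
    bol_wdt = w[0] in WDT_WORDS           # BOL_WDT (assumed meaning)
    bol_wrb = w[0] in WRB_WORDS           # BOL_WRB (assumed meaning)
    return tell_me or bol_wdt or bol_wrb

EXAMPLES = [
    ("Question", "Tell me what your favorite color is."),
    ("Statement", "I tell you my favorite color is blue."),
    ("Question", "Where do you live?"),
    ("Statement", "I live where my family lives."),
    ("Question", "Which kinds of baked goods do you prefer"),
    ("Statement", "I prefer to eat wheat bread."),
    ("Question", "Which courses should I take?"),
    ("Statement", "You should take my applied machine learning course."),
    ("Question", "Tell me when you get up in the morning."),
    ("Statement", "I get up early."),
]

# Collect every example where the rule disagrees with the given label.
MISSES = [(label, text) for label, text in EXAMPLES
          if is_question(text) != (label == "Question")]

if __name__ == "__main__":
    print("misclassified examples:", MISSES)
```

Note that "I tell you my favorite color is blue." is not matched, because ALL(tell,me) needs the word "me" and the statement only contains "my".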
9
Advanced Feature Editing * Click here
10
Types of Basic Features
Primitive features include unigrams, bigrams, and POS bigrams
11
Types of Basic Features
The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists
* You can choose to remove stopwords or not
* You can choose whether or not to strip endings off words with stemming
* You can choose how frequently a feature must appear in your data in order for it to show up in your lists
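The effect of these three options can be sketched in Python. This is an illustration only, not TagHelper's code: the stopword list is a tiny hypothetical sample, `crude_stem` is a deliberately simple suffix stripper rather than a real stemmer, frequency is assumed to mean the number of texts a feature appears in, and POS bigrams are left out because they need a tagger.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "my", "i"}  # tiny illustrative list

def crude_stem(word):
    """Strip a few common endings; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract(text, remove_stopwords=False, stem=False):
    """Return the set of unigram and bigram features found in one text."""
    toks = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        toks = [t for t in toks if t not in STOPWORDS]
    if stem:
        toks = [crude_stem(t) for t in toks]
    unigrams = set(toks)
    bigrams = {f"{a}_{b}" for a, b in zip(toks, toks[1:])}
    return unigrams | bigrams

def feature_list(texts, min_count=1, **options):
    """Keep features that occur in at least min_count texts."""
    counts = Counter()
    for text in texts:
        counts.update(extract(text, **options))
    return sorted(f for f, c in counts.items() if c >= min_count)

TEXTS = ["I like cakes", "She likes the cake", "Cake is good"]

if __name__ == "__main__":
    print(feature_list(TEXTS, min_count=2))
    print(feature_list(TEXTS, min_count=2, remove_stopwords=True, stem=True))
```

With the options off, only `cake` is shared by two texts. Turning on stopword removal and stemming merges `cakes`/`cake` and `like`/`likes`, so more features clear the frequency threshold.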
12
Types of Basic Features * Now let’s look at how to create new features.
13
Creating New Features
* The feature editor allows you to create new feature definitions
* Click on + to add your new feature
14
Examining a New Feature
Right-click on a feature to examine where it matches in your data
15
Examining a New Feature
16
Error Analysis
17
Create an Error Analysis File
18
Use TagHelper to Code Uncoded File
The output file contains the codes TagHelper assigned. Next, remove the prediction column and insert the correct answers next to the TagHelper-assigned answers.
19
Load Error Analysis File
21
Error Analysis Strategies
* Look for large error cells in the confusion matrix
* Locate the examples that correspond to that cell
* What features do those examples share?
* How are they different from the examples that were classified correctly?
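These strategies can be sketched in a few lines of Python. The sketch assumes the error analysis data is available as (text, predicted, correct) triples. The predicted labels in `ROWS` are invented purely for illustration and are not real TagHelper output, and all function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(rows):
    """Count (correct, predicted) pairs over (text, predicted, correct) rows."""
    return Counter((correct, predicted) for _, predicted, correct in rows)

def largest_error_cell(matrix):
    """Return the off-diagonal (correct, predicted) cell with the most examples."""
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def examples_in_cell(rows, cell):
    """Locate the texts that fall into a given (correct, predicted) cell."""
    return [text for text, predicted, correct in rows if (correct, predicted) == cell]

def shared_words(texts):
    """Words that every text has in common: a first look at shared features."""
    sets = [set(t.lower().split()) for t in texts]
    return set.intersection(*sets) if sets else set()

# Invented predictions, for illustration only.
ROWS = [
    ("Tell me what your favorite color is.", "Statement", "Question"),
    ("Tell me when you get up in the morning.", "Statement", "Question"),
    ("Where do you live?", "Question", "Question"),
    ("I get up early.", "Statement", "Statement"),
    ("I live where my family lives.", "Question", "Statement"),
]

if __name__ == "__main__":
    matrix = confusion_matrix(ROWS)
    cell = largest_error_cell(matrix)
    texts = examples_in_cell(ROWS, cell)
    print("largest error cell (correct, predicted):", cell)
    print("examples:", texts)
    print("shared words:", shared_words(texts))
```

In this toy data the largest error cell holds questions coded as statements, and the misclassified examples share the words "tell" and "me", which suggests a feature such as ALL(tell,me) could help.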
22
Group Project
* Load in the NewsGroupTrain.xls data set
* What is the best performance you can get by playing with the standard TagHelper tools feature options?
* Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls
* Copy in the Answer column from NewsGroupAnswers.xls
* Now do an error analysis to determine why frequent mistakes are being made
* How could you do better?