Moving Ahead: Creative Feature Extraction and Error Analysis Techniques
Carolyn Penstein Rosé
Carnegie Mellon University
Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division

Outline  New Feature Creation  Error Analysis

New Feature Creation

Why create new features?  You may want to generalize across sets of related words  Color = {red,yellow,orange,green,blue}  Food = {cake,pizza,hamburger,steak,bread}  You may want to detect contingencies  The text must mention both cake and presents in order to count as a birthday party  You may want to combine these  The text must include a color and a food

Why create new features by hand?  More likely to capture meaningful generalizations  Build in knowledge so you can get by with less training data

Rule Language  ANY() is used to create lists  COLOR = ANY(red,yellow,green,blue,purple)  FOOD = ANY(cake,pizza,hamburger,steak,bread)  ALL() is used to capture contingencies  ALL(cake,presents)  More complex rules  ALL(COLOR,FOOD)

Group Project: Make a rule that matches questions but not statements
- Question: Tell me what your favorite color is.
- Statement: I tell you my favorite color is blue.
- Question: Where do you live?
- Statement: I live where my family lives.
- Question: Which kinds of baked goods do you prefer?
- Statement: I prefer to eat wheat bread.
- Question: Which courses should I take?
- Statement: You should take my applied machine learning course.
- Question: Tell me when you get up in the morning.
- Statement: I get up early.

Possible Rule  ANY(ALL(tell,me),BOL_WDT,BOL_WRB)

Advanced Feature Editing

Types of Basic Features  Primitive features inclulde unigrams, bigrams, and POS bigrams

Types of Basic Features  The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists  You can choose to remove stopwords or not  You can choose whether or not to strip endings off words with stemming  You can choose how frequently a feature must appear in your data in order for it to show up in your lists

Types of Basic Features
- Now let’s look at how to create new features.

Creating New Features
- The feature editor allows you to create new feature definitions
- Click on + to add your new feature

Examining a New Feature
- Right-click on a feature to examine where it matches in your data
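Inspecting a feature amounts to listing the instances it fires on, and it helps to see which codes those instances carry. The sketch below assumes a feature is a predicate over a set of words; `birthday` is a hypothetical feature equivalent to ALL(cake,presents), and `where_matches`, the example rows, and their codes are illustrative, not part of TagHelper.

```python
import re
from collections import Counter
from typing import Callable, List, Set, Tuple

Feature = Callable[[Set[str]], bool]


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def birthday(toks: Set[str]) -> bool:
    # Hypothetical feature equivalent to ALL(cake,presents).
    return {"cake", "presents"} <= toks


def where_matches(feature: Feature, rows: List[Tuple[str, str]]):
    # rows are (text, code) pairs. Returns the matching rows with their
    # indices, plus a count of how often the feature fires per code.
    hits = [(i, text, code) for i, (text, code) in enumerate(rows)
            if feature(tokens(text))]
    by_code = Counter(code for _, _, code in hits)
    return hits, by_code


ROWS = [
    ("We ate cake and opened presents.", "party"),
    ("The cake was chocolate.", "food"),
    ("Presents and cake for everyone!", "party"),
]

if __name__ == "__main__":
    hits, by_code = where_matches(birthday, ROWS)
    for i, text, code in hits:
        print(i, code, text)
    print(by_code)
```

A feature that fires almost only on one code is a promising predictor; one spread evenly across codes adds little.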

Examining a New Feature

Error Analysis

Create an Error Analysis File

Use TagHelper to Code the Uncoded File
- The output file contains the codes TagHelper assigned.
- Now remove the prediction column and insert the correct answers next to the codes TagHelper assigned.
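If the coded output and the answer key are exported to a tabular format, lining them up is a row-by-row join. This sketch assumes both list the instances in the same order and uses hypothetical column names (`text`, `code`, `answer`); it works on in-memory rows and writes CSV with Python's standard `csv` module rather than reading .xls files.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]


def add_answers(coded_rows: List[Row], answers: List[str],
                answer_col: str = "answer") -> List[Row]:
    # Pair each coded row with its correct answer by position; this only
    # works if both files list the instances in the same order.
    if len(coded_rows) != len(answers):
        raise ValueError("row counts differ; the files are not aligned")
    return [{**row, answer_col: ans} for row, ans in zip(coded_rows, answers)]


def to_csv(rows: List[Row]) -> str:
    # Serialize rows with a header line taken from the first row's keys.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    coded = [{"text": "late goal wins the match", "code": "sports"},
             {"text": "new graphics card released", "code": "sports"}]
    print(to_csv(add_answers(coded, ["sports", "tech"])))
```

The length check is the cheap safeguard here: if the two files have different row counts, a positional join would silently pair codes with the wrong answers.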

Load Error Analysis File

Error Analysis Strategies  Look for large error cells in the confusion matrix  Locate the examples that correspond to that cell  What features do those examples share?  How are they different from the examples that were classified correctly?

Group Project  Load in the NewsGroupTrain.xls data set  What is the best performance you can get by playing with the standard TagHelper tools feature options?  Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls  Copy in Answer column from NewsGroupAnswers.xls  Now do an error analysis to determine why frequent mistakes are being made  How could you do better?