Machine Learning in Practice Lecture 14 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.


Plan for the Day
- Announcements
  - Questions?
  - Assignment 6
- More about Text
  - Using TagHelper Tools
  - Discussion about Assignment 6
  - Museli Paper

Using TagHelper Tools

Setting Up Your Data

Extra Features: you can also add additional features to the right of the text column.

TagHelper Tools Process
Labeled Texts + Unlabeled Texts -> TagHelper -> Labeled Texts + A Model that can Label More Texts

Running TagHelper Tools
Click on the portal.bat executable.

Training and Testing
- Start TagHelper tools by double-clicking on the portal.bat icon in your TagHelperTools2 folder.
- You will then see the following tool palette.
- The idea is that you will train a prediction model on your coded data and then apply that model to uncoded data.
- Click on Train New Models.

Loading a File
- First click on Add a File.
- Then select a file.

Simplest Usage
- Click “GO!”
- TagHelper will use its default settings to train a model on your coded examples.
- It will use that model to assign codes to the uncoded examples.

More Advanced Usage
- Another option is to modify the default settings.
- You get to the options you can set by clicking on >> Options.
- After you finish that, click “GO!”

Output
You can find the output in the OUTPUT folder.
- User Defined Features
  - UserDefinedFeatures_[name of input file].txt
  - E.g., UserDefinedFeatures_SimpleExample.xls.txt
- Performance Report
  - Eval_[name of coding dimension]_[name of input file].txt
  - E.g., Eval_Code_SimpleExample.xls.txt
- Output File
  - [name of input file]_OUTPUT.xls
  - E.g., SimpleExample_OUTPUT.xls
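The naming patterns above can be sketched in a few lines of Python. This is only an illustration of the patterns as listed on the slide: the helper name `expected_output_names` is hypothetical and is not part of TagHelper tools. Note that, going by the examples, the two .txt reports keep the input file's extension while the output file does not.

```python
from pathlib import Path


def expected_output_names(input_file: str, coding_dimension: str) -> dict:
    """Hypothetical helper: build the output file names from the slide's patterns."""
    full_name = Path(input_file).name   # e.g. "SimpleExample.xls"
    stem = Path(input_file).stem        # e.g. "SimpleExample"
    return {
        # The two text reports keep the input file's extension in their names.
        "user_defined_features": f"UserDefinedFeatures_{full_name}.txt",
        "performance_report": f"Eval_{coding_dimension}_{full_name}.txt",
        # The output file drops the extension before adding "_OUTPUT.xls".
        "output_file": f"{stem}_OUTPUT.xls",
    }


names = expected_output_names("SimpleExample.xls", "Code")
for kind, name in names.items():
    print(f"{kind}: OUTPUT/{name}")
```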

User Defined Feature File
- You can reuse these.
- If you load these as the default user defined features, you don’t have to create them again by hand.
- You do have to insert them manually.

Loading Your User Defined Features
- Put your user defined feature file here
- Double click
- Then click here
- Or export to csv
- Now you can just copy columns for new features into your input file
- They will be treated like the extra features to the right of the text column
- You need to reload the long way when you create the final model

Using the Output File Prefix
If you use the Output file prefix, the text you enter will be prepended to the names of the output files:
- Prefix1_Eval_Code_SimpleExample.xls.txt
- Prefix1_SimpleExample.xls

Performance Report
The performance report tells you:
- What dataset was used
- What the customization settings were

At the bottom of the file are reliability statistics and a confusion matrix that tells you which types of errors are being made.
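To make the last point concrete, here is a minimal sketch of how a confusion matrix and one common reliability statistic, Cohen's kappa, can be computed from human codes and predicted codes. The slide does not say which statistics the report contains, so the choice of kappa is an assumption, and the labels and data below are made up for illustration.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Rows are the human (gold) codes, columns are the predicted codes."""
    labels = sorted(set(gold) | set(predicted))
    counts = Counter(zip(gold, predicted))
    matrix = [[counts[(g, p)] for p in labels] for g in labels]
    return labels, matrix


def cohens_kappa(gold, predicted):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, predicted)) / n
    gold_freq, pred_freq = Counter(gold), Counter(predicted)
    expected = sum(gold_freq[label] * pred_freq[label] for label in gold_freq) / (n * n)
    if expected == 1.0:  # both sequences use one and the same label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up example data (not taken from the lecture).
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
pred = ["pos", "neg", "neg", "neg", "pos", "pos"]

labels, matrix = confusion_matrix(gold, pred)
kappa = cohens_kappa(gold, pred)
print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(f"kappa = {kappa:.3f}")
```

Off-diagonal cells show which code is being mistaken for which, which is the "types of errors" information mentioned above.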

Output File
The output file contains the codes for each segment.
- Note that the segments that were already coded will retain their original code.
- The other segments will have their automatic predictions.
- The prediction column indicates the confidence of the prediction.
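The rule for the code column can be sketched as follows. This is an illustration of the behavior described above, not TagHelper's actual code: the function name is hypothetical, and the use of "?" to mark an uncoded segment follows the convention for testing data.

```python
def fill_in_codes(codes, predictions, uncoded_marker="?"):
    """Keep each human code; use the model's prediction only where no code exists."""
    return [
        prediction if code == uncoded_marker else code
        for code, prediction in zip(codes, predictions)
    ]


# Made-up example: two segments were coded by hand, two were left uncoded.
codes = ["pos", "?", "neg", "?"]
predictions = ["neg", "pos", "neg", "neg"]

final_codes = fill_in_codes(codes, predictions)
print(final_codes)
```

The first segment keeps its hand-assigned code even though the model disagrees with it.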

Applying a Trained Model
- Select a model file.
- Then select a testing file.
- Testing data should be set up with ? on uncoded examples.
- Click Go! to process the file.
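As a rough sketch of the data layout, the snippet below writes a small table in which uncoded examples carry "?" in the code column. Several things here are assumptions made for illustration: the column order (code first, then text), the header names "Code" and "Text", and the use of CSV text rather than the .xls files TagHelper reads.

```python
import csv
import io

# Made-up rows: (code, text). None means the example has not been coded yet.
rows = [
    ("pos", "a delight from start to finish"),
    (None, "it gets old fast"),
]


def to_table(rows, code_header="Code", text_header="Text"):
    """Write rows as CSV text, marking uncoded examples with '?'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([code_header, text_header])
    for code, text in rows:
        writer.writerow([code if code is not None else "?", text])
    return buffer.getvalue()


table = to_table(rows)
print(table)
```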

Results

Assignment 6

Example Negative Review in this re-make of the 1954 japanese monster film, godzilla is transformed into a " jurassic park " copy who swims from the south pacific to new york for no real reason and trashes the town. although some of the destruction is entertaining for a while, it gets old fast. the film often makes no sense ( a several-hundred foot tall beast hides in subway tunnels ), sports second-rate effects ( the baby godzillas seem to be one computer effect multiplied on the screen ), lame jokes ( mayor ebert and his assistant gene are never funny ), horrendous acting ( even matthew broderick is dull ) and an unbelievable love story ( why would anyone want to get back together with maria pitillo's character ? ). there are other elements of the film that fall flat, but going on would just be a waste of good words. only for die-hard creature feature fans, this might be fun if you could check your brain at the door. i couldn't. ( michael redman has written this column for 23 years and has seldom had a more disorienting cinematic experience than seeing both " fear and loathing " and " godzilla " in the same evening. )

Example Positive Review sometimes a movie comes along that falls somewhat askew of the rest. some people call it " original " or " artsy " or " abstract ". some people simply call it " trash ". a life less ordinary is sure to bring about mixed feelings. definitely a generation-x aimed movie, a life less ordinary has everything from claymation to profane angels to a karaoke-based musical dream sequence. whew ! anyone in their 30's or above is probably not going to grasp what can be enjoyed about this film. it's somewhat silly, it's somewhat outrageous, and it's definitely not your typical romance story, but for the right audience, it works. a lot of hype has been surrounding this film due to the fact that it comes to us from the same team that brought us trainspotting. well sorry folks, but i haven't seen trainspotting so i can't really compare. whether that works in this film's favor or not is beyond me. but i do know this : ewan mcgregor, whom i had never had the pleasure of watching, definitely charmed me. he was great ! cameron diaz's character was uneven and a bit hard to grasp. the audience may find it difficult to care about her, thus discouraging the hopes of seeing her unite with mcgregor

Positive Review Continued after we are immediately sucked into caring about and identifying with him. misguided? you bet. loveable ? you bet. a life less ordinary was a delight and even had a bonus for me when i realized it was filmed in my hometown of salt lake city, utah. this was just one more thing i didn't know about this movie when i sat down with a five dollar order of nachos and a three dollar coke. maybe not knowing the premise behind this film made for a pleasant surprise, but i think even if i had known, i would have been just as happy. a life less ordinary is quirky, eccentric, and downright charming ! not for everyone, but a definite change of pace for your typical night at the movies.

Note that the texts are LONG!!!

Takes about 15 minutes on my machine!

Using the Display Option

Helpful Hints
- Use Feature Selection!
- Limit the number of times you use the Advanced Feature Editing interface
  - Export the features you create to CSV so you can reuse the already created versions
- You can use Weka once you dump out an .arff file from TagHelper tools
- Do your experimentation strategically
  - Note that POS tagging is slow

Museli

Definition of “Topic” in Dialogue
Discourse Segment Purpose (Passonneau and Litman, 1994), based on (Grosz and Sidner, 1984)
- TOPIC SHIFT = SHIFT IN PURPOSE that is acknowledged and acted upon by both dialogue participants
- Example:
  T: Let me know once you are done reading.
  T: I’ll be back in a min.
  T: Are you done reading?
  S: not yet.
  T: ok
  T: Do you know where to enter all the values?
  S: I think so.
  S: I’ll ask if I get stuck though.
  ...
- Tutor wants to know when student is ready to start the session.
- Tutor checks if student knows how to set up the analysis.

Overview of Single Evidence Source Approaches
- Models based on lexical cohesion
  - TextTiling (Hearst, 1997)
  - Foltz (Foltz, 1998)
  - Olney & Cai (Olney & Cai, 2005)
- Models relying on regularities in topic sequencing
  - Barzilay & Lee (Barzilay & Lee, 2004)

MUSELI
Integrates multiple sources of evidence of topic shift
Features:
- Lexical Cohesion (via cosine correlation)
- Time lag between contributions
- Unigrams (previous and current contribution)
- Bigrams (previous and current contribution)
- POS Bigrams (previous and current contribution)
- Contribution Length
- Previous/Current Speaker
- Contribution of Content Words
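Two of the features above are easy to sketch: lexical cohesion as the cosine between the word-count vectors of the previous and current contribution, and word bigrams. This is a simplified illustration under stated assumptions (whitespace tokenization, raw counts, no stemming or stop-word handling) and is not the Museli implementation.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine between the word-count vectors of two contributions."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:  # an empty contribution shares nothing
        return 0.0
    return dot / (norm_a * norm_b)


def bigrams(text):
    """Adjacent word pairs in a contribution."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


previous = "are you done reading"
current = "do you know where to enter all the values"

cohesion = cosine_similarity(previous, current)
print(f"lexical cohesion = {cohesion:.3f}")
print(bigrams(previous))
```

A low cohesion score between adjacent contributions is one piece of evidence for a topic shift; the point of the approach is to combine it with the other features in the list.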

Experimental Corpora

                          Olney and Cai Corpus    Thermo Corpus
# Dialogues               42                      22
Conts./Dialogue
Words/Cont.               *                       5.12*
Conts./Topic              24*                     13.31*
Topics/Dialogue           8.14*                   16.36*
Tutor Conts./Dialogue     *
Student Conts./Dialogue   *

* P < .005

Our thermo corpus:
- Is more terse!
- Has fewer Contributions!
- Has more Topics/Dialogue!
- Strict turn-taking not enforced!

Corpora:
- Olney and Cai corpus (Olney and Cai, 2005)
- Thermo corpus: student/tutor optimization problem, unrestricted interaction, virtually co-present

Baseline Degenerate Approaches
- ALL: every contribution = NEW_TOPIC
- EVEN: every nth contribution = NEW_TOPIC
- NONE: no NEW_TOPIC
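The three degenerate baselines can be written down directly. In this sketch each contribution gets a boolean that is True when it is labeled NEW_TOPIC. The function names are hypothetical, and for EVEN the choice to start counting so that contributions n, 2n, 3n, ... are marked is an assumption, since the slide does not say where the counting starts.

```python
def baseline_all(num_contributions):
    """ALL: every contribution is labeled NEW_TOPIC."""
    return [True] * num_contributions


def baseline_none(num_contributions):
    """NONE: no contribution is labeled NEW_TOPIC."""
    return [False] * num_contributions


def baseline_even(num_contributions, n):
    """EVEN: every nth contribution is labeled NEW_TOPIC (the nth, 2nth, ...)."""
    return [(i + 1) % n == 0 for i in range(num_contributions)]


even = baseline_even(7, 3)
print(baseline_all(3))
print(baseline_none(3))
print(even)
```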

Two Evaluation Metrics
- A metric commonly used to evaluate topic segmentation algorithms (Olney & Cai, 2005)
  - F-measure:
    - Precision (P): # correct predictions / # predictions
    - Recall (R): # correct predictions / # boundaries
- An additional metric designed specifically for segmentation problems (Beeferman et al., 1999)
  - P_k: Pr(error|k)
    - The probability that two contributions, separated by k contributions, are misclassified
    - Effective if k = ½ average topic length
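Both metrics can be sketched in Python. Boundaries are represented as one boolean per contribution, True when that contribution starts a new topic. Two assumptions are made here: the F-measure is taken to be the balanced harmonic mean of precision and recall, and a pair of contributions counts as misclassified for P_k when the reference and the prediction disagree about whether the two fall in the same segment. The example data is made up.

```python
def precision_recall_f(reference, hypothesis):
    """Precision, recall and balanced F-measure over predicted boundaries."""
    correct = sum(r and h for r, h in zip(reference, hypothesis))
    predicted = sum(hypothesis)
    actual = sum(reference)
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def p_k(reference, hypothesis, k):
    """Fraction of contribution pairs k apart on which the two segmentations disagree."""
    n = len(reference)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and the number of contributions - 1")
    errors = 0
    for i in range(n - k):
        # Contributions i and i+k are in the same segment when no boundary
        # starts anywhere in positions i+1 .. i+k.
        same_in_reference = not any(reference[i + 1 : i + k + 1])
        same_in_hypothesis = not any(hypothesis[i + 1 : i + k + 1])
        errors += same_in_reference != same_in_hypothesis
    return errors / (n - k)


# Made-up example: two topics of length 4, so k = half the average topic length = 2.
reference = [True, False, False, False, True, False, False, False]
hypothesis = [True, False, False, True, False, False, False, False]

precision, recall, f_measure = precision_recall_f(reference, hypothesis)
pk = p_k(reference, hypothesis, k=2)
print(f"P = {precision:.2f}, R = {recall:.2f}, F = {f_measure:.2f}")
print(f"P_k = {pk:.3f}")
```

Lower P_k is better, since it is an error probability, while higher F is better. In the example the predicted boundary is off by one contribution, which the F-measure treats as a plain miss but P_k penalizes only for the pairs that straddle the gap.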

Experimental Results
[Table: P_k and F on the Olney and Cai Corpus and on the Thermodynamics Corpus for NONE, ALL, EVEN, TT, B&L, Foltz, Ortho, and Museli. Results are marked by how they compare to the degenerate approaches: > NO DEG., > 1 DEG., > ALL 3 DEG. (P < .05)]

Experimental Results
Museli > all approaches in BOTH corpora (P < .05)

Take Home Message
- We explored some of TagHelper tools’ functionality
- TagHelper provides simple linguistic features like bigrams and POS bigrams that can be useful for classification
- Assignment 6 will give you realistic experience working with text on a non-trivial classification task
- The most important thing for Assignment 6 is to be strategic!