I256: Applied Natural Language Processing. Marti Hearst. October 18, 2006.


1 I256: Applied Natural Language Processing
Marti Hearst
October 18, 2006

2 Community-based Summarizer
Results on training data with cross-validation?

3 Community-based Summarizer
Results on test data:

4 Problems with Community Code
Not reading the instructions:
  Hardcoding directory paths
  Hardcoding filenames of testing files
Here is an easy way to do it generally:
  import os
  files = os.listdir("dirname")
So the code should take two parameters:
  – Directory name containing the documents
  – Filename in which to write the output

5 Problems with Community Code
Not reading the instructions:
  Hardcoding directory paths within the code
  Hardcoding filenames of testing files
Here is an easy way to do it generally:
  import os
  files = os.listdir("dirname")
So the code should take two parameters:
  – Directory name containing the documents
  – Filename in which to write the output
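The two-parameter interface described on this slide could be sketched roughly as follows. This is only an illustration, not the course's reference solution: `summarize` is a hypothetical placeholder (it just returns the first non-empty line), and the tab-separated output format is an assumption. The only calls relied on are the standard `os.listdir`, `os.path`, and `open`.

```python
import os
import sys


def list_documents(doc_dir):
    """Return sorted paths of the regular files directly inside doc_dir."""
    paths = []
    for name in sorted(os.listdir(doc_dir)):
        path = os.path.join(doc_dir, name)
        if os.path.isfile(path):  # skip subdirectories
            paths.append(path)
    return paths


def summarize(text):
    """Hypothetical placeholder: return the first non-empty line of text."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def run(doc_dir, out_path):
    """Summarize each document in doc_dir; write one line per file to out_path."""
    # List the inputs before creating the output file, so the output is not
    # picked up as a document if it happens to live in the same directory.
    paths = list_documents(doc_dir)
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as f:
                summary = summarize(f.read())
            # Assumed output format: filename, tab, summary.
            out.write(f"{os.path.basename(path)}\t{summary}\n")


def main(argv):
    """Expect exactly two arguments: a document directory and an output file."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <document-directory> <output-file>",
              file=sys.stderr)
        return 2
    run(argv[1], argv[2])
    return 0

# In a script, finish with: sys.exit(main(sys.argv))
```

Saved under a hypothetical name such as summarizer.py, this would be invoked as `python summarizer.py docs/ summaries.txt`, with no directory or file names hardcoded in the code itself.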

6 Problems with Community Code
What I did wrong:
  Had said in class that the files should be self-contained but didn’t put that into the assignment description.
  Should have said explicitly that you should take as input a directory name and an output filename.
  Should have made an easy way to indicate if external files were needed, and what they were.
  Should have added another task: analyze each individual feature’s contribution.

7 Final Projects
I’d like proposals in two weeks (Nov 1)
  Gives me a week to give you feedback
We’ll spend about 5 weeks on the projects
I want to give you one or two more homeworks
Class presentations the week of Dec 5, but projects due the following week
You can work in teams of 2 (maybe 3, depends on the project)

8 Final Project Ideas
Blog analysis
  Categorize blog topics (maybe including link analysis)
  Segment blogs into pieces based on topics
  Do blog author analysis
  Summarize blog reaction to some event, e.g., what did people think of “An Inconvenient Truth”
    There is a contest on this: http://www.icwsm.org/
  Do analysis as input for an interesting viz: http://benfry.com/linking/

9 Final Project Ideas
Analyze the accuracy of best-paper awards*
  Often given out for conferences
  How prescient are these awards?

10 Final Project Ideas
Create a Negativity/Emotion/Flame Recognizer
  There is some related work, but this is somewhat under-explored

11 Final Project Ideas
Improve an Automatic Faceted Hierarchy Creation Tool*
  Students used this two years ago for making a hierarchy for photo text
  Sample output on two collections:
  – http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/recipes-automated/Flamenco
  – http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/recipes-automated/Flamenco

12 Final Project Ideas
Analyze profiles for online dating*
  Use characteristics from social psychology to score them
  Use other metrics as well.

13 Final Project Ideas
Work on a timeline comparison project
  One idea: use output of the new Google news archive
  Create input for a visualizer built by students last semester: http://www2.sims.berkeley.edu/courses/is247/f05/projects/timelinecompare/

