I256 Applied Natural Language Processing Fall 2009 Lecture 12 Projects Barbara Rosario.


1 I256 Applied Natural Language Processing Fall 2009 Lecture 12 Projects Barbara Rosario

2 2 Today
Special guest: Rob Ennals, Intel Labs Berkeley
More project ideas
Next class
–Finish up classification
–Information extraction

3 3 Announcements
Tuesday October 20: assignment 4 due
–5% more if submitted at least 24 hours in advance
–We’ll accept late submissions if: 1) you haven’t submitted a previous homework late, and 2) you let me know in advance (by the day before)
Thursday October 15: project proposal due
http://courses.ischool.berkeley.edu/i256/f09/assignments/project_proposal.html
–1 page
–General idea/topic
–(If you know already) what kind of data/resources would you like to use?
–(If you know already) what methods do you think you'll use?

4 4 Projects important dates
Thursday Oct 15: Proposal due
Thursday October 22: Receive feedback on proposal
Thursday October 29: Turn in revised proposal (if required)
Thursday November 12: Check point (more information later)
Dec 1 and 3: Class presentations
Thursday Dec 10 (subject to change): Final project write-up due

5 5 Rob Ennals

6 6 Project ideas
Whatever you like and are interested in!
Ideally, it should have at least one of the following elements:
Interesting, novel application and/or data
–e.g. topic classification for Reuters wouldn’t count…
–Twitter?
New algorithm
–Then you can use Reuters data…
Linguistic analysis
–To inform the NLP! (i.e. analysis that is useful to an NLP task/algorithm)
Implementation for novel use (iPhone?)

7 7 Scaling Up to Large Datasets
System calls to external software
Python cannot perform the numerically intensive calculations required by machine learning methods nearly as quickly as lower-level languages such as C. On large datasets, the pure-Python machine learning implementations may take an unreasonable amount of time and memory to complete.
In that case, consider NLTK's facilities for interfacing with external machine learning packages. Once these packages have been installed, NLTK can transparently invoke them (via system calls) to train classifier models significantly faster than the pure-Python classifier implementations.
See the NLTK webpage for a list of recommended machine learning packages that are supported by NLTK.
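The "system call" pattern described on this slide can be illustrated with a minimal sketch. It does not use NLTK's own wrappers. Everything here is an assumption made for illustration: the "external package" is simulated by a tiny script run in a separate Python process, the JSON exchange format is invented, and the "model" is only a majority-label count. A real setup would launch an installed, compiled trainer at the same point.

```python
import json
import subprocess
import sys

# Stand-in for an external trainer (assumption): a small script executed in a
# separate process. A real project would invoke a compiled package here.
TRAINER_SCRIPT = r"""
import json, sys
from collections import Counter
data = json.load(sys.stdin)
counts = Counter(label for _, label in data)
json.dump({"majority_label": counts.most_common(1)[0][0]}, sys.stdout)
"""


def train_externally(labeled_examples):
    """Send training data to an external process and read back its 'model'."""
    result = subprocess.run(
        [sys.executable, "-c", TRAINER_SCRIPT],
        input=json.dumps(labeled_examples),  # training data goes in on stdin
        capture_output=True,                 # model comes back on stdout
        text=True,
        check=True,                          # raise if the trainer fails
    )
    return json.loads(result.stdout)


examples = [
    ["call mike", "phone"],
    ["call sally", "phone"],
    ["remind me to buy milk", "reminder"],
]
model = train_externally(examples)
```

The calling code only sees a function that takes examples and returns a model, which is what makes the invocation feel transparent even though the work happens in another process.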

8 8 Software
If you need some fancy (i.e. expensive) software, let me know asap
–I may be able to buy it and let you use it for the projects
An annotated list of resources
http://nlp.stanford.edu/links/statnlp.html

9 9 Final Project Ideas
NLP with me all the time: Interfaces
90% useful 90% of the time
What are the NLP problems for a speech interface that is always with me?
Take an audio recorder with you for a whole day. Record all the speech commands you would give to your perfect interface
–Call mike
–Write this message to sally hi sally movie tonight?
–Remind me to buy milk when I go to the store
–Put dentist on tue on the calendar
–Where can I buy a bluetooth device nearby?
–Set facebook status class today sucked glad is over
–Twitter class today sucked glad is over

10 10 NLP with me all the time
Analysis
–Analyze the commands
–How many types of actions/classes?
–What NLP apps (translation, extraction, etc.)?
–Call [Mike]: action/class = phone, argument = Mike. NLP tasks: classification and extraction
–Set Facebook status [class today sucked glad is over]: action/class = facebook, argument = [class today sucked glad is over]. NLP tasks: classification and extraction
Build an NLP algorithm for this data
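The action/argument analysis on this slide can be sketched as a small rule-based baseline. This is only a starting point under stated assumptions: the patterns, the `reminder` and `twitter` class names, and the `parse_command` helper are hypothetical, and a project would more likely learn the classification step from recorded data than hand-write it.

```python
import re

# Hypothetical hand-written patterns: each maps a command shape to an action
# class; the named group "arg" captures the argument to extract.
COMMAND_PATTERNS = [
    ("phone", re.compile(r"^call (?P<arg>.+)$", re.IGNORECASE)),
    ("facebook", re.compile(r"^set facebook status (?P<arg>.+)$", re.IGNORECASE)),
    ("twitter", re.compile(r"^twitter (?P<arg>.+)$", re.IGNORECASE)),
    ("reminder", re.compile(r"^remind me to (?P<arg>.+)$", re.IGNORECASE)),
]


def parse_command(utterance):
    """Classify a command (action) and extract its argument."""
    text = utterance.strip()
    for action, pattern in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return {"action": action, "argument": match.group("arg")}
    # No rule fired: report the whole utterance as an unclassified command.
    return {"action": "unknown", "argument": text}
```

For example, `parse_command("Call Mike")` yields the phone action with `Mike` as its argument, while a question such as "Where can I buy a bluetooth device nearby?" falls through to `unknown`, which shows how quickly fixed rules run out of coverage.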

11 11 NLP with me all the time
Additional: note the context of what you were doing while you said the commands (we are interested in how the context can inform the NLP)
–For example: send this picture to Annette
–Context: Annette is in front of me

12 12 Final Project Ideas
NLP summarization for audio interfaces
–Summarize email, blogs, news articles
–Different lengths or incremental (tell me more, or tell me less: get to the point!)
–(Are audio summaries different from written ones?)
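One way to picture the "different lengths or incremental" idea is a frequency-based extractive summarizer whose length is a parameter. This is a minimal sketch, not a recommended method: the stopword list, the sentence splitter, and the scoring rule (average word frequency per sentence) are all simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real one would be larger.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "on",
             "for", "that", "this", "was"}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, num_sentences=1):
    """Return the num_sentences highest-scoring sentences in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

A "tell me more" request then maps to calling `summarize` again with a larger `num_sentences`, and "get to the point" maps to a smaller one.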

13 13 Final Project Ideas
Intel® Reader
To assist people with various disabilities (blindness, dyslexia)
The Intel Reader performs text-to-speech (TTS) on captured images (with OCR) and downloaded text files

15 15 Intel® Reader
Text to speech: Improved Speech Output
–Contextual Pronunciation
TTS engines are still relatively poor on context-based pronunciation variations
–Examples: “LIVE”, “LEAD”
“I live in California” vs. “I watched the live performance of the concert”
“That battery is made from lead” vs. “I will lead the troops into battle”
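The LIVE and LEAD examples can be handled by a very crude context rule that looks only at the preceding word. The sketch below is a toy baseline under loud assumptions: the cue word lists and the ASCII pronunciation labels (LIV, LYVE, LEED, LED) are invented for illustration, and the rule breaks on phrases such as "the lead singer". A project would replace it with part-of-speech tagging or a trained classifier.

```python
# Hypothetical cue words: what tends to precede a verb reading versus a
# noun/adjective reading. These lists are illustrative, not exhaustive.
VERB_CUES = {"i", "we", "you", "they", "will", "to", "can"}
NOUN_OR_ADJ_CUES = {"the", "a", "from", "of", "with"}

# ASCII labels chosen for this sketch: LIV rhymes with "give", LYVE with
# "five", LEED with "need", LED with "bed".
PRONUNCIATIONS = {
    "live": {"verb": "LIV", "other": "LYVE"},
    "lead": {"verb": "LEED", "other": "LED"},
}


def choose_pronunciation(sentence, target):
    """Pick a pronunciation for target from the word just before it."""
    tokens = sentence.lower().replace(".", "").split()
    if target not in PRONUNCIATIONS or target not in tokens:
        return None
    index = tokens.index(target)
    previous = tokens[index - 1] if index > 0 else ""
    if previous in VERB_CUES:
        return PRONUNCIATIONS[target]["verb"]
    if previous in NOUN_OR_ADJ_CUES:
        return PRONUNCIATIONS[target]["other"]
    return None  # no cue matched: leave the decision to a better model
```

Returning `None` when no cue fires keeps the two sub-problems separate: spotting an ambiguous word is easy with a lookup table, while choosing its pronunciation is where the real modeling work lies.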

16 16 Final Project Ideas
Two NLP problems for Intel® Reader
Contextual Pronunciation
–Identify words that have ambiguous pronunciation
–Choose the right pronunciation
OCR errors
–Identify words that are mistakes (o-c, miso, misc)
–Choose the right words
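For the OCR half of the problem, both steps (flag a suspicious word, then choose a replacement) can be sketched with a dictionary lookup plus string similarity from Python's standard `difflib` module. The vocabulary, the `cutoff` value, and the `correct_ocr_token` helper are assumptions for illustration. This baseline ignores context entirely, so it cannot catch errors that happen to produce another valid word.

```python
import difflib

# Toy vocabulary (assumption); a real system would use a full word list.
VOCABULARY = ["battery", "concert", "performance", "reader", "speech", "text"]


def correct_ocr_token(token, vocabulary=VOCABULARY, cutoff=0.7):
    """Return (word, was_corrected) for a single OCR token."""
    lowered = token.lower()
    if lowered in vocabulary:
        return token, False  # known word: assume it was read correctly
    # Step 1 flagged the token as unknown; step 2 picks the closest word.
    candidates = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    if candidates:
        return candidates[0], True
    return token, False  # nothing similar enough: leave it unchanged
```

For instance, the token `c0ncert` (digit zero read in place of the letter o) is mapped to `concert`, while an unrelated string is left alone.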

17 17 Final Project Ideas
Blog analysis
–Categorize blog topics (maybe including link analysis)
–Segment blogs into pieces based on topics
–Do blog author analysis
–Summarize blog reaction to some event, e.g., what did people think of “An Inconvenient Truth”
There is a contest on this:
–http://www.icwsm.org/

18 18 Final Project Ideas
Create a Negativity/Emotion/Flame Recognizer
–There is some related work, but this is somewhat under-explored
–Emotions in email, blogs, Facebook statuses…
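A first baseline for a negativity recognizer could simply count words from two hand-made lists. The sketch below assumes tiny, invented lexicons and a made-up threshold, so it illustrates the shape of the task more than a usable solution.

```python
import re

# Tiny invented lexicons (assumption); real work would need far larger ones.
NEGATIVE_WORDS = {"sucked", "hate", "awful", "terrible", "stupid", "worst"}
POSITIVE_WORDS = {"glad", "love", "great", "awesome", "happy", "best"}


def negativity_score(text):
    """Count of negative words minus count of positive words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    return negative - positive


def is_negative(text, threshold=1):
    """Flag text whose score reaches the (arbitrary) threshold."""
    return negativity_score(text) >= threshold
```

The status "class today sucked glad is over" scores zero here because "sucked" and "glad" cancel out, even though a human reads it as negative. Cases like this are one reason plain word counting is a weak baseline for the task.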

19 19 Previous Final Project
HomeSkim (2005)
–Chan, Lib, Mittal, Poon
–Apartment search mashup
–Extracted fields from Craigslist listings
–http://www.ischool.berkeley.edu/programs/masters/projects/2006/homeskim
Orpheus (2004)
–Maury, Viswanathan, Yang
–Tool for discovering new and independent recording artists
–Extracted artists, links, reviews from music websites
–http://groups.sims.berkeley.edu/orpheus/demo/orpheus_demo.swf
Breaking Story (2002)
–Reffell, Fitzpatrick, Aydelott
–Summarize trends in news feeds
–Categories and entities assigned to all news articles
–http://dream.sims.berkeley.edu/newshound/

22 22 HomeSkim Craigslist Analysis


