CS 124/LINGUIST 180 From Languages to Information


1 CS 124/LINGUIST 180 From Languages to Information
Dan Jurafsky Stanford University Introduction and Course Overview

2 What this course is about
Automatically extracting meaning and structure from:
  Natural language text
  Speech
  Web pages
  Social networks (and other networks)
  Genome sequences

3 Commercial World
Lots of exciting stuff going on…

4 Question Answering: IBM’s Watson

5 Information Extraction and Sentiment Analysis
Sentiment analysis
Attribute detection
Relation extraction

6 Sentiment Emotional Spell Check New York Times “10 big ideas of 2010”

7 Blog Analytics
Data-mining of blogs, discussion forums, message boards, user groups, and other forms of user-generated media
  Product marketing information
  Political opinion tracking
  Social network analysis
  Buzz analysis (what’s hot, what topics are people talking about right now)

8 Livejournal.com: I, me, my on or after Sep 11, 2001
Cohn, Mehl, Pennebaker. Linguistic markers of psychological change surrounding September 11. Psychological Science 15, 10.
Graph from Pennebaker slides

9 September 11 LiveJournal.com study: We, us, our
Cohn, Mehl, Pennebaker. Linguistic markers of psychological change surrounding September 11. Psychological Science 15, 10.
Graph from Pennebaker slides

10 Machine Translation
Helping human translators
Fully automatic
Enter Source Text: 这 不过 是 一 个 时间 的 问题 .
Translation from Stanford’s Phrasal: This is only a matter of time.

11 Google Translate Fried ripe plantains:

12 Information Extraction
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky
Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris
Create new Calendar entry
  Event: Curriculum mtg
  Date: Jan
  Start: 10:00am
  End: 11:30am
  Where: Gates 159
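As a rough illustration of the extraction step on this slide, here is a minimal Python sketch that pulls a time range and a room out of the email body. The patterns `TIME_RANGE` and `ROOM` and the function `extract_event` are hypothetical names, hand-tuned to this one message; a real extractor would also need to resolve "tomorrow" against the message date, which this sketch does not attempt.

```python
import re

EMAIL = (
    "Hi Dan, we've now scheduled the curriculum meeting. "
    "It will be in Gates 159 tomorrow from 10:00-11:30. -Chris"
)

# Hypothetical patterns, tuned only to the example message above.
TIME_RANGE = re.compile(r"from (\d{1,2}:\d{2})-(\d{1,2}:\d{2})")
ROOM = re.compile(r"in ([A-Z][a-z]+ \d+)")


def extract_event(text):
    """Return the calendar fields found in text; missing fields are omitted."""
    event = {}
    times = TIME_RANGE.search(text)
    if times:
        event["start"], event["end"] = times.groups()
    room = ROOM.search(text)
    if room:
        event["where"] = room.group(1)
    return event


if __name__ == "__main__":
    print(extract_event(EMAIL))
```

Hand-written patterns like these break as soon as the wording changes, which is one reason the course turns to models learned from data.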

13 Computational Biology: Finding Genes
[Figure: gene structure, read 5’ to 3’: start codon ATG; Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, with splice sites at the exon/intron boundaries; stop codon TAG/TGA/TAA. Pictures from Serafim Batzoglou]
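The start and stop codons named on this slide are enough for a toy gene finder. The sketch below scans a DNA string for open reading frames: spans that begin with ATG and end at the first in-frame stop codon. It is a simplification under stated assumptions: it ignores introns and splice sites entirely, and `find_orfs` is a hypothetical helper name, not something from the course materials.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna):
    """Yield (start, end) spans that begin with ATG and end with the
    first in-frame stop codon. Introns and splice sites are ignored."""
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Step through the sequence one codon (3 bases) at a time.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                yield start, pos + 3
                break


if __name__ == "__main__":
    sequence = "CCATGAAATGACC"  # made-up toy sequence
    for begin, end in find_orfs(sequence):
        print(begin, end, sequence[begin:end])
```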

14 Computational Biology: Comparing Sequences
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Sequence comparison is key to:
  Finding genes
  Determining function
  Uncovering the evolutionary processes
Slide stuff from Serafim Batzoglou

15 Ambiguity
Resolving ambiguity is a crucial goal throughout string and language processing

16 Ambiguity
Find at least 5 meanings of this sentence: I made her duck

17 Ambiguity
Find at least 5 meanings of this sentence: I made her duck
  I cooked waterfowl for her benefit (to eat)
  I cooked waterfowl belonging to her
  I created the (plaster?) waterfowl she owns
  I caused her to quickly lower her head or body
  I waved my magic wand and turned her into undifferentiated waterfowl

18 Ambiguity is Pervasive
I caused her to quickly lower her head or body
  Syntactic category: “duck” can be a Noun or Verb
I cooked waterfowl belonging to her.
  Syntactic category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
  Word meaning: “make” can mean “create” or “cook”

19 Ambiguity is Pervasive
Grammar: “make” can be:
  Transitive (verb has a noun direct object): I cooked [waterfowl belonging to her]
  Ditransitive (verb has 2 noun objects): I made [her] (into) [undifferentiated waterfowl]
  Action-transitive (verb has a direct object + verb): I caused [her] [to move her body]

20 Ambiguity is Pervasive: Phonetics!!!!!
I mate or duck
I’m eight or duck
Eye maid; her duck
Aye mate, her duck
I maid her duck
I’m aid her duck
I mate her duck
I’m ate her duck
I’m ate or duck

21 Why else is natural language understanding difficult?
non-standard English: Great Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
segmentation issues: the New York-New Haven Railroad
idioms: dark horse, get cold feet, lose face, throw in the towel
neologisms: unfriend, Retweet, bromance
world knowledge: Mary and Sue are sisters. Mary and Sue are mothers.
tricky entity names: Where is A Bug’s Life playing … Let It Be was recorded … … a mutation on the for gene …
But that’s what makes it fun!

22 Making progress on this problem…
The task is difficult! What tools do we need?
  Knowledge about language
  Knowledge about the world
  A way to combine knowledge sources
How we generally do this: probabilistic models built from language data
  P(“maison” → “house”) high
  P(“L’avocat général” → “the general avocado”) low
Luckily, rough text features can often do half the job.
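One simple way to build a probabilistic model from language data is relative-frequency estimation over word-aligned pairs. The Python sketch below assumes the alignments are already given as (source, target) tuples; the function name `translation_probs` and the toy counts are invented for illustration and are not data from the course.

```python
from collections import Counter, defaultdict


def translation_probs(aligned_pairs):
    """Estimate P(target | source) by relative frequency over
    (source, target) word pairs."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {
        source: {t: c / sum(targets.values()) for t, c in targets.items()}
        for source, targets in counts.items()
    }


if __name__ == "__main__":
    # Invented toy alignments: "maison" aligned to "house" 3 times, "home" once.
    pairs = [("maison", "house")] * 3 + [("maison", "home")]
    print(translation_probs(pairs)["maison"])
```

With these toy counts the estimate for “maison” → “house” comes out high, and a pairing that never occurs in the data gets no probability mass at all, which is why practical systems add smoothing.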

23 Models
Finite state machines
Markov models
Alignment models
  Genome alignment
  Alignment of sentence in L1 to sentence in L2
  Alignment of text to speech
Vector space model of IR
Network models
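To make the vector space model of IR concrete, the Python sketch below represents each text as a bag-of-words count vector and compares two texts by the cosine of the angle between their vectors. Whitespace tokenization and raw counts are simplifying assumptions; retrieval systems typically add term weighting such as tf-idf. The name `cosine_similarity` is chosen here for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # A Counter returns 0 for words it has not seen.
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("I made her duck", "I cooked her duck"))
    print(cosine_similarity("I made her duck", "genome alignment"))
```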

24 Dynamic Programming
Don’t do the same work over and over. Avoid this by building and making use of solutions to sub-problems that must be invariant across all parts of the space.
  Minimum Edit Distance
  The Viterbi Algorithm
  Baum-Welch/Forward-Backward
  (In parsing: CKY, Earley, charts, etc.)
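Minimum edit distance is a compact example of reusing solutions to sub-problems. The Python sketch below fills the dynamic programming table one row at a time, where each cell holds the distance between a prefix of the source and a prefix of the target. It assumes unit cost for insertion, deletion, and substitution (Levenshtein distance); other cost schemes are possible, and the function name is chosen here for illustration.

```python
def min_edit_distance(source, target):
    """Levenshtein distance with unit costs, computed row by row."""
    # prev[j] is the distance from the empty prefix of source to target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                        # delete s_char
                curr[j - 1] + 1,                    # insert t_char
                prev[j - 1] + (s_char != t_char),   # substitute or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(min_edit_distance("intention", "execution"))
```

Each cell is computed once from three neighbors, so the running time is proportional to the product of the two string lengths rather than exponential in them.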

25 Machine Learning
Machine-learning-based classifiers that are trained to make decisions based on features extracted from the context
Simple Classifiers:
  Naïve Bayes
  Decision Trees
Sequence Models:
  Hidden Markov Models
  Maximum Entropy Markov Models
  Conditional Random Fields
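As an illustration of the simplest classifier listed here, the following Python sketch implements a multinomial Naïve Bayes text classifier with add-one smoothing, using the words of a document as its features. The class name, method names, and the tiny training set are all invented for this example; it is one common formulation, not necessarily the one used in the course homeworks.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word features, add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, documents):
        """documents: iterable of (list_of_words, label) pairs."""
        for words, label in documents:
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, words):
        """Return the label with the highest log posterior score."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in words:
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = NaiveBayes()
    model.train([
        ("great fun film".split(), "pos"),   # invented training examples
        ("boring dull film".split(), "neg"),
    ])
    print(model.predict("great film".split()))
```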

26 Course logistics in brief
Instructor: Dan Jurafsky
TAs: Leon Lin, Robin Melnick, Evan Rosen, Alden Timme, Adam Vogel
Time: TuTh 9:30-10:45, Braunlec
Requirements:
  Online Video Lectures with embedded quizzes
  Homeworks: In Java or Python
  Online Review Exercises
  Final Exam
Class sessions:
  Tuesdays: Discussions/Guest Lectures
  Thursdays: Open group working hours

27 Overview of the course

