Published by Esther Simon. Modified over 9 years ago.
1
CS 124/LINGUIST 180 From Languages to Information
Dan Jurafsky
Stanford University
Introduction and Course Overview
2
What this course is about
Automatically extracting meaning and structure from:
  Natural language text
  Speech
  Web pages
  Social networks (and other networks)
  Genome sequences
3
Commercial World
Lots of exciting stuff going on…
4
Question Answering: IBM’s Watson
5
Information Extraction and Sentiment Analysis
  Sentiment analysis
  Attribute detection
  Relation extraction
6
Sentiment
Emotional Spell Check
New York Times “10 big ideas of 2010”
7
Blog Analytics
Data-mining of blogs, discussion forums, message boards, user groups, and other forms of user-generated media
  Product marketing information
  Political opinion tracking
  Social network analysis
  Buzz analysis (what’s hot, what topics are people talking about right now)
8
Livejournal.com: I, me, my on or after Sep 11, 2001
Cohn, Mehl, Pennebaker. Linguistic markers of psychological change surrounding September 11. Psychological Science 15, 10:
Graph from Pennebaker slides
9
September 11 LiveJournal.com study: We, us, our
Cohn, Mehl, Pennebaker. Linguistic markers of psychological change surrounding September 11. Psychological Science 15, 10:
Graph from Pennebaker slides
10
Machine Translation
  Helping human translators
  Fully automatic
Enter Source Text: 这 不过 是 一 个 时间 的 问题 .
Translation from Stanford’s Phrasal: This is only a matter of time.
11
Google Translate
Fried ripe plantains:
12
Information Extraction
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky
Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris

Create new Calendar entry
  Event: Curriculum mtg
  Date: Jan
  Start: 10:00am
  End: 11:30am
  Where: Gates 159
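As a rough illustration of how fields like Start, End, and Where could be pulled out of the message body, here is a minimal sketch using regular expressions. The patterns and the function name `extract_event` are hypothetical and tuned only to this one email; the slide does not say which extraction method is used, and a real system would need far more robust models.

```python
import re

# Hypothetical pattern: a time range such as "10:00-11:30".
TIME_RANGE = re.compile(r"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}:\d{2})")
# Hypothetical pattern: "in" followed by a capitalized building name and a room number.
ROOM = re.compile(r"\bin\s+([A-Z][a-z]+\s+\d+)")


def extract_event(body):
    """Return the fields found in the message body (None where nothing matched)."""
    times = TIME_RANGE.search(body)
    room = ROOM.search(body)
    return {
        "start": times.group(1) if times else None,
        "end": times.group(2) if times else None,
        "where": room.group(1) if room else None,
    }


email_body = (
    "Hi Dan, we've now scheduled the curriculum meeting. "
    "It will be in Gates 159 tomorrow from 10:00-11:30. -Chris"
)
event = extract_event(email_body)
print(event)  # {'start': '10:00', 'end': '11:30', 'where': 'Gates 159'}
```

Resolving "tomorrow" against the email's Date header, or deciding am versus pm, is deliberately left out of this sketch.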
13
Computational Biology: Finding Genes
[Figure: gene structure read 5’ to 3’: start codon (ATG), Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, stop codon (TAG/TGA/TAA), with splice sites at the exon/intron boundaries. Pictures from Serafim Batzoglou]
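To make the start and stop codons concrete, the sketch below scans a DNA string for open reading frames: stretches that begin with ATG and run to the first in-frame TAG, TGA, or TAA. This is a deliberately naive illustration, not the gene-finding method of the course: it looks at one strand only and ignores introns and splice sites. The function name `find_orfs` and the test string are made up for this example.

```python
START = "ATG"
STOPS = {"TAG", "TGA", "TAA"}


def find_orfs(dna):
    """Return (start, end) index pairs of open reading frames on the given strand.

    Each frame begins at an ATG and ends just past the first in-frame stop codon.
    An ATG with no in-frame stop codon after it is skipped.
    """
    orfs = []
    for i in range(len(dna) - 2):
        if dna[i:i + 3] != START:
            continue
        # Step codon by codon (3 bases at a time) from just after the ATG.
        for j in range(i + 3, len(dna) - 2, 3):
            if dna[j:j + 3] in STOPS:
                orfs.append((i, j + 3))
                break
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # [(2, 11)]
```

Here the frame covers `ATGAAATAG`: the start codon, one ordinary codon, and the stop codon.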
14
Computational Biology: Comparing Sequences
Unaligned:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

Aligned:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
 || |||||||  ||||x|  || |||  |||||
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Sequence comparison is key to
  Finding genes
  Determining function
  Uncovering the evolutionary processes
Slide stuff from Serafim Batzoglou
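One standard way to produce a gapped alignment like the pair above is global alignment by dynamic programming (the Needleman-Wunsch algorithm). The sketch below is a minimal version under assumed scores of +1 for a match, -1 for a mismatch, and -1 per gap; these values and the function name `align` are illustrative choices, not taken from the slide, so the alignment it returns for the slide's sequences need not match the one shown.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, gapped_a, gapped_b)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        # Score for aligning a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two characters
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

When several alignments tie for the best score, this traceback prefers pairing characters over inserting gaps, which is one arbitrary but common tie-breaking rule.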
15
Ambiguity
Resolving ambiguity is a crucial goal throughout string and language processing
16
Ambiguity
Find at least 5 meanings of this sentence:
  I made her duck
  I cooked waterfowl for her benefit (to eat)
  I cooked waterfowl belonging to her
  I created the (plaster?) waterfowl she owns
  I caused her to quickly lower her head or body
  I waved my magic wand and turned her into undifferentiated waterfowl
18
Ambiguity is Pervasive
I caused her to quickly lower her head or body
  Syntactic category: “duck” can be a Noun or Verb
I cooked waterfowl belonging to her.
  Syntactic category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
  Word Meaning: “make” can mean “create” or “cook”
19
Ambiguity is Pervasive
Grammar: “make” can be:
  Transitive (verb has a noun direct object): I cooked [waterfowl belonging to her]
  Ditransitive (verb has 2 noun objects): I made [her] (into) [undifferentiated waterfowl]
  Action-transitive (verb has a direct object + verb): I caused [her] [to move her body]
20
Ambiguity is Pervasive: Phonetics!!!!!
  I mate or duck
  I’m eight or duck
  Eye maid; her duck
  Aye mate, her duck
  I maid her duck
  I’m aid her duck
  I mate her duck
  I’m ate her duck
  I’m ate or duck
21
Why else is natural language understanding difficult?
non-standard English
  Great Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
segmentation issues
  the New York-New Haven Railroad
idioms
  dark horse
  get cold feet
  lose face
  throw in the towel
neologisms
  unfriend
  Retweet
  bromance
world knowledge
  Mary and Sue are sisters.
  Mary and Sue are mothers.
tricky entity names
  Where is A Bug’s Life playing …
  Let It Be was recorded …
  … a mutation on the for gene …
But that’s what makes it fun!
22
Making progress on this problem…
The task is difficult! What tools do we need?
  Knowledge about language
  Knowledge about the world
  A way to combine knowledge sources
How we generally do this: probabilistic models built from language data
  P(“maison” → “house”) high
  P(“L’avocat général” → “the general avocado”) low
Luckily, rough text features can often do half the job.
23
Models
Finite state machines
Markov models
Alignment models
  Genome alignment
  Alignment of sentence in L1 to sentence in L2
  Alignment of text to speech
Vector space model of IR
Network models
24
Dynamic Programming
Don’t do the same work over and over. Avoid this by building and making use of solutions to sub-problems that must be invariant across all parts of the space.
  Minimum Edit Distance
  The Viterbi Algorithm
  Baum-Welch/Forward-Backward
  (In parsing: CKY, Earley, charts, etc.)
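Minimum edit distance is a compact example of the idea: each table cell holds the solution to one sub-problem (the cheapest way to turn a prefix of one string into a prefix of the other) and is computed once, then reused. The sketch below assumes insertions and deletions cost 1 and takes the substitution cost as a parameter; the function name and the cost parameter are illustrative choices, not something the slide specifies.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Return the minimum cost of editing source into target.

    Insertions and deletions cost 1; substituting one character for a
    different one costs sub_cost (1 gives plain Levenshtein distance).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of the prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of the prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

The table has (n+1) × (m+1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths rather than exponential in them.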
25
Machine Learning
Machine learning based classifiers that are trained to make decisions based on features extracted from the context
Simple Classifiers:
  Naïve Bayes
  Decision Trees
Sequence Models:
  Hidden Markov Models
  Maximum Entropy Markov Models
  Conditional Random Fields
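As a small illustration of the first item, here is a sketch of a Naïve Bayes text classifier that uses word counts as its features, with add-one smoothing. The four training sentences and the labels "pos" and "neg" are invented toy data, and the function names are hypothetical; this is one common formulation, not necessarily the one used in the course.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train on a list of (label, text) pairs.

    Returns (log_prior, log_likelihood, vocab), with add-one smoothing
    applied to the per-class word probabilities.
    """
    label_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)
    for label, text in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(k / len(docs)) for c, k in label_counts.items()}
    log_likelihood = {}
    for c in label_counts:
        total = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(text, model):
    """Return the most probable label for text; unseen words are ignored."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in text.lower().split() if w in vocab
        )
    return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
toy_docs = [
    ("pos", "great fun film"),
    ("pos", "fun and great"),
    ("neg", "boring dull film"),
    ("neg", "dull and slow"),
]
model = train_naive_bayes(toy_docs)
print(classify("great fun", model))        # pos
print(classify("dull and boring", model))  # neg
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.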
26
Course logistics in brief
Instructor: Dan Jurafsky
TAs: Leon Lin, Robin Melnick, Evan Rosen, Alden Timme, Adam Vogel
Time: TuTh 9:30-10:45, Braunlec
Requirements:
  Online Video Lectures with embedded quizzes
  Homeworks: In Java or Python
  Online Review Exercises
  Final Exam
Class sessions:
  Tuesdays: Discussions/Guest Lectures
  Thursdays: Open group working hours
27
Overview of the course