Published by Rebecca Goodwin. Modified over 9 years ago.
1
Lecture #32 WWW Search
2
Review: Data Organization Kinds of things to organize –Menu items –Text –Images –Sound –Videos –Records (e.g., a person’s name, address, and phone number, or a car’s year, make, and model)
3
Review: Data Organization Three ways to find things: –Lists (in-order search, binary search) –Trees (balance number of branches with time to decide which is correct branch) –Search
4
WWW Search
5
Search issues How do we say what we want? –I want a story about pigs –I want a picture of a rooster –How many televisions were sold in Vietnam during 2000? –Find a movie like this one How does the computer find what we said?
6
Things to search for Records Text Images Audio Video
7
Records Car –Price –Miles –Year –Make –Doors Queries: Price < 6000 & Miles < 100000; Make == Toyota & Year > 1993
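The two queries on this slide can be tried out with a small sketch. This assumes records are held in memory as Python objects; the `Car` class, the `query` helper, and the sample cars are made up for illustration and are not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class Car:
    price: int
    miles: int
    year: int
    make: str
    doors: int


# Made-up sample records, for illustration only.
cars = [
    Car(price=5500, miles=90000, year=1995, make="Toyota", doors=4),
    Car(price=7200, miles=60000, year=1998, make="Honda", doors=2),
    Car(price=2500, miles=150000, year=1990, make="Toyota", doors=4),
]


def query(records, predicate):
    """Return every record for which the predicate is true."""
    return [r for r in records if predicate(r)]


# Price < 6000 & Miles < 100000
cheap_low_miles = query(cars, lambda c: c.price < 6000 and c.miles < 100000)

# Make == Toyota & Year > 1993
newer_toyotas = query(cars, lambda c: c.make == "Toyota" and c.year > 1993)
```

Each query is just a true/false test applied to every record; swapping `and` for `or` in the test gives the "or" style of query.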
8
Queries Make == Toyota & Year > 1993
10
Queries Year > 1993 or Price < $3,000
12
Databases Large collections of records Accessed by queries
13
Things to search for Records Text Images Audio Video
14
Text searching How do I say what I want? –Type some phrase: I want a story about pigs How will the computer match this? –What is text? An array of characters –What can a computer do with text? Match characters
15
Text searching People think in words, not characters How do I convert an array of characters into an array of words? –Collect together sequences of letters –How do I know if character C is a letter? (C >= “a” & C <= “z”) or (C >= “A” & C <= “Z”)
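A minimal sketch of this conversion, assuming plain ASCII text. The function names `is_letter` and `to_words` are illustrative choices, not names from the lecture.

```python
def is_letter(c):
    """True if c is an ASCII letter, using the range test from the slide."""
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def to_words(text):
    """Collect runs of letters into words; anything else ends a word."""
    words = []
    current = []
    for c in text:
        if is_letter(c):
            current.append(c)
        elif current:
            words.append("".join(current))
            current = []
    if current:  # the text may end in the middle of a word
        words.append("".join(current))
    return words


print(to_words("I want a story about pigs."))
```

Spaces and punctuation never reach the word list; they only mark where one word stops and the next begins.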
16
Convert to words Because people think in words
17
Every document is an array of words I want a story about pigs How will I find the right documents? –Find all documents that have the word “pigs”
18
Searching text How will I find pigs fast? –Create an index of all words With each word store the name or address of each document that contains that word –Search the index for “pigs” Return the list of documents Use a binary search on the word list (for 50,000 words, at most 16 comparisons)
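The index described here can be sketched as a sorted word list with a parallel list of document names, searched with Python's standard `bisect` module. The three documents and their file names are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical documents, for illustration only.
documents = {
    "doc1.html": "three little pigs built houses",
    "doc2.html": "the rooster crowed at dawn",
    "doc3.html": "pigs and a rooster share the barn",
}


def build_index(docs):
    """Return (sorted word list, parallel list of sorted document names)."""
    by_word = {}
    for name, text in docs.items():
        for word in text.split():
            by_word.setdefault(word, set()).add(name)
    words = sorted(by_word)
    postings = [sorted(by_word[w]) for w in words]
    return words, postings


def lookup(words, postings, word):
    """Binary-search the sorted word list; return the matching documents."""
    i = bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return postings[i]
    return []


words, postings = build_index(documents)
print(lookup(words, postings, "pigs"))
```

The expensive work (reading every document) happens once, when the index is built. Each search afterwards only touches the word list.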
19
Problems What if a document has the word “Pig” but not “pigs”? Normalize –Case - make all words lower case Pig -> pig –Stemming - remove all suffixes and prefixes before putting a word into the index pigs -> pig piggy -> pig
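A rough sketch of normalization follows. The single plural rule here is a deliberately crude stand-in for a real stemmer, an assumption made to keep the example short; it handles "pigs" but not "piggy".

```python
def normalize(word):
    """Lower-case a word and strip a plural "s" (a toy stemming rule)."""
    word = word.lower()
    # Assumption: one crude suffix rule instead of a full stemming algorithm.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


print(normalize("Pig"), normalize("pigs"), normalize("Pigs"))
```

The same function has to be applied both when a word goes into the index and when a query word is looked up; otherwise the two sides will not match.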
20
Problems I want a story about pigs? –How does the computer know to search for pigs? It doesn’t –How does the computer know what a story is? It doesn’t
21
Searching I want a story about pigs Pick out the important words and search for them –Which words are important? –D = number of times a word appears in a document –A = average number of times a word appears in all documents –Importance = D/A Why?
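To get a feel for Importance = D/A, here is a small sketch. It assumes A is the word's total count divided by the number of documents; the three documents are made up for illustration.

```python
from collections import Counter

# Hypothetical documents, for illustration only.
documents = [
    "i want a story about pigs and more pigs",
    "i want a picture of a rooster",
    "a story about a farm",
]


def importance(word, doc_index, docs):
    """Importance = D / A for one word in one document."""
    counts = [Counter(d.split()) for d in docs]
    d = counts[doc_index][word]                    # times in this document
    a = sum(c[word] for c in counts) / len(docs)   # average over all documents
    return d / a if a else 0.0


print(importance("pigs", 0, documents))
print(importance("a", 0, documents))
```

A word like "a" shows up everywhere, so its average A is large and its score stays low. A word like "pigs" is rare overall but frequent in one document, so its score there is high, which is one way to answer the "Why?" on the slide.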
22
How do we create an index of all documents on the Web?
Try = a list of URLs
Seen = all URLs you have seen
While (Try is not empty) {
  Page = take a URL from Try
  Words = all the “important” words in Page
  add Page to the index using all of Words
  Links = all URLs in Page
  for every Link that is not in Seen
    add Link to Try and to Seen
}
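The loop above can be run against a tiny pretend web. In this sketch the `WEB` dictionary stands in for real page downloads, the `fetch` argument is a hypothetical hook for retrieving a page, and every word is indexed rather than only the "important" ones.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, links found on the page).
WEB = {
    "a.html": ("pigs story", ["b.html", "c.html"]),
    "b.html": ("rooster picture", ["a.html"]),
    "c.html": ("pigs farm", ["b.html", "d.html"]),
    "d.html": ("television sales", []),
}


def crawl(start_urls, fetch):
    """Visit every reachable page once and index its words."""
    to_try = deque(start_urls)   # "Try" on the slide
    seen = set(start_urls)       # "Seen" on the slide
    index = {}
    while to_try:
        page = to_try.popleft()
        text, links = fetch(page)
        for word in text.split():  # simplification: all words count as important
            index.setdefault(word, set()).add(page)
        for link in links:
            if link not in seen:
                seen.add(link)
                to_try.append(link)
    return index


index = crawl(["a.html"], lambda url: WEB[url])
print(sorted(index["pigs"]))
```

Adding a link to Seen at the same moment it is added to Try is what keeps the crawler from looping forever when pages link back to each other, as a.html and b.html do here.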
23
Other ways to find important words and important documents A Document is important if many other documents point to it A word is important in document D if that word occurs frequently in documents that link to document D.
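The first idea, that a document matters if many others point to it, can be sketched as a plain count of incoming links. This is only the raw count, not a full ranking algorithm, and the link graph is made up for illustration.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def inlink_counts(links):
    """Count how many other pages point at each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking page once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return counts


counts = inlink_counts(LINKS)
print(counts.most_common(1))
```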
24
Images What will I say when searching for an image? –I want a rooster picture –Draw a picture of a rooster?
25
Search by picture? Is this possible? If so, how?
26
What’s in a picture? Computers don’t understand the contents of images To a computer an image is a bunch of colored pixels
27
I want a picture of a rooster Label all of the pictures How does Google Images do it? –File name of the picture “rooster-crossingSt.jpg” –Words around the picture in the HTML Use “Safe Search” and set filters appropriately (http://www.youtube.com/watch?v=maWx-ApkBCs)
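A sketch of labeling a picture with words, using the two sources named on the slide. The helper names and the splitting rules are assumptions for illustration; this is not a description of how Google Images actually works.

```python
import re
from pathlib import PurePosixPath


def words_from_filename(filename):
    """Split a file name into lower-case words at punctuation and capitals."""
    stem = PurePosixPath(filename).stem  # drop the ".jpg" extension
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", stem)
    return [p.lower() for p in parts]


def label_image(filename, surrounding_text):
    """Combine file-name words with the words found near the image."""
    labels = set(words_from_filename(filename))
    labels.update(re.findall(r"[a-z]+", surrounding_text.lower()))
    return labels


print(words_from_filename("rooster-crossingSt.jpg"))
```

Once a picture has labels, it can go into the same kind of word index used for text documents.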
28
Audio Talking –Use speech recognition to convert audio to text –With each recognized word keep track of where in the audio it was recognized. Build an index using the recognized text –Normalize based on how words sound rather than are spelled.
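A sketch of an audio index that remembers where each word was heard. The list of (seconds, word) pairs stands in for the output of a speech recognizer and the timestamps are invented; sound-based normalization is left out, and only case is normalized.

```python
# Hypothetical recognizer output: (time in seconds, recognized word).
recognized = [
    (12.0, "Play"),
    (12.4, "it"),
    (13.1, "Sam"),
    (47.5, "play"),
    (47.9, "it"),
]


def build_audio_index(recognized_words):
    """Map each lower-cased word to the times at which it was recognized."""
    index = {}
    for seconds, word in recognized_words:
        index.setdefault(word.lower(), []).append(seconds)
    return index


audio_index = build_audio_index(recognized)
print(audio_index["play"])
```

Storing the time along with each word means a search result can jump straight to the right moment in the recording.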
29
Video Where in “Casablanca” does Bogart say “Play it again Sam”? –He never does; he just says “Play it” How can the computer find that? –Transcribe the audio –Speech recognition on the audio
30
Video Does Woody ever kiss Bo Peep? Exactly what color is a kiss?
31
Video Does Woody ever kiss Bo Peep? Annotate every frame with who is in the frame and search for frames with both Woody and Bo Peep.
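Searching annotated frames can be sketched as a set test per frame. The annotations below are made up; in practice someone (or some program) would have to produce them for every frame.

```python
# Hypothetical annotations: one set of character names per frame.
frames = [
    {"Woody"},
    {"Woody", "Buzz"},
    {"Woody", "Bo Peep"},
    {"Bo Peep"},
    {"Woody", "Bo Peep", "Buzz"},
]


def frames_with_all(annotations, characters):
    """Return the numbers of the frames that contain every listed character."""
    wanted = set(characters)
    return [i for i, present in enumerate(annotations) if wanted <= present]


print(frames_with_all(frames, ["Woody", "Bo Peep"]))
```

This only narrows the search to frames where both characters appear; it does not say what they are doing in those frames.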
32
So what’s with this?
33
Or this?
34
Is Woody cheating?
35
Search Records –Queries = And Or Text –Normalized words (case, stemming, thesaurus) Images –Add words Audio –Transcribe or recognize as words Video –Transcribe –Annotate
36
“Re-Search” Directions in Image Recognition, Search and Retrieval
37
From R. Szeliski, Computer Vision Algorithms and Applications, Course Notes CSE 576, U. Washington Face Detection – Viola & Jones Face Detection In Commercial Digital Cameras Train on –1000s of faces –Millions of non-faces
38
Face Recognition (Eigenfaces [Turk and Pentland 1991]) [Figure: an N × N image written out as a vector of N² pixel values] Project image into higher-dimensional space “Recognize” by grouping unknown image with closest training example
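The "closest training example" step can be sketched with plain nearest-neighbor matching on pixel vectors. This is not the full Eigenfaces method, which first compresses the pixel vectors with principal component analysis; the 2 × 2 "images" and the names are made up for illustration.

```python
import math


def flatten(image):
    """Turn an N x N grid of pixel values into one vector of N*N numbers."""
    return [pixel for row in image for pixel in row]


def distance(a, b):
    """Straight-line (Euclidean) distance between two pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(unknown, training):
    """Return the name of the training image closest to the unknown image."""
    vector = flatten(unknown)
    return min(training, key=lambda name: distance(vector, flatten(training[name])))


# Hypothetical 2 x 2 grayscale "faces".
training = {
    "alice": [[0, 10], [10, 0]],
    "bob": [[200, 180], [190, 210]],
}

print(recognize([[5, 12], [8, 1]], training))
```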
39
Face Recognition (Picasa - Google) Image search/organization Automatically finds, crops and groups images of the same person from a collection of photos Allows user feedback (trainable) - user can indicate if it found the wrong person.
40
From R. Szeliski, Computer Vision Algorithms and Applications, Course Notes CSE 576, U. Washington Face/Object Recognition/Search: Feature-Based Technology Create visual “words” from image features. [Figure labels: Object; Extract Features; Bag of “words”*] *Li Fei-Fei (Princeton)
41
From R. Szeliski, Computer Vision Algorithms and Applications, Course Notes CSE 576, U. Washington Face/Object Recognition/Search: Feature-Based Technology Do this for multiple objects *Li Fei-Fei (Princeton)
42
From R. Szeliski, Computer Vision Algorithms and Applications, p. 605 Face/Object Recognition/Search: Bag of Words How to get matching images/documents? Use “word” frequencies $n_{id} / n_d$, where $n_{id}$ = # times word $i$ occurs in document $d$ and $n_d$ = total # words in document $d$ Then combine word frequency with inverse document frequency weighting to downweight words that occur frequently (D = # of occurrences; A = average # of occurrences)
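A sketch of this weighting follows. The slide does not spell out the inverse document frequency term, so the common form log(N / N_i) is assumed here, where N is the number of documents and N_i is the number of documents containing word i. The visual "words" are made-up labels standing in for image features.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each word by (n_id / n_d) * log(N / N_i) in every document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs:
        doc_freq.update(set(words))   # N_i: documents containing word i
    weights = []
    for words in docs:
        counts = Counter(words)       # n_id for each word i
        n_d = len(words)              # total words in this document
        weights.append({
            w: (n_id / n_d) * math.log(n_docs / doc_freq[w])
            for w, n_id in counts.items()
        })
    return weights


# Hypothetical bags of visual "words" for three images.
docs = [
    ["beak", "feather", "feather", "eye"],
    ["wheel", "eye"],
    ["wheel", "door", "eye"],
]

weights = tf_idf(docs)
print(weights[0])
```

A "word" that occurs in every image, like "eye" here, gets weight zero because it says nothing about which image is which.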
43
From R. Szeliski, Computer Vision Algorithms and Applications, Course Notes CSE 576, U. Washington Face/Object Recognition/Search: Feature-Based Technology Drop word features through a “vocabulary tree” to classify *Li Fei-Fei (Princeton)