Search. Search issues How do we say what we want? –I want a story about pigs –I want a picture of a rooster –How many televisions were sold in Vietnam.


1 Search

2 Search issues How do we say what we want? –I want a story about pigs –I want a picture of a rooster –How many televisions were sold in Vietnam during 2000? –Find a movie like this one How does the computer find what we said?

3 Things to search for Records Text Images Audio Video

4 Records Car –Price: $5,000 –Miles: 20,000 –Year: 1994 –Make: Toyota –Doors: 2 Queries Price < 6000 & Miles<100000 Make == Toyota & Year > 1993

5 Queries Make == Toyota & Year >1993

6 Queries Make == Toyota & Year >1993

7 Queries Year >1993 or Price < $3,000

8 Queries Year >1993 or Price < $3,000
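
The record queries on the slides above can be sketched in a few lines of Python. This is only an illustration: the first car is the one from the "Records" slide, the other two records are made up, and the `query` helper is a hypothetical name rather than part of any database system.

```python
# Each record is a dictionary of named fields, as on the "Records" slide.
cars = [
    {"price": 5000, "miles": 20000, "year": 1994, "make": "Toyota", "doors": 2},
    # The two records below are made-up examples, not from the slides.
    {"price": 2500, "miles": 150000, "year": 1989, "make": "Ford", "doors": 4},
    {"price": 9000, "miles": 60000, "year": 1996, "make": "Honda", "doors": 4},
]

def query(records, predicate):
    """Return every record for which the predicate is true."""
    return [r for r in records if predicate(r)]

# Price < 6000 & Miles < 100000
cheap_low_miles = query(cars, lambda r: r["price"] < 6000 and r["miles"] < 100000)

# Make == Toyota & Year > 1993
newer_toyotas = query(cars, lambda r: r["make"] == "Toyota" and r["year"] > 1993)

# Year > 1993 or Price < $3,000
newer_or_cheap = query(cars, lambda r: r["year"] > 1993 or r["price"] < 3000)
```

With "and", a record must pass both tests, so only the Toyota matches the first two queries. With "or", passing either test is enough, so all three sample records match the last one.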

9 Databases Large collections of records Accessed by queries

10 Things to search for Records  Text Images Audio Video

11 Text searching How do I say what I want? –Type some phrase I want a story about pigs How will the computer match this? –What is text? An array of characters –What can a computer do with text? Match characters

12 Text searching People think in words not characters How do I convert an array of characters into an array of words? –Collect together sequences of letters –How do I know if character C is a letter? (C >= “a” & C <= “z”) or (C >= “A” & C <= “Z”)

13 Convert to words Because people think in words
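
Collecting runs of letters into words, as described on the previous slide, might look like the sketch below. The function names `is_letter` and `to_words` are chosen here for illustration, and the letter test only covers the unaccented English alphabet.

```python
def is_letter(c):
    # A character is a letter if it falls in the lowercase or the uppercase range.
    return ("a" <= c <= "z") or ("A" <= c <= "Z")

def to_words(text):
    """Turn an array of characters into an array of words."""
    words = []
    current = ""
    for c in text:
        if is_letter(c):
            current += c           # still inside a word
        elif current:
            words.append(current)  # a non-letter ends the current word
            current = ""
    if current:
        words.append(current)      # do not lose the last word
    return words

print(to_words("I want a story about pigs."))
# ['I', 'want', 'a', 'story', 'about', 'pigs']
```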

14 Every document is an array of words I want a story about pigs How will I find the right documents? –Find all documents that have the word “pigs”

15 Searching text How will I find pigs fast? –Hint: the “URL Lookup” assignment –Create an index of all words With each word store the name or address of each document that contains that word –Search the index for “pigs” Return the list of documents Use a binary search on the word list (50,000 words)
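
A minimal sketch of such an index follows, assuming three tiny made-up documents. The word list is kept sorted so that Python's standard `bisect_left` can do the binary search. The names `build_index` and `lookup` are illustrative only.

```python
from bisect import bisect_left

# Hypothetical documents, already converted to arrays of words.
documents = {
    "farm.txt": ["a", "story", "about", "pigs"],
    "city.txt": ["a", "story", "about", "cars"],
    "barn.txt": ["pigs", "and", "roosters"],
}

def build_index(docs):
    """With each word, store the names of the documents that contain it."""
    postings = {}
    for name, words in docs.items():
        for word in words:
            postings.setdefault(word, set()).add(name)
    word_list = sorted(postings)  # sorted, so binary search works
    doc_lists = [sorted(postings[w]) for w in word_list]
    return word_list, doc_lists

def lookup(word_list, doc_lists, word):
    """Binary-search the word list and return the matching documents."""
    i = bisect_left(word_list, word)
    if i < len(word_list) and word_list[i] == word:
        return doc_lists[i]
    return []

word_list, doc_lists = build_index(documents)
print(lookup(word_list, doc_lists, "pigs"))  # ['barn.txt', 'farm.txt']
```

Binary search halves the remaining list at every step, so even a 50,000-word list needs at most about 16 comparisons per lookup.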

16 Problems What if a document has the word “Pig” but not “pigs”? Normalize –Case - make all words lower case Pig -> pig –Stemming - remove all suffixes and prefixes before putting a word into the index pigs -> pig piggy -> pig
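
The two normalization steps can be sketched as below. The stemming rule is a deliberately tiny toy that handles only a few suffixes (no prefixes), just enough to cover the examples on the slide. Real stemmers use many more rules, and the suffix list here is an assumption made for illustration.

```python
SUFFIXES = ["ies", "es", "s", "y"]  # toy list, not a complete stemmer

def stem(word):
    """Strip one common suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant: "pigg" -> "pig".
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def normalize(word):
    """Lower-case the word, then stem it."""
    return stem(word.lower())

for w in ["Pig", "pigs", "piggy"]:
    print(w, "->", normalize(w))  # each prints "... -> pig"
```

The same `normalize` step has to be applied both when words go into the index and when a query is looked up; otherwise the two sides will not match.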

17 Problems I want a story about pigs? –How does the computer know to search for pigs? It doesn’t –How does the computer know what a story is? It doesn’t

18 Searching I want a story about pigs Pick out the important words and search for them –Which words are important? –D = number of times a word appears in a document –A = average number of times a word appears in all documents –Importance = D/A Why?
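
One way to read the D/A formula is sketched below, taking A as the total count of the word across all documents divided by the number of documents. The three sample documents are invented for the example.

```python
def importance(word, document, all_documents):
    """Importance = D / A for one word in one document."""
    d = document.count(word)                              # D: count in this document
    total = sum(doc.count(word) for doc in all_documents)
    a = total / len(all_documents)                        # A: average count per document
    return d / a if a > 0 else 0.0

# Hypothetical documents, already split into words.
docs = [
    ["a", "story", "about", "pigs", "and", "a", "farm"],
    ["a", "car", "and", "a", "road"],
    ["a", "boat", "on", "a", "lake"],
]

print(importance("a", docs[0], docs))     # 1.0: "a" is equally common everywhere
print(importance("pigs", docs[0], docs))  # about 3: "pigs" appears only in this document
```

This suggests an answer to the "Why?" on the slide: a word that shows up at the same rate everywhere scores about 1 and says little about any one document, while a word concentrated in one document scores high there.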

19 How do we create an index of all documents on the Web? Try = a list of URLs; Seen = all URLs from Try; While (Try is not empty) { Page = take a URL from Try; Words = all the “important” words in Page; add Page to the index using all of Words; Links = all URLs in Page; for every Link that is not in Seen, add Link to Try and to Seen }
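
The crawling loop can be sketched in Python as follows. To keep the example runnable without a network, the "Web" is a small made-up dictionary that maps each URL to its text and its outgoing links. The `important_words` rule is a placeholder assumption, not the importance measure from the earlier slide.

```python
from collections import deque

# Hypothetical miniature Web: URL -> (page text, URLs the page links to).
WEB = {
    "a.html": ("story about pigs", ["b.html", "c.html"]),
    "b.html": ("picture of a rooster", ["a.html"]),
    "c.html": ("pigs and roosters", ["b.html", "d.html"]),
    "d.html": ("televisions sold", []),
}

def important_words(text):
    # Placeholder rule: treat every word longer than three letters as important.
    return [w for w in text.lower().split() if len(w) > 3]

def crawl(start_urls, web):
    to_try = deque(start_urls)  # Try: URLs still to visit
    seen = set(start_urls)      # Seen: starts as every URL in Try
    index = {}                  # word -> set of pages containing it
    while to_try:
        page = to_try.popleft()
        text, links = web.get(page, ("", []))
        for word in important_words(text):
            index.setdefault(word, set()).add(page)
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                to_try.append(link)
    return index

index = crawl(["a.html"], WEB)
print(sorted(index["pigs"]))  # ['a.html', 'c.html']
```

The Seen set is what stops the loop from running forever: pages a.html and b.html link to each other, yet each is visited only once.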

20 Other ways to find important words and important documents A Document is important if many other documents point to it A word is important in document D if that word occurs frequently in documents that link to document D.
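
The first idea, that a document is important if many others point to it, can be sketched as a simple count of incoming links. The link graph below is invented, and plain counting is only the simplest possible reading of the slide. Search engines use more elaborate weighting.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it points to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

def inlink_counts(link_graph):
    """Count how many other pages point to each page."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore links from a page to itself
                counts[target] += 1
    return counts

counts = inlink_counts(links)
ranked = sorted(links, key=lambda page: counts[page], reverse=True)
print(ranked[0], counts[ranked[0]])  # c.html 3
```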

21 Images What will I say when searching for an image? –I want a rooster picture –Draw a picture of a rooster?

22 Search by picture? ?

23 What’s in a picture? Computers don’t understand the contents of images To a computer an image is an array of colors

24 I want a picture of a rooster Label all of the pictures How does Google do it? –File name of the picture “rooster-crossingSt.jpg” –Words around the picture in the HTML

25 Audio Talking –Use speech recognition to convert audio to text –With each recognized word keep track of where in the audio it was recognized. Build an index using the recognized text –Normalize based on how words sound rather than are spelled.
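
Keeping track of where each word was recognized might look like the sketch below. The list of (word, seconds) pairs stands in for the output of a speech recognizer. It is made-up data, and no real recognizer API is being shown. Sound-based normalization is left out of this sketch.

```python
# Hypothetical recognizer output: (recognized word, seconds from the start).
recognized = [
    ("I", 0.5), ("want", 0.8), ("a", 1.0), ("story", 1.2),
    ("about", 1.6), ("pigs", 1.9), ("more", 30.0), ("pigs", 30.2),
]

def build_timed_index(words_with_times):
    """Map each word to every position in the audio where it was recognized."""
    index = {}
    for word, seconds in words_with_times:
        index.setdefault(word.lower(), []).append(seconds)
    return index

audio_index = build_timed_index(recognized)
print(audio_index["pigs"])  # [1.9, 30.2]
```

A search result can then point at a position inside the recording, not just at the recording as a whole.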

26 Video Where in “Casablanca” does Bogart say “Play it again Sam”? –He never does; he just says “Play it” How can the computer find that? –Transcribe the audio –Speech recognition on the audio

27 Video Does Woody ever kiss Bo Peep? Exactly what color is a kiss?

28 Video Does Woody ever kiss Bo Peep? Annotate every frame with who is in the frame and search for frames with both Woody and Bo Peep.
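
Searching frame annotations can be sketched as a set test per frame. The annotations below are invented; in practice someone (or some program) would have to label each frame first.

```python
# Hypothetical annotations: one set of character names per frame.
frames = [
    {"Woody"},
    {"Woody", "Buzz"},
    {"Woody", "Bo Peep"},
    {"Bo Peep"},
    {"Woody", "Bo Peep", "Buzz"},
]

def frames_with(annotations, *characters):
    """Return the numbers of the frames that contain all the given characters."""
    wanted = set(characters)
    return [i for i, present in enumerate(annotations) if wanted <= present]

print(frames_with(frames, "Woody", "Bo Peep"))  # [2, 4]
```

This only narrows the search to candidate frames. Whether the two characters actually kiss in those frames still has to be checked by a person.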

29 So what’s with this?

30 Or this?

31 Is Woody cheating?

32 Search Records –Queries = And Or Text –Normalized words (case, stemming, thesaurus) Images –Add words Audio –Transcribe or recognize as words Video –Transcribe –Annotate

