UC Berkeley CS294-9 Fall 2000, 12b-1
Document Image Analysis
Lecture 12b: Integrating other info
Richard J. Fateman, University of California, Berkeley
Henry S. Baird, Xerox Palo Alto Research Center
12b-2: Srihari/Hull/Choudhari (1982): merge sources
- Bottom-up refinement: transition probabilities at the character-sequence level
- Top-down process: search in a lexicon
- Now the standard presentation of the usual methods:
  - Viterbi algorithm and variations
  - Trie representation of the dictionary
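A minimal sketch of the two processes, assuming the recognizer emits a few candidate characters per position with confidence scores, and assuming a hypothetical bigram table P(c | previous c). The bottom-up half is a plain Viterbi search over that candidate lattice; the top-down half walks a nested-dict trie so that only dictionary prefixes are ever extended. The names `viterbi`, `build_trie`, `lexicon_search`, and `FLOOR` are illustrative, not from the original slides, and this is not claimed to be the exact Srihari/Hull/Choudhari formulation.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical input: one dict per character position, mapping each candidate
# character to the recognizer's confidence (treated as a probability).
Candidates = List[Dict[str, float]]
# Hypothetical bigram model P(c | prev); "^" marks the start of the word.
Bigrams = Dict[Tuple[str, str], float]
FLOOR = 1e-6  # assumed probability for bigrams missing from the table


def viterbi(cands: Candidates, bigrams: Bigrams) -> Tuple[str, float]:
    """Bottom-up: best string under recognition scores times bigram transitions."""
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {c: (math.log(p) + math.log(bigrams.get(("^", c), FLOOR)), c)
            for c, p in cands[0].items()}
    for position in cands[1:]:
        step = {}
        for c, p in position.items():
            # keep only the best-scoring predecessor for each current character
            score, path = max((s + math.log(bigrams.get((prev, c), FLOOR)), pth)
                              for prev, (s, pth) in best.items())
            step[c] = (score + math.log(p), path + c)
        best = step
    score, path = max(best.values())
    return path, score


def build_trie(words: List[str]) -> dict:
    """Nested-dict trie; the key "$" marks the end of a dictionary word."""
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def lexicon_search(cands: Candidates, trie: dict) -> Tuple[Optional[str], float]:
    """Top-down: best-scoring dictionary word, pruning prefixes absent from the trie."""
    best: Tuple[float, Optional[str]] = (float("-inf"), None)

    def walk(i: int, node: dict, word: str, score: float) -> None:
        nonlocal best
        if i == len(cands):
            if "$" in node and score > best[0]:
                best = (score, word)
            return
        for c, p in cands[i].items():
            if c in node:  # extend only prefixes that exist in the dictionary
                walk(i + 1, node[c], word + c, score + math.log(p))

    walk(0, trie, "", 0.0)
    return best[1], best[0]
```

For example, with `cands = [{"r": 0.9}, {"o": 0.6, "i": 0.4}, {"v": 0.9, "u": 0.1}, {"e": 0.8, "c": 0.2}, {"r": 0.9, "n": 0.1}]` and an empty bigram table, `viterbi` returns "rover" (the per-position favorites), while `lexicon_search` against `build_trie(["river", "ripper"])` returns "river", because no path through "ro…" survives the trie.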
12b-3: Tao Hong (1995)
12b-4: Verifying recognition!
12b-5: Lattice-based matchings…
12b-6: Word collocation: the idea
- Given the choice [ripper, rover, river], look at the ten words on each side. If you find “boat”, choose “river”.
- Useful for low (… 80%
- Not very useful for improving recognition that is already highly reliable (may even degrade it)
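The ±10-word idea can be sketched as a vote: each candidate collects the collocation scores it has with the words in its window, and the best-supported candidate wins. The collocation table here is a hypothetical dict of scores keyed by word pairs; `choose` and `colloc_key` are illustrative names, not part of any system described on the slides.

```python
from typing import Dict, List, Tuple

WINDOW = 10  # ten words on each side, as on the slide


def colloc_key(a: str, b: str) -> Tuple[str, str]:
    # collocation is treated as symmetric, so pairs are stored in sorted order
    return (a, b) if a <= b else (b, a)


def choose(cands: List[str], context: List[str], pos: int,
           colloc: Dict[Tuple[str, str], float]) -> str:
    """Pick the candidate for context[pos] best supported by its neighbours."""
    lo, hi = max(0, pos - WINDOW), min(len(context), pos + WINDOW + 1)
    neighbours = [w for i, w in enumerate(context[lo:hi], start=lo) if i != pos]

    def support(cand: str) -> float:
        return sum(colloc.get(colloc_key(cand, w), 0.0) for w in neighbours)

    # candidates come in recognizer order; max() keeps the first one on ties,
    # so with no collocation evidence the recognizer's top choice survives
    return max(cands, key=support)
```

With `context = "we took the boat down the XXX to town".split()`, `pos = 6`, and a table containing only `colloc_key("river", "boat")`, the call `choose(["ripper", "rover", "river"], context, 6, table)` returns "river"; with an empty table it falls back to "ripper", the recognizer's first choice. That fallback is one reason collocation adds little when the top choice is already reliable.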
12b-7: Basis for collocation data
- Word collocation = mutual information: $MI(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}$, where P(x,y) is the probability that x and y occur within a given distance of each other in a corpus, and P(x) (resp. P(y)) is the probability that x (resp. y) occurs in the corpus; probabilities are estimated by frequencies.
- Measure this over a test corpus.
- In the target text, repeatedly re-rank the candidates based on the current top choices until no more changes occur.
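A minimal sketch of estimating MI(x, y) = log2(P(x,y) / (P(x) P(y))) from raw counts, assuming relative frequencies over a corpus of N tokens stand in for all three probabilities and that a pair "co-occurs" when the two words are at most `distance` tokens apart. These normalization choices are assumptions for illustration; `collocation_mi` and `_pair` are hypothetical names.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def _pair(a: str, b: str) -> Tuple[str, str]:
    # co-occurrence is treated as symmetric, so each pair is stored sorted
    return (a, b) if a <= b else (b, a)


def collocation_mi(corpus: List[str], distance: int) -> Callable[[str, str], float]:
    """Return MI(x, y) = log2(P(x,y) / (P(x) P(y))) estimated from word counts."""
    n = len(corpus)
    unigram = Counter(corpus)
    joint: Counter = Counter()
    for i, w in enumerate(corpus):
        # count every later word within `distance` positions of w
        for j in range(i + 1, min(n, i + distance + 1)):
            joint[_pair(w, corpus[j])] += 1

    def mi(x: str, y: str) -> float:
        f_xy = joint[_pair(x, y)]
        if f_xy == 0:
            return float("-inf")  # never seen together within the distance
        # relative frequencies stand in for probabilities; N normalizes all three
        p_xy, p_x, p_y = f_xy / n, unigram[x] / n, unigram[y] / n
        return math.log2(p_xy / (p_x * p_y))

    return mi
```

For the toy corpus `["a", "b", "c", "d"]` with `distance=1`, `mi("a", "b")` is log2(0.25 / (0.25 × 0.25)) = 2.0, while `mi("a", "c")` is negative infinity because "a" and "c" are two tokens apart. Positive values mean the pair co-occurs more often than chance would predict.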
12b-8: Using word collocation via a relaxation algorithm
Example sentence: “Please show me where Hong Kong is!”
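The relaxation step (repeatedly re-rank based on the current top choices until no top choice changes) might look like the sketch below. Assumptions, none of them from the slides: each position's score is the recognizer score plus `weight` times the sum of positive MI values with the neighbours' current top choices; all positions are updated from the previous round's choices; `max_iter` caps the loop. `relax`, `weight`, and `max_iter` are hypothetical names, and `mi` can be any pairwise collocation function.

```python
from typing import Callable, Dict, List


def relax(lattice: List[Dict[str, float]],
          mi: Callable[[str, str], float],
          window: int = 10, weight: float = 1.0, max_iter: int = 20) -> List[str]:
    """Re-rank each position's candidates by collocation with neighbours' top choices."""
    top = [max(cands, key=cands.get) for cands in lattice]  # recognizer's first choices
    for _ in range(max_iter):
        new_top = []
        for i, cands in enumerate(lattice):
            lo, hi = max(0, i - window), min(len(lattice), i + window + 1)
            context = [top[j] for j in range(lo, hi) if j != i]

            def score(word: str) -> float:
                # only positive MI counts as support; -inf or negative MI adds nothing
                support = sum(max(mi(word, v), 0.0) for v in context)
                return cands[word] + weight * support

            new_top.append(max(cands, key=score))
        if new_top == top:  # fixed point: no top choice changed this round
            break
        top = new_top
    return top
```

On a made-up lattice for the slide's sentence, where position 4 is `{"Hong": 0.6, "Hang": 0.4}`, position 5 is `{"King": 0.55, "Kong": 0.45}`, and the only collocation is MI("Hong", "Kong") = 5, the first round flips "King" to "Kong" because "Hong" is already on top; the second round changes nothing, so the loop stops with "Please show me where Hong Kong is". If both words had started on the wrong candidate, this top-choice-only scheme would have no evidence to move either one.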
12b-9: Results on collocation
12b-10: Lattice Parsing
12b-11: Back to the flowchart…
12b-12: Not very encouraging
12b-13: Experimental results (Hong, 1995)
- Word types from WordNet
- Home-grown parser
- Data from the Wall Street Journal and other sources
- Perhaps 80% of sentences could be parsed, though not all correctly
- Cost was substantial: minutes to parse a sentence, given the various candidate word identifications
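The slides do not describe how Hong's home-grown parser worked. As one standard way to parse a sentence "given the various choices of word identification", here is a CYK recognizer over a word lattice: every candidate at a position contributes its part-of-speech tags to the same chart cell, so the combinations are never enumerated one by one. The toy grammar (in Chomsky normal form) and the word-to-tag table are hypothetical stand-ins for a real grammar and for WordNet word types.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parents.
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Hypothetical word -> part-of-speech table (stand-in for WordNet word types).
LEXICON: Dict[str, Set[str]] = {
    "the": {"Det"}, "boat": {"N"}, "river": {"N"}, "rover": {"N"},
    "crossed": {"V"},
}


def lattice_parses(lattice: List[List[str]], start: str = "S") -> bool:
    """CYK recognition over a word lattice: one candidate list per position."""
    n = len(lattice)
    chart: List[List[Set[str]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, cands in enumerate(lattice):
        for word in cands:  # every candidate adds its tags to the same cell
            chart[i][i + 1] |= LEXICON.get(word, set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point of the span
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((left, right), set())
    return start in chart[0][n]
```

For the lattice `[["the"], ["boat", "boaf"], ["crossed"], ["the"], ["river", "rover"]]` the recognizer returns `True`; `[["the"], ["crossed"]]` returns `False`. It only answers whether some choice of candidates is grammatical; recovering which candidates were chosen would need back-pointers in the chart, and a realistic grammar and lexicon would make each cell far larger than in this toy.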