UC Berkeley CS294-9, Fall 2000. Document Image Analysis, Lecture 12b: Integrating other info. Richard J. Fateman, Henry S. Baird. University of California.


1 UC Berkeley CS294-9, Fall 2000, slide 12b-1. Document Image Analysis, Lecture 12b: Integrating other info. Richard J. Fateman, Henry S. Baird. University of California, Berkeley; Xerox Palo Alto Research Center.

2 Srihari/Hull/Choudhari (1982): Merge sources
Bottom-up refinement: transition probabilities at the character sequence level
Top-down process based on searching in a lexicon
Standard (now) presentation of usual methods
– Viterbi algorithm and variations
– Trie representation of dictionary
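The two information sources on this slide can be illustrated with a minimal sketch. It is not the Srihari/Hull/Choudhari implementation: the function names, the toy probabilities, and the `floor` value for unseen events are all assumptions made for illustration. `viterbi` does the bottom-up step with character transition probabilities, and `lexicon_words` does the top-down step by walking a dictionary trie.

```python
import math


def viterbi(candidates, start_logp, trans_logp, floor=-20.0):
    """Most likely character string under a first-order (bigram) model.

    candidates: one dict per character position, {char: log P(image | char)}.
    start_logp: {char: log P(char starts a word)}.
    trans_logp: {(prev, cur): log P(cur | prev)}.
    floor: log-probability assumed for events missing from the tables.
    """
    # best[c] = (score of the best path ending in c, that path)
    best = {c: (start_logp.get(c, floor) + s, c)
            for c, s in candidates[0].items()}
    for position in candidates[1:]:
        nxt = {}
        for cur, emit in position.items():
            score, path = max(
                (prev_score + trans_logp.get((prev, cur), floor), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            nxt[cur] = (score + emit, path + cur)
        best = nxt
    return max(best.values())[1]


def build_trie(words):
    """Nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def lexicon_words(candidates, trie):
    """All lexicon words consistent with the candidates, best score first."""
    results = []

    def walk(i, node, prefix, score):
        if i == len(candidates):
            if None in node:
                results.append((score, prefix))
            return
        for ch, s in candidates[i].items():
            if ch in node:  # prune branches the lexicon cannot complete
                walk(i + 1, node[ch], prefix + ch, score + s)

    walk(0, trie, "", 0.0)
    return sorted(results, reverse=True)


# Toy data (hypothetical): three character positions with two candidates each.
log = math.log
CANDIDATES = [
    {"c": log(0.6), "e": log(0.4)},
    {"a": log(0.5), "o": log(0.5)},
    {"t": log(0.7), "l": log(0.3)},
]
START = {"c": log(0.5), "e": log(0.5)}
TRANS = {
    ("c", "a"): log(0.6), ("c", "o"): log(0.4),
    ("e", "a"): log(0.8), ("e", "o"): log(0.2),
    ("a", "t"): log(0.8), ("a", "l"): log(0.2),
    ("o", "t"): log(0.5), ("o", "l"): log(0.5),
}
LEXICON = build_trie(["eat", "cot", "dog"])
```

With this toy data, `viterbi(CANDIDATES, START, TRANS)` picks the string favoured by the bigram statistics, while `lexicon_words(CANDIDATES, LEXICON)` only returns strings that are actual dictionary entries. The two can disagree, which is why the sources need to be merged.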

3 Tao Hong (1995)

4 Verifying recognition!

5 Lattice-based matchings…

6 Word collocation: the idea
Given the choice [ripper, rover, river], look at the ten words on each side. If you find “boat”, choose “river”.
Useful for low ( [...] 80%
Not too useful for improving highly reliable recognition (may degrade it)
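A minimal sketch of the idea, assuming a hand-made table of collocates (the `COLLOCATES` table, the function name, and the example sentence are hypothetical, not taken from the lecture): count how many known collocates of each candidate appear within the window, and fall back to the recognizer's first choice when the context gives no evidence.

```python
def choose_by_collocation(tokens, index, candidates, collocates, window=10):
    """Pick the candidate for tokens[index] best supported by nearby words.

    collocates: {candidate: set of words known to co-occur with it}.
    Falls back to candidates[0] when no collocate is found in the window.
    """
    lo = max(0, index - window)
    context = tokens[lo:index] + tokens[index + 1:index + 1 + window]
    context = {t.lower() for t in context}
    hits = {c: len(context & collocates.get(c, set())) for c in candidates}
    best = max(candidates, key=lambda c: hits[c])
    return best if hits[best] > 0 else candidates[0]


# Hypothetical collocation table and sentence; "???" marks the uncertain word.
COLLOCATES = {
    "river": {"boat", "bank", "water"},
    "rover": {"mars", "land"},
    "ripper": {"jack"},
}
TOKENS = "the boat drifted down the ??? toward the sea".split()
CHOICE = choose_by_collocation(
    TOKENS, 5, ["ripper", "rover", "river"], COLLOCATES
)
```

Here `CHOICE` is `"river"` because “boat” lies inside the window. With no collocate in range the function simply returns the first candidate.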

7 Basis for collocation data
Word collocation = mutual information: $MI(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}$
P(x,y) is the probability of x and y occurring within a given distance of each other in a corpus. P(x) is the probability of x occurring in the corpus, and likewise P(y) (probability ≈ frequency).
Measure this for a test corpus.
In the target text, repeatedly re-rank based on top choices until no more changes occur.
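The measurement step might look like the sketch below. It assumes the standard pointwise definition of mutual information, log2 of P(x,y) / (P(x) P(y)), and estimates every probability as a count divided by the corpus length. That normalisation, the function name, and the tiny corpus are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(corpus, window=10):
    """Return mi(x, y), estimated from a list of tokens.

    P(x) is approximated by count(x) / N, and P(x, y) by the number of
    times x and y occur within `window` tokens of each other, divided by N.
    """
    n = len(corpus)
    word_count = Counter(corpus)
    pair_count = Counter()
    for i, x in enumerate(corpus):
        for y in corpus[i + 1:i + 1 + window]:
            pair_count[frozenset((x, y))] += 1  # unordered pair

    def mi(x, y):
        joint = pair_count[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # never seen together
        p_xy = joint / n
        p_x = word_count[x] / n
        p_y = word_count[y] / n
        return math.log2(p_xy / (p_x * p_y))

    return mi


# Hypothetical 13-token corpus, far too small for real use.
CORPUS = ("the boat sailed down the river and "
          "the boat reached the river bank").split()
MI = mutual_information(CORPUS, window=3)
```

Calling `MI("boat", "river")` gives a positive score because the two words fall within three tokens of each other twice, while `MI("boat", "bank")` is negative infinity because they never do. A real table would need a large corpus and some smoothing for rare pairs.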

8 Using Word Collocation via Relaxation Algorithm
The sentence is “Please show me where Hong Kong is!”
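One way to sketch a relaxation loop of this kind: each word position holds a ranked candidate list, every candidate is scored against the current top choices of its neighbours, and the lists are re-sorted until a full pass changes nothing. The candidate lists and the score table below are hypothetical (the slide's figure is not in the transcript), and the synchronous update with a round limit is a design assumption, not necessarily Hong's procedure.

```python
def relax(candidate_lists, score, window=10, max_rounds=20):
    """Re-rank candidates until the top choices stop changing.

    candidate_lists: per position, candidates in recognizer order.
    score(a, b): collocation strength between two words.
    """
    ranked = [list(c) for c in candidate_lists]
    for _ in range(max_rounds):
        tops = [c[0] for c in ranked]  # snapshot of current top choices
        changed = False
        for i, cands in enumerate(ranked):
            lo = max(0, i - window)
            neighbours = tops[lo:i] + tops[i + 1:i + 1 + window]
            support = {c: sum(score(c, n) for n in neighbours) for c in cands}
            # sorted() is stable, so ties keep the recognizer's order
            new_order = sorted(cands, key=lambda c: support[c], reverse=True)
            if new_order != cands:
                ranked[i] = new_order
                changed = True
        if not changed:
            break
    return [c[0] for c in ranked]


# Hypothetical recognizer output and collocation scores.
TABLE = {
    frozenset(("show", "me")): 2.0,
    frozenset(("Hong", "Kong")): 5.0,
    frozenset(("Hong", "King")): 0.5,
}


def toy_score(a, b):
    return TABLE.get(frozenset((a, b)), 0.0)


LISTS = [["Please"], ["snow", "show"], ["me"], ["where"],
         ["Hong", "Hung"], ["King", "Kong"], ["is"]]
RESULT = relax(LISTS, toy_score)
```

In this toy run the first pass promotes “show” and “Kong”, the second pass finds nothing left to change, and `RESULT` spells out the intended sentence. The round limit guards against oscillation, which a synchronous update can produce.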

9 Results on collocation

10 Lattice Parsing

11 Back to the flowchart…

12 Not very encouraging

13 Experimental results (Hong, 1995)
Word types from WordNet
Home-grown parser
Data from the Wall St. Journal and other sources
Perhaps 80% of sentences could be parsed, not all correctly
Cost was substantial (minutes) to parse a sentence, given the (various) choices of word identification

