Published by Paul Warren. Modified over 9 years ago.
1
Understanding the Chinese Firewall (continued)
Dr. Crandall, Leif, Tony, Ronnie, Veronika
Review, Chinese Firewall
Maximum Entropy
Point Feature Comparison with Singular Value Decomposition
2
Chinese Firewall
We want to monitor which words and phrases are being censored in China.
We find out which words are being filtered by probing the "Chinese Firewall" with words that are likely to be censored.
Our main problem is finding the words that are likely to be censored.
Challenge: we are dealing with Chinese text, and Chinese characters are not like English letters.
Ex: 馬
3
Maximum Entropy
Used for named entity extraction.
Ex: "Chinese government passes new law" is labeled [Beginning of Named Entity] [End of Named Entity] [other] [other] [Unique Named Entity]
Build a model from a training set; our training set is the Chinese Wikipedia.
The training set needs to have a specific format:
  Assign each word a set of features.
  Label each word as [unique named entity], [other], etc.
Using maximum entropy, we can assign a probability P(named entity) to new words based on the features describing those words.
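The training and scoring steps above can be sketched as a small maximum entropy (multinomial logistic regression) classifier. This is only an illustration under stated assumptions, not the project's program: the two-label set, the `features` function, the English toy sentences, and the training settings are all hypothetical. Capitalization in particular would not apply to Chinese text, where features would have to come from the characters and their context.

```python
import math
from collections import defaultdict

LABELS = ["NE", "other"]  # simplified, hypothetical label set

def features(tokens, i):
    """Hypothetical binary features describing the token at position i."""
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<s>"
    return [f"word={word.lower()}", f"cap={word[0].isupper()}", f"prev={prev.lower()}"]

def probabilities(weights, feats):
    """Maximum entropy model: P(y | x) is proportional to exp(sum of active weights)."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(examples, epochs=200, rate=0.5, l2=0.01):
    """Stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = probabilities(weights, feats)
            for f in feats:
                for y in LABELS:
                    # Gradient: observed feature count minus expected count.
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    weights[(f, y)] += rate * (grad - l2 * weights[(f, y)])
    return weights

# Toy labeled training set (made up for illustration).
SENTENCES = [
    (["Beijing", "announces", "new", "policy"], ["NE", "other", "other", "other"]),
    (["officials", "visit", "Shanghai", "today"], ["other", "other", "NE", "other"]),
    (["Xinhua", "reports", "heavy", "rain"], ["NE", "other", "other", "other"]),
    (["the", "court", "in", "Guangzhou", "ruled"], ["other", "other", "other", "NE", "other"]),
]
EXAMPLES = [
    (features(tokens, i), label)
    for tokens, labels in SENTENCES
    for i, label in enumerate(labels)
]

if __name__ == "__main__":
    model = train(EXAMPLES)
    new_sentence = ["Tibet", "protest", "reported"]
    for i, token in enumerate(new_sentence):
        p = probabilities(model, features(new_sentence, i))
        print(token, round(p["NE"], 3))
```

Words that the model scores with a high P(named entity) would then be candidates for probing.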
4
Once we extract named entities from news sources, we can test whether new words have been added to the "blacklist".
Problem: Chinese text that is similar, but not identical, to the keyword we want to test.
Ex: 法轮功 vs. 法十轮十功
5
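Feature Correspondence by Singular Value Decomposition
Point features
1:1 mapping
SVD
Given the point features in two images I and J, build a proximity matrix G: $G_{ij} = \exp(-r_{ij}^2 / 2\sigma^2)$, where $r_{ij}$ is the distance between feature $I_i$ and feature $J_j$.
SVD of G: $G = T D U'$
Replace D with E, which has ones on its diagonal: $P = T E U'$
$P_{ij}$ determines whether $I_i$ maps to $J_j$: the two are paired when $P_{ij}$ is the largest element in both its row and its column.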
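A minimal NumPy sketch of this pairing step follows. It assumes the Gaussian proximity uses the squared distance, as in the Scott and Longuet-Higgins formulation, and that a pair is accepted when its entry of P dominates both its row and its column. The function name `svd_correspondence`, the sample points, and the value of sigma are made up for illustration.

```python
import numpy as np

def svd_correspondence(points_i, points_j, sigma=1.0):
    """Pair point features of image I with point features of image J."""
    a = np.asarray(points_i, dtype=float)
    b = np.asarray(points_j, dtype=float)
    # r[i, j]: Euclidean distance between feature i of I and feature j of J.
    r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Proximity matrix G (squared distance assumed).
    g = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    # G = T D U'; numpy returns U' directly as the third value.
    t, _d, u_t = np.linalg.svd(g, full_matrices=False)
    # Replacing D by E (ones on the diagonal) gives P = T E U' = T U'.
    p = t @ u_t
    pairs = []
    for i in range(p.shape[0]):
        j = int(np.argmax(p[i]))
        # Accept (i, j) only if P[i, j] is also the largest entry in column j.
        if int(np.argmax(p[:, j])) == i:
            pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    image_i = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
    # The same three points, slightly perturbed and listed in a different order.
    image_j = [(5.2, 0.1), (0.1, 5.1), (0.2, -0.1)]
    print(svd_correspondence(image_i, image_j, sigma=2.0))
```

The parameter sigma controls how quickly proximity falls off with distance, so it should be chosen relative to the expected displacement between corresponding features.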
6
Current Status
We are almost done labeling Chinese Wikipedia to use as a training set for our maximum entropy program.
Chinese character images
Point feature extraction
7
(Near) Future Work
Finish and test our maximum entropy model
Point feature extraction
Ideas: Zip files, relaxation-based pattern matching, segmentation
8
Questions?
Longuet-Higgins, H. Christopher and Scott, Guy L. (1991). An Algorithm for Associating the Features of Two Images. Proc. R. Soc. Lond. B 244, 21-26. doi: 10.1098/rspb.1991.0045
Pilu, Maurizio. (1997). Uncalibrated Stereo Correspondence by Singular Value Decomposition. HP Laboratories Bristol, Digital Media Department, HPL-97-96, August 1997.
Nagasaki, Takeshi, Yanagida, Tadashi, and Nakagawa, Masaki. Relaxation-Based Pattern Matching Using Automatic Differentiation for Off-line Character Recognition.
Borthwick, Andrew, Sterling, John, Agichtein, Eugene, and Grishman, Ralph. Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. New York University.