Understanding the Chinese Firewall (continued) Dr. Crandall, Leif, Tony, Ronnie, Veronika Review, Chinese Firewall Maximum Entropy Point Feature Comparison.


Understanding the Chinese Firewall (continued)
Dr. Crandall, Leif, Tony, Ronnie, Veronika
Review: Chinese Firewall, Maximum Entropy, Point Feature Comparison with Singular Value Decomposition

Chinese Firewall
We want to monitor which words and phrases are being censored in China. We find out which words are being filtered by probing the "Chinese Firewall" with words that are likely to be censored.
- Our main problem is finding the words that are likely to be censored
- Challenge: we are dealing with Chinese text, and Chinese characters are not like English letters. Ex: 馬

Maximum Entropy
Used for Named Entity Extraction
Ex: "Chinese government passes new law"
[Beginning of Named Entity] [End of Named Entity] [other] [other] [Unique Named Entity]
- Build a model from a training set: our training set is the Chinese Wikipedia
- The training set needs to have a specific format:
  - Assign each word a set of features
  - Label each word as a [unique named entity], [other], etc.
- Using Maximum Entropy, we can assign a probability P(named entity) to new words based on features describing those words
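A minimal sketch of the maximum entropy idea on this slide, assuming a single hand-labeled English sentence rather than the Chinese Wikipedia. The features in `featurize`, the short label codes, and the training settings are all hypothetical choices for illustration. The model is a plain log-linear classifier fitted by gradient ascent, not the program used in this project.

```python
import math
from collections import defaultdict

# Short codes for the labels on the slide (the codes themselves are an assumption):
# B = beginning of named entity, E = end of named entity,
# O = other, U = unique named entity.
LABELS = ["B", "E", "O", "U"]

def featurize(words, i):
    """Hypothetical features describing the word at position i."""
    w = words[i]
    return [
        "word=" + w.lower(),
        "capitalized=" + str(w[0].isupper()),
        "prev=" + (words[i - 1].lower() if i > 0 else "<s>"),
    ]

def predict(weights, feats):
    """Return P(label | features) under the log-linear model."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(data, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict(weights, feats)
            for y in LABELS:
                # Gradient: observed indicator minus predicted probability.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights

# Toy training set: the slide's example sentence, with the slide's labels
# applied to the words in the order they are listed.
sentence = ["Chinese", "government", "passes", "new", "law"]
tags = ["B", "E", "O", "O", "U"]
data = [(featurize(sentence, i), tags[i]) for i in range(len(sentence))]
weights = train(data)

# Probability distribution for a word never seen in training.
new_sentence = ["Beijing", "announces", "rules"]
probs_new = predict(weights, featurize(new_sentence, 0))
best_new = max(probs_new, key=probs_new.get)
```

Here `probs_new` gives a probability for every label of the unseen word, driven only by the features it shares with the training data (capitalization and sentence position in this toy case). Real features for Chinese text would need to be different, since Chinese has no capitalization.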

Once we extract named entities from news sources, we can test whether new words are added to the "blacklist"
Problem:
- Chinese text that is similar, but not identical, to the keyword we want to test
- Ex: 法轮功 vs. 法十轮十功
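To make the problem concrete, the sketch below shows that an exact substring test misses the padded variant, while a looser in-order character test catches it. The helper `is_subsequence` is a hypothetical illustration of the mismatch only; it is not the comparison method proposed in these slides, which works on character images.

```python
def is_subsequence(keyword, text):
    """True if the characters of keyword appear in text in the same order."""
    remaining = iter(text)
    # 'ch in remaining' advances the iterator until ch is found (or exhausted).
    return all(ch in remaining for ch in keyword)

keyword = "法轮功"
variant = "法十轮十功"  # the keyword with extra characters inserted

exact_hit = keyword in variant                 # plain substring test
loose_hit = is_subsequence(keyword, variant)   # in-order character test
```

With these inputs `exact_hit` is `False` and `loose_hit` is `True`, which is the gap a similarity-based comparison has to close.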

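Feature Correspondence by Singular Value Decomposition
Point Features, 1:1 mapping, SVD
Given the point features in two images I and J, build a proximity matrix G:
G(ij) = exp(-r(ij)^2 / (2σ^2)), where r(ij) is the distance between feature I(i) and feature J(j)
SVD of G => G = TDU'
Replace D with E, which has ones in place of the singular values: P = TEU'
P(ij) determines whether I(i) maps to J(j): the two features are matched if P(ij) is the largest element in both its row and its column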
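The steps on this slide can be sketched with NumPy as follows. This is a minimal illustration assuming 2D point coordinates and Euclidean distance for r(ij). The function name `svd_correspondence`, the value of `sigma`, and the sample points are assumptions made for the example.

```python
import numpy as np

def svd_correspondence(pts_i, pts_j, sigma=1.0):
    """Match point features of image I to those of image J via the SVD."""
    pts_i = np.asarray(pts_i, dtype=float)
    pts_j = np.asarray(pts_j, dtype=float)

    # Squared distance r(ij)^2 between every feature in I and every feature in J.
    diff = pts_i[:, None, :] - pts_j[None, :, :]
    r2 = np.sum(diff ** 2, axis=2)

    # Proximity matrix G(ij) = exp(-r(ij)^2 / (2 sigma^2)).
    G = np.exp(-r2 / (2.0 * sigma ** 2))

    # G = T D U'; dropping D (setting singular values to 1) gives P = T E U'.
    T, _, U_t = np.linalg.svd(G, full_matrices=False)
    P = T @ U_t

    # Accept (i, j) only if P(ij) is the largest entry in row i and column j.
    matches = []
    for i in range(P.shape[0]):
        j = int(np.argmax(P[i]))
        if int(np.argmax(P[:, j])) == i:
            matches.append((i, j))
    return P, matches

# Hypothetical sample data: J holds the points of I, slightly shifted and reordered.
points_i = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
points_j = [(5.3, 0.2), (0.3, 5.2), (0.3, 0.2)]
P, matches = svd_correspondence(points_i, points_j, sigma=1.0)
```

For this sample the well-separated points give `matches == [(0, 2), (1, 0), (2, 1)]`. The choice of `sigma` controls how far apart two features may be and still attract each other, so it would need tuning for real character images.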

Current Status
- We are almost done labeling Chinese Wikipedia to use as a training set for our maximum entropy program
- Chinese character images
- Point feature extraction

(Near) Future Work
- Finish and test our maximum entropy model
- Point feature extraction
- Ideas: Zip files, Relaxation-based pattern matching, Segmentation

Questions?
- Longuet-Higgins, H. Christopher and Scott, Guy L. (1991). An Algorithm for Associating the Features of Two Images. Proc. R. Soc. Lond. B 244, doi: /rspb
- Pilu, Maurizio. (1997). Uncalibrated Stereo Correspondence by Singular Value Decomposition. HP Laboratories Bristol, Digital Media Department, HPL , August 1997
- Nagasaki, Takeshi, Yanagida, Tadashi, Nakagawa, Masaki. (). Relaxation-Based Pattern Matching Using Automatic Differentiation for Off-line Character Recognition
- Borthwick, Andrew, Sterling, John, Agichtein, Eugene, Grishman, Ralph. (). Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. New York University.