Identifying Image Spam Authorship with a Variable Bin-width Histogram-based Projective Clustering

Song Gao, Chengcui Zhang, Wei Bang Chen
Department of Computer and Information Sciences, The University of Alabama at Birmingham
{gaos, zhang,

In this paper we present a two-phase spam image clustering framework. The framework performs a histogram-based projective clustering on visual features in the first phase, followed by a text-based clustering in the second phase. This study makes three contributions. First, we address the complex nature of spam image obfuscation techniques. Second, we develop a multi-clue framework to profile spam images from common spamming sources, which provides evidence for tracking spam gangs. Third, projective clustering eliminates the need to choose among distance metrics for clustering analysis, while systematically exploring the subspaces that correspond to clusters.

1. Introduction and Motivation

"Image spam is a kind of spam where the message text of the spam is presented as a picture in an image file" – Wikipedia.

The occurrence rate of spam images among all spam e-mails is more than 30%.

Challenges:
Spam images may look similar, but essentially they are not.
Wavy images defeat text recognition algorithms such as optical character recognition (OCR).

Current state of anti-spam: filtering techniques, such as text classification and image classification.

Disadvantages: filtering CANNOT tell the origins of spam.

Goal: Provide scientific evidence of the origins of spam. Assist in tracking down the common sources of spam based on spam image clustering.

[Figure: spam images clustered into Group 1, Group 2, and Group 3]

2. Multi-clue Framework

A histogram-based clustering framework:
1. Image preprocessing
   a. Wavy correction.
   b. Spam image segmentation: foreground and background.
2. Feature extraction
   Color features: 6-bit color-code histogram.
   Texture features: histogram of gradient direction, with each bin representing k degrees out of 360 degrees.
   Layout features: proportion of foreground object pixels in each cell of a 9-cell grid.
   Text contents: recognized by performing OCR.
3. Two-phase clustering
   a. Histogram-based projective clustering on visual features.
   b. Text-based clustering on extracted text information.

3. Wavy Image Correction

To extract the embedded text from wavy images, each vertical line must be realigned to its correct position. Two approaches are proposed to find the guideline on which the realignment is based:

Edge-based method: Curved lines that were originally horizontal in the undistorted image serve as the guideline for image correction.

Color-matching method: This approach finds the best color match between two adjacent vertical lines by fixing one line and slightly shifting the other line upward or downward.

[Figure panels: Edge; Color]

4. Projective Clustering

A histogram-based projective clustering algorithm, REVBH (Relative Entropy on Variable Bin-width Histogram):
1. Construct a variable bin-width histogram for each k-dimensional subspace (e.g., k = 2).
2. Detect dense areas iteratively in each histogram using our proposed density threshold.
3. Convert each object into a signature that describes how that data object is projected into the different subspaces.
4. Merge similar object signature entries.
5. Assign data objects to their corresponding clusters.

[Figure: object O1 and its signature]

Each dimension is partitioned using the original histogram and the equalized histogram. The bin width of each sub-range along a dimension is determined by Freedman and Diaconis's rule or Scott's rule:

$$h = \max\{2 \cdot \mathrm{IQR} \cdot n^{-1/3},\; 3.5 \cdot \sigma \cdot n^{-1/3}\}$$

Dense bins are detected with a relative entropy metric:

$$h_{r\_low}(x) \le \frac{1}{T} H_r(X) \le h_{r\_high}(x)$$

where $h_r(x)$ and $H_r(X)$ denote the relative entropy of a single bin and of its corresponding k-dimensional histogram, respectively.

5. Experimental Results

The dataset contains 2100 spam images, including 37 wavy images, with 476 classes labeled manually. All feature values are normalized to z-scores. Clustering results are evaluated by V-measure and by the number of clusters produced.

[Table: Effectiveness of wavy image correction. Columns: Dataset; V1-measure; Cluster # (class #: 476). Rows: With corrected wavy images; With original wavy images.]

[Figure: Performance comparison between the proposed approach and hierarchical clustering]

[Figure: a) Original image; b) Foreground mask after segmentation; c) Resized illustration mask for layout feature extraction]
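The poster lists a 6-bit color-code histogram as the color feature but does not say how the code is built. The sketch below assumes one common choice: keep the two most significant bits of each of the R, G, and B channels, giving 64 bins. The function names `color_code` and `color_histogram` are hypothetical, and the actual encoding used by the authors may differ.

```python
def color_code(r, g, b):
    """Map an 8-bit RGB triple to a 6-bit code (assumed: top 2 bits per channel)."""
    return ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)


def color_histogram(pixels):
    """Return a 64-bin histogram of color codes, normalized to sum to 1.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    """
    pixels = list(pixels)
    hist = [0] * 64
    for r, g, b in pixels:
        hist[color_code(r, g, b)] += 1
    total = len(pixels)
    if total == 0:
        return [0.0] * 64
    return [count / total for count in hist]
```

Normalizing by the pixel count makes histograms comparable across images of different sizes, which matters when the values are later clustered together.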
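The color-matching method for wavy image correction fixes one vertical line and shifts its neighbor up or down until the colors match best. A minimal sketch of that idea follows, under several assumptions not stated on the poster: the image is grayscale and stored as a list of columns, the match cost is the mean absolute difference over overlapping rows, and vacated pixels are padded with a background value. The names `shift_column`, `best_shift`, and `correct_wavy` are hypothetical.

```python
def shift_column(col, s, fill):
    """Shift a column down by s pixels (up if s < 0), padding with `fill`."""
    h = len(col)
    return [col[r - s] if 0 <= r - s < h else fill for r in range(h)]


def best_shift(fixed, moving, max_shift):
    """Find the shift of `moving` that best matches `fixed`."""
    h = len(fixed)
    if max_shift >= h:
        raise ValueError("max_shift must be smaller than the column height")
    best, best_cost = 0, float("inf")
    # Try small shifts first so that ties favor the least movement.
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        rows = [r for r in range(h) if 0 <= r - s < h]
        cost = sum(abs(fixed[r] - moving[r - s]) for r in rows) / len(rows)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


def correct_wavy(columns, max_shift=3, fill=255):
    """Realign each column against its already-corrected left neighbor."""
    corrected = [list(columns[0])]
    for col in columns[1:]:
        s = best_shift(corrected[-1], col, max_shift)
        corrected.append(shift_column(col, s, fill))
    return corrected
```

Matching each column against the corrected neighbor, not the original one, lets small per-column shifts accumulate into the larger displacement a wave can produce.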
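The bin-width rule under Projective Clustering takes the larger of the Freedman and Diaconis width (2 × IQR × n^(-1/3)) and the Scott width (3.5 × σ × n^(-1/3)). A small sketch using only the Python standard library is given below. The poster does not say which quartile convention or which standard deviation estimator is used, so the defaults of `statistics.quantiles` and the sample estimator `statistics.stdev` are assumptions, as is the function name `bin_width`.

```python
import statistics


def bin_width(values):
    """Return max(2*IQR*n^(-1/3), 3.5*sigma*n^(-1/3)) for a 1-D sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
    iqr = q3 - q1
    sigma = statistics.stdev(values)  # sample standard deviation (assumed)
    scale = n ** (-1.0 / 3.0)
    freedman_diaconis = 2.0 * iqr * scale
    scott = 3.5 * sigma * scale
    return max(freedman_diaconis, scott)
```

Taking the maximum keeps the width positive when the IQR collapses to zero, for example when most values in a sub-range are identical and only a few outliers differ.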
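The experiments score clusterings with V-measure, the harmonic mean of homogeneity (each cluster holds one class) and completeness (each class falls in one cluster). The sketch below computes it from two label lists with equal weighting of the two terms, which is presumably what the "V1-measure" column refers to; that reading is an assumption, and `v_measure` is a hypothetical name, not the authors' code.

```python
from collections import Counter
from math import log


def _entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its part sizes."""
    return -sum(c / n * log(c / n) for c in counts if c)


def v_measure(classes, clusters):
    """V-measure of a clustering against ground-truth classes (beta = 1)."""
    n = len(classes)
    if n == 0 or n != len(clusters):
        raise ValueError("label lists must be non-empty and of equal length")
    joint = Counter(zip(classes, clusters))
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    h_c = _entropy(class_counts.values(), n)
    h_k = _entropy(cluster_counts.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum(v / n * log(v / cluster_counts[k])
                       for (c, k), v in joint.items())
    h_k_given_c = -sum(v / n * log(v / class_counts[c])
                       for (c, k), v in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2.0 * homogeneity * completeness / (homogeneity + completeness)
```

Because the score depends only on how items are grouped, relabeling the clusters does not change it, so it can compare runs that produce different numbers of clusters against the 476 manually labeled classes.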