Detecting Sexually Provocative Images


Detecting Sexually Provocative Images Debashis Ganguly, Mohammad H. Mofrad, Adriana Kovashka Department of Computer Science, University of Pittsburgh Hi, I am Debashis Ganguly from the University of Pittsburgh. This is joint work with my colleague Mohammad and Professor Kovashka.

Real Life Challenges There is an overwhelming amount of visual data on the Internet. Parents may want to restrict the visual content their children can see, and digital content administrators invest a lot of manual effort in classifying images into age-restricted categories. The abundance of visual content available on the Internet, and the easy access all users have to it, pose many challenges to the content management process. For example, if a parent wants to restrict what his or her child can see, web content needs to be tagged as offensive or not. Currently, automatic filtering is far from perfect and often allows offensive imagery into the “safe zone” results. Thus, it takes a lot of manual effort for web content owners to classify images under age-restricted categories.

Limitations of Existing Approaches Existing approaches detect pornographic content based on the percentage of skin area exposed by the subjects in such images. Skin detection algorithms are among the most basic methods for filtering sexual content in images. ***Researchers have been working on this problem since the early 2000s: Jiao et al., “Detecting adult image using multiple features”, Info-tech and Info-net 2001; Duan et al., “Adult image detection method based on skin color model and support vector machine”, Asian Conference on Computer Vision 2002; Zheng et al., “Shape based adult image detection”, International Journal on Image and Graphics 2006; Lee et al., “Naked image detection based on adaptive and extensible skin color model”, Pattern Recognition 2007.

Limitations of Existing Approaches (contd.) Current methods cannot differentiate between pornographic content, a portrait, or a harmless body shot like those below. Existing work pays little attention to the intent behind the image composition and the goals of the photographic subject with respect to how the photo should be perceived. These methods cannot differentiate between “naked” subjects and “sexual” subjects. For example, they would fail to distinguish between a portrait like the “Mona Lisa”, a harmless photo of a bodybuilder, and a person posing with sexual intent. ***This is because their assumptions mostly rely on the skin ratio they extract from the image.

Approach: Identifying Features 17 types of attributes, composed from: posture and gesture (posture, gestures with fingers, movement, head position, direction of body and face relative to the camera, etc.); facial expression (mouth open or closed, type of smile, biting lips, eyebrows, eyelids, looking direction); scene context (outdoor scenes, outdoor events, indoor scenes with props or with a flat background); and skin exposure (fully clothed, bare bodied, private body parts exposed). 5 types of moods and emotions: defensive, suggestive, playful, relaxed, upset. 3 sexual intent labels: yes, maybe, no. We identify 17 different attributes based on posture, gesture, facial expression, scene context and skin exposure; earlier work suggests these can help identify the intent behind images. We also identify the moods and emotions of the subject to predict sexual intent.

Hierarchical Framework [Diagram: automatically extracted features (Color/SIFT/HOG/FC6/FC7/FC8) feed attribute classifiers for posture and gesture, facial expressions, skin exposure, and image background; attributes feed moods and emotions; moods feed the sexual intent labels yes/maybe/no.] We propose a hierarchical framework in which low-level features such as CNN FC7 (using Caffe) or SIFT (using VLFeat) are extracted from the input images. A set of multi-class SVMs is then trained on these extracted features to learn each attribute (recall we have 17 attributes), then from attributes to moods and emotions, and finally from moods to the global sexual intent classes. Knowledge is propagated through the layers of the hierarchical framework. 1st-level SVMs are trained on automatic features (e.g. CNN FC7, SIFT) to predict attributes. 2nd-level SVMs are trained on the 17 attributes to predict moods and emotions. 3rd-level SVMs are trained on the 5 moods and emotions to predict global sexual intent.
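The three-level cascade described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it uses scikit-learn's SVC, random stand-in data in place of real FC7 features and annotations, and for brevity trains and predicts on the same samples (a real pipeline would use separate train/test splits).

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in data (all shapes assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))              # placeholder for FC7 features
attrs = rng.integers(0, 3, size=(200, 17))  # 17 attribute labels per image
moods = rng.integers(0, 2, size=(200, 5))   # 5 mood/emotion labels
intent = rng.integers(0, 3, size=200)       # yes / maybe / no

# Level 1: one multi-class SVM per attribute, trained on low-level features.
attr_svms = [SVC().fit(X, attrs[:, i]) for i in range(17)]
A = np.column_stack([m.predict(X) for m in attr_svms])

# Level 2: one SVM per mood, trained on the predicted attributes.
mood_svms = [SVC().fit(A, moods[:, j]) for j in range(5)]
M = np.column_stack([m.predict(A) for m in mood_svms])

# Level 3: a single SVM from predicted moods to global sexual intent.
intent_svm = SVC().fit(M, intent)
pred = intent_svm.predict(M)
print(pred.shape)  # one intent label per image
```

The design point the sketch captures is that each level consumes the *predictions* of the previous level, so knowledge propagates through the hierarchy rather than every classifier seeing raw pixels.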

Experiments: Dataset 1,146 celebrity images of 203 Hollywood celebrities from people.com: 892 images of female and 254 of male candidates, about 5.6 images per person. 19 annotation questions per image, answered on Amazon Mechanical Turk by majority voting over 3 annotators per image, with 70.5% annotator consensus. We introduce a new dataset of 1,146 images. The novelty of the dataset lies in its annotations: we identified 19 relevant questions that help develop insight into the sexual intent behind images.

Experiments: Baselines Automatically extracted features: low-level features (color histogram, SIFT, HOG using VLFeat) and CaffeNet features (FC6, FC7, FC8 using Caffe). Direct model: a single level of classification hierarchy trained from automatically extracted features to predict sexual intent. Joo et al., “Visual persuasion: inferring communicative intents of images”, CVPR 2014: a subset of its features mapped based on relevance to our problem domain. This is a method for predicting the intent behind photographs of politicians.
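As an illustration of what the simplest of the low-level features looks like, here is a toy color-histogram descriptor in NumPy. The slide's pipeline uses VLFeat for its features; this stand-in (function name, bin count, and toy image all assumed) only shows the general idea of turning an image into a fixed-length vector.

```python
import numpy as np

def color_histogram(img, bins=8):
    """img: HxWx3 uint8 array; returns an L1-normalized 3*bins vector."""
    feats = []
    for ch in range(3):
        # Per-channel intensity histogram over the full 0..255 range.
        h, _ = np.histogram(img[..., ch], bins=bins, range=(0, 256))
        feats.append(h)
    v = np.concatenate(feats).astype(float)
    return v / v.sum()

img = np.zeros((4, 4, 3), dtype=np.uint8)  # toy all-black image
f = color_histogram(img)
print(f.shape)  # (24,)
```

For an all-black image, all mass lands in the first bin of each channel, so the vector has exactly three nonzero entries of 1/3 each.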

Results: Overview In general, the hierarchical framework has a higher F-measure than the corresponding direct model. FC7 performs the best among all low-level features and the baseline of Joo et al. ***Precision = TP / (TP + FP) ***Sensitivity (true positive rate) = Recall = TP / P = TP / (TP + FN) ***F-measure = (2 * Precision * Recall) / (Precision + Recall)

Results: Overview In general, the hierarchical framework has accuracy comparable to the corresponding direct model. ***Accuracy = (TP + TN) / (TP + TN + FP + FN) ***From the results, the hierarchical framework's accuracy is comparable to the direct approach, but at the same time it has a higher F-measure compared to the direct model, which shows the superiority of the hierarchical framework for the task of sexual content screening.

Results: Overview In general, the hierarchical framework has higher sensitivity than the corresponding direct model. Higher sensitivity means more of the sexually provocative images are caught. ***Sensitivity (true positive rate) = Recall = TP / P = TP / (TP + FN)

Results: Overview In general, the hierarchical framework has lower specificity than the corresponding direct model. Lower specificity means more false positives, i.e. non-provocative images classified as sexually provocative. In a real-world application of this model, lower specificity imposes a manual refinement step to sort non-sexual content out of everything classified as sexual, which is less problematic than lower sensitivity. ***Specificity (true negative rate) = TN / N = TN / (TN + FP)
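The metrics quoted across the results slides can be collected into one small helper. This is a generic sketch of the standard definitions (the confusion-matrix counts in the example are made up, not the paper's results):

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                    # sensitivity, true positive rate
    specificity = tn / (tn + fp)               # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, accuracy, f_measure

# Example with an arbitrary confusion matrix:
p, r, s, a, f = metrics(tp=40, fp=10, tn=35, fn=15)
print(round(f, 3))  # → 0.762
```

The trade-off the slides describe falls straight out of these formulas: raising recall (catching more provocative images) generally costs some specificity (more harmless images flagged), and the authors argue the latter is the cheaper error to clean up manually.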

Conclusion Our method enables automated content classification based on the behaviors and intents of the portrayed subjects. It allows prompt intervention by human experts when the proposed methodology is integrated with mobile apps, social media websites, and media streaming websites. Please come see our poster for more info.