The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions Tao Zhu 1 ; David Phipps 2 ; Adam Pridgen 3 ; Jedidiah R. Crandall 4 ;

Presentation transcript:

The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions
Tao Zhu (1), David Phipps (2), Adam Pridgen (3), Jedidiah R. Crandall (4), Dan S. Wallach (3)
(1) Independent Researcher, (2) Bowdoin College, (3) Rice University, (4) University of New Mexico
22nd USENIX Security Symposium (USENIX Security '13)
左昌國, 2013/09/10, ADLab, CSIE, NCU

Outline
Introduction, Methodology, Hypotheses, Topic Extraction, Discussion, Conclusion

Introduction
Microblogs in China are known as weibo. Sina Weibo had 503 million registered users as of December 2012, with 100 million messages sent daily.
Microblogs promote the visibility of social issues.
China employs both backbone-level filtering of IP packets and higher-level filtering implemented in software.
Much prior work focuses on how content is filtered and what is filtered. This paper focuses on how quickly microblog posts are removed.

Introduction: Contributions
An implementation of a method that detects a censorship event within 1-2 minutes of its occurrence.
An attempt to understand how Weibo can react so quickly when deleting posts with sensitive content, framed as 4 hypotheses.
A topical analysis that copes with the neologisms, named entities, and informal language common in Chinese posts.

Methodology
The methodology has three steps: identifying the sensitive user group, crawling the posts of that group, and detecting deletions.

Methodology – Identifying the Sensitive User Group
Search for outdated sensitive keywords listed by China Digital Times (sensitive-words-grass-mud-horse-list/), using keywords such as “党产共”.
Start with 25 sensitive users, selected manually.
Diagram labels: 25 sensitive users; > 5 reposts for each user; > 5 deletions.
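The expansion step can be read as a form of snowball sampling. The following is a minimal sketch of one possible reading, not the authors' implementation: it assumes a candidate is added when group members have reposted them more than 5 times and more than 5 of their posts have been seen deleted. Requiring both thresholds together is an assumption, since the slide shows only the two labels, and the input structures (`reposts`, `deletions`) are hypothetical.

```python
from collections import Counter

REPOST_THRESHOLD = 5    # from the "> 5 reposts" label on the slide
DELETION_THRESHOLD = 5  # from the "> 5 deletion" label on the slide


def expand_sensitive_group(seeds, reposts, deletions):
    """Return the seed users plus candidates that pass both thresholds.

    reposts:   iterable of (reposter, original_author) pairs (hypothetical input).
    deletions: dict mapping user -> number of that user's posts seen deleted.
    Combining the two thresholds with AND is an assumption of this sketch.
    """
    group = set(seeds)
    # Count how often current group members repost each author outside the group.
    reposted_by_group = Counter(
        author
        for reposter, author in reposts
        if reposter in group and author not in group
    )
    for author, count in reposted_by_group.items():
        if count > REPOST_THRESHOLD and deletions.get(author, 0) > DELETION_THRESHOLD:
            group.add(author)
    return group
```

Running the function repeatedly, feeding each result back in as `seeds`, would grow the group round by round in the way the slide's diagram suggests.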

Methodology – Identifying the Sensitive User Group
The sensitive group reached 3567 users after 15 days.
More than 4500 post deletions daily.
1500 “permission denied” posts.
12% of all posts from the group were eventually deleted.
This methodology cannot yield a representative sample of Weibo as a whole.

Methodology – Crawling
User timeline: the Weibo user timeline API returns the 50 most recent posts of the specified user.
Each of the 3567 sensitive users is queried once per minute.
100 accounts are used for API calls.
300 concurrent Tor circuits.
A four-node cluster runs Hadoop and HBase.
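To give a sense of the load these numbers imply, here is a small sketch that spreads the users across the API accounts. The slide does not say how queries were distributed, so the round-robin assignment and the function name are assumptions made purely for illustration.

```python
def assign_users_to_accounts(user_ids, n_accounts):
    """Round-robin partition so each account polls a near-equal share of users.

    The round-robin policy is an assumption; the slide only gives the totals.
    """
    buckets = [[] for _ in range(n_accounts)]
    for index, user_id in enumerate(user_ids):
        buckets[index % n_accounts].append(user_id)
    return buckets


users = list(range(3567))                       # placeholder user IDs
buckets = assign_users_to_accounts(users, 100)  # 100 API accounts
largest_share = max(len(bucket) for bucket in buckets)
calls_per_second = len(users) / 60              # one query per user per minute
```

Under this assumption each account issues at most 36 timeline calls per minute, and the crawler as a whole sustains roughly 59 calls per second.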

Methodology – Detecting Deletions
If a post is in the database but is not returned by Weibo, issue a secondary query for that post to determine which error message is returned.
Permission-denied (system) deletion: the “Permission denied” error. It is caused by a censorship event. The post still exists but cannot be accessed by users.
General deletion: the “Post does not exist” error. It may be caused by the user deleting the post or by a censorship event. The post no longer exists.
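The detection rule above can be sketched as a set difference followed by a per-post lookup. This is an illustrative sketch only: `lookup_error` is a hypothetical stand-in for the secondary query, and the error strings are placeholders rather than the exact values returned by Weibo.

```python
PERMISSION_DENIED = "permission denied"  # placeholder error string
NOT_EXIST = "post does not exist"        # placeholder error string


def detect_deletions(stored_ids, returned_ids, lookup_error):
    """Classify stored posts that are missing from the latest timeline response.

    lookup_error: hypothetical callable(post_id) -> error string, or None when
    the post can still be fetched (for example, it only aged out of the timeline).
    """
    results = {}
    for post_id in sorted(set(stored_ids) - set(returned_ids)):
        error = lookup_error(post_id)
        if error == PERMISSION_DENIED:
            results[post_id] = "system deletion"
        elif error == NOT_EXIST:
            results[post_id] = "general deletion"
        else:
            results[post_id] = "still accessible"
    return results
```

The third outcome matters because the timeline API returns only the 50 most recent posts, so an older post can drop out of the response without having been deleted.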

Methodology – Detecting Deletions
This paper focuses on system deletions, which are apparently not performed by users.
From July 2012 to September 2012, 2.38 million posts were collected, with a 12.8% total deletion rate (4.5% system deletions and 8.3% general deletions).
The lifetime of a post is the difference between the time the system detected the deletion and the time the post was created.
The measurement fidelity is on the order of minutes.
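The lifetime definition translates directly into a timestamp subtraction. The sketch below also adds a helper for the share of deleted posts whose lifetime falls under a given limit; both function names are illustrative and do not come from the paper.

```python
from datetime import datetime, timedelta


def post_lifetime(created_at, deletion_detected_at):
    """Lifetime = time the deletion was detected minus the creation time."""
    return deletion_detected_at - created_at


def fraction_deleted_within(lifetimes, limit):
    """Share of deleted posts whose lifetime is at most `limit`."""
    if not lifetimes:
        return 0.0
    return sum(1 for lifetime in lifetimes if lifetime <= limit) / len(lifetimes)


created = datetime(2012, 7, 1, 12, 0)          # made-up example timestamps
detected = created + timedelta(minutes=7)
example_lifetime = post_lifetime(created, detected)
```

Because deletion is only observed at the next crawl, a lifetime computed this way overestimates the true value by up to one polling interval, which is why the fidelity is stated in minutes.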

Distribution of Deleted Posts (figure)

Hypotheses
How can the Weibo system find sensitive posts and remove them so quickly?
How do moderators locate sensitive posts in the huge database after a month has passed?
Weibo uses different strategies to target sensitive content.

Hypotheses
Hypothesis 1: Weibo has filtering mechanisms as a proactive, automated defense.
Explicit filtering
Implicit filtering
“shishikanfalunhowle”
Camouflaged posts

Hypotheses
Hypothesis 2: Weibo targets specific users, such as those who frequently post sensitive content.

Hypotheses
Hypothesis 3: When a sensitive post is found, a moderator will use automated searching tools to find all of its related reposts (parent, child, etc.) and delete them all at once.
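As an illustration of what searching for "parent, child, etc." could involve, the sketch below collects every post in the same repost tree as a flagged post. Nothing is known here about Weibo's internal tools, so the `parent_of` mapping and the function itself are purely hypothetical.

```python
from collections import deque


def related_reposts(post_id, parent_of):
    """Collect every post in the same repost tree as post_id.

    parent_of: dict mapping a repost's ID to the ID of the post it reposts
    (a hypothetical structure; original posts do not appear as keys).
    """
    # Invert the mapping so each post lists its direct reposts.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    # Climb from the flagged post to the root of its tree.
    root = post_id
    visited = {root}
    while root in parent_of and parent_of[root] not in visited:
        root = parent_of[root]
        visited.add(root)

    # Breadth-first walk down from the root to gather the whole tree.
    found = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found
```

If deletions worked this way, all posts in one tree would disappear at nearly the same moment regardless of when each was created, which is a pattern observable from the outside.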

Hypotheses
Hypothesis 4: Deletion speed is related to the topic. That is, particular topics are targeted for deletion based on how sensitive they are.
Main 5 topics: Qidong, Qian Yunhui, Beijing Rainstorm, Diaoyu Island, Group Sex.

Topic Extraction
Automatic methods are needed to classify the posts.
TF*IDF: assigns weights to the terms (n-grams) of a document.
Pointillism approach [27]: reconstructs words and phrases from n-grams using external information.
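A minimal sketch of TF*IDF weighting over character n-grams follows. It uses the textbook form, term frequency times log(N / document frequency), with bigrams by default. The slide does not give the exact weighting variant or n-gram length used in the paper, so both are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tf_idf(documents, n=2):
    """Return one {ngram: weight} dict per document.

    weight = (count / n-grams in the document) * log(N / documents containing it).
    This is the plain textbook variant, assumed here for illustration.
    """
    grams_per_doc = [Counter(char_ngrams(doc, n)) for doc in documents]
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(grams.keys())  # each document counts once per n-gram
    total_docs = len(documents)
    weights = []
    for grams in grams_per_doc:
        length = sum(grams.values())
        weights.append({
            gram: (count / length) * math.log(total_docs / doc_freq[gram])
            for gram, count in grams.items()
        })
    return weights
```

Working on character n-grams rather than segmented words avoids relying on a dictionary, which is useful when posts contain neologisms that a word segmenter would not recognize.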

Topic Extraction
李 W 阳 (Li Wangyang, from 李旺阳)
六圌四 (June Fourth, from 六四)
胡 () 涛 (Hu Jintao, from 胡锦涛)
启 - 东, 启 \ 东 and 启 / 东 (Qidong, from 启东)

Topic Extraction
Which of these topics have been discussed for the longest period of time?
Independent Component Analysis (ICA)
Beijing, government, China, country, policeman, and people: these 6 terms appear in almost every individual topic.

Discussion – Filtering Mechanisms
Proactive mechanisms: Hypothesis 1.
Backwards reposts search: Hypothesis 3, deletion of chains of reposts.
Backwards keyword search: similar to Hypothesis 3, deletion of posts with related keywords. Examples: 兲朝 37 人
Monitoring specific users: Hypothesis 2.

Discussion – Filtering Mechanisms
Account closures: 300 user accounts closed.
Search filtering
Public timeline filtering
User credit points: users can report sensitive or rumor-based posts to earn points.

Discussion – Time-of-day Behavior (two figure slides)

Conclusion
Deletions happen most heavily in the first hour.
90% of the deletions happen within the first 24 hours.
The 4 hypotheses.