
ConceptDoppler: A Weather Tracker for Internet Censorship
Presenter: 장 공 수


1 ConceptDoppler: A Weather Tracker for Internet Censorship
Presenter: 장 공 수

2 Hanyang Univ. Computer Security Lab.
Paper Information
Title: ConceptDoppler: A Weather Tracker for Internet Censorship
Authors: Jedidiah R. Crandall, Daniel Zinn, Michael Byrd
Published: ACM 2007

3 Contents
1. INTRODUCTION
2. PROBING THE GFC
3. LSA-BASED PROBING
4. FUTURE WORK
5. CONCLUSIONS

4 1. Introduction (1/3)
■ Internet Censorship in China
Called the “Great Firewall of China” (GFC) or “Golden Shield”
– IP address blocking
– DNS redirection
– Legal restrictions
– etc.
– Keyword filtering: blog servers, chat, HTTP traffic
All probing can be performed from outside of China.

5 1. Introduction (2/3)
■ This Research Has Two Parts
Where is the keyword filtering implemented?
– Internet measurement techniques to locate the filtering routers
What words are being censored?
– Efficient probing via document summary techniques

6 1. Introduction (3/3)
■ Keyword-based Censorship
● The ability to filter keywords is an effective tool for governments that censor the Internet.
- Censorship comprises numerous techniques, including IP address blocking, DNS redirection, and a myriad of legal restrictions, but the ability to filter keywords in URL requests or HTML responses allows a high granularity of control that achieves the censor’s goal at low cost. (※ Manually filtering web content can also be precise but is prohibitively expensive.)
● Censorship is an economic activity.
- The Internet has economic benefits, and methods of censorship blunter than keyword filtering, such as blocking entire web sites or services, decrease those benefits.
ex) While the Chinese government has shut down e-mail service for entire ISPs, temporarily blocked Internet traffic from overseas universities, and could conceivably stop any flow of information, it has also been responsive to complaints about censorship from Chinese citizens.

7 2. Probing the GFC (1/5)
■ ConceptDoppler’s Infrastructure
They use the netfilter module Queue to capture all packets elicited by probes.
They access these packets in Perl and Python scripts, using SWIG to wrap the system library libipq.
They recorded all packets sent and received, in their entirety, in a PostgreSQL database.
Their experiments require the construction of TCP/IP packets. For this they used Scapy, a Python library for packet manipulation.
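The "record every packet in its entirety" step can be pictured with a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for the PostgreSQL database named on the slide, and the table layout and the helper names `open_packet_log` and `record_packet` are hypothetical, not taken from the paper.

```python
import sqlite3
import time

def open_packet_log(path=":memory:"):
    """Open (or create) a packet log. sqlite3 stands in for PostgreSQL here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packets ("
        " id INTEGER PRIMARY KEY,"
        " ts REAL NOT NULL,"
        " direction TEXT NOT NULL CHECK (direction IN ('sent', 'received')),"
        " raw BLOB NOT NULL)"  # the whole packet, not just a summary
    )
    return conn

def record_packet(conn, direction, raw, ts=None):
    """Store one packet in its entirety with a timestamp and direction."""
    stamp = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO packets (ts, direction, raw) VALUES (?, ?, ?)",
        (stamp, direction, raw),
    )
    conn.commit()

# Made-up payloads standing in for captured packets.
conn = open_packet_log()
record_packet(conn, "sent", b"GET /?q=TEST HTTP/1.1\r\n\r\n")
record_packet(conn, "received", b"HTTP/1.1 200 OK\r\n\r\n")
```

Keeping the raw bytes rather than parsed fields means later analyses can re-parse the packets without repeating the probes.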

8 2. Probing the GFC (2/5)
■ The GFC Does Not Filter Peremptorily at All Times
Target: They launched probes against www.yahoo.cn for 72 hours.
Method
- They started by sending “FALUN” (a known filtered keyword) until they received RSTs from the GFC, at which point they switched to “TEST” (a word known not to be filtered) until they got a valid HTTP response to their GET request.
- After each test that provoked a RST, they waited 30 seconds before probing with “TEST”; after tests that did not trigger RSTs, they waited 5 seconds and then probed with “FALUN”.
Slipping Filtered Keywords Through
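The alternating probe schedule described in the Method can be sketched as a small loop. This is a minimal sketch, not the authors' script: `run_probe_cycle`, `send_probe`, and `sleep` are hypothetical names, and the network is replaced by an injected function so the logic can be run without sending any traffic.

```python
def run_probe_cycle(send_probe, sleep, max_probes):
    """Alternate between "FALUN" and "TEST" as described on the slide.

    send_probe(keyword) -> True if the probe elicited a RST (assumed interface).
    sleep(seconds)      -> waits; injected so the loop can be simulated.
    """
    keyword = "FALUN"  # start with the known filtered keyword
    log = []
    for _ in range(max_probes):
        got_rst = send_probe(keyword)
        log.append((keyword, got_rst))
        if got_rst:
            sleep(30)          # after a RST: wait 30 s, then probe with TEST
            keyword = "TEST"
        else:
            sleep(5)           # no RST: wait 5 s, then probe with FALUN
            keyword = "FALUN"
    return log

# Simulated run: a scripted sequence of RST / no-RST outcomes.
outcomes = iter([False, True, True, False, True])
waits = []
demo_log = run_probe_cycle(lambda kw: next(outcomes), waits.append, 5)
```

In the simulated run, the first "FALUN" probe slips through, the second draws a RST, and the loop then keeps sending "TEST" until a probe comes back without a RST.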

9 2. Probing the GFC (3/5)
■ Filtering Statistics From 00:00 to 24:00
The x-axis is the time of day and the y-axis is measured in individual probes.
What is most important to notice in the figure is the diurnal pattern: GFC filtering becomes less effective at times, sometimes letting more than one fourth of offending packets through, possibly during busy Internet traffic periods.
(A value of 0 on the x-axis of the figure corresponds to midnight, 00:00 Pacific Standard Time, which is 3 in the afternoon, 15:00, in Beijing.)
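A per-hour view like the one in the figure could be derived from the probe log roughly as follows. This is an illustrative sketch only: the function name `slip_through_by_hour`, the input format, and the sample data are assumptions, not the paper's data or code.

```python
from collections import defaultdict

def slip_through_by_hour(probes):
    """Fraction of filtered-keyword probes per hour that drew no RST.

    probes: iterable of (hour_of_day, got_rst) pairs (assumed format).
    """
    totals = defaultdict(int)
    slipped = defaultdict(int)
    for hour, got_rst in probes:
        totals[hour] += 1
        if not got_rst:
            slipped[hour] += 1  # offending packet got through unfiltered
    return {hour: slipped[hour] / totals[hour] for hour in sorted(totals)}

# Made-up sample: four probes at hour 0 (one slips through), one at hour 15.
sample = [(0, True), (0, False), (0, True), (0, True), (15, True)]
rates = slip_through_by_hour(sample)
```

With the made-up sample, hour 0 has one of four probes slipping through, which corresponds to the "one fourth" level mentioned on the slide.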

10 2. Probing the GFC (4/5)
■ Discovering GFC Routers
The goal of this experiment: to identify the IP address of the first GFC router between the probing site s and t, a target web site within China, as shown in the figure.
The general idea of the experiment: to increase the TTL field of the packets they send out, starting from low values corresponding to routers outside of China.
To identify GFC routers, Algorithm 1 randomly selects a target IP address from T, the compiled list of targets.
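The TTL idea can be sketched as a simple search: send the filtered keyword with increasing TTL values and note the first TTL at which a RST comes back. This is a hedged sketch and not the paper's Algorithm 1. The names `pick_target`, `find_filtering_hop`, and `send_with_ttl` are hypothetical, the target addresses are documentation-range placeholders, and the network path is simulated.

```python
import random

def pick_target(targets, rng=random):
    """Randomly select one target IP address from the list T."""
    return rng.choice(targets)

def find_filtering_hop(send_with_ttl, first_ttl=1, max_ttl=30):
    """Return the smallest TTL at which a filtered probe draws a RST.

    send_with_ttl(ttl) -> True if a RST was received (assumed interface).
    Returns None if no RST is seen up to max_ttl.
    """
    for ttl in range(first_ttl, max_ttl + 1):
        if send_with_ttl(ttl):
            return ttl  # the probe first reached a filtering router here
    return None

# Simulation: pretend the first filtering router sits 7 hops away.
T = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]  # placeholder addresses
target = pick_target(T, random.Random(0))
hop = find_filtering_hop(lambda ttl: ttl >= 7)
```

A packet whose TTL expires before the filtering router never gets inspected there, which is why the first TTL that triggers a RST marks the router's distance in hops.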

11 2. Probing the GFC (5/5)
Filtering does not always, or even principally, occur at the first hop into China’s address space: only 29.6% of filtering occurs at the first hop, and 11.8% occurs beyond the third, with as many as 13 hops in one case.
Routers within CHINANET-* perform 83.3% of all filtering.
☞ GFC ≠ Firewall

12 3. LSA-Based Probing (1/4)
■ Reason for Using LSA
To test for new filtered keywords efficiently, they must try only words that are related to concepts that they suspect the government might filter.
Latent semantic analysis (LSA) is a way to summarize the semantics of a corpus of text conceptually.
■ Discovering Blacklisted Keywords Using LSA
They encoded the terms with UTF-8 HTTP encoding and tested each against search.yahoo.cn.com, waiting 100 seconds after a RST and 5 seconds otherwise.
A RST packet indicates that a word was filtered and is therefore on the blacklist.
Then, by manual filtering, they removed 56 false positives from the final filtered keyword list.
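The encoding and waiting rules for keyword testing can be illustrated with the standard library. This is a sketch under assumptions: the request path `/search?p=` and the helper names `encode_term`, `build_get_request`, and `wait_after` are hypothetical, and only the host name, the UTF-8 encoding, and the 100 s / 5 s waits come from the slide.

```python
from urllib.parse import quote

def encode_term(term):
    """Percent-encode a term's UTF-8 bytes for use in a URL."""
    return quote(term, safe="")  # quote() uses UTF-8 by default

def build_get_request(term, host="search.yahoo.cn.com", path="/search?p="):
    """Build the text of an HTTP GET carrying the term (path is assumed)."""
    return (
        f"GET {path}{encode_term(term)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )

def wait_after(got_rst):
    """Seconds to wait before the next term: 100 after a RST, else 5."""
    return 100 if got_rst else 5

request = build_get_request("中")
```

For example, the single character "中" is encoded as the three UTF-8 bytes E4 B8 AD, so it appears in the request line as `%E4%B8%AD`.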

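13 LSA Background (1/2)
■ What is LSA?
Latent semantic analysis
The word-document model describes the occurrences of terms in documents.
■ LSA Word-document matrix
$$
X =
\begin{array}{c|ccccc}
 & d_1 & \cdots & d_j & \cdots & d_N \\ \hline
w_1 & w_{11} & \cdots & w_{1j} & \cdots & w_{1N} \\
\vdots & \vdots & & \vdots & & \vdots \\
w_i & w_{i1} & \cdots & w_{ij} & \cdots & w_{iN} \\
\vdots & \vdots & & \vdots & & \vdots \\
w_M & w_{M1} & \cdots & w_{Mj} & \cdots & w_{MN}
\end{array}
$$
Rows correspond to the terms $w_1, \dots, w_M$ and columns to the documents $d_1, \dots, d_N$.
$w_{ij}$: weight (importance) of term $w_i$ in document $d_j$
$tf_{ij}$: number of occurrences of term $w_i$ in document $d_j$
$df_i$: number of documents that contain term $w_i$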
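A toy version of the word-document matrix can be built in a few lines. This is a sketch under a loud assumption: the slide defines the term counts and document counts but its weighting formula did not survive in the transcript, so the common tf-idf form (term count times log of N over document frequency) is used here as a stand-in. The function name `term_document_matrix` and the toy corpus are hypothetical.

```python
from collections import Counter
import math

def term_document_matrix(docs):
    """Return (terms, matrix) with one row per term and one column per document.

    Weighting is tf * log(N / df), an assumed tf-idf variant.
    """
    counts = [Counter(doc.split()) for doc in docs]  # per-document term counts
    terms = sorted(set().union(*counts))
    n_docs = len(docs)
    # df: in how many documents each term occurs
    df = {t: sum(1 for c in counts if t in c) for t in terms}
    matrix = [
        [c[t] * math.log(n_docs / df[t]) for c in counts]
        for t in terms
    ]
    return terms, matrix

# Toy corpus: two whitespace-tokenized "documents".
terms, X = term_document_matrix(["a b a", "b c"])
```

In the toy corpus the term "b" occurs in every document, so its weight is zero everywhere, while "a" and "c" each characterize one document.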

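14 LSA Background (2/2)
Singular value decomposition of the word-document matrix, and its rank-$k$ approximation:
$$
X = T_0 S_0 D_0', \qquad \hat{X} = T S D' \approx X
$$
$T_0$: orthogonal, unit-length columns ($t \times m$)
$D_0$: orthogonal, unit-length columns ($d \times m$)
$S_0$: diagonal matrix of singular values ($m \times m$)
$t$: number of terms (rows) of $X$
$d$: number of documents (columns) of $X$
$m$: rank of $X$ ($\le \min(t, d)$)
$T$: $t \times k$, $S$: $k \times k$, $D'$: $k \times d$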
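The reduction to k dimensions listed on this slide (T: t × k, S: k × k, D': k × d) can be tried on a small matrix. This is a minimal sketch assuming NumPy is available. The matrix values are made up, and the helper names `truncated_svd` and `cosine` are hypothetical. Comparing terms by the cosine of their rows in T S is a common LSA convention, not something stated on the slide.

```python
import numpy as np

def truncated_svd(X, k):
    """Keep the k largest singular values: returns T (t x k), S (k x k), D' (k x d)."""
    T0, s0, D0t = np.linalg.svd(X, full_matrices=False)
    return T0[:, :k], np.diag(s0[:k]), D0t[:k, :]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 4-term x 3-document weight matrix.
X = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

T, S, Dt = truncated_svd(X, k=2)
X_hat = T @ S @ Dt          # rank-2 approximation of X
term_vectors = T @ S        # one k-dimensional vector per term
sim = cosine(term_vectors[2], term_vectors[3])
```

Terms whose vectors point in similar directions in the reduced space are treated as conceptually related, which is what makes it possible to rank candidate keywords around a seed concept.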

15 3. LSA-Based Probing (2/4)
■ Start With a Large Corpus (Chinese-language Wikipedia)
■ LSA of Chinese Wikipedia
n=94,863 documents and m=942,033 terms

16 3. LSA-Based Probing (4/4)
■ LSA Results
In total, they discovered 122 unknown keywords.

17 4. Future Work
■ Discovering Unknown Keywords
1. Applying LSA to larger Chinese corpora
2. Keeping the corpus up-to-date on current events
3. Technical implementation
4. Implementation possibilities
5. HTML responses
6. More complex rulesets
7. Imprecise filtering (ex: breasts, Cancer-breasts)
■ Internet Measurement
1. IP tunneling or traffic engineering
2. IXPs Technical implementation
3. Route dependency
4. HTML responses
5. Destination dependency

18 5. Conclusions
GFC keyword filtering is more a panopticon than a firewall, motivating surveillance rather than evasion as a focus of technical research.
☞ GFC ≠ Firewall, GFC ≈ Panopticon
Probing the GFC is arduous, motivating efficient probing via LSA.

19 © The New Yorker Collection 1993 Peter Steiner from cartoonlink.com. All rights reserved.
Thank you very much!

