A Tool for Implementing COPA+ (Child Online Protection Act)


1 A Tool for Implementing COPA+ (Child Online Protection Act)
James Z. Wang & Gio Wiederhold, Penn State University, Inf.Sc. / Stanford University, CSD
Joint work: Jia Li, Penn State Statistics
wang.ist.psu.edu / www-db.stanford.edu/IMAGE
www-db.stanford.edu/pub/gio/inprogress.html#COPA
11/16/2018 J. Z. Wang & Gio Wiederhold

2 Outline
The Issues: legal and community pressures
Current approaches to protect kids
Filtering based on image content
  Goals and methods
  The WIPE system
  Experimental results
Website classification by image content
Conclusions and future work

3 Status of legal attempts to restrict dissemination of porn to minors:
CDA: Communications Decency Act of 1996. Restricts transmission of porn. Overturned by the Philadelphia district court for being overly restrictive of the rights of adults; decision upheld by the Supreme Court in 1997.
COPA: Child Online Protection Act of 1998. Fines ISPs for delivering porn to minors. Again overturned by the Philadelphia district court for being overly restrictive of the rights of adults in implementation; decision upheld on appeal, now before the Supreme Court. NRC study.
CIPA: Children's Internet Protection Act, passed late 2000. Requires schools and libraries to install filtering software on all Internet-connected computers to screen out pornographic images as a condition of receiving federal funding. The law goes into effect April 20, but a suit is again being brought to the Philadelphia court. Regulations giving the specifics of how to comply are to be issued by the Federal Communications Commission in late March 2001.
The suits were/are filed by the ACLU and the ALA (American Library Association).
Other participants in the arguments include the porn industry, religious and parental organizations, the FBI, and filtering technology providers.

4 The Size and Content of the Web
02/99: ~16 million total web servers
Estimated total number of pages on the web: ~800 million
15 Terabytes of text (comparable to text of Library of Congress)
Year 2001: 3 to 5 billion pages
[Lawrence, Giles, Nature, 1999]
Frequency of access and search #2, after music [Google]

5 Pornography-free Websites
E.g. Yahoo!Kids, disney.com
Useful in protecting those children too young to know how to use the Web browser
It is difficult to control access to other sites

6 Filtering Software
E.g.: NetNanny, Cyber Patrol, CyberSitter
Methods:
  Store more than 10,000 IPs
  Blocking based on keywords
  Block all image access
Problems:
  Internet is dynamic, especially porn sites
  Keywords are not satisfactory
    text hidden, incorporated in images
    Excessive filtering (Anne Sexton, cum laude, breast cancer)
  Images are needed for all net users
  Poor reputation, poor sales, no funds to improve
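The excessive-filtering problem with keyword blocking can be illustrated with a minimal sketch. The blocklist and the matching rule below are assumptions made up for this illustration; they are not taken from NetNanny, Cyber Patrol, CyberSitter, or any other product.

```python
# Hypothetical blocklist for illustration only; not taken from any real product.
BLOCKED_KEYWORDS = ("sex", "cum", "breast")


def naive_keyword_hits(text, keywords=BLOCKED_KEYWORDS):
    """Return every blocked keyword that occurs as a substring of the text."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


# Benign phrases built from the three examples named on the slide.
benign_pages = [
    "Poems by Anne Sexton",
    "She graduated cum laude",
    "Breast cancer screening information",
]

for page in benign_pages:
    hits = naive_keyword_hits(page)
    verdict = "BLOCK" if hits else "PASS"
    print(f"{verdict}: {page!r} matched {hits}")
```

All three benign phrases are blocked by this naive matcher, which is the kind of over-filtering the slide describes.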

7 Image-based filtering
The problem comes from images!
Requirements: high accuracy and high speed
Challenges: non-uniform image background, textual noise in foreground, wide range of image quality, wide range of camera positions, wide range of composition…
Our approach: rapid feature extraction, machine learning of patterns, fast matching
Applications: classify Web images and Websites

8 The Wavelet Image Pornography Elimination System
Inspired by UC Berkeley's FNP System
  Detailed analysis of images
  Skin filter and human figure grouper
  Speed: 6 mins CPU time per image
  Accuracy: 52% sensitivity and 96% specificity
Stanford WIPE (medical image analysis spinoff)
  Wavelet-based feature extraction + image classification + integrated region matching + machine learning
  Speed: < 1 second CPU time per image
  Accuracy: 96% sensitivity and 91% specificity

9 System Flow
Training path: Training -> Feature Extraction (color, texture, shape) -> Features from Training
Filtering path: Source Web Image -> Feature Extraction (color, texture, shape) -> Type Classification (photograph / graph) -> Photo Classification (using Features from Training) -> Result: REJECT or PASS
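The control flow of the filtering path can be sketched roughly as below. This is only a skeleton under stated assumptions: the function names, the toy feature dictionary, and the rule that an image typed as a graph is passed without photo classification are all hypothetical and are not taken from the WIPE implementation.

```python
def filter_image(image, extract_features, classify_type, is_objectionable):
    """Run one image through the flow: features -> type -> photo classification."""
    features = extract_features(image)
    # Assumption: images typed as graphs skip photo classification and pass.
    if classify_type(features) == "graph":
        return "PASS"
    return "REJECT" if is_objectionable(features) else "PASS"


# Toy stand-ins so the skeleton runs; a real system would compute color,
# texture, and shape features and compare them with features from training.
def toy_extract(image):
    return image  # the "image" here is already a dict of made-up features


def toy_type(features):
    return "graph" if features["distinct_tones"] < 16 else "photograph"


def toy_photo_classifier(features):
    return features["score"] >= 0.5


samples = [
    {"distinct_tones": 4, "score": 0.9},
    {"distinct_tones": 200, "score": 0.9},
    {"distinct_tones": 200, "score": 0.1},
]
for sample in samples:
    print(sample, filter_image(sample, toy_extract, toy_type, toy_photo_classifier))
```

Passing the three stages in as functions keeps the skeleton independent of any particular feature extractor or classifier.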

10 Wavelet Principle
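The figure for this slide did not survive in the transcript. As a rough illustration of the general idea, here is one level of a one-dimensional Haar transform, the simplest wavelet. Choosing Haar is an assumption made for brevity; the transcript does not say which wavelet family WIPE uses.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / root2 for a, b in pairs]  # local averages (low-pass)
    detail = [(a - b) / root2 for a, b in pairs]  # local differences (high-pass)
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original sequence from one level of Haar coefficients."""
    root2 = math.sqrt(2.0)
    signal = []
    for s, d in zip(approx, detail):
        signal.append((s + d) / root2)
        signal.append((s - d) / root2)
    return signal


flat_region = [4, 4, 4, 4]   # constant tone
sharp_edge = [4, 4, 4, 0]    # abrupt change inside the last pair

print(haar_step(flat_region))
print(haar_step(sharp_edge))
```

Detail coefficients are zero over the constant region and nonzero where the edge falls, which is why wavelet coefficients are a natural basis for texture and edge features.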

11 Type Classification
Graphs: Manually-generated images with constant tones, sharp edges.

12 Type Classification
Photographs: Images with continuous tones.

13 Photo Classification
Content-based image retrieval + statistical classification

14 Experimental Results
Tested on a set of over 10,000 photographic images (i.e., after type classification)
Speed: Less than one second of response time on a Pentium III PC
Accuracy:
Type of Images    Test + (Rejected)    Test – (Passed)
Objectionable     96%                  4%
Benign            9%                   91%
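The table gives a sensitivity of 96% and a specificity of 91%. How many rejected images are actually objectionable also depends on how common objectionable images are in the stream being filtered. The sketch below works that out with Bayes' rule; the prevalence values are hypothetical and do not come from the presentation.

```python
SENSITIVITY = 0.96  # objectionable images rejected (from the table)
SPECIFICITY = 0.91  # benign images passed (from the table)


def share_of_rejects_objectionable(prevalence,
                                   sensitivity=SENSITIVITY,
                                   specificity=SPECIFICITY):
    """Fraction of rejected images that are truly objectionable."""
    true_rejects = sensitivity * prevalence
    false_rejects = (1.0 - specificity) * (1.0 - prevalence)
    return true_rejects / (true_rejects + false_rejects)


# Hypothetical prevalence values, chosen only to show the trend.
for prevalence in (0.01, 0.10, 0.50):
    share = share_of_rejects_objectionable(prevalence)
    print(f"prevalence {prevalence:.2f}: {share:.3f} of rejects are objectionable")
```

At an assumed prevalence of 10%, roughly half of the rejected images would be benign, which is why the 9% false-reject rate matters in practice.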

15 Comment on Accuracy
The algorithm can be adjusted to trade off specificity for higher sensitivity
In a real-world filtering application system, both the sensitivity and the specificity are expected to be higher
  Icons and graphs can be classified with almost 100% accuracy -> higher specificity
  Combine text and image classification -> higher sensitivity and higher speed
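The trade-off between sensitivity and specificity can be sketched with a single decision threshold on a classifier score. The scores below are invented for illustration, and a score threshold is an assumed mechanism; the transcript does not say how the adjustment is made in WIPE.

```python
def rates_at_threshold(objectionable_scores, benign_scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are rejected."""
    rejected_objectionable = sum(s >= threshold for s in objectionable_scores)
    passed_benign = sum(s < threshold for s in benign_scores)
    sensitivity = rejected_objectionable / len(objectionable_scores)
    specificity = passed_benign / len(benign_scores)
    return sensitivity, specificity


# Made-up classifier scores; higher means "more likely objectionable".
objectionable_scores = [0.95, 0.90, 0.80, 0.70, 0.55, 0.40]
benign_scores = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60]

for threshold in (0.7, 0.5, 0.3):
    sens, spec = rates_at_threshold(objectionable_scores, benign_scores, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the threshold rejects more objectionable images but also more benign ones, so sensitivity rises as specificity falls.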

16 False Classifications: Benign Images
Partially obscured human
Areas with similar features
Painting, fine-art
Partially undressed human
Animals (w/o clothes)

17 False Classifications: Objectionable Images
Partially dressed
Undressed area too small
Dressed but objectionable
Frame and text noise
Dark, low contrast

18 Website Classification by Image Content
An objectionable site will have many such images
For a given objectionable Website, we denote by p the chance that an image on the Website is objectionable
  p is the percentage of objectionable images over all images provided by the site
We assume some distribution of p over all Websites (e.g., Gaussian, shifted Gaussian)
Classification levels could be provided as a service to filtering software producers

19 Flow in Website classification

20 Website Classification
Based on statistical analysis (see paper), we know we can expect higher than 97% accuracy on Website classification if
  We download images for each site
  We classify a Website as objectionable if 20-25% of downloaded images are objectionable
Using text and IP addresses as criteria, the accuracy can be further improved
  skip IPs for museums, dog-shows, beach towns, sport events
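The thresholding rule can be explored with a simple binomial model. This is a sketch, not the analysis from the paper. It assumes each downloaded image is classified independently, uses the per-image sensitivity and specificity reported in the experimental results, and treats the number of images per site (not given in this transcript) and the value of p for an objectionable site as hypothetical parameters.

```python
from math import ceil, comb


def image_flag_rate(p, sensitivity=0.96, specificity=0.91):
    """Chance that one image from a site is rejected, given objectionable share p."""
    return p * sensitivity + (1.0 - p) * (1.0 - specificity)


def site_flag_probability(p, n_images, threshold,
                          sensitivity=0.96, specificity=0.91):
    """Chance that at least threshold * n_images images are rejected.

    Assumes independent per-image decisions (binomial model).
    """
    q = image_flag_rate(p, sensitivity, specificity)
    k_min = ceil(threshold * n_images)
    return sum(comb(n_images, k) * q**k * (1.0 - q)**(n_images - k)
               for k in range(k_min, n_images + 1))


# Hypothetical settings: 20 images per site, 25% threshold.
N_IMAGES = 20
THRESHOLD = 0.25

benign_site = site_flag_probability(0.0, N_IMAGES, THRESHOLD)
objectionable_site = site_flag_probability(0.5, N_IMAGES, THRESHOLD)
print(f"benign site (p=0) flagged with probability {benign_site:.4f}")
print(f"objectionable site (p=0.5) flagged with probability {objectionable_site:.4f}")
```

Aggregating over many images makes the site-level decision far more reliable than any single image decision, because per-image errors tend to average out.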

21 Internet High Level Domain Proposal
.kids
  Sites that are kid-safe, rated by independent organization – several candidates
  Supported by, among others, the porn industry
  Danger: fake .kids sites
.xxx
  Legitimate sites for adults, easy to filter out for kids
  Potential loss of business for porn industry (work, schools)
  No candidate organization – consortium of filter companies
  Fear of government interference and loss of freedom
  No mechanism to force objectionable sites into .xxx
Rejected by ICANN, accepted by New.net (Idealab)

22 Conclusions and Future Work
Perfect filtering is never possible
Effective filtering based on image content is feasible with the current technology
Systems that combine content-based filtering with text-based criteria will have good accuracy and acceptable speed
Objectionable websites are automatically identifiable; a service for the community?
These results were produced rapidly; they can be improved through further research.

23 References
(papers)
/cgi-bin/zwang/wipe2_show.cgi (demo)
/pub/gio/inprogress.html#COPA (testimony)
(James Wang)
(Gio Wiederhold)
(Michel Bilello)

