Kanchana Ihalagedara, Rajitha Kithuldeniya, Supun Weerasekara

Presentation transcript:

Feasibility of Using Machine Learning for Access Control in Squid Proxy Server
Kanchana Ihalagedara, Rajitha Kithuldeniya, Supun Weerasekara
Supervised by Mr. Sampath Deegalla
7/13/2019 Escape 2015

Internet in Educational Institutes
- Mainly for educational purposes.
- What happens if the users' priority is not the intended purpose?
  - Network congestion
  - Wasted resources
  - Individual user performance is negatively affected

Blocking Web Sites in Proxy Server
- Squid ACLs: text file of blacklists
- SquidGuard: external databases
- DansGuardian: content filter
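As a rough illustration of the text-file blacklist approach, the sketch below loads domain entries from lines of text and checks whether a requested URL falls under any of them. It is not Squid's own matching code; the function names `load_blacklist` and `is_blocked` and the `.example` domains are hypothetical, and matching a host against its parent domains is an assumption about how such a list would typically be applied.

```python
from urllib.parse import urlsplit

def load_blacklist(lines):
    """Build a set of blocked domains from the lines of a text file."""
    domains = set()
    for line in lines:
        entry = line.strip().lower()
        # Skip blank lines and comment lines.
        if entry and not entry.startswith("#"):
            domains.add(entry.lstrip("."))
    return domains

def is_blocked(url, blacklist):
    """Return True if the URL's host or any parent domain is blacklisted."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # For "www.games.example" this checks "www.games.example",
    # "games.example" and "example" in turn.
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))

# Hypothetical entries; a real deployment would read them from a file.
blacklist = load_blacklist(["# blocked domains", "games.example", ".video.example"])
print(is_blocked("http://www.games.example/play", blacklist))
print(is_blocked("http://library.example/", blacklist))
```

A static list like this has to be maintained by hand, which is the limitation the following slides address.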

World Wide Web is Growing
- 2013: 672,985,183
- 2014: 968,882,453
- Increase in one year: 295,897,270
- From www.internetlivestats.com
- Manually blacklisting web sites is impossible at this scale
- Related products do not keep up with the growing web

Dynamic automated method
- Automated web classification is required
- Machine learning is used in automated web classification

Overview of Our Solution
1. Copy client request
2. Check URL
3. Get web content
4. Classify web content
5. Update the blacklist
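The five steps above can be sketched as a single function. This is only an assumed shape for the pipeline, not the presenters' implementation: `handle_request`, `fake_fetch`, and `fake_classify` are hypothetical names, the fetch and classify steps are passed in as plain functions, and stand-ins replace the real HTTP download and the trained model so the sketch runs offline.

```python
from urllib.parse import urlsplit

def handle_request(url, blacklist, seen_hosts, fetch, classify):
    """Process one copied client request; return the host's label or None."""
    host = (urlsplit(url).hostname or "").lower()
    # Check URL: skip hosts that were already classified or blocked.
    if not host or host in seen_hosts or host in blacklist:
        return None
    seen_hosts.add(host)
    text = fetch(url)                  # Get web content
    label = classify(text)             # Classify web content
    if label == "non-educational":
        blacklist.add(host)            # Update the blacklist
    return label

# Stand-ins so the sketch runs without network access or a trained model.
def fake_fetch(url):
    if "games" in url:
        return "play free online games"
    return "lecture notes and course syllabus"

def fake_classify(text):
    return "non-educational" if "games" in text else "educational"

blacklist, seen = set(), set()
requests = [
    "http://games.example/a",
    "http://uni.example/cs101",
    "http://games.example/b",   # same host again: skipped by the URL check
]
for url in requests:
    handle_request(url, blacklist, seen, fake_fetch, fake_classify)
print(sorted(blacklist))
```

Working on a copy of the request means classification does not have to delay the client; the updated blacklist only affects later requests.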

Machine Learning in Web Classification
- Several studies on web classification can be found
- Frequently used algorithms:
  - Naïve Bayes
  - Support vector machine
  - Nearest neighbor
- Classification requires a data set
  - Set of URLs labeled as educational or non-educational
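To make the idea of learning from labeled examples concrete, here is a minimal multinomial Naïve Bayes classifier in plain Python. It is a teaching sketch under simple assumptions (whitespace tokenization, add-one smoothing, words unseen in training ignored); the function names and the four toy documents are invented for illustration and are not the project's data.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_docs = Counter(labels)            # documents per class
    word_counts = defaultdict(Counter)      # word frequencies per class
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict_nb(model, doc):
    """Return the label with the highest log-posterior for one document."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word in vocab:                             # ignore unseen words
                # Add-one (Laplace) smoothing avoids zero probabilities.
                p = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data standing in for labeled page content.
docs = [
    "lecture notes course syllabus",
    "university research library",
    "play online games free",
    "watch movies music videos",
]
labels = ["educational", "educational", "non-educational", "non-educational"]
model = train_nb(docs, labels)
print(predict_nb(model, "course lecture"))
print(predict_nb(model, "free games"))
```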

Data Collection & Preprocessing
1. Preprocess Squid server log
2. Preprocess DMOZ data set
3. Create labeled URLs
4. Get web content
5. Create training data set
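The first preprocessing step could look roughly like the sketch below, which pulls the distinct requested hosts out of Squid access log lines. It assumes Squid's default native log format, where the request method is the sixth whitespace-separated field and the URL the seventh; a custom `logformat` would need different indices. The function name and the sample lines are made up for illustration.

```python
from urllib.parse import urlsplit

def hosts_from_squid_log(lines):
    """Collect the distinct hosts requested via GET in Squid log lines."""
    hosts = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                   # skip blank or truncated lines
        method, url = fields[5], fields[6]
        if method != "GET":
            continue                   # e.g. CONNECT lines carry host:port only
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return sorted(hosts)

# Made-up sample lines in the assumed native format.
sample_log = [
    "1436774400.123    120 10.0.0.5 TCP_MISS/200 5120 GET http://uni.example/cs101 - HIER_DIRECT/192.0.2.10 text/html",
    "1436774401.456     80 10.0.0.6 TCP_MISS/200 2048 GET http://Games.example/play - HIER_DIRECT/192.0.2.20 text/html",
    "1436774402.789     15 10.0.0.5 TCP_TUNNEL/200 900 CONNECT mail.example:443 - HIER_DIRECT/192.0.2.30 -",
    "",
]
print(hosts_from_squid_log(sample_log))
```

The resulting host list is what would then be matched against labeled sources to create the labeled URLs.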

Model Creation & Testing
- Four models were created with WEKA (small data set)
- Data set with two hundred records
- 10-fold cross-validation for testing

Algorithm                  Accuracy (%)
PRISM                      74.5
C4.5 (J48 in WEKA)         83.0
Naïve Bayes                95.0
Support Vector Machines    95.5
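For readers unfamiliar with 10-fold cross-validation, the sketch below shows the splitting and averaging logic in plain Python. It is not WEKA's implementation: the function names are hypothetical, the split is a simple shuffled partition with no stratification by class, and the training and prediction functions are passed in by the caller.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_val_accuracy(samples, labels, train_fn, predict_fn, k=10):
    """Average held-out accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# With 200 records, each fold trains on 180 and tests on 20.
splits = list(k_fold_indices(200, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Every record is used for testing exactly once, which makes better use of a small data set than a single train/test split.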

Model Creation & Testing
- Three models were created using Python (larger data set)
- Data set of 4000 records
- Separate data set of 1000 records for testing

Algorithm                  Accuracy (%)
Naïve Bayes multinomial    92.9
SVC                        77.5
Linear SVC                 98.9
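The slide does not name the Python library, but the algorithm names match scikit-learn's `MultinomialNB`, `SVC`, and `LinearSVC`, so the sketch below assumes scikit-learn. It trains a Linear SVC on TF-IDF text features and scores it on a separate test set. The feature extraction choice and the tiny invented data set are assumptions for illustration and do not reproduce the reported figures.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data; the project used 4000 training and 1000 test records.
train_texts = [
    "lecture notes course syllabus",
    "university research library",
    "exam schedule and course registration",
    "play online games free",
    "watch movies music videos",
    "celebrity gossip and movie trailers",
]
train_labels = ["educational"] * 3 + ["non-educational"] * 3

test_texts = ["university lecture notes", "free movies and games"]
test_labels = ["educational", "non-educational"]

# TF-IDF turns each page's text into a weighted bag-of-words vector.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

predicted = model.predict(test_texts)
print(accuracy_score(test_labels, predicted))
```

Swapping `LinearSVC()` for `MultinomialNB()` from `sklearn.naive_bayes` or `SVC()` from `sklearn.svm` would give the other two model types under the same assumptions.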

Feature Selection in Linear SVC

Principal Component Analysis

Future Work
- Consider more content (metadata)
- Other languages (Sinhala)
- Image processing can be added

Thank You!