Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovici.



As the world becomes computerized and connected, organizations are increasingly exposed to data leaks, both malicious and innocent. 70% of network traffic is occupied by P2P: BitTorrent, eMule, FreeNet, Gnutella… People who share files can, deliberately or unintentionally, distribute sensitive information to the whole world in seconds.
Examples:
– An Israeli Air Force lieutenant colonel shared his laptop via P2P, revealed confidential Israeli Air Force documents, and was suspended from his post.
– The chief of intelligence of the Israeli Police in Eilat also shared a secret police plan with the whole world, putting many policemen's lives at risk…

Nothing! A Google search for the terms “P2P networks Information leaks” returns just 148 pages. After checking the first 50 for relevance, we got tired… As a world leader in ILP (Information Leaks Prevention), PortAuthority© Technologies addressed this problem. The research will be done using the “P2P Inspector Gadget” system.

Develop a system which will:
– Connect to P2P networks, perform smart searches, and download suspicious files while avoiding the networks' anti-bot algorithms.
– Analyze the files (PDF, DOC, TXT, source code and other types) using machine learning, the industry's most advanced algorithms and a user feedback mechanism, with very few false positives.
– Produce history and statistics, such as IP geolocation and file information, stored in a database.
– Enable research into information leaks in P2P networks.

– The file is in the P2P network and we have no information about it.
– The file is found by the system's search engine and is fully downloaded. At this stage the file is first saved to the system's database.
– The file is converted to text format as a preliminary step before it is analyzed by the system. The system currently works with all text formats and with binary file types such as PDF, Word and PowerPoint.
– The file is analyzed by the system and the probability that it is confidential is determined.
– The user can view the file's content and give feedback to the system. At this stage the system adds the file to its database and to its probability hash tables.
– The system is reinitialized.
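The stages above can be read as a simple linear state machine. The following is only a minimal sketch of that reading: the stage names and the `advance` helper are hypothetical and are not taken from the system itself.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical names for the file stages described in the slide."""
    UNKNOWN = 1      # file is in the P2P network, nothing is known about it
    DOWNLOADED = 2   # found by the search engine, fully downloaded, saved to the database
    CONVERTED = 3    # converted to plain text
    ANALYZED = 4     # probability of being confidential has been determined
    REVIEWED = 5     # user feedback given, probability tables updated


# Each stage has exactly one successor; REVIEWED is the last one.
NEXT_STAGE = {
    Stage.UNKNOWN: Stage.DOWNLOADED,
    Stage.DOWNLOADED: Stage.CONVERTED,
    Stage.CONVERTED: Stage.ANALYZED,
    Stage.ANALYZED: Stage.REVIEWED,
}


def advance(stage):
    """Return the stage that follows `stage`, or raise if it is the last one."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.name} is the final stage")
    return NEXT_STAGE[stage]
```

For example, calling `advance` four times starting from `Stage.UNKNOWN` ends at `Stage.REVIEWED`.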

The problem of analyzing files for confidential information belongs to the Categorization problem domain. In our case there are two well-defined sets of documents: Confidential and Non-Confidential. There are many kinds of algorithms for categorization problems. After researching the area, and on the warm recommendation of our professional advisor, we chose an algorithm based on Bayes' Theorem and conditional probabilities (with some improvements). Bayes' Theorem is very commonly used in SPAM filtering, a problem which resembles ours.

The Algorithm works in two Phases:
– First Phase: Learning – Building the Probabilities. First, the Algorithm is given two Training Sets: a Set of Confidential files and a Set of Non-Confidential files. Using Bayes' conditional probability formula, the probability of each term in the files is computed and saved in a dedicated data structure.
– Second Phase: Analyzing – Combining. In the second phase, each term in the analyzed file is assigned its probability (computed in the learning phase). The Algorithm then combines the probabilities of the most frequent terms. We use the Robinson-Fisher Combiner, which greatly improves the Algorithm's accuracy and significantly reduces the number of false positives.
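The two phases can be illustrated with a small sketch. It is not the system's actual code: it follows the publicly known Robinson-Fisher (chi-square) combining scheme from SPAM filtering, and the tokenizer, the function names, the smoothing parameters `s` and `x`, and the `top_k` cut-off are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def train(confidential_docs, other_docs):
    """Learning phase: count, per term, how many documents of each set contain it.

    Assumes both training sets are non-empty.
    """
    conf, other = Counter(), Counter()
    for doc in confidential_docs:
        conf.update(set(tokenize(doc)))
    for doc in other_docs:
        other.update(set(tokenize(doc)))
    return conf, other, len(confidential_docs), len(other_docs)


def term_probability(term, model, s=1.0, x=0.5):
    """Smoothed estimate that a document containing `term` is confidential."""
    conf, other, n_conf, n_other = model
    n = conf[term] + other[term]
    if n == 0:
        return x  # unseen term: fall back to the neutral prior
    c = conf[term] / n_conf
    o = other[term] / n_other
    p = c / (c + o)
    # Robinson's smoothing pulls rarely seen terms towards the prior x.
    return (s * x + n * p) / (s + n)


def chi2_survival(x2, dof):
    """P(chi-square variable with an even `dof` >= x2), in closed form."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs):
    """Analyzing phase: combine per-term probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5
    s = 1.0 - chi2_survival(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2_survival(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0


def score(text, model, top_k=15):
    """Score a document using its `top_k` most frequent terms."""
    counts = Counter(tokenize(text))
    terms = [t for t, _ in counts.most_common(top_k)]
    return fisher_combine([term_probability(t, model) for t in terms])
```

A score near 1 suggests a confidential document, a score near 0 a non-confidential one, and a score near 0.5 means the evidence is absent or contradictory. That middle band is where a user feedback mechanism would be most useful.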

– Option to force connection to a specific Ultra-peer.
– Connecting to several Ultra-peers simultaneously.
– Status bar shows the current status of the system and displays help messages.

– Insert the keywords.
– When the button is pressed, the system generates several search queries based on the words and file types.
– The user can choose which files to download, or configure the system to download all files.
– When a file starts to download, the system starts saving information about it (IP sources, the number of users currently holding the file, and more).
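How several queries might be derived from the keywords and file types can be sketched as follows. This is only one plausible expansion, assumed for illustration: the function name `build_queries` and the rule (the full phrase plus each single keyword, paired with every file type) are not taken from the system.

```python
from itertools import product


def build_queries(keywords, file_types):
    """Return (query text, file extension) pairs for the given keywords and file types."""
    words = [k.strip().lower() for k in keywords if k.strip()]
    exts = [t.strip().lstrip(".").lower() for t in file_types if t.strip()]
    # Search for the whole phrase first, then for each keyword on its own.
    phrases = [" ".join(words)] + words if words else []
    # Drop duplicates (e.g. a single keyword equals the phrase) while keeping order.
    phrases = list(dict.fromkeys(phrases))
    return list(product(phrases, exts))
```

With the keywords `budget` and `secret` and the file types `pdf` and `doc`, this yields six queries: the phrase and each keyword, once per file type.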

– The user can view the download progress.
– The user can cancel any download at any time.
– The user can send a downloaded file to be analyzed.

Remaining tasks:
– Statistics gathering.
– Improve smart search and filtering.
– Add more GUI functionality.
– Conduct official algorithm test and document it.
– ARD, Test Document, and more.
Start date: Oct’
Estimated end date: Aug’
Over 15,500 lines of code and still counting…
– More than 1267 Python lines.
Over 800 hours of work per man.
18 pizza platters.