Detection of Internet Scam Using Logistic Regression

Reporter: Jing Chiu. Advisor: Yuh-Jye Lee. /7/18. Data Mining & Machine Learning Lab.
Presentation transcript:

Detection of Internet Scam Using Logistic Regression
Mehrbod Sharifi, Eugene Fink, Jaime G. Carbonell

Internet Scam
Intentionally misleading information posted on the web, usually with the intent of tricking people into sending money or disclosing sensitive data.

Scam Types
Medical: Fake cures, longevity, weight loss.
Phishing: Pretending to be a well-known company, such as PayPal, and requesting a user action.
Advance payout: Requests to make a payment in order to get a large gain, such as a lottery prize.
False deals: Fake offers of products, such as meds and software, at unusually steep discounts.
Other: False promises of online degrees, work at home, dating, and other desirable opportunities.

Common Approach: Blacklisting
Create a list of all malicious websites through engineering and user feedback.
Problems:
  False negatives: Misses many malicious websites, such as new and moved sites.
  False positives: Occasionally includes legitimate websites.
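The false-negative problem can be illustrated with a minimal sketch of a blacklist lookup. The domain names and the `is_blacklisted` helper are hypothetical and used only for illustration; they are not part of the system described in the slides.

```python
from urllib.parse import urlparse

# Hypothetical blacklist entries (illustrative names, not real reports).
BLACKLIST = {"cheap-meds.example", "lottery-prize.example"}

def is_blacklisted(url: str, blacklist: set[str] = BLACKLIST) -> bool:
    """Return True if the URL's host, or any parent domain, is on the list."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host itself and every parent domain, e.g.
    # shop.cheap-meds.example -> cheap-meds.example -> example
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))

print(is_blacklisted("http://cheap-meds.example/buy"))    # listed domain
print(is_blacklisted("http://shop.cheap-meds.example/"))  # subdomain of a listed domain
print(is_blacklisted("http://cheap-meds-2.example/buy"))  # same scam, moved to a new domain
```

The third lookup is the moved-site case: the list has no entry for the new domain, so the scam goes undetected until someone reports it.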

Our Work: Machine Learning
Create a dataset of known scam and legitimate websites.
Determine relevant features.
Apply supervised learning to distinguish scams from legitimate websites.
Specific learning algorithm: L1-regularized logistic regression.
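As a rough illustration of L1-regularized logistic regression, the sketch below fits a model by proximal gradient descent (a gradient step on the log-loss followed by soft-thresholding). This is one standard way to solve the problem, not necessarily the solver the authors used. The two toy features, the data, and the values of `lam`, `lr`, and `n_iter` are assumptions chosen for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logreg(X, y, lam=0.1, lr=0.5, n_iter=2000):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent.

    The bias term is left unpenalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus label.
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += r * xi[j] / n
            grad_b += r / n
        # Gradient step on the loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to the scam class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data (hypothetical): feature 0 separates the classes,
# feature 1 is only weakly related to the label. Label 1 = scam.
X = [[1, 1], [1, 1], [1, -1], [-1, -1], [-1, -1], [-1, 1]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_l1_logreg(X, y)
print("weights:", w, "bias:", b)
print("P(scam | [1, 1]) =", predict_proba(w, b, [1, 1]))
```

On this toy data the penalty drives the weight of the weak feature to exactly zero, which is the practical appeal of the L1 penalty: the fitted model uses only a subset of the collected features.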

Datasets
We need labeled data for supervised learning; to our knowledge, there are no publicly available datasets.

Datasets
Scam queries: Top 500 Google search results for “cancer treatments”, “work at home”, and “mortgage loans”; 3 Mechanical Turk annotations per website.
Web of Trust (mywot.com): 200 most recent discussion threads; 159 unique domain names. Add high-rank websites with more than 5 comments. Sort by WOT score and keep the top and bottom.
Spam emails: 1551 spam emails detected by McAfee; 11825 web links from those emails. Eliminate links that appear fewer than 10 times or that point to top websites.
hpHosts: 100 most recent reports on hosts-file.net.
Top websites: Top 100 websites on alexa.com.

Dataset        Scam  Non-Scam  Total
Scam Queries     33        63     96
Web of Trust    150       150    300
Spam Emails     241      none    241
hpHosts         100      none    100
Top Websites   none       100    100
All Datasets    524       313    837

Features
Collect relevant data about websites from publicly available resources:
  Monthly user traffic (alexa.com)
  Search result rank (google.com)
  Being on specific blacklists
The current system collects 42 features from 11 sources.
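A minimal sketch of how such signals might be turned into a numeric feature vector for a linear model follows. The function name, the three-feature subset, and the encodings (log-scaled traffic, reciprocal rank, 0.0 for missing values) are assumptions made for illustration; the slides do not describe how the 42 features are encoded.

```python
import math
from typing import Optional

def build_feature_vector(monthly_visits: Optional[int],
                         search_rank: Optional[int],
                         on_blacklist: bool) -> list[float]:
    """Encode three illustrative signals as numbers (hypothetical encoding)."""
    return [
        # Traffic spans orders of magnitude, so use a log scale; 0.0 when unknown.
        math.log10(monthly_visits + 1) if monthly_visits is not None else 0.0,
        # Reciprocal rank: 1.0 for the top result, 0.0 when the site is unranked.
        1.0 / search_rank if search_rank else 0.0,
        # Binary indicator for appearing on a blacklist.
        1.0 if on_blacklist else 0.0,
    ]

print(build_feature_vector(999_999, 4, False))
print(build_feature_vector(None, None, True))
```

Mapping missing values to 0.0 keeps every website representable even when a source has no record of it, at the cost of conflating "unknown" with "zero".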

Performance
Dataset  Precision  Recall  F1  AUC
0.983  0.966  0.974  0.992
Scam Queries  0.983  0.966  0.974
Web of Trust  0.992  0.999
All Datasets  0.979  0.981  0.980  0.985
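For reference, precision, recall, and F1 follow the standard definitions, sketched below. The confusion counts in the first call are hypothetical. The last two calls check that the F1 values reported for Scam Queries and All Datasets agree with their precision and recall.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)

def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 90 scams caught, 10 false alarms, 30 scams missed.
print(precision_recall_f1(tp=90, fp=10, fn=30))

# Consistency check against the reported precision and recall.
print(round(f1_score(0.983, 0.966), 3))  # Scam Queries
print(round(f1_score(0.979, 0.981), 3))  # All Datasets
```

Both recomputed values round to the F1 figures given for those rows (0.974 and 0.980).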

Performance
Comparison with related tasks:
  Web Spam: Tricking search engines to get high search ranks (keyword stuffing, cloaking, etc.).
  Email Spam: Unwanted bulk messages.