Published by Magnus Campbell. Modified over 9 years ago.
1
Faculty: Dr. Chengcui Zhang
Students: Wei-Bang Chen, Song Gao, Richa Tiwari
2
Past projects
Image Spam Clustering Project
– Cluster image spam through common visual features present in image attachments
– Reveal common origins of image spam
3
Examples
These two spam images exemplify illustrations with similar color composition but different layouts.
This example demonstrates spam illustrations with similar layouts but different color composition.
4
Ongoing projects: – Phishing website clustering by text and visual similarity
5
Text Recognized by OCR
– Nat West Helpful Bonking Accessibility I Help Got a question? We can help …
– Nat West Helpful Bonking Help 24x7 can’t I log in? Accessibility I Help …
– RBS ThQ Roy& Bank cq3codand Make it happen …
6
A Sample Cluster for PayPal
7
4 Clusters Relate to PayPal
– Cluster ID: 15 (76 images)
– Cluster ID: 28 (20 images)
– Cluster ID: 49 (13 images)
– Cluster ID: 57 (22 images)
8
Dataset Statistics
– 8 days (7-10, 17-19, and 22 Feb. 2011)
– Total number of phishing website screenshot images: 1461
– Total number of clusters produced (cutoff similarity value = 60%): 156 + 1 (ungrouped)
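The slides do not say which clustering algorithm or similarity measure produced these clusters. As a minimal sketch of how a similarity cutoff can yield clusters plus an ungrouped remainder, the example below assumes Jaccard similarity over OCR token sets and single-linkage grouping; the function names and the sample token sets are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_cutoff(docs: dict, cutoff: float = 0.6):
    """Group items whose pairwise similarity reaches the cutoff (single linkage).

    Returns (clusters, ungrouped): clusters with at least two members, and
    the items that matched nothing. The similarity measure is an assumption.
    """
    parent = {name: name for name in docs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    ungrouped = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, ungrouped


# Hypothetical OCR token sets for five screenshots.
pages = {
    "shot1": {"paypal", "log", "in", "email", "password"},
    "shot2": {"paypal", "log", "in", "email", "password", "help"},
    "shot3": {"bank", "online", "banking", "accessibility"},
    "shot4": {"bank", "online", "banking", "accessibility", "help"},
    "shot5": {"lottery", "winner"},
}
clusters, ungrouped = cluster_by_cutoff(pages, cutoff=0.6)
print(clusters, ungrouped)
```

Comparing every pair is quadratic in the number of screenshots, which is manageable at the scale of 1461 images but would need indexing or blocking for much larger collections.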
9
Observations:
– High cluster purity
– Hard to measure completeness
Next step:
– Incorporate visual features such as visual layout
– Brand
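For reference, cluster purity is commonly computed as the fraction of items that carry the majority label of their cluster. The sketch below uses that standard definition; the slides do not state how purity was measured, and the brand labels and assignments here are made up for illustration.

```python
from collections import Counter


def cluster_purity(assignments: dict, labels: dict) -> float:
    """Purity = (sum over clusters of the majority-label count) / (number of items)."""
    by_cluster = {}
    for item, cluster_id in assignments.items():
        by_cluster.setdefault(cluster_id, []).append(labels[item])
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / len(assignments)


# Hypothetical cluster assignments and hand-labeled brands.
assignments = {"a": 15, "b": 15, "c": 15, "d": 28, "e": 28}
labels = {"a": "PayPal", "b": "PayPal", "c": "NatWest", "d": "PayPal", "e": "PayPal"}
purity = cluster_purity(assignments, labels)  # (2 + 2) / 5
print(purity)
```

Purity does not penalize splitting one brand across several clusters, so a high value says nothing about completeness.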
10
Ongoing projects: – Uncovering auction fraud from the eBay transaction graph (initial study)
11
Data set: eBay transaction feedback
– A total of 220,000 users were crawled.
Idea of belief propagation:
– Fraudsters create two types of identities: fraud and accomplice. Fraud identities are the ones eventually used to carry out the actual fraud; accomplice identities are used to help build the reputation of the fraud identities. This pattern forms a near-bipartite core in the transaction graph.
12
Algorithm:
– Each vertex in the transaction graph is labeled as one of {fraud, accomplice, honest} based on its pattern of interaction with other vertices.
– Belief propagation (BP) is used to optimize the labeling across the entire graph by maximizing the joint probability of the labels of all vertices.
– Honest user model: Barabási-Albert model
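The labeling step can be illustrated with a small sum-product loopy belief propagation routine, which estimates a per-vertex distribution over the three labels rather than a single joint maximum. This is a sketch under stated assumptions, not the implementation behind the slides: the compatibility table `PSI`, the value of `EPS`, the node names, and the priors are all illustrative choices.

```python
STATES = ("fraud", "accomplice", "honest")

# Assumed compatibility table: PSI[s][t] is how plausible it is for a vertex
# in state s to transact with a vertex in state t. Fraud identities mostly
# touch accomplices; accomplices touch both fraud and honest identities.
EPS = 0.05
PSI = {
    "fraud": {"fraud": EPS, "accomplice": 1 - 2 * EPS, "honest": EPS},
    "accomplice": {"fraud": 0.5, "accomplice": 2 * EPS, "honest": 0.5 - 2 * EPS},
    "honest": {"fraud": EPS, "accomplice": (1 - EPS) / 2, "honest": (1 - EPS) / 2},
}


def normalize(dist):
    total = sum(dist.values())
    return {s: v / total for s, v in dist.items()}


def belief_propagation(edges, priors, iterations=20):
    """Sum-product loopy BP on an undirected graph; returns per-vertex beliefs."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # messages[(i, j)][t]: what vertex i tells vertex j about j being in state t.
    uniform = {s: 1.0 / len(STATES) for s in STATES}
    messages = {(i, j): dict(uniform) for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            out = {}
            for t in STATES:
                total = 0.0
                for s in STATES:
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][s]
                    total += priors[i][s] * PSI[s][t] * incoming
                out[t] = total
            updated[(i, j)] = normalize(out)
        messages = updated

    beliefs = {}
    for i in neighbors:
        belief = {}
        for s in STATES:
            value = priors[i][s]
            for k in neighbors[i]:
                value *= messages[(k, i)][s]
            belief[s] = value
        beliefs[i] = normalize(belief)
    return beliefs


# Hypothetical three-vertex chain: one vertex with a strong fraud prior,
# the other two starting from uniform (1/3, 1/3, 1/3) priors.
uniform_prior = {s: 1.0 / 3 for s in STATES}
priors = {
    "seller_x": {"fraud": 0.9, "accomplice": 0.05, "honest": 0.05},
    "buyer_y": dict(uniform_prior),
    "user_z": dict(uniform_prior),
}
edges = [("seller_x", "buyer_y"), ("buyer_y", "user_z")]
beliefs = belief_propagation(edges, priors)
most_likely = {v: max(b, key=b.get) for v, b in beliefs.items()}
print(most_likely)
```

Each iteration touches every directed edge and every pair of states, and messages are normalized after each update to avoid numeric underflow on larger graphs.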
14
Evaluation results on the sparse eBay transaction dataset
– 20% accomplice
– 50% fraud???
What can be improved:
– Network is too sparse (average degree is ~5, ideally >= 10).
– Initial probabilities (1/3, 1/3, 1/3) may not make sense.
– BP does not seem to scale well to large graphs.
15
Projects under plan: – Modeling online user navigation patterns and detecting anomalies using click stream data
16
Idea #1: Each user session is represented by an n-dimensional feature vector, where n is the number of Web pages in the session.
– The value of each feature is a weight, indicating the degree of interest of the user in the particular Web page.
– Based on these vectors, clusters of similar sessions are produced and characterized by the Web pages with the highest associated weights.
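A minimal sketch of Idea #1 follows. It assumes that the weight is the share of session time spent on a page and that all sessions are indexed over the union of pages seen, so the vectors are comparable; neither choice is stated on the slide, and the session data is hypothetical.

```python
import math


def build_vectors(sessions: dict):
    """Turn {session: {page: seconds}} into weight vectors over a shared page list."""
    pages = sorted({p for visits in sessions.values() for p in visits})
    vectors = {}
    for sid, visits in sessions.items():
        total = sum(visits.values())
        # Weight = fraction of the session's time spent on the page (assumed).
        vectors[sid] = [visits.get(p, 0.0) / total for p in pages]
    return pages, vectors


def cosine(u, v) -> float:
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_pages(pages, vectors, members, k=2):
    """Characterize a group of sessions by the pages with the highest mean weight."""
    mean = [
        sum(vectors[m][i] for m in members) / len(members)
        for i in range(len(pages))
    ]
    ranked = sorted(zip(pages, mean), key=lambda pw: pw[1], reverse=True)
    return [page for page, _ in ranked[:k]]


# Hypothetical sessions: seconds spent per page.
sessions = {
    "s1": {"/home": 10, "/search": 40, "/item": 50},
    "s2": {"/home": 5, "/search": 45, "/item": 50},
    "s3": {"/home": 10, "/account": 60, "/password": 30},
}
pages, vectors = build_vectors(sessions)
sim_12 = cosine(vectors["s1"], vectors["s2"])
sim_13 = cosine(vectors["s1"], vectors["s3"])
profile = top_pages(pages, vectors, ["s1", "s2"])
print(sim_12, sim_13, profile)
```

Any standard clustering method can then be run on the vectors; sessions that sit far from every cluster profile are candidates for anomaly review.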
17
Idea #2: Markov model
– Pages (or page categories) as states, or page+parameters as nodes
– Transition probabilities between nodes
Idea #3: Graph partitioning
– Pages as nodes
– Edges as connectivity/weight between a pair of pages: co-occurrence, time difference, etc.
– Graph partitioning to find groups of strongly correlated pages
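Idea #2 can be sketched as a first-order Markov chain estimated from page sequences, with a session scored by its average log transition probability. The add-one smoothing, the scoring rule, and the sample sessions are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_transitions(sessions, smoothing=1.0):
    """Estimate P(next page | current page) with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for path in sessions:
        states.update(path)
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    states = sorted(states)
    probs = {}
    for src in states:
        total = sum(counts[src].values()) + smoothing * len(states)
        probs[src] = {dst: (counts[src][dst] + smoothing) / total for dst in states}
    return probs


def avg_log_likelihood(path, probs, floor=1e-6):
    """Mean log transition probability; lower values mean a less typical session."""
    steps = list(zip(path, path[1:]))
    if not steps:
        return 0.0
    # Pages never seen in training fall back to a small floor probability.
    total = sum(math.log(probs.get(s, {}).get(d, floor)) for s, d in steps)
    return total / len(steps)


# Hypothetical training sessions (page categories as states).
training = [
    ["home", "search", "item", "cart"],
    ["home", "search", "item"],
    ["home", "item", "cart", "checkout"],
]
probs = fit_transitions(training)
typical_score = avg_log_likelihood(["home", "search", "item", "cart"], probs)
odd_score = avg_log_likelihood(["checkout", "home", "checkout", "home"], probs)
print(typical_score, odd_score)
```

A threshold on this score, chosen from held-out normal sessions, is one simple way to flag anomalous navigation.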
18
Projects under plan: – Novel biometrics
19
Palm print photo
20
Touch panel: hand drawing