1 A LVQ-based neural network anti-spam email approach. Prof. 楊婉秀. 詹元順, first-year MIS master's student, 94722001. 2005/12/07.



2 Outline 1. Introduction 2. Email sample and data preprocessing – 2.1 Email representation – 2.2 Feature extraction 3. Anti-spam LVQ model – 3.1 Spam category – 3.2 Learning vector quantization neural network model – 3.3 Anti-spam LVQ algorithm – 3.4 Parameter setting 4. Experiments and result 5. Conclusion

3 1. Introduction(1/2) Spam wastes users' time, money, and network bandwidth; meanwhile it clutters users' mailboxes and can even be harmful, e.g. through pornographic content. In America, spam emails cause enterprises losses of up to 9 billion dollars per year. Without appropriate counter-measures, the situation will continue to worsen and spam will eventually undermine the usability of email.

4 1. Introduction(2/2) Duhong Chen et al. compared four algorithms, Bayes, decision tree, neural networks, and Boosting, and concluded that the neural network algorithm performs best. Experiments have shown that the LVQ-based anti-spam filter performs better than the Bayes-based and BP neural network-based approaches.

5 2. Email sample and data preprocessing(1/2) 2.1 Email representation TFIDFi = TFi × log (N/DFi) (1) –TFi : the frequency with which word ti appears in document d –N : the total number of training documents –DFi : the number of documents that contain word ti
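Equation (1) weights each word by how often it occurs in the email and how rare it is across the training corpus. A minimal sketch of that weighting, with hypothetical counts for illustration:

```python
import math

def tfidf(tf, df, n_docs):
    """TF-IDF weight of a word, per equation (1): TFIDF_i = TF_i * log(N / DF_i)."""
    return tf * math.log(n_docs / df)

# Hypothetical example: the word appears 3 times in this email
# and occurs in 50 of 1000 training documents.
weight = tfidf(tf=3, df=50, n_docs=1000)
```

Words that appear in almost every document (DFi close to N) get weights near zero, so common function words contribute little to the email's feature vector.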

6 2. Email sample and data preprocessing(2/2) 2.2 Feature extraction –A : the number of emails which contain word t and belong to class s –B : the number of emails which contain word t but do not belong to class s –C : the number of emails which belong to class s but do not contain word t –N : the total number of emails in the training corpus
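The slide's feature-scoring formula appears only as a figure in the original and did not survive transcription. The counts A, B, C, N above are exactly those of the chi-square statistic commonly used for text feature selection, so the following is a sketch under that assumption, with the fourth contingency cell D derived from the others:

```python
def chi_square(a, b, c, n):
    """Chi-square relevance of word t to class s (assumed scoring function;
    the original slide's formula is an untranscribed figure).

    a: emails containing t and in class s
    b: emails containing t but not in class s
    c: emails in class s but not containing t
    n: total emails in the training corpus
    """
    d = n - a - b - c  # emails with neither word t nor class s
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom
```

A word distributed independently of the class (A·D = B·C) scores zero; the highest-scoring words are kept as features.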

7 3. Anti-spam LVQ model(1/5) 3.1 Spam category.

8 3. Anti-spam LVQ model(2/5) 3.2 Learning vector quantization neural network model –The model is divided into two layers. The first is the competitive layer, in which each neuron represents a subclass. –The second is the output layer, in which each neuron represents a class.

9 3. Anti-spam LVQ model(3/5) 3.3 Anti-spam LVQ algorithm(1/2)

10 3. Anti-spam LVQ model(4/5) 3.3 Anti-spam LVQ algorithm(2/2)
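The algorithm on slides 9 and 10 appears only as figures in the original. As a rough sketch of the standard LVQ1 training rule that the two-layer model above implies (competitive layer finds the nearest prototype, output layer's class label decides the update direction); the function and variable names are illustrative, not from the paper:

```python
def train_lvq1(prototypes, labels, data, data_labels, lr=0.1, epochs=10):
    """Minimal LVQ1 sketch: the winning prototype moves toward a
    same-class input and away from a different-class input.
    Vectors are plain lists of floats."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # decaying learning rate
        for x, y in zip(data, data_labels):
            # competitive layer: index of the nearest prototype (subclass)
            w = min(range(len(prototypes)), key=lambda i: dist2(prototypes[i], x))
            # output layer: same class -> attract, different class -> repel
            sign = 1 if labels[w] == y else -1
            prototypes[w] = [p + sign * alpha * (xi - p)
                             for p, xi in zip(prototypes[w], x)]
    return prototypes
```

For spam filtering, each prototype is a TF-IDF feature vector representing one subclass of spam or legitimate email, and a new email is assigned the class of its nearest prototype.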

11 3. Anti-spam LVQ model(5/5) 3.4 Parameter setting

12 4. Experiments and result(1/4) This project makes use of a corpus that is openly available. 1000 emails were selected randomly from the corpus, including 580 spam emails and 420 legitimate emails.

13 4. Experiments and result(2/4) Anti-spam filter performance is often measured in terms of spam precision (SP) and spam recall (SR).

14 4. Experiments and result(3/4) A criterion F1 incorporates spam precision and spam recall.
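The SP, SR, and F1 formulas on slides 13 and 14 appear only as figures in the original; the sketch below uses their standard definitions, with hypothetical classification counts for the usage example:

```python
def spam_metrics(n_spam_to_spam, n_legit_to_spam, n_spam_total):
    """Spam precision (SP), spam recall (SR), and the F1 measure
    that combines them: F1 = 2 * SP * SR / (SP + SR).

    n_spam_to_spam : spam emails correctly classified as spam
    n_legit_to_spam: legitimate emails wrongly classified as spam
    n_spam_total   : spam emails in the test set
    """
    sp = n_spam_to_spam / (n_spam_to_spam + n_legit_to_spam)
    sr = n_spam_to_spam / n_spam_total
    f1 = 2 * sp * sr / (sp + sr)
    return sp, sr, f1

# Hypothetical counts: 550 of 580 spam emails caught, 20 legitimate
# emails misclassified as spam.
sp, sr, f1 = spam_metrics(550, 20, 580)
```

F1 is the harmonic mean of SP and SR, so a filter cannot score well by maximizing one at the expense of the other.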

15 4. Experiments and result(4/4)

16 5. Conclusion Both neural network-based algorithms are usually better than the Bayes-based one. The LVQ-based method classifies spam emails into several subclasses by content, so that the feature words within each subclass are more closely related and the characteristics of each subclass of spam emails are easier to identify.