CSE 534 Final Project Internet Outage Analysis Name: Guanyu Zhu, Wei-Ting Lin, Zhaowei Sun Professor: Phillipa Gill.

Presentation transcript:


Motivation / Goal
Motivation: (1) Network outages can have societal and economic impact. (2) Knowing the causes of network outages is always desirable.
Goal: (1) Find out which types of outage occur most commonly. (2) Predict the type of an ongoing outage.

Data Set
What: Outage mailing list
Why: public (free), rich information

Summary of the Outage mailing list dataset:
First post       Sep 29, 2006
Last post        Mar 24, 2015
Num of posts     6963
Num of threads   2102
Num of replies   4725
Num of posters   1256

Preliminary Data Analysis:
• Content providers (Yahoo, Google, Facebook, etc.)
• ISPs (AT&T, Verizon, Sprint, etc.)
• Protocols (BGP, DNS, IPv6, etc.)
• Security (DDoS, hijack, virus, etc.)

Preliminary Data Analysis

Data Preprocessing Steps:
• Integrate threads
• Remove words unrelated to network outages
• Stemming and lemmatization
• Remove words with low TF-IDF values
• Generate term frequencies for the dataset
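The preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `UNRELATED` word list, the crude `stem` function (a stand-in for a real stemmer or lemmatizer), the `min_tfidf` threshold, and the sample threads are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical list of words unrelated to outages; the slides do not give the real one.
UNRELATED = {"the", "a", "is", "in", "to", "and", "for", "are", "we",
             "anyone", "else", "thanks", "hi"}


def stem(word):
    """Crude suffix stripper, standing in for a real stemmer/lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into words, drop unrelated words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in UNRELATED]


def preprocess(threads, min_tfidf=0.1):
    """threads maps thread id -> list of message bodies.

    Returns thread id -> Counter of term frequencies, keeping only terms
    whose TF-IDF score in that thread is at least min_tfidf (assumed cutoff).
    """
    # Integrate each thread into a single document.
    docs = {tid: tokenize(" ".join(msgs)) for tid, msgs in threads.items()}
    n_docs = len(docs)

    # Document frequency: in how many threads each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    result = {}
    for tid, tokens in docs.items():
        tf = Counter(tokens)
        total = sum(tf.values())
        kept = Counter()
        for term, count in tf.items():
            tfidf = (count / total) * math.log(n_docs / df[term])
            if tfidf >= min_tfidf:
                kept[term] = count
        result[tid] = kept
    return result


# Made-up threads, for illustration only.
sample_threads = {
    "t1": ["Verizon DNS is failing", "DNS failing in NYC"],
    "t2": ["Verizon fiber cut in Ohio"],
}
term_frequencies = preprocess(sample_threads)
```

In this toy input, a term that appears in every thread gets a TF-IDF of zero and is dropped, which is the intended effect of the low-TF-IDF filter.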

Classification Labeling
• Labeling standard

Labeling Standard

Classification Labeling
• Why label
• How to label (Fleiss' kappa)
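Fleiss' kappa measures how well several raters agree when assigning items to categories, corrected for chance agreement. The sketch below implements the standard formula; the number of raters and the example ratings are hypothetical and are not the project's labeling data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of category counts.

    ratings[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    Undefined (division by zero) if all ratings fall into one category.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]

    # Observed agreement on each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]

    p_bar = sum(p_i) / n_items       # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: three raters, two categories, two threads.
full_agreement = fleiss_kappa([[3, 0], [0, 3]])
partial_agreement = fleiss_kappa([[2, 1], [1, 2]])
```

A value of 1 means complete agreement, and values at or below 0 mean agreement no better than chance.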

Classification
• Train the classifier
  - Multi-class classification -> multiple binary classifications (one vs. all)
  - Why use this method?
• Test the classifier's performance
  - Split the labeled data in half: one half for training, the other for testing
  - Evaluate the classifier: classification accuracy, confusion matrix
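The one-vs-all scheme and the half/half evaluation can be sketched as follows. The slides do not name the underlying binary classifier, so a simple perceptron is used here purely as an assumption; the outage type names and the toy documents are likewise hypothetical.

```python
from collections import Counter, defaultdict


def train_binary(docs, targets, epochs=10):
    """Perceptron (assumed classifier). docs are term->count dicts, targets are +1/-1."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(docs, targets):
            s = bias + sum(weights[t] * c for t, c in x.items())
            if y * s <= 0:  # misclassified (or on the boundary): update
                for t, c in x.items():
                    weights[t] += y * c
                bias += y
    return weights, bias


def score(model, x):
    weights, bias = model
    return bias + sum(weights.get(t, 0.0) * c for t, c in x.items())


def train_one_vs_all(docs, labels):
    """Train one binary model per outage type: that type vs. all others."""
    return {
        lab: train_binary(docs, [1 if l == lab else -1 for l in labels])
        for lab in sorted(set(labels))
    }


def predict(models, x):
    """Pick the outage type whose binary model gives the highest score."""
    return max(models, key=lambda lab: score(models[lab], x))


def evaluate(models, docs, labels):
    """Return (accuracy, confusion) where confusion[(truth, predicted)] is a count."""
    confusion = Counter()
    for x, truth in zip(docs, labels):
        confusion[(truth, predict(models, x))] += 1
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(labels), confusion


# Hypothetical labeled threads as term-frequency dicts.
labeled = [
    ({"dns": 2, "resolve": 1}, "dns"),
    ({"dns": 1, "lookup": 1}, "dns"),
    ({"fiber": 1, "cut": 2}, "fiber"),
    ({"fiber": 2, "cut": 1}, "fiber"),
    ({"bgp": 1, "route": 2}, "bgp"),
    ({"bgp": 2, "leak": 1}, "bgp"),
]

# Halve the labeled data: even positions for training, odd positions for testing.
train, test = labeled[::2], labeled[1::2]
models = train_one_vs_all([x for x, _ in train], [y for _, y in train])
accuracy, confusion = evaluate(models, [x for x, _ in test], [y for _, y in test])
print(accuracy)
```

Splitting one multi-class problem into several binary ones lets any binary classifier be reused unchanged, at the cost of training one model per outage type.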

Classifier accuracy

Classification
• Classify the unlabeled data
  - Since the classifier's accuracy is reasonably high, use it to classify the remaining unlabeled data.

Result
• Distribution of outage types for each year

Outage Types Distribution of Each Year

Result
• Percentage of each outage type

Outage Types Percentage
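Both result views, the per-year distribution and the overall percentage of each outage type, can be computed from the classified threads with a short tally. This is only a sketch: the `(year, outage_type)` record format and the sample records are assumptions for illustration and do not reproduce the project's results.

```python
from collections import Counter, defaultdict


def yearly_distribution(records):
    """records is a list of (year, outage_type) pairs.

    Returns year -> {outage_type: percentage of that year's threads}.
    """
    per_year = defaultdict(Counter)
    for year, outage_type in records:
        per_year[year][outage_type] += 1
    return {
        year: {t: 100.0 * n / sum(counts.values()) for t, n in counts.items()}
        for year, counts in sorted(per_year.items())
    }


def overall_percentage(records):
    """Returns outage_type -> percentage of all threads."""
    counts = Counter(outage_type for _, outage_type in records)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()}


# Made-up records, for illustration only.
sample_records = [
    (2014, "dns"),
    (2014, "fiber"),
    (2014, "dns"),
    (2015, "mobile"),
]
by_year = yearly_distribution(sample_records)
overall = overall_percentage(sample_records)
```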

Result
• Extension: real-time outage type prediction

Real-time outage type prediction
• How to do it
  - Integrate the data preprocessing and classification steps to predict the outage type of each new mail in real time and show it on the website immediately.
• What to show
  - If the mail text includes traceroute information, extract it and show it on the website.
  - Combine all mail text from 2015 and analyze the trend in outage types.
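Extracting traceroute information from a mail body can be approximated with a regular expression over each line. The sketch below assumes the common Unix `traceroute` output layout (hop number, then a host name or address, optionally followed by an address in parentheses and round-trip times in ms); the pattern, the function name, and the sample mail are hypothetical, and the heuristic can produce false positives on ordinary sentences that start with a number.

```python
import re

# Hop number, then a host/IP or "*", then an optional "(address)".
HOP_RE = re.compile(r"^\s*(\d{1,2})\s+(\*|[\w.\-]+)(?:\s+\(([\d.]+)\))?")


def extract_traceroute(mail_text):
    """Return a list of (hop_number, host, address_or_None) found in the mail."""
    hops = []
    for line in mail_text.splitlines():
        match = HOP_RE.match(line)
        # Require a timing ("ms") or a timeout marker ("*") to reduce false hits.
        if match and ("ms" in line or "*" in line):
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


# Made-up mail body using documentation address ranges.
sample_mail = (
    "Seeing loss to example.net from NYC.\n"
    " 1  gw.example.net (192.0.2.1)  1.2 ms  1.1 ms  1.0 ms\n"
    " 2  * * *\n"
    " 3  198.51.100.7 (198.51.100.7)  12.4 ms  12.1 ms  12.9 ms\n"
    "Anyone else?"
)
hops = extract_traceroute(sample_mail)
```

A mail with no matching lines yields an empty list, so the website can simply skip the traceroute panel in that case.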

Real-time outage type prediction

Conclusion
• Features of outage causes
  - Mobile network issues are increasing
  - Common outage types are easily observed by users
• Real-time prediction of the ongoing outage type
• Future work
  - Analyze in advance the keywords associated with each outage type
  - Integrate data by subject vs. by thread