The Combination of Supervised and Unsupervised Approach


The Combination of Supervised and Unsupervised Approach
Yingju Xia, Shuangyong Song, Qingliang Miao, Zhongguang Zheng
Fujitsu Research and Development Center
Copyright 2015 FUJITSU R&D CENTER CO., LTD.

Overview

[Pipeline diagram] Training data and test data → Feature Extraction → features are fed both to an unsupervised method and to supervised ensemble learning → their predictions enter the Combination Model → final predictions.
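The pipeline in the diagram can be sketched as a small skeleton. Every function name below is a hypothetical stand-in for a stage of the diagram, not the authors' code; the stages are passed in as callables so the skeleton stays generic.

```python
# Illustrative skeleton of the overview pipeline; all stage functions are
# hypothetical placeholders supplied by the caller.

def final_predictions(train, test, extract_features,
                      supervised_ensemble, unsupervised_method, combine):
    # Feature extraction is shared by training and test data
    X_train, X_test = extract_features(train), extract_features(test)
    # Supervised ensemble learning produces predictions for the test data
    sup_preds = supervised_ensemble(X_train, X_test)
    # The unsupervised method extracts structure (e.g. groupings) from the data
    groups = unsupervised_method(X_test)
    # The combination model reconciles both into the final predictions
    return combine(sup_preds, groups)
```

Each slide that follows fills in one of these stages.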

Main Features

- Time features: start time, end time, day, hour, weekday, ...
- Product and category features
  - Single level: such as 'A00001', 'B00001', 'C00001', 'D00001'
  - Combinations: such as 'A00001/B00001', 'A00001/B00001/C00012', '/B00001/C00012'
- Transfer features: the transition from one record to another record in the same session
  - For example, 'A00002/B00003/C00014/D11017/' followed by 'A00010/B00055/C00135/D11018/' yields the features 'A00002-A00010', 'B00003-B00055', ...
- Product ID prefixes
  - For example, 'D09233' has the prefix features 'D0923', 'D092', 'D09'
  - 'D09232' also has the prefix features 'D0923', 'D092', 'D09'
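The prefix and transfer features above are simple string manipulations. A minimal sketch, assuming each session record is a category path like 'A00002/B00003/C00014/D11017/' (function names are illustrative, not the competition code):

```python
def prefix_features(product_id, min_len=3):
    """Prefixes of a product ID, e.g. 'D09233' -> ['D0923', 'D092', 'D09']."""
    return [product_id[:k] for k in range(len(product_id) - 1, min_len - 1, -1)]

def transfer_features(record_a, record_b):
    """Level-wise transitions between two records of one session, e.g.
    'A00002/B00003/...' then 'A00010/B00055/...' -> ['A00002-A00010', 'B00003-B00055', ...]."""
    parts_a = [p for p in record_a.split('/') if p]
    parts_b = [p for p in record_b.split('/') if p]
    return [f'{a}-{b}' for a, b in zip(parts_a, parts_b)]
```

Note that both 'D09233' and 'D09232' map to the same prefixes, which is exactly what lets prefix features generalize across near-identical product IDs.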

Supervised Ensemble Learning

- Dynamic classifier selection (DCS) with competence smoothing
  - We adopt the DCS framework [1] and use competence to measure classifier behavior
  - Competence is defined according to the BAC (balanced accuracy) evaluation metric [2]
  - Classifier competence is learned on the training data
  - A graph-based method is used for smoothing
- Experimental results
  - Dataset: the training set of PAKDD'15 (15,000 samples)
  - Split: 11,000 training, 2,000 validation, 2,000 test
  - Models: random forest, naïve Bayes, decision tree, kNN, boosting, neural networks
  - Model fusion yields about a 3% improvement

[1] Giacinto G, Roli F. Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognition 34 (2001) 1879–1881.
[2] Woloszynski T, Kurzynski M, Podsiadlo P, et al. A measure of competence based on random classification for dynamic ensemble selection. Information Fusion 13(3) (2012) 207–213.
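The core of DCS is picking, per test point, the classifier that is most competent in that point's neighborhood. The sketch below illustrates this idea in the spirit of Giacinto and Roli [1], but it is deliberately simplified: it uses plain local accuracy on the k nearest validation points, not the BAC-based competence or graph smoothing the slide describes.

```python
# Simplified local-competence classifier selection (DCS sketch).
import numpy as np

def select_classifier(x, X_val, y_val, val_preds, k=3):
    """Return the index of the classifier most competent near x.

    X_val: (n_val, d) validation features; y_val: (n_val,) true labels;
    val_preds: (n_classifiers, n_val) each classifier's validation predictions."""
    # k nearest validation neighbors of x (Euclidean distance)
    dist = np.linalg.norm(X_val - x, axis=1)
    nn = np.argsort(dist)[:k]
    # Local competence = each classifier's accuracy on that neighborhood
    competence = (val_preds[:, nn] == y_val[nn]).mean(axis=1)
    return int(np.argmax(competence))
```

In a full DCS system the chosen classifier's prediction is then emitted for that test point; the authors additionally smooth the competence estimates over a graph before selection.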

Combination of Supervised and Unsupervised Methods

- Classification usually assumes the objects are independent and identically distributed
- The internal structure among the objects is a good complement to classification
- We follow the idea of maximizing the consensus between supervised predictions and unsupervised constraints [1]
- We tried several unsupervised approaches to group the data
  - The most effective grouping method for this data set uses the time interval between adjacent sessions
- Combining the supervised and unsupervised methods yields about a 2% improvement

[1] Gao J, Liang F, Fan W, Sun Y, Han J. Graph-based consensus maximization among multiple supervised and unsupervised models. Advances in Neural Information Processing Systems (NIPS) 22 (2009) 585–593.
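A much-simplified stand-in for this idea: group sessions by the time interval between adjacent sessions, then smooth the supervised class probabilities within each group so that structurally related sessions agree. This is only an illustration of the grouping-plus-consensus intuition, not the graph-based consensus maximization of Gao et al. [1]; the gap threshold and function names are assumptions.

```python
# Grouping by time interval of adjacent sessions, then within-group smoothing.
import numpy as np

def group_by_time(start_times, gap=300):
    """Group consecutive sessions whose start times are within `gap` seconds."""
    order = np.argsort(start_times)
    groups, current = [], [int(order[0])]
    for prev, nxt in zip(order[:-1], order[1:]):
        if start_times[nxt] - start_times[prev] <= gap:
            current.append(int(nxt))
        else:
            groups.append(current)
            current = [int(nxt)]
    groups.append(current)
    return groups

def smooth_predictions(proba, groups):
    """Replace each session's class probability with its group's average."""
    proba = np.asarray(proba, dtype=float)
    out = proba.copy()
    for g in groups:
        out[g] = proba[g].mean()
    return out
```

In the authors' method the supervised predictions and the unsupervised grouping constraints are instead reconciled by maximizing consensus over a joint graph, but the effect is similar: the group structure pulls individual predictions toward agreement.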

Remarks

- The PAKDD 2015 task is a wonderful platform for evaluating machine learning methods
- We adopted the dynamic classifier selection method for model fusion
- Task-oriented classifier competence and a graph-based smoothing method were explored for model fusion
- The combination of supervised and unsupervised approaches was explored in this contest
- Future work:
  - More general fusion methods should be explored
  - The optimization target needs further study to find the right tradeoff between the supervised and unsupervised approaches