FRDC’s approach at PAKDD’s Data Mining Competition
Copyright 2015 Fujitsu R&D Center Co.,LTD

Ruiyu Fang, Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen
Fujitsu Research and Development Center, Beijing, China

Background

Gender prediction: the task in this competition is to predict a user’s gender from product viewing logs.

Our solution uses the product viewing information within a single session, and also uses information across different sessions by exploring their potential associations. We adopt a two-step strategy for gender prediction, consisting of “gender classification” followed by the “continuous session alignment model”.

Features for gender classification [1]

Product and category features: we treat the products and product categories viewed in each session as words in a document and apply the “bag of words” model. For example, the session

u… 2014/12/20 20:… 2014/12/20 20:31 A00001/B00001/C00075/D33237/;A00001/B00001/C00075/D34328

yields the features A00001, A00001/B00001, A00001/B00001/C00075, A00001/B00001/C00075/D33237, A00001/B00001/C00075/D34328.

Product and category features with timestamp: the timestamp is taken from the start time (year, month and date only) of each session. For example, the sessions

u… 2014/12/20 20:… 2014/12/20 20:31 A00001/B00001/C00075/D33237/;A00001/B00001/C00075/D34328/ male
u… 2014/11/14 0:… 2014/11/14 0:37 A00001/B00001/C00019/D00044/ male

give features including 2014/12/20/A00001, 2014/12/20/A00001/B00001, 2014/12/20/A00001/B00001/C00075, 2014/12/20/A00001/B00001/C00075/D33237; 2014/11/14/A00001, 2014/11/14/A00001/B00001, 2014/11/14/A00001/B00001/C00019, 2014/11/14/A00001/B00001/C00019/D00044.
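The two feature types above can be sketched in Python as follows. This is a minimal sketch, not the authors’ code: it assumes a session is given as its start-time string plus the `;`-separated product path string shown in the examples, and the function names are hypothetical. It emits each distinct hierarchy prefix once, which reproduces the untimestamped example list; keeping repeated prefixes as counts would be the other natural bag-of-words choice.

```python
from datetime import datetime


def path_prefix_features(products: str) -> list[str]:
    """Expand 'A/B/C/D/;A/B/C/E/' into its distinct hierarchy prefixes, in first-seen order."""
    features: list[str] = []
    seen: set[str] = set()
    for product in products.split(";"):
        levels = [part for part in product.split("/") if part]  # drop empties from trailing '/'
        for depth in range(1, len(levels) + 1):
            prefix = "/".join(levels[:depth])
            if prefix not in seen:
                seen.add(prefix)
                features.append(prefix)
    return features


def timestamped_features(start_time: str, products: str) -> list[str]:
    """Prefix every path feature with the session's start date (year/month/day only)."""
    # Assumed start-time format "YYYY/MM/DD HH:MM", as displayed on the slides.
    date = datetime.strptime(start_time, "%Y/%m/%d %H:%M").strftime("%Y/%m/%d")
    return [f"{date}/{feature}" for feature in path_prefix_features(products)]


print(path_prefix_features("A00001/B00001/C00075/D33237/;A00001/B00001/C00075/D34328"))
# ['A00001', 'A00001/B00001', 'A00001/B00001/C00075', 'A00001/B00001/C00075/D33237', 'A00001/B00001/C00075/D34328']
```

Because the hour and minute are discarded, the timestamped variant only distinguishes sessions by calendar day, which is what the slide describes.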

Features for gender classification [2]

Same-level product and category features with timestamp: because different products target different customers, it is natural for an individual customer to have several stable preferences, such as particular products and categories. The items viewed at the same level of the product hierarchy within a session are therefore grouped into one feature. For example, the session

u… 2014/12/19 14:… 2014/12/19 14:42 A00003/B00036/C00190/D33072/;A00003/B00036/C00175/D33078/ female

yields 2014/12/19/A00003, 2014/12/19/B00036, 2014/12/19/{C00175,C00190}, 2014/12/19/{D33072,D33078}.

Product ID prefix with timestamp: we noticed that many products in the training data share the same product ID prefix; for example, the products D33072 and D33078 in the session above share the prefix “D3307”. The prefix length is set to 4. For that session this gives the feature 2014/12/19/D3307.
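Both features from this slide can be sketched in Python as below, reusing the example session. This is an illustrative sketch under stated assumptions, not the authors’ implementation: the function names are hypothetical, and “prefix length 4” is read as the leading letter plus four digits, the only reading consistent with the “D3307” example.

```python
def _levels(product: str) -> list[str]:
    """Split one product path such as 'A00003/B00036/C00190/D33072/' into its levels."""
    return [part for part in product.split("/") if part]


def same_level_features(date: str, products: str) -> list[str]:
    """Group the distinct nodes found at each hierarchy depth into one feature per depth."""
    nodes_by_depth: dict[int, set[str]] = {}
    for product in products.split(";"):
        for depth, node in enumerate(_levels(product)):
            nodes_by_depth.setdefault(depth, set()).add(node)
    features = []
    for depth in sorted(nodes_by_depth):
        nodes = sorted(nodes_by_depth[depth])
        body = nodes[0] if len(nodes) == 1 else "{" + ",".join(nodes) + "}"
        features.append(f"{date}/{body}")
    return features


def id_prefix_features(date: str, products: str, prefix_digits: int = 4) -> list[str]:
    """Map each product ID (last level) to its prefix; shared prefixes collapse to one feature."""
    prefixes: list[str] = []
    for product in products.split(";"):
        levels = _levels(product)
        if not levels:
            continue
        # Assumption: "prefix length 4" = leading letter + 4 digits (D33072 -> D3307).
        prefix = levels[-1][: 1 + prefix_digits]
        if prefix not in prefixes:
            prefixes.append(prefix)
    return [f"{date}/{p}" for p in prefixes]


session = "A00003/B00036/C00190/D33072/;A00003/B00036/C00175/D33078/"
print(same_level_features("2014/12/19", session))
# ['2014/12/19/A00003', '2014/12/19/B00036', '2014/12/19/{C00175,C00190}', '2014/12/19/{D33072,D33078}']
print(id_prefix_features("2014/12/19", session))
# ['2014/12/19/D3307']
```

Sorting the nodes inside each group makes the feature string independent of viewing order, so two sessions with the same set of categories map to the same feature.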

Features for gender classification [3]

Transferring features of sequential products: the transitions between sequentially viewed products may reflect the click habits of users of different genders. For example, the session

u… 2014/12/19 14:… 2014/12/19 14:42 A00003/B00036/C00190/D33072/;A00003/B00036/C00175/D33078/ female

yields the transferring feature 2014/12/19/D33072/C…

Counts of the different kinds of features are given in Table 1.

Table 1. Counts of different kinds of features. PF denotes product and category features, PFT is PF with timestamp, SLT denotes same-level features with timestamp, PIP denotes product ID prefix features, and TFT denotes transferring features.
Features: PF, PFT, SLT, PIP, TFT, Total
Count: 22,464  35,…  …,157  17,231  92,083
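A possible shape for the transferring features is sketched below. The slide’s example is cut off after “D33072/C”, so the exact format is unknown; this sketch assumes, purely for illustration, that each feature pairs the earlier product ID with the next product’s category. The function names are hypothetical, and `count_distinct` assumes each Table 1 cell counts distinct features of one kind.

```python
def _levels(product: str) -> list[str]:
    """Split one product path such as 'A00003/B00036/C00175/D33078/' into its levels."""
    return [part for part in product.split("/") if part]


def transfer_features(date: str, products: str) -> list[str]:
    """Hypothetical transfer feature for each pair of consecutively viewed products."""
    paths = [levels for levels in map(_levels, products.split(";")) if levels]
    features = []
    for prev, nxt in zip(paths, paths[1:]):
        # Assumption: pair the earlier product ID (last level) with the next
        # product's category (the level just above its product ID).
        next_category = nxt[-2] if len(nxt) >= 2 else nxt[-1]
        features.append(f"{date}/{prev[-1]}/{next_category}")
    return features


def count_distinct(feature_lists: list[list[str]]) -> int:
    """Number of distinct features of one kind across all sessions."""
    return len({feature for features in feature_lists for feature in features})


print(transfer_features("2014/12/19", "A00003/B00036/C00190/D33072/;A00003/B00036/C00175/D33078/"))
# ['2014/12/19/D33072/C00175']
```

Using `zip(paths, paths[1:])` yields one feature per adjacent pair, so a session with n products produces n - 1 transfer features.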

Features for gender classification [4]

The feature value is calculated by:

Classification model: we use LIBSVM [2], a well-established SVM library, with a linear kernel. Because the gender ratio in the training data is imbalanced, we set the weight of “male” sessions to 1.3 and that of “female” sessions to 0.25 during training.

Summary: we finally use a sparse feature set with a high feature dimensionality. Timestamp-based features greatly increase the feature dimensionality but turn out to be useful. A linear classifier is efficient and works well on this data set.
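The training setup above can be sketched as follows: string features are mapped to integer indices in LIBSVM’s sparse `label index:value` text format, and the class weights are passed through `svm-train`’s `-wi` option, which sets the parameter C of class i to weight*C. The label coding (male = 1, female = 0), the binary feature value (the slide’s own formula is not reproduced), the function name and the file names are assumptions for illustration.

```python
# Hypothetical label coding; the slides do not say how genders were encoded.
LABELS = {"male": 1, "female": 0}


def to_libsvm_lines(sessions, vocab=None):
    """Convert (gender, [string features]) pairs into LIBSVM's sparse text format.

    Returns the lines and the feature -> index vocabulary (indices start at 1).
    """
    vocab = {} if vocab is None else vocab
    lines = []
    for gender, features in sessions:
        indices = set()
        for feature in features:
            if feature not in vocab:
                vocab[feature] = len(vocab) + 1
            indices.add(vocab[feature])
        # Placeholder value 1 (binary presence), not the slide's feature-value formula.
        # LIBSVM expects indices in ascending order, hence sorted().
        cells = " ".join(f"{i}:1" for i in sorted(indices))
        lines.append(f"{LABELS[gender]} {cells}".rstrip())
    return lines, vocab


lines, vocab = to_libsvm_lines([("male", ["a", "b"]), ("female", ["b", "c"])])
print(lines)  # ['1 1:1 2:1', '0 2:1 3:1']

# Training with LIBSVM's command-line tool (-t 0 selects the linear kernel):
#   svm-train -t 0 -w1 1.3 -w0 0.25 train.libsvm gender.model
```

Passing the training vocabulary back in when converting test sessions keeps feature indices consistent between the two files.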

Continuous Session Alignment Model

References
1. Zellig S. Harris. Distributional structure. Word, 10:…
2. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at …