Ming Ji, Department of Computer Science, University of Illinois at Urbana-Champaign

2 Information Networks: the Data
- Information networks
  - Abstraction: graphs
  - Data instances connected by edges representing certain relationships
- Examples
  - Telephone account networks linked by calls
  - Email user networks linked by emails
  - Social networks linked by friendship relations
  - Twitter users linked by the "follow" relation
  - Webpage networks interconnected by hyperlinks in the World Wide Web
  - …
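As a minimal sketch of the "abstraction: graphs" idea, the Python below stores a network as a weighted adjacency map built from an edge list, so that repeated interactions (several co-authored papers, several calls) simply add up into one edge weight. The function names and the toy co-author edges are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph as {node: {neighbor: weight}}.

    edges: iterable of (u, v, weight) tuples, e.g. one tuple per co-authored paper.
    Repeated edges accumulate weight; self-loops are ignored.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue  # a self-loop carries no relationship between two instances
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return dict(adj)


def degree(adj, node):
    """Weighted degree: total weight of the edges incident to `node` (0 if absent)."""
    return sum(adj.get(node, {}).values())


# Hypothetical toy co-author network with four users.
edges = [("alice", "bob", 1.0), ("bob", "carol", 2.0),
         ("alice", "bob", 1.0), ("carol", "dave", 1.0)]
g = build_graph(edges)
print(g["bob"])          # {'alice': 2.0, 'carol': 2.0}
print(degree(g, "bob"))  # 4.0
```

A dictionary of dictionaries keeps the sketch dependency-free and handles sparse networks well; a dense matrix only becomes convenient once linear algebra on the whole graph is needed.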

3 Active Learning: the Problem
- Classical task: classification of the nodes in a graph
  - Applications: terrorist detection, fraud detection, …
- Why active learning
  - Training classification models requires labels, which are often very expensive to obtain
  - Different labeled data will train different learners
  - Given a network containing millions of users, we can only sample a few users, ask experts to investigate whether they are suspicious or not, and then use the labeled data to predict which users are suspicious among all the users
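One common way to "use the labeled data to predict" the remaining nodes is harmonic-function label propagation: each unlabeled node's score becomes the weighted average of its neighbours' scores, while the expert-labeled nodes stay fixed. The sketch below assumes that model; it is not necessarily the classification model used in this work, and `propagate_labels` and the toy graph are hypothetical.

```python
def propagate_labels(adj, labels, iters=200):
    """Harmonic-function label propagation, solved by repeated averaging sweeps.

    adj: {node: {neighbor: weight}} for an undirected graph.
    labels: {node: 1.0 or 0.0} for the few expert-labeled nodes (1.0 = suspicious).
    Returns a score in [0, 1] for every node in adj; labeled nodes stay clamped.
    """
    # Unlabeled nodes start at 0.5, i.e. no evidence either way.
    f = {v: labels.get(v, 0.5) for v in adj}
    for _ in range(iters):
        new_f = {}
        for v, nbrs in adj.items():
            if v in labels or not nbrs:
                new_f[v] = f[v]  # clamp expert labels; isolated nodes keep their start value
            else:
                # Weighted average of the neighbours' current scores.
                new_f[v] = sum(w * f[u] for u, w in nbrs.items()) / sum(nbrs.values())
        f = new_f
    return f


# Hypothetical 4-user chain a - b - c - d: experts label a as suspicious and d as not.
adj = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0},
       "c": {"b": 1.0, "d": 1.0}, "d": {"c": 1.0}}
scores = propagate_labels(adj, {"a": 1.0, "d": 0.0})
predicted = {v: int(s >= 0.5) for v, s in scores.items()}
print(predicted)  # {'a': 1, 'b': 1, 'c': 0, 'd': 0}
```

The sweeps converge to the harmonic solution (here b = 2/3, c = 1/3) whenever every connected component contains at least one labeled node; which nodes end up labeled therefore matters, which is exactly the question active learning asks.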

4 Active Learning: the Problem
- Problem definition of active learning
  - Input: data and a classification model
  - Output: the data examples (e.g., users) that should be labeled so that the classifier achieves higher prediction accuracy on the unlabeled data than it would with randomly selected labels
  - Goal: maximize the learner's ability given a fixed budget of labeling effort
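The problem definition can be phrased as a small evaluation harness: a strategy picks `budget` nodes, an oracle standing in for the experts labels them, a learner predicts every node, and accuracy is measured only on the nodes that were not queried. This is a sketch under assumptions: `run_protocol`, `random_selection` and the majority-vote stand-in learner are hypothetical names, and a real comparison would plug in a graph classifier.

```python
import random
from collections import Counter


def random_selection(nodes, budget, seed=0):
    """Baseline strategy: query `budget` distinct nodes chosen uniformly at random."""
    return random.Random(seed).sample(sorted(nodes), budget)


def run_protocol(nodes, oracle, select, train_and_predict, budget):
    """Spend a fixed labeling budget once and score the learner (sketch).

    nodes: all node ids; oracle: {node: true label}, standing in for the experts;
    select: (nodes, budget) -> list of nodes to query;
    train_and_predict: {node: label} -> {node: predicted label} for every node.
    Returns prediction accuracy on the nodes that were NOT queried.
    """
    queried = select(nodes, budget)
    if len(set(queried)) != budget:
        raise ValueError("strategy must return exactly `budget` distinct nodes")
    labeled = {v: oracle[v] for v in queried}  # the only labels we pay for
    pred = train_and_predict(labeled)
    rest = [v for v in nodes if v not in labeled]
    return sum(pred[v] == oracle[v] for v in rest) / len(rest)


def majority_classifier(all_nodes):
    """Hypothetical stand-in learner: predicts the most common queried label everywhere."""
    def train_and_predict(labeled):
        top = Counter(labeled.values()).most_common(1)[0][0]
        return {v: top for v in all_nodes}
    return train_and_predict


# Hypothetical ground truth: users 7, 8 and 9 are suspicious (label 1).
nodes = list(range(10))
oracle = {v: int(v >= 7) for v in nodes}
acc = run_protocol(nodes, oracle, random_selection, majority_classifier(nodes), budget=3)
print(f"random selection, budget 3: accuracy on unlabeled nodes = {acc:.2f}")
```

Any criterion-driven strategy plugs into the same `select` slot, which keeps the comparison against random selection fair: both spend exactly the same labeling budget.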

5 Notations

6 Classification Model

7 The Variance Minimization Criterion
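A minimal sketch of a variance-minimization selection rule, assuming a Gaussian-random-field view of the classifier: node scores are jointly Gaussian with precision matrix L + δI, where L is the graph Laplacian and δ (`delta`) is a hypothetical small regularizer that keeps the matrix invertible. Conditioning on the labeled nodes leaves the unlabeled nodes U with covariance equal to the inverse of the U-by-U block of L + δI, and the trace of that inverse depends only on the graph and on which nodes are labeled, not on the label values. The greedy loop and the pure-Python inverse are illustrative choices for small graphs, not necessarily the paper's optimization procedure.

```python
def laplacian(n, edges, delta=1e-2):
    """Regularized graph Laplacian L + delta * I as a dense list-of-lists matrix."""
    M = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        if u == v:
            continue  # self-loops do not change the Laplacian
        M[u][v] -= w
        M[v][u] -= w
        M[u][u] += w
        M[v][v] += w
    for i in range(n):
        M[i][i] += delta
    return M


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small dense matrices)."""
    n = len(A)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def total_variance(P, labeled):
    """Trace of the posterior covariance of the unlabeled scores, tr((P_UU)^-1).

    Uses only the graph (through P) and WHICH nodes are labeled, never their labels.
    """
    U = [i for i in range(len(P)) if i not in labeled]
    if not U:
        return 0.0
    inv = inverse([[P[i][j] for j in U] for i in U])
    return sum(inv[k][k] for k in range(len(U)))


def greedy_vm_selection(P, budget):
    """Greedily add the node whose labeling most reduces the total variance."""
    labeled, order = set(), []
    for _ in range(budget):
        best = min((v for v in range(len(P)) if v not in labeled),
                   key=lambda v: total_variance(P, labeled | {v}))
        labeled.add(best)
        order.append(best)
    return order


# Hypothetical 5-node path graph 0 - 1 - 2 - 3 - 4 with unit weights.
P = laplacian(5, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)])
print(greedy_vm_selection(P, 1))  # [2]
```

On the 5-node path, labeling the middle node leaves two short segments, each anchored to a labeled neighbour, which yields a smaller total variance than labeling an end node; no labels or node features enter the computation.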

8 Experimental Results on the Co-author Network
[Table: classification accuracy (%) of VM, ERM, Random, LSC and Uncertainty with 20 and 50 labels]

9 Experimental Results on the Isolet Data Set
[Figure: classification accuracy vs. the number of labels used]

10 Conclusions
- Publication: Ming Ji and Jiawei Han, "A Variance Minimization Criterion to Active Learning on Graphs", Proc. Int. Conf. on Artificial Intelligence and Statistics (AISTATS'12), La Palma, Canary Islands, April 2012
- Main advantages of the novel criterion
  - The first work to theoretically minimize the expected prediction error of a classification model on networks/graphs
  - The only information used is the graph structure
    - No label information is needed
    - The data points do not need a feature representation
- Future work
  - Test the assumptions and applicability of the criterion on real data
  - Study the expected error of other classification models