Online Stacked Graphical Learning
Zhenzhen Kou+, Vitor R. Carvalho*, and William W. Cohen+
+Machine Learning Department / *Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA


Abstract
Statistical relational learning has been widely studied as a way to predict class labels jointly over relational data, such as hyperlinked web pages, social networks, and data in a relational database. Existing collective classification methods are usually expensive because of the iterative inference in graphical models and learning procedures based on iterative optimization. When the dataset is large, the cost of maintaining large graphs or sets of related instances in memory becomes a problem as well. Stacked graphical learning has been proposed for collective classification with efficient inference; however, the memory and time cost of standard stacked graphical learning is still high, since it requires cross-validation-like predictions to be constructed during training. In this paper, we propose a new scheme that integrates recently developed single-pass online learning with stacked learning, to save training time and to handle large streaming datasets with minimal memory overhead. Experiments show that online stacked graphical learning gives accurate and reliable results on eleven sample problems from three domains, at much lower time and memory cost. With competitive accuracy, high efficiency, and low memory cost, online stacked graphical learning is very promising for real-world large-scale applications. Moreover, with the online learning scheme, stacked graphical learning can be applied to streaming data.
Introduction
There are many relational datasets in practice, where the instances are not independent of each other:
–Web pages linked to each other
–Data in a relational database
–Papers with citations and co-authorships
–…
Statistical relational learning
–Traditional machine learning algorithms assume independence among records; relational models analyze the dependence among instances.
–Relational Bayesian networks / relational Markov networks / relational dependency networks / Markov logic networks / …
–Most existing models are expensive due to iterative inference in graphical models; an algorithm with efficient inference is important in applications.

Online stacked graphical learning
Single-pass online algorithm: modified balanced Winnow (MBW)
–A single-pass online learning algorithm needs only a single training pass over the available data.
–Previous work showed that MBW can provide batch-level performance.

Experimental results (continued)
Sequential partitioning
–Task: sequential classification with long runs of identical labels
–Datasets: Signature dataset; FAQ dataset; video segmentation
–Models compared: baseline, stacked models, and competitive models (conditional random fields)
–Relational template: "Exists" over the predictions of ten adjacent examples
Named entity extraction
–Datasets: person name extraction in emails and protein name extraction in Medline abstracts
–Models compared: baseline, stacked models, and competitive models (conditional random fields)
–Relational template: a second template including adjacent words and repeated words
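The single-pass learner above can be sketched as a margin-based Balanced Winnow: two positive weight vectors are updated multiplicatively whenever an example falls inside the margin. This is a minimal illustration, not the paper's exact MBW variant; the promotion/demotion rates `alpha`, `beta` and the margin are illustrative defaults.

```python
import numpy as np

class BalancedWinnow:
    """Sketch of a margin-based Balanced Winnow online learner.

    Keeps separate positive (u) and negative (v) weight vectors and
    scores an example as (u - v) . x. On a margin mistake, active
    features are promoted/demoted multiplicatively. The hyperparameters
    here are illustrative, not the settings from the paper.
    """

    def __init__(self, n_features, alpha=1.5, beta=0.5, margin=1.0):
        self.u = np.ones(n_features)   # positive half of the weights
        self.v = np.ones(n_features)   # negative half of the weights
        self.alpha, self.beta, self.margin = alpha, beta, margin

    def score(self, x):
        return (self.u - self.v) @ x

    def update(self, x, y):
        """Single online update; y must be +1 or -1."""
        if y * self.score(x) <= self.margin:   # margin mistake
            active = x > 0                     # only touch active features
            if y > 0:
                self.u[active] *= self.alpha
                self.v[active] *= self.beta
            else:
                self.u[active] *= self.beta
                self.v[active] *= self.alpha

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1
```

A single pass over the data calls `update` once per example, so no examples need to be stored, which is what makes the streaming setting below feasible.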
Standard stacked graphical models (SGMs)
–Predict the class labels based on local features with a base learning method.
–Get an expanded feature vector (local features plus aggregated predictions of related instances) and train a model with the expanded features.
–Standard stacked graphical learning (SDM 07) is effective and efficient in inference, but still expensive in learning: it requires the predictions for the training data to be computed in a cross-validated way.
–Solution: online stacked graphical learning.

[Figure: stacked graphical learning. Instances x1, …, x5 first receive local predictions y'1, …, y'5; the features are then expanded to (xi, C(xi, y')), where C aggregates the predictions of related instances, and the stacking step is iterated a very small number of times.]

Online stacked graphical learning (continued)
–Intermediate predictions for the training data are generated to learn the online model.
–Combining online learning with stacked learning helps save training time and memory.

Efficiency
Learning efficiency
–Task: compare the training time against standard stacked graphical models and competitive relational models.
Inference efficiency
–Demonstrated in SDM 07 and Kou's dissertation: ~80 times faster than Gibbs sampling.

Summary
Accurate
–Represents dependencies among relational data.
–Competitive compared to state-of-the-art relational models (relational Markov networks, relational dependency networks).
Efficient
–During inference: ~80 times faster than Gibbs sampling.
–During learning: online stacked learning is over 10 times faster than competitive relational models.

[Table: performance of local models (MaxEnt, MBW), standard stacked models (based on MaxEnt and MBW), and online stacked graphical models on collective classification (accuracy: SLIF, WebKB, Cora, CiteSeer), name extraction (F1: UT, Yapex, Genia, CSpace), and sequential partitioning (accuracy: FAQ, Signature, Video). The numeric entries are not recoverable from this transcript.]
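The standard (batch) stacked procedure above can be sketched in a few lines: a base learner is trained on local features, cross-validation-like predictions are produced for the training set, and a second-level model is trained on features expanded with an aggregate of related instances' predictions. This is a sketch under assumptions: logistic regression stands in for the base learner, `neighbors` (a list of index lists) is a hypothetical stand-in for the real relational links, and the mean-of-neighbor-predictions aggregate is one simple instance of a relational template.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_validated_predictions(X, y, n_folds=5):
    """Cross-validation-like predictions for the training set:
    each fold is predicted by a model trained on the other folds,
    so the second-level model never sees resubstitution estimates."""
    n = len(y)
    preds = np.zeros(n)
    for fold in np.array_split(np.arange(n), n_folds):
        mask = np.ones(n, dtype=bool)
        mask[fold] = False
        clf = LogisticRegression().fit(X[mask], y[mask])
        preds[fold] = clf.predict_proba(X[fold])[:, 1]
    return preds

def expand(X, preds, neighbors):
    """Expanded feature vector: local features plus an aggregate
    (here, the mean) of the predictions of related instances."""
    agg = np.array([preds[nb].mean() if nb else 0.0 for nb in neighbors])
    return np.hstack([X, agg[:, None]])

def train_stacked(X, y, neighbors):
    base = LogisticRegression().fit(X, y)                 # level-0 model
    preds = cross_validated_predictions(X, y)             # expensive step
    stacked = LogisticRegression().fit(expand(X, preds, neighbors), y)
    return base, stacked

def predict_stacked(base, stacked, X, neighbors):
    preds = base.predict_proba(X)[:, 1]
    return stacked.predict(expand(X, preds, neighbors))
```

The `cross_validated_predictions` step is exactly the learning-time cost that the online scheme removes: it trains `n_folds` extra models and must hold the whole training set in memory.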
Efficiency analysis
[Figure: a stream of examples x1, x2, …, xb, …, x2b, …, xn partitioned by the burn-in data size b.]
–When there are (effectively) infinitely many training examples, i.e., kb << n, training amounts to a single pass over the training set.
–At level k, reliable predictions are available after (k+1)b examples have streamed by.
–The learner needs to maintain only k classifiers and does not need to store examples.

Experimental results
Collective classification over relational data
–Datasets: document classification (WebKB, Cora, CiteSeer); text region detection in SLIF
–Baselines: MaxEnt learner; MBW
–Competitive models: relational dependency networks
–Stacked models: standard stacked models based on MaxEnt and MBW, respectively; online stacked graphical models based on MBW
–Relational template: "Count"

[Table: training-time speed-ups of online SGMs over standard SGMs and over the competitive relational model, on SLIF, WebKB, Cora, Signature, FAQ, Video, UT, Yapex, Genia, and CSpace, with the average speed-up. The numeric entries are not recoverable from this transcript.]
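The burn-in scheme above can be sketched as a two-level streaming loop: the level-0 learner updates from the start, while the level-1 learner only begins updating after the burn-in, using an aggregate of recent level-0 predictions as the extra feature (mirroring the adjacent-examples template for sequential tasks). This is a sketch under assumptions: `make_learner` builds any online learner exposing hypothetical `update(x, y)` / `predict(x)` methods, and a bounded window plus a mean aggregate stand in for the real relational template. Memory use is two classifiers and the window; no examples are stored.

```python
from collections import deque

def train_online_stacked(stream, make_learner, burn_in, window=10):
    """Single-pass online stacked training over a sequential stream.

    stream yields (x, y) pairs with x a plain feature list;
    make_learner() returns a fresh online learner (hypothetical
    update/predict interface). The level-1 learner starts only after
    `burn_in` examples, once level-0 predictions are reliable.
    """
    level0, level1 = make_learner(), make_learner()
    recent = deque(maxlen=window)          # level-0 predictions of adjacent examples
    for t, (x, y) in enumerate(stream):
        p = level0.predict(x)              # predict before updating on (x, y)
        level0.update(x, y)
        if t >= burn_in:                   # level 1 waits out the burn-in
            agg = sum(recent) / len(recent) if recent else 0.0
            level1.update(x + [agg], y)    # expanded feature vector
        recent.append(p)
    return level0, level1
```

Because each example is seen exactly once and then discarded, the same loop applies unchanged to streaming data, which is what makes the approach attractive beyond saving training time.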