Survey on Semi-Supervised CRFs Yusuke Miyao Department of Computer Science The University of Tokyo

Contents 1. Conditional Random Fields (CRFs) 2. Semi-Supervised Log-Linear Model 3. Semi-Supervised CRFs 4. Dynamic Programming for Semi-Supervised CRFs

1. Conditional Random Fields (CRFs) Log-linear model: for sentence x = \langle x_1, \ldots, x_T \rangle and label sequence y = \langle y_1, \ldots, y_T \rangle,

p(y \mid x) = \frac{1}{Z(x)} \exp\Bigl( \sum_k \lambda_k f_k(x, y) \Bigr)

– λ_k : parameter – f_k : feature function – Z(x) = \sum_{y'} \exp\bigl( \sum_k \lambda_k f_k(x, y') \bigr) : partition function
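To make the formula concrete, here is a minimal brute-force sketch in Python; the labels, sentence, and weights are invented for illustration, and the features are simple word-label indicators:

```python
import itertools
import math

# Toy setup (hypothetical weights and features, only to make the formula concrete)
LABELS = ["Noun", "Det", "Verb"]
sentence = ["His", "friend", "runs", "the"]
weights = {("runs", "Verb"): 2.0, ("the", "Det"): 2.0}  # lambda_k for indicator features

def score(x, y):
    """sum_k lambda_k * f_k(x, y) for word-label indicator features."""
    return sum(weights.get((w, t), 0.0) for w, t in zip(x, y))

def prob(x, y):
    """p(y|x) = exp(score(x, y)) / Z(x); Z enumerates all |LABELS|^T sequences."""
    Z = sum(math.exp(score(x, yp))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(x, y)) / Z

print(prob(sentence, ("Noun", "Noun", "Verb", "Det")))
```

The brute-force Z(x) touches |LABELS|^T label sequences; this exponential blow-up is exactly what the dynamic-programming slides below address.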

Parameter Estimation (1/2) Estimate parameters λ_k, given labeled training data D = \{ \langle x_i, y_i \rangle \} Objective function: log-likelihood (+ regularizer, e.g. a Gaussian prior):

L(\lambda) = \sum_i \log p(y_i \mid x_i) - \sum_k \frac{\lambda_k^2}{2\sigma^2}

Parameter Estimation (2/2) Gradient-based optimization is applied (CG, pseudo-Newton, etc.)

\frac{\partial L}{\partial \lambda_k} = \sum_i f_k(x_i, y_i) - \sum_i \sum_y p(y \mid x_i) f_k(x_i, y)

The second term is the model expectation of f_k.

Dynamic Programming for CRFs Computation of model expectations requires summation over all label sequences y → exponentially many terms Dynamic programming allows for efficient computation of model expectations [Figure: tagging example, sentence x = "His friend runs the company" with a label sequence y over tags such as Noun, Det]

Dynamic Programming for CRFs Assumption: features factorize over positions (0-th order): f_k(x, y) = \sum_t f_k(x, y_t, t) [Figure: lattice over the sentence "His friend runs the company" with candidate tags Noun, Det, Verb, Adj at each position t, one state y_t per column]

Forward/Backward Probability Forward probability α_t(y) sums the scores of all partial label sequences ending with y_t = y, and backward probability β_t(y) sums those continuing from y_t = y; both are computed by dynamic programming [Figure: the same lattice over "His friend runs the company" with tags Noun, Det, Verb, Adj, α covering the prefix and β the suffix]
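A sketch of the forward/backward recursions, assuming the usual linear-chain setup with log-space scores; emit and trans are hypothetical score tables, and the slides' strict 0-th order case corresponds to trans = 0, where the recursions collapse to per-position softmaxes:

```python
import numpy as np
from scipy.special import logsumexp

def forward_backward(emit, trans):
    """Forward-backward for a linear-chain CRF in log space.
    emit:  (T, L) array, per-position label scores sum_k lambda_k f_k(x, y_t, t)
    trans: (L, L) array, transition scores between adjacent labels
    Returns per-position marginals p(y_t = y | x) and log Z(x)."""
    T, L = emit.shape
    alpha = np.empty((T, L))   # alpha[t, y]: log-sum of scores of prefixes ending in y
    beta = np.empty((T, L))    # beta[t, y]:  log-sum of scores of suffixes after y
    alpha[0] = emit[0]
    for t in range(1, T):
        alpha[t] = emit[t] + logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    beta[T - 1] = 0.0
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(trans + (emit[t + 1] + beta[t + 1])[None, :], axis=1)
    log_Z = logsumexp(alpha[T - 1])
    marginals = np.exp(alpha + beta - log_Z)   # each row sums to 1
    return marginals, log_Z
```

The model expectation needed in the gradient above is then \sum_t \sum_y marginals[t, y] · f_k(x, y, t), computed in O(T L^2) time instead of O(L^T).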

2. Semi-Supervised Log-Linear Model Grandvalet and Bengio (2004) Given labeled data D_L = \{ \langle x_i, y_i \rangle \} and unlabeled data D_U = \{ z_i \} Objective function: log-likelihood + negative entropy regularizer

L(\lambda) = \sum_i \log p(y_i \mid x_i) + \gamma \sum_{z \in D_U} \sum_y p(y \mid z) \log p(y \mid z)

Negative Entropy Regularizer Maximizing the negative entropy → minimizing class overlap on unlabeled data = targets are separated
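For intuition, take the two-class case (a standard illustration, not from the slides): entropy peaks at an undecided posterior and vanishes at a confident one, so maximizing the negative entropy term pushes predictions on unlabeled points toward 0 or 1, away from the class boundary.

H(p) = -p \log p - (1 - p) \log (1 - p), \qquad p = p(y = 1 \mid z)

The maximum is H(1/2) = \log 2 and the minimum is H(0) = H(1) = 0.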

Gradient of Entropy (1/2) Differentiating the log-linear model:

\frac{\partial p(y \mid z)}{\partial \lambda_k} = p(y \mid z) \Bigl( f_k(z, y) - \mathbb{E}_{p(y' \mid z)}[ f_k(z, y') ] \Bigr)

Gradient of Entropy (2/2) Substituting into the regularizer gives a covariance:

\frac{\partial}{\partial \lambda_k} \sum_y p(y \mid z) \log p(y \mid z) = \operatorname{Cov}_{p(y \mid z)}\bigl[ f_k(z, y),\ \log p(y \mid z) \bigr]
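The intermediate step connecting the two slides, written out as a reconstruction (the algebra is standard for log-linear models; the slide's own notation may differ):

\frac{\partial}{\partial \lambda_k} \sum_y p(y \mid z) \log p(y \mid z)
  = \sum_y \bigl(1 + \log p(y \mid z)\bigr) \frac{\partial p(y \mid z)}{\partial \lambda_k}
  = \sum_y \log p(y \mid z)\, p(y \mid z) \bigl( f_k(z, y) - \mathbb{E}[f_k] \bigr)

The constant 1 drops out because \sum_y \partial p(y \mid z) / \partial \lambda_k = 0, and the remaining expression is the covariance stated on slide (2/2).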

3. Semi-Supervised CRFs Jiao et al. (2006) Given labeled data D_L = \{ \langle x_i, y_i \rangle \} and unlabeled data D_U = \{ z_i \} Objective function: log-likelihood + negative entropy regularizer, as above, but y now ranges over entire label sequences, so the entropy term sums over exponentially many sequences

Application to NER Gene and protein identification Data sets: A (labeled): 5448 words, B (unlabeled): 5210 words, C: … words, D: … words Self-training did not yield any improvement [Table: precision (P), recall (R), and F-score (F) for the settings A & B, A & C, A & D, for varying values of γ]

Results

4. Dynamic Programming for Semi-Supervised CRFs Mann and McCallum (2007) We have to compute quantities of the form

\sum_{y_{-t}} p(y_{-t} \cdot y_t \mid z) \log p(y_{-t} \cdot y_t \mid z)

where y_{-t} = \langle y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_T \rangle and y_{-t} \cdot y_t = \langle y_1, \ldots, y_T \rangle (the full sequence with the t-th label fixed to y_t)

Example Enumerate all y while fixing the t-th state to y_t If we can compute \sum_{y_{-t}} p(y_{-t} \cdot y_t \mid z) \log p(y_{-t} \cdot y_t \mid z) efficiently for every t and y_t, we can compute the gradient [Figure: the lattice for "His friend runs the company" over tags Noun, Det, Verb, Adj, with one position's tag held fixed]

Decomposition of Entropy In the following, we use H(y \mid z) = -\sum_y p(y \mid z) \log p(y \mid z) and the chain-rule decomposition obtained by fixing position t:

H(y \mid z) = H(y_t \mid z) + \sum_{y_t} p(y_t \mid z)\, H(y_{-t} \mid y_t, z)

Subsequence Constrained Entropy Subsequence constrained entropy: the entropy over label sequences whose t-th label is fixed to y_t,

H(y_{-t} \mid y_t, z) = -\sum_{y_{-t}} p(y_{-t} \mid y_t, z) \log p(y_{-t} \mid y_t, z)

Computed from forward-backward probability [Figure: lattice for "His friend runs the company" over Noun, Det, Verb, Adj with the constrained position highlighted]

Forward/Backward Subsequence Constrained Entropy By the Markov property, the prefix y_{1..t-1} and the suffix y_{t+1..T} are conditionally independent given y_t, so the constrained entropy splits into a forward part H^α and a backward part H^β [Figure: the lattice split at position t, H^α covering the left half and H^β the right half of "His friend runs the company" over Noun, Det, Verb, Adj]
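Spelled out, with notation reconstructed to match the slide titles (H^α for the prefix part, H^β for the suffix part):

H^{\alpha}_t(y_t) = -\sum_{y_{1:t-1}} p(y_{1:t-1} \mid y_t, z) \log p(y_{1:t-1} \mid y_t, z)
\qquad
H^{\beta}_t(y_t) = -\sum_{y_{t+1:T}} p(y_{t+1:T} \mid y_t, z) \log p(y_{t+1:T} \mid y_t, z)

H(y_{-t} \mid y_t, z) = H^{\alpha}_t(y_t) + H^{\beta}_t(y_t)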

Dynamic Computation of H^α H^α can be computed incrementally, left to right:

H^{\alpha}_t(y_t) = \sum_{y_{t-1}} p(y_{t-1} \mid y_t, z) \Bigl( H^{\alpha}_{t-1}(y_{t-1}) - \log p(y_{t-1} \mid y_t, z) \Bigr)

where p(y_{t-1} \mid y_t, z) is computed from the forward probabilities, p(y_{t-1} \mid y_t, z) = \alpha_{t-1}(y_{t-1})\, \psi_t(y_{t-1}, y_t) / \sum_{y'} \alpha_{t-1}(y')\, \psi_t(y', y_t), with ψ_t the potential on the edge between positions t−1 and t
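A minimal sketch of this left-to-right DP, reusing the log-space emit/trans tables from the forward-backward sketch above (the naming is mine, and the exact indexing in Mann and McCallum (2007) may differ in detail):

```python
import numpy as np
from scipy.special import logsumexp

def prefix_entropy(emit, trans):
    """H_alpha[t, y]: entropy of the prefix y_1..y_{t-1} given y_t = y.
    Runs alongside the forward recursion in O(T L^2) time."""
    T, L = emit.shape
    alpha = np.empty((T, L))
    H_alpha = np.zeros((T, L))            # empty prefix at t = 0 has zero entropy
    alpha[0] = emit[0]
    for t in range(1, T):
        log_joint = alpha[t - 1][:, None] + trans          # index: (prev, cur)
        log_cond = log_joint - logsumexp(log_joint, axis=0, keepdims=True)
        cond = np.exp(log_cond)                            # p(y_{t-1} | y_t, z)
        # Recursion: expected prefix entropy plus local conditional entropy
        H_alpha[t] = (cond * (H_alpha[t - 1][:, None] - log_cond)).sum(axis=0)
        alpha[t] = emit[t] + logsumexp(log_joint, axis=0)
    return H_alpha
```

A mirror-image recursion gives H^β right to left; combining H^α, H^β, and the forward-backward marginals yields every constrained entropy, and hence the entropy gradient, in O(T L^2) rather than exponential time.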

References
Y. Grandvalet and Y. Bengio. 2004. Semi-supervised learning by entropy minimization. In NIPS.
F. Jiao, S. Wang, C.-H. Lee, R. Greiner, and D. Schuurmans. 2006. Semi-supervised conditional random fields for improved sequence segmentation and labeling. In COLING/ACL.
G. S. Mann and A. McCallum. 2007. Efficient computation of entropy gradient for semi-supervised conditional random fields. In NAACL-HLT.
X. Zhu. 2005. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison.