Exploiting Domain Structure for Named Entity Recognition Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign.



2 Named Entity Recognition
- A fundamental task in information extraction (IE)
- An important and challenging task in biomedical text mining
  - Critical for relation mining
  - Great variation and different gene naming conventions

3 Need for domain adaptation
- Performance degrades when the test domain differs from the training domain
- Domain overfitting

task       | NE types      | train → test  | F1
news       | LOC, ORG, PER | NYT → NYT     | 0.855
           |               | Reuters → NYT | 0.641
biomedical | gene, protein | mouse → mouse | 0.541
           |               | fly → mouse   | 0.281

4 Existing work
- Supervised learning
  - HMM, MEMM, CRF, SVM, etc. (e.g., [Zhou & Su 02], [Bender et al. 03], [McCallum & Li 03])
- Semi-supervised learning
  - Co-training [Collins & Singer 99]
- Domain adaptation
  - External dictionary [Ciaramita & Altun 05]
  - Not seriously studied

5 Outline
- Observations
- Method
  - Generalizability-based feature ranking
  - Rank-based prior
- Experiments
- Conclusions and future work

6 Observation I
- Overemphasis on domain-specific features in the trained model
- Example (fly): wingless, daughterless, eyeless, apexless, …
  - The feature "suffix -less" is weighted high in the model trained from fly data
  - Useful for other organisms? In general, NO!
  - May cause generalizable features to be downweighted

7 Observation II
- Generalizable features: features that generalize well in all domains
  - …decapentaplegic and wingless are expressed in analogous patterns in each primordium of… (fly)
  - …that CD38 is expressed by both neurons and glial cells… that PABPC5 is expressed in fetal brain and in a range of adult tissues. (mouse)

8 Observation II (cont.)
- The feature "w_{i+2} = expressed" is generalizable
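The contrast between the domain-specific "suffix -less" feature and the generalizable "w_{i+2} = expressed" feature can be made concrete with a small token-level feature extractor. This is an illustrative sketch, not the paper's exact feature templates; the feature names (`suffix=-less`, `w_+2=…`) are assumptions.

```python
def token_features(tokens, i):
    """Extract simple NER features for position i: the token itself,
    a character-suffix feature, and context-window word features
    such as w_{i+2} (the word two positions to the right)."""
    feats = {"w": tokens[i]}
    if tokens[i].endswith("less"):
        feats["suffix=-less"] = True          # domain-specific in fly text
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"w_{offset:+d}={tokens[j]}"] = True
    return feats

# "wingless" fires the fly-specific suffix feature, while
# "w_+2=expressed" is the kind of feature that fires across domains.
print(token_features(["wingless", "is", "expressed", "in"], 0))
```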

9 Generalizability-based feature ranking flymouseD3D3 DmDm … training data … -less … expressed … expressed … -less … expressed … -less … expressed … -less … s(“expressed”) = 1/6 = s(“-less”) = 1/8 = … expressed … -less … … …

10 Feature ranking & learning
[Diagram: labeled training data and a ranked feature list F (… expressed … -less …); the top k features of F are fed with the labeled data into a supervised learning algorithm, producing the trained classifier]

11 Feature ranking & learning (cont.)
[Diagram: same pipeline after hard feature selection — only the top k features (e.g. "expressed") are kept; low-ranked domain-specific features such as "-less" are discarded]

12 Feature ranking & learning (cont.)
[Diagram: alternative to hard selection — the full ranked list F instead sets rank-based prior variances in a Gaussian prior for a logistic regression model (MEMM), which is then trained on the labeled data]

13 Prior variances
- Logistic regression model
- MAP parameter estimation with a Gaussian prior on the parameters
- The prior variance σ_j² is a function of the feature's rank r_j
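Written out, MAP estimation for a logistic regression model with an independent zero-mean Gaussian prior on each parameter takes the standard penalized log-likelihood form (notation assumed; σ_j² is the per-feature variance determined by the rank r_j):

```latex
\hat{\lambda} = \arg\max_{\lambda}
  \left[ \sum_{i} \log p(y_i \mid \mathbf{x}_i; \lambda)
       - \sum_{j} \frac{\lambda_j^2}{2\sigma_j^2} \right],
\qquad \lambda_j \sim \mathcal{N}(0, \sigma_j^2)
```

A large σ_j² leaves λ_j nearly unpenalized, while a small σ_j² pulls λ_j toward zero, which is how the rank-based prior favors generalizable features.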

14 Rank-based prior variance
[Plot: prior variance σ² as a decreasing function of rank r = 1, 2, 3, …, with maximum value a]
- Important features → large σ²
- Non-important features → small σ²

15 Rank-based prior variance
[Plot: σ² vs. rank r = 1, 2, 3, …, with maximum value a; curves shown for b = 6, b = 4, b = 2]
- a and b are set empirically
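The curves on this slide can be mimicked by any variance function that is large for top ranks and decays with r. A minimal sketch, assuming an illustrative power-law decay (the slide shows only the qualitative shape, not the formula; the exact functional form may differ):

```python
def rank_based_variance(r, a=1.0, b=4.0):
    """Prior variance sigma^2 as a decreasing function of 1-based
    feature rank r. Illustrative form only: top-ranked (generalizable)
    features get variance near a, so they are lightly regularized;
    low-ranked features get small variance, pulling their weights
    toward zero. b controls how slowly the variance decays."""
    return a / (r ** (1.0 / b))
```

With this form, the rank-1 feature keeps the full variance a, while a rank-1000 feature is regularized much more strongly.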

16 Summary
[Diagram: training data from domains D_1, …, D_m → per-domain feature rankings O_1, …, O_m → combined generalizability-based feature ranking O'; cross validation yields an optimal b_i for each domain D_i, and the test-domain value is set as b = λ_1 b_1 + … + λ_m b_m; the resulting rank-based prior feeds learning (E) of an entity tagger, which is then applied to the test data]

17 Experiments
- Data set
  - BioCreative Challenge Task 1B
  - Gene/protein recognition
  - 3 organisms/domains: fly, mouse and yeast
- Experimental setup
  - 2 organisms for training, 1 for testing
  - Baseline: uniform-variance Gaussian prior
  - Compared with 3 regular feature ranking methods: frequency, information gain, chi-square

18 Comparison with baseline
[Table: Baseline vs. Domain-aware method on Precision / Recall / F1; the absolute scores are not recoverable from this transcript, but the relative improvements of the domain-aware method over the baseline are:]

Exp     | Precision | Recall | F1
F+M → Y | +3.2%     | +10.7% | +7.1%
F+Y → M | +1.9%     | +13.7% | +9.2%
M+Y → F | +1.4%     | +43.3% | +35.5%

19 Comparison with regular feature ranking methods
[Plot comparing generalizability-based feature ranking against information gain, chi-square, and feature frequency]

20 Conclusions and future work
- We proposed
  - A generalizability-based feature ranking method
  - Rank-based prior variances
- Experiments show
  - The domain-aware method outperformed the baseline method
  - Generalizability-based feature ranking is better than regular feature ranking
- Future work: exploit the unlabeled test data

21 Thank you! The end

22 How to set a and b?
- a is empirically set to a constant
- b is set by cross validation
  - Let b_i be the optimal b for domain D_i
  - b is set as a weighted average of the b_i's, where the weights indicate the similarity between the test domain and the training domains
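The weighted-average rule for b can be sketched in a few lines. The similarity weights λ_i here are hypothetical inputs; the slide does not specify how the domain similarities are computed.

```python
def combine_b(optimal_bs, similarity_weights):
    """Set b for the test domain as a similarity-weighted average of
    each training domain's cross-validated optimum b_i:
        b = (sum_i lambda_i * b_i) / (sum_i lambda_i)
    similarity_weights need not be pre-normalized; they are
    normalized here. Hypothetical similarity values for illustration."""
    total = sum(similarity_weights)
    return sum(w * b for w, b in zip(similarity_weights, optimal_bs)) / total
```

For example, if the test domain is equally similar to two training domains with optimal values b_1 = 2 and b_2 = 6, the combined value is b = 4.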