Exploiting Domain Structure for Named Entity Recognition Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign
2 Named Entity Recognition A fundamental task in IE An important and challenging task in biomedical text mining Critical for relation mining Great variation and different gene naming conventions
3 Need for domain adaptation Performance degrades when test domain differs from training domain Domain overfitting taskNE typestrain → testF1 newsLOC, ORG, PER NYT → NYT0.855 Reuters → NYT0.641 biomedicalgene, protein mouse → mouse0.541 fly → mouse0.281
4 Existing work Supervised learning HMM, MEMM, CRF, SVM, etc. (e.g., [Zhou & Su 02], [Bender et al. 03], [McCallum & Li 03] ) Semi-supervised learning Co-training [Collins & Singer 99] Domain adaptation External dictionary [Ciaramita & Altun 05] Not seriously studied
5 Outline Observations Method Generalizability-based feature ranking Rank-based prior Experiments Conclusions and future work
6 Observation I Overemphasis on domain-specific features in the trained model wingless daughterless eyeless apexless … fly “suffix –less” weighted high in the model trained from fly data Useful for other organisms? in general NO! May cause generalizable features to be downweighted
7 Observation II Generalizable features: generalize well in all domains …decapentaplegic and wingless are expressed in analogous patterns in each primordium of… (fly) …that CD38 is expressed by both neurons and glial cells…that PABPC5 is expressed in fetal brain and in a range of adult tissues. (mouse)
8 Observation II Generalizable features: generalize well in all domains …decapentaplegic and wingless are expressed in analogous patterns in each primordium of… (fly) …that CD38 is expressed by both neurons and glial cells…that PABPC5 is expressed in fetal brain and in a range of adult tissues. (mouse) “w i+2 = expressed” is generalizable
9 Generalizability-based feature ranking flymouseD3D3 DmDm … training data … -less … expressed … expressed … -less … expressed … -less … expressed … -less … s(“expressed”) = 1/6 = s(“-less”) = 1/8 = … expressed … -less … … …
10 Feature ranking & learning labeled training data supervised learning algorithm trained classifier... expressed … -less … top k features F
11 Feature ranking & learning labeled training data supervised learning algorithm trained classifier... expressed … top k features F
12 Feature ranking & learning labeled training data supervised learning algorithm trained classifier... expressed … -less … prior F logistic regression model (MEMM) rank-based prior variances in a Gaussian prior
13 Prior variances Logistic regression model MAP parameter estimation σ j 2 is a function of r j prior for the parameters
14 Rank-based prior variance σ 2 rank rr = 1, 2, 3, … a important features → large σ 2 non-important features → small σ 2
15 Rank-based prior variance σ 2 rank rr = 1, 2, 3, … a b = 6 b = 4 b = 2 a and b are set empirically
16 Summary D1D1 DmDm … training data O1O1 OmOm … individual domain feature ranking generalizability-based feature ranking O’ learning E testing entity tagger test data optimal b 1 for D 1 optimal b 2 for D 2 b = λ 1 b 1 + … + λ m b m λ 1, …, λ m rank-based prior optimal b m for D m rank-based prior
17 Experiments Data set BioCreative Challenge Task 1B Gene/protein recognition 3 organisms/domains: fly, mouse and yeast Experimental setup 2 organisms for training, 1 for testing Baseline: uniform-variance Gaussian prior Compared with 3 regular feature ranking methods: frequency, information gain, chi-square
18 Comparison with baseline ExpMethodPrecisionRecallF1 F+M→YBaseline Domain % Imprv.+3.2%+10.7%+7.1% F+Y→MBaseline Domain % Imprv.+1.9%+13.7%+9.2% M+Y→FBaseline Domain % Imprv.+1.4%+43.3%+35.5%
19 Comparison with regular feature ranking methods generalizability-based feature ranking information gain and chi-square feature frequency
20 Conclusions and future work We proposed Generalizability-based feature ranking method Rank-based prior variances Experiments show Domain-aware method outperformed baseline method Generalizability-based feature ranking better than regular feature ranking To exploit the unlabeled test data
21 Thank you! The end
22 How to set a and b? a is empirically set to a constant b is set from cross validation Let b i be the optimal b for domain D i b is set as a weighted average of b i ’s, where the weights indicate the similarity between the test domain and the training domains