1
The Third Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition Gina-Anne Levow Fifth SIGHAN Workshop July 22, 2006
2
Roadmap

Bakeoff Task Motivation
Bakeoff Structure:
  Materials and annotations
  Tasks and conditions
  Participants and timeline
Results & Discussion:
  Word Segmentation
  Named Entity Recognition
Observations & Conclusions
Thanks
3
Bakeoff Task Motivation

Core enabling technologies for Chinese language processing.

Word segmentation (WS):
  Crucial tokenization in the absence of whitespace
  Supports POS tagging, parsing, reference resolution, etc.
  Fundamental challenges: "word" is not well or consistently defined; humans disagree
  Unknown words impede performance

Named Entity Recognition (NER):
  Essential for reference resolution, IR, etc.
  Named entities are a common class of new, unknown words
4
Data Source Characterization

Five corpora, from five providers
Annotation guidelines available, but varied across providers
Both simplified and traditional characters
Range of encodings; all data also available in Unicode (UTF-8)
Provided in a common XML format, converted to train/test form (LDC)
5
Tasks and Tracks

Tasks:
Word Segmentation:
  Training and truth data: whitespace delimited
  End-of-word tags replaced with a space; no other tags
Named Entity Recognition:
  Training and truth data: similar to the CoNLL two-column format
  NAMEX only: LOC, PER, ORG (LDC: also GPE)
(A rough sketch of both file formats follows below.)

Tracks:
Closed: only the provided materials may be used
Open: any materials may be used, but they must be documented
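As a rough illustration of the two data formats, here is a minimal Python sketch; the example sentence, characters, and BIO-style tag prefixes are assumptions for illustration, not drawn from the released corpora.

    # Minimal sketch of the two training/truth data formats
    # (example content and BIO-style prefixes are illustrative assumptions).

    # Word segmentation: one sentence per line, words delimited by whitespace.
    ws_line = "香港 大学 成立 新 研究 中心"
    words = ws_line.split()            # -> ['香港', '大学', '成立', ...]

    # NER: CoNLL-style two-column format, one token per line with its NE tag
    # (LOC / PER / ORG; the LDC data also uses GPE).
    ner_block = "香 B-LOC\n港 I-LOC\n大 O\n学 O"
    for line in ner_block.splitlines():
        token, tag = line.split()
        print(token, tag)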
6
Structure: Participants & Timeline

Participants:
  29 sites submitted runs for evaluation (36 initially registered)
  144 runs submitted: roughly two-thirds WS, one-third NER
  Diverse groups: 11 PRC, 7 Taiwan, 5 US, 2 Japan, 1 each from Singapore, Korea, Hong Kong, Canada
  Mix of commercial (MSRA, Yahoo!, Alias-I, FR Telecom, etc.) and academic sites

Timeline:
  March 15: Registration opened
  April 17: Training data released
  May 15: Test data released
  May 17: Results due
7
Word Segmentation: Results

Contrasts (left-to-right maximal match; a minimal sketch of this segmenter follows the tables below):
  Baseline: uses only the training vocabulary
  Topline: uses only the testing vocabulary

Baseline:
Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.930   0.882  0.906    0.040  0.009  0.969
CKIP    0.915   0.870  0.892    0.042  0.030  0.954
MSRA    0.949   0.900  0.924    0.034  0.022  0.981
UPUC    0.869   0.790  0.828    0.088  0.011  0.951

Topline:
Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.982   0.985  0.984    0.040  0.993  0.981
CKIP    0.980   0.987  0.983    0.042  0.997  0.979
MSRA    0.991   0.993  0.992    0.034  0.999  0.991
UPUC    0.961   0.976  0.968    0.088  0.989  0.958
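The baseline and topline above both use the greedy left-to-right maximal-match segmenter named on this slide, differing only in the word list (training vocabulary vs. test-truth vocabulary). A minimal Python sketch, with function and variable names of my own rather than the bakeoff tools:

    def max_match(sentence, vocab, max_len=10):
        """Greedy left-to-right maximal matching against a word list.

        At each position, take the longest vocabulary entry that matches;
        if nothing matches, emit a single character and move on.
        """
        words, i = [], 0
        while i < len(sentence):
            for j in range(min(len(sentence), i + max_len), i, -1):
                if sentence[i:j] in vocab or j == i + 1:
                    words.append(sentence[i:j])
                    i = j
                    break
        return words

    # Baseline: vocab drawn from training data; topline: vocab from the test truth.
    vocab = {"中国", "科技", "公司"}                   # hypothetical word list
    print(" ".join(max_match("中国科技公司", vocab)))  # -> 中国 科技 公司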
8
Word Segmentation: CityU

CityU Closed:
Site  Run  R      P      F      Roov   Riv
15    D    0.973  0.972  -      0.787  0.981
15    B    0.973  0.972  -      0.787  0.981
20         0.972  0.971  -      0.792  0.979
32         0.969  0.970  -      0.773  0.978

CityU Open:
Site  Run  R      P      F      Roov   Riv
20         0.978  0.977  -      0.840  0.984
32         0.979  0.976  0.977  0.813  0.985
34         0.971  0.967  0.969  0.795  0.978
22         0.970  0.965  0.967  0.761  0.979
9
Word Segmentation: CKIP

CKIP Closed:
Site  Run  R      P      F      Roov   Riv
20         0.961  0.955  0.958  0.702  0.972
15    A    0.961  0.953  0.957  0.658  0.974
15    B    0.961  0.952  0.957  0.656  0.974
32         0.958  0.948  0.953  0.646  0.972

CKIP Open:
Site  Run  R      P      F      Roov   Riv
20         0.964  0.955  0.959  0.704  0.975
34         0.959  0.949  0.954  0.672  0.972
32         0.958  0.948  0.953  0.647  0.972
2     A    0.953  0.946  0.949  0.679  0.965
10
Word Segmentation: MSRA

MSRA Closed:
Site  Run  R      P      F      Roov   Riv
32         0.964  0.961  0.963  0.612  0.976
26         0.961  0.953  0.957  0.499  0.977
9          0.959  0.955  0.957  0.494  0.975
1     A    0.955  0.956  -      0.650  0.966

MSRA Open:
Site  Run  R      P      F      Roov   Riv
11    A    0.980  0.978  0.979  0.839  0.985
11    B    0.977  0.976  0.977  0.840  0.982
14         0.975  0.976  0.975  0.811  0.981
32         0.977  0.971  0.974  0.675  0.988
11
Word Segmentation: UPUC

UPUC Closed:
Site  Run  R      P      F      Roov   Riv
20         0.940  0.926  0.933  0.707  0.963
32         0.936  0.923  0.930  0.683  0.961
1     A    0.940  0.914  0.927  0.634  0.969
26    A    0.936  0.917  0.926  0.617  0.966

UPUC Open:
Site  Run  R      P      F      Roov   Riv
34         0.949  0.939  0.944  0.768  0.966
2          0.942  0.928  0.935  0.711  0.964
20         0.940  0.927  0.933  0.741  0.959
7          0.944  0.922  0.933  0.680  0.970
12
Word Segmentation: Overview

F-scores ranged from 0.481 to 0.979
Best score: MSRA Open task (FR Telecom)
Best relative to topline: CityU Open, at more than 99% of the topline F-score (worked out below)
Most frequent top rank: MSRA
Both F-scores and OOV recall were higher in the Open track
Overall good results: most systems outperformed the baseline
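A rough check of the topline comparison, using the tables above (F is the usual balanced F-score, 2PR/(P+R)): the best fully reported CityU Open run has F = 0.977, and the CityU topline F is 0.984, so 0.977 / 0.984 ≈ 0.993, i.e. better than 99% of the topline.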
13
Word Segmentation: Discussion

Continuing OOV challenges

Highest F-scores on MSRA:
  Also the highest topline and baseline
  Lowest OOV rate
Lowest F-scores on UPUC:
  Also the lowest topline and baseline
  Highest OOV rate (more than double any other corpus)
  Smallest corpus (about 1/3 the size of MSRA)

Best scores on the most consistent corpus (vocabulary, annotation)
UPUC also varies in genre: training is CTB; test is CTB, newswire (NW), and broadcast news (BN)
14
NER Results

Contrast: Baseline
  Label a token as a named entity if it has a unique tag in the training data (sketched below the table)

Baseline:
Source  P      R      F      PER-F  ORG-F  LOC-F  GPE-F
CITYU   0.611  0.467  0.529  0.587  0.516  0.503  N/A
LDC     0.493  0.378  0.428  0.395  0.290  0.259  0.539
MSRA    0.590  0.488  0.534  0.614  0.469  0.531  N/A
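A minimal sketch of that baseline, under the assumption that "unique tag" means a token is tagged as an entity only when it carried exactly one non-O tag in training (helper names and the tiny example are mine, not the official scorer):

    from collections import defaultdict

    def train_unique_tag_baseline(training_pairs):
        """Collect, for each token, the set of NE tags it received in training."""
        tags_seen = defaultdict(set)
        for token, tag in training_pairs:
            tags_seen[token].add(tag)
        # Keep only tokens that always carried the same, non-O tag.
        return {tok: tags.pop() for tok, tags in tags_seen.items()
                if len(tags) == 1 and "O" not in tags}

    def tag(tokens, unique_tags):
        """Label a token as a named entity only if its training tag was unique."""
        return [(tok, unique_tags.get(tok, "O")) for tok in tokens]

    # Hypothetical usage:
    train = [("香", "B-LOC"), ("港", "I-LOC"), ("大", "O"), ("学", "O")]
    model = train_unique_tag_baseline(train)
    print(tag(["香", "港", "明"], model))  # -> [('香', 'B-LOC'), ('港', 'I-LOC'), ('明', 'O')]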
15
NER Results: CityU

CityU Closed:
Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.914  0.867  0.890  0.805  0.921  0.909
19    0.920  0.854  0.886  0.805  0.925  0.887
21a   0.927  0.847  0.885  0.797  0.920  0.890
21b   0.924  0.849  0.885  0.798  0.924  0.892

CityU Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
6     0.869  0.749  0.805  0.680  0.860  0.810
16
NER Results: LDC

LDC Closed:
Site       P      R      F      ORG-F  LOC-F  PER-F
7          0.761  0.662  0.708  0.521  0.286  0.742
6-gpe-loc  0.672  0.655  0.664  0.455  0.708  0.742
6          0.306  0.298  0.302  0.455  0.037  0.742

LDC Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.803  0.726  0.763  0.658  0.305  0.788
8     0.814  0.594  0.688  0.585  0.170  0.657
17
NER Results: MSRA

MSRA Closed:
Site  P      R      F      ORG-F  LOC-F  PER-F
14    0.889  0.842  0.865  0.831  0.854  0.901
21a   0.912  0.817  0.862  0.820  0.905  0.826
21b   0.884  0.829  0.856  0.770  0.901  0.849
3     0.881  0.823  0.851  0.815  0.906  0.794

MSRA Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
10    0.922  0.902  0.912  0.859  0.903  0.960
14    0.908  0.892  0.899  0.840  0.910  0.926
11b   0.877  0.875  0.876  0.761  0.897  0.922
11a   0.864  0.840  0.852  0.694  0.874  0.920
18
NER: Overview

Overall results:
  Best F-score: MSRA Open track, 0.912
  Strong overall performance: only two results fell below the baseline

Direct comparison of NER Open vs. Closed is difficult:
  Only two sites participated in both tracks
  Only MSRA had a large number of runs
  There, Open outperformed Closed: the top three Open runs all exceeded the best Closed run
19
NER Observations

Named Entity Recognition challenges: tagsets, variation, and corpus size
Results on MSRA and CityU were much better than on LDC:
  The LDC corpus is substantially smaller
  It also has a larger tagset (GPE), and GPE is easily confused with ORG or LOC
NER results are sensitive to corpus size, tagset, and genre
20
Conclusions & Future Challenges

Strong, diverse participation in WS and NER
Many effective, competitive results

Cross-task, cross-evaluation comparisons are still difficult:
  Scores are sensitive to corpus size, annotation consistency, tagset, genre, etc.
  Need a corpus- and configuration-independent measure of progress
  Encourage submissions that support comparisons
  Extrinsic, task-oriented evaluation of WS/NER

Continuing challenges: OOV, annotation consistency, encoding combinations and variation, code-switching
21
Thanks

Data providers:
  Chinese Knowledge Information Processing Group, Academia Sinica, Taiwan: Keh-Jiann Chen, Henning Chiu
  City University of Hong Kong: Benjamin K. Tsou, Olivia Oi Yee Kwong
  Linguistic Data Consortium: Stephanie Strassel
  Microsoft Research Asia: Mu Li
  University of Pennsylvania / University of Colorado: Martha Palmer, Nianwen Xue

Workshop co-chairs: Hwee Tou Ng and Olivia Oi Yee Kwong

All participants!