The Second International Chinese Word Segmentation Bakeoff Coordinated by Thomas Emerson.


Roadmap
Contest Details
  –Corpora, Tracks, and Sites
Results
  –Baselines and Measures
Discussion
Thanks

Corpora
Four Corpora: 2 simplified chars, 2 traditional
All provide ground truth and segmentation standard

Tracks and Sites
Two tracks:
  –Open: Participants may use any data to train
     External lexica, POS information, etc.
  –Closed: Sites may ONLY use training data set
23 participating sites completed bakeoff
  –9 PRC, 4 HK, 4 US, 2 TW, 1 GB, 1 JP, 1 SG
130 runs submitted

Results
Baseline: L-to-R MaxMatch w/training vocab:
Topline: L-to-R MaxMatch w/test truth vocab: 0.99
Measures: Recall, Precision, F-measure
  –Recall on OOV, Recall on in-vocab
Best F-score: Open 0.972, median
  –Best closed: (on MSR corpus)
  –Best OOV recall: Open 0.872; Closed
Vs 2003: best F-score: 0.961: now 17 reach
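The baseline, the topline, and the measures on this slide can be sketched in a few lines. This is a minimal illustration, not the bakeoff's scoring script: the names `max_match`, `spans`, and `score`, the toy sentence, and the toy vocabularies are all assumptions made for the example. It assumes a word counts as correct only when both of its boundaries match the ground truth, and that "OOV" means a ground-truth word absent from the training vocabulary.

```python
def max_match(text, vocab):
    """Greedy left-to-right maximum matching over a set of known words."""
    max_len = max([len(w) for w in vocab] + [1])
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; an unmatched character
        # falls through to a one-character word.
        for j in range(min(len(text), i + max_len), i, -1):
            if j == i + 1 or text[i:j] in vocab:
                words.append(text[i:j])
                i = j
                break
    return words


def spans(words):
    """Map a word sequence to (start, end) character offsets."""
    out, pos = [], 0
    for w in words:
        out.append((pos, pos + len(w)))
        pos += len(w)
    return out


def score(gold, pred, train_vocab):
    """Precision, recall, F-measure, plus recall on OOV and in-vocab words."""
    pred_spans = set(spans(pred))
    hits = [s in pred_spans for s in spans(gold)]
    correct = sum(hits)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    oov = [h for w, h in zip(gold, hits) if w not in train_vocab]
    iv = [h for w, h in zip(gold, hits) if w in train_vocab]
    return {
        "precision": precision,
        "recall": recall,
        "f": f,
        "oov_recall": sum(oov) / len(oov) if oov else None,
        "iv_recall": sum(iv) / len(iv) if iv else None,
    }


# Toy data (hypothetical): one test sentence with one out-of-vocabulary word.
gold = ["我", "爱", "北京", "天安门"]
text = "".join(gold)
train_vocab = {"我", "爱", "北京"}

# Baseline: vocabulary taken from training data only.
baseline = score(gold, max_match(text, train_vocab), train_vocab)
# Topline: vocabulary taken from the test ground truth itself.
topline = score(gold, max_match(text, set(gold)), train_vocab)
print(baseline)
print(topline)
```

In this toy case the baseline splits the unseen word into single characters, which lowers precision and gives zero OOV recall, while the topline recovers the ground truth exactly because every test word is in its vocabulary.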

Results
AS Closed: NAIST, Stanford
AS Open: SG, Yahoo!, Sheffield
MSR Closed: Stanford, UHK, Yahoo!
MSR Open: Harbin, SG, UHK

Thanks & Future
Thanks to participants and providers
  –Academia Sinica, ICL Beijing, CUHK, MSRA
Future Bakeoffs:
  –Different training/test registers?
  –Additional tasks? NER?
  –Suggestions?