Noisy Text Analytics: An Exercise in Futility?
Hwee Tou Ng
Department of Computer Science
National University of Singapore
8 Jan 2007

Sources of Noisy Text
Traditional sources
– Automatically transcribed text from speech
– Automatically OCRed text from images

Sources of Noisy Text
More recent sources from the Web
– Blogs, wikis, message boards, online chats, SMS, etc.
– User generated content
– Informal text
  » Acronyms, abbreviations, specialized vocabulary
  » Sublanguage, sub-community

Importance
The rise of social media (Web 2.0)
– Commercial, economic interest

Importance
ACL SIGWAC (Special Interest Group on the Web as Corpus, Association for Computational Linguistics)
– CLEANEVAL (shared task and competition for web corpus cleaning)

An Exercise in Futility?
Necessity is the mother of invention!

What is Analytics?
American Heritage Dictionary
– The branch of logic dealing with analysis
Merriam-Webster's Online Dictionary
– The method of logical analysis

Analytics
Approach #1
– Eliminate the noise in noisy text (text normalization), then process the text as normal
  » Noise: misspelled words, wrongly cased words, wrong sentence and paragraph boundaries
– Examples:
  » Table recognition
    Learning to Recognize Tables in Free Text, H T Ng, C Y Lim, J L T Koo, ACL 1999
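
As a rough illustration of Approach #1, the sketch below normalizes upper-case, misspelled text before any downstream processing. It is not taken from the talk: the tiny lexicon, the proper-noun table, the 0.8 similarity cutoff, and the function names are all assumptions made for the example, and spelling correction is done with Python's standard `difflib` rather than any method the slides describe.

```python
import difflib
import re

# Hypothetical toy lexicon; a real system would use a far larger vocabulary.
LEXICON = ["the", "meeting", "is", "in", "singapore",
           "tomorrow", "please", "confirm"]
# Hypothetical table for restoring the case of known proper nouns.
PROPER_NOUNS = {"singapore": "Singapore"}


def normalize_token(token, lexicon=LEXICON, cutoff=0.8):
    """Lowercase a word and map it to the closest lexicon entry, if any."""
    word = token.lower()
    if word not in lexicon:
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        if matches:
            word = matches[0]
    return PROPER_NOUNS.get(word, word)


def normalize(text):
    """Fix spelling and casing, capitalizing the first word of each sentence."""
    # Alphabetic runs become word tokens; any other non-space character
    # becomes a single punctuation token.
    tokens = re.findall(r"[A-Za-z]+|[^A-Za-z\s]", text)
    out = []
    start_of_sentence = True
    for tok in tokens:
        if tok.isalpha():
            word = normalize_token(tok)
            if start_of_sentence:
                word = word[0].upper() + word[1:]
            out.append(word)
            start_of_sentence = False
        else:
            # Attach punctuation to the preceding word.
            if out:
                out[-1] += tok
            else:
                out.append(tok)
            if tok in ".!?":
                start_of_sentence = True
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("THE MEETNG IS IN SINGAPROE TOMOROW. PLEASE CONFRIM"))
    # The meeting is in Singapore tomorrow. Please confirm
```

The normalized output can then be passed to tools that expect clean, mixed-case text, which is the point of this approach; the cost is that any normalization error is propagated to every later step.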

Table Recognition

Analytics
Approach #2
– Process the noisy text directly, as is
– Examples:
  » Upper case text (e.g., speech recognizer output)
    Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text, H L Chieu, H T Ng, ACL 2002
  » Semi-structured text (e.g., seminar announcements, job advertisements)
    A Maximum Entropy Approach to Information Extraction from Semi-Structured and Free Text, H L Chieu, H T Ng, AAAI 2002
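
To make Approach #2 concrete, here is a minimal sketch of feature extraction for tagging upper-case text without restoring its case first. It is not the method of the papers cited above; the cue-word sets, the feature names, and the `token_features` function are hypothetical, chosen only to show features that do not depend on capitalization.

```python
# Hypothetical cue-word sets, for illustration only.
TITLES = {"mr", "mrs", "dr", "prof"}
LOCATION_CUES = {"in", "at", "from"}


def token_features(tokens, i):
    """Return case-independent features for the token at position i."""
    word = tokens[i].lower()
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"
    return {
        "word": word,
        "prev": prev_word,
        "next": next_word,
        "suffix3": word[-3:],
        # Context cues stand in for the capitalization signal that is missing.
        "prev_is_title": prev_word.rstrip(".") in TITLES,
        "prev_is_location_cue": prev_word in LOCATION_CUES,
        "has_digit": any(c.isdigit() for c in word),
    }


if __name__ == "__main__":
    tokens = "DR. NG SPOKE IN SINGAPORE".split()
    for i in range(len(tokens)):
        print(tokens[i], token_features(tokens, i))
```

Because every feature is computed on lowercased tokens, the same dictionaries are produced for upper-case and mixed-case input, so a classifier trained on them never relies on case information the noisy text lacks.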