Natural Language Processing: Verbatim Text Coding and Data Mining Report Generation. Josef S.W. Leung, Ching-Long Yeh.

Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.

Computational language: week 10 Lexical Knowledge Representation concluded Syntax-based computational language Sentence structure: syntax Context free.
Natural Language Processing Syntax. Syntactic structure John likes Mary PN VtVt NP VP S DetPNVtVt NP VP S Every man likes Mary Noun.
© Christel Kemke 2007/08 COMP 4060 Natural Language Processing Feature Structures and Unification.
Natural Language Processing Lecture 2: Semantics.
May 2006CLINT-LN Parsing1 Computational Linguistics Introduction Approaches to Parsing.
GRAMMAR & PARSING (Syntactic Analysis) NLP- WEEK 4.
LING NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006.
For Monday Read Chapter 23, sections 3-4 Homework –Chapter 23, exercises 1, 6, 14, 19 –Do them in order. Do NOT read ahead.
Natural Language Processing - Feature Structures - Feature Structures and Unification.
1 Pertemuan 23 Syntatic Processing Matakuliah: T0264/Intelijensia Semu Tahun: 2005 Versi: 1/0.
NLP and Speech Course Review. Morphological Analyzer Lexicon Part-of-Speech (POS) Tagging Grammar Rules Parser thethe – determiner Det NP → Det.
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
Natural Language Processing AI - Weeks 19 & 20 Natural Language Processing Lee McCluskey, room 2/07
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
Natural Language Query Interface Mostafa Karkache & Bryce Wenninger.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
C SC 620 Advanced Topics in Natural Language Processing 3/9 Lecture 14.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dörre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.
Knowledge Science & Engineering Institute, Beijing Normal University, Analyzing Transcripts of Online Asynchronous.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Introduction to Natural Language Generation
9/8/20151 Natural Language Processing Lecture Notes 1.
CCSB354 ARTIFICIAL INTELLIGENCE (AI)
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
For Monday Read chapter 23, sections 1-2 FOIL exercise due.
For Friday Finish chapter 23 Homework: –Chapter 22, exercise 9.
Artificial intelligence project
1 Natural Language Processing Gholamreza Ghassem-Sani Fall 1383.
Learning to Transform Natural to Formal Language Presented by Ping Zhang Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney.
©2003 Paula Matuszek CSC 9010: Text Mining Applications Document Summarization Dr. Paula Matuszek (610)
Natural Language Processing Artificial Intelligence CMSC February 28, 2002.
October 2005csa3180: Parsing Algorithms 11 CSA350: NLP Algorithms Sentence Parsing I The Parsing Problem Parsing as Search Top Down/Bottom Up Parsing Strategies.
PS: Introduction to Psycholinguistics Winter Term 2005/06 Instructor: Daniel Wiechmann Office hours: Mon 2-3 pm Phone:
PARSING David Kauchak CS159 – Spring 2011 some slides adapted from Ray Mooney.
For Wednesday Read chapter 23 Homework: –Chapter 22, exercises 1,4, 7, and 14.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
Grammars Grammars can get quite complex, but are essential. Syntax: the form of the text that is valid Semantics: the meaning of the form – Sometimes semantics.
Rules, Movement, Ambiguity
Summarizing Encyclopedic Term Descriptions on the Web from Coling 2004 Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media.
Data Mining: Text Mining
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
◦ Process of describing the structure of phrases and sentences Chapter 8 - Phrases and sentences: grammar1.
NLP. Introduction to NLP (U)nderstanding and (G)eneration Language Computer (U) Language (G)
Concepts and Realization of a Diagram Editor Generator Based on Hypergraph Transformation Author: Mark Minas Presenter: Song Gu.
AUTONOMOUS REQUIREMENTS SPECIFICATION PROCESSING USING NATURAL LANGUAGE PROCESSING - Vivek Punjabi.
1 SWE Introduction to Software Engineering Lecture 14 – System Modeling.
NATURAL LANGUAGE PROCESSING
Basics of Natural Language Processing Introduction to Computational Linguistics.
INAGO Project Automatic Knowledge Base Generation from Text for Interactive Question Answering.
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Natural Language Processing (NLP)
CS : Speech, NLP and the Web/Topics in AI
CSE 635 Multimedia Information Retrieval
CS246: Information Retrieval
Natural Language Processing (NLP)
Artificial Intelligence 2004 Speech & Natural Language Processing
Natural Language Processing (NLP)
Presentation transcript:

Natural Language Processing: Verbatim Text Coding and Data Mining Report Generation. Josef S.W. Leung, Ching-Long Yeh. NLP: one of the top-priority funding items in computer science research -- National Natural Science Foundation, China.

Language: Listen (Understand) and Speak (Generate).

Natural Language Processing: analysis/understanding maps natural language to internal representations; generation maps internal representations back to natural language.

Outline of Presentation
NLP Introduction
 – Natural Language Analysis/Understanding
 – Natural Language Generation
Case 1: Verbatim Text Coding
 – May need NL analysis techniques
Case 2: Data Mining Report Generation
 – May need NL generation techniques

Modules of NL Understanding: Input sentence -> Pre-processing -> Tokens -> Parsing -> Syntactic structure -> Semantic Interpretation -> Semantic representation -> Contextual Interpretation -> Knowledge representation.
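To make the pipeline concrete, here is a minimal sketch of the first few stages (pre-processing/tokenization, tagging, and syntactic parsing) using spaCy. The library, model name, and sentence are assumptions for illustration; the slides do not name a toolkit, and the semantic and contextual stages are not covered by this snippet.

import spacy

nlp = spacy.load("en_core_web_sm")      # pre-trained English pipeline (assumed installed)
doc = nlp("The dog chased the cat.")    # input sentence -> pre-processing -> tokens

for token in doc:
    # token text, part-of-speech tag, dependency relation, and syntactic head
    print(token.text, token.pos_, token.dep_, token.head.text)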

Parsing for Syntactic Analysis
Grammar Rules:
 S  -> NP + VP
 NP -> ART + N
 VP -> V + NP
Lexicon:
 dog: N, cat: N, chased: V, the: ART
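This toy grammar and lexicon can be run directly through an off-the-shelf chart parser. The sketch below uses NLTK (an assumption; the slides do not name a toolkit) and prints the syntactic structure shown on the next slide.

import nltk

# Toy grammar and lexicon from the slide above.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> ART N
VP -> V NP
ART -> 'the'
N  -> 'dog' | 'cat'
V  -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    tree.pretty_print()   # (S (NP (ART the) (N dog)) (VP (V chased) (NP (ART the) (N cat))))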

Syntactic Structure: (S (NP (ART the) (N dog)) (VP (V chased) (NP (ART the) (N cat))))

Structural Ambiguity
Time flies like an arrow.
 – The passage of time is as quick as an arrow.
 – A species of flies called 'time flies' enjoys an arrow.

Structural Ambiguity
The man saw the girl with the telescope.
 – The man saw the girl who possessed the telescope.
 – The man saw the girl with the aid of the telescope.
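A chart parser makes this PP-attachment ambiguity explicit: with a grammar that lets the prepositional phrase attach either to the verb phrase or to the noun phrase, the sentence receives two parse trees. A minimal sketch with NLTK; the grammar below is an illustrative assumption.

import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N  -> 'man' | 'girl' | 'telescope'
V  -> 'saw'
P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("the man saw the girl with the telescope".split()))
print(len(trees))   # 2: instrumental reading (PP attaches to VP) vs. possessive reading (PP attaches to NP)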

Natural Language Generation
User's goal -> Strategic Component (Text Planning) -> Tactical Component (Linguistic Realization) -> Surface sentences
 – The strategic component draws on the domain KB, planning operators, the user model, and the discourse model.
 – The tactical component uses linguistic rules and a lexicon.

Unification Grammar
"the man sees a sheep"
 S[numb=X, tense=T]  -> NP[numb=X] VP[numb=X, tense=T]
 VP[numb=N, tense=M] -> V[numb=N, tense=M] NP
 NP[numb=Y]          -> det[numb=Y] noun[numb=Y]
Lexicon:
 man: noun [numb=sing]
 a: det [numb=sing]
 the: det
 sheep: noun
 sees: V [tense=pres, numb=sing]
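The same idea can be written in NLTK's feature-grammar notation, where number and tense values are filled in and propagated by unification; underspecified entries such as "the" and "sheep" simply leave the feature open. The exact rule set below is a sketch assumed for illustration, not the authors' grammar.

import nltk

fg = nltk.grammar.FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n, TENSE=?t]
VP[NUM=?n, TENSE=?t] -> V[NUM=?n, TENSE=?t] NP
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
Det[NUM=sg] -> 'a'
Det -> 'the'
N[NUM=sg] -> 'man'
N -> 'sheep'
V[NUM=sg, TENSE=pres] -> 'sees'
""")

parser = nltk.FeatureChartParser(fg)
for tree in parser.parse("the man sees a sheep".split()):
    print(tree)   # feature values (number, tense) are filled in by unification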

Migraine abortive treatment is used to abort migraine.

((cat clause)
 (process ((lex "use") (type material)))
 (partic ((affected ((cat proper) (lex "migraine abortive treatment")))
          (agent none)))
 (circum ((purpose ((cat clause)
                    (keep-in-order no)
                    (keep-for no)
                    (position end)
                    (process ((lex "abort") (effect-type creative) (type material)))
                    (partic ((created ((lex "migraine")
                                       (countable no)
                                       (cat common))))))))))

Verbatim Text Coding
 – A text content classification problem.
 – Group semantically similar answer items.
 – Develop a code list/tree to represent the answer item groups.
 – Simple NL analysis techniques may help.
 – Details will be given in the first example of NLP application.

Data Mining Report Generation
 – Data mining results are usually in rule or tree formats with obscure notations.
 – NL generation techniques may help translate the data mining results into plain natural language.
 – Details will be given in the second example of NLP application.

Codia for Verbatim Text Coding (screenshot: Answer Items panel and Code Tree panel)
 – Small screen/window/text
 – Long list of answer items
 – Difficult to browse/view
 – Worse than paper form

Codia for Verbatim Text Coding (screenshot: Key Terms)

Ranking Answers by Similarity (screenshot: items with similar meaning)

Text Similarity Measures: String + Semantics + Coverage -> Text Similarity Score
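A minimal sketch of how such a combined score might be computed, assuming the three components are weighted and summed. The weights, the toy synonym table, and the helper names are illustrative assumptions, not the authors' actual measures.

from difflib import SequenceMatcher

# Toy stand-in for a thesaurus of semantic categories (assumption).
SYNONYMS = {"screen": {"display", "monitor"}, "cheap": {"inexpensive", "affordable"}}

def string_sim(a, b):
    # Character-level string similarity.
    return SequenceMatcher(None, a, b).ratio()

def semantic_sim(a, b):
    # Share of words in a that have a thesaurus match in b.
    ta, tb = set(a.split()), set(b.split())
    hits = sum(1 for w in ta
               if any(v in SYNONYMS.get(w, ()) or w in SYNONYMS.get(v, ()) for v in tb))
    return hits / max(len(ta), 1)

def coverage(a, b):
    # Token overlap (Jaccard coverage).
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def text_similarity(a, b, weights=(0.4, 0.3, 0.3)):
    return (weights[0] * string_sim(a, b)
            + weights[1] * semantic_sim(a, b)
            + weights[2] * coverage(a, b))

# Rank answer items against a selected item, as on the previous slide.
answers = ["the screen is too small", "the display is too small", "battery life is short"]
print(sorted(answers, key=lambda x: text_similarity("small screen", x), reverse=True))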

Codia for Verbatim Text Coding
 – A user interface for classifying answer items by drag-and-drop actions.
 – NLP reduces the time and effort spent searching, browsing, and selecting multiple answer items for classification.
 – There are still limitations, and the process is not fully automated.

Technical Issues of Codia
 – Improve the user interface.
 – Uses only simple NLP techniques.
 – Ambiguity resolution by humans.
 – Limited by the thesaurus.
 – Still cannot handle negatives ('not').
 – Knowledge engineering is tedious.

Limitations and Future Improvements
Limitations:
 – Thesaurus has only 60,000 terms classified into 3,900 semantic categories.
 – Manual operation (ambiguity resolution relies on humans).
 – Similarity measures are too mechanical.
Future improvements:
 – Update and incorporate frequently used terms/categories.
 – Move towards automation by using more AI, such as NLP, GA, and NN.
 – Make the system more adaptive with rule-based or case-based reasoning.

Data Mining and Knowledge Discovery: Data -> Data Mining -> Patterns -> Interpretation -> Knowledge (the overall process is knowledge discovery).

If q12 = 4 and q31 = 6 and q35 = 3 then q38 = 3

The same rule with question codes replaced by attribute names: If h/h_income = 4 and city = 6 and car_owner = 3 then user = 3

say(feature,[r1]).

Output of say(feature, [r1]) for rule r1: The segment of respondents who are product X users is characterized by residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.

say(general, [r1]). say(likely, [r1]). say(reason, [r1]).

Output of say(general, [r1]) for rule r1: Basically, the respondents who are product X users have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.

Output of say(reason, [r1]) for rule r1: The respondents are product X users because they have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.

Output of say(likely, [r1]) for rule r1: It is likely that the people who have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income are product X users.
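A rough sketch of how such surface sentences could be produced with simple templates, in the spirit of the say(...) calls above. The rule encoding, template strings, and function names are assumptions for illustration, not the authors' generator.

# Rule r1 re-encoded as data (assumed structure).
RULE_R1 = {
    "conclusion": "product X users",
    "conditions": ["residence in Shanghai",
                   "consumption of brand Y cigarettes",
                   "overseas travel in the past twelve months",
                   "ownership of imported cars",
                   "high monthly household income"],
}

# One template per discourse style.
TEMPLATES = {
    "feature": "The segment of respondents who are {concl} is characterized by {conds}.",
    "general": "Basically, the respondents who are {concl} have {conds}.",
    "reason":  "The respondents are {concl} because they have {conds}.",
    "likely":  "It is likely that the people who have {conds} are {concl}.",
}

def say(style, rule):
    # Join the conditions into a natural-sounding list before filling the template.
    conds = ", ".join(rule["conditions"][:-1]) + ", and " + rule["conditions"][-1]
    return TEMPLATES[style].format(concl=rule["conclusion"], conds=conds)

print(say("feature", RULE_R1))
print(say("likely", RULE_R1))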

Limitations and Future Improvements
Limitations:
 – Pre-defined syntactic category of code labels.
 – Single sentence for each rule.
 – Lacks visualization.
 – Almost no text planning.
 – English only.
 – Lacks knowledge for explanation.
Future improvements:
 – Automatic recognition of the syntax.
 – Describe rule relationships in multiple coherent sentences.
 – Text + graphics or even multimedia generation.
 – Implement text planning.
 – Multilingual output.
 – Implement NL techniques for explanation.

Concluding Remarks
 – NLP techniques are found useful in:
   – verbatim text coding and
   – data mining report generation.
 – Group similar answer items.
 – Write simple natural language text.
 – A pricey technology, because few tools are available.

Natural Language Processing. Josef Siu-Wai Leung, Ching-Long Yeh.