CLiNG - May 24 2002 Overview of Research - Computational Terminology - Knowledge extraction from Text - Study of causal relation - Corpus building - Uncertainty.


Overview of Research
- Computational Terminology
- Knowledge extraction from text
- Study of causal relation
- Corpus building
- Uncertainty
- Computer Assisted Language Learning (CALL)
- Interdisciplinary project on French Second Language
- Text understanding
- From speech to sentence

SeRT - a tool for knowledge extraction from text
Caroline Barrière
School of Information Technology and Engineering
University of Ottawa
Ottawa, Ontario, Canada

A few questions...
- Why knowledge extraction from text?
  For building a Knowledge Base...
- What's a Knowledge Base?
  It depends on who defines it. From a terminological standpoint: a static repository of domain-specific knowledge, giving the important concepts and their relations.
- What kind of relations?
  Hyperonymy (is-a), meronymy (part-of), synonymy, function, definition, causality
- Why start from text? What are the alternatives?

Semantic Relations in Text (SeRT)
- Goal: starting from a corpus of texts on a specific domain, capture and store the important concepts (terms) of that domain, as well as their relations.
- Hypotheses
  - definitions can be derived from text analysis
  - text is used as language and meta-language
  - paradigmatic relations can be found in texts by pattern search
  - present knowledge representation formalisms allow the representation of this information

Example of a pattern search for hyperonymy (corpus on composting)

SeRT - Features
- parallel search of terms and relations
  - term extraction
  - search for surface patterns leading to semantic relations
- focus on user interaction (nothing fully automatic)
  - term selection and validation
  - user definition of surface patterns corresponding to semantic relations
  - user selection of concepts involved (tuple) in the semantic relation
- raw text used (no preprocessing necessary)
- easy access to KB: save and retrieval
- to be used in "bootstrapping" mode

Term extraction
- Usage of a stop list: a, able, about, above, according, accordingly, across, actually …
- appropriate method for English (but maybe not for French)
    satellite link - liaison par satellite
    laser printer - imprimante au laser
    communication network - réseau de communication
- no syntactic analysis
- different from:
    Daille 1994: linguistic patterns (French)
    Bourigault 1994: morpho-syntactic markers (French)
- lemmatization: 'moving quickly' -> 'mov[ing] quick[ly]' -> 'mov* quick*'
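The stop-list approach above can be sketched in a few lines of Python. This is only an illustration, not SeRT's actual code: the function names, the extra stop words beyond the slide's excerpt, the suffix list, and the two-word limit on candidate terms are all assumptions.

```python
import re
from collections import Counter

# The first eight entries come from the slide's excerpt; the rest are
# assumed additions so that the example behaves sensibly.
STOP_WORDS = {
    "a", "able", "about", "above", "according", "accordingly", "across",
    "actually", "the", "is", "of", "and", "to", "in",
}

# Assumed suffix list for the crude truncation shown on the slide
# ('moving quickly' -> 'mov* quick*'); a real lemmatizer would do more.
SUFFIXES = ("ing", "ly", "s")


def lemmatize(word):
    """Strip one known suffix and mark the truncation with '*'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + "*"
    return word


def extract_terms(text, max_len=2):
    """Count candidate terms: word sequences not broken by a stop word."""
    # Punctuation is kept as a token so that it also breaks a sequence.
    tokens = re.findall(r"[a-z]+|[.,;:!?]", text.lower())
    counts = Counter()
    run = []
    for token in tokens + ["."]:  # trailing boundary flushes the last run
        if token in STOP_WORDS or not token.isalpha():
            for n in range(1, max_len + 1):
                for i in range(len(run) - n + 1):
                    counts[" ".join(run[i:i + n])] += 1
            run = []
        else:
            run.append(token)
    return counts
```

Calling `extract_terms(corpus_text).most_common(13)` would give a frequency-ranked list comparable in shape to the results slide, with no syntactic analysis involved.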

Results - Corpus on "composting" - Terms
(three frequency lists, count followed by term)

503 compost, 373 pile, 258 composting, 202 soil, 170 materials, 155 material, 142 nitrogen, 110 compost pile, 103 water, 102 bin, 100 time, 92 leaves, 83 bacteria

402 compost, 369 pile, 199 soil, 187 composting, 149 material, 146 materials, 133 nitrogen, 105 compost pile, 102 bin, 96 time, 95 water, 94 Compost, 85 leaves

402 compost, 369 pile, 295 materi*, 260 compost*, 199 soil, 133 nitrogen, 105 compost pile, 105 temperatur*, 102 bin, 96 time, 95 leav*, 95 water, 94 Compost


Search for patterns indicating semantic relations
- pre-encoded patterns (earlier work - Barrière 1997)
- find list from all other authors
- pattern search has multiple possibilities:
  - string matching
  - lemmatized token matching
  - part of speech matching
    - inclusion of a dictionary look-up (derived from Collins + morphological rules added)
- possibility of searching for a pattern around 1 term
  - usually what Computational Terminologists want to do
- display limited or enlarged context

Example of search patterns

Hyperonymy
  such as (string matching)
  and other *|n (string + POS)
  includ* *|n (lemmatized string + POS)
  *|n is a *|a of [~part] (negative filter)
  *|y organic materi* [mostly, especially, specifically] (positive filter) + (search with specific term)

Synonymy
  known as (string matching)
  also called (string matching)

Meronymy
  contains *|n (string + POS)
  is a *|a part of (string + POS)
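As a rough illustration of the string and lemmatized-string matching modes listed above, the sketch below translates a surface pattern into a regular expression and returns each hit with some surrounding context. The function names and the `window` parameter are hypothetical, and part-of-speech tokens such as `*|n` are deliberately not handled, since they would need a tagger or dictionary look-up.

```python
import re


def compile_pattern(pattern):
    """Build a regex from a surface pattern such as 'includ* organic materi*'.

    A token ending in '*' is treated as a stem that matches any word ending;
    every other token must match literally (case-insensitive).
    """
    parts = []
    for token in pattern.split():
        if token.endswith("*"):
            parts.append(re.escape(token[:-1]) + r"\w*")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def search(pattern, text, window=30):
    """Return (matched string, context) pairs for every hit in the text.

    'window' is the number of characters shown on each side of the hit;
    a larger value gives the enlarged context.
    """
    regex = compile_pattern(pattern)
    hits = []
    for match in regex.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        hits.append((match.group(0), text[start:end]))
    return hits
```

For example, `search("such as", corpus_text, window=60)` would list every occurrence of the hyperonymy marker with 60 characters of context on each side, leaving the choice of terms around it to the user.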


Information storage in the TKB
- transfer of info found at previous step
  - user selects the terms (concepts) around the pattern
  - semantic relation / pattern / tuple are stored in the TKB
- an uncertainty factor can also be added to the tuple
  - research on the causal relation has led us to realize the necessity of this information
  - applies to different relations
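One way to picture the stored record is a single relational table holding the relation, the pattern that revealed it, the two terms of the tuple, and an optional uncertainty factor. The sketch below uses Python's built-in `sqlite3` module; the table layout, column names, function names, and the numeric scale of the uncertainty factor are all assumptions, not a description of the actual TKB.

```python
import sqlite3


def open_tkb(path=":memory:"):
    """Open (or create) a hypothetical TKB with one table of relation tuples."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS relation_tuple (
               relation    TEXT NOT NULL,  -- e.g. 'hyperonymy'
               pattern     TEXT NOT NULL,  -- surface pattern that was searched
               term1       TEXT NOT NULL,  -- first concept chosen by the user
               term2       TEXT NOT NULL,  -- second concept chosen by the user
               uncertainty REAL            -- optional factor; scale is assumed
           )"""
    )
    return conn


def store_tuple(conn, relation, pattern, term1, term2, uncertainty=None):
    """Save one user-validated tuple together with its relation and pattern."""
    conn.execute(
        "INSERT INTO relation_tuple VALUES (?, ?, ?, ?, ?)",
        (relation, pattern, term1, term2, uncertainty),
    )
    conn.commit()


def tuples_for(conn, relation):
    """Retrieve all tuples stored for a relation, in insertion order."""
    rows = conn.execute(
        "SELECT term1, term2, pattern, uncertainty FROM relation_tuple "
        "WHERE relation = ? ORDER BY rowid",
        (relation,),
    )
    return rows.fetchall()
```

Leaving `uncertainty` as `None` keeps the factor optional, matching the slide's "can also be added".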

Semantic relation extraction

Results - semantic relations
- Exploration of a few patterns
  - "contain?" (meronymy)
  - "such as" and "and other" (hyperonymy)


Could we infer is-a relations and extend the type hierarchy?

SeRT use
- Parallel mode
  - searching on patterns can suggest terms to be explored
  - search on terms can suggest patterns around them
- Bootstrapping mode for relations
  - start with one pattern: enhance
  - the tuple compost/soil, once found, is used to find other patterns
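The bootstrapping step can be illustrated as follows: given a tuple already validated by the user (such as compost/soil), collect the short word sequences that appear between the two terms in the corpus and rank them as candidate patterns. This is a simplified sketch under stated assumptions; the function name, the `max_words` cut-off, and the restriction to the first term preceding the second are not from the slides.

```python
import re
from collections import Counter


def candidate_patterns(text, term1, term2, max_words=4):
    """Count the word sequences found between term1 and term2 in a sentence."""
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    regex = re.compile(
        r"\b" + re.escape(term1) + r"\b\s+(.+?)\s+\b" + re.escape(term2) + r"\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in sentences:
        for match in regex.finditer(sentence):
            between = match.group(1).lower()
            # Long gaps are unlikely to be reusable surface patterns.
            if len(between.split()) <= max_words:
                counts[between] += 1
    return counts
```

The most frequent candidates would then be shown to the user, who decides which ones to keep as new patterns for the relation.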


Future work
Short term (tool itself)
- Add list of predefined relations & patterns
- Add flexibility in pattern search
  - toward a mix of semantic and syntactic search
- Construction of a graphical representation of the semantic network built

Future work
Long term (tool + theoretical background)
- Work on compound nouns
  - much implicit information that could be made explicit in the KB
- Work on representational scheme
  - the relational database is too limiting
  - causal relation requires a different type of representation
  - contexts for expressing the relation (possibly nested)
  - uncertainty factors
  - inferencing
- Explore pattern search in French
- Batch mode extraction (no user)
  - automatic selection of terms around patterns
  - after certain terms and patterns have been identified
  - need an integration of confidence levels on patterns