Triplet Extraction from Sentences Technical University of Cluj-Napoca Conf. Dr. Ing. Tudor Mureşan “Jožef Stefan” Institute, Ljubljana, Slovenia Assist.

Slides:



Advertisements
Similar presentations
Background Knowledge for Ontology Construction Blaž Fortuna, Marko Grobelnik, Dunja Mladenić, Institute Jožef Stefan, Slovenia.
Advertisements

A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Question Answering Based on Semantic Graphs Lorand Dali – Delia Rusu – Blaž Fortuna – Dunja Mladenić.
CSC411Artificial Intelligence 1 Chapter 3 Structures and Strategies For Space State Search Contents Graph Theory Strategies for Space State Search Using.
Progress update Lin Ziheng. System overview 2 Components – Connective classifier Features from Pitler and Nenkova (2009): – Connective: because – Self.
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
Dialogue – Driven Intranet Search Suma Adindla School of Computer Science & Electronic Engineering 8th LANGUAGE & COMPUTATION DAY 2009.
47 th Annual Meeting of the Association for Computational Linguistics and 4 th International Joint Conference on Natural Language Processing Of the AFNLP.
CSE111: Great Ideas in Computer Science Dr. Carl Alphonce 219 Bell Hall Office hours: M-F 11:00-11:
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Sentence Classifier for Helpdesk s Anthony 6 June 2006 Supervisors: Dr. Yuval Marom Dr. David Albrecht.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.
Feature Selection for Automatic Taxonomy Induction The Features Input: Two terms Output: A numeric score, or. Lexical-Syntactic Patterns Co-occurrence.
Information Retrieval – and projects we have done. Group Members: Aditya Tiwari ( ) Harshit Mittal ( ) Rohit Kumar Saraf ( ) Vinay.
Blaz Fortuna, Marko Grobelnik, Dunja Mladenic Jozef Stefan Institute ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR.
Tree Kernels for Parsing: (Collins & Duffy, 2001) Advanced Statistical Methods in NLP Ling 572 February 28, 2012.
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
Natural Language Processing Group Department of Computer Science University of Sheffield, UK Improving Semi-Supervised Acquisition of Relation Extraction.
GLOSSARY COMPILATION Alex Kotov (akotov2) Hanna Zhong (hzhong) Hoa Nguyen (hnguyen4) Zhenyu Yang (zyang2)
Marko Grobelnik Jozef Stefan Institute ( Ljubljana, Slovenia.
Interpreting Dictionary Definitions Dan Tecuci May 2002.
Understanding Natural Language
A Language Independent Method for Question Classification COLING 2004.
©2003 Paula Matuszek CSC 9010: Text Mining Applications Document Summarization Dr. Paula Matuszek (610)
An Intelligent Analyzer and Understander of English Yorick Wilks 1975, ACM.
Triplet Extraction from Sentences Lorand Dali Blaž “Jožef Stefan” Institute, Ljubljana 17 th of October 2008.
Recognizing Names in Biomedical Texts: a Machine Learning Approach GuoDong Zhou 1,*, Jie Zhang 1,2, Jian Su 1, Dan Shen 1,2 and ChewLim Tan 2 1 Institute.
Chapter 11 Artificial Intelligence Introduction to CS 1 st Semester, 2015 Sanghyun Park.
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.
Approaches to Machine Translation CSC 5930 Machine Translation Fall 2012 Dr. Tom Way.
Dependency Parser for Swedish Project for EDA171 by Jonas Pålsson Marcus Stamborg.
Chapter 23: Probabilistic Language Models April 13, 2004.
1 / 5 Zdeněk Žabokrtský: Automatic Functor Assignment in the PDT Automatic Functor Assignment (AFA) in the Prague Dependency Treebank PDT : –a long term.
Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature Deyu Zhou, Yulan He and Chee Keong Kwoh School of Computer Engineering.
Finding frequent and interesting triples in text Janez Brank, Dunja Mladenić, Marko Grobelnik Jožef Stefan Institute, Ljubljana, Slovenia.
Ranking Definitions with Supervised Learning Methods J.Xu, Y.Cao, H.Li and M.Zhao WWW 2005 Presenter: Baoning Wu.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Psychiatric document retrieval using a discourse-aware model Presenter : Wu, Jia-Hao Authors : Liang-Chih.
Supertagging CMSC Natural Language Processing January 31, 2006.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
Natural Language Processing Group Computer Sc. & Engg. Department JADAVPUR UNIVERSITY KOLKATA – , INDIA. Professor Sivaji Bandyopadhyay
Copyright © Curt Hill Other Trees Applications of the Tree Structure.
Marko Grobelnik, Janez Brank, Blaž Fortuna, Igor Mozetič.
What are the Command Words? Calculate Compare Complete Describe Evaluate Explain State, Give, Name, Write down Suggest Use information to…..
Understanding Naturally Conveyed Explanations of Device Behavior Michael Oltmans and Randall Davis MIT Artificial Intelligence Lab.
Department of Computer Science The University of Texas at Austin USA Joint Entity and Relation Extraction using Card-Pyramid Parsing Rohit J. Kate Raymond.
Text Summarization using Lexical Chains. Summarization using Lexical Chains Summarization? What is Summarization? Advantages… Challenges…
By Kyle McCardle.  Issues with Natural Language  Basic Components  Syntax  The Earley Parser  Transition Network Parsers  Augmented Transition Networks.
Question Answering Passage Retrieval Using Dependency Relations (SIGIR 2005) (National University of Singapore) Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan,
An Ontology-based Automatic Semantic Annotation Approach for Patent Document Retrieval in Product Innovation Design Feng Wang, Lanfen Lin, Zhou Yang College.
Computational Challenges in BIG DATA 28/Apr/2012 China-Korea-Japan Workshop Takeaki Uno National Institute of Informatics & Graduated School for Advanced.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
String Kernels on Slovenian documents Blaž Fortuna Dunja Mladenić Marko Grobelnik.
Roadmap Probabilistic CFGs –Handling ambiguity – more likely analyses –Adding probabilities Grammar Parsing: probabilistic CYK Learning probabilities:
Approaches to Machine Translation
PRESENTED BY: PEAR A BHUIYAN
System for Semi-automatic ontology construction
PART IV: The Potential of Algorithmic Machines.
Sparsity Analysis of Term Weighting Schemes and Application to Text Classification Nataša Milić-Frayling,1 Dunja Mladenić,2 Janez Brank,2 Marko Grobelnik2.
Basic Intro Tutorial on Machine Learning and Data Mining
Zachary Cleaver Semantic Web.
Automatic Detection of Causal Relations for Question Answering
Approaches to Machine Translation
Semi-Automatic Data-Driven Ontology Construction System
Artificial Intelligence 2004 Speech & Natural Language Processing
Topic: Semantic Text Mining
Presentation transcript:

Triplet Extraction from Sentences Technical University of Cluj-Napoca Conf. Dr. Ing. Tudor Mureşan “Jožef Stefan” Institute, Ljubljana, Slovenia Assist. Prof. Dr. Dunja Mladenić Blaž Fortuna Marko Grobelnik Lorand DaliJune 2008

Location of the project in the field of Computer Science Artificial Intelligence Natural Language Processing Machine Learning

My father carries around the picture of the kid who came with his wallet.

Motivation of Triplet Extraction Advantages ◦ compact and simple representation of the information contained in a sentence ◦ avoids the complexity of a full parse ◦ contains semantic information Applications ◦ building the semantic graph of a document ◦ summarization ◦ question answering

Triplet Extraction – 2 Approaches Extraction from the parse tree of the sentence using heuristic rules ◦ OpenNLP – Treebank Parsetree ◦ Link Parser – Link Grammar (a type of dependency grammar) Extraction using Machine Learning ◦ Support Vector Machines (SVM) are used ◦ The SVM model is trained on human annotated data

Short review of SVM

Features of the triplet candidates Over 300 features depending on: Sentence ◦ length of sentence, number of words, etc Candidate ◦ context of Subj, Verb and Obj; ◦ distance between Subj, Verb, Obj Linkage ◦ number of links, of link types, nr of links from S, V, O Minipar ◦ depth, diameter, siblings, uncles, cousins, categories, relations Treebank ◦ depth, diameter, siblings, uncles, cousins, path to root, POS

Evaluation and Testing Training set = 700 annotated sentences Test set = 100 annotated sentences Compare the extracted triplets from a sentence to the annotated triplets from that same sentence Comparison is done according to a similaritry measure  [0, 1] between two triplets extracted to annotated => precision annotated to extracted => recall

Conclusions Triplet extraction using hand rules Triplet extraction using machine learning (SVM) Question answering system based on triplets

Questions

Triplet Similarity Measure SVO S’ V’O’ SubjSimVerbSimObjSim TrSim = (SubjSim + VerbSim + ObjSim) / 3 TrSim, SubjSim, VerbSim, ObjSim  [0, 1]

String Similarity Measure The way to success is under heavy construction The road to success is always under construction road successunderconstruction waysuccessunderheavyconstruction Sim = nMatch / maxLen = 3 / 5 = 0.6

Evaluating the extracted triplets Sentence Tr1 Tr2 Tr3 Tr1 Tr2 Precision Recall ExtractedGolden Standard

My father carries around the picture of the kid who came with his wallet.

Question Types Yes/No Questions List Questions Reason Questions Quantity Questions Location Questions Time Questions

Block Diagram of QA System Parse and determine question type Build Query Search Triplets Question Answer

If a listener nods his head while you're explaining your program; wake him up.