Published by Brian Gray. Modified over 10 years ago.
1
Pushpak Bhattacharyya CSE Dept. IIT Bombay 19 May, 2014
2
The goal of the project is to develop Next Generation Information Extraction (IE) technology.
The IE environment will be multilingual, involving Machine Translation and Cross-Lingual Search.
The IE focus is on relation extraction, named entity extraction, multiword extraction, semantic role labeling, and corpus management.
Relation and name extraction are to be done jointly since they are synergistic (e.g. CEO_of is a relation between a Person and an Organization).
The fruits of this research are to be carried over to the TCS IE environment called INX.
High-quality publications in IE are expected, in all the above tasks and combinations thereof.
3
1. Mr. Girish Palshikar, Principal Scientist, Systems Research Lab, Tata Consultancy Services Limited
2. Dr. Pushpak Bhattacharyya, Professor, Department of Computer Science & Engineering, IIT Bombay
3. Other members: Rohit Bangera, Sachin Pawar, Rudra Murthy, Girish Ponkia, Ravi Soni, Manish Shrivastava, Diptesh Kanojia, Gajanan Rane
4
1. Relation Extraction: A relation extraction system has been built which can extract entities from a natural language sentence and identify relationships among them.
The following papers have been published:
o Sachin Pawar, Pushpak Bhattacharyya and Girish Keshav Palshikar, Semi-supervised Relation Extraction using EM Algorithm, International Conference on NLP (ICON 2013), Noida, India, 18-20 December, 2013.
o Sachin Pawar, Pushpak Bhattacharyya and Girish Keshav Palshikar, Improving Relation Extraction Using A Joint Model of Entities and Relations, under revision.
5
E1, E2: types of the first and second entity mentions
R: type of the relation between the two mentions
F: feature vector capturing characteristics of the entity mentions and how they occur in the sentence
The model can be used in two modes:
o Semi-supervised mode: F, E1 and E2 are known and R is unknown; the EM algorithm is used for learning the model parameters.
o Supervised mode: F, E1, E2 and R are all known while learning.
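The semi-supervised mode can be illustrated with a minimal sketch. It assumes a naive-Bayes-style factorisation P(R) P(E1|R) P(E2|R) ∏ P(f|R) with add-alpha smoothing; the slides do not specify the actual model, so this factorisation, the function name `em_relation_types`, and the toy features are all assumptions made for illustration, not the project's implementation.

```python
import math
from collections import defaultdict

def em_relation_types(examples, relations, n_iter=10, alpha=0.1):
    """Estimate P(R | E1, E2, F) for examples whose relation R is unknown.

    examples: list of (e1, e2, features, r); r is None when R is unknown.
    Returns one {relation: probability} dict per example.
    """
    # Labelled seeds keep a fixed one-hot posterior; unlabelled start uniform.
    uniform = 1.0 / len(relations)
    post = [{k: (uniform if r is None else float(k == r)) for k in relations}
            for _, _, _, r in examples]
    n_e1 = len({ex[0] for ex in examples})
    n_e2 = len({ex[1] for ex in examples})
    n_f = len({f for ex in examples for f in ex[2]})

    for _ in range(n_iter):
        # M-step: expected counts weighted by the current posteriors.
        c_r, f_total = defaultdict(float), defaultdict(float)
        c_e1, c_e2, c_f = defaultdict(float), defaultdict(float), defaultdict(float)
        for (e1, e2, feats, _), p in zip(examples, post):
            for k in relations:
                c_r[k] += p[k]
                c_e1[k, e1] += p[k]
                c_e2[k, e2] += p[k]
                for f in feats:
                    c_f[k, f] += p[k]
                    f_total[k] += p[k]

        def log_joint(k, e1, e2, feats):
            # log P(R=k) + log P(E1|k) + log P(E2|k) + sum of log P(f|k), smoothed.
            lp = math.log((c_r[k] + alpha) / (len(examples) + alpha * len(relations)))
            lp += math.log((c_e1[k, e1] + alpha) / (c_r[k] + alpha * n_e1))
            lp += math.log((c_e2[k, e2] + alpha) / (c_r[k] + alpha * n_e2))
            for f in feats:
                lp += math.log((c_f[k, f] + alpha) / (f_total[k] + alpha * n_f))
            return lp

        # E-step: recompute posteriors over R for unlabelled examples only.
        for i, (e1, e2, feats, r) in enumerate(examples):
            if r is not None:
                continue
            logs = {k: log_joint(k, e1, e2, feats) for k in relations}
            m = max(logs.values())
            z = sum(math.exp(v - m) for v in logs.values())
            post[i] = {k: math.exp(v - m) / z for k, v in logs.items()}
    return post

# Toy data (hypothetical features): two labelled seeds, two unlabelled pairs.
toy = [
    ("PERSON", "PERSON", ["organizer", "for"], "PER-SOC"),
    ("ORGANIZATION", "GPE", ["located", "in"], "GPE-ORG"),
    ("PERSON", "PERSON", ["aide", "for"], None),
    ("ORGANIZATION", "GPE", ["based", "in"], None),
]
for example, posterior in zip(toy, em_relation_types(toy, ["PER-SOC", "GPE-ORG"])):
    print(example[:2], max(posterior, key=posterior.get))
```

Supervised mode corresponds to the case where every example carries a label, so the E-step never changes anything and a single M-step gives the smoothed maximum-likelihood estimates.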
6
Input sentence: Patricia Newell, an organizer for Nader at the University of Florida in Gainesville, said that Nader had won far fewer votes in Florida than his supporters had expected.
Entity mentions extracted:
o PERSON: Patricia Newell, organizer, Nader, Nader, his, supporters
o ORGANIZATION: University of Florida
o GPE (Geo-Political Entity): Gainesville, Florida
Relations extracted:
Relation   Entity Mention 1        Entity Mention 2
PER-SOC    organizer               Nader
GPE-ORG    University of Florida   Gainesville
PER-SOC    his                     supporters
7
2. Multiword Extraction: Identifying and extracting multiword expressions using deep learning (multilayered neural networks).
Paper submitted to COLING 2014 (Ireland):
o Rahul Sharnagat, Rudra Murthy V, Dhirendra Singh, Pushpak Bhattacharyya, Identification of Multiword Named Entities using Co-occurrence Statistics and Distributed Word Representation.
8
Machine Translation
o यूक्रेन की सेना ने क्रीमिया के सीमावर्ती इलाकों में अपना डेरा डाल दिया है। (Translation: The Ukrainian army has set up camp in the border areas of Crimea.)
Natural Language Generation
o "Good said" or "Well said"?
o "Baby changing room" (what is changed?)
9
ಈ ಕೆಲಸವು ಕಬ್ಬಿಣದ ಕಡಲೆಯೇ ಸರಿ
o Transliteration: I kelasavu kabbiNada kaDaleyE sari
o Gloss: this job iron nut correct
o Translation: This job is a hard nut to crack
o Google: This work is strong meat
ಯಾರ ಹತ್ತಿರವೂ ಕೈ ಚಾಚಬೇಡ
o Transliteration: yAra hattiravU kai chAchabEDa
o Gloss: which near hand no extend
o Translation: Do not ask help from anybody
o Google: Whose ever hand cacabeda
10
MWE Extraction Techniques
o Rule Based
o Empirical
   o Statistical Measures Based
   o Similarity Based
      o Thesaurus Based
      o Distributional Word Representation
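As an illustration of the statistical-measures branch, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), one commonly used association measure. The slides do not say which measures the project uses, so PMI, the function name `pmi_bigrams`, and the `min_count` threshold are assumptions chosen for the example.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by PMI = log2( P(x, y) / (P(x) * P(y)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get unreliable, inflated PMI, so skip them.
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs are the strongest multiword candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tokens = "new york is big and new york is busy but the city is big".split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))
```

A frequency threshold is used here because PMI is known to over-reward rare pairs; on a real corpus the threshold would be higher and candidates would typically be filtered further.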
11
Artificial Neural Networks (ANNs) have been successfully applied to various Natural Language Processing tasks.
ANNs are able to capture the semantics of words.
Use ANNs to extract MWEs from text: Deep Learning.
15
NGIE: additional outcomes
16
HMM trained on Hindi, tested on Hindi words aligned with source-language words
Language   Hindi as Helper
Marathi    55.18
Bengali    41.11
Gujarati   42.23
Punjabi    45.54
18
Project goal: advanced IE in a multilingual setting.
Involves Machine Translation and Search too.
Sophisticated machine learning techniques such as Markov Logic Networks and Deep Learning are to be used for NLP.
The incumbent will get into the depths of ML and NLP, with active support for existing project work.
Expectation: day-to-day project work, attending research evaluation meetings around the country, publishing, and creating downloadable resources and tools.
19
Thank you
Lab URL: http://www.cfilt.iitb.ac.in
My URL: http://www.cse.iitb.ac.in/~pb