Pushpak Bhattacharyya CSE Dept. IIT Bombay 19 May, 2014.


1 Pushpak Bhattacharyya CSE Dept. IIT Bombay 19 May, 2014

2
• The goal of the project is to develop Next Generation Information Extraction (IE) technology.
• The IE environment will be multilingual.
• It involves Machine Translation and Cross-Lingual Search.
• The IE focus is on relation extraction, named entity extraction, multiword extraction, semantic role labeling and corpus management.
• Relation and name extraction are to be done jointly, since they are synergistic (CEO_of is a relation between Person and Organization).
• The fruits of this research are to be carried to the TCS IE environment called INX.
• High-quality publications in IE, in all the above tasks and combinations thereof.

3
1. Mr. Girish Palshikar, Principal Scientist, Systems Research Lab, Tata Consultancy Services Limited
2. Dr. Pushpak Bhattacharyya, Professor, Department of Computer Science & Engineering, IIT Bombay
3. Other members: Rohit Bangera, Sachin Pawar, Rudra Murthy, Girish Ponkia, Ravi Soni, Manish Shrivastava, Diptesh Kanojia, Gajanan Rane

4 1. Relation Extraction: A relation extraction system has been built that extracts entities from natural language sentences and identifies the relationships among them. The following papers have been published:
• Sachin Pawar, Pushpak Bhattacharyya and Girish Keshav Palshikar, Semi-supervised Relation Extraction using EM Algorithm, International Conference on NLP (ICON 2013), Noida, India, 18-20 December, 2013.
• Sachin Pawar, Pushpak Bhattacharyya and Girish Keshav Palshikar, Improving Relation Extraction Using A Joint Model of Entities and Relations, under revision.

5
• E1, E2: types of the first and second entity mentions
• R: type of the relation between the two mentions
• F: feature vector capturing characteristics of the entity mentions and how they occur in the sentence
• Can be used in:
  o Semi-supervised mode: F, E1 and E2 are known and R is unknown; the EM algorithm is used to learn the model parameters.
  o Supervised mode: F, E1, E2 and R are all known during learning.
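The EM idea on this slide can be illustrated with a minimal sketch. The slides do not give the model's factorization, so the code below assumes a naive-Bayes-style mixture in which E1, E2 and each (categorical) feature in F are conditionally independent given the latent relation R. The function name `em_latent_relation`, the smoothing constant `alpha` and the toy data are all hypothetical; this is not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def em_latent_relation(instances, n_relations, n_iters=20, seed=0, alpha=0.1):
    """EM for a mixture model with a latent relation label R (illustrative only).

    instances: list of equal-length tuples of categorical observations,
               e.g. (e1_type, e2_type, feature_1, ...).
    Assumption (not from the slides): all slots are independent given R.
    Returns (posteriors over R per instance, log-likelihood per iteration).
    """
    rng = random.Random(seed)
    n_slots = len(instances[0])
    vocab = [sorted({x[s] for x in instances}) for s in range(n_slots)]

    # Random soft initialisation of P(R | instance).
    post = []
    for _ in instances:
        w = [rng.random() + 1e-3 for _ in range(n_relations)]
        z = sum(w)
        post.append([v / z for v in w])

    history = []
    for _ in range(n_iters):
        # M-step: re-estimate P(R) and P(slot value | R) from soft counts,
        # with additive smoothing so that no probability is exactly zero.
        denom = len(instances) + alpha * n_relations
        prior = [(sum(p[r] for p in post) + alpha) / denom
                 for r in range(n_relations)]
        cond = []
        for s in range(n_slots):
            counts = [defaultdict(float) for _ in range(n_relations)]
            for x, p in zip(instances, post):
                for r in range(n_relations):
                    counts[r][x[s]] += p[r]
            table = []
            for r in range(n_relations):
                total = sum(counts[r].values()) + alpha * len(vocab[s])
                table.append({v: (counts[r][v] + alpha) / total
                              for v in vocab[s]})
            cond.append(table)

        # E-step: posterior over R for each instance under current parameters.
        log_likelihood = 0.0
        new_post = []
        for x in instances:
            joint = [prior[r] * math.prod(cond[s][r][x[s]]
                                          for s in range(n_slots))
                     for r in range(n_relations)]
            z = sum(joint)
            log_likelihood += math.log(z)
            new_post.append([j / z for j in joint])
        post = new_post
        history.append(log_likelihood)
    return post, history


# Toy data (hypothetical): (E1 type, E2 type, one context feature).
data = [
    ("PER", "ORG", "works_at"),
    ("PER", "ORG", "works_at"),
    ("ORG", "GPE", "located_in"),
    ("ORG", "GPE", "located_in"),
]
posteriors, log_likelihoods = em_latent_relation(data, n_relations=2, n_iters=10)
```

In the supervised mode described on the slide, the E-step would be unnecessary: R is observed, so the counts in the M-step would simply be hard counts.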

6
• Input sentence: Patricia Newell, an organizer for Nader at the University of Florida in Gainesville, said that Nader had won far fewer votes in Florida than his supporters had expected.
• Entity mentions extracted:
  o PERSON - Patricia Newell, organizer, Nader, Nader, his, supporters
  o ORGANIZATION - University of Florida
  o GPE (Geo-Political Entity) - Gainesville, Florida
• Relations extracted:

  Relation   Entity Mention 1        Entity Mention 2
  PER-SOC    organizer               Nader
  GPE-ORG    University of Florida   Gainesville
  PER-SOC    his                     supporters

7 2. Multiword Extraction: Identifying and extracting multiwords using deep learning (multilayered neural networks)
• Paper submitted to COLING 2014 (Ireland): Rahul Sharnagat, Rudra Murthy V, Dhirendra Singh, Pushpak Bhattacharyya, Identification of Multiword Named Entities using Co-occurrence Statistics and Distributed Word Representation.

8
Machine Translation
  o यूक्रेन की सेना ने क्रीमिया के सीमावर्ती इलाकों में अपना डेरा डाल दिया है। (English: "The Ukrainian army has set up camp in the border areas of Crimea.")
Natural Language Generation
  o "Good said" or "Well said"?
  o "Baby changing room" (what is changed?)

9
ಈ ಕೆಲಸವು ಕಬ್ಬಿಣದ ಕಡಲೆಯೇ ಸರಿ
  o Transliteration: I kelasavu kabbiNada kaDaleyE sari
  o Gloss: this job iron nut correct
  o Translation: This job is a hard nut to crack
  o Google: This work is strong meat
ಯಾರ ಹತ್ತಿರವೂ ಕೈ ಚಾಚಬೇಡ
  o Transliteration: yAra hattiravU kai chAchabEDa
  o Gloss: which near hand no extend
  o Translation: Do not ask help from anybody
  o Google: Whose ever hand cacabeda

10 MWE Extraction Techniques (diagram labels): Rule Based; Empirical; Statistical Measures Based; Similarity Based; Thesaurus Based; Distributional Word Representation
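As an illustration of the "statistical measures" family, the sketch below scores adjacent word pairs by pointwise mutual information (PMI), a widely used association measure for multiword candidates. The slides do not say which measure the project uses; PMI, the function name `bigram_pmi`, the `min_count` threshold and the toy sentence are assumptions made for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from raw counts. Pairs seen fewer than `min_count` times
    are skipped, because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (hypothetical): "new york" recurs as a unit.
tokens = ("new york is big and new york is far from "
          "hot dog stands near new york").split()
scores = bigram_pmi(tokens, min_count=2)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Pairs with high PMI co-occur more often than their individual frequencies would predict, which makes them candidates for multiword expressions. A real system would typically combine such a score with other evidence.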

11
• Artificial Neural Networks (ANNs) have been successfully applied to various Natural Language Processing tasks.
• ANNs are able to capture the semantics of words.
• Use ANNs to extract MWEs from text: Deep Learning.

15 NGIE: additional outcomes

16 HMM trained on Hindi; tested on Hindi words aligned with source-language words

  Language   Hindi as helper
  Marathi    55.18
  Bengali    41.11
  Gujarati   42.23
  Punjabi    45.54

18
• Project goal: advanced IE in a multilingual setting
• Involves Machine Translation and Search too
• Sophisticated machine learning techniques such as Markov Logic Networks and Deep Learning are to be used for NLP
• The incumbent will get into the depths of ML and NLP, with active support for existing project work
• Expectations: day-to-day project work, attending research evaluation meetings around the country, publishing, and creating downloadable resources and tools

19 Thank you
Lab URL: http://www.cfilt.iitb.ac.in
My URL: http://www.cse.iitb.ac.in/~pb

