Giuseppe Attardi Dipartimento di Informatica Università di Pisa

Slides:

Advertisements

Similar presentations

Deep Learning in NLP Word representation and how to use it for Parsing

Advertisements

ELN – Natural Language Processing Giuseppe Attardi

INTRODUCTION TO ARTIFICIAL INTELLIGENCE Truc-Vien T. Nguyen Lab: Named Entity Recognition.

Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.

Giuseppe Attardi Dipartimento di Informatica Università di Pisa

L’età della parola Giuseppe Attardi Dipartimento di Informatica Università di Pisa ESA SoBigDataPisa, 24 febbraio 2015.

This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.

30 March – 8 April 2005 Dipartimento di Informatica, Universita di Pisa ML for NLP With Special Focus on Tagging and Parsing Kiril Ribarov.

Seminar Topics and Projects Giuseppe Attardi Dipartimento di Informatica Università di Pisa.

Exploiting Named Entity Taggers in a Second Language Thamar Solorio Computer Science Department National Institute of Astrophysics, Optics and Electronics.

Overview of Statistical NLP IR Group Meeting March 7, 2006.

Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth UIUC.

DeepWalk: Online Learning of Social Representations

Graph-based Dependency Parsing with Bidirectional LSTM Wenhui Wang and Baobao Chang Institute of Computational Linguistics, Peking University.

Deep Learning for Text Analysis Where do we stand?

Language Identification and Part-of-Speech Tagging

S.Bengio, O.Vinyals, N.Jaitly, N.Shazeer

Ensembling Diverse Approaches to Question Answering

R-NET: Machine Reading Comprehension With Self-Matching Networks

Big data classification using neural network

Sentiment analysis using deep learning methods

Jonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C

Neural Machine Translation

Like It or Not: A Survey of Twitter Sentiment Analysis Methods

Sentiment analysis algorithms and applications: A survey

Sentence Modeling Representation of sentences is the heart of Natural Language Processing A sentence model is a representation and analysis of semantic.

[A Contrastive Study of Syntacto-Semantic Dependencies]

Relation Extraction CSCI-GA.2591

Factual Claim Validation Models Extraction of Evidence

Giuseppe Attardi Dipartimento di Informatica Università di Pisa

Are End-to-end Systems the Ultimate Solutions for NLP?

Deep learning and applications to Natural language processing

Introduction to Machine Learning and NLP

Mining the Data Charu C. Aggarwal, ChengXiang Zhai

Deceptive News Prediction Clickbait Score Inference

AI in Cyber-security: Examples of Algorithms & Techniques

--Mengxue Zhang, Qingyang Li

Text Analytics Giuseppe Attardi Università di Pisa

Trends and Future of NLP

Deep Learning based Machine Translation

convolutional neural networkS

Distributed Representation of Words, Sentences and Paragraphs

convolutional neural networkS

Tagging Review Comments Rationale #10 Week 13

Seminar Topics and Projects

Drug Overdose Classification with NLP and Deep Learning: Background

Word embeddings based mapping

Word embeddings based mapping

Text Mining & Natural Language Processing

Computational Linguistics: New Vistas

Introduction to Natural Language Processing

Text Mining & Natural Language Processing

How To Extend the Training Data

CS224N Section 3: Corpora, etc.

Giuseppe Attardi Dipartimento di Informatica Università di Pisa

Preposition error correction using Graph Convolutional Networks

CS565: Intelligent Systems and Interfaces

Attention for translation

Learn to Comment Mentor: Mahdi M. Kalayeh

CSE 291G : Deep Learning for Sequences

Automatic Handwriting Generation

Artificial Intelligence 2004 Speech & Natural Language Processing

Factual Claim Validation Models

Sentiment Classification

Tokenizing Search/regex Statistics

Ask and Answer Questions

The experiments based on Recurrent Neural Networks

Bidirectional LSTM-CRF Models for Sequence Tagging

CSC 578 Neural Networks and Deep Learning

Huawei CBG AI Challenges

Presentation transcript:

Giuseppe Attardi Dipartimento di Informatica Università di Pisa Topics for Projects Giuseppe Attardi Dipartimento di Informatica Università di Pisa

Fujitsu 2018 challenge AI-NLP Challenge Answer Sentence Selection problem

Google Assistant Guide for the Museo del Calcolo Internet Festival 2018: October 11-14, 2018 Actions on Google

Chatbot The Conversational Intelligence Challenge 2 (ConvAI2) convai.io/ Deadline: September 30, 2018

CoNLL 2018 Shared Task The CoNLL 2018 Shared Task involves dependency parsing from plain text. This involves several subtasks: Tokenization using DL POS using DL Morphological analysis Depenedncy parsing Timeline: Test data: May 2, 2018 Submission: June 26, 2018

CoNLL 2018 UD Parsing Parsing Universal Dependencies for the CoNLL 2018 Shared Task: BiLSTM with Attention T. Dozat, P. Qi, C.D. Manning. 2017. Graph-based Neural Dependency Parser.

CoNLL 2018: Deep Learning Tokenizer CoNLL 2018 challenge requires a tokenizer for all the Universal Dependency TreeBanks Build a DL tokenizer using Keras based on the approach of: Basile, Valerio and Bos, Johan and Evang, Kilian A General-Purpose Machine Learning Method for Tokenization and Sentence Boundary Detection (2013), http://gmb.let.rug.nl/elephant/

CoNLL 2018: Deep Learning POS Depling 2016 challenge requires tokenizer for any of the Universal Dependency TreeBank Build a DL POS using CNN, for example a LSTM that uses word embeddings and possible charcater embeddings.

CoNLL 2018: Deep Learning Morph Analyzer CoNLL 2018 challenge requires dealing with all the Universal Dependency TreeBanks Build a DL morphological analyzer that computes morphological embeddings for each word, using Keras and character embeddings.

Evalita 2016-2018 www.evalita.it/2016 www.evalita.it/2018 POSTWITA QA4FAQ NEEL-IT www.evalita.it/2018 ABSITA HaSpeeDe NLP4FUN (more statistics than linguistics?) Timeline Data Release: May 28, 2018 Evalutation: September 10-16, 2018

Possible Approach for ABSITA A Siamese Bidirectional LSTM with context-aware attention. Baziotis et al. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. www.aclweb.org/anthology/S17-2126 Code: https://github.com/cbaziotis/datastories-semeval2017-task4

Question Answering Tasks SemEval 2017 Task 3 Evalita 2016 QA4FAQ SQuAD https://towardsdatascience.com/nlp-building-a-question-answering-model-ed0529a68c54 Movie QA http://movieqa.cs.toronto.edu/home/

Chatbots AWS Chatbot Challenge Ubuntu Dialog Corpus: https://aws.amazon.com/events/chatbot-challenge/ Ubuntu Dialog Corpus: https://github.com/rkadlec/ubuntu-ranking-dataset-creator

Neural Machine Translation English-Italian Europarl Corpus Ses2Seq TensorFlow Tutorial References: D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. http://arxiv.org/pdf/1409.0473v6 Zhang, X., & LeCun, Y. (2015). Text Understanding from Scratch. http://arxiv.org/abs/1502.01710

Twitter Modeling Political Bias Detecting Toxic Comments Use Italian Tweets collection Detecting Toxic Comments Use Italian Tweets collection and Evalita 2018 HaSpeeDe corpus

Deep Learning for Sentiment Analysis Annotated Data: SemEval training set http://alt.qcri.org/semeval2017/task4/index.php?id=data-and-tools Unannotated Data: 50 million tweets CNN approach: Code: DeepNL, https://github.com/attardi/deepnl Article: A. Severyn, A. Moschitti.UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification BiLSTM approach: Baziotis et al. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. www.aclweb.org/anthology/S17-2126 Code: https://github.com/cbaziotis/datastories-semeval2017-task4

POS tagging using Word Embeddings Data: Evalita 2016 Embeddings: http://tanl.di.unipi.it/embeddings/ Article: Stratos, M. Collins. Simple Semi-Supervised POS Tagging. http://www.cs.columbia.edu/~stratos/research/naacl15semipos.pdf

Medical texts Predicting side effects of drugs Using collection of Italian medical record on kidney and heart diseases Negation/Speculative Scope Detection BioScope Corpus: http://rgai.inf.u-szeged.hu/index.php?page=bioscope Semantic QA on medical texts: BioASQ datasets: bioasq.org/

Negation/Speculation Scope Determine the scope of negative or speculative statements: The lyso-platelet had no effect MnlI-AluI could suppress the basal-level activity Approach: Classifier for identifying cues Classifier to determine scope Data BioScope collection

Relation Extraction Exploit word embeddings as features + extra hand-coded features Use the Factor Based Compositional Embedding Model (FCM) http://www.cs.jhu.edu/~mrg/publications/finere-naacl-2015.pdf SemEval 2014 Relation Extraction data

Entity Linking with Embeddings Experiment with technique: R. Blanco, G. Ottaviano, E. Meiji. 2014. Fast and Space-Efficient Entity Linking in Queries. labs.yahoo.com/_c/uploads/WSDM-2015-blanco.pdf Dataset: Neel-it (Evalita 2016)

Extraction of Semantic Hierarchies Use word embeddings as measure of semantic distance Use Wikipedia as source of text http://ir.hit.edu.cn/~jguo/papers/acl2014-hypernym.pdf Organism Plant Ranuncolacee Aconitum