Giuseppe Attardi Dipartimento di Informatica Università di Pisa

Slides:



Advertisements
Similar presentations
Deep Learning in NLP Word representation and how to use it for Parsing
Advertisements

ELN – Natural Language Processing Giuseppe Attardi
INTRODUCTION TO ARTIFICIAL INTELLIGENCE Truc-Vien T. Nguyen Lab: Named Entity Recognition.
Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.
Giuseppe Attardi Dipartimento di Informatica Università di Pisa
L’età della parola Giuseppe Attardi Dipartimento di Informatica Università di Pisa ESA SoBigDataPisa, 24 febbraio 2015.
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.
30 March – 8 April 2005 Dipartimento di Informatica, Universita di Pisa ML for NLP With Special Focus on Tagging and Parsing Kiril Ribarov.
Seminar Topics and Projects Giuseppe Attardi Dipartimento di Informatica Università di Pisa.
Exploiting Named Entity Taggers in a Second Language Thamar Solorio Computer Science Department National Institute of Astrophysics, Optics and Electronics.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth UIUC.
DeepWalk: Online Learning of Social Representations
Graph-based Dependency Parsing with Bidirectional LSTM Wenhui Wang and Baobao Chang Institute of Computational Linguistics, Peking University.
Deep Learning for Text Analysis Where do we stand?
Language Identification and Part-of-Speech Tagging
S.Bengio, O.Vinyals, N.Jaitly, N.Shazeer
Ensembling Diverse Approaches to Question Answering
R-NET: Machine Reading Comprehension With Self-Matching Networks
Big data classification using neural network
Sentiment analysis using deep learning methods
Jonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C
Neural Machine Translation
Like It or Not: A Survey of Twitter Sentiment Analysis Methods
Sentiment analysis algorithms and applications: A survey
Sentence Modeling Representation of sentences is the heart of Natural Language Processing A sentence model is a representation and analysis of semantic.
[A Contrastive Study of Syntacto-Semantic Dependencies]
Relation Extraction CSCI-GA.2591
Factual Claim Validation Models Extraction of Evidence
Giuseppe Attardi Dipartimento di Informatica Università di Pisa
Are End-to-end Systems the Ultimate Solutions for NLP?
Deep learning and applications to Natural language processing
Introduction to Machine Learning and NLP
Mining the Data Charu C. Aggarwal, ChengXiang Zhai
Deceptive News Prediction Clickbait Score Inference
AI in Cyber-security: Examples of Algorithms & Techniques
--Mengxue Zhang, Qingyang Li
Text Analytics Giuseppe Attardi Università di Pisa
Trends and Future of NLP
Deep Learning based Machine Translation
convolutional neural networkS
Distributed Representation of Words, Sentences and Paragraphs
convolutional neural networkS
Tagging Review Comments Rationale #10 Week 13
Seminar Topics and Projects
Drug Overdose Classification with NLP and Deep Learning: Background
Word embeddings based mapping
Word embeddings based mapping
Text Mining & Natural Language Processing
Computational Linguistics: New Vistas
Introduction to Natural Language Processing
Text Mining & Natural Language Processing
How To Extend the Training Data
CS224N Section 3: Corpora, etc.
Giuseppe Attardi Dipartimento di Informatica Università di Pisa
Preposition error correction using Graph Convolutional Networks
CS565: Intelligent Systems and Interfaces
Attention for translation
Learn to Comment Mentor: Mahdi M. Kalayeh
CSE 291G : Deep Learning for Sequences
Automatic Handwriting Generation
Artificial Intelligence 2004 Speech & Natural Language Processing
Factual Claim Validation Models
Sentiment Classification
Tokenizing Search/regex Statistics
Ask and Answer Questions
The experiments based on Recurrent Neural Networks
Bidirectional LSTM-CRF Models for Sequence Tagging
CSC 578 Neural Networks and Deep Learning
Huawei CBG AI Challenges
Presentation transcript:

Giuseppe Attardi Dipartimento di Informatica Università di Pisa Topics for Projects Giuseppe Attardi Dipartimento di Informatica Università di Pisa

Fujitsu 2018 challenge AI-NLP Challenge Answer Sentence Selection problem

Google Assistant Guide for the Museo del Calcolo Internet Festival 2018: October 11-14, 2018 Actions on Google

Chatbot The Conversational Intelligence Challenge 2 (ConvAI2) convai.io/ Deadline: September 30, 2018

CoNLL 2018 Shared Task The CoNLL 2018 Shared Task involves dependency parsing from plain text. This involves several subtasks: Tokenization using DL POS using DL Morphological analysis Depenedncy parsing Timeline: Test data: May 2, 2018 Submission: June 26, 2018

CoNLL 2018 UD Parsing Parsing Universal Dependencies for the CoNLL 2018 Shared Task: BiLSTM with Attention T. Dozat, P. Qi, C.D. Manning. 2017. Graph-based Neural Dependency Parser.

CoNLL 2018: Deep Learning Tokenizer CoNLL 2018 challenge requires a tokenizer for all the Universal Dependency TreeBanks Build a DL tokenizer using Keras based on the approach of: Basile, Valerio and Bos, Johan and Evang, Kilian A General-Purpose Machine Learning Method for Tokenization and Sentence Boundary Detection (2013), http://gmb.let.rug.nl/elephant/

CoNLL 2018: Deep Learning POS Depling 2016 challenge requires tokenizer for any of the Universal Dependency TreeBank Build a DL POS using CNN, for example a LSTM that uses word embeddings and possible charcater embeddings.

CoNLL 2018: Deep Learning Morph Analyzer CoNLL 2018 challenge requires dealing with all the Universal Dependency TreeBanks Build a DL morphological analyzer that computes morphological embeddings for each word, using Keras and character embeddings.

Evalita 2016-2018 www.evalita.it/2016 www.evalita.it/2018 POSTWITA QA4FAQ NEEL-IT www.evalita.it/2018 ABSITA HaSpeeDe NLP4FUN (more statistics than linguistics?) Timeline Data Release: May 28, 2018 Evalutation: September 10-16, 2018

Possible Approach for ABSITA A Siamese Bidirectional LSTM with context-aware attention. Baziotis et al. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. www.aclweb.org/anthology/S17-2126 Code: https://github.com/cbaziotis/datastories-semeval2017-task4

Question Answering Tasks SemEval 2017 Task 3 Evalita 2016 QA4FAQ SQuAD https://towardsdatascience.com/nlp-building-a-question-answering-model-ed0529a68c54 Movie QA http://movieqa.cs.toronto.edu/home/

Chatbots AWS Chatbot Challenge Ubuntu Dialog Corpus: https://aws.amazon.com/events/chatbot-challenge/ Ubuntu Dialog Corpus: https://github.com/rkadlec/ubuntu-ranking-dataset-creator

Neural Machine Translation English-Italian Europarl Corpus Ses2Seq TensorFlow Tutorial References: D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. http://arxiv.org/pdf/1409.0473v6 Zhang, X., & LeCun, Y. (2015). Text Understanding from Scratch. http://arxiv.org/abs/1502.01710

Twitter Modeling Political Bias Detecting Toxic Comments Use Italian Tweets collection Detecting Toxic Comments Use Italian Tweets collection and Evalita 2018 HaSpeeDe corpus

Deep Learning for Sentiment Analysis Annotated Data: SemEval training set http://alt.qcri.org/semeval2017/task4/index.php?id=data-and-tools Unannotated Data: 50 million tweets CNN approach: Code: DeepNL, https://github.com/attardi/deepnl Article: A. Severyn, A. Moschitti.UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification BiLSTM approach: Baziotis et al. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. www.aclweb.org/anthology/S17-2126 Code: https://github.com/cbaziotis/datastories-semeval2017-task4

POS tagging using Word Embeddings Data: Evalita 2016 Embeddings: http://tanl.di.unipi.it/embeddings/ Article: Stratos, M. Collins. Simple Semi-Supervised POS Tagging. http://www.cs.columbia.edu/~stratos/research/naacl15semipos.pdf

Medical texts Predicting side effects of drugs Using collection of Italian medical record on kidney and heart diseases Negation/Speculative Scope Detection BioScope Corpus: http://rgai.inf.u-szeged.hu/index.php?page=bioscope Semantic QA on medical texts: BioASQ datasets: bioasq.org/

Negation/Speculation Scope Determine the scope of negative or speculative statements: The lyso-platelet had no effect MnlI-AluI could suppress the basal-level activity Approach: Classifier for identifying cues Classifier to determine scope Data BioScope collection

Relation Extraction Exploit word embeddings as features + extra hand-coded features Use the Factor Based Compositional Embedding Model (FCM) http://www.cs.jhu.edu/~mrg/publications/finere-naacl-2015.pdf SemEval 2014 Relation Extraction data

Entity Linking with Embeddings Experiment with technique: R. Blanco, G. Ottaviano, E. Meiji. 2014. Fast and Space-Efficient Entity Linking in Queries. labs.yahoo.com/_c/uploads/WSDM-2015-blanco.pdf Dataset: Neel-it (Evalita 2016)

Extraction of Semantic Hierarchies Use word embeddings as measure of semantic distance Use Wikipedia as source of text http://ir.hit.edu.cn/~jguo/papers/acl2014-hypernym.pdf Organism Plant Ranuncolacee Aconitum