Stanford CoreNLP 20150826.



Content
Architecture
Annotator
POS Tagger
NER
Parser
Dependency Parser
Coreference Resolution
Examples from QALD-5

Architecture
Annotator: adds some kind of analysis information to an Annotation object.
Annotation: a type-safe heterogeneous map; the data structure that holds the results of annotators.
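The relationship between annotators and the Annotation map can be sketched in a few lines of Python. This is only an illustration of the idea, not CoreNLP's Java API: the names `Key`, `TokensKey`, `SentenceCountKey`, `tokenize`, `count_sentences`, and `run_pipeline` are all hypothetical, and the runtime `isinstance` check stands in for the type checking that a Java implementation would get from generics at compile time.

```python
from typing import Callable, Dict, List, Type


class Key:
    """Base class for annotation keys; value_type is the type stored under the key."""
    value_type: type = object


class TokensKey(Key):
    """Hypothetical key: list of token strings."""
    value_type = list


class SentenceCountKey(Key):
    """Hypothetical key: number of sentences."""
    value_type = int


class Annotation:
    """Heterogeneous map: each key class maps to a value of that key's type."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._data: Dict[Type[Key], object] = {}

    def set(self, key: Type[Key], value: object) -> None:
        # Reject values whose type does not match the key's declared type.
        if not isinstance(value, key.value_type):
            raise TypeError(f"{key.__name__} expects {key.value_type.__name__}")
        self._data[key] = value

    def get(self, key: Type[Key]) -> object:
        return self._data[key]


def tokenize(ann: Annotation) -> None:
    """Toy annotator: whitespace tokenization (an assumption, not CoreNLP's tokenizer)."""
    ann.set(TokensKey, ann.text.split())


def count_sentences(ann: Annotation) -> None:
    """Toy annotator: reads the tokens added earlier and counts sentence-final tokens."""
    tokens = ann.get(TokensKey)
    ann.set(SentenceCountKey, sum(1 for t in tokens if t[-1] in ".?!"))


def run_pipeline(text: str, annotators: List[Callable[[Annotation], None]]) -> Annotation:
    """Run each annotator in order; later annotators may read earlier results."""
    ann = Annotation(text)
    for annotate in annotators:
        annotate(ann)
    return ann
```

Calling `run_pipeline("Stanford is in California. It is sunny.", [tokenize, count_sentences])` would leave seven tokens under `TokensKey` and the value 2 under `SentenceCountKey`. The order of the list matters, because `count_sentences` depends on the output of `tokenize`.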

Annotator
16 annotators:
1 TokenizerAnnotator
2 CleanXmlAnnotator
3 WordToSentenceAnnotator
4 POSTaggerAnnotator
5 MorphaAnnotator
6 NERClassifierCombiner
7 RegexNERAnnotator
8 SentimentAnnotator
9 TrueCaseAnnotator
10 ParserAnnotator
11 DependencyParseAnnotator
12 DeterministicCorefAnnotator
13 RelationExtractorAnnotator
14 NaturalLogicAnnotator
15 QuoteAnnotator
16 EntityMentionsAnnotator
NaturalLogicAnnotator: marks quantifier scope and token polarity, according to natural logic semantics.

POS Tagger
http://nlp.stanford.edu/software/tagger.shtml
Labels tokens with their part-of-speech (POS) tag
Maximum entropy model
Supports English, Arabic, Chinese, French, Spanish, and German
Accuracy:
97.24% (Penn Treebank tagset, English Penn Treebank WSJ)
93.46% (LDC Chinese Treebank POS tag set, Chinese and Hong Kong texts; 79.40% on unknown words)

NER, regexner
http://nlp.stanford.edu/software/CRF-NER.shtml
Recognizes named entities (PERSON, LOCATION, ORGANIZATION, MISC) with CRF sequence taggers
Supports English, Dutch, Spanish, German
F1 87.94% (precision 88.21%, recall 87.68%; CoNLL 2003 English news testb)
Models: english.muc.7class.distsim.crf.ser, english.conll.4class.distsim.crf.ser
Recognizes numerical entities (MONEY, NUMBER, DATE, TIME, DURATION, SET) with a rule-based system
regexner rule example: Bachelor of (Arts|Laws|Science|Engineering) DEGREE
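As a rough illustration of how a rule such as "Bachelor of (Arts|Laws|Science|Engineering) DEGREE" can be applied, the sketch below treats every whitespace-separated element before the label as a regular expression that must match one whole token. It is a simplified stand-in written for this transcript, not the RegexNERAnnotator itself: `compile_rule` and `apply_rule` are hypothetical helpers, whitespace tokenization is assumed, and "O" is used as the label for tokens outside any match.

```python
import re
from typing import List, Tuple

Rule = Tuple[List["re.Pattern"], str]


def compile_rule(line: str) -> Rule:
    """Split 'regex regex ... LABEL' into one compiled pattern per token plus the label."""
    *token_regexes, label = line.split()
    return [re.compile(r) for r in token_regexes], label


def apply_rule(tokens: List[str], rule: Rule) -> List[str]:
    """Label every token window in which each pattern fully matches its token."""
    patterns, label = rule
    width = len(patterns)
    labels = ["O"] * len(tokens)  # "O" marks tokens outside any match
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(p.fullmatch(t) for p, t in zip(patterns, window)):
            labels[start:start + width] = [label] * width
    return labels


rule = compile_rule("Bachelor of (Arts|Laws|Science|Engineering) DEGREE")
tokens = "She holds a Bachelor of Science from Stanford".split()
labels = apply_rule(tokens, rule)
# labels: ['O', 'O', 'O', 'DEGREE', 'DEGREE', 'DEGREE', 'O', 'O']
```

Matching token by token, rather than running one regular expression over the raw string, keeps the output aligned with the tokens that the rest of the pipeline annotates.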

Parser
http://nlp.stanford.edu/software/lex-parser.shtml
Syntactic analysis
Supports English, Chinese, Arabic, Spanish, German
PCFG (2003), recursive neural network (2013)
F1: 86.36% (PCFG), 90.4% (RNN) (Penn Treebank WSJ, English)
Models: englishPCFG.ser, englishRNN.ser

DependencyParse
http://nlp.stanford.edu/software/nndep.shtml
Analyzes the grammatical structure of a sentence, establishing relationships between "head" words and the words that modify those heads.
Two types of output: basic, collapsed
Supports English, Chinese
Neural network, transition-based
English Penn Treebank and Chinese Penn Treebank (2014)
Universal Dependencies representation
Unlabeled attachment score (UAS): English 92.0, Chinese 83.9
Labeled attachment score (LAS): English 90.7, Chinese 82.4
Models:
english_UD.gz (default, English, Universal Dependencies)
PTB_Stanford_params.txt.gz (English, Stanford Dependencies)
PTB_CoNLL_params.txt.gz (English, CoNLL Dependencies)
CTB_CoNLL_params.txt.gz (Chinese, CoNLL Dependencies)
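For readers unfamiliar with the two attachment scores, the following sketch shows how they are commonly computed: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to be correct. The function name `attachment_scores`, the toy sentence, and its labels are assumptions made for illustration; details of real evaluations, such as whether punctuation is excluded, are left out.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (index of the head token, dependency label); head 0 is the root


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))  # head only
    full_hits = sum(g == p for g, p in zip(gold, predicted))        # head and label
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Toy sentence "She ate fresh fish", tokens numbered 1 to 4.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "dobj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "nmod")]
uas, las = attachment_scores(gold, predicted)
# uas == 75.0 (three of four heads correct), las == 50.0 (two of four arcs fully correct)
```

Because a labeled hit requires a correct head, LAS can never exceed UAS, which is consistent with the figures reported on the slide.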

DependencyParse http://universaldependencies.github.io/docs/u/dep/all.html

Coreference Resolution
http://nlp.stanford.edu/software/dcoref.shtml
Pronominal and nominal coreference resolution:
The music was so loud that it couldn't be enjoyed.
The project leader is refusing to help. The jerk thinks only of himself.
Supports English, Chinese
Rule-based, sieve-based (……, DiscourseMatch, ExactStringMatch, ……, RelaxedHeadMatch, PronounMatch)
Avg F1 59.5% (CoNLL-2011 Shared Task data set, 2013)
Dictionaries:
Demonym (Asia Asian Asians)
Male (johannsen johansen johanson johansson)
Female (kate katelyn kater katerina)
……
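The sieve idea, applying a sequence of matching rules one after another so that each pass builds on the clusters formed by earlier passes, can be sketched as follows. This is a toy under strong assumptions, not the dcoref implementation: mentions are plain non-empty strings, `exact_string_match` and `relaxed_head_match` are crude stand-ins that borrow only their names from the sieve list above, the head of a mention is assumed to be its last word, and a pronoun sieve is omitted because it would need gender and number information.

```python
from typing import Callable, List

Sieve = Callable[[str, str], bool]


def exact_string_match(a: str, b: str) -> bool:
    """Stand-in sieve: the two mentions are identical, ignoring case."""
    return a.lower() == b.lower()


def relaxed_head_match(a: str, b: str) -> bool:
    """Stand-in sieve: the last words match (last word assumed to be the head)."""
    return a.lower().split()[-1] == b.lower().split()[-1]


def resolve(mentions: List[str], sieves: List[Sieve]) -> List[int]:
    """Return a cluster id per mention after applying the sieves in order."""
    cluster_of = list(range(len(mentions)))  # every mention starts alone
    for sieve in sieves:
        for j in range(len(mentions)):
            for i in range(j):  # candidate antecedents precede mention j
                if cluster_of[i] != cluster_of[j] and sieve(mentions[i], mentions[j]):
                    old, new = cluster_of[j], cluster_of[i]
                    # Merge the whole cluster of j into the cluster of i.
                    cluster_of = [new if c == old else c for c in cluster_of]
                    break
    return cluster_of


mentions = ["Barack Obama", "the president", "Obama", "barack obama"]
clusters = resolve(mentions, [exact_string_match, relaxed_head_match])
# clusters == [0, 1, 0, 0]: the first pass links mentions 0 and 3, the second adds mention 2
```

Running the stricter rule first means that the looser head match only has to decide the cases the exact match left open.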

Examples from QALD-5 (1)
Who was vice president under the president who authorized atomic weapons against Japan during World War II?

Examples from QALD-5 (2)
Of the people that died of radiation in Los Alamos, whose death was an accident?

Examples from QALD-5 (3)
Which actress starring in the TV series Friends owns the production company Coquette Productions?

Examples from QALD-5 (4)
Which city does the first person to climb all 14 eight-thousanders come from?

Examples from QALD-5 (5)
What is the largest city in the county in which Faulkner spent most of his life?

Thank you!