LING/C SC 581: Advanced Computational Linguistics Lecture 3 Jan 17th
2019 HLT Lecture Series
Speaker | Title | Date
Tatjana Scheffler | Analyzing Discourse Structure on Social Media | Friday Feb 15th, 3pm, Comm 311
Marcos Zampieri | Language Variation and Automatic Language Identification. The Case of Dialects and Similar Languages. | Wednesday Feb 20th, noon, room TBA
Adriana Picoral | TBA | Wednesday Feb 27th, noon, room TBA
Gus Hahn-Powell | | Wednesday Mar 13th, noon, room TBA
Miikka Silfverberg | | Wednesday Mar 20th, noon, room TBA
Named Entity Recognition
In my other class, doing a demo: University of Illinois
Unfortunately, it is down this week so far…
Named Entity Recognition
Google Cloud Natural Language: language/
It also supplies sentiment/magnitude scores for the identified entities
Named Entity Recognition
Named Entity Recognition
Illinois Named Entity Recognizer example:
Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace. Down below, bomb-sniffing dogs will patrol the trains and buses that are expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super Bowl between the Denver Broncos and Seattle Seahawks. The Transportation Security Administration said it has added about two dozen dogs to monitor passengers coming in and out of the airport around the Super Bowl. On Saturday, TSA agents demonstrated how the dogs can sniff out many different types of explosives. Once they do, they're trained to sit rather than attack, so as not to raise suspicion or create a panic. TSA spokeswoman Lisa Farbstein said the dogs undergo 12 weeks of training, which costs about $200,000, factoring in food, vehicles and salaries for trainers. Dogs have been used in cargo areas for some time, but have just been introduced recently in passenger areas at Newark and JFK airports. JFK has one dog and Newark has a handful, Farbstein said.
Dependency-Based Parsing
Universal Dependencies (UD)
100 treebanks in over 70 languages
Some relations involving dependent clauses:
ccomp: connects higher verb with verbal head of sentential complement with overt subject
xcomp: connects higher verb with verbal head of non-finite sentential complement without a subject
csubj: connects higher verb with verbal head of sentential subject
vmod ➤ advcl/acl: connects word to verbal head of a reduced non-finite verbal modifier (vmod is deprecated in UD; still emitted by SyntaxNet)
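The ccomp/xcomp contrast above can be made concrete with a small sketch. It stores a dependency analysis as (relation, head, dependent) triples and checks whether the embedded verb has its own overt subject. The sentence "He says that you like to swim" and its analysis are hand-written here for illustration, not parser output, and the names Dep, DEPS, has_overt_subject and clausal_complements are hypothetical.

```python
from collections import namedtuple

# One dependency arc: relation label, head word, dependent word.
Dep = namedtuple("Dep", ["relation", "head", "dependent"])

# Hand-written (assumed) analysis of: "He says that you like to swim"
DEPS = [
    Dep("root", "ROOT", "says"),
    Dep("nsubj", "says", "He"),
    Dep("ccomp", "says", "like"),   # finite complement clause with its own subject
    Dep("mark", "like", "that"),
    Dep("nsubj", "like", "you"),
    Dep("xcomp", "like", "swim"),   # non-finite complement, no subject of its own
    Dep("mark", "swim", "to"),
]


def has_overt_subject(verb, deps):
    """True if some arc attaches a subject (nsubj or csubj) to this verb."""
    return any(d.head == verb and d.relation in ("nsubj", "csubj") for d in deps)


def clausal_complements(deps):
    """List (relation, higher verb, embedded verb, embedded verb has subject)."""
    return [
        (d.relation, d.head, d.dependent, has_overt_subject(d.dependent, deps))
        for d in deps
        if d.relation in ("ccomp", "xcomp")
    ]


for row in clausal_complements(DEPS):
    print(row)
```

In this toy analysis the ccomp dependent "like" has a subject ("you") while the xcomp dependent "swim" does not, which matches the definitions on the slide.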
Google Cloud Natural Language
RRS Sir David Attenborough "Boaty McBoatface"
Parsey McParseface (Andor et al., 2016)
Free: DragNN (Kong et al., 2017), the follow-on to SyntaxNet (2016)
Free sampling at
For-Pay Google Cloud version is trained on additional proprietary corpora
Google Cloud Natural Language
Quick Homework 3
The Penn Treebank is partially installed as a corpus in NLTK Data (Sections 00 and 01: wsj_0001.mrg to wsj_0199.mrg)
from nltk.corpus import treebank
Corpus methods: .words() .sents() .parsed_sents() .fileids()
Tree method: .draw() (called on a single parsed sentence)
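A minimal sketch of how the corpus methods listed above fit together. The helper corpus_overview is hypothetical and takes any object exposing .fileids(), .words(), .sents() and .parsed_sents(); TinyCorpus is a made-up stand-in so the example runs without NLTK or its data installed.

```python
def corpus_overview(corpus, fileid=None):
    """Count files, words, sentences and parses via the reader methods."""
    return {
        "files": len(corpus.fileids()),
        "words": len(corpus.words(fileid)),
        "sentences": len(corpus.sents(fileid)),
        "parses": len(corpus.parsed_sents(fileid)),
    }


class TinyCorpus:
    """Hypothetical stand-in that mimics the four reader methods."""

    _data = {
        "a.mrg": [["Dogs", "bark", "."]],
        "b.mrg": [["Cats", "sleep", "."], ["Birds", "sing", "."]],
    }

    def fileids(self):
        return sorted(self._data)

    def sents(self, fileid=None):
        # One file if a fileid is given, otherwise the whole corpus.
        ids = [fileid] if fileid else self.fileids()
        return [sent for f in ids for sent in self._data[f]]

    def words(self, fileid=None):
        return [word for sent in self.sents(fileid) for word in sent]

    def parsed_sents(self, fileid=None):
        # Stand-in "trees": a flat (label, words) pair per sentence.
        return [("S", sent) for sent in self.sents(fileid)]


print(corpus_overview(TinyCorpus()))
print(corpus_overview(TinyCorpus(), "b.mrg"))
```

With NLTK and the treebank data installed, the same helper should accept the real reader: corpus_overview(treebank) after from nltk.corpus import treebank. A single gold tree can then be displayed with treebank.parsed_sents()[0].draw().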
Quick Homework 3
Pick a random parse from treebank (see the session below)
Run it through the Google Cloud Parser
Analyze and comment on how it compares to the gold standard parse; include the gold tree and the Google dependency parse
One PDF file
Due next Wednesday (by midnight)
>>> import random
>>> random.seed()
>>> random.randrange(0,3914)
1462
>>> len(treebank.sents())
3914
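The random pick can be wrapped in a small helper, sketched here under the assumption that the corpus has the 3914 sentences reported by len(treebank.sents()) on the slide. The name pick_index and the optional seed parameter are hypothetical additions; passing a seed makes the pick repeatable, while leaving it out behaves like random.seed() with no argument.

```python
import random

N_SENTENCES = 3914  # value of len(treebank.sents()) shown on the slide


def pick_index(n_items, seed=None):
    """Return a random index in [0, n_items); a fixed seed repeats the pick."""
    rng = random.Random(seed)  # seed=None draws from system entropy
    return rng.randrange(0, n_items)


idx = pick_index(N_SENTENCES)
print("sentence index:", idx)
```

With NLTK and the treebank data installed, treebank.parsed_sents()[idx] would give the gold tree and " ".join(treebank.sents()[idx]) the text to paste into the parser. Note that the Penn Treebank token lists include trace and null elements such as *-1, which may need removing by hand first.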