Google SyntaxNet
“Parsey McParseface and other SyntaxNet models are some of the most complex networks that we have trained with the TensorFlow framework at Google.”
Quote from: https://www.tensorflow.org/versions/r0.9/tutorials/syntaxnet/index.html and https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
Paper: Globally Normalized Transition-Based Neural Networks, https://arxiv.org/pdf/1603.06042v2.pdf
Highlights
- Neural-net NLP framework for TensorFlow
- Pretrained English parser: Parsey McParseface
- Trained on 20-year-old English newswire stories (Penn Treebank Wall Street Journal speak)
- Over 94% accuracy at >600 words/second (spaCy: 92.4% at 15k words/second; linguists trained for the task agree in 96-97% of cases)
- Parsey has cousins now! Thanks to open source, models for 40+ languages
- Drill Bit: POS + Dependency Parsing
- Parsey’s cousins: https://github.com/tensorflow/models/blob/master/syntaxnet/universal.md
Picture and caption from: http://www.dailymail.co
What SyntaxNet does not do
- Coreference resolution
- Named-entity recognition (NER)
- Sentiment analysis
- Many other things
Coreference Resolution From: http://nlp.stanford.edu/projects/coref.shtml
Named-entity recognition From: https://en.wikipedia.org/wiki/Named-entity_recognition
Sentiment Analysis
From: https://en.wikipedia.org/wiki/Sentiment_analysis
“Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).”
Could consider demoing: https://foxtype.com/politeness (sample input: “Chocolate is the very best and amazing”)
What SyntaxNet does
- Syntactic parsing
- Part-of-speech (POS) tagging
- Dependency parsing
Why gramm[ae]r stinks
From: https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo and https://upload.wikimedia.org/wikipedia/commons/2/2c/Buffalo_buffalo_WikiWorld.png
The sentence uses three distinct meanings of the word buffalo: the city of Buffalo, New York; the uncommon verb to buffalo, meaning "to bully, harass, or intimidate" or "to baffle"; and the animal, bison (often called buffalo in North America). The sentence can be phrased differently as "Those buffalo(es) from Buffalo that are intimidated by buffalo(es) from Buffalo intimidate buffalo(es) from Buffalo."[1]
Part-of-speech (POS) tagging
From: https://github.com/tensorflow/models/tree/master/syntaxnet#installation
This sentence is composed of words: strings of characters that are segmented into groups (e.g. "I", "saw", etc.). Each word in the sentence has a grammatical function that can be useful for understanding the meaning of language. For example, "saw" in this example is the past tense of the verb "to see". But any given word might have different meanings in different contexts: "saw" could just as well be a noun (e.g., a saw used for cutting) or a present-tense verb (using a saw to cut something).
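SyntaxNet itself is driven from shell scripts, but what a POS tagger produces is easy to show in a few lines. A minimal sketch using NLTK as a stand-in (NLTK is my substitution for illustration; it is not part of SyntaxNet):

```python
# Illustration only: NLTK stands in for SyntaxNet to show what POS tagging
# produces for an ambiguous word like "saw".
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("I saw the man with the saw.")
print(nltk.pos_tag(tokens))
# Expected: the first "saw" tagged VBD (past-tense verb) and the second
# tagged NN (noun), even though the two strings are identical.
```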
POS + Dependency Parsing
From: https://github.com/tensorflow/models/tree/master/syntaxnet#installation
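A corresponding sketch of POS tags plus dependency arcs, here using spaCy (already mentioned on the Highlights slide) since it exposes both in one pass; the model name assumes a standard spaCy install:

```python
# spaCy sketch: POS tag, dependency label, and head for each token.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("I saw the man with the saw.")
for token in doc:
    # token.dep_ is the arc label; token.head is the governing word
    print(token.text, token.pos_, token.dep_, "<-", token.head.text)
```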
Stanford typed dependencies
From: http://nlp.stanford.edu/software/dependencies_manual.pdf
Grammars
Slide directly taken from Berkeley slides: https://bcourses.berkeley.edu/courses/1267848/files/50935030/download?verifier=qPVn1u6pa0LKopYB6n7daB9KX9stNJxCWnwM7oBh&wrap=1
Quote: “The reconstruction of a sequence of grammar productions from a sentence is called ‘parsing’ the sentence… It is most conveniently represented as a tree… The parser then tries to find the most likely sequence of productions that generate the given sentence.”
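To make "parsing = recovering a sequence of productions, represented as a tree" concrete, a toy context-free grammar in NLTK (the grammar is hand-written for this one sentence):

```python
# Toy CFG: the parse tree below records exactly which productions
# were applied to generate the sentence.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N
VP -> V NP
Det -> 'the'
N -> 'man'
V -> 'saw'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the man".split()):
    tree.pretty_print()
```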
One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures. A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context. From: https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html e.g. Alice drove down the street in her car has at least two possible dependency parses:
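The two attachments can be reproduced with a toy grammar (hand-written for exactly this sentence) and NLTK's chart parser, which enumerates every structure the grammar licenses:

```python
# PP-attachment ambiguity made concrete: "in her car" can modify the verb
# phrase (Alice is in the car) or "the street" (the street is in the car).
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V PP | VP PP
PP -> P NP
NP -> NP PP | 'Alice' | Det N
Det -> 'the' | 'her'
N -> 'street' | 'car'
V -> 'drove'
P -> 'down' | 'in'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("Alice drove down the street in her car".split()))
print(len(trees))  # 2 distinct parses
for tree in trees:
    tree.pretty_print()
```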
From: http://brnrd.me/google-syntaxnet-sentiment-analysis/
From: https://explosion.ai/blog/syntaxnet-in-context
Installation
- Runs on top of TensorFlow
- Python 2.7
- Package managers: pip/brew
- Build tools: Bazel; Mock for unit testing
- Other: SWIG (script bindings), protobuf (serializing data), asciitree (drawing parse trees)
POS + Dependency Parsing (Default)
From: https://github.com/tensorflow/models/tree/master/syntaxnet#installation
Output format: CoNLL.
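For reference, a hand-written sketch of token-per-line CoNLL-style output (columns: ID, FORM, LEMMA, coarse POS, fine POS, features, HEAD, dependency label; the values below are illustrative, not captured SyntaxNet output):

```
1   I     _   PRP   PRP   _   2   nsubj
2   saw   _   VBD   VBD   _   0   ROOT
3   the   _   DT    DT    _   4   det
4   man   _   NN    NN    _   2   dobj
```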
Training the SyntaxNet POS Tagger From: https://github.com/tensorflow/models/tree/master/syntaxnet#installation
Transition-Based Parsing + Beam Search
Paper: https://arxiv.org/pdf/1603.06042v2.pdf
Garden path sentence: “The old man the boat.” (Here “man” is the verb: the old people crew the boat. A greedy left-to-right parser commits early to “man” as a noun and gets stuck; beam search keeps the alternative hypothesis alive, as sketched below.)
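A minimal sketch of the arc-standard transition system such a parser searches over (the scoring model is omitted and the action sequence is hand-picked for this one sentence, whereas SyntaxNet scores every candidate action with a neural network):

```python
# Arc-standard transitions: the parser state is (stack, buffer, arcs),
# and each action moves or attaches one word.

def parse(words, actions):
    stack, buffer, arcs = [], list(range(len(words))), []
    for act in actions:
        if act == "SHIFT":        # move the next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":   # top of stack governs the word beneath it
            head, dep = stack[-1], stack.pop(-2)
            arcs.append((words[head], words[dep]))
        elif act == "RIGHT-ARC":  # word beneath governs the top; pop the top
            dep = stack.pop()
            arcs.append((words[stack[-1]], words[dep]))
    return arcs

words = ["The", "old", "man", "the", "boat"]
# Garden path resolved: "man" is the verb, "The old" is its subject.
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC",
           "SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC"]
print(parse(words, actions))
# [('old', 'The'), ('man', 'old'), ('boat', 'the'), ('man', 'boat')]
```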
Backup
“It is critical to tightly integrate learning and search in order to achieve the highest prediction accuracy.”
https://www.quora.com/Whats-the-difference-between-Machine-Learning-AI-and-NLP
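A toy beam search over action sequences, just to make the “search” half of the quote concrete; the scorer here is a made-up stand-in, whereas SyntaxNet scores actions with a globally normalized neural network:

```python
# Toy beam search: keep the k best partial action sequences at each step.

def beam_search(expand, score, beam_size, steps):
    beam = [(0.0, [])]  # (cumulative score, action sequence so far)
    for _ in range(steps):
        candidates = [(s + score(seq, act), seq + [act])
                      for s, seq in beam
                      for act in expand(seq)]
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beam

# Hypothetical three-action parser with a made-up static scorer:
ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]
SCORES = {"SHIFT": 1.0, "LEFT-ARC": 0.5, "RIGHT-ARC": 0.4}
result = beam_search(lambda seq: ACTIONS, lambda seq, act: SCORES[act],
                     beam_size=2, steps=3)
print(result)  # the two highest-scoring 3-action hypotheses
```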
Phrase structure grammar From: https://en.wikipedia.org/wiki/Phrase_structure_grammar
Other links
https://courses.cs.washington.edu/courses/cse454/09sp/slides/07-posparsing.pptx