Language and Vision: Useful Tools Presenter: Vicente Ordonez.

Text Analysis: Tokenization, Tagging, Parsing, Word Embeddings

Python NLTK “My cat likes eating bananas”

Python NLTK: Tokenization

    import nltk

    # Split the sentence into word tokens.
    nltk.word_tokenize("My cat likes eating bananas")
    # >> ['My', 'cat', 'likes', 'eating', 'bananas']

Python NLTK: POS Tagging

    import nltk

    words = nltk.word_tokenize("My cat likes eating bananas")
    # Tag each token with its part of speech.
    nltk.pos_tag(words)
    # >> [('My', 'PRP$'), ('cat', 'NN'), ('likes', 'VBZ'), ('eating', 'VBG'), ('bananas', 'NNS')]

The tags follow the Penn Treebank POS tag set.
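The tagged output above can be filtered by tag prefix to pull out nouns, verbs, and so on. The following is a minimal sketch: the helper name `words_with_tag_prefix` is hypothetical, and the tagged list is copied from the slide output rather than recomputed with NLTK.

```python
# Tagged tokens copied from the slide output (not recomputed here).
tagged = [("My", "PRP$"), ("cat", "NN"), ("likes", "VBZ"),
          ("eating", "VBG"), ("bananas", "NNS")]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix`.

    Noun tags start with "NN" (NN, NNS, NNP, NNPS) and verb tags
    start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ).
    """
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, "NN")
verbs = words_with_tag_prefix(tagged, "VB")
print(nouns)  # ['cat', 'bananas']
print(verbs)  # ['likes', 'eating']
```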

Python NLTK: Named Entities

    import nltk

    words = nltk.word_tokenize("My uncle Fred's cat likes eating bananas")
    tags = nltk.pos_tag(words)
    # Group the tagged tokens into named-entity chunks.
    nltk.ne_chunk(tags)
    # >> Tree('S', [('My', 'PRP$'), ('uncle', 'NN'), Tree('PERSON', [('Fred', 'NNP')]),
    #         ("'s", 'POS'), ('cat', 'NN'), ('likes', 'VBZ'), ('eating', 'VBG'), ('bananas', 'NNS')])

Python NLTK: WordNet

    from nltk.corpus import wordnet as wn

    wn.synsets('dog')  # works even if you use 'dogs' instead
    # >> [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),
    #     Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

    synset = wn.synset('dog.n.01')
    # You can get the definition, lemmas, examples, hypernyms ("parent words"),
    # hyponyms ("children words"), etc.

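Python NLTK: WordNet Similarity

    from nltk.corpus import wordnet as wn

    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')
    # Similarity based on the shortest path between the two synsets.
    path_score = dog.path_similarity(cat)
    # Wu-Palmer similarity, based on the depth of the common ancestor.
    wup_score = dog.wup_similarity(cat)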
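To make the two similarity scores concrete, here is a sketch that computes them on a tiny hand-made taxonomy. The taxonomy, node names, and helper functions are hypothetical and are not real WordNet data; the sketch also assumes each node has a single parent, whereas WordNet synsets can have several hypernym paths.

```python
# Hypothetical toy taxonomy (child -> parent). NOT real WordNet data.
PARENT = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": None,  # root
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(a, b):
    """Return the deepest node that lies on both paths to the root."""
    ancestors_of_b = set(path_to_root(b))
    return next(n for n in path_to_root(a) if n in ancestors_of_b)


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    common = lowest_common_ancestor(a, b)
    edges = path_to_root(a).index(common) + path_to_root(b).index(common)
    return 1.0 / (1.0 + edges)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(lca) / (depth(a) + depth(b)), root depth = 1."""
    def depth(node):
        return len(path_to_root(node))

    common = lowest_common_ancestor(a, b)
    return 2.0 * depth(common) / (depth(a) + depth(b))


print(path_similarity("dog", "cat"))  # 0.2
print(wup_similarity("dog", "cat"))   # 0.5
```

Both scores reach 1.0 when the two concepts are identical and shrink as the concepts move apart in the hierarchy.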

Python: Final Advice

Python, Spyder, NLTK, SciPy, NumPy, Matplotlib, VTK, etc.

Stanford CoreNLP: Parsing

    // Choose the annotators to run, then build the pipeline.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String text = "My cat likes eating bananas";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // Constituency parse tree of the sentence.
        Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        // Do something here with the tree
    }

DEMO:

Stanford CoreNLP: Dependencies

    // Choose the annotators to run, then build the pipeline.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String text = "My cat likes eating bananas";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // Dependency graph of the sentence.
        SemanticGraph graph = sentence.get(
            SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
        // Do something here with the graph
    }

DEMO:

Stanford CoreNLP: Sentiment

    // Same pipeline as before, with the sentiment annotator added.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String text = "My cat likes eating bananas";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // Newer CoreNLP releases may name this key SentimentAnnotatedTree.
        Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
        // Predicted class, from 0 (very negative) to 4 (very positive).
        int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
        // Do something here with the sentiment tree
    }

DEMO:

Python Scikit-learn: Bag of Words

    from sklearn.feature_extraction.text import CountVectorizer

    # tokenize: a tokenizer function defined elsewhere.
    # text_corpus: a list of document strings.
    vectorizer = CountVectorizer(analyzer="word",
                                 tokenizer=tokenize,
                                 stop_words="english",
                                 max_features=5000)

    # Compute the vocabulary of the 5000 most frequent words
    corpus_features = vectorizer.fit_transform(text_corpus)

    # On test data (transform expects a list of documents, not a bare string)
    features = vectorizer.transform(["My cat likes eating bananas"])
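What the vectorizer does can be sketched in plain Python: count words over the corpus, keep the most frequent ones as the vocabulary, then turn each document into a vector of counts. This is an illustration only, not scikit-learn's implementation; the function names and the three-document corpus are hypothetical, tokenization is a simple whitespace split, and no stop words are removed.

```python
from collections import Counter


def build_vocabulary(documents, max_features):
    """Map the `max_features` most frequent words to column indices."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = sorted(word for word, _ in ranked[:max_features])
    return {word: index for index, word in enumerate(kept)}


def to_count_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in `document`."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


# Hypothetical toy corpus.
corpus = ["my cat likes eating bananas", "my dog likes eating", "bananas bananas"]
vocabulary = build_vocabulary(corpus, max_features=4)
features = to_count_vector("My cat likes eating bananas", vocabulary)

print(sorted(vocabulary))  # ['bananas', 'eating', 'likes', 'my']
print(features)            # [1, 1, 1, 1]
```

Words outside the vocabulary ("cat" here) are simply ignored at transform time, which is also how a fitted vectorizer treats unseen words.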

Word2Vec: each word in "My cat likes eating bananas" is mapped to a vector [ … ], and the word vectors are averaged into a single vector [ … ].
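The averaging step can be sketched as follows. The 3-dimensional embeddings below are made up for illustration; real word2vec vectors are learned from a corpus and usually have hundreds of dimensions. The function name `average_vector` is also hypothetical.

```python
# Hypothetical 3-dimensional embeddings, chosen by hand for illustration.
EMBEDDINGS = {
    "my": [1.0, 0.0, 0.0],
    "cat": [0.0, 2.0, 0.0],
    "likes": [0.0, 0.0, 3.0],
    "eating": [1.0, 1.0, 0.0],
    "bananas": [3.0, 2.0, 2.0],
}


def average_vector(words, embeddings):
    """Average the vectors of all words that have an embedding."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        raise ValueError("no known words to average")
    dims = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dims)]


sentence = "My cat likes eating bananas".lower().split()
sentence_vector = average_vector(sentence, EMBEDDINGS)
print(sentence_vector)  # [1.0, 1.0, 1.0]
```

The result has the same dimensionality as a single word vector, so sentences of any length map to fixed-size features.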

Word2Vec: closest words to "san francisco" (columns: Word, Cosine distance)

    los_angeles
    golden_gate
    oakland
    california
    san_diego
    pasadena
    seattle
    taiko
    houston
    chicago_illinois

You can also try:
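A nearest-word lookup of this kind can be sketched by ranking every other word by the cosine of the angle between its vector and the query vector (higher means closer). The 2-dimensional vectors below are hypothetical and only meant to show the mechanics; they are not trained embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(query, vectors, top_n):
    """Return the `top_n` (word, similarity) pairs most similar to `query`."""
    scores = [(word, cosine_similarity(vectors[query], vec))
              for word, vec in vectors.items() if word != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical 2-dimensional vectors, chosen by hand for illustration.
VECTORS = {
    "san_francisco": [1.0, 0.0],
    "oakland": [1.0, 0.1],
    "seattle": [1.0, 1.0],
    "taiko": [0.0, 1.0],
}

neighbours = closest_words("san_francisco", VECTORS, top_n=2)
print([word for word, _ in neighbours])  # ['oakland', 'seattle']
```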

Text Analysis Summary

Basic text analysis using NLTK
    Splitting a sentence into words: tokenization
    Extracting nouns, verbs, adjectives, etc.: POS tagging
    Computing word similarities: WordNet
Sentence parsing using Stanford CoreNLP
    Breaking a sentence into its subject, predicate, etc.
    Resolving word dependencies
    Sentiment analysis
Word representations
    Bag of Words: Scikit-learn
    Neural networks: Word2vec

Image Analysis: Image representations, shape, color + shape, recognition

VLFeat: HOG Features - Matlab

    image = imread('house.jpg');
    % vl_hog expects a single-precision image; 8 is the cell size in pixels.
    hog = vl_hog(im2single(image), 8);
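The core idea behind HOG is a histogram of gradient orientations per image cell, weighted by gradient magnitude. The sketch below computes that histogram for one grayscale cell in plain Python. It is a simplified illustration under stated assumptions, not what `vl_hog` does internally: it uses 9 unsigned orientation bins, skips border pixels, and omits the interpolation and block normalisation that full HOG implementations add. The function name is hypothetical.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Histogram of unsigned gradient orientations for one grayscale cell.

    `cell` is a list of rows of floats. Border pixels are skipped so that
    central differences are always defined.
    """
    histogram = [0.0] * num_bins
    bin_width = math.pi / num_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned: [0, pi)
            index = min(int(angle / bin_width), num_bins - 1)
            histogram[index] += magnitude
    return histogram


# 8x8 cell whose intensity grows from left to right, so every gradient
# points along the x axis and all the weight lands in the first bin.
cell = [[float(x) for x in range(8)] for _ in range(8)]
hist = cell_orientation_histogram(cell)
print(hist[0])  # 72.0
```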

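VLFeat: Dense SIFT - Matlab

    % I: grayscale image in single precision
    binSize = 8;
    magnif = 3;
    % Pre-smooth the image before extracting dense descriptors.
    image_smooth = vl_imsmooth(I, sqrt((binSize/magnif)^2 - .25));
    [frames, descriptors] = vl_dsift(image_smooth, 'size', binSize);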

Convolutional Networks: Krizhevsky, Sutskever, Hinton (2012)

Caffe

    import numpy as np
    import caffe

    caffe_root = 'CAFFE_INSTALLATION_DIRECTORY'  # should end with '/'
    caffe.set_mode_cpu()

    # Load the reference CaffeNet model definition and its trained weights.
    net = caffe.Classifier(
        caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',
        caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')

    # Preprocessing: mean subtraction, scaling to [0, 255], RGB -> BGR channel swap.
    net.transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
    net.transformer.set_raw_scale('data', 255)
    net.transformer.set_channel_swap('data', (2, 1, 0))

    # Class scores for one image.
    scores = net.predict([caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')])

Caffe: each blob in the network with its shape (batch size, channels, height, width)

    [('data', (10, 3, 227, 227)),
     ('conv1', (10, 96, 55, 55)),
     ('pool1', (10, 96, 27, 27)),
     ('norm1', (10, 96, 27, 27)),
     ('conv2', (10, 256, 27, 27)),
     ('pool2', (10, 256, 13, 13)),
     ('norm2', (10, 256, 13, 13)),
     ('conv3', (10, 384, 13, 13)),
     ('conv4', (10, 384, 13, 13)),
     ('conv5', (10, 256, 13, 13)),
     ('pool5', (10, 256, 6, 6)),
     ('fc6', (10, 4096, 1, 1)),
     ('fc7', (10, 4096, 1, 1)),
     ('fc8', (10, 1000, 1, 1)),
     ('prob', (10, 1000, 1, 1))]
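The spatial sizes in this listing (227, 55, 27, 13, 6) follow from the usual output-size formula for convolution and pooling layers. The sketch below reproduces them. The kernel sizes, strides, and padding are not given on the slides; they are assumed here from the commonly published CaffeNet definition, and the helper names are hypothetical.

```python
import math


def conv_output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (rounded down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_output_size(size, kernel, stride):
    """Spatial output size of a pooling layer (rounded up, as Caffe does)."""
    return math.ceil((size - kernel) / stride) + 1


# Assumed CaffeNet hyperparameters (not stated in the slides).
conv1 = conv_output_size(227, kernel=11, stride=4)
pool1 = pool_output_size(conv1, kernel=3, stride=2)
conv2 = conv_output_size(pool1, kernel=5, pad=2)
pool2 = pool_output_size(conv2, kernel=3, stride=2)
conv3 = conv_output_size(pool2, kernel=3, pad=1)  # conv4 and conv5 keep this size
pool5 = pool_output_size(conv3, kernel=3, stride=2)

print(conv1, pool1, conv2, pool2, conv3, pool5)  # 55 27 27 13 13 6
```

The normalisation layers (norm1, norm2) leave the spatial size unchanged, and the fully connected layers (fc6 to fc8) collapse it to 1 x 1.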

RCNN – Detection – Matlab / C

DPM – Detection – Matlab / C

Image Analysis Summary

Overview of the VLFeat library
    Basic image operators
    Distance functions, clustering, etc.
Computing features using Caffe / MatConvNet
    Obtaining category predictions
    Obtaining intermediate image representations
Object detection
    RCNN code: C/C++, Matlab
    DPM code: C/C++, Matlab