Language and Vision: Useful Tools Presenter: Vicente Ordonez.

Text Analysis
Tokenization, Tagging, Parsing, Word Embeddings

Python NLTK
Running example sentence: “My cat likes eating bananas”

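Python NLTK: Tokenization
    import nltk
    # word_tokenize needs the Punkt models: nltk.download('punkt')
    # (newer NLTK releases use the 'punkt_tab' resource instead)
    nltk.word_tokenize("My cat likes eating bananas")
    >> ['My', 'cat', 'likes', 'eating', 'bananas']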
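To make tokenization concrete, here is a minimal regular-expression tokenizer in plain Python. It is a hypothetical sketch (`simple_tokenize` and `TOKEN_PATTERN` are names introduced here), not NLTK's algorithm: `nltk.word_tokenize` applies many more rules, for example for contractions such as "don't", which this pattern splits poorly.

```python
import re

# Alternatives are tried left to right at each position:
#   's       a clitic such as the possessive in "Fred's" becomes its own token
#   \w+      a run of letters, digits or underscores
#   [^\w\s]  any single punctuation character
TOKEN_PATTERN = re.compile(r"'s\b|\w+|[^\w\s]")


def simple_tokenize(text):
    """Split text into word, clitic and punctuation tokens (toy version)."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("My uncle Fred's cat likes eating bananas."))
# ['My', 'uncle', 'Fred', "'s", 'cat', 'likes', 'eating', 'bananas', '.']
```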

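Python NLTK: POS Tagging
    import nltk
    # pos_tag needs the tagger model: nltk.download('averaged_perceptron_tagger')
    # (named 'averaged_perceptron_tagger_eng' in newer NLTK releases)
    words = nltk.word_tokenize("My cat likes eating bananas")
    nltk.pos_tag(words)
    >> [('My', 'PRP$'), ('cat', 'NN'), ('likes', 'VBZ'), ('eating', 'VBG'), ('bananas', 'NNS')]
The tags come from the Penn Treebank POS tag set.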
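A common next step is to pull out all words of one coarse class from the tagger output. Penn Treebank tags of the same class share a prefix (NN, NNS, NNP are nouns; VB, VBZ, VBG are verbs), so a prefix test is enough for a rough sketch. `words_with_tag_prefix` is a hypothetical helper, and the tagged list is copied from the output above.

```python
def words_with_tag_prefix(tagged, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix`.

    NN* tags are nouns, VB* tags are verbs, JJ* tags are adjectives.
    """
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Output of nltk.pos_tag from the slide, copied as plain data.
tagged = [('My', 'PRP$'), ('cat', 'NN'), ('likes', 'VBZ'),
          ('eating', 'VBG'), ('bananas', 'NNS')]
nouns = words_with_tag_prefix(tagged, "NN")  # ['cat', 'bananas']
verbs = words_with_tag_prefix(tagged, "VB")  # ['likes', 'eating']
print(nouns, verbs)
```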

Python NLTK: Named Entities
    import nltk
    # ne_chunk needs nltk.download('maxent_ne_chunker') and nltk.download('words')
    # (resource names differ slightly in newer NLTK releases)
    words = nltk.word_tokenize("My uncle Fred's cat likes eating bananas")
    tags = nltk.pos_tag(words)
    nltk.ne_chunk(tags)
    >> Tree('S', [('My', 'PRP$'), ('uncle', 'NN'), Tree('PERSON', [('Fred', 'NNP')]), ("'s", 'POS'), ('cat', 'NN'), ('likes', 'VBZ'), ('eating', 'VBG'), ('bananas', 'NNS')])

Python NLTK: WordNet
    from nltk.corpus import wordnet as wn   # needs nltk.download('wordnet')
    wn.synsets('dog')   # also works with 'dogs': the query is reduced to its base form first
    >> [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
    synset = wn.synset('dog.n.01')
    # From a synset you can get its definition, lemmas, examples,
    # hypernyms ("parent words"), hyponyms ("children words"), etc.
    synset.definition(), synset.lemmas(), synset.examples(), synset.hypernyms(), synset.hyponyms()
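Looking up 'dogs' works because WordNet reduces an inflected form to a base form before the lookup (NLTK's `morphy`). The sketch below imitates that with a subset of WordNet's noun suffix rules and a hypothetical mini-lexicon (`toy_morphy` and `lexicon` are illustrative names). The real `morphy` also consults exception lists for irregular forms such as "geese" and covers verbs and adjectives.

```python
# Subset of WordNet's noun detachment rules: (suffix, replacement).
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def toy_morphy(word, lexicon):
    """Return candidate base forms of `word` that exist in `lexicon`."""
    if word in lexicon:
        return [word]
    candidates = []
    for suffix, replacement in NOUN_RULES:
        if word.endswith(suffix):
            base = word[: len(word) - len(suffix)] + replacement
            if base in lexicon and base not in candidates:
                candidates.append(base)
    return candidates


# Hypothetical mini-lexicon standing in for WordNet's noun index.
lexicon = {"dog", "box", "banana", "man", "berry", "church"}
print(toy_morphy("dogs", lexicon), toy_morphy("boxes", lexicon))  # ['dog'] ['box']
```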

Python NLTK: WordNet Similarity
    from nltk.corpus import wordnet as wn
    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')
    similarity_score = dog.path_similarity(cat)  # based on the shortest hypernym path
    similarity_score = dog.wup_similarity(cat)   # Wu-Palmer: based on the depth of the common ancestor
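Both scores are defined on the hypernym hierarchy. Here is a minimal sketch on a hypothetical single-inheritance toy taxonomy. Path similarity is 1 / (1 + shortest path length). Wu-Palmer similarity is 2 · depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common hypernym. NLTK's implementations handle multiple hypernym paths and use their own depth conventions, so scores on real WordNet can differ from this toy.

```python
# Hypothetical toy taxonomy: child -> parent (its single hypernym).
HYPERNYM = {
    "animal": "entity",
    "carnivore": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "cat": "feline",
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):  # walks upward, so the first hit is the lowest
        if node in ancestors_of_b:
            return node
    return None


def path_similarity(a, b):
    """1 / (1 + number of edges on the path a -> LCS -> b)."""
    lcs = lowest_common_subsumer(a, b)
    distance = path_to_root(a).index(lcs) + path_to_root(b).index(lcs)
    return 1.0 / (1 + distance)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(path_similarity("dog", "cat"), wup_similarity("dog", "cat"))  # 0.2 0.6
```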

Python: Final Advice
Useful tools: Python, Spyder, NLTK, SciPy, NumPy, Matplotlib, VTK, etc.

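Stanford CoreNLP: Parsing
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "My cat likes eating bananas";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        Tree tree = sentence.get(TreeAnnotation.class);
        // Do something here with the tree, e.g. tree.pennPrint()
    }
DEMO: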
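The parser produces a constituency tree, which CoreNLP can print in Penn Treebank bracket notation (for example with `tree.pennPrint()`). As an illustration, here is a small Python reader for that bracket format. The example tree string is hand-written and plausible for the sentence, not captured CoreNLP output, and `parse_bracketed`, `leaves` and `phrases` are hypothetical helpers.

```python
import re


def parse_bracketed(text):
    """Parse a Penn Treebank bracketed tree into (label, children) tuples.

    Leaves (the words) are plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def leaves(tree):
    """Words of a tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


def phrases(tree, wanted):
    """Text of every constituent labeled `wanted`, in preorder."""
    label, children = tree
    found = [" ".join(leaves(tree))] if label == wanted else []
    for child in children:
        if not isinstance(child, str):
            found.extend(phrases(child, wanted))
    return found


# Hand-written example tree (not captured CoreNLP output).
tree = parse_bracketed(
    "(ROOT (S (NP (PRP$ My) (NN cat)) (VP (VBZ likes) "
    "(S (VP (VBG eating) (NP (NNS bananas)))))))"
)
print(phrases(tree, "NP"))  # ['My cat', 'bananas']
```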

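Stanford CoreNLP: Dependencies
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.util.CoreMap;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "My cat likes eating bananas";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        SemanticGraph graph = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        // Do something here with the graph, e.g. System.out.println(graph)
    }
DEMO: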
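A dependency graph is a set of typed edges, each running from a governor word to a dependent word. The Python sketch below uses hand-written edges in Stanford Dependencies style (`nsubj`, `dobj`, `xcomp`, `poss`) for the example sentence. These edges are illustrative, not actual CoreNLP output. The sketch also assumes each word occurs only once, whereas CoreNLP's `SemanticGraph` identifies nodes by token index.

```python
from collections import defaultdict

# Hand-written (governor, relation, dependent) edges for
# "My cat likes eating bananas"; illustrative only, not CoreNLP output.
EDGES = [
    ("cat", "poss", "My"),
    ("likes", "nsubj", "cat"),
    ("likes", "xcomp", "eating"),
    ("eating", "dobj", "bananas"),
]


def build_graph(edges):
    """Index edges by governor: governor -> [(relation, dependent), ...]."""
    graph = defaultdict(list)
    for governor, relation, dependent in edges:
        graph[governor].append((relation, dependent))
    return graph


def find_root(edges):
    """The root is the one word that governs others but is never a dependent."""
    governors = {g for g, _, _ in edges}
    dependents = {d for _, _, d in edges}
    roots = governors - dependents
    return roots.pop() if len(roots) == 1 else None


def dependents_by_relation(graph, governor, relation):
    """Dependents of `governor` attached with the given relation."""
    return [dep for rel, dep in graph.get(governor, []) if rel == relation]


graph = build_graph(EDGES)
root = find_root(EDGES)                                 # 'likes'
subject = dependents_by_relation(graph, root, "nsubj")  # ['cat']
print(root, subject)
```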

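Stanford CoreNLP: Sentiment
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.CoreMap;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "My cat likes eating bananas";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // Newer CoreNLP releases name this class SentimentCoreAnnotations.SentimentAnnotatedTree
        Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
        int sentiment = RNNCoreAnnotations.getPredictedClass(tree);  // 0 (very negative) to 4 (very positive)
        // Do something here with the sentiment tree
    }
DEMO: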
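`getPredictedClass` returns an integer class index. The helper below assumes the five-point scale of the Stanford Sentiment Treebank (0 = very negative to 4 = very positive) and maps that index to a readable label. Averaging sentence classes into one document score is a simple heuristic added here for illustration. It is not a CoreNLP feature, and both function names are hypothetical.

```python
# Assumed 5-point scale (Stanford Sentiment Treebank convention).
SENTIMENT_LABELS = ["Very negative", "Negative", "Neutral", "Positive", "Very positive"]


def sentiment_label(class_index):
    """Map a predicted class index (0-4) to a readable label."""
    if not 0 <= class_index < len(SENTIMENT_LABELS):
        raise ValueError(f"expected a class index in 0..4, got {class_index}")
    return SENTIMENT_LABELS[class_index]


def document_sentiment(sentence_classes):
    """Mean of per-sentence class indices: a simple heuristic, not a CoreNLP feature."""
    if not sentence_classes:
        raise ValueError("need at least one sentence")
    return sum(sentence_classes) / len(sentence_classes)


print(sentiment_label(3), document_sentiment([3, 4, 2]))  # Positive 3.0
```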

Python Scikit-learn: Bag of Words
    from sklearn.feature_extraction.text import CountVectorizer
    # tokenize: a user-supplied tokenizing function (not defined on the slide)
    # text_corpus: list of training documents (strings)
    vectorizer = CountVectorizer(analyzer="word",
                                 tokenizer=tokenize,
                                 stop_words="english",
                                 max_features=5000)
    # Learn the vocabulary of the 5000 most frequent words and count them
    corpus_features = vectorizer.fit_transform(text_corpus)
    # On test data: transform expects a list of documents, not a single string
    features = vectorizer.transform(["My cat likes eating bananas"])
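To see what `fit_transform` and `transform` compute, here is a toy re-implementation in plain Python. `SimpleBagOfWords` is hypothetical. It lowercases the text and tokenizes with the same regular expression as CountVectorizer's default `token_pattern`. It then removes stop words, keeps the `max_features` most frequent terms, and numbers the vocabulary alphabetically, as CountVectorizer does. The tie-breaking among equally frequent terms is a choice made here and may differ from scikit-learn's.

```python
import re
from collections import Counter


def default_tokenize(text):
    """Lowercase, then keep runs of 2+ word characters (CountVectorizer's default pattern)."""
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


class SimpleBagOfWords:
    """Toy version of the fit/transform logic behind CountVectorizer."""

    def __init__(self, stop_words=(), max_features=None):
        self.stop_words = set(stop_words)
        self.max_features = max_features
        self.vocabulary_ = {}

    def _tokens(self, text):
        return [t for t in default_tokenize(text) if t not in self.stop_words]

    def fit(self, corpus):
        counts = Counter()
        for doc in corpus:
            counts.update(self._tokens(doc))
        # Most frequent terms first; ties broken alphabetically (our choice).
        ranked = sorted(counts, key=lambda term: (-counts[term], term))
        kept = ranked[: self.max_features] if self.max_features else ranked
        # Column indices follow alphabetical order of the kept terms.
        self.vocabulary_ = {term: i for i, term in enumerate(sorted(kept))}
        return self

    def transform(self, corpus):
        """One row of term counts per document; unknown words are ignored."""
        rows = []
        for doc in corpus:
            row = [0] * len(self.vocabulary_)
            for term in self._tokens(doc):
                if term in self.vocabulary_:
                    row[self.vocabulary_[term]] += 1
            rows.append(row)
        return rows


train = ["My cat likes bananas", "My dog likes cats", "cat cat dog"]
bow = SimpleBagOfWords(stop_words={"my"}, max_features=3).fit(train)
print(bow.vocabulary_)                                 # {'cat': 0, 'dog': 1, 'likes': 2}
print(bow.transform(["My cat likes eating bananas"]))  # [[1, 0, 1]]
```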

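Word2Vec
My cat likes eating bananas → [ … ] [ … ] [ … ] [ … ] [ … ] → Average → [ … ]
Each word is mapped to its word2vec vector, and the five word vectors are averaged into a single vector for the sentence.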
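Averaging word vectors can be sketched with NumPy. The 3-dimensional `toy_embeddings` are made up for illustration; pretrained word2vec vectors typically have a few hundred dimensions. Skipping out-of-vocabulary words, and returning zeros when none of the words is known, are assumptions of this sketch.

```python
import numpy as np


def average_embedding(tokens, embeddings):
    """Mean vector of the tokens found in `embeddings`; zeros if none are found."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Made-up 3-d vectors standing in for real word2vec embeddings.
toy_embeddings = {
    "my": np.array([0.0, 1.0, 0.0]),
    "cat": np.array([1.0, 0.0, 0.0]),
    "likes": np.array([0.0, 0.0, 1.0]),
    "eating": np.array([0.0, 0.0, 1.0]),
    "bananas": np.array([1.0, 1.0, 0.0]),
}
sentence = "my cat likes eating bananas".split()
print(average_embedding(sentence, toy_embeddings))  # [0.4 0.4 0.4]
```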

Word2Vec
Closest words to “san francisco”, ranked by cosine distance: los_angeles, golden_gate, oakland, california, san_diego, pasadena, seattle, taiko, houston, chicago_illinois
You can also try:
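A nearest-neighbor list like this one comes from ranking the vocabulary by cosine similarity to the query vector. This is the same as putting the smallest cosine distance (1 - similarity) first. Here is a minimal NumPy sketch with hypothetical 2-dimensional vectors; `most_similar` is an illustrative name.

```python
import numpy as np


def most_similar(query, vocab, vectors, k=3):
    """Top-k words by cosine similarity to `query`; rows of `vectors` align with `vocab`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(query)]
    ranked = [i for i in np.argsort(-sims) if vocab[i] != query]
    return [(vocab[i], float(sims[i])) for i in ranked[:k]]


# Hypothetical 2-d vectors, chosen only to make the ranking easy to check.
vocab = ["san_francisco", "oakland", "seattle", "banana"]
vectors = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, 1.0], [0.0, 1.0]])
for word, sim in most_similar("san_francisco", vocab, vectors):
    print(f"{word}\t{sim:.3f}")
```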

Text Analysis Summary
Basic text analysis using NLTK
  - Splitting a sentence into words: tokenization
  - Extracting nouns, verbs, adjectives, etc.: POS tagging
  - Computing word similarities: WordNet
Sentence parsing using Stanford CoreNLP
  - Breaking sentences into their subject, predicate, etc.
  - Resolving word dependencies
  - Sentiment analysis
Word representations
  - Bag of words: scikit-learn
  - Neural networks: word2vec

Image Analysis
Image representations, shape, color + shape, recognition

VLFeat: HOG Features (MATLAB)
    image = imread('house.jpg');
    % vl_hog expects a single-precision image; 8 is the cell size in pixels
    hog = vl_hog(im2single(image), 8);
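The core idea of HOG is a histogram of gradient orientations for each cell, weighted by gradient magnitude. The NumPy sketch below implements only that first stage, with unsigned orientations and no block normalization or interpolation. `cell_orientation_histograms` is hypothetical, and VLFeat's `vl_hog` computes a richer descriptor (as we understand its documentation, a 31-dimensional UoCTTI variant per cell by default).

```python
import numpy as np


def cell_orientation_histograms(image, cell_size=8, num_bins=9):
    """Simplified HOG stage: per-cell histograms of unsigned gradient orientation.

    image: 2-D grayscale array. Returns an array of shape (rows, cols, num_bins).
    """
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bins = np.minimum((angle / (180.0 / num_bins)).astype(int), num_bins - 1)

    rows, cols = image.shape[0] // cell_size, image.shape[1] // cell_size
    hist = np.zeros((rows, cols, num_bins))
    for r in range(rows):
        for c in range(cols):
            cell = (slice(r * cell_size, (r + 1) * cell_size),
                    slice(c * cell_size, (c + 1) * cell_size))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bins[cell].ravel(),
                                     weights=magnitude[cell].ravel(),
                                     minlength=num_bins)
    return hist


ramp = np.tile(np.arange(16.0), (16, 1))  # brightness increases left to right
print(cell_orientation_histograms(ramp).shape)  # (2, 2, 9)
```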

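VLFeat: Dense SIFT (MATLAB)
    binSize = 8;
    magnif = 3;
    % vl_dsift expects a grayscale, single-precision image
    I = im2single(rgb2gray(imread('house.jpg')));
    % Pre-smooth so the descriptors match SIFT at scale binSize/magnif
    image_smooth = vl_imsmooth(I, sqrt((binSize/magnif)^2 - .25));
    [frames, descriptors] = vl_dsift(image_smooth, 'size', binSize);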
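The smoothing amount follows from how Gaussian blurs combine: their variances add. The VLFeat documentation assumes the input image already carries a blur of 0.5 pixels. Reaching a total blur of binSize/magnif then requires an extra Gaussian of standard deviation σ, where

```latex
\sigma^2 + 0.5^2 = \left(\frac{\text{binSize}}{\text{magnif}}\right)^2
\quad\Longrightarrow\quad
\sigma = \sqrt{\left(\frac{8}{3}\right)^2 - 0.25}
       = \sqrt{\frac{64}{9} - \frac{1}{4}}
       = \sqrt{\frac{247}{36}} \approx 2.62
```

So `sqrt((binSize/magnif)^2 - .25)` in the code is that σ for binSize = 8 and magnif = 3.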

Convolutional Networks
Krizhevsky, Sutskever, Hinton (2012)

Caffe
    import numpy as np
    import caffe

    caffe_root = 'CAFFE_INSTALLATION_DIRECTORY'   # path to your Caffe checkout
    caffe.set_mode_cpu()
    net = caffe.Classifier(
        caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',
        caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
    # Preprocessing: subtract the per-channel ImageNet mean,
    # scale [0, 1] pixels to [0, 255], and swap RGB to BGR
    net.transformer.set_mean('data', np.load(
        caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
    net.transformer.set_raw_scale('data', 255)
    net.transformer.set_channel_swap('data', (2, 1, 0))
    scores = net.predict([caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')])
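The three `transformer` settings, plus the HWC-to-CHW transpose that the Classifier configures itself, amount to a few array operations. The NumPy sketch below reproduces them for one image. `caffe_style_preprocess` is hypothetical. It assumes the input is an RGB float array in [0, 1], as returned by `caffe.io.load_image`, and that the mean is given in BGR order. Resizing and cropping are omitted.

```python
import numpy as np


def caffe_style_preprocess(image, mean_bgr, raw_scale=255.0, channel_order=(2, 1, 0)):
    """Apply transpose, channel swap, raw scale and mean subtraction to one image.

    image: H x W x 3 float array in [0, 1], RGB order.
    mean_bgr: per-channel mean (length 3) in the network's BGR order.
    Returns a 3 x H x W array; resizing and cropping are not done here.
    """
    x = np.asarray(image, dtype=float).transpose(2, 0, 1)  # H x W x C -> C x H x W
    x = x[list(channel_order)]                            # RGB -> BGR
    x = x * raw_scale                                     # [0, 1] -> [0, 255]
    return x - np.asarray(mean_bgr, dtype=float)[:, None, None]  # per-channel mean


red = np.zeros((2, 2, 3))
red[..., 0] = 1.0  # a pure red 2 x 2 image
out = caffe_style_preprocess(red, mean_bgr=[10.0, 20.0, 30.0])
print(out[:, 0, 0])  # B, G, R values of the top-left pixel after preprocessing
```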

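Caffe: Intermediate Layer Outputs
After net.predict, net.blobs holds every layer's output with shape (batch, channels, height, width). The batch size is 10 because predict averages over 10 crops of the image by default (oversample=True). scores itself holds the 1000 class probabilities.
    [(k, v.data.shape) for k, v in net.blobs.items()]
    >> [('data', (10, 3, 227, 227)), ('conv1', (10, 96, 55, 55)), ('pool1', (10, 96, 27, 27)), ('norm1', (10, 96, 27, 27)), ('conv2', (10, 256, 27, 27)), ('pool2', (10, 256, 13, 13)), ('norm2', (10, 256, 13, 13)), ('conv3', (10, 384, 13, 13)), ('conv4', (10, 384, 13, 13)), ('conv5', (10, 256, 13, 13)), ('pool5', (10, 256, 6, 6)), ('fc6', (10, 4096, 1, 1)), ('fc7', (10, 4096, 1, 1)), ('fc8', (10, 1000, 1, 1)), ('prob', (10, 1000, 1, 1))]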
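The spatial sizes in this listing follow from each layer's kernel size, stride and padding. The sketch below recomputes them in Python. The hyperparameters are our reading of the CaffeNet `deploy.prototxt` and should be checked against that file. The normalization layers are skipped because they do not change the size. Caffe rounds down for convolution and up for pooling.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Caffe convolution: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Caffe pooling rounds up: ceil((size + 2*pad - kernel) / stride) + 1."""
    return int(math.ceil((size + 2 * pad - kernel) / stride)) + 1


# Assumed CaffeNet hyperparameters (norm layers omitted: they keep the size).
LAYERS = [
    ("conv1", conv_out, dict(kernel=11, stride=4)),
    ("pool1", pool_out, dict(kernel=3, stride=2)),
    ("conv2", conv_out, dict(kernel=5, pad=2)),
    ("pool2", pool_out, dict(kernel=3, stride=2)),
    ("conv3", conv_out, dict(kernel=3, pad=1)),
    ("conv4", conv_out, dict(kernel=3, pad=1)),
    ("conv5", conv_out, dict(kernel=3, pad=1)),
    ("pool5", pool_out, dict(kernel=3, stride=2)),
]


def spatial_sizes(input_size=227):
    """Height/width after each layer for a square input."""
    sizes, size = [], input_size
    for name, layer_fn, params in LAYERS:
        size = layer_fn(size, **params)
        sizes.append((name, size))
    return sizes


print(spatial_sizes())
# [('conv1', 55), ('pool1', 27), ('conv2', 27), ('pool2', 13), ('conv3', 13), ('conv4', 13), ('conv5', 13), ('pool5', 6)]
```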

RCNN: Detection (MATLAB / C)

DPM: Detection (MATLAB / C)
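Both R-CNN and DPM produce many overlapping candidate boxes for each object and prune them with non-maximum suppression. Below is a minimal greedy NMS based on intersection-over-union, written with NumPy. The function names and the 0.3 default threshold are assumptions. The released MATLAB code may differ in details such as pixel-inclusive coordinates and the exact overlap measure.

```python
import numpy as np


def box_area(boxes):
    """Area of [x1, y1, x2, y2] boxes (continuous coordinates)."""
    return (boxes[..., 2] - boxes[..., 0]) * (boxes[..., 3] - boxes[..., 1])


def iou(box, boxes):
    """Intersection-over-union between one box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def non_max_suppression(boxes, scores, threshold=0.3):
    """Greedy NMS: keep the best box, drop boxes overlapping it above `threshold`, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= threshold]
    return keep


detections = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
print(non_max_suppression(detections, [0.9, 0.8, 0.7]))  # [0, 2]
```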

Image Analysis Summary
Overview of the VLFeat library
  - Basic image operators
  - Distance functions, clustering, etc.
Computing features using Caffe / MatConvNet
  - Obtaining category predictions
  - Obtaining intermediate image representations
Object detection
  - RCNN code: C/C++, MATLAB
  - DPM code: C/C++, MATLAB