
Deep Learning and Text Mining
Will Stanton
Ski Hackathon Kickoff Ceremony, Feb 28, 2015

We have a problem
●At Return Path, we process billions of emails a year, from tons of senders
●We want to tag and cluster senders
o Industry verticals (e-commerce, apparel, travel, etc.)
o Type of customers they sell to (luxury, soccer moms, etc.)
o Business model (daily deals, flash sales, etc.)
●It's too much to do by hand!

What to do?
●Standard approaches aren't great
o Bag-of-words classification models (document-term matrix, LSA, LDA; see the sketch below)
▪ Have to manually label lots of cases first
▪ Difficult with lots of data (especially LDA)
o Bag-of-words clustering
▪ Can't easily put one company into multiple categories (i.e., more general tagging)
▪ Needs lots of tuning
●How about deep learning neural networks?
o Very trendy. Let's try it!
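
For concreteness, a minimal sketch of the document-term matrix that the bag-of-words approaches above start from, using scikit-learn's CountVectorizer; the toy subject lines here are invented for illustration.

# A toy document-term matrix: rows are documents, columns are word counts.
# LSA and LDA both start from a matrix like this one.
from sklearn.feature_extraction.text import CountVectorizer

subject_lines = [                       # hypothetical subject lines
    "50% off all shoes today only",
    "flash sale on luxury handbags",
    "your daily deal: discount ski passes",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(subject_lines)
print(vectorizer.get_feature_names_out())   # the vocabulary (columns)
print(X.toarray())                          # word counts per document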

Neural Networks
(Diagram: Input Layer -> First Hidden Layer -> Second Hidden Layer -> Output Layer; inputs x = (x1, x2, x3), output y)
●Machine learning algorithms modeled after the way the human brain works
●Learn patterns and structure by passing training data through “neurons”
●Useful for classification, regression, feature extraction, etc.
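
As a sketch of what “passing data through neurons” means, here is the forward pass for a network shaped like the diagram (3 inputs, two hidden layers, one output); the weights are random stand-ins for what training would normally learn, and the layer sizes are arbitrary.

# Forward pass: 3 inputs -> two hidden layers -> 1 output.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                  # inputs x = (x1, x2, x3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # first hidden layer
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # second hidden layer
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h1 = sigmoid(W1 @ x + b1)    # each "neuron": weighted sum + nonlinearity
h2 = sigmoid(W2 @ h1 + b2)
y = sigmoid(W3 @ h2 + b3)    # output y
print(y)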

Deep Learning
●Neural networks with lots of hidden layers (hundreds)
●State of the art for machine translation, facial recognition, text classification, speech recognition
o Tasks with real deep structure, that humans do automatically but computers struggle with
o Should be good for company tagging!

Distributed Representations
(Diagram: pixels -> edges -> shapes -> typical facial types (features) -> collection of faces)
●The human brain uses distributed representations
●We can use deep learning to do the same thing with words (letters -> words -> phrases -> sentences -> …)

Deep Learning Challenges
●Computationally difficult to train (i.e., slow)
o Each hidden layer means more parameters
o Each feature means more parameters
●Real human-generated text has a near-infinite number of features and a huge amount of data
o i.e., slow would be a problem
●Solution: use word2vec

word2vec
●Published by scientists at Google in 2013
●Python implementation in 2014
o gensim library
●Learns distributed vector representations of words (“word to vec”) using a neural net
o NOTE for hardcore experts: word2vec does not strictly or necessarily train a deep neural net, but it uses deep learning technology (distributed representations, backpropagation, stochastic gradient descent, etc.) and is based on a series of deep learning papers
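
A minimal gensim training sketch (gensim 4.x parameter names; older gensim called vector_size "size"); the two tokenized sentences are toy stand-ins for a real corpus.

from gensim.models import Word2Vec

sentences = [                                   # toy tokenized sentences
    ["daily", "deals", "from", "groupon"],
    ["flash", "sale", "on", "luxury", "apparel"],
]
model = Word2Vec(
    sentences,
    vector_size=100,   # dimension of the word vectors
    window=5,          # context window used for skip-grams
    min_count=1,       # keep every word in this toy corpus
    sg=1,              # 1 = skip-gram, 0 = CBOW
)
print(model.wv["groupon"][:5])   # first components of one word vector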

What is the output?
●Distributed vector representations of words
o Each word is encoded as a vector of floats
o vec(queen) = (0.2, -0.3, 0.7, 0, …, 0.3)
o vec(woman) = (0.1, -0.2, 0.6, 0.1, …, 0.2)
o Length of the vectors = dimension of the word representation
o Key concept of word2vec: words with similar vectors have a similar meaning (context); see the similarity sketch below
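
“Similar vectors” is typically measured with cosine similarity; here is that computation on the (made-up) example vectors above.

import numpy as np

vec_queen = np.array([0.2, -0.3, 0.7, 0.0, 0.3])
vec_woman = np.array([0.1, -0.2, 0.6, 0.1, 0.2])

def cosine(a, b):
    # close to 1.0 = same direction (similar contexts), near 0 = unrelated
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vec_queen, vec_woman))   # high value -> similar meaning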

word2vec Features
●Very fast and scalable
o Google trained it on hundreds of billions of words
●Uncovers deep latent structure of word relationships
o Can solve analogies like King::Man as Queen::? or Paris::France as Berlin::?
o Can solve “one of these things is not like another”
o Can be used for machine translation or automated sentence completion

How does it work?
●Feed the algorithm (lots of) sentences
o Totally unsupervised learning
●word2vec trains a neural net that encodes the context of words within sentences
o “Skip-grams”: what is the probability that the word “queen” appears 1 word after “woman”, 2 words after, etc.? (see the pair-generation sketch below)
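
To make the skip-gram idea concrete, a sketch of how (center word, context word) training pairs come out of one sentence; the window size and sentence are illustrative.

def skip_gram_pairs(tokens, window=2):
    # For each center word, pair it with every word within `window`
    # positions on either side; word2vec trains on pairs like these.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skip_gram_pairs(["the", "queen", "is", "a", "woman"]))
# e.g. ('queen', 'woman') shows up once the window is wide enough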

word2vec at Return Path
●At Return Path, we implemented word2vec on data from our Consumer Data Stream
o Billions of subject lines from millions of users
o Fed 30 million unique subject lines (300M words) and sending domains into word2vec (using Python; a streaming sketch follows below)
(Pipeline: lots of subject lines -> word2vec -> word vectors -> insights)
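
A sketch of how 30 million subject lines could be streamed into gensim without holding them all in memory at once; the file name and whitespace tokenization are hypothetical.

from gensim.models import Word2Vec

class SubjectLines:
    """Yields one tokenized subject line at a time from disk."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):   # gensim re-iterates the corpus on each pass
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.lower().split()

model = Word2Vec(SubjectLines("subject_lines.txt"), vector_size=100, min_count=5)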

Grouping companies with word2vec
●Find daily-deals sites like Groupon

[word for (word, score) in model.most_similar('groupon.com', topn=100) if '.com' in word]
['grouponmail.com.au', 'specialicious.com', 'livingsocial.com', 'deem.com', 'hitthedeals.com', 'grabonemail-ie.com', 'grabonemail.com', 'kobonaty.com', 'deals.com.au', 'coupflip.com', 'ouffer.com', 'wagjag.com']

●Find apparel sites like Gap

[word for (word, score) in model.most_similar('gap.com', topn=100) if '.com' in word]
['modcloth.com', 'bananarepublic.com', 'shopjustice.com', 'thelimited.com', 'jcrew.com', 'gymboree.com', 'abercrombie-email.com', 'express.com', 'hollister-email.com', 'abercrombiekids-email.com', 'thredup.com', 'neimanmarcusemail.com']

More word2vec applications
●Find relationships between products
o model.most_similar(positive=['iphone', 'galaxy'], negative=['apple']) = 'samsung'
o i.e., iphone::apple as galaxy::? samsung!
●Distinguish different companies
o model.doesnt_match(['sheraton', 'westin', 'aloft', 'walmart']) = 'walmart'
o i.e., Walmart does not match Sheraton, Westin, and Aloft hotels
●Other possibilities
o Find different companies with similar marketing copy
o Automatically construct high-performing subject lines
o Many more...

Try it yourself
●A C implementation exists, but I recommend Python
o gensim library: http://radimrehurek.com/gensim/
o Tutorial: http://radimrehurek.com/gensim/models/word2vec.html
o Webapp to try it out as part of the tutorial
o Pretrained Google News and Freebase models (see the loading sketch below)
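
For the pretrained route, a sketch of loading the published Google News vectors with gensim; the .bin file name is the one Google distributed, and the file must be downloaded separately.

from gensim.models import KeyedVectors

# ~3.6 GB on disk; loads word vectors only, no further training needed
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin",
    binary=True,
)
print(vectors.most_similar("queen", topn=5))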

Thanks for listening!
●Many thanks to:
o Data Science Association and Level 3
o Michael Walker for organizing
●Slides posted on
●Email me at
●Return Path is hiring! Voted #2 best midsized company to work for in the country