Deep Learning for Natural Language Processing in R

Angus Taylor, Miguel Fierro
7/23/2018 9:12 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Outline
- Text categorization with convolutional neural networks
- Implementation in R using the MXNet deep learning framework

Why convolutional neural networks?
- State-of-the-art performance
- Learn complex, hierarchical features
- Learn “from scratch”: no feature engineering

Convolutions
[Figure: a convolution filter with weights w1 to w9 sliding over the input]
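To make the sliding-filter idea concrete, here is a minimal base-R sketch of a "valid" 2D convolution with a 3 x 3 filter whose nine entries play the role of w1 to w9. Like most deep learning libraries, it actually computes a cross-correlation (the kernel is not flipped). The helper name `conv2d_valid` and the example filter values are illustrative assumptions, not part of the original material.

```r
# Minimal "valid" 2D convolution (strictly a cross-correlation, as in most
# deep learning libraries): slide the kernel over the input and take the
# weighted sum of each covered patch. conv2d_valid is a hypothetical helper.
conv2d_valid <- function(x, w) {
  out_rows <- nrow(x) - nrow(w) + 1
  out_cols <- ncol(x) - ncol(w) + 1
  out <- matrix(0, nrow = out_rows, ncol = out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      patch <- x[i:(i + nrow(w) - 1), j:(j + ncol(w) - 1), drop = FALSE]
      out[i, j] <- sum(patch * w)  # weighted sum using the filter weights
    }
  }
  out
}

x <- matrix(1:16, nrow = 4, byrow = TRUE)
# Illustrative 3 x 3 filter (a vertical edge detector): every row is 1, 0, -1
w <- matrix(c(1, 0, -1), nrow = 3, ncol = 3, byrow = TRUE)
result <- conv2d_valid(x, w)
print(result)
```

A 4 x 4 input and a 3 x 3 filter give a 2 x 2 output; on this input every window yields (left column sum) minus (right column sum), which is -6 in all four positions.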

Crepe model
- Convolutions over character vectors
- 6 convolutional layers and 3 fully connected layers
- Useful for text categorization and sentiment analysis
Reference: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015.

Amazon categories dataset
- 2 million Amazon product reviews
- Each labelled with one of 7 product categories
- Example review: “Get what you pay for... Needed this one to match an existing unit. It got torn in 3 places during the assembly”

Data preparation
Example input text: “a good book!”

One-hot encoding
- Each character of the text is encoded as a one-hot vector over the alphabet
- Pad with zeros up to the feature length
- Alphabet length = 69
- Feature length = 1014
[Figure: one-hot matrix with rows for the characters a, g, o, d, b, k, !, 1, …, c, e, f]
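The encoding described above can be sketched in base R as follows. The 69-character alphabet here (26 letters, 10 digits, 32 punctuation marks and the newline character) is an assumption chosen only to match the stated length; the exact alphabet used by the authors is not given in the slides. Following the Crepe paper, characters outside the alphabet (including spaces) become all-zero columns, and texts shorter than 1014 characters are zero-padded. `one_hot_encode` is a hypothetical helper name.

```r
# Assumed 69-character alphabet: letters, digits, 32 punctuation marks, newline
alphabet <- c(letters, as.character(0:9),
              strsplit("-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}", "")[[1]],
              "\n")

# Encode text as an (alphabet length) x (feature length) one-hot matrix.
# Out-of-alphabet characters (e.g. spaces) give all-zero columns; texts are
# truncated to feature_len characters and zero-padded if shorter.
one_hot_encode <- function(text, alphabet, feature_len = 1014) {
  chars <- head(strsplit(tolower(text), "")[[1]], feature_len)
  mat <- matrix(0L, nrow = length(alphabet), ncol = feature_len)
  idx <- match(chars, alphabet)  # NA where the character is not in the alphabet
  ok <- !is.na(idx)
  mat[cbind(idx[ok], which(ok))] <- 1L
  mat
}

m <- one_hot_encode("a good book!", alphabet)
dim(m)            # 69 rows (characters) by 1014 columns (positions)
colSums(m)[1:13]  # 1 where a character was encoded, 0 for spaces and padding
```

Indexing with a two-column matrix in `cbind(row, column)` sets all the ones in a single vectorised assignment instead of looping over positions.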

Network specification with MXNet
Construct the network graph with symbolic expressions, starting with symbols for the data input and the class labels:
input_x <- mx.symbol.Variable("data")
input_y <- mx.symbol.Variable("softmax_label")

Convolutional stage
[Figure: 256 filters sliding over the one-hot character matrix (rows a, g, o, d, b, k, !, 1, …, c, e, f)]
conv1 <- mx.symbol.Convolution(data = input_x, kernel = c(7, vocab_size), num_filter = 256)
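Because the kernel spans the whole alphabet (height 69) and 7 character positions, each filter produces a single row of 1014 - 7 + 1 = 1008 values. The base-R sketch below, which does not use MXNet, illustrates this with a random one-hot input and 256 random filters, computing all filters at once as one matrix product over flattened windows (an "im2col"-style formulation). The random data and the helper name `conv_chars` are illustrative assumptions.

```r
set.seed(42)
alphabet_len <- 69
feature_len  <- 1014
kernel_width <- 7
num_filter   <- 256

# Random one-hot input: one active character per position (illustrative data)
x <- matrix(0, nrow = alphabet_len, ncol = feature_len)
x[cbind(sample(alphabet_len, feature_len, replace = TRUE), seq_len(feature_len))] <- 1

# Each filter covers all 69 rows and kernel_width columns; filters are stored
# as rows of a num_filter x (69 * kernel_width) weight matrix.
W <- matrix(rnorm(num_filter * alphabet_len * kernel_width, sd = 0.05),
            nrow = num_filter)

conv_chars <- function(x, W, kernel_width) {
  out_len <- ncol(x) - kernel_width + 1
  # Column p holds the flattened 69 x kernel_width window starting at position p
  windows <- sapply(seq_len(out_len),
                    function(p) as.vector(x[, p:(p + kernel_width - 1)]))
  W %*% windows  # num_filter x out_len feature map
}

feature_map <- conv_chars(x, W, kernel_width)
dim(feature_map)  # 256 filters by 1008 positions
```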

Activation function: ReLU
relu1 <- mx.symbol.Activation(data = conv1, act_type = "relu")
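ReLU is applied element-wise, f(x) = max(0, x): negative responses become zero and positive ones pass through unchanged. In base R this is a one-liner with `pmax`; `relu` is a hypothetical helper name used only for illustration.

```r
# Element-wise rectified linear unit: max(0, x); pmax keeps the matrix dimensions
relu <- function(x) pmax(x, 0)

z <- matrix(c(-2, -0.5, 0, 1.5, 3, -1), nrow = 2)
relu(z)
```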

Pooling stage
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(3, 1), stride = c(3, 1))
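With kernel = c(3, 1) and stride = c(3, 1), pooling keeps the maximum of each non-overlapping run of three positions, so the 1008 convolution outputs of one filter shrink to 336. A base-R sketch for a single filter's feature map; the helper name `max_pool_1d` and the random input are illustrative assumptions.

```r
# Max pooling along one feature map with window `size` and step `stride`.
# Output length is floor((n - size) / stride) + 1.
max_pool_1d <- function(v, size = 3, stride = 3) {
  starts <- seq(1, length(v) - size + 1, by = stride)
  vapply(starts, function(s) max(v[s:(s + size - 1)]), numeric(1))
}

set.seed(1)
feature <- rnorm(1008)           # random stand-in for one filter's 1008 conv1 outputs
pooled  <- max_pool_1d(feature)
length(pooled)                   # 336
max_pool_1d(c(1, 5, 2, 0, -1, 4))  # 5 4
```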

Stacked convolutional layers
- Input: 69 x 1014
- Convolution: 1 x 1008 x 256
- Pooling: 1 x 336 x 256
- Convolution: 1 x 330 x 256
- Pooling: 1 x 110 x 256
- Convolution: 1 x 108 x 256
- Convolution: 1 x 106 x 256
- Convolution: 1 x 104 x 256
- Convolution: 1 x 102 x 256
- …
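The feature lengths in this stack follow from two rules: a stride-1 convolution with kernel width k maps length n to n - k + 1, and max pooling with window 3 and stride 3 maps n to floor((n - 3) / 3) + 1. The sketch below replays the layer sequence of the full network (kernel widths 7, 7, 3, 3, 3, 3, with pooling after layers 1, 2 and 6) and also computes the length of the flattened vector fed to the first fully connected layer. The `layers` table restates the slides; `conv_len` and `pool_len` are hypothetical helpers.

```r
# Layer sequence of the full network: convolution kernel widths and whether
# a window-3, stride-3 max pooling follows each convolution
layers <- data.frame(
  kernel = c(7, 7, 3, 3, 3, 3),
  pool   = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

conv_len <- function(n, k) n - k + 1                                  # stride-1 "valid" convolution
pool_len <- function(n, size = 3, stride = 3) floor((n - size) / stride) + 1

n <- 1014
shapes <- c(input = n)
for (i in seq_len(nrow(layers))) {
  n <- conv_len(n, layers$kernel[i])
  shapes[paste0("conv", i)] <- n
  if (layers$pool[i]) {
    n <- pool_len(n)
    shapes[paste0("pool", i)] <- n
  }
}
shapes   # 1014, 1008, 336, 330, 110, 108, 106, 104, 102, 34
n * 256  # flattened size with 256 filters: 8704
```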

Flattening
flatten <- mx.symbol.Flatten(data = pool6)

Fully connected layers
fc1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 1024)
act_fc1 <- mx.symbol.Activation(data = fc1, act_type = "relu")
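A fully connected layer multiplies its input vector by a weight matrix and adds a bias; here it is followed by ReLU, as in the slide. The base-R sketch below uses small illustrative sizes (an 8-dimensional input and 4 hidden units) instead of the flattened Crepe features and num_hidden = 1024; `fully_connected` is a hypothetical helper, and drawing weights from a normal distribution with sd 0.05 is only for illustration.

```r
# Fully connected layer followed by ReLU: h = max(0, W x + b)
fully_connected <- function(x, W, b) pmax(as.vector(W %*% x) + b, 0)

set.seed(7)
in_dim <- 8; num_hidden <- 4          # illustrative sizes, not Crepe's
x <- rnorm(in_dim)
W <- matrix(rnorm(num_hidden * in_dim, sd = 0.05), nrow = num_hidden)
b <- rep(0, num_hidden)

h <- fully_connected(x, W, b)
length(h)                             # one value per hidden unit
length(W) + length(b)                 # trainable parameters: in_dim * num_hidden + num_hidden
```

The parameter count grows as input size times num_hidden, which is why the fully connected layers hold a large share of a network's weights.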

Dropout
drop1 <- mx.symbol.Dropout(act_fc1, p = 0.5)
drop2 <- mx.symbol.Dropout(act_fc2, p = 0.5)
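Dropout with p = 0.5 zeroes each activation with probability 0.5 during training. The sketch below implements "inverted" dropout, which rescales the surviving activations by 1 / (1 - p) so that nothing needs rescaling at prediction time; this matches how MXNet's Dropout operator is described, but treat that as an assumption. `dropout` and its `training` flag are hypothetical names.

```r
# Inverted dropout: during training, zero each element with probability p and
# scale the survivors by 1 / (1 - p); at prediction time, return x unchanged.
dropout <- function(x, p = 0.5, training = TRUE) {
  if (!training) return(x)
  keep <- runif(length(x)) >= p
  x * keep / (1 - p)
}

set.seed(3)
a <- rep(1, 10)
dropout(a)                    # each element is either 0 or 2
dropout(a, training = FALSE)  # unchanged
```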

Full network
[Figure: feature extraction (convolutional layers) followed by classification (fully connected layers)]
fc3 <- mx.symbol.FullyConnected(data = drop2, num_hidden = 7)
crepe <- mx.symbol.SoftmaxOutput(data = fc3, label = input_y, name = "softmax")
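The softmax output turns the 7 scores from fc3 into class probabilities, and training minimises the cross-entropy between these probabilities and the label. A base-R sketch of both quantities for one example; the scores and label are made-up illustrative values, `softmax` and `cross_entropy` are hypothetical helpers, and labels are 1-based here as is natural in R.

```r
# Numerically stable softmax: subtract the maximum score before exponentiating
softmax <- function(scores) {
  e <- exp(scores - max(scores))
  e / sum(e)
}

# Cross-entropy loss for a single example with a 1-based class label
cross_entropy <- function(probs, label) -log(probs[label])

scores <- c(2.0, 0.5, -1.0, 0.0, 1.0, -0.5, 0.3)  # illustrative fc3 output, 7 classes
probs  <- softmax(scores)
sum(probs)        # 1
which.max(probs)  # predicted category: 1
cross_entropy(probs, label = 1)
```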

Full network
library(mxnet)

# Hyperparameters taken from the earlier slides
# (kernel7 = c(7, 1) is inferred from the 336 -> 330 step in the stacked layers)
vocab.size <- 69
num_filters <- 256
act_type <- "relu"
pool_type <- "max"
kernel3 <- c(3, 1)
kernel7 <- c(7, 1)
stride <- c(3, 1)
fully.connected.size <- 1024
drop <- 0.5
num.output.classes <- 7

# Input and label symbols
input.x <- mx.symbol.Variable('data')
input.y <- mx.symbol.Variable('softmax_label')

# Feature extraction: six convolutional layers, max pooling after layers 1, 2 and 6
conv1 <- mx.symbol.Convolution(data=input.x, kernel=c(7, vocab.size), num_filter=num_filters)
relu1 <- mx.symbol.Activation(data=conv1, act_type=act_type)
pool1 <- mx.symbol.Pooling(data=relu1, pool_type=pool_type, kernel=kernel3, stride=stride)
conv2 <- mx.symbol.Convolution(data=pool1, kernel=kernel7, num_filter=num_filters)
relu2 <- mx.symbol.Activation(data=conv2, act_type=act_type)
pool2 <- mx.symbol.Pooling(data=relu2, pool_type=pool_type, kernel=kernel3, stride=stride)
conv3 <- mx.symbol.Convolution(data=pool2, kernel=kernel3, num_filter=num_filters)
relu3 <- mx.symbol.Activation(data=conv3, act_type=act_type)
conv4 <- mx.symbol.Convolution(data=relu3, kernel=kernel3, num_filter=num_filters)
relu4 <- mx.symbol.Activation(data=conv4, act_type=act_type)
conv5 <- mx.symbol.Convolution(data=relu4, kernel=kernel3, num_filter=num_filters)
relu5 <- mx.symbol.Activation(data=conv5, act_type=act_type)
conv6 <- mx.symbol.Convolution(data=relu5, kernel=kernel3, num_filter=num_filters)
relu6 <- mx.symbol.Activation(data=conv6, act_type=act_type)
pool6 <- mx.symbol.Pooling(data=relu6, pool_type=pool_type, kernel=kernel3, stride=stride)

# Classification: three fully connected layers with dropout
flatten <- mx.symbol.Flatten(data=pool6)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=fully.connected.size)
act_fc1 <- mx.symbol.Activation(data=fc1, act_type=act_type)
drop1 <- mx.symbol.Dropout(act_fc1, p=drop)
fc2 <- mx.symbol.FullyConnected(data=drop1, num_hidden=fully.connected.size)
act_fc2 <- mx.symbol.Activation(data=fc2, act_type=act_type)
drop2 <- mx.symbol.Dropout(act_fc2, p=drop)
fc3 <- mx.symbol.FullyConnected(data=drop2, num_hidden=num.output.classes)
crepe <- mx.symbol.SoftmaxOutput(data=fc3, label=input.y, name="softmax")

Network training
model <- mx.model.FeedForward.create(
  symbol = crepe,
  X = train_data_csv,
  eval.data = test_data_csv,
  num.round = num_epochs,
  optimizer = "sgd",
  initializer = mx.init.normal(sd = 0.05),
  epoch.end.callback = mx.callback.save.checkpoint("crepe_")
)
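The call above trains with plain stochastic gradient descent, starting from weights drawn from a normal distribution with standard deviation 0.05. The toy base-R sketch below shows those two ingredients on a noiseless linear-regression problem rather than on Crepe: initialise with `rnorm(sd = 0.05)`, then repeatedly step against the gradient of one example's squared error. The data, learning rate and number of passes are illustrative assumptions.

```r
set.seed(2018)
n <- 200
X <- matrix(rnorm(n * 2), ncol = 2)
true_w <- c(2, -3)
y <- as.vector(X %*% true_w)        # noiseless targets

w  <- rnorm(2, sd = 0.05)           # normal initialisation, like mx.init.normal(sd = 0.05)
lr <- 0.05                          # learning rate
for (epoch in 1:50) {               # one pass over the data per round
  for (i in sample(n)) {            # visit examples in random order
    err  <- sum(X[i, ] * w) - y[i]
    grad <- err * X[i, ]            # gradient of 0.5 * err^2 with respect to w
    w    <- w - lr * grad           # SGD update
  }
}
w                                   # close to c(2, -3)
```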

Network training
- Training time: 24 hours on a virtual machine with 2 GPUs
- Accuracy: 84%
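Accuracy here is the fraction of reviews whose predicted category, the one with the highest softmax probability, matches the true label. A base-R sketch with a made-up probability matrix for four reviews and 7 categories (rows are reviews); the numbers are illustrative only and are not results from the model.

```r
# Illustrative predicted probabilities: 4 reviews (rows) x 7 categories (columns)
probs <- rbind(
  c(0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05),
  c(0.10, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05),
  c(0.05, 0.05, 0.05, 0.05, 0.10, 0.65, 0.05),
  c(0.20, 0.30, 0.10, 0.10, 0.10, 0.10, 0.10)
)
labels <- c(1, 2, 6, 3)                             # true categories (1-based)

predicted <- max.col(probs, ties.method = "first")  # arg max of each row
accuracy  <- mean(predicted == labels)
accuracy                                            # 0.75
```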

Data Science Virtual Machine
- Hosted on Microsoft Azure
- GPU-enabled (NVIDIA Tesla K80s and M60s)
- Pre-configured with R, CUDA drivers, and the MXNet framework and R package
- Scalable: only pay for what you use

Thank you
Blog post: aka.ms/deep-learning-for-nlp-blog
GitHub: aka.ms/deep-learning-for-nlp
