Deep Learning for Natural Language Processing in R


1 Deep Learning for Natural Language Processing in R
Angus Taylor, Miguel Fierro
7/23/2018 9:12 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Outline
Text categorization with convolutional neural networks
Implementation in R using the MXNet deep learning framework

3 Why convolutional neural networks?
State-of-the-art performance
Learns complex, hierarchical features
Learning “from scratch”: no feature engineering

4 Convolutions
[Figure: convolution filter with weights w1 to w9]

5 Crepe model
Convolutions over character vectors
6 convolutional layers and 3 fully connected layers
Useful for text categorization and sentiment analysis
Reference: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015.

6 Amazon categories dataset
2 million Amazon product reviews
Each labelled as one of 7 product categories
Example review: “Get what you pay for... Needed this one to match an existing unit. It got torn in 3 places during the assembly”

7 Data preparation
Example input: “a good book!”

8 One-hot encoding
[Figure: one-hot encoded matrix for the example text “a good book!”]
Pad with zeros
Alphabet length = 69
Feature length = 1014
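The encoding step on this slide can be sketched in base R. This is only an illustration, not the presenters' code: the function name `one_hot_encode` and the 40-symbol alphabet are assumptions made for the example (the talk uses a 69-character alphabet that is not listed on the slides). Characters outside the alphabet, such as spaces, are left as all-zero columns, and texts shorter than the feature length are padded with zeros.

```r
# Illustrative alphabet (an assumption): 40 symbols, not the 69 used in the talk.
alphabet <- c(letters, as.character(0:9), "!", "?", ".", ",")

# Encode one string as an (alphabet length) x (feature length) 0/1 matrix.
one_hot_encode <- function(text, alphabet, feature_len = 1014) {
  chars <- strsplit(tolower(text), "")[[1]]
  chars <- head(chars, feature_len)            # truncate long texts
  encoded <- matrix(0, nrow = length(alphabet), ncol = feature_len)
  rows <- match(chars, alphabet)               # NA for characters not in the alphabet
  known <- which(!is.na(rows))
  encoded[cbind(rows[known], known)] <- 1      # a single 1 per known character
  encoded                                      # remaining columns stay zero (padding)
}

x <- one_hot_encode("a good book!", alphabet)
dim(x)   # 40 1014
sum(x)   # 10 (the two spaces are not in the alphabet)
```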

9 Network specification with MXNet
Construct network graph with symbolic expressions
Start with symbols for data input and class labels:
    input_x <- mx.symbol.Variable("data")
    input_y <- mx.symbol.Variable("softmax_label")

10 Convolutional stage
[Figure: filters applied to the one-hot encoded matrix; filters = 256]
    conv1 <- mx.symbol.Convolution(data = input_x, kernel = c(7, vocab_size), num_filter = 256)
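To make the convolutional stage concrete, here is a base R sketch of what a single filter computes. It is not how MXNet implements the operation, and `conv1d_full_height` is a hypothetical helper name. The assumption is that each filter spans the whole alphabet and 7 character positions, and slides along the text one position at a time, so an input of length 1014 yields 1008 outputs per filter.

```r
# Slide one filter (alphabet size x 7) along the character axis of a one-hot
# matrix and return one value per position (a cross-correlation, no padding).
conv1d_full_height <- function(x, filter) {
  width <- ncol(filter)
  n_out <- ncol(x) - width + 1
  vapply(seq_len(n_out), function(i) {
    sum(x[, i:(i + width - 1), drop = FALSE] * filter)
  }, numeric(1))
}

set.seed(1)
vocab_size <- 69
# Random one-hot input: exactly one active row in each of the 1014 columns.
x <- matrix(0, nrow = vocab_size, ncol = 1014)
x[cbind(sample(vocab_size, 1014, replace = TRUE), 1:1014)] <- 1

# One randomly initialised filter; the network on the slide learns 256 of these.
filter <- matrix(rnorm(vocab_size * 7, sd = 0.05), nrow = vocab_size)
feature_map <- conv1d_full_height(x, filter)
length(feature_map)   # 1008
```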

11 Activation function
ReLU
    relu1 <- mx.symbol.Activation(data = conv1, act_type = "relu")

12 Pooling stage
    pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(3, 1), stride = c(3, 1))
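The activation and pooling stages can be illustrated on a plain numeric vector. This is a base R sketch, assuming a one-dimensional feature map and the kernel and stride of 3 given on the pooling slide; `relu` and `max_pool` are hypothetical helper names, not MXNet functions.

```r
# ReLU: replace negative activations with zero.
relu <- function(z) pmax(z, 0)

# Max pooling: take the maximum of each window of `kernel` values,
# moving `stride` positions between windows.
max_pool <- function(z, kernel = 3, stride = 3) {
  starts <- seq(1, length(z) - kernel + 1, by = stride)
  vapply(starts, function(s) max(z[s:(s + kernel - 1)]), numeric(1))
}

z <- c(-1, 2, 0.5, 3, -4, -2, 0, 1, 7)
relu(z)             # 0.0 2.0 0.5 3.0 0.0 0.0 0.0 1.0 7.0
max_pool(relu(z))   # 2 3 7
```

With non-overlapping windows of 3, pooling divides the length by 3, which is why a feature map of length 1008 shrinks to 336.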

13 Stacked convolutional layers
Input 69 x 1014
Convolution 1 x 1008 x 256
Pooling 1 x 336 x 256
Convolution 1 x 330 x 256
Pooling 1 x 110 x 256
Convolution 1 x 108 x 256
Convolution 1 x 106 x 256
Convolution 1 x 104 x 256
Convolution 1 x 102 x 256
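The layer sizes on this slide follow from two simple rules, sketched below in base R. The assumptions are unpadded convolutions with stride 1 (output length n - kernel + 1) and pooling with kernel 3 and stride 3; the helper names `conv_len` and `pool_len` are made up for this example. The last step is the third pooling layer (pool6 in the full network listing), which is not drawn on this slide.

```r
conv_len <- function(n, kernel) n - kernel + 1
pool_len <- function(n, kernel = 3, stride = 3) (n - kernel) %/% stride + 1

# Order of operations in the feature-extraction part of the network.
ops <- c("conv7", "pool", "conv7", "pool",
         "conv3", "conv3", "conv3", "conv3", "pool")

n <- 1014                      # feature length of the input
sizes <- numeric(length(ops))
for (i in seq_along(ops)) {
  n <- switch(ops[i],
              conv7 = conv_len(n, 7),
              conv3 = conv_len(n, 3),
              pool  = pool_len(n))
  sizes[i] <- n
}

sizes            # 1008 336 330 110 108 106 104 102 34
sizes[9] * 256   # 8704 values per document after flattening
```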

14 Flattening
    flatten <- mx.symbol.Flatten(data = pool6)

15 Fully connected layers
    fc1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 1024)
    act_fc1 <- mx.symbol.Activation(data = fc1, act_type = "relu")

16 Dropout
    drop1 <- mx.symbol.Dropout(act_fc1, p = 0.5)
    drop2 <- mx.symbol.Dropout(act_fc2, p = 0.5)

17 Full network
[Figure: network diagram with feature extraction and classification stages]
    fc3 <- mx.symbol.FullyConnected(data = drop2, num_hidden = 7)
    crepe <- mx.symbol.SoftmaxOutput(data = fc3, label = input_y, name = "softmax")
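The final layer produces one score per product category, and the softmax output turns those scores into class probabilities. A minimal base R sketch of that calculation follows; the scores are made-up numbers for illustration, and `softmax` here is a hand-written helper, not an MXNet call.

```r
# Softmax: exponentiate the scores and normalise them to sum to 1.
softmax <- function(scores) {
  shifted <- scores - max(scores)   # subtract the maximum for numerical stability
  exp(shifted) / sum(exp(shifted))
}

# Hypothetical outputs of the last fully connected layer for one review (7 classes).
scores <- c(1.2, -0.3, 0.1, 2.5, 0.0, -1.1, 0.7)
probs <- softmax(scores)

length(probs)      # 7
which.max(probs)   # 4: the predicted category is the one with the highest score
```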

18 Full network
    library(mxnet)

    # Hyperparameters: values taken from the earlier slides where stated
    vocab.size <- 69                 # alphabet length
    num_filters <- 256
    act_type <- "relu"
    pool_type <- "max"
    kernel7 <- c(7, 1)               # assumed, by analogy with the pooling kernel
    kernel3 <- c(3, 1)
    stride <- c(3, 1)
    fully.connected.size <- 1024
    drop <- 0.5
    num.output.classes <- 7

    # Symbols for data input and class labels
    input.x <- mx.symbol.Variable('data')
    input.y <- mx.symbol.Variable('softmax_label')

    # Feature extraction: 6 convolutional layers, 3 of them followed by pooling
    conv1 <- mx.symbol.Convolution(data=input.x, kernel=c(7, vocab.size), num_filter=num_filters)
    relu1 <- mx.symbol.Activation(data=conv1, act_type=act_type)
    pool1 <- mx.symbol.Pooling(data=relu1, pool_type=pool_type, kernel=kernel3, stride=stride)
    conv2 <- mx.symbol.Convolution(data=pool1, kernel=kernel7, num_filter=num_filters)
    relu2 <- mx.symbol.Activation(data=conv2, act_type=act_type)
    pool2 <- mx.symbol.Pooling(data=relu2, pool_type=pool_type, kernel=kernel3, stride=stride)
    conv3 <- mx.symbol.Convolution(data=pool2, kernel=kernel3, num_filter=num_filters)
    relu3 <- mx.symbol.Activation(data=conv3, act_type=act_type)
    conv4 <- mx.symbol.Convolution(data=relu3, kernel=kernel3, num_filter=num_filters)
    relu4 <- mx.symbol.Activation(data=conv4, act_type=act_type)
    conv5 <- mx.symbol.Convolution(data=relu4, kernel=kernel3, num_filter=num_filters)
    relu5 <- mx.symbol.Activation(data=conv5, act_type=act_type)
    conv6 <- mx.symbol.Convolution(data=relu5, kernel=kernel3, num_filter=num_filters)
    relu6 <- mx.symbol.Activation(data=conv6, act_type=act_type)
    pool6 <- mx.symbol.Pooling(data=relu6, pool_type=pool_type, kernel=kernel3, stride=stride)

    # Classification: 3 fully connected layers with dropout, then softmax
    flatten <- mx.symbol.Flatten(data=pool6)
    fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=fully.connected.size)
    act_fc1 <- mx.symbol.Activation(data=fc1, act_type=act_type)
    drop1 <- mx.symbol.Dropout(act_fc1, p=drop)
    fc2 <- mx.symbol.FullyConnected(data=drop1, num_hidden=fully.connected.size)
    act_fc2 <- mx.symbol.Activation(data=fc2, act_type=act_type)
    drop2 <- mx.symbol.Dropout(act_fc2, p=drop)
    fc3 <- mx.symbol.FullyConnected(data=drop2, num_hidden=num.output.classes)
    crepe <- mx.symbol.SoftmaxOutput(data=fc3, label=input.y, name="softmax")

19 Network training
    model <- mx.model.FeedForward.create(
      symbol = crepe,
      X = train_data_csv,
      eval.data = test_data_csv,
      num.round = num_epochs,
      optimizer = "sgd",
      initializer = mx.init.normal(sd = 0.05),
      epoch.end.callback = mx.callback.save.checkpoint("crepe_")
    )

20 Network training
Training time: 24 hours on a virtual machine with 2 GPUs
Accuracy: 84%

21 Data Science Virtual Machine
Hosted on Microsoft Azure
GPU-enabled (NVIDIA Tesla K80s and M60s)
Pre-configured with: R, CUDA drivers, MXNet framework and R package
Scalable
Only pay for what you use

22 Thank you
Blog post: aka.ms/deep-learning-for-nlp-blog
GitHub: aka.ms/deep-learning-for-nlp

