Deploy Tensorflow on PySpark


1 Deploy Tensorflow on PySpark
AILab, April 2016

2 1. Hyperparameter Tuning
2. Deploying Models at Scale

3 Hyperparameter Tuning
Machine learning practitioners rerun the same model multiple times with different hyperparameter values in order to find the best-performing set. This classical technique is called hyperparameter tuning.

4 Hyperparameter Tuning
We can use Spark to broadcast the common elements, such as the data and the model description, and then schedule the individual repetitive computations across a cluster of machines.
[Diagram: distributing cross-validation sets across the cluster, with Fold 1, Fold 2, and Fold 3 evaluated in parallel.]
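As a minimal sketch of this idea (all names here are hypothetical, not from the slides): build the grid of hyperparameter candidates on the driver, then hand one candidate to each Spark task.

```python
# Minimal sketch of distributed hyperparameter tuning (hypothetical names).
from itertools import product

def make_grid(learning_rates, batch_sizes):
    """Expand hyperparameter values into one dict per candidate setting."""
    return [{"lr": lr, "batch_size": bs}
            for lr, bs in product(learning_rates, batch_sizes)]

grid = make_grid([0.01, 0.001], [32, 64])  # 4 candidate settings

# With a live SparkContext `sc` and a train_and_score(params) function, each
# candidate would be evaluated on a worker:
#   results = sc.parallelize(grid).map(train_and_score).collect()
```

The data and model description would be broadcast once, so only the small parameter dicts travel with each task.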

5 Hyperparameter Tuning
[Diagram-only slide.]

6 Deploying Models at Scale
ImageNet is a large collection of images from the internet that is commonly used as a benchmark in image recognition tasks.
[Diagram: the trained model is broadcast once; Batch 1, Batch 2, and Batch 3 of the ImageNet data are labeled in parallel by the predicting model.]

7 Deploying Models at Scale
We are now going to take an existing neural network model that has already been trained on a large corpus (the Inception V3 model), and apply it to images downloaded from the internet. Inception-v3 is a convolutional neural network (CNN) image classifier developed by Google.
Example top-5 predictions for one image: military uniform; suit, suit of clothes; academic gown, academic robe, judge's robe; bearskin, busby, shako; pickelhaube.

8 Deploying Models at Scale
The model is first distributed to the workers of the cluster using Spark's built-in broadcasting mechanism; the model is then loaded on each node and applied to the images.
[Diagram: sc.broadcast ships the model to worker1 and worker2.]
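A hedged sketch of this broadcast step (the helper name and file handling are assumptions, not from the slides): the serialized model is read once on the driver and shipped to every worker via `sc.broadcast`.

```python
# Sketch: broadcast serialized model bytes to the workers (hypothetical names).
def broadcast_model(sc, model_path):
    """Read the serialized graph once on the driver and broadcast it."""
    with open(model_path, "rb") as f:
        model_data = f.read()
    # Workers access the same bytes via the broadcast's .value attribute,
    # without re-shipping the model for every task.
    return sc.broadcast(model_data)
```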

9 Read Test Data (ImageNet)
>>> image_batch_size = 3
>>> batched_data = read_file_index()
The (id, url) pairs from the index are split into batches (batch1, batch2, ...).
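The slides do not show `read_file_index` itself; a plausible pure-Python sketch of the batching it performs (helper name hypothetical, and it takes the pairs directly rather than reading the index file):

```python
def batch_index(pairs, image_batch_size=3):
    """Split a list of (id, url) pairs into fixed-size batches."""
    return [pairs[i:i + image_batch_size]
            for i in range(0, len(pairs), image_batch_size)]

# e.g. 7 images with image_batch_size=3 -> batches of sizes 3, 3, 1
```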

10 Distribute Input Data
>>> urls = sc.parallelize(batched_data)
[Diagram: the test images, as (id, url) pairs such as [(id1, url1), (id2, url2), (id3, url3)] and [(id4, url4), (id5, url5), (id6, url6)], are split into batches; sc.parallelize turns them into an RDD of batches, with batch1 sent to mapper1 and batch2 to mapper2, each holding a copy of the model.]

11 Splitting Input Data
>>> labeled_images = urls.flatMap(apply_batch)
Inside apply_batch, each mapper gets the model data (the CNN graph), rebuilds the TensorFlow graph, and predicts labels for its images with the trained model. The decoding dictionary for the labels is broadcast as well: the network uses encoded output labels for memory efficiency, since each class is its own node and the output layer is large (1008 softmax nodes); the dictionary loads a human-readable English name for each node.
[Diagram: map(apply_batch) runs over batch1 on mapper1 and batch2 on mapper2, each with the broadcast model.]
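The slides do not list apply_batch's body. In the real job it takes only the batch (the broadcast model and label dictionary are captured in its closure); this sketch makes the forward pass an explicit `predict_fn` argument so the shape of the computation is visible:

```python
def apply_batch(batch, predict_fn):
    """Label every image in one batch.

    predict_fn stands in for the TensorFlow step: rebuild the graph from the
    broadcast GraphDef, run forward propagation on the image, and decode the
    winning softmax node to a human-readable label.
    """
    return [(image_id, predict_fn(url)) for image_id, url in batch]
```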

12 GraphDef The foundation of computation in TensorFlow is the Graph object. This holds a network of nodes, each representing one operation, connected to each other as inputs and outputs. After you've created a Graph object, you can save it out by calling as_graph_def(), which returns a GraphDef object.

13 Prediction
Fetch an image from the web and use the trained CNN to infer the topics of this image:
1. Download the image from the internet.
2. Run the CNN: compute forward propagation on image_data (the input nodes/data) with the trained weights. softmax_tensor is the output-layer tensor of the CNN.
3. Sort the probabilities.
4. Decode the labels.

14 Prediction
predictions is the output_layer tensor of the CNN (1008 output nodes).
>>> predictions = np.squeeze(predictions)
Removes one dimension (2 dimensions to 1 dimension).
>>> top_k = predictions.argsort()[-5:][::-1]
Indexes of the top 5 labels, sorted by their probabilities.
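A pure-Python equivalent of the two NumPy lines above (helper name hypothetical), useful for seeing exactly what squeeze plus argsort compute:

```python
def top_k_indices(predictions, k=5):
    """Mirror np.squeeze(predictions).argsort()[-k:][::-1] without NumPy."""
    if len(predictions) == 1 and isinstance(predictions[0], list):
        predictions = predictions[0]             # squeeze the batch dimension
    # Indices ordered by ascending probability, then take the top k, reversed
    # so the most probable label comes first.
    order = sorted(range(len(predictions)), key=predictions.__getitem__)
    return order[-k:][::-1]
```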

15 Get Result
>>> labels = labeled_images.collect()
[Diagram: map(apply_batch) runs over batch1 on mapper1 and batch2 on mapper2; collect gathers the predicted labels back on the driver.]
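A driver-side stand-in for the full flatMap/collect pipeline (pure Python, no Spark; names hypothetical): flatMap concatenates the per-batch label lists, and collect brings them back to the driver.

```python
def run_pipeline_locally(batches, apply_batch):
    """Sequential equivalent of sc.parallelize(batches).flatMap(apply_batch).collect()."""
    labels = []
    for batch in batches:
        labels.extend(apply_batch(batch))  # flatMap: per-batch lists, concatenated
    return labels                          # collect: results gathered on the driver
```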

