Cloud Computing


1 Cloud Computing

2 TensorFlow
Use multiple features to further improve the effectiveness of a model
Debug issues in model input data

3 TensorFlow

4 Load the necessary libraries and import data
import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.metrics as metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

california_housing_dataframe = pd.read_csv(" sep=",")

5 Prepare input features from the data set
def preprocess_features(california_housing_dataframe):
    selected_features = california_housing_dataframe[
        ["latitude",
         "longitude",
         "housing_median_age",
         "total_rooms",
         "total_bedrooms",
         "population",
         "households",
         "median_income"]]
    processed_features = selected_features.copy()
    # Create a synthetic feature.
    processed_features["rooms_per_person"] = (
        california_housing_dataframe["total_rooms"] /
        california_housing_dataframe["population"])
    return processed_features
Notes: selected_features.copy() copies the selected features; rooms_per_person is the newly added feature.
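As a minimal sketch of what the synthetic feature computes, the snippet below applies the same division to a tiny hand-made DataFrame. The three rows and the name toy are invented for illustration and are not taken from the California housing data.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "total_rooms": [1000.0, 600.0, 90.0],
    "population": [500.0, 200.0, 3.0],
})

# Work on a copy so the source frame stays untouched.
features = toy.copy()

# Same element-wise division as in preprocess_features.
features["rooms_per_person"] = toy["total_rooms"] / toy["population"]

print(features)
```

The last toy row has far more rooms per person than the others, which is the kind of value worth inspecting before training.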

6 Prepare target features (labels) from the data set
def preprocess_targets(california_housing_dataframe):
    output_targets = pd.DataFrame()
    # Scale the target to be in units of thousands of dollars.
    output_targets["median_house_value"] = (
        california_housing_dataframe["median_house_value"] / 1000.0)
    return output_targets
Notes: pd.DataFrame() creates an empty DataFrame; median_house_value defines the target feature.

7 Training set
Use the first 12000 rows as the training set.
training_examples = preprocess_features(california_housing_dataframe.head(12000))
print(training_examples.describe())

8 Training targets (labels)
Take the targets from the same first 12000 rows.
training_targets = preprocess_targets(california_housing_dataframe.head(12000))
print(training_targets.describe())

9 Validation set
Use the last 5000 rows as the validation set.
validation_examples = preprocess_features(california_housing_dataframe.tail(5000))
print(validation_examples.describe())

10 Validation targets
Take the targets from the same last 5000 rows.
validation_targets = preprocess_targets(california_housing_dataframe.tail(5000))
print(validation_targets.describe())

11 Plot Latitude/Longitude vs. Median House Value
plt.figure(figsize=(13, 8))

ax = plt.subplot(1, 2, 1)
ax.set_title("Validation Data")
ax.set_autoscaley_on(False)
ax.set_ylim([32, 43])
ax.set_autoscalex_on(False)
ax.set_xlim([-126, -112])
plt.scatter(validation_examples["longitude"],
            validation_examples["latitude"],
            cmap="coolwarm",
            c=validation_targets["median_house_value"] / validation_targets["median_house_value"].max())

ax = plt.subplot(1, 2, 2)
ax.set_title("Training Data")
ax.set_autoscaley_on(False)
ax.set_ylim([32, 43])
ax.set_autoscalex_on(False)
ax.set_xlim([-126, -112])
plt.scatter(training_examples["longitude"],
            training_examples["latitude"],
            cmap="coolwarm",
            c=training_targets["median_house_value"] / training_targets["median_house_value"].max())
_ = plt.plot()
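A numeric check can complement the scatter plots. The sketch below is an illustration on synthetic data, not the housing set: the column values, the 70/30 split sizes, and the assumption that rows arrive sorted by longitude are all invented for the example. It places the per-column minimum and maximum of a head/tail split side by side.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a dataset whose rows are sorted by longitude.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "longitude": np.sort(rng.uniform(-124.0, -114.0, size=100)),
    "latitude": rng.uniform(32.0, 42.0, size=100),
})

# Same style of split as in the slides: first rows vs. last rows.
train = df.head(70)
valid = df.tail(30)

# One row per column, with the range seen in each split.
summary = pd.DataFrame({
    "train_min": train.min(),
    "train_max": train.max(),
    "valid_min": valid.min(),
    "valid_max": valid.max(),
})
print(summary)
```

If the ranges of a column barely overlap between the two splits, as they do for longitude in this toy data, the split probably follows the row order rather than a random sample.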

12 Compare with real map

13 Find the bug
For any given feature or column, the distribution of values should be roughly the same in the training and validation splits. That this is not the case here shows that we likely have a fault in how the train and validation split was created. If the data arrives in some sorted order, which appears to be the case here, then taking the first rows for training and the last rows for validation without randomizing first produces splits that do not represent the same population.
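One common way to address this is to shuffle the rows once before splitting. The sketch below shows the idea on synthetic data; the 100-row frame, the 70/30 split sizes, and the fixed seed are assumptions made for the example, not values from the slides.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 100 rows that arrive sorted by longitude.
df = pd.DataFrame({"longitude": np.linspace(-124.0, -114.0, 100)})

# Shuffle the rows once, with a fixed seed so the split is reproducible.
rng = np.random.default_rng(42)
shuffled = df.iloc[rng.permutation(len(df))].reset_index(drop=True)

# Split only after shuffling.
train = shuffled.head(70)
valid = shuffled.tail(30)

# Both splits should now span roughly the same range of values.
print(train["longitude"].describe())
print(valid["longitude"].describe())
```

With the real data, the same pattern would be applied to california_housing_dataframe before calling head() and tail().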

14 Define input function
def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    # Convert pandas data into a dict of np arrays.
    features = {key: np.array(value) for key, value in dict(features).items()}

    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)

    # Shuffle the data, if specified.
    if shuffle:
        ds = ds.shuffle(10000)

    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
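One detail worth noting in my_input_fn: batch() is applied before shuffle(), so when batch_size is greater than 1 the shuffle reorders whole batches rather than individual examples. The plain-Python sketch below imitates the two orderings without TensorFlow; the helper batches() and the toy list of examples are hypothetical and exist only for this illustration.

```python
import random


def batches(items, batch_size):
    """Split a list into consecutive batches (the last one may be shorter)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


examples = list(range(8))
rng = random.Random(0)

# Order used in my_input_fn: batch first, then shuffle the batches.
batch_then_shuffle = batches(examples, 2)
rng.shuffle(batch_then_shuffle)

# Alternative order: shuffle individual examples, then batch.
shuffled = examples[:]
rng.shuffle(shuffled)
shuffle_then_batch = batches(shuffled, 2)

print(batch_then_shuffle)  # neighbours such as 0 and 1 always stay together
print(shuffle_then_batch)  # any example may land in any batch
```

With the default batch_size=1 the two orderings behave the same, so the distinction only matters once larger batches are used.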

15 Construct the TensorFlow feature columns
def construct_feature_columns(input_features):
    """Construct the TensorFlow Feature Columns.

    Args:
      input_features: The names of the numerical input features to use.
    Returns:
      A set of feature columns
    """
    return set([tf.feature_column.numeric_column(my_feature)
                for my_feature in input_features])
Notes: input_features holds the names of the numerical input features; set() builds an unordered collection of unique elements.

16 Train model

17 Check-off: Train model

18 Check-off: Train model

