Cloud Computing
TensorFlow Python
TensorFlow
Load the necessary libraries and import data

    import math

    from IPython import display
    from matplotlib import cm
    from matplotlib import gridspec
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import sklearn.metrics as metrics
    import tensorflow as tf
    from tensorflow.python.data import Dataset

    # tf.logging is a TensorFlow 1.x API.
    tf.logging.set_verbosity(tf.logging.ERROR)
    pd.options.display.max_rows = 10
    pd.options.display.float_format = '{:.1f}'.format

    california_housing_dataframe = pd.read_csv(
        "https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv",
        sep=",")

    # Randomize the row order.
    california_housing_dataframe = california_housing_dataframe.reindex(
        np.random.permutation(california_housing_dataframe.index))

    # Express median_house_value in units of thousands.
    california_housing_dataframe["median_house_value"] /= 1000.0
    print(california_housing_dataframe)
Set up the input function

    def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
        """Supplies batches of data for training a linear regression model of one feature.

        Args:
          features: pandas DataFrame of features
          targets: pandas DataFrame of targets
          batch_size: Size of batches to be passed to the model
          shuffle: True or False. Whether to shuffle the data.
          num_epochs: Number of epochs for which data should be repeated.
            None = repeat indefinitely
        Returns:
          Tuple of (features, labels) for next data batch
        """
        # Convert pandas data into a dict of np arrays.
        features = {key: np.array(value) for key, value in dict(features).items()}

        # Construct a dataset, and configure batching/repeating.
        ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
        ds = ds.batch(batch_size).repeat(num_epochs)

        # Shuffle the data, if specified.
        if shuffle:
            ds = ds.shuffle(buffer_size=10000)

        # Return the next batch of data (TensorFlow 1.x iterator API).
        features, labels = ds.make_one_shot_iterator().get_next()
        return features, labels

buffer_size: the number of elements held in the buffer from which shuffle randomly samples.
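The role of buffer_size can be illustrated without TensorFlow. The following is a minimal pure-Python sketch of buffer-based shuffling, assuming the usual fill-then-sample behaviour; the function name buffered_shuffle and its seed parameter are hypothetical and are not part of the slides or of the TensorFlow API.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a partially shuffled order using a fixed-size buffer.

    Hypothetical helper for illustration only; it mimics the idea behind a
    shuffle buffer and is not TensorFlow's implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and replace it with the new one.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
```

With a small buffer the output stays close to the input order, because each emitted element can only come from the few elements currently buffered. A buffer at least as large as the dataset is needed for a full shuffle.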
Train the model
Synthetic feature: a feature composed of multiple other features

    california_housing_dataframe["rooms_per_person"] = (
        california_housing_dataframe["total_rooms"] / california_housing_dataframe["population"])

    calibration_data = train_model(
        learning_rate=0.05,
        steps=500,
        batch_size=5,
        input_feature="rooms_per_person")

Use rooms_per_person as the input_feature to train_model().
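As a small illustration of how a synthetic feature is derived, the sketch below builds the same ratio on a toy DataFrame. The numbers are invented for the example and do not come from the California housing data; the variable name toy is hypothetical.

```python
import pandas as pd

# Invented values for illustration only; not rows from the real dataset.
toy = pd.DataFrame({
    "total_rooms": [1000.0, 2400.0, 600.0],
    "population": [500.0, 800.0, 30.0],
})

# The synthetic feature is the element-wise ratio of two existing columns.
toy["rooms_per_person"] = toy["total_rooms"] / toy["population"]

print(toy)
```

The last row shows how a small population can produce an unusually large ratio, which is the kind of value the outlier slides look for.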
Identify outliers
Scatter plot of predictions vs. targets

    plt.figure(figsize=(15, 6))
    plt.subplot(1, 2, 1)
    plt.scatter(calibration_data["predictions"], calibration_data["targets"])

subplot(m, n, k): splits the figure window into an m x n grid and draws in the k-th cell. Cells are numbered from left to right, then top to bottom.
Plot a histogram of rooms_per_person

    plt.subplot(1, 2, 2)
    _ = california_housing_dataframe["rooms_per_person"].hist()
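The subplot(m, n, k) numbering used for the two plots can be sketched in plain Python. The helper subplot_position is hypothetical (it is not a matplotlib function); it only shows which grid cell a given k refers to, assuming the left-to-right, top-to-bottom order described in the slides.

```python
def subplot_position(m, n, k):
    """Return the zero-based (row, column) of cell k in an m x n grid.

    Hypothetical helper: cells are numbered 1..m*n, left to right,
    then top to bottom.
    """
    if not 1 <= k <= m * n:
        raise ValueError("k must be between 1 and m * n")
    # divmod gives (row, column) once k is shifted to start at zero.
    return divmod(k - 1, n)


if __name__ == "__main__":
    # Walk through every cell of a 2 x 3 grid.
    for k in range(1, 7):
        print(k, subplot_position(2, 3, k))
```

For the 1 x 2 grid used in the slides, k=1 is the left cell (the scatter plot) and k=2 is the right cell (the histogram).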
Clip outliers
Set the outlier values of a feature to some reasonable minimum or maximum.
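A minimal sketch of clipping on a generic pandas Series, assuming an upper limit is enough. The values and the threshold CAP are invented for illustration; a suitable limit for real data would be chosen by looking at its histogram.

```python
import pandas as pd

# Invented values; the last one plays the role of an outlier.
feature = pd.Series([1.5, 2.0, 2.5, 3.0, 55.0])

# Hypothetical upper limit chosen for this toy example.
CAP = 5.0

# Replace every value above CAP with CAP; smaller values are unchanged.
clipped = feature.apply(lambda x: min(x, CAP))

# pandas also offers a built-in way to express the same upper clip.
clipped_builtin = feature.clip(upper=CAP)

print(clipped.tolist())
```

Clipping keeps the outlier rows in the dataset but limits how far their values can pull the fitted line.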
Exercise: clip outliers and plot a histogram
Set the outlier values of rooms_per_person to some reasonable minimum or maximum.
Plot a histogram of rooms_per_person.
Hint: use min()
Python 3.5 modules
modules
Packages
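Exercise
Create ramdom_List.py, which generates 10 random numbers from 0 to 100.
You may use: random.sample(range(), )
Create compare.py, which uses the numbers generated by ramdom_List.py to find the maximum and minimum values and to sort them.
You may use: max(), min(), sorted(), sorted(, reverse=True)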
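The hinted functions can be tried out in a single file before splitting the work into two modules. This is only a sketch: the function names make_random_list and summarize are hypothetical, and it assumes that 100 is included in the range and that the 10 numbers should be distinct (random.sample never repeats a value).

```python
import random


def make_random_list(count=10, low=0, high=100, seed=None):
    """Return `count` distinct random integers between low and high inclusive."""
    rng = random.Random(seed)
    # range(low, high + 1) so that `high` itself can be drawn.
    return rng.sample(range(low, high + 1), count)


def summarize(numbers):
    """Return the maximum, minimum, and both sorted orders of `numbers`."""
    return {
        "max": max(numbers),
        "min": min(numbers),
        "ascending": sorted(numbers),
        "descending": sorted(numbers, reverse=True),
    }


numbers = make_random_list(seed=0)
summary = summarize(numbers)

if __name__ == "__main__":
    print(numbers)
    print(summary)
```

In the exercise itself, the first function would live in ramdom_List.py and compare.py would import it, which is what turns the first file into a module.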