0. Summary


  • Installing TensorFlow
  • Basic TensorFlow concepts: Tensor and Session
  • Math operations in TensorFlow
  • Generating initial values for weights and biases in TensorFlow:
n_features = 3
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

bias = tf.Variable(tf.zeros(n_labels))
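  • The softmax() activation function, which converts scores into probabilities
  • One-Hot Encoding, which converts abstract labels or inputs into numeric values that can be computed on
  • Cross entropy, which measures the distance between two vectors; it is commonly used as a loss function to measure the error between predictions and labels
  • Normalized inputs: preprocessing the input data so that it has zero mean and small variance
  • Using cross-validation to judge whether the model is overfitting (the data is split into training, validation, and test sets)
  • Stochastic gradient descent (SGD) and its refinements, momentum and learning rate decay
  • Hyperparameter tuning: mini-batch size / iters_num / epochs
  • Finally, a lab project: learn from the given letter images and tune the hyperparameters until accuracy exceeds 80%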
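
A minimal pure-Python sketch of three items from the list above (softmax, one-hot encoding, and cross entropy). The three-class label set and the helper names are hypothetical and are not part of TensorFlow; the scores reuse the values from the softmax quiz later in these notes.

```python
import math

def one_hot(label, classes):
    """Return a list with 1.0 at the position of `label` and 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in classes]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """D(S, L) = -sum_i L_i * log(S_i), for a one-hot target L."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

classes = ['a', 'b', 'c']         # hypothetical label set
target = one_hot('a', classes)    # label 'a' as a one-hot vector
probs = softmax([3.0, 1.0, 0.2])  # scores from the softmax quiz
loss = cross_entropy(probs, target)
print(target, probs, loss)
```

With a one-hot target, only the probability assigned to the correct class contributes to the loss, so the loss shrinks as that probability approaches 1.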


6. Installing TensorFlow


conda create --name=IntroToTensorFlow python=3 anaconda
source activate IntroToTensorFlow
conda install -c conda-forge tensorflow


import tensorflow as tf

# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

Output: b'Hello World!'


import tensorflow as tf
import numpy as np

# Generate phony data with NumPy: 100 points in total.
x_data = np.float32(np.random.rand(2, 100))  # random inputs
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Build a linear model
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the mean squared error
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize the variables
init = tf.global_variables_initializer()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane
for step in range(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# The best fit is W: [[0.100  0.200]], b: [0.300]
0 [[-0.29607844  0.49112239]] [ 0.74048108]
20 [[-0.0404522   0.21079057]] [ 0.36591059]
40 [[ 0.06075135  0.19667548]] [ 0.32181653]
60 [[ 0.08866727  0.19777994]] [ 0.3069748]
80 [[ 0.09665523  0.19910237]] [ 0.30218849]
100 [[ 0.09899885  0.19968568]] [ 0.3006795]
120 [[ 0.0996977  0.1998966]] [ 0.30020973]
140 [[ 0.09990822  0.19996706]] [ 0.3000645]
160 [[ 0.09997206  0.19998969]] [ 0.3000198]
180 [[ 0.09999147  0.1999968 ]] [ 0.30000606]
200 [[ 0.09999739  0.19999902]] [ 0.30000183]

7. Analyzing the example code:

import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)

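Tensor:

In TensorFlow, data is not stored as plain int or string values; it is wrapped in an object called a tensor. In hello_constant = tf.constant('Hello World!') above, hello_constant is a 0-dimensional string tensor. Tensors can also be defined with other types and other dimensions: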

# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])

Session:



A TensorFlow Session is an environment for running a graph. The session is responsible for assigning operations to GPUs or CPUs, as in the following code:

with tf.Session() as sess:
    output = sess.run(hello_constant)


# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# D is a 0-dimensional string tensor
D = tf.constant("Hello World!")

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(A)
    print("constant(1234):            ",type(output))
    output = sess.run(B)
    print("constant([123,456,789]):   ",type(output))
    output = sess.run(D)
    print("constant('Hello World!'):  ",type(output))
constant(1234):             <class 'numpy.int32'>
constant([123,456,789]):    <class 'numpy.ndarray'>
constant('Hello World!'):   <class 'bytes'>

8. Quiz: Tensorflow Input



x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})

9. Quiz: Tensorflow Math


x = tf.add(5, 2)  # 7
z = tf.subtract(10, 4) # 6
y = tf.multiply(2, 5)  # 10


x = tf.subtract(tf.constant(2.0),tf.constant(1)) 

This raises: TypeError: Input 'y' of 'Sub' Op has type int32 that does not match type float32 of argument 'x'.


x = tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1))   # 1


# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf

# TODO: Convert the following to TensorFlow:
x = tf.constant(10)
y = tf.constant(2)
z = tf.subtract(tf.divide(x,y),tf.cast(tf.constant(1), tf.float64))

# TODO: Print z from a session
with tf.Session() as sess:
    output = sess.run(z)
    print(output)

10. Transition to Classification


  • Using tf.Session
  • Using tf.constant()
  • Using tf.placeholder() and feed_dict to supply input
  • Using the math operations tf.add(), tf.subtract(), tf.multiply(), and tf.divide()

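13. Training Your Logistic Classifier


  • X is the input; each input image has one and only one label.
  • W and b are the weights and the bias; training on the labeled image set finds the most suitable W and b.
  • y is the prediction; passing y through the softmax activation function gives the corresponding probabilities.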
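
As an illustration of the three bullets above, here is a small pure-Python sketch of a logistic classifier that computes softmax(xW + b) for a single input. The function name and the toy values (2 features, 3 labels) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic_classifier(x, W, b):
    """Return class probabilities softmax(xW + b) for one input vector x."""
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(logits)

# Hypothetical toy values: 2 features, 3 labels.
x = [1.0, 2.0]
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
b = [0.0, 0.0, 0.0]

probs = logistic_classifier(x, W, b)
predicted = probs.index(max(probs))  # index of the most probable label
print(probs, predicted)
```

Training then amounts to adjusting W and b so that the probability of the correct label is as high as possible across the labeled images.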

14. TensorFlow Linear Function

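Suppose we want to recognize images as digits with y = Wx + b: x is the pixel values, y is the recognized digit, and W is the weights, which determine how strongly each x influences the result.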

Weights and Bias in TensorFlow:



  • tf.Variable():



n_features = 120
n_labels = 5
weights = tf.Variable(5)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    output = sess.run(weights)


<tf.Variable 'Variable_32:0' shape=() dtype=int32_ref>
  • weight: tf.truncated_normal()


n_features = 3
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    output = sess.run(weights)
    print(output)
    print(output.shape)


[[ 1.66512358  0.47815749  0.11390464  0.04166538 -0.59684688]
 [ 1.19306767  1.85732388  1.12105191 -0.04347496  0.31718659]
 [-0.8618989  -0.52991307  0.99672502  0.87029117  0.34897801]]
(3, 5)

n_features (3) is the number of features and n_labels (5) is the number of output labels, so the weights have shape (3, 5).

  • bias: tf.zeros()


n_features = 3
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    output = sess.run(bias)

15. Quiz: Linear Function



  • Create the weights (random) and the biases (zeros), then compute the prediction:
# Quiz Solution
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))

def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    return tf.Variable(tf.zeros(n_labels))

def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    return tf.add(tf.matmul(input, w), b)

Since xW in xW + b is matrix multiplication, you have to use the tf.matmul() function instead of tf.multiply(). Don’t forget that order matters in matrix multiplication, so tf.matmul(a,b) is not the same as tf.matmul(b,a).
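
To make the point about operand order concrete, here is a small pure-Python sketch; the matmul and shape helpers are hypothetical stand-ins for illustration, not the TensorFlow functions. It shows that a (1, 3) input times a (3, 2) weight matrix is defined, while the swapped order is rejected because the inner dimensions differ.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    assert len(a[0]) == len(b), 'inner dimensions must match'
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def shape(m):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(m), len(m[0]))

x = [[1.0, 2.0, 3.0]]                      # 1 sample, 3 features: shape (1, 3)
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 features, 2 labels: shape (3, 2)

xw = matmul(x, w)                          # (1, 3) x (3, 2) gives (1, 2)
print(shape(xw), xw)

try:
    matmul(w, x)                           # (3, 2) x (1, 3): inner dims 2 and 1 differ
    swapped_ok = True
except AssertionError:
    swapped_ok = False
print(swapped_ok)
```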

  • Load the input and compute:
from tensorflow.examples.tutorials.mnist import input_data

def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels

# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

Note that the variables must first be initialized with global_variables_initializer:

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

16. Linear Update



import numpy as np
t = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
u = np.array([1, 2, 3])
print(t + u)
[[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]
 [11 13 15]]

Broadcasting adds [1, 2, 3] to every row of t.

17. Quiz: Softmax



# Solution is available in the other "solution.py" tab
import numpy as np
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    # TODO: Compute and return softmax(x)
    exp_x = np.exp(x)
    sum_exp_x = np.sum(exp_x, axis=0)
    y = exp_x / sum_exp_x
    return y

logits = [3.0, 1.0, 0.2]
print(softmax(logits))
[0.8360188  0.11314284 0.05083836]

18. Quiz: TensorFlow Softmax Workspaces


x = tf.nn.softmax([3.0, 1.0, 0.2])

with tf.Session() as sess:
    output = sess.run(x)
    print(output)
[0.8360188  0.11314284 0.05083836]


# Quiz Solution
import tensorflow as tf
def run():
    output = None
    logit_data = [3.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output


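Output: [ 0.84008306 0.11369288 0.04622407]

  1. If the logits are multiplied by 10, the probabilities computed by softmax move toward either 0 or 1.

  2. If the logits are divided by 10, the resulting probabilities approach a uniform distribution.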


[  1.00000000e+00   2.06115369e-09   2.54366569e-13]

[ 0.32757813  0.35133022  0.32109165]
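
The two effects can be checked with a small pure-Python sketch. It assumes the quiz logits [3.0, 1.0, 0.1] and uses a hypothetical softmax helper rather than tf.nn.softmax, so the printed digits may differ slightly from the TensorFlow output.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.0, 0.1]
original = softmax(logits)
scaled_up = softmax([s * 10 for s in logits])    # sharper: close to 0 or 1
scaled_down = softmax([s / 10 for s in logits])  # flatter: closer to uniform

print(original)
print(scaled_up)
print(scaled_down)
```

Scaling the logits up makes the classifier more confident, while scaling them down makes it less sure of any single class.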

19. One-Hot Encoding


21. Cross Entropy




22. Minimizing Cross Entropy



23. Practical Aspects of Learning


  1. How do you feed image pixels to this classifier?
  2. Where do you initialize the optimization?

24. Quiz: Numerical Stability


a = 1000000000
for i in range(1000000):
    a = a + 1e-6
print(a - 1000000000)
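
Mathematically the snippet above should print 1.0, but near one billion a 64-bit float can only represent values about 1.2e-7 apart, so each addition of 1e-6 is rounded. The following sketch, with a hypothetical accumulate helper, repeats the experiment from a large and a small starting value to show that the error comes from mixing very large and very small numbers.

```python
def accumulate(start, step=1e-6, n=1000000):
    """Add `step` to `start` n times and return how much the total grew."""
    total = start
    for _ in range(n):
        total += step
    return total - start

large = accumulate(1000000000)  # large start: each addition is rounded noticeably
small = accumulate(1)           # small start: rounding error stays negligible
print(large, small)
```

This is one motivation for keeping the values involved in the loss computation small and of similar magnitude.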


25. Normalized Inputs and Initial Weights



  1. Mean close to 0
  2. Small variance





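Good initial values for the weights and biases help gradient descent. A simple approach is to draw the initial weights at random from a Gaussian distribution with mean 0 and standard deviation σ (the square root of the mean of the squared deviations from the mean).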
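
A rough pure-Python sketch of both ideas follows. The pixel scaling (p - 128) / 128, the value sigma = 0.1, the seed, and the helper names are assumptions made for illustration, not values fixed by these notes.

```python
import random
import statistics

def normalize_pixels(pixels):
    """Map 0..255 pixel values to roughly [-1, 1] with a mean near 0."""
    return [(p - 128.0) / 128.0 for p in pixels]

def init_weights(n_in, n_out, sigma=0.1, seed=0):
    """Draw an n_in x n_out weight matrix from a Gaussian with mean 0, std sigma."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_out)] for _ in range(n_in)]

pixels = [0, 64, 128, 192, 255]
scaled = normalize_pixels(pixels)

weights = init_weights(784, 10)
flat = [w for row in weights for w in row]

print(scaled)
print(statistics.mean(flat), statistics.pstdev(flat))
```

A small sigma starts the model with uncertain, nearly uniform predictions, whereas a large sigma starts it with strongly peaked ones.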




26. Measuring Performance



27. Transition: Overfitting


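Cross-validation is a technique for evaluating ML models: several models are trained on subsets of the available input data and each is evaluated on the complementary subset. Use cross-validation to detect overfitting, that is, a failure to generalize a pattern.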
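
One common way to form the training subsets and their complementary evaluation subsets is k-fold splitting. The sketch below only generates the index sets, with a hypothetical k_fold_indices helper; training and scoring a model on each fold is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs; every sample is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, valid_idx in folds:
    print('validate on', valid_idx, 'train on', train_idx)
```

If accuracy on the training indices is much higher than on the held-out indices across folds, the model is likely overfitting.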



32. Stochastic Gradient Descent


33. Momentum and Learning Rate Decay

See the separate note on parameter updates (参数的更新).
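
As a rough illustration of the two ideas in this section's title, the sketch below minimizes f(w) = w^2 with a momentum update and a decaying learning rate. The inverse-time decay schedule, the hyperparameter values, and the function name are assumptions chosen for the example, not the course's settings.

```python
def sgd_momentum(grad_fn, w, learning_rate=0.1, momentum=0.9, decay=0.01, steps=200):
    """Gradient descent with momentum and inverse-time learning rate decay."""
    velocity = 0.0
    for step in range(steps):
        lr = learning_rate / (1.0 + decay * step)         # learning rate shrinks over time
        velocity = momentum * velocity - lr * grad_fn(w)  # running average of past gradients
        w = w + velocity
    return w

# f(w) = w^2 has gradient 2w and its minimum at w = 0.
w_final = sgd_momentum(lambda w: 2.0 * w, w=5.0)
print(w_final)
```

Momentum smooths the update direction across steps, and the decaying learning rate takes smaller steps as training approaches a minimum.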

34. Parameter Hyperspace (hyperparameter tuning)

See the separate note on hyperparameter validation (超参数的验证).

35. Quiz 2: Mini-batch

import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
    return output_batches

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
36. Epochs

The note 【书】深度学习入门-03-神经网络 ([Book] Introduction to Deep Learning, 03: Neural Networks) covers this usage in more detail.

An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.

The following TensorFlow code trains a model using 10 epochs.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches  # Helper function created in Mini-batching section

def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """Print cost and validation accuracy of an epoch"""
    current_cost = sess.run(
        cost, feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy, feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i, current_cost, valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))


Epoch: 0    - Cost: 11.0     Valid Accuracy: 0.204
Epoch: 1    - Cost: 9.95     Valid Accuracy: 0.229
Epoch: 2    - Cost: 9.18     Valid Accuracy: 0.246
Epoch: 3    - Cost: 8.59     Valid Accuracy: 0.264
Epoch: 4    - Cost: 8.13     Valid Accuracy: 0.283
Epoch: 5    - Cost: 7.77     Valid Accuracy: 0.301
Epoch: 6    - Cost: 7.47     Valid Accuracy: 0.316
Epoch: 7    - Cost: 7.2      Valid Accuracy: 0.328
Epoch: 8    - Cost: 6.96     Valid Accuracy: 0.342
Epoch: 9    - Cost: 6.73     Valid Accuracy: 0.36 
Test Accuracy: 0.3801000118255615


Epoch: 65   - Cost: 0.122    Valid Accuracy: 0.868
Epoch: 66   - Cost: 0.121    Valid Accuracy: 0.868
Epoch: 67   - Cost: 0.12     Valid Accuracy: 0.868
Epoch: 68   - Cost: 0.119    Valid Accuracy: 0.868
Epoch: 69   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 70   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 71   - Cost: 0.117    Valid Accuracy: 0.868
Epoch: 72   - Cost: 0.116    Valid Accuracy: 0.868
Epoch: 73   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 74   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 75   - Cost: 0.114    Valid Accuracy: 0.868
Epoch: 76   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 77   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 78   - Cost: 0.112    Valid Accuracy: 0.868
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.868
Epoch: 80   - Cost: 0.111    Valid Accuracy: 0.869
Test Accuracy: 0.86909999418258667

37. Intro TensorFlow Neural Network

The notebook has 3 problems for you to solve:

  • Problem 1: Normalize the features
  • Problem 2: Use TensorFlow operations to create features, labels, weight, and biases tensors
  • Problem 3: Tune the learning rate, number of steps, and batch size for the best accuracy




37.1 Import the required libraries

import hashlib
import os
import pickle
from urllib.request import urlretrieve

import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import resample
from tqdm import tqdm
from zipfile import ZipFile

print('All modules imported.')

37.2 Download the data

def download(url, file):
    """
    Download file from <url>
    :param url: URL to file
    :param file: Local file path
    """
    if not os.path.isfile(file):
        print('Downloading ' + file + '...')
        urlretrieve(url, file)
        print('Download Finished')

# Download the training and test dataset.
download('https://s3.amazonaws.com/udacity-sdc/notMNIST_train.zip', 'notMNIST_train.zip')
download('https://s3.amazonaws.com/udacity-sdc/notMNIST_test.zip', 'notMNIST_test.zip')

# Make sure the files aren't corrupted
assert hashlib.md5(open('notMNIST_train.zip', 'rb').read()).hexdigest() == 'c8673b3f28f489e9cdf3a3d74e2ac8fa',\
        'notMNIST_train.zip file is corrupted.  Remove the file and try again.'
assert hashlib.md5(open('notMNIST_test.zip', 'rb').read()).hexdigest() == '5d3c7e653e63471c88df796156a9dfa9',\
        'notMNIST_test.zip file is corrupted.  Remove the file and try again.'

# Wait until you see that all files have been downloaded.
print('All files downloaded.')
Downloading notMNIST_train.zip...
Download Finished
Downloading notMNIST_test.zip...
Download Finished
All files downloaded.

37.3 Uncompress the data

def uncompress_features_labels(file):
    """
    Uncompress features and labels from a zip file
    :param file: The zip file to extract the data from
    """
    features = []
    labels = []

    with ZipFile(file) as zipf:
        # Progress Bar
        filenames_pbar = tqdm(zipf.namelist(), unit='files')
        # Get features and labels from all files
        for filename in filenames_pbar:
            # Check if the file is a directory
            if not filename.endswith('/'):
                with zipf.open(filename) as image_file:
                    image = Image.open(image_file)
                    # Load image data as 1 dimensional array
                    # We're using float32 to save on memory space
                    feature = np.array(image, dtype=np.float32).flatten()

                # Get the letter from the filename.  This is the letter of the image.
                label = os.path.split(filename)[1][0]

                features.append(feature)
                labels.append(label)
    return np.array(features), np.array(labels)

# Get the features and labels from the zip files
train_features, train_labels = uncompress_features_labels('notMNIST_train.zip')
test_features, test_labels = uncompress_features_labels('notMNIST_test.zip')

# Limit the amount of data to work with a docker container
docker_size_limit = 150000
train_features, train_labels = resample(train_features, train_labels, n_samples=docker_size_limit)

# Set flags for feature engineering.  This will prevent you from skipping an important step.
is_features_normal = False
is_labels_encod = False

# Wait until you see that all features and labels have been uncompressed.
print('All features and labels uncompressed.')

37.4 Problem 1: Normalize the input data




# Problem 1 - Implement Min-Max scaling for grayscale image data
def normalize_grayscale(image_data):
    """
    Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]
    :param image_data: The image data to be normalized
    :return: Normalized image data
    """
    # TODO: Implement Min-Max scaling for grayscale image data
    a = 0.1
    b = 0.9
    grayscale_min = 0
    grayscale_max = 255
    return a + ( ( (image_data - grayscale_min)*(b - a) )/( grayscale_max - grayscale_min ) )

# Test Cases
np.testing.assert_array_almost_equal(
    normalize_grayscale(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 255])),
    [0.1, 0.103137254902, 0.106274509804, 0.109411764706, 0.112549019608, 0.11568627451, 0.118823529412, 0.121960784314,
     0.125098039216, 0.128235294118, 0.13137254902, 0.9],
    decimal=3)
np.testing.assert_array_almost_equal(
    normalize_grayscale(np.array([0, 1, 10, 20, 30, 40, 233, 244, 254,255])),
    [0.1, 0.103137254902, 0.13137254902, 0.162745098039, 0.194117647059, 0.225490196078, 0.830980392157, 0.865490196078,
     0.896862745098, 0.9])

if not is_features_normal:
    train_features = normalize_grayscale(train_features)
    test_features = normalize_grayscale(test_features)
    is_features_normal = True

print('Tests Passed!')

37.5 One-Hot Encoded


if not is_labels_encod:
    # Turn labels into numbers and apply One-Hot Encoding
    encoder = LabelBinarizer()
    encoder.fit(train_labels)
    train_labels = encoder.transform(train_labels)
    test_labels = encoder.transform(test_labels)

    # Change to float32, so it can be multiplied against the features in TensorFlow, which are float32
    train_labels = train_labels.astype(np.float32)
    test_labels = test_labels.astype(np.float32)
    is_labels_encod = True

print('Labels One-Hot Encoded')
  • print(test_labels) shows rows such as [1. 0. 0. ..., 0. 0. 0.]

  • print(test_labels.shape) outputs (10000, 10): the labels are the letters A-J, so there are 10 classes.

37.6 Get the training and validation data

assert is_features_normal, 'You skipped the step to normalize the features'
assert is_labels_encod, 'You skipped the step to One-Hot Encode the labels'

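# Get randomized datasets for training and validation
# NOTE: the call arguments are reconstructed; test_size=0.05 is an assumed value
train_features, valid_features, train_labels, valid_labels = train_test_split(
    train_features, train_labels, test_size=0.05)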

print('Training features and labels randomized and split.')

37.7 Save the data to a pickle file

# Save the data for easy access
pickle_file = 'notMNIST.pickle'
if not os.path.isfile(pickle_file):
    print('Saving data to pickle file...')
    try:
        with open('notMNIST.pickle', 'wb') as pfile:
            pickle.dump(
                {
                    'train_dataset': train_features,
                    'train_labels': train_labels,
                    'valid_dataset': valid_features,
                    'valid_labels': valid_labels,
                    'test_dataset': test_features,
                    'test_labels': test_labels,
                },
                pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', pickle_file, ':', e)
        raise

print('Data cached in pickle file.')


37.8 Load the data

%matplotlib inline

# Load the modules
import pickle
import math

import numpy as np
import tensorflow as tf
from tqdm import tqdm
import matplotlib.pyplot as plt

# Reload the data
pickle_file = 'notMNIST.pickle'
with open(pickle_file, 'rb') as f:
  pickle_data = pickle.load(f)
  train_features = pickle_data['train_dataset']
  train_labels = pickle_data['train_labels']
  valid_features = pickle_data['valid_dataset']
  valid_labels = pickle_data['valid_labels']
  test_features = pickle_data['test_dataset']
  test_labels = pickle_data['test_labels']
  del pickle_data  # Free up memory

print('Data and modules loaded.')

37.9 Problem 2

  • features
    • Placeholder tensor for feature data (train_features/valid_features/test_features)
  • labels
    • Placeholder tensor for label data (train_labels/valid_labels/test_labels)
  • weights
  • biases


features_count = 784
labels_count = 10

# Problem 2 - Set the features and labels tensors
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Problem 2 - Set the weights and biases tensors
weights = tf.Variable(tf.truncated_normal((features_count, labels_count)))
biases = tf.Variable(tf.zeros(labels_count))


#Test Cases
from tensorflow.python.ops.variables import Variable

assert features._op.name.startswith('Placeholder'), 'features must be a placeholder'
assert labels._op.name.startswith('Placeholder'), 'labels must be a placeholder'
assert isinstance(weights, Variable), 'weights must be a TensorFlow variable'
assert isinstance(biases, Variable), 'biases must be a TensorFlow variable'

assert features._shape == None or (\
    features._shape.dims[0].value is None and\
    features._shape.dims[1].value in [None, 784]), 'The shape of features is incorrect'
assert labels._shape  == None or (\
    labels._shape.dims[0].value is None and\
    labels._shape.dims[1].value in [None, 10]), 'The shape of labels is incorrect'
assert weights._variable._shape == (784, 10), 'The shape of weights is incorrect'
assert biases._variable._shape == (10), 'The shape of biases is incorrect'

assert features._dtype == tf.float32, 'features must be type float32'
assert labels._dtype == tf.float32, 'labels must be type float32'

# Feed dicts for training, validation, and test session
train_feed_dict = {features: train_features, labels: train_labels}
valid_feed_dict = {features: valid_features, labels: valid_labels}
test_feed_dict = {features: test_features, labels: test_labels}

# Linear Function WX + b
logits = tf.matmul(features, weights) + biases

prediction = tf.nn.softmax(logits)

# Cross entropy
cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

# some students have encountered challenges using this function, and have resolved issues
# using https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits
# please see this thread for more detail https://discussions.udacity.com/t/accuracy-0-10-in-the-intro-to-tensorflow-lab/272469/9

# Training loss
loss = tf.reduce_mean(cross_entropy)

# Create an operation that initializes all variables
init = tf.global_variables_initializer()

# Test Cases
with tf.Session() as session:
    session.run(init)
    session.run(loss, feed_dict=train_feed_dict)
    session.run(loss, feed_dict=valid_feed_dict)
    session.run(loss, feed_dict=test_feed_dict)
    biases_data = session.run(biases)

assert not np.count_nonzero(biases_data), 'biases must be zeros'

print('Tests Passed!')
# Determine if the predictions are correct
is_correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(labels, 1))
# Calculate the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(is_correct_prediction, tf.float32))

print('Accuracy function created.')

37.10 Problem 3: Tune the hyperparameters

# TODO: Find the best parameters for each configuration
epochs = 5
batch_size = 100
learning_rate = 0.2

# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)    

# The accuracy measured against the validation set
validation_accuracy = 0.0

# Measurements use for graphing loss and accuracy
log_batch_step = 50
batches = []
loss_batch = []
train_acc_batch = []
valid_acc_batch = []

with tf.Session() as session:
    session.run(init)
    batch_count = int(math.ceil(len(train_features)/batch_size))

    for epoch_i in range(epochs):
        # Progress bar
        batches_pbar = tqdm(range(batch_count), desc='Epoch {:>2}/{}'.format(epoch_i+1, epochs), unit='batches')
        # The training cycle
        for batch_i in batches_pbar:
            # Get a batch of training features and labels
            batch_start = batch_i*batch_size
            batch_features = train_features[batch_start:batch_start + batch_size]
            batch_labels = train_labels[batch_start:batch_start + batch_size]

            # Run optimizer and get loss
            _, l = session.run(
                [optimizer, loss],
                feed_dict={features: batch_features, labels: batch_labels})

            # Log every 50 batches
            if not batch_i % log_batch_step:
                # Calculate Training and Validation accuracy
                training_accuracy = session.run(accuracy, feed_dict=train_feed_dict)
                validation_accuracy = session.run(accuracy, feed_dict=valid_feed_dict)

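                # Log batches
                previous_batch = batches[-1] if batches else 0
                batches.append(log_batch_step + previous_batch)
                loss_batch.append(l)
                train_acc_batch.append(training_accuracy)
                valid_acc_batch.append(validation_accuracy)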

        # Check accuracy against Validation data
        validation_accuracy = session.run(accuracy, feed_dict=valid_feed_dict)

loss_plot = plt.subplot(211)
loss_plot.plot(batches, loss_batch, 'g')
loss_plot.set_xlim([batches[0], batches[-1]])
acc_plot = plt.subplot(212)
acc_plot.plot(batches, train_acc_batch, 'r', label='Training Accuracy')
acc_plot.plot(batches, valid_acc_batch, 'x', label='Validation Accuracy')
acc_plot.set_ylim([0, 1.0])
acc_plot.set_xlim([batches[0], batches[-1]])

print('Validation accuracy at {}'.format(validation_accuracy))


  • epochs = 1, batch_size = 50, learning_rate = 0.01


  • epochs = 1, batch_size = 100, learning_rate = 0.01


  • epochs = 5, batch_size = 100, learning_rate = 0.2


39. epochs, batch_size, and iters_num

1. batch_size:



x, t = get_data()
network = init_network()
batch_size = 100 # batch size
accuracy_cnt = 0

for i in range(0, len(x), batch_size):
    x_batch = x[i:i+batch_size]
    y_batch = predict(network, x_batch)

    # Find the index of the largest element along axis 1
    p = np.argmax(y_batch, axis=1) 
    accuracy_cnt += np.sum(p == t[i:i+batch_size])

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))
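  1. First, range() generates the start index of each batch_size-sized chunk; for example, list(range(0, 10, 3)) gives [0, 3, 6, 9].
  2. x_batch holds one batch of input data, which is then computed in a single pass using NumPy's broadcasting.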
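
The slicing pattern from the two points above can be tried on a plain list. This is a minimal sketch with a hypothetical iterate_batches helper; note that the last batch is shorter when the data size is not a multiple of batch_size.

```python
def iterate_batches(data, batch_size):
    """Yield consecutive slices of `data`, each at most `batch_size` long."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

starts = list(range(0, 10, 3))  # start index of each batch
batches = list(iterate_batches(list(range(10)), batch_size=3))

print(starts)
print(batches)
```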

2. iters_num:


# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow imports from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 1000  # number of iterations, set as appropriate
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    # Compute the gradient
    grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation
    #grad = network.gradient(x_batch, t_batch) # backpropagation
    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    # Record the learning progress
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

# Plot the results
markers = {'train': 'o', 'test': 's'}
x = np.arange(iters_num)
plt.plot(x, train_loss_list)
plt.ylim(0, 5)


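Above, the mini-batch size is 100: each iteration randomly draws 100 samples from the 60,000 training samples, computes the gradient on that mini-batch, and updates the parameters with SGD. The gradient method runs for 1000 updates here; after each update the loss on the current mini-batch is computed and appended to a list, which is then plotted.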

3. epoch




# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow imports from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000  # number of iterations, set as appropriate
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    # Compute the gradient, here by numerical differentiation
    grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation
    #grad = network.gradient(x_batch, t_batch) # backpropagation
    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))

# Plot the results
markers = {'train': 'o', 'test': 's'}
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.ylim(0, 1.0)
plt.legend(loc='lower right')


0.1043 , 0.1041
0.904633333333 , 0.9079
0.921 , 0.9236
0.9321 , 0.9338
0.9436 , 0.9426
0.95025 , 0.9494
0.956133333333 , 0.9531
0.960166666667 , 0.9564
0.9638 , 0.959
0.965933333333 , 0.9607
0.9682 , 0.9619
0.970266666667 , 0.9621
0.97075 , 0.9641
0.973583333333 , 0.9669
0.974083333333 , 0.9663
0.975666666667 , 0.9664
0.9777 , 0.9683


The check if i % iter_per_epoch == 0: controls how often the accuracy is computed:

iters_num = 10000  # number of iterations, set as appropriate
train_size = x_train.shape[0]
batch_size = 100
# number of training samples / batch size (samples processed per step)
iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
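
To see how often that check fires, the sketch below plugs in the numbers used in these notes (60,000 training samples, batch size 100, 10,000 iterations) and lists the iterations at which accuracy would be evaluated. It uses integer division for iter_per_epoch, which is an assumption made here to keep the modulo arithmetic in integers.

```python
train_size = 60000   # number of training samples mentioned in these notes
batch_size = 100
iters_num = 10000

# One epoch is the number of mini-batch updates needed to see train_size samples.
iter_per_epoch = max(train_size // batch_size, 1)

# Iterations at which the accuracy check would run.
eval_steps = [i for i in range(iters_num) if i % iter_per_epoch == 0]

print(iter_per_epoch)                    # updates per epoch
print(len(eval_steps))                   # number of accuracy evaluations
print(eval_steps[:3], eval_steps[-1])
```

With these values the accuracy is computed once every 600 updates, which gives 17 evaluations over 10,000 iterations and matches the 17 lines of accuracy output shown earlier.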