Haste makes waste

Nano01 (Self-Driving Car)-U03-Lesson12-TensorFlow

Posted on By lijun

image

0. Summary

This post covers:

  • Installing TensorFlow
  • Basic TensorFlow concepts: Tensor and Session
  • Math operations in TensorFlow
  • Generating initial values for the weights and biases in TensorFlow:
n_features = 3
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

bias = tf.Variable(tf.zeros(n_labels))
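  • The softmax activation function, which converts scores into probabilities
  • One-hot encoding, which turns abstract labels or inputs into numbers that can be computed with
  • Cross entropy, which measures the distance between two vectors; it is commonly used in the loss function to measure the error between the predictions and the labels
  • Normalized inputs: preprocessing the input data so that it has zero mean and a small variance
  • Using cross-validation to check whether the model overfits (splitting the data into training, validation, and test sets)
  • Stochastic gradient descent (SGD) and its improvements, momentum and learning rate decay
  • Hyperparameter tuning: mini-batch size / iters_num / epochs
  • Finally, a lab project: train on the given images of letters and tune the hyperparameters until the accuracy exceeds 80%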

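This chapter covers a lot of material and is hard to follow from the videos alone. I only roughly understood it after reading it alongside the book «深度学习入门(基于Python的理论与实现)» (Introduction to Deep Learning: Theory and Implementation Based on Python). Many details of the code are still not fully clear to me.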

6. Installing TensorFlow

Installing TensorFlow with Conda:

conda create --name=IntroToTensorFlow python=3 anaconda
source activate IntroToTensorFlow
conda install -c conda-forge tensorflow

Test code 1:

import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

Output: b'Hello World!'

Test code 2:

import tensorflow as tf
import numpy as np

# Generate phony data with NumPy: 100 points in total
x_data = np.float32(np.random.rand(2, 100)) # random input
y_data = np.dot([0.100, 0.200], x_data) + 0.300

# Build a linear model
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

# Minimize the mean squared error
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize the variables (tf.initialize_all_variables() is deprecated)
init = tf.global_variables_initializer()

# Launch the graph
sess = tf.Session()
sess.run(init)

# Fit the plane
for step in range(0, 201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# Output: the fit approaches W: [[0.100  0.200]], b: [0.300]
0 [[-0.29607844  0.49112239]] [ 0.74048108]
20 [[-0.0404522   0.21079057]] [ 0.36591059]
40 [[ 0.06075135  0.19667548]] [ 0.32181653]
60 [[ 0.08866727  0.19777994]] [ 0.3069748]
80 [[ 0.09665523  0.19910237]] [ 0.30218849]
100 [[ 0.09899885  0.19968568]] [ 0.3006795]
120 [[ 0.0996977  0.1998966]] [ 0.30020973]
140 [[ 0.09990822  0.19996706]] [ 0.3000645]
160 [[ 0.09997206  0.19998969]] [ 0.3000198]
180 [[ 0.09999147  0.1999968 ]] [ 0.30000606]
200 [[ 0.09999739  0.19999902]] [ 0.30000183]

7. Analyzing the sample code:

import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

Tensor:

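In TensorFlow, data is not stored as plain integers, strings and so on. Instead, it is wrapped in an object called a tensor. For example, in hello_constant = tf.constant('Hello World!') above, hello_constant is a 0-dimensional string tensor. Tensors can also be defined with other types and other numbers of dimensions: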

# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])
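A tensor's rank ("dimension") is how deeply its value is nested. Its shape lists the length at each level of nesting. The following is a rough plain-Python analogue of A, B and C above. shape_of is a hypothetical helper, not a TensorFlow function, and it assumes the nested lists are regular (not ragged):

```python
def shape_of(value):
    """Return the shape of a regular nested list, like a tensor's shape.

    A scalar has shape (), a flat list of n items has shape (n,), and a list
    of m lists of length n has shape (m, n). Ragged lists are not checked.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


A = 1234
B = [123, 456, 789]
C = [[123, 456, 789], [222, 333, 444]]

for name, v in [("A", A), ("B", B), ("C", C)]:
    # The rank is simply the number of entries in the shape
    print(name, "shape:", shape_of(v), "rank:", len(shape_of(v)))
```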

Session:

The code above can be represented by the following graph:

image

A TensorFlow Session is an environment for running such a graph. The session is responsible for allocating operations to the GPU(s) and/or CPU(s). For example:

with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)

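This code evaluates the tensor hello_constant. tf.Session() first creates a session, and then sess.run() evaluates the tensor and returns its value. The type of the returned value depends on the tensor: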

# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# D is a 0-dimensional string tensor
D = tf.constant("Hello World!")

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(A)
    print("constant(1234):            ",type(output))
    
    output = sess.run(B)
    print("constant([123,456,789]):   ",type(output))
    
    output = sess.run(D)
    print("constant('Hello World!'):  ",type(output))
    
constant(1234):             <class 'numpy.int32'>
constant([123,456,789]):    <class 'numpy.ndarray'>
constant('Hello World!'):   <class 'bytes'>

8. Quiz: Tensorflow Input

In the previous section we passed a tensor into a session and got the result back, but that tensor was a constant. To feed in non-constant input, use tf.placeholder() together with feed_dict.

Sample code:

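import tensorflow as tf

# A placeholder has no value of its own; the value is supplied at run time via feed_dict
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
    print(output)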

9. Quiz: Tensorflow Math

Sample code:

import tensorflow as tf

x = tf.add(5, 2)  # 7
z = tf.subtract(10, 4) # 6
y = tf.multiply(2, 5)  # 10

Applying an operation to tensors of different types raises an error:

x = tf.subtract(tf.constant(2.0),tf.constant(1)) 

The error is: TypeError: Input 'y' of 'Sub' Op has type int32 that does not match type float32 of argument 'x'.

To fix it, cast one operand so that both have the same type:

x = tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1))   # 1

Quiz:

# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf

# TODO: Convert the following to TensorFlow:
x = tf.constant(10)
y = tf.constant(2)
z = tf.subtract(tf.divide(x,y),tf.cast(tf.constant(1), tf.float64))

# TODO: Print z from a session
with tf.Session() as sess:
    output = sess.run(z)
    print(output)

10. Transition to Classification

To recap, the sections above covered:

  • Using tf.Session
  • Using tf.constant()
  • Using tf.placeholder() and feed_dict to get input
  • Using the math operations tf.add(), tf.subtract(), tf.multiply(), and tf.divide()

13. Training Your Logistic Classifier

image

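  • X is the input. Each input image has exactly one label.
  • W and b are the weights and bias. Training on the labeled image set finds the W and b that fit best.
  • y is the prediction. Passing y through the softmax activation function turns it into the corresponding probabilities.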

14. TensorFlow Linear Function

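Suppose we want to recognize images as digits using y = Wx + b. Here x is the pixel values and y is the output used to decide which digit the image shows. W is the weights, which determine how much each x influences the result.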

Weights and Bias in TensorFlow:

The ultimate goal of training a neural network is to find the best weights and bias, so that the predictions match the labels as closely as possible.

In TensorFlow, the tensors used for the weights and bias must be modifiable. Neither tf.placeholder() nor tf.constant() can be modified, so tf.Variable() is needed.

  • tf.Variable():

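tf.Variable() creates a modifiable tensor with an initial value. The tensor's state is stored in the session, so it must be initialized manually. tf.global_variables_initializer() initializes the state of all variable tensors.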

Sample code:

n_features = 120
n_labels = 5
weights = tf.Variable(5)
print(weights)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    output = sess.run(weights)
    print(output)

Output:

<tf.Variable 'Variable_32:0' shape=() dtype=int32_ref>
5
  • weight: tf.truncated_normal()

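tf.truncated_normal() generates random values from a truncated normal distribution: values more than two standard deviations from the mean are dropped and re-drawn. Sample code: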

n_features = 3
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    output = sess.run(weights)
    print(output)    
    print(output.shape)

Output:

[[ 1.66512358  0.47815749  0.11390464  0.04166538 -0.59684688]
 [ 1.19306767  1.85732388  1.12105191 -0.04347496  0.31718659]
 [-0.8618989  -0.52991307  0.99672502  0.87029117  0.34897801]]
(3, 5)

n_features (3) is the number of features and n_labels (5) is the number of output labels, so the weights have shape (3, 5).
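The truncation rule can be sketched in plain Python with random.gauss: draw from a normal distribution and re-draw any value more than two standard deviations from the mean. The function name truncated_normal and its arguments are illustrative. This is not TensorFlow's implementation, only the same sampling rule:

```python
import random


def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Sample a (rows, cols) list of lists from a normal distribution,
    re-drawing any value more than 2 standard deviations from the mean."""
    rng = random.Random(seed)
    rows, cols = shape
    out = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            x = rng.gauss(mean, stddev)
            while abs(x - mean) > 2 * stddev:  # outside the cut-off: reject and re-draw
                x = rng.gauss(mean, stddev)
            row.append(x)
        out.append(row)
    return out


n_features, n_labels = 3, 5
weights = truncated_normal((n_features, n_labels), seed=0)
print(len(weights), len(weights[0]))  # 3 5, the same shape as in the TF example
```

Truncating the tails keeps any single initial weight from being unusually large, which fits the goal of starting from small random weights.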

  • bias: tf.zeros()

It creates a tensor filled with zeros:

n_features = 3
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    output = sess.run(bias)
    print(output)    
    print(output.shape)

15. Quiz: Linear Function

image

Use TensorFlow to classify the handwritten digit images above. The code is as follows:

  • Create the weights (random values) and the biases (zeros), then compute the prediction:
# Quiz Solution
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    return tf.Variable(tf.zeros(n_labels))


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    return tf.add(tf.matmul(input, w), b)

Since xW in xW + b is matrix multiplication, you have to use the tf.matmul() function instead of tf.multiply(). Don’t forget that order matters in matrix multiplication, so tf.matmul(a,b) is not the same as tf.matmul(b,a).
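To see why the order matters, here is xW + b written with plain Python lists. matmul and linear are illustrative helpers, not TensorFlow functions. With one input row x of shape (1, 3) and W of shape (3, 2), xW has shape (1, 2). W·x is undefined because the inner dimensions (2 and 1) do not match:

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def linear(x, w, b):
    """Compute xW + b, adding the bias vector b to every row of xW."""
    return [[v + b_j for v, b_j in zip(row, b)] for row in matmul(x, w)]


x = [[1.0, 2.0, 3.0]]                      # 1 sample with 3 features
w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 3 features -> 2 labels
b = [0.0, 1.0]                             # one bias per label
print(linear(x, w, b))                     # one row with 2 scores
```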

  • Get the input data and run the computation:
from tensorflow.examples.tutorials.mnist import input_data

def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

Note that the variables must first be initialized with global_variables_initializer:

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

16. Linear Update

image

Finally, the bias is added to xW. This relies on broadcasting in NumPy and TensorFlow:

import numpy as np
t = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
u = np.array([1, 2, 3])
print(t + u)
[[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]
 [11 13 15]]

[1, 2, 3] is added to every row of t.
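Broadcasting treats u as if it were copied onto every row of t. The same addition can be written out in plain Python. add_row_vector is a hypothetical helper that covers only this row-vector case, not NumPy's general broadcasting rules:

```python
def add_row_vector(matrix, vector):
    """Add `vector` to every row of `matrix`, like NumPy broadcasting
    a shape-(n,) array against a shape-(m, n) array."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("row length must match the vector length")
    return [[x + v for x, v in zip(row, vector)] for row in matrix]


t = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
u = [1, 2, 3]
print(add_row_vector(t, u))
# [[2, 4, 6], [5, 7, 9], [8, 10, 12], [11, 13, 15]]
```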

17. Quiz: Softmax

image

The linear function above produces scores (logits). These now need to be converted into probabilities with the softmax(x) function:

# Solution is available in the other "solution.py" tab
import numpy as np
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    # TODO: Compute and return softmax(x)
    
    exp_x = np.exp(x)
    sum_exp_x = np.sum(exp_x, axis=0)
    y = exp_x / sum_exp_x
    
    return y

logits = [3.0, 1.0, 0.2]
print(softmax(logits))
[0.8360188  0.11314284 0.05083836]
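The version above works for a 1-D list of scores, but np.exp overflows to inf for large logits such as 1000. A common remedy is to subtract the largest score before exponentiating. This does not change the result, because softmax is unchanged when the same constant is subtracted from every score. A plain-Python sketch (stable_softmax is an illustrative name):

```python
import math


def stable_softmax(scores):
    """Softmax of a 1-D list of scores, shifted by max(scores) to avoid overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]


print(stable_softmax([3.0, 1.0, 0.2]))   # about [0.836, 0.113, 0.051], as above
print(stable_softmax([1000.0, 999.0]))   # works, although exp(1000) itself would overflow
```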

18. Quiz: TensorFlow Softmax Workspaces

TensorFlow provides a function that computes softmax directly:

x = tf.nn.softmax([3.0, 1.0, 0.2])

with tf.Session() as sess:
    output = sess.run(x)
    print(output)
[0.8360188  0.11314284 0.05083836]

Quiz code:

# Quiz Solution
import tensorflow as tf
def run():
    output = None
    logit_data = [3.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output

print(run())

Output: [ 0.84008306 0.11369288 0.04622407]

  1. If the logits are multiplied by 10, the probabilities computed by softmax move very close to either 0 or 1.

  2. If the logits are divided by 10, the probabilities move close to a uniform distribution.

Below are the results of multiplying by 10 and of dividing by 10:

[  1.00000000e+00   2.06115369e-09   2.54366569e-13]

[ 0.32757813  0.35133022  0.32109165]

19. One-Hot Encoding

image
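One-hot encoding turns each label into a vector with a 1 at the label's index and 0 everywhere else. A plain-Python sketch (one_hot is a hypothetical helper; in the lab below, sklearn's LabelBinarizer plays this role):

```python
def one_hot(labels):
    """Map each label to a one-hot vector; classes are sorted for a stable order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for label in labels:
        vec = [0.0] * len(classes)
        vec[index[label]] = 1.0  # a single 1 at this label's position
        vectors.append(vec)
    return classes, vectors


classes, encoded = one_hot(["A", "C", "B", "A"])
print(classes)   # ['A', 'B', 'C']
print(encoded)   # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```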

21. Cross Entropy

Cross-entropy error

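Cross entropy is a way of measuring the distance between two vectors: here, the predicted probabilities and the one-hot label.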

image
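With S the softmax probabilities and L the one-hot label, the cross entropy is D(S, L) = -Σ_i L_i log(S_i). It is not symmetric. The label goes outside the log and the prediction inside it, because a one-hot label contains zeros and log(0) is undefined. A plain-Python sketch:

```python
import math


def cross_entropy(probs, one_hot_label):
    """D(S, L) = -sum_i L_i * log(S_i) for a single sample.

    Terms with L_i == 0 contribute nothing and are skipped, so a
    predicted probability of exactly 0 there does not hit log(0).
    """
    return -sum(l * math.log(s) for s, l in zip(probs, one_hot_label) if l > 0)


S = [0.7, 0.2, 0.1]
L = [1.0, 0.0, 0.0]
print(cross_entropy(S, L))                 # -log(0.7), about 0.357
print(cross_entropy([0.1, 0.2, 0.7], L))   # -log(0.1), about 2.303: worse prediction, larger error
```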

22. Minimizing Cross Entropy

image

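This loss function is the average cross entropy over all samples in the training set. The loss is a function of the weights and biases, and we want to make it as small as possible (a smaller loss means the predictions are closer to the labels).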

23. Practical Aspects of Learning

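Next come some tools for computing derivatives, and the pros and cons of gradient descent. When training the first model, two questions need to be answered: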

  1. How do you feed image pixels to this classifier?
  2. Where do you initialize the optimization?

24. Quiz: Numerical Stability

Numerical computation has to deal with very large and very small values. Consider the following code:

a = 1000000000
for i in range(1000000):
    a = a + 1e-6
print(a - 1000000000)

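The result is 0.953674316406 instead of the expected 1. Adding a very small value to a very large one loses precision, and the error accumulates over the loop.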
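Near 1e9, adjacent float64 values are 2**-23 (about 1.19e-7) apart. Each a + 1e-6 is therefore rounded to the nearest representable value, which adds exactly 8 * 2**-23 (about 9.54e-7) instead of 1e-6. The plain-Python sketch below (repeated_add is an illustrative helper) reproduces the result above, which Python 3 prints in full as 0.95367431640625. It also runs the same loop starting from 0, where the result is nearly exact. This is one motivation for keeping values centered near zero:

```python
def repeated_add(start, step=1e-6, times=1_000_000):
    """Add `step` to `start` `times` times and return how much was actually added."""
    a = start
    for _ in range(times):
        a = a + step
    return a - start


print(repeated_add(1_000_000_000))  # 0.95367431640625, i.e. 1e6 * 8 * 2**-23
print(repeated_add(0.0))            # very close to 1.0
```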

25. Normalized Inputs and Initial Weights

Normalizing the input data:

Well-behaved input data makes the optimizer's job much easier. Good input data has two properties:

  1. A mean close to 0
  2. A small variance

image

Following these principles, the RGB values of the image data are processed as follows:

image
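The exact transformation is shown in the figure. As an assumption, a common choice that satisfies both properties is to compute (value - 128) / 128 for each channel. This maps 0-255 to roughly [-1, 1) with a mean near 0. A plain-Python sketch (center_and_scale is an illustrative name):

```python
def center_and_scale(pixel_values, center=128.0, scale=128.0):
    """Map 0-255 channel values to roughly [-1, 1) via (value - 128) / 128.

    The constants 128 and 128 are an assumed, common choice; any choice that
    gives roughly zero mean and small variance serves the same purpose.
    """
    return [(v - center) / scale for v in pixel_values]


red_channel = [0, 64, 128, 192, 255]
print(center_and_scale(red_channel))  # [-1.0, -0.5, 0.0, 0.5, 0.9921875]
```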

Generating initial weights:

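Good initial values for the weights and biases make gradient descent easier. A simple method is to draw the initial weights at random from a Gaussian distribution with mean 0 and standard deviation σ (the square root of the mean of the squared deviations from the mean).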

Once this is done, training can begin.

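Here x is the input, W the weights, b the bias, S() the softmax function, L_i the label, and D() the cross entropy. The cross entropy is then averaged over all samples to give the final loss.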
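Written out as a formula, the description above becomes the expression below. N is the number of training samples, L_i is the one-hot label of sample i, k runs over the classes, and the symbol ℓ for the loss is a notation choice made here. The code in this post uses the row-vector form xW + b, which is the same computation transposed.

```latex
\ell(W, b) = \frac{1}{N} \sum_{i=1}^{N} D\big(S(W x_i + b),\, L_i\big),
\qquad
D(S, L) = -\sum_{k} L_k \log S_k
```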

image

26. Measuring Performance

image

Split the data set into a training set, a validation set, and a test set.
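A plain-Python sketch of such a split. split_dataset and the 80/10/10 ratios are illustrative assumptions. The lab below instead uses sklearn's train_test_split with test_size=0.05 to carve a validation set out of the training data:

```python
import random


def split_dataset(samples, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of `samples` and split it into train/valid/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


train, valid, test = split_dataset(range(1000))
print(len(train), len(valid), len(test))  # 800 100 100
```

The validation set is what you look at while tuning. The test set is best kept untouched until the very end, so that it still measures performance on data the tuning never saw.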

27. Transition: Overfitting

Next, a look at cross-validation, one of the foundations of deep learning.

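Cross-validation is a method of evaluating ML models. Several models are trained on subsets of the available input data and evaluated on the complementary subsets. Cross-validation is used to detect overfitting, that is, a failure to generalize.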

image

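In the exercise above, a change affecting fewer than 30 examples may just be noise. Only a change affecting more than 30 examples is likely to be the result of adjusting the weights, so only the first option is "yes".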

32. Stochastic Gradient Descent

Stochastic gradient descent (SGD) is at the core of deep learning. It scales well with both the size of the data and the size of the model.
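A plain-Python sketch of mini-batch SGD that fits the same plane as test code 2 above (y = 0.1*x1 + 0.2*x2 + 0.3). Each step estimates the gradient of the mean squared error from a random batch of 10 points instead of all 100. The batch size, the learning rate of 0.2 and the 3000 steps are illustrative choices, not values from the course:

```python
import random

rng = random.Random(0)

# Same kind of phony data as test code 2: 100 points on y = 0.1*x1 + 0.2*x2 + 0.3
data = []
for _ in range(100):
    x1, x2 = rng.random(), rng.random()
    data.append((x1, x2, 0.1 * x1 + 0.2 * x2 + 0.3))

w1, w2, b = 0.0, 0.0, 0.0
learning_rate, batch_size = 0.2, 10

for step in range(3000):
    batch = rng.sample(data, batch_size)  # a random mini-batch
    # Gradient of the mean squared error, estimated on this batch only
    g1 = g2 = gb = 0.0
    for x1, x2, y in batch:
        err = (w1 * x1 + w2 * x2 + b) - y
        g1 += 2 * err * x1 / batch_size
        g2 += 2 * err * x2 / batch_size
        gb += 2 * err / batch_size
    # Step against the estimated gradient
    w1 -= learning_rate * g1
    w2 -= learning_rate * g2
    b -= learning_rate * gb

print(round(w1, 3), round(w2, 3), round(b, 3))  # approaches 0.1 0.2 0.3
```

Each step is cheap because it touches only 10 points. The noise this introduces is averaged out over many steps, which is why SGD scales to large data sets.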

33. Momentum and Learning Rate Decay

See the post 参数的更新 (Parameter Updates).
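Momentum keeps a running velocity, so consecutive gradients that point the same way add up. Learning rate decay shrinks the step size over time. A plain-Python sketch on the one-dimensional function f(x) = x² (gradient 2x). The momentum coefficient 0.9, initial rate 0.1 and per-step decay factor 0.995 are illustrative values, and exponential decay is only one of several common schedules:

```python
def minimize_with_momentum(x, steps=300, lr=0.1, mu=0.9, decay=0.995):
    """Minimize f(x) = x**2 with momentum SGD and an exponentially decaying learning rate."""
    velocity = 0.0
    for _ in range(steps):
        grad = 2 * x                          # derivative of x**2
        velocity = mu * velocity - lr * grad  # running velocity (momentum)
        x = x + velocity
        lr = lr * decay                       # learning rate decay
    return x


print(minimize_with_momentum(5.0))  # close to 0, the minimum of x**2
```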

34. Parameter Hyperspace (Hyperparameter Tuning)

See the post 超参数的验证 (Validating Hyperparameters).

35. Quiz 2: Mini-batch

The batches() helper (helper.py):

def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []
    
    sample_size = len(features)
    # Step through the data batch_size samples at a time;
    # the last batch may be smaller than batch_size
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
        
    return output_batches

Quiz code:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Output:

Extracting /datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting /datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Test Accuracy: 0.11879999935626984

36. Epochs

The post 【书】深度学习入门-03-神经网络 ([Book] Introduction to Deep Learning - 03 - Neural Networks) covers this in more detail.

An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.
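In other words, after one epoch every training sample has been used exactly once. A plain-Python sketch of the loop structure. iterate_minibatches is a hypothetical helper, and it differs from the TensorFlow code below: that code reuses the same train_batches in every epoch, while this sketch reshuffles the sample order at the start of each epoch, a common variant:

```python
import random


def iterate_minibatches(features, labels, batch_size, epochs, seed=0):
    """Yield (epoch, batch_features, batch_labels), reshuffling at every epoch."""
    assert len(features) == len(labels)
    rng = random.Random(seed)
    indices = list(range(len(features)))
    for epoch in range(epochs):
        rng.shuffle(indices)  # new sample order for this epoch
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            yield (epoch,
                   [features[i] for i in batch_idx],
                   [labels[i] for i in batch_idx])


features = list(range(10))
labels = [f % 2 for f in features]
for epoch, bf, bl in iterate_minibatches(features, labels, batch_size=4, epochs=2):
    print(epoch, bf, bl)  # 3 batches per epoch, of sizes 4, 4 and 2
```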

The following TensorFlow code trains a model using 10 epochs.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches  # Helper function created in Mini-batching section


def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

With epochs set to 10, the results are as follows, and the accuracy is low:

Epoch: 0    - Cost: 11.0     Valid Accuracy: 0.204
Epoch: 1    - Cost: 9.95     Valid Accuracy: 0.229
Epoch: 2    - Cost: 9.18     Valid Accuracy: 0.246
Epoch: 3    - Cost: 8.59     Valid Accuracy: 0.264
Epoch: 4    - Cost: 8.13     Valid Accuracy: 0.283
Epoch: 5    - Cost: 7.77     Valid Accuracy: 0.301
Epoch: 6    - Cost: 7.47     Valid Accuracy: 0.316
Epoch: 7    - Cost: 7.2      Valid Accuracy: 0.328
Epoch: 8    - Cost: 6.96     Valid Accuracy: 0.342
Epoch: 9    - Cost: 6.73     Valid Accuracy: 0.36 
Test Accuracy: 0.3801000118255615

After increasing the number of epochs, the results are much better (only the last epochs are shown):

Epoch: 65   - Cost: 0.122    Valid Accuracy: 0.868
Epoch: 66   - Cost: 0.121    Valid Accuracy: 0.868
Epoch: 67   - Cost: 0.12     Valid Accuracy: 0.868
Epoch: 68   - Cost: 0.119    Valid Accuracy: 0.868
Epoch: 69   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 70   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 71   - Cost: 0.117    Valid Accuracy: 0.868
Epoch: 72   - Cost: 0.116    Valid Accuracy: 0.868
Epoch: 73   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 74   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 75   - Cost: 0.114    Valid Accuracy: 0.868
Epoch: 76   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 77   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 78   - Cost: 0.112    Valid Accuracy: 0.868
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.868
Epoch: 80   - Cost: 0.111    Valid Accuracy: 0.869
Test Accuracy: 0.86909999418258667

37. Intro TensorFlow Neural Network

The notebook has 3 problems for you to solve:

  • Problem 1: Normalize the features
  • Problem 2: Use TensorFlow operations to create features, labels, weight, and biases tensors
  • Problem 3: Tune the learning rate, number of steps, and batch size for the best accuracy

The data set consists of images of the letters A to J in different fonts. Below is an example of the letter A:

image

The goal is to train a neural network on the notMNIST data set until its accuracy exceeds 80%.

37.1 Import the required libraries

import hashlib
import os
import pickle
from urllib.request import urlretrieve

import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import resample
from tqdm import tqdm
from zipfile import ZipFile

print('All modules imported.')

37.2 Download the data

def download(url, file):
    """
    Download file from <url>
    :param url: URL to file
    :param file: Local file path
    """
    if not os.path.isfile(file):
        print('Downloading ' + file + '...')
        urlretrieve(url, file)
        print('Download Finished')

# Download the training and test dataset.
download('https://s3.amazonaws.com/udacity-sdc/notMNIST_train.zip', 'notMNIST_train.zip')
download('https://s3.amazonaws.com/udacity-sdc/notMNIST_test.zip', 'notMNIST_test.zip')

# Make sure the files aren't corrupted
assert hashlib.md5(open('notMNIST_train.zip', 'rb').read()).hexdigest() == 'c8673b3f28f489e9cdf3a3d74e2ac8fa',\
        'notMNIST_train.zip file is corrupted.  Remove the file and try again.'
assert hashlib.md5(open('notMNIST_test.zip', 'rb').read()).hexdigest() == '5d3c7e653e63471c88df796156a9dfa9',\
        'notMNIST_test.zip file is corrupted.  Remove the file and try again.'

# Wait until you see that all files have been downloaded.
print('All files downloaded.')
Downloading notMNIST_train.zip...
Download Finished
Downloading notMNIST_test.zip...
Download Finished
All files downloaded.

37.3 Uncompress the data

def uncompress_features_labels(file):
    """
    Uncompress features and labels from a zip file
    :param file: The zip file to extract the data from
    """
    features = []
    labels = []

    with ZipFile(file) as zipf:
        # Progress Bar
        filenames_pbar = tqdm(zipf.namelist(), unit='files')
        
        # Get features and labels from all files
        for filename in filenames_pbar:
            # Check if the file is a directory
            if not filename.endswith('/'):
                with zipf.open(filename) as image_file:
                    image = Image.open(image_file)
                    image.load()
                    # Load image data as 1 dimensional array
                    # We're using float32 to save on memory space
                    feature = np.array(image, dtype=np.float32).flatten()

                # Get the letter from the filename.  This is the letter of the image.
                label = os.path.split(filename)[1][0]

                features.append(feature)
                labels.append(label)
    return np.array(features), np.array(labels)

# Get the features and labels from the zip files
train_features, train_labels = uncompress_features_labels('notMNIST_train.zip')
test_features, test_labels = uncompress_features_labels('notMNIST_test.zip')

# Limit the amount of data to work with a docker container
docker_size_limit = 150000
train_features, train_labels = resample(train_features, train_labels, n_samples=docker_size_limit)

# Set flags for feature engineering.  This will prevent you from skipping an important step.
is_features_normal = False
is_labels_encod = False

# Wait until you see that all features and labels have been uncompressed.
print('All features and labels uncompressed.')

37.4 Problem 1: Normalize the input data

Use Min-Max scaling to normalize the input data from the range 0-255 to the range 0.1-0.9, with the formula:

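$$X' = a + \frac{(X - X_{\min})(b - a)}{X_{\max} - X_{\min}}, \qquad a = 0.1,\; b = 0.9,\; X_{\min} = 0,\; X_{\max} = 255$$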

Code:

# Problem 1 - Implement Min-Max scaling for grayscale image data
def normalize_grayscale(image_data):
    """
    Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]
    :param image_data: The image data to be normalized
    :return: Normalized image data
    """
    # TODO: Implement Min-Max scaling for grayscale image data
    a = 0.1
    b = 0.9
    grayscale_min = 0
    grayscale_max = 255
    return a + ( ( (image_data - grayscale_min)*(b - a) )/( grayscale_max - grayscale_min ) )
    
### DON'T MODIFY ANYTHING BELOW ###
# Test Cases
np.testing.assert_array_almost_equal(
    normalize_grayscale(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 255])),
    [0.1, 0.103137254902, 0.106274509804, 0.109411764706, 0.112549019608, 0.11568627451, 0.118823529412, 0.121960784314,
     0.125098039216, 0.128235294118, 0.13137254902, 0.9],
    decimal=3)
np.testing.assert_array_almost_equal(
    normalize_grayscale(np.array([0, 1, 10, 20, 30, 40, 233, 244, 254,255])),
    [0.1, 0.103137254902, 0.13137254902, 0.162745098039, 0.194117647059, 0.225490196078, 0.830980392157, 0.865490196078,
     0.896862745098, 0.9])

if not is_features_normal:
    train_features = normalize_grayscale(train_features)
    test_features = normalize_grayscale(test_features)
    is_features_normal = True

print('Tests Passed!')

37.5 One-Hot Encoded

One-hot encode the label data:

if not is_labels_encod:
    # Turn labels into numbers and apply One-Hot Encoding
    encoder = LabelBinarizer()
    encoder.fit(train_labels)
    train_labels = encoder.transform(train_labels)
    test_labels = encoder.transform(test_labels)

    # Change to float32, so it can be multiplied against the features in TensorFlow, which are float32
    train_labels = train_labels.astype(np.float32)
    test_labels = test_labels.astype(np.float32)
    is_labels_encod = True

print('Labels One-Hot Encoded')
  • print(test_labels) shows one-hot rows such as [1. 0. 0. ..., 0. 0. 0.]

  • print(test_labels.shape) outputs (10000, 10). The labels are the letters A to J, so there are 10 classes.

37.6 Get the training and validation data

assert is_features_normal, 'You skipped the step to normalize the features'
assert is_labels_encod, 'You skipped the step to One-Hot Encode the labels'

# Get randomized datasets for training and validation
train_features, valid_features, train_labels, valid_labels = train_test_split(
    train_features,
    train_labels,
    test_size=0.05,
    random_state=832289)

print('Training features and labels randomized and split.')

37.7 Save the data to a pickle file

# Save the data for easy access
pickle_file = 'notMNIST.pickle'
if not os.path.isfile(pickle_file):
    print('Saving data to pickle file...')
    try:
        with open('notMNIST.pickle', 'wb') as pfile:
            pickle.dump(
                {
                    'train_dataset': train_features,
                    'train_labels': train_labels,
                    'valid_dataset': valid_features,
                    'valid_labels': valid_labels,
                    'test_dataset': test_features,
                    'test_labels': test_labels,
                },
                pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', pickle_file, ':', e)
        raise

print('Data cached in pickle file.')

The steps above save all of the processed data to the pickle file. Even if the notebook is interrupted, you can resume directly from the next section.

37.8 Load the data

%matplotlib inline

# Load the modules
import pickle
import math

import numpy as np
import tensorflow as tf
from tqdm import tqdm
import matplotlib.pyplot as plt

# Reload the data
pickle_file = 'notMNIST.pickle'
with open(pickle_file, 'rb') as f:
  pickle_data = pickle.load(f)
  train_features = pickle_data['train_dataset']
  train_labels = pickle_data['train_labels']
  valid_features = pickle_data['valid_dataset']
  valid_labels = pickle_data['valid_labels']
  test_features = pickle_data['test_dataset']
  test_labels = pickle_data['test_labels']
  del pickle_data  # Free up memory


print('Data and modules loaded.')

38.8 Problem 2

  • features
    • Placeholder tensor for feature data (train_features/valid_features/test_features)
  • labels
    • Placeholder tensor for label data (train_labels/valid_labels/test_labels)
  • weights
  • biases

Code:

features_count = 784
labels_count = 10

# Problem 2 - Set the features and labels tensors
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Problem 2 - Set the weights and biases tensors
weights = tf.Variable(tf.truncated_normal((features_count, labels_count)))
biases = tf.Variable(tf.zeros(labels_count))

### DON'T MODIFY ANYTHING BELOW ###

#Test Cases
from tensorflow.python.ops.variables import Variable

assert features._op.name.startswith('Placeholder'), 'features must be a placeholder'
assert labels._op.name.startswith('Placeholder'), 'labels must be a placeholder'
assert isinstance(weights, Variable), 'weights must be a TensorFlow variable'
assert isinstance(biases, Variable), 'biases must be a TensorFlow variable'

assert features._shape == None or (\
    features._shape.dims[0].value is None and\
    features._shape.dims[1].value in [None, 784]), 'The shape of features is incorrect'
assert labels._shape  == None or (\
    labels._shape.dims[0].value is None and\
    labels._shape.dims[1].value in [None, 10]), 'The shape of labels is incorrect'
assert weights._variable._shape == (784, 10), 'The shape of weights is incorrect'
assert biases._variable._shape == (10), 'The shape of biases is incorrect'

assert features._dtype == tf.float32, 'features must be type float32'
assert labels._dtype == tf.float32, 'labels must be type float32'

# Feed dicts for training, validation, and test session
train_feed_dict = {features: train_features, labels: train_labels}
valid_feed_dict = {features: valid_features, labels: valid_labels}
test_feed_dict = {features: test_features, labels: test_labels}

# Linear Function WX + b
logits = tf.matmul(features, weights) + biases

prediction = tf.nn.softmax(logits)

# Cross entropy
cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

# some students have encountered challenges using this function, and have resolved issues
# using https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits
# please see this thread for more detail https://discussions.udacity.com/t/accuracy-0-10-in-the-intro-to-tensorflow-lab/272469/9

# Training loss
loss = tf.reduce_mean(cross_entropy)

# Create an operation that initializes all variables
init = tf.global_variables_initializer()

# Test Cases
with tf.Session() as session:
    session.run(init)
    session.run(loss, feed_dict=train_feed_dict)
    session.run(loss, feed_dict=valid_feed_dict)
    session.run(loss, feed_dict=test_feed_dict)
    biases_data = session.run(biases)

assert not np.count_nonzero(biases_data), 'biases must be zeros'

print('Tests Passed!')

# Determine if the predictions are correct
is_correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(labels, 1))
# Calculate the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(is_correct_prediction, tf.float32))

print('Accuracy function created.')
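The comment in the lab code above points students who ran into trouble to softmax_cross_entropy_with_logits. A likely cause, sketched below in plain NumPy rather than TensorFlow, is that computing log(softmax(logits)) in two steps breaks down once softmax underflows to exactly 0 for some class, whereas computing the log-softmax directly from the logits stays finite. The helper names softmax and log_softmax and the example logits are my own, for illustration only.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row max does not change the result but avoids overflow in exp
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def log_softmax(logits):
    # log(softmax(x)) computed in one step: z - log(sum(exp(z)))
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

labels = np.array([[0.0, 1.0, 0.0]])   # one-hot label: class 1
logits = np.array([[2.0, 1.0, 0.1]])

# Two-step form, as in the lab: -sum(labels * log(softmax(logits)))
naive = -np.sum(labels * np.log(softmax(logits)), axis=1)
# One-step form computed from the logits
stable = -np.sum(labels * log_softmax(logits), axis=1)
print(naive, stable)  # both about 1.417

# Extreme logits: softmax gives exactly 0 for classes 1 and 2
big = np.array([[1000.0, 0.0, 0.0]])
with np.errstate(divide='ignore', invalid='ignore'):
    # log(0) = -inf, and 0 * -inf = nan, so the sum becomes nan
    naive_big = -np.sum(labels * np.log(softmax(big)), axis=1)
stable_big = -np.sum(labels * log_softmax(big), axis=1)
print(naive_big, stable_big)  # [nan] [1000.]
```

A nan loss like this propagates into the weights and can leave accuracy stuck near chance level, which fits the symptom in the linked thread, though that is an inference rather than something the lab confirms.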

38.9 Problem 3: Tuning the hyperparameters

# TODO: Find the best parameters for each configuration
epochs = 5
batch_size = 100
learning_rate = 0.2

### DON'T MODIFY ANYTHING BELOW ###
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)    

# The accuracy measured against the validation set
validation_accuracy = 0.0

# Measurements used for graphing loss and accuracy
log_batch_step = 50
batches = []
loss_batch = []
train_acc_batch = []
valid_acc_batch = []

with tf.Session() as session:
    session.run(init)
    batch_count = int(math.ceil(len(train_features)/batch_size))

    for epoch_i in range(epochs):
        
        # Progress bar
        batches_pbar = tqdm(range(batch_count), desc='Epoch {:>2}/{}'.format(epoch_i+1, epochs), unit='batches')
        
        # The training cycle
        for batch_i in batches_pbar:
            # Get a batch of training features and labels
            batch_start = batch_i*batch_size
            batch_features = train_features[batch_start:batch_start + batch_size]
            batch_labels = train_labels[batch_start:batch_start + batch_size]

            # Run optimizer and get loss
            _, l = session.run(
                [optimizer, loss],
                feed_dict={features: batch_features, labels: batch_labels})

            # Log every 50 batches
            if not batch_i % log_batch_step:
                # Calculate Training and Validation accuracy
                training_accuracy = session.run(accuracy, feed_dict=train_feed_dict)
                validation_accuracy = session.run(accuracy, feed_dict=valid_feed_dict)

                # Log batches
                previous_batch = batches[-1] if batches else 0
                batches.append(log_batch_step + previous_batch)
                loss_batch.append(l)
                train_acc_batch.append(training_accuracy)
                valid_acc_batch.append(validation_accuracy)

        # Check accuracy against Validation data
        validation_accuracy = session.run(accuracy, feed_dict=valid_feed_dict)

loss_plot = plt.subplot(211)
loss_plot.set_title('Loss')
loss_plot.plot(batches, loss_batch, 'g')
loss_plot.set_xlim([batches[0], batches[-1]])
acc_plot = plt.subplot(212)
acc_plot.set_title('Accuracy')
acc_plot.plot(batches, train_acc_batch, 'r', label='Training Accuracy')
acc_plot.plot(batches, valid_acc_batch, 'x', label='Validation Accuracy')
acc_plot.set_ylim([0, 1.0])
acc_plot.set_xlim([batches[0], batches[-1]])
acc_plot.legend(loc=4)
plt.tight_layout()
plt.show()

print('Validation accuracy at {}'.format(validation_accuracy))
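The training loop above splits train_features into batch_count = ceil(N / batch_size) slices. The sketch below uses a hypothetical dataset of 1050 all-zero samples (not the lab's data) to show that the last slice is simply shorter: NumPy slicing past the end of an array stops at the end instead of raising an error. Because the features and labels placeholders were created without a fixed shape, a smaller final batch can still be fed.

```python
import math
import numpy as np

# Hypothetical data: 1050 samples with 784 features each (values irrelevant, shape only)
train_features = np.zeros((1050, 784), dtype=np.float32)
batch_size = 100

# Same formula as in the lab code
batch_count = int(math.ceil(len(train_features) / batch_size))

batch_sizes = []
for batch_i in range(batch_count):
    batch_start = batch_i * batch_size
    batch_features = train_features[batch_start:batch_start + batch_size]
    batch_sizes.append(len(batch_features))

print(batch_count)      # 11
print(batch_sizes[-1])  # 50: the last batch is only partly filled
```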

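After trying different values for the parameters above, the last configuration works best and gives the highest accuracy:

  • epochs = 1, batch_size = 50, learning_rate = 0.01

[Figure: Loss and Accuracy plots for this configuration]

  • epochs = 1, batch_size = 100, learning_rate = 0.01

[Figure: Loss and Accuracy plots for this configuration]

  • epochs = 5, batch_size = 100, learning_rate = 0.2

[Figure: Loss and Accuracy plots for this configuration]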

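39. epochs, batch_size, and iters_num

1. batch_size:

Input data bundled together is called a "batch". Batch processing speeds up computation considerably and can greatly reduce the processing time per image, because numerical libraries such as NumPy are optimized for operations on large arrays: one computation on a large array is faster than many computations on small ones.

For example, consider the following code: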

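import numpy as np

# get_data(), init_network() and predict() come from the book's MNIST example (not shown here)
x, t = get_data()
network = init_network()
batch_size = 100 # batch size (number of samples per batch)
accuracy_cnt = 0

for i in range(0, len(x), batch_size):
    x_batch = x[i:i+batch_size]
    y_batch = predict(network, x_batch)

    # Find the index of the largest element along axis 1 (one predicted class per row)
    p = np.argmax(y_batch, axis=1)
    accuracy_cnt += np.sum(p == t[i:i+batch_size])

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))

  1. First, range(0, len(x), batch_size) generates the start index of each batch, stepping by batch_size; for example, list(range(0, 10, 3)) gives [0, 3, 6, 9].
  2. x_batch holds one batch of input data (here 100 samples), which is then processed in a single call using NumPy's broadcasting.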
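The two points above (the range step, broadcasting over a batch, and argmax along axis 1) can be checked with a small NumPy sketch. The weights W, bias b and the 4-sample score array are random or made-up values chosen only to show shapes and indexing; they are not the book's network.

```python
import numpy as np

# Start index of each batch, as produced by range(start, stop, step)
starts = list(range(0, 10, 3))
print(starts)  # [0, 3, 6, 9]

# Hypothetical single layer: random weights, zero bias (shapes only)
rng = np.random.default_rng(0)
W = rng.standard_normal((784, 10))
b = np.zeros(10)

x_batch = rng.random((100, 784))  # a batch of 100 flattened 28x28 images
scores = x_batch @ W + b          # b (shape (10,)) is broadcast to every row
print(scores.shape)               # (100, 10)

# argmax along axis=1 gives one predicted class per row (per sample)
y_batch = np.array([[0.1, 0.8, 0.1],
                    [0.3, 0.1, 0.6],
                    [0.2, 0.5, 0.3],
                    [0.8, 0.1, 0.1]])
t = np.array([1, 2, 0, 0])        # made-up labels
p = np.argmax(y_batch, axis=1)
print(p, np.sum(p == t))          # [1 2 1 0] 3
```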

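2. iters_num:

The gradient method performs 1000 updates (iters_num = 1000). After each update, the loss function is evaluated on the current mini-batch of training data and the value is appended to a list.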

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow importing modules from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 1000  # number of iterations (set as appropriate)
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Compute the gradient
    grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation (slow)
    #grad = network.gradient(x_batch, t_batch) # backpropagation (much faster)

    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    # Record the learning progress
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

# Plot the loss curve (x axis: iteration, y axis: loss)
markers = {'train': 'o', 'test': 's'}
x = np.arange(iters_num)
plt.plot(x, train_loss_list)
plt.xlabel("iteration")
plt.ylabel("loss")
plt.ylim(0, 5)
plt.show()

[Figure: training loss vs. iteration]

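Above, the mini-batch size is 100: in each iteration, 100 samples are drawn at random from the 60,000 training samples, the gradient is computed on this 100-sample mini-batch, and the parameters are updated with SGD. The loss recorded after each of the 1000 updates is plotted in the figure above.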
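The same loop structure (random mini-batch, gradient, SGD update, record the loss) can be run end to end without the book's dataset or TwoLayerNet. The sketch below swaps in a hypothetical toy problem: linear regression with a mean-squared-error loss and its analytic gradient. The names w_true and w and the synthetic data are assumptions made for illustration; only iters_num, batch_size and learning_rate mirror the values above.

```python
import numpy as np

# Hypothetical toy problem: recover w_true from noisy linear data
rng = np.random.default_rng(42)
train_size, n_features = 6000, 3
w_true = np.array([1.0, -2.0, 0.5])
x_train = rng.standard_normal((train_size, n_features))
t_train = x_train @ w_true + 0.01 * rng.standard_normal(train_size)

w = np.zeros(n_features)
iters_num, batch_size, learning_rate = 1000, 100, 0.1
train_loss_list = []

for i in range(iters_num):
    # Random mini-batch indices (drawn with replacement, like np.random.choice)
    batch_mask = rng.choice(train_size, batch_size)
    x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

    err = x_batch @ w - t_batch
    grad = 2 * x_batch.T @ err / batch_size  # gradient of the MSE loss w.r.t. w
    w -= learning_rate * grad                # SGD update
    train_loss_list.append(np.mean(err ** 2))  # loss on this mini-batch, before the update

print(w)  # close to w_true
```

Plotting train_loss_list against the iteration index gives the same kind of curve as the figure above.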

3. epoch

During training, the recognition accuracy on both the training data and the test data is recorded periodically. Here we record both accuracies once every epoch.

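An epoch is a unit: one epoch is the number of updates it takes for all of the training data to be used once. For example, with 10,000 training samples and batch_size = 100, SGD has to run 100 times for all of the training data to be seen, so 100 iterations make one epoch. (Strictly, the code below draws each mini-batch at random with np.random.choice, which samples with replacement, so 100 iterations cover the data once only on average; shuffling once per epoch and slicing consecutive batches guarantees that every sample is used exactly once.)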
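The difference between "one epoch on average" and "exactly one pass" can be measured directly. This sketch reuses the numbers from the paragraph above (10,000 samples, batch_size 100) and compares random sampling with replacement against a single shuffle followed by consecutive slices. The boolean "seen" arrays are my own bookkeeping, not part of the book's code.

```python
import numpy as np

train_size, batch_size = 10000, 100
iter_per_epoch = max(train_size // batch_size, 1)
print(iter_per_epoch)  # 100

rng = np.random.default_rng(0)

# Random mini-batches with replacement, like np.random.choice(train_size, batch_size)
seen_random = np.zeros(train_size, dtype=bool)
for _ in range(iter_per_epoch):
    seen_random[rng.choice(train_size, batch_size)] = True

# Shuffle once, then take consecutive slices: every sample is used exactly once
perm = rng.permutation(train_size)
seen_perm = np.zeros(train_size, dtype=bool)
for i in range(iter_per_epoch):
    seen_perm[perm[i * batch_size:(i + 1) * batch_size]] = True

print(seen_random.mean(), seen_perm.mean())  # roughly 0.63, then 1.0
```

The roughly 63% figure is about 1 - 1/e, the expected fraction of distinct samples when drawing N times with replacement from N samples; the other 37% of samples are seen in later epochs.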

The code is as follows:

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow importing modules from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000  # number of iterations (set as appropriate)
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

# Iterations per epoch: training-set size / batch size
iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Compute the gradient. Numerical differentiation gives approximately the same
    # gradient but is far too slow for 10000 iterations, so backpropagation is used.
    #grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation
    grad = network.gradient(x_batch, t_batch) # backpropagation

    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    # Once per epoch, record accuracy on the full training and test sets
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))

# Plot training and test accuracy per epoch
markers = {'train': 'o', 'test': 's'}
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()

The output is as follows (training accuracy, test accuracy, recorded at the start of each epoch):

0.1043 , 0.1041
0.904633333333 , 0.9079
0.921 , 0.9236
0.9321 , 0.9338
0.9436 , 0.9426
0.95025 , 0.9494
0.956133333333 , 0.9531
0.960166666667 , 0.9564
0.9638 , 0.959
0.965933333333 , 0.9607
0.9682 , 0.9619
0.970266666667 , 0.9621
0.97075 , 0.9641
0.973583333333 , 0.9669
0.974083333333 , 0.9663
0.975666666667 , 0.9664
0.9777 , 0.9683

[Figure: train acc and test acc vs. epochs]
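Why does the output above have 17 lines? With MNIST's 60,000 training samples and batch_size = 100, iter_per_epoch is 600, so the accuracy check fires at i = 0, 600, ..., 9600 within 10,000 iterations. The sketch below only repeats that arithmetic with the same formula; no training is involved.

```python
import math

train_size = 60000  # MNIST training set size
batch_size = 100
iters_num = 10000

iter_per_epoch = max(train_size / batch_size, 1)  # 600.0 (a float, as in the code above)
eval_iters = [i for i in range(iters_num) if i % iter_per_epoch == 0]

print(iter_per_epoch)                         # 600.0
print(eval_iters[:3], eval_iters[-1])         # [0, 600, 1200] 9600
print(len(eval_iters))                        # 17
print(math.ceil(iters_num / iter_per_epoch))  # 17
```

The first line of output (about 0.10 accuracy, i.e. chance level for 10 classes) is recorded at i = 0, before any training has happened.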

The condition if i % iter_per_epoch == 0: controls how often the accuracy is computed:

iters_num = 10000  # number of iterations (set as appropriate)
train_size = x_train.shape[0]
batch_size = 100
...
# number of training samples / batch size (samples processed at once)
iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    ...
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        ...