- 0. Summary
- 3. Number of Parameters
- 7. 2-Layer Neural Network
- 8. Quiz: TensorFlow ReLu
- 11. Backprop
- 12. Deep Neural Network in TensorFlow
- 13. Training a Deep Learning Network
- 14. Save and Restore TensorFlow Models
- 15. Fine tuning
- 18. Regularization
- 19. Dropout
- 21. Quiz: TensorFlow Dropout
- 22. Quiz 2: TensorFlow Dropout
0. Summary
This chapter covers several techniques for optimizing deep learning models and how to implement them in TensorFlow:
- What the ReLU function is, and how to implement it in TensorFlow
- Backpropagation
- Deep neural networks in TensorFlow, including initialization, defining the weights, and the various hyperparameters
- How to save variables in TensorFlow, write a model to a local file, and load the data back the next time it is needed
- How to prevent overfitting, and how to apply weight decay in TensorFlow
- Dropout, another technique for preventing overfitting, and how it is implemented in TensorFlow
3. Number of Parameters
Count the number of parameters in the network below:
The parameters consist of weights and biases: the weights have shape (n_input, n_labels) and the biases have shape (n_labels,). For 28x28 inputs and 10 labels:
= size of W + size of b
= 28 x 28 x 10 + 10
= 7850
The corresponding code:
n_features = 3
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
bias = tf.Variable(tf.zeros(n_labels))
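As a sanity check, the count can be computed from the variable shapes directly. A minimal sketch, assuming numpy is available and the weights and bias variables above are in scope:
import numpy as np

# Total parameters = product of each variable's shape, summed over variables
total = sum(int(np.prod(v.get_shape().as_list())) for v in [weights, bias])
print(total)  # 3 * 5 + 5 = 20 for the toy sizes above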
7. 2-Layer Neural Network
ReLU is a nonlinear function: y = x when x > 0, and y = 0 otherwise. Its derivative is accordingly 1 for x > 0 and 0 elsewhere.
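The same thing as a minimal numpy sketch (function names are illustrative):
import numpy as np

def relu(x):
    # Elementwise max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative: 1 where x > 0, else 0 (taken as 0 at x = 0 by convention)
    return (x > 0).astype(np.float64)

print(relu(np.array([-2.0, 0.0, 3.0])))       # [0. 0. 3.]
print(relu_grad(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]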
- The first layer takes the input x together with its weights w and bias; the result passes through the ReLU and on to the next layer.
- The second layer takes that intermediate result together with its own weights w and bias; the final result is fed to an activation function such as softmax to produce probabilities.
8. Quiz: TensorFlow ReLu
ReLU (f(x) = max(0, x)) is also an activation function; in TensorFlow it is provided by tf.nn.relu(). Example code:
# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)
In the code above:
- tf.nn.relu() is applied to the hidden layer hidden_layer.
- A new output layer is added whose input is the output of the preceding hidden_layer (after the nonlinear ReLU).
In this quiz, you’ll use TensorFlow’s ReLU function to turn the linear model below into a nonlinear model.
The code:
# Solution is available in the other "solution.py" tab
import tensorflow as tf
output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]
# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]
# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])
# TODO: Create Model
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
output = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
# TODO: save and print session results on a variable named "output"
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # Initialize the variables, then evaluate the output
    sess.run(init)
    result = sess.run(output)
    print(result)
Output:
[[ 5.11000013 8.44000053]
[ 0. 0. ]
[ 24.01000214 38.23999786]]
11. Backprop
This section is the error backpropagation algorithm; see here for the details.
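In TensorFlow you rarely implement backprop by hand: optimizer.minimize() computes the gradients for you. As a minimal sketch of the same machinery, tf.gradients() returns the symbolic gradients of a tensor with respect to variables:
import tensorflow as tf

x = tf.Variable(3.0)
y = x * x  # forward pass: y = x^2

# Backprop: dy/dx = 2x, which is 6.0 at x = 3
grad = tf.gradients(y, [x])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # [6.0]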
12. Deep Neural Network in TensorFlow
12.1 Example code:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128 # Decrease batch size if you don't have enough memory
display_step = 1
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
n_hidden_layer = 256 # layer number of features
# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])
x_flat = tf.reshape(x, [-1, n_input])
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.matmul(layer_1, weights['out']) + biases['out']
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Display logs per epoch step
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Decrease test_size if you don't have enough memory
    test_size = 256
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:test_size], y: mnist.test.labels[:test_size]}))
12.2 Code walkthrough
- Use the MNIST data provided with TensorFlow; batching and one-hot encoding are already handled:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
- Learning Parameters
Define the hyperparameters:
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128 # Decrease batch size if you don't have enough memory
display_step = 1
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
- Hidden Layer Parameters
Define the number of neurons in the hidden layer:
n_hidden_layer = 256 # layer number of features
- Weights and Biases
The network has only two layers, so there are just the hidden_layer and out weights and biases:
# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
- Input
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])
x_flat = tf.reshape(x, [-1, n_input])
The 28x28 single-channel images ([None, 28, 28, 1]) are reshaped into flat rows of 784 pixel values each.
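A minimal sketch of the shapes involved (tensor names are illustrative):
import tensorflow as tf

imgs = tf.zeros([2, 28, 28, 1])     # a batch of two 28x28 single-channel images
flat = tf.reshape(imgs, [-1, 784])  # -1 lets TensorFlow infer the batch dimension
print(flat.get_shape())             # (2, 784)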
- Multilayer Perceptron
The code below first computes xW + b, passes the result through the ReLU, and then through another xW + b to produce the final logits layer:
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
- Optimizer
# Define loss and optimizer
cost = tf.reduce_mean(\
tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
.minimize(cost)
- Session
The MNIST library that ships with TensorFlow can serve the dataset in batches; mnist.train.next_batch() returns a subset of the training data.
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
13. Training a Deep Learning Network
Deeper models tend to develop a hierarchical structure: the first layer learns lines, the second layer shapes, and the third layer gradually abstracts these into face-like patterns, extracting progressively more abstract features from the data. This is exactly what deep learning aims for.
A similar description appears in the CNN notes; see 6.2, hierarchical feature extraction.
14. Save and Restore TensorFlow Models
Training a model often takes several hours, and once the TensorFlow session ends, the trained weights, biases, and other data are lost. TensorFlow's tf.train.Saver class can save any tf.Variable to a local file.
14.1 Saving Variables
The code below saves two variables, weights and bias.
import tensorflow as tf
# The file path to save the data
save_file = './model.ckpt'
# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()
with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())
    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
    # Save the model
    saver.save(sess, save_file)
Weights:
[[ 0.74129212 1.16585362 0.18823986]
[ 0.84469903 -0.30504367 -0.9390443 ]]
Bias:
[-0.0300845 0.12080105 0.38587224]
Afterwards the checkpoint files, model.ckpt.meta among them, are saved in the local directory.
14.2 Loading Variables
Load the saved data into a new model:
# Remove the previous weights and bias
tf.reset_default_graph()
# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()
with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)
    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
INFO:tensorflow:Restoring parameters from ./model.ckpt
Weight:
[[ 0.74129212 1.16585362 0.18823986]
[ 0.84469903 -0.30504367 -0.9390443 ]]
Bias:
[-0.0300845 0.12080105 0.38587224]
The values match the ones saved earlier. If the code is changed to:
with tf.Session() as sess:
    # Load the weights and bias
    #saver.restore(sess, save_file)
    sess.run(tf.global_variables_initializer())
    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
that is, re-initializing instead of restoring the saved data, the resulting values differ:
Weight:
[[ 0.51144725 0.18832855 0.00272263]
[ 0.05852098 -0.44724768 -0.96787697]]
Bias:
[-0.05925143 -1.33713555 0.32981932]
14.3 Save a Trained Model
Train a model and save its weights:
- First, build the model:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

# Remove previous Tensors and Operations
tf.reset_default_graph()
learning_rate = 0.001
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Define loss and optimizer
cost = tf.reduce_mean(\
tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
.minimize(cost)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
- Then train the model and save its weights:
import math
save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')
14.4 Load a Trained Model
Load the trained model and evaluate it on the test set:
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})
    print('Test Accuracy: {}'.format(test_accuracy))
15. Fine tuning
Loading saved variables directly into a modified model raises an error.
15.1 Naming Error
import tensorflow as tf
# Remove the previous weights and bias
tf.reset_default_graph()
save_file = 'model.ckpt'
# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
# Remove the previous weights and bias
tf.reset_default_graph()
# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))
with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)
TensorFlow assigns default names in creation order (Variable:0, Variable_1:0, ...). Because weights and bias are created in the opposite order here, their names no longer match the ones in the saved model, which produces the following error:
Assign requires shapes of both tensors to match
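A minimal sketch to see the default names in a fresh graph:
import tensorflow as tf

tf.reset_default_graph()
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
# Default names follow creation order
print(weights.name)  # Variable:0
print(bias.name)     # Variable_1:0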
The fixed code below sets the name property explicitly:
import tensorflow as tf
tf.reset_default_graph()
save_file = 'model.ckpt'
# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
# Remove the previous weights and bias
tf.reset_default_graph()
# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))
with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)
    print('Loaded Weights and Bias successfully.')
18. Regularization
This section covers how to prevent overfitting; see also the notes on regularization.
There are two common approaches:
- Early termination: stop training as soon as validation performance stops improving.
- L2 regularization: penalize large weights, i.e. weight decay here; see the sketch after this list.
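A minimal sketch of L2 regularization in TensorFlow, reusing logits, y, and the weights dict from section 12; beta is an illustrative hyperparameter name, not from the lesson code:
beta = 0.01  # regularization strength (hyperparameter)

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
# tf.nn.l2_loss(w) computes sum(w ** 2) / 2
cost = cost + beta * (tf.nn.l2_loss(weights['hidden_layer'])
                      + tf.nn.l2_loss(weights['out']))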
19. Dropout
See dropout: randomly discard some of the data flowing through the activation functions.
21. Quiz: TensorFlow Dropout
Dropout is a way to prevent overfitting: it randomly drops units from the network during training.
In TensorFlow, dropout is provided by the tf.nn.dropout() function. Example code:
keep_prob = tf.placeholder(tf.float32) # probability to keep units
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
The tf.nn.dropout() function takes in two parameters:
- hidden_layer: the tensor to which you would like to apply dropout
- keep_prob: the probability of keeping (i.e. not dropping) any given unit
keep_prob controls how many units are dropped. During training it is typically set to 0.5; at test time, to use the full power of the model, set it to 1.0 so that no units are dropped. Note that tf.nn.dropout() also scales the kept units by 1 / keep_prob, so the expected total activation stays the same between training and testing.
Example code:
...
keep_prob = tf.placeholder(tf.float32) # probability to keep units
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
...
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....
            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})

    validation_accuracy = sess.run(accuracy, feed_dict={
        features: test_features,
        labels: test_labels,
        keep_prob: 1.0})
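A quick check of the scaling behavior described above, as a minimal sketch: with keep_prob = 0.5, the units that survive are doubled.
import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0, 4.0]])
dropped = tf.nn.dropout(x, keep_prob=0.5)

with tf.Session() as sess:
    # Each entry is either 0.0 (dropped) or doubled (kept, scaled by 1/0.5)
    print(sess.run(dropped))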
22. Quiz 2: TensorFlow Dropout
Now put dropout to use. The code:
# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]
# set random seed
tf.set_random_seed(123456)
# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]
# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])
# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
# TODO: save and print session results as variable named "output"
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output = sess.run(logits, feed_dict={keep_prob: 0.5})
    print(output)
Output:
[[ 9.55999947 16. ]
[ 0.11200001 0.67200011]
[ 43.30000305 48.15999985]]