
Nano01 (Self-Driving Car) - U03 - Lesson13 - Deep Neural Networks

Posted by lijun

0. Summary

This lesson covers several techniques for improving deep learning models and how to implement them in TensorFlow:

  1. What the ReLU function is and how to implement it in TensorFlow
  2. Backpropagation
  3. Deep neural networks in TensorFlow, including initialization, weight definitions, and the various hyperparameters
  4. How to save variables in TensorFlow, save a model to a local file, and load the data the next time it is needed
  5. How to prevent overfitting, and how to apply weight decay in TensorFlow
  6. Dropout, another way to prevent overfitting, and how it is applied in TensorFlow

3. Number of Parameters

Count the number of parameters in the network below:

image

The parameters consist of the weights and the bias: the weights have shape (input, labels) and the bias has shape (labels,), so the total is:

= size of W + size of b
= 28x28x10 + 10
= 7850

The following code illustrates the weight and bias shapes:

import tensorflow as tf

n_features = 3
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
bias = tf.Variable(tf.zeros(n_labels))
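As a quick check, the same count can be reproduced directly from the shapes (a minimal sketch; the 784 features and 10 classes match the 28×28 MNIST figure rather than the 3-feature snippet above):

n_features = 28 * 28  # 784 input pixels per image
n_labels = 10         # 10 output classes

# The weights have shape (n_features, n_labels); the bias has shape (n_labels,)
n_params = n_features * n_labels + n_labels
print(n_params)  # 7850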

7. 2-Layer Neural Network

image

ReLU is a nonlinear function: when x is greater than 0, y equals x; otherwise y is 0. Its derivative is shown in the figure below (a NumPy sketch follows the list): image

  1. The first layer combines the input x with its weights w and bias; the result is passed through the ReLU function to the next layer.
  2. The second layer combines the intermediate result from the previous layer with its own weights w and bias; the result is finally passed to an activation function such as softmax to produce probabilities.
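Since the derivative plot may not render here, below is a minimal NumPy sketch of ReLU and its derivative (a plain-Python illustration only; TensorFlow's own tf.nn.relu is shown in the next section):

import numpy as np

def relu(x):
    # y = x when x > 0, otherwise y = 0
    return np.maximum(0.0, x)

def relu_derivative(x):
    # the derivative is 1 where x > 0 and 0 elsewhere
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]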

8. Quiz: TensorFlow ReLU

The ReLU function (f(x) = max(0, x)) is another activation function. In TensorFlow it is provided by tf.nn.relu(); example code:

# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)

The code above:

  1. Applies tf.nn.relu() to the hidden_layer.
  2. Adds a new output layer whose input is the output of the previous hidden_layer (after the nonlinear ReLU).

In this quiz, you’ll use TensorFlow’s ReLU function to turn the linear model below into a nonlinear model.

The code is as follows:

import tensorflow as tf

output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])
# TODO: Create Model
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])

hidden_layer = tf.nn.relu(hidden_layer)

output = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
# TODO: save and print session results on a variable named "output"

init = tf.global_variables_initializer()

with tf.Session() as sess:
    # Initialize the variables, then evaluate the output
    sess.run(init)
    result = sess.run(output)
    print(result)

The output is:

[[  5.11000013   8.44000053]
 [  0.           0.        ]
 [ 24.01000214  38.23999786]]

11. Backprop

image

This is the material on error backpropagation; see the separate backpropagation notes for the details.
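Since this item only points to external notes, here is a minimal NumPy sketch of backpropagation through the same ReLU-then-linear structure, using a squared-error loss as a stand-in for the softmax cross entropy used elsewhere in this lesson (the sizes and learning rate are illustrative assumptions):

import numpy as np

np.random.seed(0)

# Toy batch: 5 examples, 4 features, 3 hidden units, 2 outputs
x = np.random.randn(5, 4)
y = np.random.randn(5, 2)
W1 = 0.1 * np.random.randn(4, 3)
b1 = np.zeros(3)
W2 = 0.1 * np.random.randn(3, 2)
b2 = np.zeros(2)

# Forward pass: hidden = ReLU(x W1 + b1), prediction = hidden W2 + b2
z1 = x @ W1 + b1
h = np.maximum(0.0, z1)
y_hat = h @ W2 + b2
loss = 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))

# Backward pass: apply the chain rule layer by layer
d_yhat = (y_hat - y) / x.shape[0]   # dLoss / dy_hat
dW2 = h.T @ d_yhat
db2 = d_yhat.sum(axis=0)
dh = d_yhat @ W2.T
dz1 = dh * (z1 > 0)                 # gradient through ReLU
dW1 = x.T @ dz1
db1 = dz1.sum(axis=0)

# One gradient-descent step
lr = 0.01
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(loss)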

12. Deep Neural Network in TensorFlow

12.1 Example code:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

n_hidden_layer = 256 # number of neurons in the hidden layer

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.matmul(layer_1, weights['out']) + biases['out']

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Display logs per epoch step
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(c))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Decrease test_size if you don't have enough memory
    test_size = 256
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:test_size], y: mnist.test.labels[:test_size]}))

12.2 Code walkthrough

  • Use the MNIST data provided by TensorFlow, which is already set up for batching and one-hot encoding:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
  • Learning Parameters

Define the hyperparameters:

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
  • Hidden Layer Parameters

Define the number of neurons in the hidden layer:

n_hidden_layer = 256 # number of neurons in the hidden layer
  • Weights and Biases

Since the network has only two layers, there are only weights and biases for hidden_layer and out:

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
  • Input
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

The 28×28 single-channel images ([None, 28, 28, 1]) are reshaped into flat rows of 784 pixel values each.

  • Multilayer Perceptron

image

The code below first computes xW + b, passes the result through the ReLU layer, and then through another xW + b to produce the final logits layer:

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
    biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
  • Optimizer
# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)
  • Session

The MNIST library bundled with TensorFlow can serve the dataset in batches; mnist.train.next_batch() returns a subset of the training data.

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

13. Training a Deep Learning Network

Deeper models tend to develop a hierarchical structure: for example, the first layer learns lines, the second learns shapes, and the third gradually abstracts these into something like a face, progressively capturing more abstract content from the data. This is exactly what deep learning is expected to do.

image

A similar description appears in the CNN notes; see section 6.2 on hierarchy-based feature extraction.
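To make the idea concrete, the single-hidden-layer model from section 12 can be deepened by stacking another ReLU layer. A minimal sketch that reuses x_flat, n_input and n_classes from that code (the layer sizes are illustrative assumptions):

n_hidden_1 = 256  # illustrative layer sizes
n_hidden_2 = 128

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'h1': tf.Variable(tf.random_normal([n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Each hidden layer is ReLU(xW + b); only the final layer stays linear (logits)
layer_1 = tf.nn.relu(tf.add(tf.matmul(x_flat, weights['h1']), biases['h1']))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['h2']))
logits = tf.add(tf.matmul(layer_2, weights['out']), biases['out'])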

14. Save and Restore TensorFlow Models

Training a model can take several hours, and if the TensorFlow session is interrupted, the trained weights and biases may be lost.

With TensorFlow's tf.train.Saver class, tf.Variable values can be saved to a local file.

14.1 Saving Variables

The code below saves two variables, weights and bias.

import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

Weights:
[[ 0.74129212  1.16585362  0.18823986]
 [ 0.84469903 -0.30504367 -0.9390443 ]]
Bias:
[-0.0300845   0.12080105  0.38587224]

Afterwards, checkpoint files such as model.ckpt.meta appear in the local directory.

14.2 Loading Variables

Load the saved values into a new model:

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

INFO:tensorflow:Restoring parameters from ./model.ckpt
Weight:
[[ 0.74129212  1.16585362  0.18823986]
 [ 0.84469903 -0.30504367 -0.9390443 ]]
Bias:
[-0.0300845   0.12080105  0.38587224]

The restored values are the same as those that were saved earlier. If the code is modified to:

with tf.Session() as sess:
    # Load the weights and bias
    #saver.restore(sess, save_file)
    sess.run(tf.global_variables_initializer())
    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

i.e., the saved values are not used, then the results are different:

Weight:
[[ 0.51144725  0.18832855  0.00272263]
 [ 0.05852098 -0.44724768 -0.96787697]]
Bias:
[-0.05925143 -1.33713555  0.32981932]

14.3 Save a Trained Model

Train a model and save its weights:

  • First, build the model:
# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  • Then train the model and save its weights:
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')

14.4 Load a Trained Model

Load the trained model:

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

15. Fine tuning

Loading saved variables directly into a modified model will produce an error.

15.1 Naming Error

import tensorflow as tf

# Remove the previous weights and bias
tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)

Here the names of weights and bias do not match the names in the saved model: TensorFlow names variables in creation order (Variable:0, Variable_1:0, ...), and bias and weights are created in the opposite order, so the restore raises the following error:

Assign requires shapes of both tensors to match

The fixed code below sets the name property explicitly:

import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]) ,name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

18. Regularization

This section is about preventing overfitting; see also the separate notes on regularization.

There are usually two approaches:

  • Early Termination: stop training as soon as validation performance stops improving
  • L2 Regularization: penalize large weights, i.e. weight decay; a TensorFlow sketch follows this list
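A minimal sketch of L2 regularization in TensorFlow, adding a weight penalty to the cross-entropy cost from section 12 (beta is an illustrative value; tf.nn.l2_loss computes sum(w**2) / 2):

beta = 0.01  # regularization strength (illustrative)

# Penalize only the weights; biases are usually left unregularized
l2_penalty = tf.nn.l2_loss(weights['hidden_layer']) + tf.nn.l2_loss(weights['out'])

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=y)) + beta * l2_penalty
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)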

19. Dropout

See the dropout notes: dropout randomly discards some of the data flowing through the activation functions.

21. Quiz: TensorFlow Dropout

Dropout is another way to prevent overfitting; it randomly drops units, as shown below:

image

In TensorFlow, dropout is implemented with the tf.nn.dropout() function; example code:

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

The tf.nn.dropout() function takes in two parameters:

  • hidden_layer: the tensor to which you would like to apply dropout
  • keep_prob: the probability of keeping (i.e. not dropping) any given unit

keep_prob controls how much is dropped. During training it is usually set to 0.5 (tf.nn.dropout also scales the kept units by 1/keep_prob so that the expected total activation stays the same), while at test time it should be set to 1.0, i.e. no units are dropped, so the full model is used.

Here is a longer example:

...

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

...

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....

            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})

    validation_accuracy = sess.run(accuracy, feed_dict={
        features: test_features,
        labels: test_labels,
        keep_prob: 1.0})

22. Quiz 2: TensorFlow Dropout

Now put dropout to use; the code is as follows:

# Quiz Solution
import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# set random seed
tf.set_random_seed(123456)

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: save and print session results as variable named "output"
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output = sess.run(logits, feed_dict={keep_prob: 0.5})
    print(output)

The output is:

[[  9.55999947  16.        ]
 [  0.11200001   0.67200011]
 [ 43.30000305  48.15999985]]