TensorFlow Tutorial

Chris Rytting

TensorFlow is an open-source library that is especially useful for building neural networks. In this notebook, we:

  • Build a neural network to recognize handwritten digits from the MNIST dataset.

  • Visualize the architecture of the neural network using TensorBoard, the visualization tool that ships with TensorFlow.

First, though, we need to import the library itself and import our dataset:

In [27]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
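
The one_hot=True argument asks the loader to return each label as a 10-element vector rather than as a single digit. As a minimal sketch of that encoding in plain Python (the helper name one_hot is hypothetical and is not part of the TensorFlow loader):

```python
def one_hot(label, num_classes=10):
    """Return a list of num_classes floats with a 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

# The digit 3 becomes a vector whose only nonzero entry sits at index 3.
encoded = one_hot(3)
print(encoded)
```

This is why the label placeholder in the graph below has shape [None, 10]: one column per digit class.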

Now we'll define some helper functions. In general, a neural net's parameters consist of weights and biases: numbers that operate on inputs to produce outputs. The parameters start from arbitrary initial values (here, small random weights and constant biases), so the untrained network predicts poorly, which shows up as a high value of the loss function. During training, the parameters are adjusted incrementally to decrease the loss and improve the predictions.
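
To make the idea of adjusting parameters to decrease loss concrete, here is a toy sketch of gradient descent on a single weight and bias. It is plain Python rather than TensorFlow, and the data, starting values, learning rate, and step count are illustrative assumptions, not values from this notebook (which trains with the Adam optimizer further down).

```python
# Toy data generated from y = 2x + 1 (an assumption chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]

def mse(w, b):
    """Mean squared error of the line w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b = 0.05, 0.1  # poor starting parameters, so the initial loss is high
learning_rate = 0.05
initial_loss = mse(w, b)

for step in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move each parameter a small step against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(initial_loss, final_loss, w, b)
```

The loss falls as w and b approach 2 and 1. The network in this notebook follows the same principle, only with many more parameters and a different loss.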

In [31]:
def weight_variable(shape, name):
    """
    initial: a tensor of weights. This tensor is instantiated with shape 'shape' and its entries consist of
    independent and identically distributed (iid) draws from a uniform distribution on [-0.1, 0.1).
    """
    # Alternative initialization: truncated normal with mean 0 and standard deviation 0.1
    # initial = tf.truncated_normal(shape, stddev=0.1)
    initial = tf.random_uniform(shape, -0.1, 0.1)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    """
    initial: a tensor of biases. This tensor is instantiated with shape 'shape' and every entry is set to
    the constant 0.1.
    """
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

def conv2d(x, W, name):
    """
    x: a four-dimensional input tensor
    W: a four-dimensional filter tensor to be used in computing convolution.
    convolution: a four-dimensional tensor, x convolved by W according to strides and padding.
    (For details on this special convolution, see https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
    """
    convolution = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME', name=name)
    return convolution

def max_pool_2x2(x, name):
    """
    x: a four-dimensional input tensor
    maxpool: a four-dimensional tensor, each entry is the max element of the subtensor dictated by ksize,
    strides, and padding.
    (For details on this operation, see https://www.tensorflow.org/api_docs/python/tf/nn/max_pool)
    """
    maxpool = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
    return maxpool
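
Both conv2d and max_pool_2x2 use 'SAME' padding, under which TensorFlow produces an output of spatial size ceil(input_size / stride). The sketch below is plain Python, not TensorFlow, and the function name same_padding_output_size is hypothetical. It traces a 28x28 MNIST image through two rounds of conv2d followed by max_pool_2x2.

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size under 'SAME' padding: ceil(input_size / stride)."""
    return math.ceil(input_size / stride)

size = 28  # MNIST images are 28x28 pixels
sizes = [size]
for layer in range(2):
    size = same_padding_output_size(size, stride=1)  # conv2d: stride 1 keeps the size
    size = same_padding_output_size(size, stride=2)  # max_pool_2x2: stride 2 halves it
    sizes.append(size)

print(sizes)
```

The sizes go from 28 to 14 to 7, which is where the 7 * 7 * 64 input size of the fully connected layer comes from (64 being the number of filters in the second convolutional layer).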

Construction of graph

Now, we will decide the topology of our neural net. We will have two convolutional layers, a fully connected layer, and an output layer:

In [32]:
# Clear away any old graphs
tf.reset_default_graph()

# Use an interactive session, so that we don't have to specify a session when we evaluate tensors from the computation graph
sess = tf.InteractiveSession()

with tf.name_scope('Neural_Net') as scope:
    # Placeholders for the neural net's inputs (flattened images) and targets (one-hot labels)
    x = tf.placeholder(tf.float32, shape=[None, 784])
    y_ = tf.placeholder(tf.float32, shape=[None, 10])
    with tf.name_scope('Layer_1'):
        W_conv1 = weight_variable([5, 5, 1, 32], 'Weights')
        b_conv1 = bias_variable([32], 'Biases')
        x_image = tf.reshape(x, [-1,28,28,1])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 'Convolution') + b_conv1)
        h_pool1 = max_pool_2x2(h_conv1, 'Pool')
    with tf.name_scope('Layer_2'):
        W_conv2 = weight_variable([5, 5, 32, 64], 'Weights')
        b_conv2 = bias_variable([64], 'Biases')
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2, 'Convolution') + b_conv2)
        h_pool2 = max_pool_2x2(h_conv2, 'Pool')
    with tf.name_scope('Fully_Connected_Layer'):
        W_fc1 = weight_variable([7 * 7 * 64, 1024], 'Weights')
        b_fc1 = bias_variable([1024], 'Biases')
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
        keep_prob = tf.placeholder(tf.float32)
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    with tf.name_scope('Output_Layer'):
        W_fc2 = weight_variable([1024, 10], 'Weights')
        b_fc2 = bias_variable([10], 'Biases')
        y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, name = 'Softmax')

with tf.name_scope('Cost') as scope:
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
with tf.name_scope('Optimizer') as scope:
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
with tf.name_scope('Accuracy') as scope:
    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

w1_hist = tf.summary.histogram('W_conv1', W_conv1)
w2_hist = tf.summary.histogram('W_conv2', W_conv2)
accuracy_sum = tf.summary.scalar('accuracy', accuracy)
cost_sum = tf.summary.scalar('cost', cross_entropy)

merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('tf_logs', sess.graph)
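
The Cost and Accuracy nodes above can be hard to read in graph form. As a rough plain-Python sketch of the same two quantities on a tiny made-up batch (three classes instead of ten; the function names and numbers are illustrative assumptions, not part of the notebook):

```python
import math

def cross_entropy(labels, probs):
    """Mean over the batch of -sum(label * log(prob)) for each example."""
    per_example = [
        # Terms with a zero label contribute nothing, so they are skipped;
        # this also avoids evaluating log(0) for classes given zero probability.
        -sum(l * math.log(p) for l, p in zip(label_row, prob_row) if l > 0)
        for label_row, prob_row in zip(labels, probs)
    ]
    return sum(per_example) / len(per_example)

def accuracy(labels, probs):
    """Fraction of examples whose most probable class matches the label."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = [argmax(l) == argmax(p) for l, p in zip(labels, probs)]
    return sum(hits) / len(hits)

# Two made-up examples: one-hot labels and softmax-like predicted probabilities.
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]

loss = cross_entropy(labels, probs)
acc = accuracy(labels, probs)
print(loss, acc)
```

The first example is classified correctly and the second is not, so the accuracy is 0.5. The loss is the average of -log(0.7) and -log(0.3), so it penalizes the confident wrong answer more heavily.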

In [33]:
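# Initialize all variables (weights, biases, and the optimizer's internal state) before training
sess.run(tf.global_variables_initializer())

for i in range(1500):
    batch = mnist.train.next_batch(50)
    # A keep_prob of 1.0 keeps every unit, so dropout is effectively disabled here
    feed = {x: batch[0], y_: batch[1], keep_prob: 1.0}
    if i % 10 == 0:
        # Every 10th step, also record summaries for TensorBoard and the batch accuracy
        summary, _, train_accuracy = sess.run([merged, train_step, accuracy], feed_dict=feed)
        writer.add_summary(summary, i)
    else:
        sess.run(train_step, feed_dict=feed)
    if i % 100 == 0:
        print("step %d, training accuracy %g" % (i, train_accuracy))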
step 0, training accuracy 0.1
step 100, training accuracy 0.86
step 200, training accuracy 0.9
step 300, training accuracy 0.96
step 400, training accuracy 0.92
step 500, training accuracy 0.92
step 600, training accuracy 0.96
step 700, training accuracy 0.94
step 800, training accuracy 0.96
step 900, training accuracy 0.92
step 1000, training accuracy 1
step 1100, training accuracy 0.98
step 1200, training accuracy 0.98
step 1300, training accuracy 0.98
step 1400, training accuracy 0.96

TensorBoard, launched on the log directory written above (tensorboard --logdir tf_logs), yields the following visualizations of the graph, accuracy, loss/cost, and the weights:


(Main graph, a bit unravelled)