神经算术逻辑单元，你了解吗？

qiaoweiyiyi · 发表于 2018-10-9 17:09:09

本帖最后由 qiaoweiyiyi 于 2018-10-9 17:11 编辑

文 / Akil Hylton

DeepMind 最近发布了一篇新的论文---《神经算术逻辑单元（NALU）》（https://arxiv.org/abs/1808.00508），这是一篇很有趣的论文，它解决了深度学习中的一个重要问题，即教导神经网络计算。令人惊讶的是，尽管神经网络已经能够在许多任务，如肺癌分类中获得卓绝表现，却往往在一些简单任务，像计算数字上苦苦挣扎。

在一个展示网络如何努力从新数据中插入特征的实验中，我们的研究发现，他们能够用 -5 到 5 之间的数字将训练数据分类，准确度近乎完美，但对于训练数据之外的数字，网络几乎无法归纳概括。

论文提供了一个解决方案，分成两个部分。以下我将简单介绍一下 NAC 的工作原理，以及它如何处理加法和减法等操作。之后，我会介绍 NALU，它可以处理更复杂的操作，如乘法和除法。我提供了可以尝试演示这些代码的代码，您可以阅读上述的论文了解更多详情。

第一神经网络（NAC）

神经累加器（简称 NAC）是其输入的一种线性变换。什么意思呢？它是一个转换矩阵，是 tanh（W_hat）和 sigmoid（M_hat）的元素乘积。最后，转换矩阵 W 乘以输入（x）。

Python 中的 NAC

import tensorflow as tf

# NAC

W_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))

M_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))

W = tf.tanh(W_hat) * tf.sigmoid(M_hat)

# Forward propogation

a = tf.matmul(in_dim, W)
复制代码
NAC

第二神经网络 (NALU)

神经算术逻辑单元，或者我们简称之为 NALU，是由两个 NAC 单元组成。第一个 NAC g 等于 sigmoid（Gx）。第二个 NAC 在一个等于 exp 的日志空间 m 中运行 (W(log(|x| + epsilon)))

Python 中的 NALU

import tensorflow as tf
# NALU
G = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
m = tf.exp(tf.matmul(tf.log(tf.abs(in_dim) + epsilon), W))
g = tf.sigmoid(tf.matmul(in_dim, G))
y = g * a + (1 - g) * m

复制代码

NALU

通过学习添加来测试 NAC

现在让我们进行测试，首先将 NAC 转换为函数。

# Neural Accumulator
def NAC(in_dim, out_dim):
in_features = in_dim.shape[1]
# define W_hat and M_hat
W_hat = tf.get_variable(name = 'W_hat', initializer=tf.initializers.random_uniform(minval=-2, maxval=2),shape=[in_features, out_dim], trainable=True)
M_hat = tf.get_variable(name = 'M_hat', initializer=tf.initializers.random_uniform(minval=-2, maxval=2), shape=[in_features, out_dim], trainable=True)
W = tf.nn.tanh(W_hat) * tf.nn.sigmoid(M_hat)
a = tf.matmul(in_dim, W)
return a, W

复制代码

NAC function in Python

Python 中的 NAC 功能

接下来，让我们创建一些玩具数据，用于训练和测试数据。 NumPy 有一个名为 numpy.arrange 的优秀 API，我们将利用它来创建数据集。

# Generate a series of input number X1 and X2 for training
x1 = np.arange(0,10000,5, dtype=np.float32)
x2 = np.arange(5,10005,5, dtype=np.float32)
y_train = x1 + x2
x_train = np.column_stack((x1,x2))
print(x_train.shape)
print(y_train.shape)
# Generate a series of input number X1 and X2 for testing
x1 = np.arange(1000,2000,8, dtype=np.float32)
x2 = np.arange(1000,1500,4, dtype= np.float32)
x_test = np.column_stack((x1,x2))
y_test = x1 + x2
print()
print(x_test.shape)
print(y_test.shape)

复制代码

添加玩具数据

现在，我们可以定义样板代码来训练模型。我们首先定义占位符 X 和 Y，用以在运行时提供数据。接下来我们定义的是 NAC 网络（y_pred，W = NAC（in_dim = X，out_dim = 1））。对于损失，我们使用 tf.reduce_sum()。我们将有两个超参数，alpha，即学习率和我们想要训练网络的时期数。在运行训练循环之前，我们需要定义一个优化器，这样我们就可以使用 tf.train.AdamOptimizer() 来减少损失。

# Define the placeholder to feed the value at run time
X = tf.placeholder(dtype=tf.float32, shape =[None , 2]) # Number of samples x Number of features (number of inputs to be added)
Y = tf.placeholder(dtype=tf.float32, shape=[None,])
# define the network
# Here the network contains only one NAC cell (for testing)
y_pred, W = NAC(in_dim=X, out_dim=1)
y_pred = tf.squeeze(y_pred) # Remove extra dimensions if any
# Mean Square Error (MSE)
loss = tf.reduce_mean( (y_pred - Y) **2)
# training parameters
alpha = 0.05 # learning rate
epochs = 22000
optimize = tf.train.AdamOptimizer(learning_rate=alpha).minimize(loss)
with tf.Session() as sess:
#init = tf.global_variables_initializer()
cost_history = []
sess.run(tf.global_variables_initializer())
# pre training evaluate
print("Pre training MSE: ", sess.run (loss, feed_dict={X: x_test, Y:y_test}))
print()
for i in range(epochs):
_, cost = sess.run([optimize, loss ], feed_dict={X:x_train, Y: y_train})
print("epoch: {}, MSE: {}".format( i,cost) )
cost_history.append(cost)
# plot the MSE over each iteration
plt.plot(np.arange(epochs),np.log(cost_history)) # Plot MSE on log scale
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.show()
print()
print(W.eval())
print()
# post training loss
print("Post training MSE: ", sess.run(loss, feed_dict={X: x_test, Y: y_test}))
print("Actual sum: ", y_test[0:10])
print()
print("Predicted sum: ", sess.run(y_pred[0:10], feed_dict={X: x_test, Y: y_test}))

复制代码

训练之后，成本图的样子：

NAC 训练之后的成本

Actual sum:  [2000. 2012. 2024. 2036. 2048. 2060. 2072. 2084. 2096. 2108.]

Predicted sum:  [1999.9021 2011.9015 2023.9009 2035.9004 2047.8997 2059.8992 2071.8984
2083.898  2095.8975 2107.8967]

虽然 NAC 可以处理诸如加法和减法之类的操作，但是它无法处理乘法和除法。于是，就有了 NALU 的用武之地。它能够处理更复杂的操作，例如乘法和除法。

通过学习乘法来测试 NALU

为此，我们将添加片段以使 NAC 成为 NALU。

神经累加器（NAC）是其输入的线性变换。神经算术逻辑单元（NALU）使用两个带有绑定的权重的 NACs 来启用加法或者减法（较小的紫色单元）和乘法/除法（较大的紫色单元），由一个门（橙色单元）来控制。

# The Neural Arithmetic Logic Unit
def NALU(in_dim, out_dim):
shape = (int(in_dim.shape[-1]), out_dim)
epsilon = 1e-7
# NAC
W_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
M_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
G = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
W = tf.tanh(W_hat) * tf.sigmoid(M_hat)
# Forward propogation
a = tf.matmul(in_dim, W)
# NALU
m = tf.exp(tf.matmul(tf.log(tf.abs(in_dim) + epsilon), W))
g = tf.sigmoid(tf.matmul(in_dim, G))
y = g * a + (1 - g) * m
return y

复制代码

Python 中的 NALU 函数

现在，再次创建一些玩具数据，这次我们将进行两行更改。

# Test the Network by learning the multiplication
# Generate a series of input number X1 and X2 for training
x1 = np.arange(0,10000,5, dtype=np.float32)
x2 = np.arange(5,10005,5, dtype=np.float32)
y_train = x1 * x2
x_train = np.column_stack((x1,x2))
print(x_train.shape)
print(y_train.shape)
# Generate a series of input number X1 and X2 for testing
x1 = np.arange(1000,2000,8, dtype=np.float32)
x2 = np.arange(1000,1500,4, dtype= np.float32)
x_test = np.column_stack((x1,x2))
y_test = x1 * x2
print()
print(x_test.shape)
print(y_test.shape)

复制代码

用于乘法的玩具数据

第 8 行和第 20 行是进行更改的地方，将加法运算符切换为乘法。

现在我们可以训练的是 NALU 网络。我们唯一需要更改的地方是定义 NAC 网络改成 NALU（y_pred = NALU（in_dim = X，out_dim = 1））。

# Define the placeholder to feed the value at run time
X = tf.placeholder(dtype=tf.float32, shape =[None , 2]) # Number of samples x Number of features (number of inputs to be added)
Y = tf.placeholder(dtype=tf.float32, shape=[None,])
# Define the network
# Here the network contains only one NAC cell (for testing)
y_pred = NALU(in_dim=X, out_dim=1)
y_pred = tf.squeeze(y_pred) # Remove extra dimensions if any
# Mean Square Error (MSE)
loss = tf.reduce_mean( (y_pred - Y) **2)
# training parameters
alpha = 0.05 # learning rate
epochs = 22000
optimize = tf.train.AdamOptimizer(learning_rate=alpha).minimize(loss)
with tf.Session() as sess:
#init = tf.global_variables_initializer()
cost_history = []
sess.run(tf.global_variables_initializer())
# pre training evaluate
print("Pre training MSE: ", sess.run (loss, feed_dict={X: x_test, Y: y_test}))
print()
for i in range(epochs):
_, cost = sess.run([optimize, loss ], feed_dict={X: x_train, Y: y_train})
print("epoch: {}, MSE: {}".format( i,cost) )
cost_history.append(cost)
# Plot the loss over each iteration
plt.plot(np.arange(epochs),np.log(cost_history)) # Plot MSE on log scale
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.show()
# post training loss
print("Post training MSE: ", sess.run(loss, feed_dict={X: x_test, Y: y_test}))
print("Actual product: ", y_test[0:10])
print()
print("Predicted product: ", sess.run(y_pred[0:10], feed_dict={X: x_test, Y: y_test}))

复制代码

NALU 训练后的成本
Actual product:  [1000000. 1012032. 1024128. 1036288. 1048512. 1060800. 1073152. 1085568.
1098048. 1110592.]

Predicted product:  [1000000.2  1012032. 1024127.56 1036288.6  1048512.06 1060800.8
1073151.6  1085567.6  1098047.6  1110592.8 ]

在 TensorFlow 中全面实现

注：链接地址

https://github.com/ahylton19/simpleNALU-tf

神经算术逻辑单元，你了解吗？

站长推荐 /2