Matrix-Based Computation

The `update_mini_batch` function from the previous chapter runs backpropagation on each training example in the mini-batch separately and only averages the resulting gradients at the end:

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        # Initialize the gradient accumulators to zero
        # (nabla, ∇, denotes the gradient operator, also called the del operator)
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            # Accumulate the gradients over the mini-batch:
            # backprop returns the gradients contributed by this single sample
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            # Add each sample's gradients; the average is taken below
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # Average the accumulated gradients, scale by the learning rate eta,
        # and update the weight and bias matrices
        self.weights = [w-(eta/len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb for b, nb in zip(self.biases, nabla_b)]

This section shows how to compute the gradients of an entire mini-batch directly with matrix operations, instead of looping over the samples one at a time. Reference implementation: https://github.com/hindSellouk/Matrix-Based-Backpropagation/blob/master/Network1.py
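
The idea is to stack the mini-batch column-wise into an input matrix `X` and a label matrix `Y`, run a single feedforward and backward pass on those matrices, and let the matrix products sum the per-sample gradients implicitly. Below is a minimal sketch of this approach (not a copy of the linked Network1.py): it assumes the same attributes as the original `Network` class (`self.biases`, `self.weights`, `self.num_layers`), a quadratic cost, and the usual `sigmoid`/`sigmoid_prime` helpers; the name `update_mini_batch_matrix` is just for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        return sigmoid(z) * (1.0 - sigmoid(z))

    # Method of the (assumed) Network class
    def update_mini_batch_matrix(self, mini_batch, eta):
        """Matrix-based variant: process the whole mini-batch in one
        forward/backward pass instead of looping over the samples."""
        # Stack the (n_in, 1) inputs and (n_out, 1) labels column-wise:
        # X has shape (n_in, m) and Y has shape (n_out, m) for m samples.
        X = np.hstack([x for x, y in mini_batch])
        Y = np.hstack([y for x, y in mini_batch])
        m = len(mini_batch)
        # Forward pass for all m samples at once; each (n, 1) bias vector
        # broadcasts across the m columns.
        activation = X
        activations = [X]
        zs = []
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # Backward pass: delta is an (n, m) matrix holding the error of
        # every sample in the batch (quadratic cost assumed).
        delta = (activations[-1] - Y) * sigmoid_prime(zs[-1])
        nabla_b = [None] * len(self.biases)
        nabla_w = [None] * len(self.weights)
        # The product delta . activations.T sums over the batch implicitly;
        # keepdims preserves the (n, 1) shape of the bias gradients.
        nabla_b[-1] = delta.sum(axis=1, keepdims=True)
        nabla_w[-1] = np.dot(delta, activations[-2].T)
        for l in range(2, self.num_layers):
            delta = np.dot(self.weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
            nabla_b[-l] = delta.sum(axis=1, keepdims=True)
            nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
        # Same averaged gradient-descent step as before
        self.weights = [w - (eta / m) * nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / m) * nb for b, nb in zip(self.biases, nabla_b)]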

Running the code, the matrix-based version takes roughly 3 seconds per epoch, while the original per-sample version takes roughly 11 seconds per epoch.
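
A rough sketch of how such a timing comparison can be made, assuming Nielsen's `mnist_loader` module and the `Network` class from `network.py` (swap in the matrix-based class to time the other variant; the hyperparameters below are only illustrative):

    import time
    import network        # Nielsen's network.py (or the matrix-based variant)
    import mnist_loader   # Nielsen's MNIST loader

    # load_data_wrapper() returns (training_data, validation_data, test_data);
    # list() guards against loaders that return iterators (e.g. Python 3 ports).
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
    training_data, test_data = list(training_data), list(test_data)

    net = network.Network([784, 30, 10])
    start = time.time()
    # One epoch, mini-batch size 10, learning rate 3.0 (illustrative values)
    net.SGD(training_data, 1, 10, 3.0, test_data=test_data)
    print("one epoch took {:.1f} s".format(time.time() - start))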
