Building a Neural Network from Scratch in C++ for MNIST Digit Recognition

This is a write-up of a university assignment: I built a neural network from scratch in C++ to classify MNIST handwritten digits, implementing everything from forward propagation to backpropagation without any deep learning framework. Only the matrix arithmetic is delegated to the Eigen library.

repo: https://github.com/romophic/MNIST_cpp

1. Principles

This section explains how data is processed inside the network.

1.1 Flow from Input to Output

A neural network is made up of multiple stacked layers. Data flows in one direction from the input layer to the hidden layers, and finally to the output layer. This is called forward propagation.

When data is transmitted from one layer to the next, the strength of each signal is modified by weights. If we let the input data be $\bm{x}$, the signal transmitted to the next layer be $\bm{u}$, and the matrix consolidating the weights be $\bm{W}$, this relationship can be expressed simply as a matrix product:

$$\bm{u} = \bm{W} \bm{x}$$

This equation means that the signals from all neurons in the previous layer are multiplied by their respective weights and summed at each neuron in the next layer. The gathered signal $\bm{u}$ is not passed on as is; it first goes through an activation function, which determines whether, and how strongly, the neuron fires. In this experiment, two types of activation functions were used.
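Going back to the matrix product for a moment, the following is a minimal sketch of $\bm{u} = \bm{W}\bm{x}$ written with plain `std::vector` instead of Eigen, so that it compiles without dependencies. The names `Vector`, `Matrix`, and `mat_vec` are illustrative and do not appear in the repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration; the actual code uses Eigen.
using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // row-major: W[i] is row i

// Computes u = W x: each output u[i] is the weighted sum of all inputs.
// Assumes every row of W has at least x.size() entries.
Vector mat_vec(const Matrix& W, const Vector& x) {
  Vector u(W.size(), 0.0);
  for (std::size_t i = 0; i < W.size(); ++i) {
    for (std::size_t j = 0; j < x.size(); ++j) {
      u[i] += W[i][j] * x[j];
    }
  }
  return u;
}
```

For example, with `W = {{1, 2}, {3, 4}}` and `x = {1, 1}`, the result is `{3, 7}`: each row of the weight matrix produces one output signal.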

For the hidden layer, the ReLU function was used. This outputs 0 if the input is negative, and outputs the value as is if it is positive.

$$f(x) = \max(0, x)$$

This allows filtering out unnecessary information and transmitting only the important features to the next layer. Because the calculation is extremely simple, it has the advantage of speeding up the learning process.

For the final output layer, the Sigmoid function was used. This converts any given value into a number between $0$ and $1$.

$$f(x) = \frac{1}{1 + e^{-x}}$$

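Since each output falls between $0$ and $1$, it can be read as a confidence score for the corresponding digit. Because the ten sigmoid outputs are computed independently, they do not necessarily sum to 1, so they are not a strict probability distribution.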
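One property is worth writing out, since backpropagation relies on it: the derivative of the sigmoid can be expressed through its own output. Writing $y = f(x)$ (the symbol $y$ is introduced here for illustration):

```latex
f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
      = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
      = f(x)\,\bigl(1 - f(x)\bigr)
      = y\,(1 - y)
```

This is why a derivative helper can take the already-activated output as its argument rather than the raw input. The same shortcut works for ReLU: its derivative is 1 for positive input and 0 otherwise, and the output is positive exactly when the input is.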

1.2 Learning Mechanism (Backpropagation)

Initially, the weights $\bm{W}$ are set to random values, so the network will not produce the correct answers. Therefore, we calculate the error between the network’s answer and the correct label from the training data. To reduce this error, we adjust the weights a little at a time, working backward from the output layer toward the input layer. By repeating this process over many samples, the neural network gradually becomes capable of making correct judgments.
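As a toy illustration of this loop (not the network from this article), the sketch below trains a single sigmoid neuron with one weight toward a target value. The function name `train_single_neuron` and all of its parameters are made up for this example.

```cpp
#include <cmath>

// Toy example: one input, one weight, one sigmoid output.
// Repeats "forward pass -> error -> weight adjustment" and returns the
// absolute error that remains after the given number of steps.
// All names here are illustrative.
double train_single_neuron(double w, double x, double target,
                           double learning_rate, int steps) {
  for (int s = 0; s < steps; ++s) {
    double y = 1.0 / (1.0 + std::exp(-w * x));  // forward pass
    double error = target - y;                   // how far off the output is
    double grad = error * y * (1.0 - y);         // error times sigmoid slope
    w += learning_rate * grad * x;               // nudge the weight
  }
  double y = 1.0 / (1.0 + std::exp(-w * x));
  return std::fabs(target - y);
}
```

Starting from `w = 0` with `x = 1` and a target of `0.9`, the untrained error is `0.4`; calling the function with more steps returns a progressively smaller error, which is the behavior described above in miniature.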

2. Methodology

The implementation utilizes C++23 with Clang version 21.1.7 as the compiler. The Eigen library was used for matrix calculations. The details of the model constructed in this experiment are as follows:

Input layer: 784 nodes (corresponding to $28 \times 28$ pixel image data)
Hidden layer: 128 nodes (Activation function: ReLU)
Output layer: 10 nodes (Activation function: Sigmoid, corresponding to digits 0-9)
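For a sense of scale, the number of trainable weights follows directly from these layer sizes. The count below assumes there are no bias terms, which matches the two weight matrices held by the class in the listings:

```latex
\underbrace{784 \times 128}_{\text{input} \to \text{hidden}}
+ \underbrace{128 \times 10}_{\text{hidden} \to \text{output}}
= 100{,}352 + 1{,}280 = 101{,}632
```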

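To ensure smooth learning, the initial values of the weights must be chosen carefully. He initialization was used for the hidden layer weights, and Xavier initialization for the output layer weights. Both scale the random initial values according to the number of inputs to the layer so that the variance of the signal is roughly preserved from layer to layer, which helps prevent learning from stalling early on. Note that the code draws the random values with Eigen's `Random()`, which is uniform on $[-1, 1]$, and multiplies them by the He or Xavier factor; the resulting standard deviation is therefore smaller by a factor of $\sqrt{3}$ than in the normally distributed variants of these schemes.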
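The scale factors can be sketched as follows. This is an illustration rather than the repository's code: it draws samples from a normal distribution with the standard library, and the function names (`he_stddev`, `xavier_stddev`, `init_weights`) and the fixed default seed are assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Standard deviation used by He initialization for a layer with fan_in inputs.
double he_stddev(int fan_in) { return std::sqrt(2.0 / fan_in); }

// Scale used in this article for the output layer
// (a variant of Xavier initialization that depends on fan-in only).
double xavier_stddev(int fan_in) { return std::sqrt(1.0 / fan_in); }

// Fills a rows x cols weight matrix (stored row-major in a flat vector)
// with samples from N(0, stddev^2). The seed parameter is illustrative.
std::vector<double> init_weights(int rows, int cols, double stddev,
                                 unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::normal_distribution<double> dist(0.0, stddev);
  std::vector<double> w(static_cast<std::size_t>(rows) * cols);
  for (double& v : w) v = dist(gen);
  return w;
}
```

For the layer sizes used here, `he_stddev(784)` is about `0.0505` and `xavier_stddev(128)` is about `0.0884`, so the initial weights are small relative to the normalized pixel inputs.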

2.1 Learning Algorithm

The source code for the learning component is explained in stages. Below is an overview of the implementation of the class NeuralNetwork, which handles the neural network.

Listing 1: NeuralNetwork Class Implementation (Overview)

class NeuralNetwork {
 private:
  Eigen::MatrixXd w_ih;            // Input layer -> Hidden layer weights
  Eigen::MatrixXd w_ho;            // Hidden layer -> Output layer weights
  Eigen::VectorXd hidden_outputs;  // Hidden layer outputs (cached for training)

 public:
  // Initialization
  NeuralNetwork();
  // Perform forward propagation
  Eigen::VectorXd query(const Eigen::VectorXd& _inputs);
  // Train on a single sample
  void train(const Eigen::VectorXd& _inputs, const Eigen::VectorXd& _targets);
};

The class holds two weight matrices (input to hidden, and hidden to output) and caches the hidden layer output, which train reuses during backpropagation. Next, the constructor of NeuralNetwork, which is responsible for initialization, is shown below.

Listing 2: Initialization Implementation

NeuralNetwork() {
  // Hidden layer
  double weight_scale_ih = sqrt(2.0 / INPUT_NODES);  // He initialization coefficient
  w_ih = Eigen::MatrixXd::Random(HIDDEN_NODES, INPUT_NODES) * weight_scale_ih;  // He initialization
  // Output layer
  double weight_scale_ho = sqrt(1.0 / HIDDEN_NODES);  // Xavier initialization coefficient
  w_ho = Eigen::MatrixXd::Random(OUTPUT_NODES, HIDDEN_NODES) * weight_scale_ho;  // Xavier initialization
}

The contents of the function query, which performs forward propagation, are shown below.

Listing 3: Forward Propagation Implementation

// Perform forward propagation
Eigen::VectorXd query(const Eigen::VectorXd& _inputs) {
  // Hidden layer: weight matrix times input vector, then ReLU
  Eigen::VectorXd hidden_inputs = w_ih * _inputs;
  hidden_outputs = hidden_inputs.unaryExpr(&relu);  // Cached for use in train

  // Output layer: weight matrix times hidden layer output, then Sigmoid
  Eigen::VectorXd final_inputs = w_ho * hidden_outputs;
  Eigen::VectorXd final_outputs = final_inputs.unaryExpr(&sigmoid);
  return final_outputs;
}

Finally, the implementation of train, the section where learning occurs, is explained.

Listing 4: Learning (Backpropagation) Implementation

// Train on a single sample (backpropagation)
void train(const Eigen::VectorXd& _inputs, const Eigen::VectorXd& _targets) {
  Eigen::VectorXd final_outputs = query(_inputs);  // Forward propagation (also caches hidden_outputs)

  Eigen::VectorXd output_errors = _targets - final_outputs;          // Output error (target - output)
  Eigen::VectorXd hidden_errors = w_ho.transpose() * output_errors;  // Error propagated back to the hidden layer

  // Output layer gradients: error times the sigmoid derivative (computed from the output)
  Eigen::VectorXd output_gradients =
      output_errors.cwiseProduct(final_outputs.unaryExpr(&sigmoid_d));
  w_ho += LEARNING_RATE * (output_gradients * hidden_outputs.transpose());  // Update output layer weights

  // Hidden layer gradients: error times the ReLU derivative (computed from the output)
  Eigen::VectorXd hidden_gradients =
      hidden_errors.cwiseProduct(hidden_outputs.unaryExpr(&relu_d));
  w_ih += LEARNING_RATE * (hidden_gradients * _inputs.transpose());  // Update hidden layer weights
}
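
Written as equations, the updates in Listing 4 take the following form. The symbols are introduced here for illustration: $\bm{t}$ is the target vector, $\bm{y}$ the network output, $\bm{h}$ the hidden layer output, $\bm{x}$ the input, $\eta$ the learning rate, and $\odot$ the element-wise product.

```latex
\begin{aligned}
\bm{e}_o &= \bm{t} - \bm{y}, &
\bm{\delta}_o &= \bm{e}_o \odot \bm{y} \odot (1 - \bm{y}), &
\bm{W}_{ho} &\leftarrow \bm{W}_{ho} + \eta\, \bm{\delta}_o \bm{h}^\top \\
\bm{e}_h &= \bm{W}_{ho}^\top \bm{e}_o, &
\bm{\delta}_h &= \bm{e}_h \odot \mathbf{1}[\bm{h} > 0], &
\bm{W}_{ih} &\leftarrow \bm{W}_{ih} + \eta\, \bm{\delta}_h \bm{x}^\top
\end{aligned}
```

The updates are added rather than subtracted because the error is defined as target minus output. Here $\bm{e}_h$ is computed with the value of $\bm{W}_{ho}$ from before its update, matching the order of statements in the listing. One detail to be aware of: the listing propagates the raw error $\bm{e}_o$ to the hidden layer, whereas the exact gradient of a squared-error loss would propagate $\bm{\delta}_o$ instead.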

2.2 Loading the MNIST Dataset

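The MNIST dataset consists of the following 4 files. Each file starts with a header of 32-bit integers stored in big-endian byte order (4 fields for the image files, 2 for the label files), so each field must be read as 4 bytes and reassembled with the most significant byte first. Note that the code in Listing 5 opens the files under names such as train-images.idx3-ubyte (with a dot before idx), so the files may need to be renamed or the paths adjusted.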

train-images-idx3-ubyte: Training image data
train-labels-idx1-ubyte: Training label data
t10k-images-idx3-ubyte: Test image data
t10k-labels-idx1-ubyte: Test label data

The actual image data follows the header, with each pixel stored as a value from 0 to 255. Upon reading, all pixel values were divided by 255 to normalize them before being used as input data.
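The header decoding and normalization steps can be sketched in isolation as follows. The names `decode_be32`, `IMAGE_MAGIC`, `LABEL_MAGIC`, and `normalize_pixel` are illustrative and not taken from the repository; the magic values are those of the IDX format, which the full source reads past without checking.

```cpp
#include <cstdint>

// Reassembles a 32-bit big-endian integer from four bytes,
// most significant byte first. Works regardless of host endianness.
std::uint32_t decode_be32(const unsigned char bytes[4]) {
  return (static_cast<std::uint32_t>(bytes[0]) << 24) |
         (static_cast<std::uint32_t>(bytes[1]) << 16) |
         (static_cast<std::uint32_t>(bytes[2]) << 8) |
         static_cast<std::uint32_t>(bytes[3]);
}

// Magic numbers found at the start of IDX image and label files.
constexpr std::uint32_t IMAGE_MAGIC = 2051;  // 0x00000803
constexpr std::uint32_t LABEL_MAGIC = 2049;  // 0x00000801

// Maps a raw pixel value (0..255) into the range [0, 1].
double normalize_pixel(unsigned char pixel) { return pixel / 255.0; }
```

Comparing the first decoded field against the expected magic number is a cheap way to catch a wrong or still-compressed file before training starts.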

2.3 Source Code

The full source code used in the experiment is provided below.

Listing 5: main.cpp (Full text)

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include "Eigen/Core"
#include "Eigen/Dense"

using namespace std;

constexpr int INPUT_NODES = 784;        // Number of input layer nodes
constexpr int HIDDEN_NODES = 128;       // Number of hidden layer nodes
constexpr int OUTPUT_NODES = 10;        // Number of output layer nodes
constexpr double LEARNING_RATE = 0.01;  // Learning rate
constexpr int EPOCHS = 10;              // Number of epochs

double sigmoid(double _x) { return 1.0 / (1.0 + exp(-_x)); }  // Sigmoid function
// Derivative of sigmoid, expressed in terms of the sigmoid output _y
double sigmoid_d(double _y) { return _y * (1.0 - _y); }

double relu(double _x) { return max(0.0, _x); }  // ReLU function
// Derivative of ReLU, evaluated from the ReLU output _y
double relu_d(double _y) { return _y > 0.0 ? 1.0 : 0.0; }

class NeuralNetwork {
 private:
  Eigen::MatrixXd w_ih;            // Input layer -> Hidden layer weights
  Eigen::MatrixXd w_ho;            // Hidden layer -> Output layer weights
  Eigen::VectorXd hidden_outputs;  // Hidden layer outputs (cached for training)

 public:
  // Initialization
  NeuralNetwork() {
    // Hidden layer
    double weight_scale_ih = sqrt(2.0 / INPUT_NODES);                             // He initialization coefficient
    w_ih = Eigen::MatrixXd::Random(HIDDEN_NODES, INPUT_NODES) * weight_scale_ih;  // He initialization

    // Output layer
    double weight_scale_ho = sqrt(1.0 / HIDDEN_NODES);  // Xavier initialization coefficient
    w_ho = Eigen::MatrixXd::Random(OUTPUT_NODES, HIDDEN_NODES) * weight_scale_ho;  // Xavier initialization
  }

  // Perform forward propagation
  Eigen::VectorXd query(const Eigen::VectorXd& _inputs) {
    // Hidden layer: weight matrix times input vector, then ReLU
    Eigen::VectorXd hidden_inputs = w_ih * _inputs;
    hidden_outputs = hidden_inputs.unaryExpr(&relu);

    // Output layer: weight matrix times hidden layer output, then Sigmoid
    Eigen::VectorXd final_inputs = w_ho * hidden_outputs;
    Eigen::VectorXd final_outputs = final_inputs.unaryExpr(&sigmoid);
    return final_outputs;
  }

  // Train on a single sample
  void train(const Eigen::VectorXd& _inputs, const Eigen::VectorXd& _targets) {
    Eigen::VectorXd final_outputs = query(_inputs);  // Forward propagation

    Eigen::VectorXd output_errors = _targets - final_outputs;          // Output error
    Eigen::VectorXd hidden_errors = w_ho.transpose() * output_errors;  // Hidden layer error

    // Output layer gradients
    Eigen::VectorXd output_gradients =
        output_errors.cwiseProduct(final_outputs.unaryExpr(&sigmoid_d));
    w_ho += LEARNING_RATE * (output_gradients * hidden_outputs.transpose());  // Update output layer weights

    // Hidden layer gradients
    Eigen::VectorXd hidden_gradients =
        hidden_errors.cwiseProduct(hidden_outputs.unaryExpr(&relu_d));
    w_ih += LEARNING_RATE * (hidden_gradients * _inputs.transpose());  // Update hidden layer weights
  }
};

// Reads a 32-bit big-endian integer from the file
int read_int(ifstream& file) {
  unsigned char bytes[4];
  file.read((char*)bytes, 4);
  return (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3];
}

// Loads images (normalized to [0, 1]) and labels from a pair of MNIST files
void load_mnist(const string& _image_path, const string& _label_path,
                vector<Eigen::VectorXd>& _images, vector<int>& _labels) {
  ifstream img_file(_image_path, ios::binary);
  ifstream lbl_file(_label_path, ios::binary);

  if (not(img_file.is_open() and lbl_file.is_open())) {
    cerr << "Failed to open " << _image_path << " or " << _label_path << endl;
    exit(1);
  }

  read_int(img_file);  // Skip the image file's magic number

  int num_items = read_int(img_file);
  int rows = read_int(img_file);
  int cols = read_int(img_file);

  cout << "num_items: " << num_items << endl;
  cout << "rows: " << rows << endl;
  cout << "cols: " << cols << endl;

  read_int(lbl_file);  // Skip the label file's magic number
  read_int(lbl_file);  // Skip the label count (same as num_items)

  _images.reserve(num_items);
  _labels.resize(num_items);

  for (int i = 0; i < num_items; ++i) {
    unsigned char label;
    lbl_file.read((char*)&label, 1);
    _labels[i] = (int)label;
    Eigen::VectorXd img_vec(rows * cols);
    for (int j = 0; j < rows * cols; ++j) {
      unsigned char pixel;
      img_file.read((char*)&pixel, 1);
      img_vec[j] = pixel / 255.0;  // Normalize 0..255 to 0..1
    }
    _images.emplace_back(img_vec);
  }
}

int main() {
  vector<Eigen::VectorXd> train_images, test_images;
  vector<int> train_labels, test_labels;

  load_mnist("train-images.idx3-ubyte", "train-labels.idx1-ubyte", train_images, train_labels);
  load_mnist("t10k-images.idx3-ubyte", "t10k-labels.idx1-ubyte", test_images, test_labels);

  NeuralNetwork nn;

  for (int epoch = 1; epoch <= EPOCHS; ++epoch) {
    for (size_t i = 0; i < train_images.size(); ++i) {
      // Targets: 0.01 for every wrong digit, 0.99 for the correct one
      Eigen::VectorXd targets = Eigen::VectorXd::Constant(OUTPUT_NODES, 0.01);
      targets[train_labels[i]] = 0.99;
      nn.train(train_images[i], targets);
    }
    cout << "Epoch " << epoch << " done" << endl;
  }

  // Evaluate on the test set: the predicted digit is the index of the largest output
  int correct_count = 0;
  for (size_t i = 0; i < test_images.size(); ++i) {
    Eigen::VectorXd outputs = nn.query(test_images[i]);
    int predicted_label;
    outputs.maxCoeff(&predicted_label);
    if (predicted_label == test_labels[i])
      correct_count++;
  }

  double accuracy = (double)correct_count / test_images.size() * 100.0;
  cout << "Accuracy: " << accuracy << "%" << endl;

  return 0;
}

Compilation Command:

clang++ -std=c++23 -O3 -march=native main.cpp

3. Results

Using 60,000 training images, the network was trained for 10 epochs, and its accuracy was then measured on the 10,000 test images. The recognition accuracy on the test set was 97.53%.
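The accuracy figure is the share of test images for which the largest of the ten outputs matches the label; 97.53% of 10,000 images corresponds to 9,753 correct predictions. A dependency-free sketch of that measurement is shown below, with `argmax` and `accuracy_percent` being names chosen for this example rather than functions from the repository.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest output, i.e. the predicted digit.
int argmax(const std::vector<double>& outputs) {
  return static_cast<int>(std::distance(
      outputs.begin(), std::max_element(outputs.begin(), outputs.end())));
}

// Percentage of samples whose predicted digit equals the label.
// Assumes all_outputs has at least as many entries as labels.
double accuracy_percent(const std::vector<std::vector<double>>& all_outputs,
                        const std::vector<int>& labels) {
  if (labels.empty()) return 0.0;
  std::size_t correct = 0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (argmax(all_outputs[i]) == labels[i]) ++correct;
  }
  return 100.0 * static_cast<double>(correct) / labels.size();
}
```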

4. Discussion

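Activation Function: ReLU does not saturate for positive inputs, so using it in the hidden layer helps avoid vanishing gradients and likely contributed to efficient learning.
Learning Rate: A setting of 0.01 gave stable training and reached the accuracy above within 10 epochs; no other values are reported here, so it is not necessarily optimal.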
