A small, extensible neural network library, written from scratch in Julia, that makes it easy to define and train models built from the following layer types:
- `Conv`: n-dimensional convolutional layer
- `Pool`: mean pooling, combines neuron clusters
- `Flatten`: flattens the output of `Conv` to a vector
- `Dense`: dense/fully connected layer
- `LSTM`: long short-term memory cell
> ⚠️ **Warning**
> Further optimization and testing of the convolutional layer type with higher-dimensional datasets is needed. Until then, the network is rather slow and unexpected errors might occur.
First, initialize the neural network by chaining different layers and storing them in a vector.
```julia
include("nn.jl")

layers = [Conv(1 => 2, (28, 28), (5, 5)),
          Pool(2, 2),
          Conv(2 => 3, (12, 12), (5, 5)),
          Pool(2, 2),
          Flatten(3, (4, 4)),
          Dense(48 => 24),
          Dense(24 => 10)]
```

Then train the network on a data batch of type `Data` (defined in `nn.jl`). The `train!()` function modifies the network's parameters based on the average gradient across all data points. Optionally, the learning rate `η` can be passed (default `η=1.0`). The function returns the average loss of the network.
```julia
train!(layers, batch, η=1.5)
```

To achieve stochastic gradient descent, the `train!()` function can be called from a for-loop. The `forward!()` and `loss()` functions can also be called manually. Have a look at the examples.
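For instance, a minimal SGD loop might look like the following sketch (assuming `batches` is a pre-built vector of `Data` batches; the layer stack, variable names and epoch count are purely illustrative):

```julia
include("nn.jl")

# Illustrative layer stack; any combination of the layer types above works here.
layers = [Dense(784 => 32), Dense(32 => 10)]

# `batches` is assumed to have been prepared beforehand as a Vector of Data objects.
for epoch in 1:10
    for batch in batches
        avg_loss = train!(layers, batch, η=0.5)   # returns the average loss over the batch
    end
end
```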
> **Note**
> `train_seq!()`, `forward_seq!()` and `backprop_seq!()` are currently used for sequential datasets. However, I plan to improve the interface, as `train!()` and `train_seq!()` appear almost identical.
The forward pass and gradient equations of fully connected (dense) layers are available in my Multilayer Perceptron (MLP) repository. The forward pass of a convolutional layer is defined by the following equation:
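(The equation itself is embedded as an image in the repository. As a rough sketch in my own notation, assuming a 2-D layer with kernels $w$, biases $b$, activation function $\sigma$, stride 1 and no padding, and noting that the repository's exact indexing may differ, the activation of output channel $k$ at position $(i, j)$ is:)

```math
a^{l}_{k,i,j} = \sigma\left( b^{l}_{k} + \sum_{c} \sum_{m} \sum_{n} w^{l}_{k,c,m,n} \, a^{l-1}_{c,\,i+m,\,j+n} \right)
```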
Based on the above equation, one can infer the partial derivatives of the loss/cost with respect to the biases, kernels and activations of a convolutional layer using the chain rule.
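As a hedged sketch under the same assumptions as the forward-pass equation above (writing $z^{l}$ for the pre-activation and $\delta^{l}_{k,i,j} = \partial C / \partial z^{l}_{k,i,j}$ for the upstream error; the repository may organize these differently), the chain rule yields:

```math
\frac{\partial C}{\partial b^{l}_{k}} = \sum_{i,j} \delta^{l}_{k,i,j}, \qquad
\frac{\partial C}{\partial w^{l}_{k,c,m,n}} = \sum_{i,j} \delta^{l}_{k,i,j} \, a^{l-1}_{c,\,i+m,\,j+n}, \qquad
\frac{\partial C}{\partial a^{l-1}_{c,p,q}} = \sum_{k} \sum_{i,j} \delta^{l}_{k,i,j} \, w^{l}_{k,c,\,p-i,\,q-j}
```

where the last sum runs only over positions $(i, j)$ for which the kernel index $(p-i,\, q-j)$ is valid.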