Caffe Training Experiments

Caffe is an open-source deep learning framework, originally created by Yangqing Jia, that lets you leverage your GPU for training neural networks. In contrast to other deep learning frameworks like Theano or Torch, you don’t have to program the algorithms yourself; instead you specify your network by means of configuration files. This approach is obviously less time-consuming than programming everything on your own, but it also forces you to stay within the boundaries of the framework. In practice, though, this won’t matter most of the time, as the framework Caffe provides is quite powerful and continuously advanced.

Defining the Model and Meta-Parameters

Training a model and applying it requires at least three configuration files. The format of those configuration files follows an interface description language called protocol buffers. It superficially resembles JSON but is significantly different, and is actually supposed to replace JSON in use cases where the data document needs to be validatable (by means of a custom schema, like this one for Caffe) and serializable.
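To give a flavor of the format, here is a minimal, generic illustration (hypothetical and not Caffe-specific): a .proto schema declaring the permitted fields, followed by a matching document in the text format that the prototxt files below also use.

```
// Hypothetical .proto schema: it declares which fields a document
// may contain and of what type, which is what makes it validatable.
message Person {
  required string name = 1;
  optional int32 age = 2;
}
```

```
# A matching document in protocol buffer text format,
# the same notation used by Caffe's prototxt files:
name: "Ada"
age: 36
```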

For training you need one prototxt file holding the meta-parameters of the training run and the model (config.prototxt) and another defining the graph of the network (model_train_test.prototxt), connecting the layers in a directed and acyclic fashion. Note that the data flows from bottom to top with regard to how the order of layers is specified. The example network here is composed of five layers (sketches of both files follow the list):

  1. data layer (one for TRAINing and one for TESTing)
  2. inner product layer (the first set of weights)
  3. rectified linear units (the hidden layer)
  4. inner product layer (the second set of weights)
  5. output layer (softmax for classification)
    A. softmax layer giving the loss
    B. accuracy layer, so we can see how the network improves while training
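
A minimal sketch of what config.prototxt could look like is shown below. The field names are Caffe’s standard solver parameters, but all concrete values are placeholders rather than the settings used in this experiment:

```
# config.prototxt - solver / meta-parameters (placeholder values)
net: "model_train_test.prototxt"  # the network definition to train
test_iter: 100        # number of test batches per evaluation
test_interval: 500    # run a test pass every 500 training iterations
base_lr: 0.01         # initial learning rate
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"     # drop the learning rate in steps ...
gamma: 0.1            # ... by a factor of 10 ...
stepsize: 5000        # ... every 5000 iterations
display: 100          # log the training loss every 100 iterations
max_iter: 10000       # stop after this many iterations
snapshot: 5000        # save intermediate model snapshots
snapshot_prefix: "snapshots/model"
solver_mode: GPU      # train on the GPU
```

Training is then launched from the command line with caffe train --solver=config.prototxt.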
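
And here is a sketch of model_train_test.prototxt covering the five layers above, written in the current layer syntax (older Caffe versions use layers blocks with enum types instead); layer names, sizes and data sources are hypothetical placeholders. Note how the bottom and top fields wire the layers together, with the data flowing from bottom to top:

```
name: "SimpleNet"
layer {                       # 1. data layer for the TRAIN phase
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param { source: "train_lmdb" backend: LMDB batch_size: 64 }
}
# (a second, analogous data layer with include { phase: TEST }
#  would point at the test data)
layer {                       # 2. inner product layer (weights I)
  name: "ip1"
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  inner_product_param { num_output: 128 }
}
layer {                       # 3. rectified linear units (hidden layer)
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {                       # 4. inner product layer (weights II)
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param { num_output: 10 }
}
layer {                       # 5A. softmax layer giving the loss
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
layer {                       # 5B. accuracy layer for the TEST phase
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
```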