ImageNet
This part gives simplified instructions for training Caffe on your own data.
1. Preparing the dataset
The final dataset should be like this:
data
  -- train
    -- train1.jpg
    -- train2.jpg
    -- train3.jpg
    ......
  -- val
    -- val1.jpg
    -- val2.jpg
    -- val3.jpg
    ......
  -- train.txt
  -- val.txt
If the data is held in Python as NumPy arrays with these shapes:
X_train.shape = (5000, 1, 256, 256)
X_val.shape = (2000, 1, 256, 256)
X_test.shape = (500, 1, 256, 256)
y_train.shape = (5000,)
y_val.shape = (2000,)
y_test.shape = (500,)
The following code can be used to generate this dataset structure:
import os
import numpy as np
from PIL import Image

if not os.path.exists('data'):
    os.makedirs('data')
os.chdir('data')

## save image
if not os.path.exists('train'):
    os.makedirs('train')
Numtrain = X_train.shape[0]
for i in range(Numtrain):
    name = 'train' + str(i) + '.jpg'
    xx = X_train[i]
    xx = xx.transpose(1, 2, 0)  # (channels, height, width) -> (height, width, channels)
    # rescale the pixel values to the range [0, 255]
    xx = (xx - np.min(xx)) / (np.max(xx) - np.min(xx)) * 255.0
    ## Notice: for 8-bit images, Image.fromarray() expects an array of type 'uint8'
    xx = xx.astype(np.uint8)
    if xx.shape[2] == 1:
        img = Image.fromarray(xx[:, :, 0])  # one channel: saved as grayscale
    else:
        img = Image.fromarray(xx)  # three channels: saved as RGB
    img.save('train/' + name)
............
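The train.txt and val.txt files in the layout above are label lists. The convert_imageset tool that create_imagenet.sh calls reads one image per line: a file name relative to the image root folder, then a space, then an integer class label. The following is a minimal sketch for writing such a list, assuming the images were saved as train<i>.jpg in index order as in the loop above. The helper name write_label_list and the demo labels are made up for illustration; in practice you would pass y_train or y_val.

```python
import os
import tempfile


def write_label_list(list_path, prefix, labels):
    """Write one '<prefix><i>.jpg <label>' line per image (hypothetical helper)."""
    with open(list_path, 'w') as f:
        for i, label in enumerate(labels):
            # int() also accepts NumPy integer labels such as y_train[i]
            f.write('%s%d.jpg %d\n' % (prefix, i, int(label)))


# Demo with made-up labels, written to a temporary folder.
demo_dir = tempfile.mkdtemp()
demo_list = os.path.join(demo_dir, 'train.txt')
write_label_list(demo_list, 'train', [0, 2, 1])
with open(demo_list) as f:
    demo_text = f.read()
print(demo_text)
```

For the real dataset the calls would look like write_label_list('train.txt', 'train', y_train) and write_label_list('val.txt', 'val', y_val), run from inside the data folder.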
Create a new folder named “cell” under the “examples/imagenet” path and put the data folder into it.
Copy the file “examples/imagenet/create_imagenet.sh” into the newly created folder “cell” and change it as follows:
EXAMPLE=examples/imagenet/cell
DATA=examples/imagenet/cell
TOOLS=build/tools
TRAIN_DATA_ROOT=examples/imagenet/cell/data/train
VAL_DATA_ROOT=examples/imagenet/cell/data/val
RESIZE=true # if the images do not need resize, set it to "false"
Run the command from the Caffe root directory: ./examples/imagenet/cell/create_imagenet.sh
The LMDB files will then be generated under the cell folder.
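The conversion reads every listed image from TRAIN_DATA_ROOT or VAL_DATA_ROOT, so it can be worth checking the list files against the image folders before running the script. The following is a rough sketch of such a check, not part of Caffe. The function name find_missing_images is hypothetical, and the demo builds a small throwaway folder instead of using the real data.

```python
import os
import tempfile


def find_missing_images(list_path, image_root):
    """Return the file names in list_path that do not exist under image_root."""
    missing = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Each line is '<file name> <label>'; split off the trailing label.
            name = line.rsplit(' ', 1)[0]
            if not os.path.isfile(os.path.join(image_root, name)):
                missing.append(name)
    return missing


# Demo: one image exists, one listed image is missing.
demo_root = tempfile.mkdtemp()
open(os.path.join(demo_root, 'train0.jpg'), 'wb').close()
demo_list = os.path.join(demo_root, 'train.txt')
with open(demo_list, 'w') as f:
    f.write('train0.jpg 0\ntrain1.jpg 1\n')
demo_missing = find_missing_images(demo_list, demo_root)
print(demo_missing)
```

An empty result means every entry in the list has a matching image file.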
2.
ERROR & REASON
Error 1: Out of memory
F0420 13:29:52.527748 10836 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
- Reason
The batch_size is too large. Reduce the batch_size in the train_val.prototxt file.
Notice: when you set the configuration of a Caffe model, keep the following in mind:
"batch_size": as a rule of thumb, should not be larger than 100 if you use a single GPU to train the network; the actual limit depends on the GPU memory and the network size.
"stepsize": should be smaller than "max_iter" and should also be a divisor of "max_iter".
"test_iter": should be a multiple of (val_size / batch_size), where val_size is the size of the val data and batch_size is the batch size of the val data.
"test_interval": should preferably be a multiple of (train_size / batch_size), where train_size is the size of your train data and batch_size is the batch size of the train data, since the result is better when the model has been trained on the whole train dataset. This variable determines after how many iterations the model is tested.
"stepsize": determines after how many iterations the learning rate is multiplied by "gamma" (when "lr_policy" is "step"). "weight_decay" is the regularization strength and does not change the learning rate.
"snapshot": determines after how many iterations the intermediate results are stored.
"snapshot_prefix": determines the path prefix under which the snapshot results are stored.
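The arithmetic behind these settings can be sketched in a few lines. This sketch uses the dataset sizes from section 1 (5000 train images, 2000 val images) and the batch sizes 250 and 50. The function names, the epochs_between_tests parameter and the stepsize value 2500 are assumptions made for illustration. The step_lr function follows Caffe's "step" learning rate policy, base_lr * gamma ^ floor(iter / stepsize).

```python
def solver_counts(train_size, train_batch, val_size, val_batch,
                  epochs_between_tests=1):
    """Derive test_iter and test_interval from dataset and batch sizes."""
    if train_size % train_batch or val_size % val_batch:
        raise ValueError('each batch size should divide its dataset size evenly')
    test_iter = val_size // val_batch            # one full pass over the val data
    iters_per_epoch = train_size // train_batch  # one full pass over the train data
    test_interval = iters_per_epoch * epochs_between_tests
    return test_iter, test_interval


def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate at a given iteration under lr_policy "step"."""
    return base_lr * gamma ** (iteration // stepsize)


# 2000 / 50 = 40 test iterations; 5000 / 250 = 20 iterations per epoch,
# so testing every 2 epochs gives test_interval = 40.
counts = solver_counts(5000, 250, 2000, 50, epochs_between_tests=2)
print(counts)

# Hypothetical stepsize of 2500 (a divisor of max_iter = 10000).
for it in (0, 2500, 5000):
    print(it, step_lr(0.01, 0.1, 2500, it))
```

With these inputs, the sketch gives test_iter = 40 and test_interval = 40.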
One example of prototxt configuration (solver settings first, then the data layers of train_val.prototxt):
net: "examples/imagenet/cell/train_val.prototxt"
test_iter: 40
test_interval: 40
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 10000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "examples/imagenet/cell/model/caffenet_train"
solver_mode: GPU
name: "CaffeNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "examples/imagenet/cell/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/cell/train_lmdb"
    batch_size: 250
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "examples/imagenet/cell/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/cell/val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}