How to Train TensorFlow-Slim Models on Your Own Dataset

## Introduction to TF-Slim

## Pre-trained Models Provided by TF-Slim

## Preparing the Dataset and Generating TFRecord Files

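  1. Organize the directory structure of your image dataset
    Under the dataset root, create two folders, train and val, holding the training and validation data respectively, with one sub-folder per class.

  2. Generate the TFRecord files
    See
    https://github.com/tensorflow/models/tree/master/research/inception#how-to-construct-a-new-dataset-for-retraining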
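
build_image_data also needs a labels file (`--labels_file`) that lists one class name per line, and each name must match a class sub-folder. Below is a minimal sketch that builds this file from the folder names under train and checks that val has the same classes. The helper name `write_labels_file` and the `labels.txt` location are my own choices and are not part of the repository.

```python
import os


def write_labels_file(dataset_root, labels_path):
    """Write one class name per line, taken from the sub-folders of <root>/train.

    Assumes the layout described above: <root>/train/<class>/... and
    <root>/val/<class>/...; raises ValueError if the two splits disagree.
    """
    train_dir = os.path.join(dataset_root, 'train')
    val_dir = os.path.join(dataset_root, 'val')
    # Sorted so the class order (and therefore the label ids) is reproducible
    classes = sorted(d for d in os.listdir(train_dir)
                     if os.path.isdir(os.path.join(train_dir, d)))
    val_classes = sorted(d for d in os.listdir(val_dir)
                         if os.path.isdir(os.path.join(val_dir, d)))
    if classes != val_classes:
        raise ValueError('train and val have different classes: %s vs %s'
                         % (classes, val_classes))
    with open(labels_path, 'w') as f:
        f.write('\n'.join(classes) + '\n')
    return classes
```

Example call: `write_labels_file(os.path.expanduser('~/my-custom-data'), os.path.expanduser('~/my-custom-data/labels.txt'))`. Sorting the names keeps the class-to-label mapping stable from one run to the next, because build_image_data assigns label ids in the order the labels file lists them.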

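```shell
# Image folders laid out as described above, plus a labels file listing one
# class name per line (names must match the class sub-folders).
# These paths are examples; point them at your own data.
TRAIN_DIR=$HOME/my-custom-data/train/
VALIDATION_DIR=$HOME/my-custom-data/val/
LABELS_FILE=$HOME/my-custom-data/labels.txt

# location to where to save the TFRecord data.
OUTPUT_DIRECTORY=$HOME/my-custom-data/

# build the preprocessing script (the inception project lives under
# research/ in the tensorflow/models repository).
cd models/research/inception
bazel build //inception:build_image_data

# convert the data.
bazel-bin/inception/build_image_data \
  --train_directory="${TRAIN_DIR}" \
  --validation_directory="${VALIDATION_DIR}" \
  --output_directory="${OUTPUT_DIRECTORY}" \
  --labels_file="${LABELS_FILE}" \
  --train_shards=128 \
  --validation_shards=24 \
  --num_threads=8
```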
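
From my reading of build_image_data.py, each split is written as `<name>-<shard>-of-<num_shards>` (for example `train-00000-of-00128` and `validation-00000-of-00024`). The script also requires `--num_threads` to divide both shard counts. The sketch below reproduces that naming so you can check which `_FILE_PATTERN` will match your files. `expected_shard_names` and `pattern_matches` are hypothetical helpers and do not exist in the repository.

```python
import fnmatch


def expected_shard_names(split, num_shards, num_threads):
    """Reproduce the file names build_image_data writes for one split."""
    # build_image_data refuses to run unless the thread count divides the shard count
    if num_shards % num_threads:
        raise ValueError('num_threads (%d) must divide num_shards (%d)'
                         % (num_threads, num_shards))
    return ['%s-%.5d-of-%.5d' % (split, shard, num_shards)
            for shard in range(num_shards)]


def pattern_matches(file_pattern, split, names):
    """True if file_pattern % split matches every name (glob-style matching)."""
    pattern = file_pattern % split
    return all(fnmatch.fnmatch(name, pattern) for name in names)
```

Under the default naming, a `_FILE_PATTERN` of `'%s-*'` matches the generated shards. The flowers pattern `'flowers_%s_*.tfrecord'` does not.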

Create a dataset file of your own under research/slim/datasets, for example mydata.py.
Copy the contents of flowers.py into it,
then change the following lines to match your data:

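```python
# File name pattern of the TFRecord shards; %s is replaced by the split name.
# flowers uses 'flowers_%s_*.tfrecord'. Files written by build_image_data are
# named like train-00000-of-00128, so they need a pattern such as '%s-*'.
_FILE_PATTERN = 'flowers_%s_*.tfrecord'

# Number of examples in each split.
SPLITS_TO_SIZES = {'train': 3320, 'validation': 350}

# Number of classes. build_image_data starts labels at 1 (index 0 is left
# empty as a background class), so with its output use the class count + 1.
_NUM_CLASSES = 5

_ITEMS_TO_DESCRIPTIONS = {
    'image': 'A color image of varying size.',
    'label': 'A single integer between 0 and 4',
}
```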
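
SPLITS_TO_SIZES has to match the data. Below is a minimal sketch that counts images per split, plus the number of class folders, straight from the image folders used to build the TFRecords. It assumes every file inside a class folder is an image, which matches what build_image_data.py globs. `count_split_sizes` is a hypothetical helper. The keys are the split names slim uses (`train`, `validation`), while the folder on disk is `val`.

```python
import os


def count_split_sizes(dataset_root, split_dirs=None):
    """Return ({split_name: num_images}, num_class_folders).

    split_dirs maps slim split names to folder names under dataset_root.
    """
    if split_dirs is None:
        split_dirs = {'train': 'train', 'validation': 'val'}
    sizes = {}
    classes = set()
    for split, folder in split_dirs.items():
        split_path = os.path.join(dataset_root, folder)
        total = 0
        for cls in os.listdir(split_path):
            cls_path = os.path.join(split_path, cls)
            if not os.path.isdir(cls_path):
                continue  # ignore stray files next to the class folders
            classes.add(cls)
            total += sum(1 for f in os.listdir(cls_path)
                         if os.path.isfile(os.path.join(cls_path, f)))
        sizes[split] = total
    return sizes, len(classes)
```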

Edit `research/slim/datasets/dataset_factory.py` to register your dataset mydata:

```python
# Add next to the existing imports (cifar10, flowers, imagenet, mnist).
from datasets import mydata

datasets_map = {
    'cifar10': cifar10,
    'flowers': flowers,
    'imagenet': imagenet,
    'mnist': mnist,
    'mydata': mydata,
}
```
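
The key you add to datasets_map is the value you later pass as `--dataset_name`. As I read dataset_factory.py, `get_dataset` is essentially a dictionary lookup. It raises `ValueError` for an unknown name and otherwise calls the module's `get_split`. The simplified sketch below imitates that lookup with a stand-in module, so the behaviour is visible without TensorFlow. `fake_mydata` is hypothetical, and the real function takes additional optional arguments.

```python
import types


def get_dataset(name, split_name, dataset_dir, datasets_map):
    """Simplified stand-in for dataset_factory.get_dataset."""
    if name not in datasets_map:
        raise ValueError('Name of dataset unknown %s' % name)
    return datasets_map[name].get_split(split_name, dataset_dir)


# Hypothetical stand-in for datasets/mydata.py; only get_split matters here
fake_mydata = types.SimpleNamespace(
    get_split=lambda split_name, dataset_dir: (split_name, dataset_dir))

datasets_map = {'mydata': fake_mydata}
```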

## Training a Model from Scratch

Adjust the parameters below to your own training setup. The commands are run from the research/slim directory.
```shell
CUDA_VISIBLE_DEVICES=2 nohup python train_image_classifier.py \
  --train_dir=/tmp/md_train \
  --dataset_name=mydata \
  --dataset_split_name=train \
  --dataset_dir=/data5/mydata_tfrecording/ \
  --model_name=mobilenet_v1 > /tmp/md.txt &
```
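
The command above keeps the learning-rate flags at their defaults. As far as I know, train_image_classifier.py then applies a staircase exponential decay with `--learning_rate=0.01`, `--learning_rate_decay_factor=0.94`, `--num_epochs_per_decay=2.0` and `--batch_size=32`, and derives the decay step count from the number of training samples. `--max_number_of_steps` defaults to None, so training continues until you stop it. The plain-Python sketch below recomputes the schedule. The defaults are recalled from the script, so check them against your copy. The value 3320 is the flowers train size from the example above.

```python
def learning_rate_at(step, num_train_samples, learning_rate=0.01,
                     decay_factor=0.94, num_epochs_per_decay=2.0,
                     batch_size=32):
    """Staircase exponential decay, like tf.train.exponential_decay(..., staircase=True)."""
    # Number of steps between two decays: samples per epoch / batch size * epochs per decay
    decay_steps = int(num_train_samples * num_epochs_per_decay / batch_size)
    return learning_rate * decay_factor ** (step // decay_steps)
```

With 3320 training images, decay_steps is 207, so the rate stays at 0.01 for steps 0 to 206 and drops to 0.0094 at step 207.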

## Fine-tuning from a Pre-trained Model

Here the dataset was registered under the name `dishes`. Substitute your own dataset name and paths.

```shell
CUDA_VISIBLE_DEVICES=3 nohup \
python train_image_classifier.py \
  --train_dir=/tmp/m2_train \
  --dataset_dir=/data5/zxt/fdata/log \
  --dataset_name=dishes \
  --dataset_split_name=train \
  --model_name=mobilenet_v1 \
  --checkpoint_path=/data5/model/mobilenet_v1_1.0_224.ckpt \
  --checkpoint_exclude_scopes=MobilenetV1/Logits,MobilenetV1/AuxLogits \
  --trainable_scopes=MobilenetV1/Logits,MobilenetV1/AuxLogits > /tmp/m3.txt &
```
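
`--checkpoint_exclude_scopes` keeps the classification layer out of the restore, because its shape depends on the number of classes. `--trainable_scopes` limits training to those layers. As I understand `_get_init_fn` in train_image_classifier.py, a variable is skipped when its name starts with any excluded scope. The sketch below applies the same prefix rule to plain name strings. The variable names are illustrative and were not read from the real checkpoint.

```python
def variables_to_restore(var_names, exclude_scopes):
    """Keep the names that do not start with any comma-separated excluded scope."""
    exclusions = [s.strip() for s in exclude_scopes.split(',') if s.strip()]
    return [name for name in var_names
            if not any(name.startswith(ex) for ex in exclusions)]


# Illustrative MobileNet v1 variable names
names = [
    'MobilenetV1/Conv2d_0/weights',
    'MobilenetV1/Conv2d_13_pointwise/weights',
    'MobilenetV1/Logits/Conv2d_1c_1x1/weights',
    'MobilenetV1/Logits/Conv2d_1c_1x1/biases',
]
```

MobileNet v1 has no AuxLogits scope, so that entry matches nothing and does no harm.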

## Evaluating the Model

First fix a bug in research/slim/eval_image_classifier.py (details: https://github.com/tensorflow/models/issues/694):

```python
# Line 156 (in the version used here): change
slim.metrics.streaming_recall_at_k(logits, labels, 5)
# to
slim.metrics.streaming_sparse_recall_at_k(logits, labels, 5)
```

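Then run the evaluation. `--checkpoint_path` can point at the training directory, and in that case the latest checkpoint in it is evaluated.

```shell
CUDA_VISIBLE_DEVICES=2 python eval_image_classifier.py \
  --alsologtostderr \
  --checkpoint_path=/tmp/m2_train/ \
  --eval_dir=/tmp/m2_eval \
  --dataset_dir=/data5/zxt/fdata/log \
  --dataset_name=dishes \
  --dataset_split_name=validation \
  --model_name=mobilenet_v1
```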
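
The evaluation reports `eval/Accuracy` and `eval/Recall_5`. With one label per image, Recall_5 is the fraction of examples whose true label is among the five highest logits, i.e. top-5 accuracy. The plain-Python sketch below spells out that definition for intuition. It is not the slim implementation.

```python
def recall_at_k(logits, labels, k=5):
    """Fraction of rows whose true label is among the k largest logits."""
    hits = 0
    for row, label in zip(logits, labels):
        # Indices of the k highest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)
```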

Finally, here is a Python script that trains and evaluates several models at the same time, with one GPU for each training job and one for each evaluation job:

```python
import os

# Directory that contains train_image_classifier.py and eval_image_classifier.py
SLIM_DIR = '/data5/zxt/models/research/slim/'
# Parent directory for checkpoints, eval summaries and log files
LOG_DIR = '/data5/zxt/flowers/train_log'
# Models to train in parallel; model i trains on GPU 2i-1 and is evaluated on GPU 2i
MODELS = ['inception_v3', 'inception_resnet_v2']

DATASET_NAME = 'flowers'
DATASET_DIR = '/data5/zxt/flowers/log'

# True: only print the commands. False: actually launch them.
DRY_RUN = True

# {0} GPU id, {1} LOG_DIR, {2} model name, {3} dataset name, {4} dataset dir
CMD_TRAIN = ('CUDA_VISIBLE_DEVICES={0} nohup python train_image_classifier.py '
             '--learning_rate=0.01 --num_epochs_per_decay=2.0 --optimizer=adam '
             '--train_dir={1}/{2}_train --dataset_name={3} --dataset_dir={4} '
             '--dataset_split_name=train --model_name={2} '
             '> {1}/{3}_{2}_train.txt &')
# --eval_loop is not a flag of the upstream eval_image_classifier.py as far as
# I know; drop it unless your copy of the script defines it.
CMD_VAL = ('CUDA_VISIBLE_DEVICES={0} nohup python eval_image_classifier.py '
           '--alsologtostderr --checkpoint_path={1}/{2}_train '
           '--eval_dir={1}/{2}_eval --dataset_name={3} --dataset_dir={4} '
           '--dataset_split_name=validation --model_name={2} '
           '--preprocessing_name=inception --eval_image_size=299 '
           '--eval_loop=True > {1}/{3}_{2}_eval.txt &')


def race():
    if not DRY_RUN:
        # The commands call the scripts by relative name, so run from SLIM_DIR
        os.chdir(SLIM_DIR)
        os.makedirs(LOG_DIR, exist_ok=True)
    for index, model in enumerate(MODELS, start=1):
        cmd_train = CMD_TRAIN.format(index * 2 - 1, LOG_DIR, model, DATASET_NAME, DATASET_DIR)
        cmd_eval = CMD_VAL.format(index * 2, LOG_DIR, model, DATASET_NAME, DATASET_DIR)
        print(cmd_train)
        print(cmd_eval)
        if not DRY_RUN:
            os.system(cmd_train)
            os.system(cmd_eval)
    if not DRY_RUN:
        # One TensorBoard instance over the logs of all models
        os.system('CUDA_VISIBLE_DEVICES=0 tensorboard --logdir {0} &'.format(LOG_DIR))


if __name__ == '__main__':
    race()
```
