How to Train TensorFlow-Slim Models on Your Own Dataset
TF-Slim overview
Pretrained models provided by TF-Slim
Preparing the dataset and generating TFRecord files
Organizing your image dataset directory structure
Under the dataset root, create two folders, train and val, holding the training and validation data respectively, with one subdirectory per class.
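As a sketch, this layout can be created programmatically; the class names below are placeholders for your own categories:

```python
import os

# Placeholder class names; replace with your own categories.
CLASSES = ['cat', 'dog', 'bird']

def make_dataset_dirs(root):
    """Create train/ and val/ folders with one subdirectory per class."""
    for split in ('train', 'val'):
        for cls in CLASSES:
            os.makedirs(os.path.join(root, split, cls), exist_ok=True)

make_dataset_dirs('/tmp/my-dataset')
```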
Generating the TFRecord files
See https://github.com/tensorflow/models/tree/master/research/inception#how-to-construct-a-new-dataset-for-retraining
```shell
# location to where to save the TFRecord data.
OUTPUT_DIRECTORY=$HOME/my-custom-data/

# build the preprocessing script.
cd tensorflow-models/inception
bazel build //inception:build_image_data

# convert the data.
bazel-bin/inception/build_image_data \
  --train_directory="${TRAIN_DIR}" \
  --validation_directory="${VALIDATION_DIR}" \
  --output_directory="${OUTPUT_DIRECTORY}" \
  --labels_file="${LABELS_FILE}" \
  --train_shards=128 \
  --validation_shards=24 \
  --num_threads=8
```
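`build_image_data` expects `--labels_file` to contain one class label per line. A small helper like the hypothetical `write_labels_file` below can derive it from the `train` directory (a sketch, assuming the one-subdirectory-per-class layout described above):

```python
import os

def write_labels_file(train_dir, labels_path):
    """Write one class label per line, in sorted order, derived from
    the subdirectory names of the training directory."""
    labels = sorted(
        d for d in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, d)))
    with open(labels_path, 'w') as f:
        f.write('\n'.join(labels) + '\n')
    return labels
```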
Create your own dataset file, e.g. `mydata.py`, under `research/slim/datasets`. Copy the contents of `flowers.py` into it, then modify the following lines to match your actual data:
```python
_FILE_PATTERN = 'flowers_%s_*.tfrecord'

SPLITS_TO_SIZES = {'train': 3320, 'validation': 350}

_NUM_CLASSES = 5

_ITEMS_TO_DESCRIPTIONS = {
    'image': 'A color image of varying size.',
    'label': 'A single integer between 0 and 4',
}
```
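`SPLITS_TO_SIZES` must match the actual number of examples in each split. A hypothetical helper to count them from the `train`/`val` layout (a sketch; extend `IMAGE_EXTS` to whatever formats you use):

```python
import os

IMAGE_EXTS = ('.jpg', '.jpeg', '.png')

def count_split_sizes(root):
    """Count image files under train/ and val/ to fill in SPLITS_TO_SIZES."""
    sizes = {}
    for split_dir, key in (('train', 'train'), ('val', 'validation')):
        total = 0
        for _, _, filenames in os.walk(os.path.join(root, split_dir)):
            total += sum(1 for f in filenames
                         if f.lower().endswith(IMAGE_EXTS))
        sizes[key] = total
    return sizes
```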
Then modify `research/slim/datasets/dataset_factory.py` to register the new dataset `mydata`:

```python
from datasets import mydata

datasets_map = {
    'cifar10': cifar10,
    'flowers': flowers,
    'imagenet': imagenet,
    'mnist': mnist,
    'mydata': mydata,
}
```
Adjust the parameters below to match your actual training setup:

```shell
CUDA_VISIBLE_DEVICES=2 nohup python train_image_classifier.py \
  --train_dir=/tmp/md_train \
  --dataset_name=mydata \
  --dataset_split_name=train \
  --dataset_dir=/data5/mydata_tfrecording/ \
  --model_name=mobilenet_v1 > /tmp/md.txt &
```
Fine-tuning from a pretrained model
```shell
CUDA_VISIBLE_DEVICES=3 nohup \
python train_image_classifier.py \
  --train_dir=/tmp/m2_train \
  --dataset_dir=/data5/zxt/fdata/log \
  --dataset_name=dishes \
  --dataset_split_name=train \
  --model_name=mobilenet_v1 \
  --checkpoint_path=/data5/model/mobilenet_v1_1.0_224.ckpt \
  --checkpoint_exclude_scopes=MobilenetV1/Logits,MobilenetV1/AuxLogits \
  --trainable_scopes=MobilenetV1/Logits,MobilenetV1/AuxLogits > /tmp/m3.txt &
```
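`--checkpoint_exclude_scopes` and `--trainable_scopes` take comma-separated variable-scope prefixes. Conceptually, the training script matches variables against these prefixes by name, roughly like this simplified sketch (not the actual slim implementation):

```python
def filter_by_scopes(var_names, exclude_scopes):
    """Keep only the variables whose names start with none of the
    excluded scope prefixes; these are restored from the checkpoint."""
    prefixes = [s.strip() for s in exclude_scopes.split(',')]
    return [v for v in var_names
            if not any(v.startswith(p) for p in prefixes)]

names = ['MobilenetV1/Conv2d_0/weights',
         'MobilenetV1/Logits/Conv2d_1c_1x1/weights']
restored = filter_by_scopes(names, 'MobilenetV1/Logits,MobilenetV1/AuxLogits')
# Only the backbone weights survive the filter, so the new Logits
# layer trains from scratch while the rest is fine-tuned.
```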
Evaluating the model
First fix a bug in `research/slim/eval_image_classifier.py` (see https://github.com/tensorflow/models/issues/694 for details):
Change `slim.metrics.streaming_recall_at_k(logits, labels, 5)` to `slim.metrics.streaming_sparse_recall_at_k(logits, labels, 5)`.
Then just run:
```shell
CUDA_VISIBLE_DEVICES=2 python eval_image_classifier.py \
  --alsologtostderr \
  --checkpoint_path=/tmp/m2_train/ \
  --eval_dir=/tmp/m2_eval \
  --dataset_dir=/data5/zxt/fdata/log \
  --dataset_name=dishes \
  --dataset_split_name=validation \
  --model_name=mobilenet_v1
```
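To track accuracy across evaluation runs you can grep the eval log. The snippet below assumes log lines of the form `eval/Accuracy[0.91]`; the exact format depends on your TensorFlow version, so verify against your own log first:

```python
import re

def parse_accuracy(log_text):
    """Return the last reported eval/Accuracy value, or None if absent.
    The log line format is an assumption; check your eval log."""
    matches = re.findall(r'eval/Accuracy\[([0-9.]+)\]', log_text)
    return float(matches[-1]) if matches else None
```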
Finally, here is a Python script that launches training and evaluation for multiple models at the same time:
```python
import os

LOG_DIR = '/data5/zxt/flowers/train_log/'
MODELS = ['inception_v3', 'inception_resnet_v2']
DATASET_NAME = 'flowers'
DATASET_DIR = '/data5/zxt/flowers/log'

CMD_TRAIN = ('CUDA_VISIBLE_DEVICES={0} nohup python train_image_classifier.py '
             '--learning_rate=0.01 --num_epochs_per_decay=2.0 --optimizer=adam '
             '--train_dir={1}/{2}_train --dataset_name={3} --dataset_dir={4} '
             '--dataset_split_name=train --model_name={2} '
             '> {1}/{3}_{2}_train.txt &')
CMD_VAL = ('CUDA_VISIBLE_DEVICES={0} nohup python eval_image_classifier.py '
           '--alsologtostderr --checkpoint_path={1}/{2}_train '
           '--eval_dir={1}/{2}_eval --dataset_name={3} --dataset_dir={4} '
           '--dataset_split_name=validation --model_name={2} '
           '--preprocessing_name inception --eval_image_size 299 '
           '--eval_loop=True > {1}/{3}_{2}_eval.txt &')


def race():
    # Give each model its own pair of GPUs: an odd-numbered one for
    # training and the next even-numbered one for evaluation.
    for index, model in enumerate(MODELS, start=1):
        cmd_train = CMD_TRAIN.format(index * 2 - 1, LOG_DIR, model,
                                     DATASET_NAME, DATASET_DIR)
        cmd_eval = CMD_VAL.format(index * 2, LOG_DIR, model,
                                  DATASET_NAME, DATASET_DIR)
        print(cmd_train)
        print(cmd_eval)
        # Launch both background jobs.
        os.system(cmd_train)
        os.system(cmd_eval)


if __name__ == '__main__':
    race()
```
References
This article is also available on my standalone blog.