TensorFlow 2 Object Detection API Tutorial – TaterLi's Personal Blog

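Requirements:

Anaconda3 (recommended; setup is a hassle without it!)
GPU + CUDA (recommended; training is very slow without it!)
Ubuntu xx.xx LTS (recommended; Windows tends to cause more problems!)
RAM: 16GB, Disk: 40GB

If you are missing any of these, things get harder but are still solvable. My one-stop recommendation is TensorDock, a GPU cloud service that has everything configured at boot and is reasonably priced for short-term use.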

Create a virtual environment and verify the installation:

conda create -n tensorflow pip python=3.9
conda activate tensorflow
pip install --ignore-installed --upgrade tensorflow
python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

If you see key output like the following, the installation is OK.

2022-05-29 03:02:09.203319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 102 MB memory:  -> device: 0, name: Tesla V100-FHHL-16GB, pci bus id: 0000:05:00.0, compute capability: 7.0
tf.Tensor(220.2801, shape=(), dtype=float32)

To make experimenting quick, I have put the code needed for this tutorial into a repo.

git clone --recursive https://github.com/nickfox-taterli/tensorflow-object-detection-api-tutorial-with-tf2

Download protoc and put it in /usr/local/bin or an equivalent directory. Download it from:

https://github.com/google/protobuf/releases

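I chose 3.19.1. Newer is not necessarily better; what matters is that the version matches TF. Start with a fairly recent release, and if it does not work, step back one version at a time.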
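Because the protoc version matters, it can help to check which one is actually on your PATH before compiling. The following is a minimal sketch, assuming `protoc --version` prints a line such as `libprotoc 3.19.1`; the helper names are my own for illustration and are not part of the repo.

```python
import re
import shutil
import subprocess


def parse_protoc_version(output):
    """Extract (major, minor, patch) from text like 'libprotoc 3.19.1'."""
    match = re.search(r"libprotoc\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # Some releases print only two components; treat the patch as 0 then.
    return (int(major), int(minor), int(patch or 0))


def installed_protoc_version():
    """Return the version of the protoc found on PATH, or None if absent."""
    if shutil.which("protoc") is None:
        return None
    result = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_protoc_version(result.stdout)
```

Calling `installed_protoc_version()` returns a tuple you can compare against the version you intended to install, or `None` when protoc is not on the PATH.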

Compile the protobuf definitions into Python modules:

# From within ~/tensorflow-object-detection-api-tutorial-with-tf2/models/research/
protoc object_detection/protos/*.proto --python_out=.

Install the COCO API.

cd ~
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools ~/tensorflow-object-detection-api-tutorial-with-tf2/models/research/

Install the Object Detection API.

# From within ~/tensorflow-object-detection-api-tutorial-with-tf2/models/research/
cp object_detection/packages/tf2/setup.py .
python -m pip install --use-feature=2020-resolver .

Finally, test the installation:

# From within ~/tensorflow-object-detection-api-tutorial-with-tf2/models/research/
python object_detection/builders/model_builder_tf2_test.py

Output like the following means everything is working.

# ... (many lines omitted)
[       OK ] ModelBuilderTF2Test.test_invalid_second_stage_batch_size
[ RUN      ] ModelBuilderTF2Test.test_session
[  SKIPPED ] ModelBuilderTF2Test.test_session
[ RUN      ] ModelBuilderTF2Test.test_unknown_faster_rcnn_feature_extractor
INFO:tensorflow:time(__main__.ModelBuilderTF2Test.test_unknown_faster_rcnn_feature_extractor): 0.0s
I0608 18:49:13.193742 29296 test_util.py:2102] time(__main__.ModelBuilderTF2Test.test_unknown_faster_rcnn_feature_extractor): 0.0s
[       OK ] ModelBuilderTF2Test.test_unknown_faster_rcnn_feature_extractor
[ RUN      ] ModelBuilderTF2Test.test_unknown_meta_architecture
INFO:tensorflow:time(__main__.ModelBuilderTF2Test.test_unknown_meta_architecture): 0.0s
I0608 18:49:13.195241 29296 test_util.py:2102] time(__main__.ModelBuilderTF2Test.test_unknown_meta_architecture): 0.0s
[       OK ] ModelBuilderTF2Test.test_unknown_meta_architecture
[ RUN      ] ModelBuilderTF2Test.test_unknown_ssd_feature_extractor
INFO:tensorflow:time(__main__.ModelBuilderTF2Test.test_unknown_ssd_feature_extractor): 0.0s
I0608 18:49:13.197239 29296 test_util.py:2102] time(__main__.ModelBuilderTF2Test.test_unknown_ssd_feature_extractor): 0.0s
[       OK ] ModelBuilderTF2Test.test_unknown_ssd_feature_extractor
----------------------------------------------------------------------
Ran 24 tests in 29.980s

OK (skipped=1)

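The images-origin directory contains many photos of cats and dogs, all annotated with labelImg (look up its usage online); each photo has a matching xml and jpeg file. In real use you should take a large number of photos yourself and label them. Since that is time-consuming, I have prepared the data in advance here.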
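For readers who have not looked inside a labelImg file: by default it writes Pascal VOC style XML. The sketch below reads the bounding boxes from such a file using only the standard library. The sample XML and the function `read_boxes` are illustrative and not taken from the repo.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC layout that labelImg writes by default.
SAMPLE_XML = """
<annotation>
  <filename>cat_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>320</xmax><ymax>410</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from VOC XML."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return filename, boxes
```

The label in each `<name>` element is what has to match the label map described next.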

The data also needs a label map, and its labels must correspond one-to-one with the ones used in labelImg. I put the file at training_demo/annotations/label_map.pbtxt.

item {
    id: 1
    name: 'cat'
}

item {
    id: 2
    name: 'dog'
}
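To sanity-check that the label map agrees with your labelImg labels, you can turn it into a name-to-id dictionary. This is only a rough sketch using a regular expression; it assumes every item contains just `id` and `name`, and `parse_label_map` is a name I made up. The Object Detection API parses this file as protobuf text format, which this sketch does not attempt to replicate.

```python
import re

LABEL_MAP_TEXT = """
item {
    id: 1
    name: 'cat'
}

item {
    id: 2
    name: 'dog'
}
"""


def parse_label_map(text):
    """Map each class name to its id, for simple id/name-only items."""
    mapping = {}
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        item_id = re.search(r"\bid\s*:\s*(\d+)", block)
        name = re.search(r"\bname\s*:\s*['\"]([^'\"]+)['\"]", block)
        if item_id and name:
            mapping[name.group(1)] = int(item_id.group(1))
    return mapping
```

For the file above this gives `cat` mapped to 1 and `dog` mapped to 2; any label that appears in your XML files but not in this dictionary needs to be added to the label map.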

Split the data and generate the tfrecord files. I recommend reading the scripts to learn how they work.

# From within ~/tensorflow-object-detection-api-tutorial-with-tf2/script/preprocessing
python partition_dataset.py -x -i ../../workspace/training_demo/images-origin -o ../../workspace/training_demo/images -r 0.1
python generate_tfrecord.py -x ../../workspace/training_demo/images/train -l ../../workspace/training_demo/annotations/label_map.pbtxt -o ../../workspace/training_demo/annotations/train.record
python generate_tfrecord.py -x ../../workspace/training_demo/images/test -l ../../workspace/training_demo/annotations/label_map.pbtxt -o ../../workspace/training_demo/annotations/test.record
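I have not reproduced partition_dataset.py here. As a rough idea of what a ratio split amounts to, the sketch below shuffles a list of image names and sets aside a fraction for testing. It assumes `-r 0.1` means that 10% of the images go to the test set; `split_dataset` and its `seed` parameter are hypothetical.

```python
import random


def split_dataset(filenames, test_ratio, seed=0):
    """Shuffle image names and split them into (train, test) lists."""
    names = sorted(filenames)           # stable starting order
    random.Random(seed).shuffle(names)  # reproducible shuffle
    n_test = round(len(names) * test_ratio)
    return names[n_test:], names[:n_test]


train, test = split_dataset([f"img_{i}.jpg" for i in range(20)], 0.1)
```

With 20 images and a ratio of 0.1, two images end up in `test` and the other 18 in `train`. In a real split each image's xml file has to travel with it.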

To configure the training job, the usual approach is to borrow a config from the Model Zoo.

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

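For example, I downloaded the SSD ResNet50 V1 FPN 640x640 (RetinaNet50) model, found the pipeline.config inside it, and copied it to models/my_ssd_resnet50_v1_fpn/. I created that directory myself, and it can have any name. Copying the file is not enough, because some of its contents have to be changed; compare it with the file in my repo for the details.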

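num_classes: the number of distinct labels; our test has only 2 classes.
batch_size: how many samples are taken per step; a larger value improves utilization but needs more memory.
fine_tune_checkpoint: continue training from the model's existing parameters; the downloaded model checkpoint has to be extracted to the path given here. My repo sets this path, but the files are large, so they are not included in the repo.
use_bfloat16: false, since we are not training on a TPU.
label_map_path: the label file (pbtxt; set for both training and evaluation).
input_path: the input data (tfrecord; set for both training and evaluation).
metrics_set: the set of evaluation metrics to report (for example coco_detection_metrics).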
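If you prefer to script these changes instead of editing by hand, a plain text substitution is often enough for the scalar fields. The sketch below assumes the fields appear as `key: value` lines; `SAMPLE_CONFIG` is a made-up fragment, `set_field` is a hypothetical helper, and the batch size of 8 is just an example value. It only touches the first occurrence of a key, so fields that appear in both the training and the evaluation sections (label_map_path, input_path) still need individual attention.

```python
import re

# Made-up fragment for illustration; a real pipeline.config is much longer.
SAMPLE_CONFIG = """model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  use_bfloat16: true
}
"""


def set_field(config_text, key, value):
    """Replace the value on the first `key: ...` line; raise if key is absent."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", flags=re.MULTILINE
    )
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value, config_text, count=1
    )
    if count == 0:
        raise KeyError(key)
    return new_text


config = SAMPLE_CONFIG
config = set_field(config, "num_classes", "2")
config = set_field(config, "batch_size", "8")
config = set_field(config, "use_bfloat16", "false")
```

This is a text-level edit, not a protobuf parse, so diff the result against the original before training with it.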

Extract the pre-trained model and place it at the path specified by fine_tune_checkpoint.

cd ~
mkdir ~/tensorflow-object-detection-api-tutorial-with-tf2/workspace/training_demo/pre-trained-models
wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
tar xvf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz -C ~/tensorflow-object-detection-api-tutorial-with-tf2/workspace/training_demo/pre-trained-models
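A wrong fine_tune_checkpoint path is an easy mistake to make, so it can be worth checking before starting a long run. A TensorFlow checkpoint is referenced by a prefix, and an `.index` file sits next to its data shards. The sketch below checks for that file; `checkpoint_exists` is a hypothetical helper, and the `checkpoint/ckpt-0` prefix used in the demo is my assumption about how the archive unpacks, so compare it with the value in your pipeline.config.

```python
import tempfile
from pathlib import Path


def checkpoint_exists(prefix):
    """True if a TF checkpoint index file exists for the given prefix."""
    return Path(str(prefix) + ".index").is_file()


# Demo in a throwaway directory: the prefix is only valid once the
# matching .index file is present.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "checkpoint" / "ckpt-0"
    prefix.parent.mkdir()
    before = checkpoint_exists(prefix)
    (prefix.parent / "ckpt-0.index").touch()
    after = checkpoint_exists(prefix)
```

In practice you would pass the prefix from your own config instead of the temporary one used here.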

Copy models/research/object_detection/model_main_tf2.py into the training_demo directory and start training (running it inside screen is recommended):

python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config

Depending on your GPU's performance, after a while you will see output every 100 steps.

I0529 03:25:09.143892 139699402731712 model_lib_v2.py:708] {'Loss/classification_loss': 0.03322457,
 'Loss/localization_loss': 0.049729176,
 'Loss/regularization_loss': 0.15188748,
 'Loss/total_loss': 0.23484123,
 'learning_rate': 0.029921034}
INFO:tensorflow:Step 9800 per-step time 0.484s
I0529 03:25:57.500304 139699402731712 model_lib_v2.py:705] Step 9800 per-step time 0.484s
INFO:tensorflow:{'Loss/classification_loss': 0.023190781,
 'Loss/localization_loss': 0.06909638,
 'Loss/regularization_loss': 0.15175259,
 'Loss/total_loss': 0.24403974,
 'learning_rate': 0.029682912}
I0529 03:25:57.500637 139699402731712 model_lib_v2.py:708] {'Loss/classification_loss': 0.023190781,
 'Loss/localization_loss': 0.06909638,
 'Loss/regularization_loss': 0.15175259,
 'Loss/total_loss': 0.24403974,
 'learning_rate': 0.029682912}
INFO:tensorflow:Step 9900 per-step time 0.485s
I0529 03:26:46.023310 139699402731712 model_lib_v2.py:705] Step 9900 per-step time 0.485s
INFO:tensorflow:{'Loss/classification_loss': 0.06559964,
 'Loss/localization_loss': 0.068514176,
 'Loss/regularization_loss': 0.15113617,
 'Loss/total_loss': 0.28524998,
 'learning_rate': 0.029442988}
I0529 03:26:46.023681 139699402731712 model_lib_v2.py:708] {'Loss/classification_loss': 0.06559964,
 'Loss/localization_loss': 0.068514176,
 'Loss/regularization_loss': 0.15113617,
 'Loss/total_loss': 0.28524998,
 'learning_rate': 0.029442988}
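If you save this output to a file, you can pull the loss per step out of it without TensorBoard. The sketch below is a small parser written against the log format shown above; `total_loss_by_step` is a name I made up, and it reads only the `INFO:tensorflow:` lines, ignoring the duplicated lines that start with a timestamp.

```python
import re

SAMPLE_LOG = """INFO:tensorflow:Step 9800 per-step time 0.484s
INFO:tensorflow:{'Loss/classification_loss': 0.023190781,
 'Loss/localization_loss': 0.06909638,
 'Loss/regularization_loss': 0.15175259,
 'Loss/total_loss': 0.24403974,
 'learning_rate': 0.029682912}
INFO:tensorflow:Step 9900 per-step time 0.485s
INFO:tensorflow:{'Loss/classification_loss': 0.06559964,
 'Loss/localization_loss': 0.068514176,
 'Loss/regularization_loss': 0.15113617,
 'Loss/total_loss': 0.28524998,
 'learning_rate': 0.029442988}
"""


def total_loss_by_step(log_text):
    """Pair each 'Step N' line with the next 'Loss/total_loss' value."""
    losses = {}
    step = None
    for line in log_text.splitlines():
        step_match = re.match(r"INFO:tensorflow:Step (\d+) ", line)
        if step_match:
            step = int(step_match.group(1))
            continue
        loss_match = re.search(r"'Loss/total_loss': ([0-9.eE+-]+)", line)
        if loss_match and step is not None:
            losses[step] = float(loss_match.group(1))
            step = None  # skip the duplicated copy of the same dict
    return losses
```

The result is a dictionary keyed by step number, which is easy to plot or to compare between runs.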

Open TensorBoard in another window:

 tensorboard --logdir=models/my_ssd_resnet50_v1_fpn --host=0.0.0.0

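You can follow the training progress in the web UI. If you train on a CPU, it may take several hours before any data shows up. With a V100 GPU, as in my example, data appears within a few minutes, and it is impressively fast: the loss drops quickly.

After training finishes, start another run for evaluation.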

python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config --checkpoint_dir=models/my_ssd_resnet50_v1_fpn

After that you can see the eval data.

If you are satisfied with the model, export it. The export script also comes from models/research/object_detection/.

python exporter_main_v2.py --input_type image_tensor --pipeline_config_path models/my_ssd_resnet50_v1_fpn/pipeline.config --trained_checkpoint_dir models/my_ssd_resnet50_v1_fpn --output_directory exported-models/my_model

Training is now complete. Download everything we saved, and if you plan to continue training later, do not forget the checkpoints. Now let's try it out.

test_from_saved_model.py runs detection on a single image.
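I have not reproduced test_from_saved_model.py here. As a rough illustration of what happens after the model has run, the sketch below assumes the exported model returns boxes normalized to [0, 1] in [ymin, xmin, ymax, xmax] order together with scores and class ids, which is the usual Object Detection API output layout. The function name, the 0.5 threshold, and the sample values are hypothetical.

```python
LABELS = {1: "cat", 2: "dog"}


def filter_detections(boxes, scores, classes, width, height, min_score=0.5):
    """Keep detections above min_score and convert boxes to pixel coordinates."""
    results = []
    for (ymin, xmin, ymax, xmax), score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue
        results.append({
            "label": LABELS.get(int(class_id), "unknown"),
            "score": score,
            # Pixel box as (left, top, right, bottom).
            "box": (round(xmin * width), round(ymin * height),
                    round(xmax * width), round(ymax * height)),
        })
    return results


detections = filter_detections(
    boxes=[(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)],
    scores=[0.9, 0.3],
    classes=[2.0, 1.0],
    width=640, height=480,
)
```

Only the first sample detection passes the threshold, so `detections` holds a single "dog" entry with its box in pixels.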

webcam_test.py runs detection on webcam video.
