Obtaining the dataset
Obtaining the tiny-kinetics-400 dataset
Source: Tramac/tiny-kinetics-400 (Tiny Kinetics-400 for test)
A copy is also mirrored on Tianyi Cloud (天翼云盘): https://cloud.189.cn/web/share?code=iy6NZjERvuIv (access code: 6cbv)
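Renting a server & installing dependencies
Here I used a P40 (24 GB) rented from funHPC at ¥0.46 per hour, with the pytorch1.9/python3.8/cuda11.1 environment image.
The official installation guide (MMAction2 1.2.0 documentation) also covers installing PyTorch and similar prerequisites. The cloud server already comes with that environment deployed, so I skip those steps.
First, switch pip to the USTC mirror: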
pip config set global.index-url https://mirrors.ustc.edu.cn/pypi/simple
Following the official guide, install MMEngine, MMCV, MMDetection and MMPose:
pip install -U openmim
mim install mmengine
mim install mmcv==2.1.0  # without pinning the version, later steps seemed to fail; the compatible version probably depends on the CUDA version
mim install mmdet
mim install mmpose
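Here we install mmaction2 directly as a Python package, which avoids a time-consuming build from source. There is one catch: the source directory mmaction2/mmaction/models/localizers/drn does not seem to be included in the installed package. If the terminal reports that drn cannot be found, copy that drn directory into the installed package with cp -r to fix the problem.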
pip install mmaction2
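The drn workaround above can also be scripted. The sketch below is a minimal version that assumes the source has been cloned to ./mmaction2, as done in the next section. The helper name `copy_missing_subpackage` is hypothetical. It copies the subdirectory only when it is missing, which is the same as `cp -r`.

```python
import shutil
from pathlib import Path


def copy_missing_subpackage(src_pkg_dir: str, installed_pkg_dir: str, rel: str) -> bool:
    """Copy src_pkg_dir/rel into installed_pkg_dir/rel if it is missing.

    Returns True if a copy was made, False if the target already existed.
    """
    src = Path(src_pkg_dir) / rel
    dst = Path(installed_pkg_dir) / rel
    if dst.exists():
        return False  # nothing to do, the subpackage is already there
    if not src.is_dir():
        raise FileNotFoundError(f"source directory not found: {src}")
    shutil.copytree(src, dst)  # same effect as `cp -r src dst`
    return True
```

On the server, you could find the installed package directory with `importlib.util.find_spec("mmaction").submodule_search_locations[0]`, which locates the package without importing it. Pass that value as `installed_pkg_dir`, with `mmaction2/mmaction` as `src_pkg_dir` and `models/localizers/drn` as `rel`.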
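Normally, installation is complete at this point.
Importing and preprocessing the data
After opening the cloud server's code-server interface, first clone the mmaction2 code: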
git clone https://github.com/open-mmlab/mmaction2.git
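tools/data contains scripts for obtaining various datasets. Downloading through them here would probably be slow because of network conditions, so we import tiny-kinetics-400 directly instead.
In the project root, create the folder data/kinetics400, then put the dataset archive mentioned above into it and unzip it: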
mkdir data
mkdir data/kinetics400
# upload the dataset archive tK-400.zip into data/kinetics400 (e.g. through code-server), then unzip it there
unzip tK-400.zip
Rename and move the extracted directories (train_256 becomes videos_train, val_256 becomes videos_val) to get the structure below. The top-level data directory sits in the mmaction2 project root:
data
└── kinetics400
    ├── videos_train
    │   ├── abseiling
    │   │   └── _4YTwq0-73Y_000044_000054.mp4
    │   ├── air_drumming
    │   │   └── _axE99QAhe8_000026_000036.mp4
    │   └── ...
    │
    └── videos_val
        ├── abseiling
        ├── air_drumming
        └── ...
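The rename and move step above can be done with mv, or scripted as below. This minimal sketch assumes the archive unpacks to folders named train_256 and val_256 inside some directory `extracted_root`. The archive's top-level folder name is not stated here, so adjust the path. The function name `organize_kinetics_dirs` is hypothetical.

```python
import shutil
from pathlib import Path

# Folder names in the unpacked archive -> names expected in data/kinetics400
RENAMES = {"train_256": "videos_train", "val_256": "videos_val"}


def organize_kinetics_dirs(extracted_root: str, target_root: str) -> list:
    """Move train_256/val_256 from extracted_root to target_root/videos_{train,val}."""
    target = Path(target_root)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for old_name, new_name in RENAMES.items():
        src = Path(extracted_root) / old_name
        dst = target / new_name
        if dst.exists():
            # Refuse to overwrite, so a second run cannot nest or clobber folders
            raise FileExistsError(f"{dst} already exists, refusing to overwrite")
        shutil.move(str(src), str(dst))
        moved.append(new_name)
    return moved
```

Refusing to overwrite an existing videos_train or videos_val keeps a repeated run from silently moving one folder inside the other.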
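Extract frames with the script bundled with mmaction2. Since denseflow is not installed, I use the OpenCV-based extraction script, which extracts the frames automatically: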
bash tools/data/kinetics/extract_rgb_frames_opencv.sh kinetics400
After extraction, the directory structure looks roughly like this:
data
└── kinetics400
    ├── rawframes_train
    ├── rawframes_val
    ├── videos_train
    └── videos_val
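Before generating the annotation lists, it may be worth checking that every video actually produced frames. A video that OpenCV cannot decode could leave an empty frame folder behind. This minimal sketch assumes the layout rawframes_xxx/<class>/<video>/img_00001.jpg. As far as I know, img_{:05}.jpg is the naming the OpenCV extraction writes and the default that RawframeDataset expects, but check it against your own output. `find_empty_frame_dirs` is a hypothetical helper.

```python
from pathlib import Path


def find_empty_frame_dirs(rawframes_root: str, pattern: str = "img_*.jpg") -> list:
    """Return '<class>/<video>' entries whose frame folder contains no matching frames."""
    empty = []
    root = Path(rawframes_root)
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for video_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            # any() stops at the first matching frame, so large folders stay cheap
            if not any(video_dir.glob(pattern)):
                empty.append(f"{class_dir.name}/{video_dir.name}")
    return empty
```

For example, `print(find_empty_frame_dirs("data/kinetics400/rawframes_train"))` run from the project root lists the folders to re-extract or drop.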
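Next, generate train.txt and val.txt. Each line of these files records the directory holding one video's frames, its number of frames, and its label index.
I created a gen_list.py file in the data/kinetics400 directory with the content below. It is based on the CSDN post "kinetics数据集路径txt生成" (generating path txt files for the Kinetics dataset) by wbiqb.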
# gen_list.py
# Writes one line per video: "<frame_dir> <num_frames> <label_index>"
import os
import datetime
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('path', type=str, help='data_path of videos, absolute path')
parser.add_argument('outfile', type=str, help='output.txt file')
args = parser.parse_args()
start_time = datetime.datetime.now()

# get label dictionary: class folders sorted alphabetically -> 0, 1, 2, ...
labels = sorted(i for i in os.listdir(args.path) if i != '.DS_Store')
dic = {label: idx for (idx, label) in enumerate(labels)}


def flush(record):
    # append the buffered [frame_dir, num_frames, label] records to the output file
    with open(args.outfile, 'a') as f:
        for frame_dir, frames_len, label in record:
            f.write(f'{frame_dir} {frames_len} {label}\n')


# start from an empty file so that re-running the script does not duplicate lines
open(args.outfile, 'w').close()

# get [video_path, num_of_frames, labels]
tt = 0
record = []
for dic_cor, dirs in enumerate(labels):
    print(dic_cor)  # progress: index of the class being processed
    for video in os.listdir(os.path.join(args.path, dirs)):
        if video == '.DS_Store':
            continue
        frames_path = os.listdir(os.path.join(args.path, dirs, video))
        # kept from the referenced script: one less than the number of files
        # (two less if a .DS_Store file is present)
        frames_len = len(frames_path) - 2 if '.DS_Store' in frames_path else len(frames_path) - 1
        record.append([os.path.join(args.path, dirs, video), frames_len, dic[dirs]])
        tt += 1
        # write to disk every 10000 videos to keep the buffer small
        if tt % 10000 == 0:
            print('record:', tt)
            flush(record)
            record = []

flush(record)
print("Run time:", datetime.datetime.now() - start_time)
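Run the script. The config below reads the lists from data/kinetics400/train.txt and data/kinetics400/val.txt, so make sure the output files end up there: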
python gen_list.py data/kinetics400/rawframes_train ./train.txt
python gen_list.py data/kinetics400/rawframes_val ./val.txt
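A note on the path argument: the script's help text asks for an absolute path. As far as I can tell, RawframeDataset joins each listed directory with data_prefix (data_root in the config) using os.path.join. os.path.join discards earlier components when a later component is absolute. The snippet below demonstrates only that standard-library behaviour. It uses posixpath, which is what os.path is on Linux, and the paths are made-up examples.

```python
import posixpath

data_prefix = "data/kinetics400/rawframes_train"  # data_root in the config

# An absolute path listed in train.txt: the prefix is discarded
abs_entry = "/root/mmaction2/data/kinetics400/rawframes_train/abseiling/vid_a"
print(posixpath.join(data_prefix, abs_entry))

# A path relative to the project root: the prefix ends up applied twice
rel_entry = "data/kinetics400/rawframes_train/abseiling/vid_a"
print(posixpath.join(data_prefix, rel_entry))
```

So the safe options are to pass absolute paths to gen_list.py, or to write entries relative to data_root (e.g. abseiling/vid_a).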
The final directory structure:
data
└── kinetics400
    ├── gen_list.py
    ├── rawframes_train
    ├── rawframes_val
    ├── train.txt
    ├── val.txt
    ├── videos_train
    └── videos_val
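Each line of train.txt and val.txt has the form "<frame_dir> <num_frames> <label>". A quick consistency check can catch mistakes before a long training run. The minimal sketch below uses a hypothetical helper, `check_annotation`. It parses the file and reports lines with a label outside 0..num_classes-1, a missing frame folder, or a frame count larger than the files on disk. The way it combines data_prefix and frame_dir is my assumption about how the dataset does it.

```python
import os


def check_annotation(ann_file: str, data_prefix: str = "", num_classes: int = 400) -> list:
    """Return (line_no, reason) tuples for suspicious lines of a rawframe list file."""
    problems = []
    with open(ann_file) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if len(parts) != 3:
                problems.append((line_no, "expected 3 fields"))
                continue
            frame_dir, num_frames, label = parts[0], int(parts[1]), int(parts[2])
            # Assumed join of data_prefix and frame_dir; an absolute
            # frame_dir makes os.path.join ignore data_prefix entirely
            full_dir = os.path.join(data_prefix, frame_dir)
            if not 0 <= label < num_classes:
                problems.append((line_no, "label out of range"))
            elif not os.path.isdir(full_dir):
                problems.append((line_no, "missing frame dir"))
            elif num_frames > len(os.listdir(full_dir)):
                problems.append((line_no, "num_frames larger than files on disk"))
    return problems
```

For example, `print(check_annotation("data/kinetics400/train.txt", "data/kinetics400/rawframes_train"))` run from the project root should print an empty list if the file is consistent.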
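Creating the TimeSformer training config
The TimeSformer configs live in configs/recognition/timesformer/. We train on extracted frames rather than decoding the videos directly, so I adapted timesformer_spaceOnly_8xb8-8x32x1-15e_kinetics400-rgb.py. The custom config, provisionally named timesformer_tiny_kinetics_400.py, is saved in the same directory.
After the changes, it reads: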
_base_ = ['../../_base_/default_runtime.py']

# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='TimeSformer',
        pretrained=  # noqa: E251
        'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth',  # noqa: E501
        num_frames=8,
        img_size=224,
        patch_size=16,
        embed_dims=768,
        in_channels=3,
        dropout_ratio=0.,
        transformer_layers=None,
        attention_type='divided_space_time',
        norm_cfg=dict(type='LN', eps=1e-6)),
    cls_head=dict(
        type='TimeSformerHead',
        num_classes=400,
        in_channels=768,
        average_clips='prob'),
    data_preprocessor=dict(
        type='ActionDataPreprocessor',
        mean=[127.5, 127.5, 127.5],
        std=[127.5, 127.5, 127.5],
        format_shape='NCTHW'))

# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/train.txt'
ann_file_val = 'data/kinetics400/val.txt'
ann_file_test = 'data/kinetics400/val.txt'
file_client_args = dict(io_backend='disk')

train_pipeline = [
    # dict(type='DecordInit', **file_client_args),
    dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=224),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
val_pipeline = [
    # dict(type='DecordInit', **file_client_args),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
test_pipeline = [
    # dict(type='DecordInit', **file_client_args),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]

train_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(img=data_root),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=dict(img=data_root_val),
        pipeline=val_pipeline,
        test_mode=True))
test_dataloader = dict(
    batch_size=1,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=dict(img=data_root_val),
        pipeline=test_pipeline,
        test_mode=True))

val_evaluator = dict(type='AccMetric')
test_evaluator = val_evaluator

train_cfg = dict(
    type='EpochBasedTrainLoop', max_epochs=15, val_begin=1, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

optim_wrapper = dict(
    optimizer=dict(
        type='SGD', lr=0.005, momentum=0.9, weight_decay=1e-4, nesterov=True),
    paramwise_cfg=dict(
        custom_keys={
            '.backbone.cls_token': dict(decay_mult=0.0),
            '.backbone.pos_embed': dict(decay_mult=0.0),
            '.backbone.time_embed': dict(decay_mult=0.0)
        }),
    clip_grad=dict(max_norm=40, norm_type=2))

param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=15,
        by_epoch=True,
        milestones=[5, 10],
        gamma=0.1)
]

default_hooks = dict(checkpoint=dict(interval=5))

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (8 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=64)
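Here is what `clip_len=8, frame_interval=32, num_clips=1` means in practice. One clip spans 8 × 32 = 256 frames, roughly 10 s at 25 fps, which is about the length of a Kinetics clip. The function below is a simplified sketch of how evenly spaced test-mode frame indices can be computed (0-based). When the video is shorter than that span, indices wrap around with a modulo. It is my own approximation of the idea, not mmaction2's SampleFrames code.

```python
def sample_test_indices(num_frames: int, clip_len: int = 8,
                        frame_interval: int = 32, num_clips: int = 1) -> list:
    """Evenly spaced clip starts, then clip_len frames frame_interval apart."""
    span = clip_len * frame_interval  # frames covered by one clip (256 here)
    if num_frames >= span:
        # Split the valid start range into num_clips segments, start mid-segment
        avg_interval = (num_frames - span + 1) / num_clips
        starts = [int(avg_interval * i + avg_interval / 2) for i in range(num_clips)]
    else:
        starts = [0] * num_clips  # video shorter than the span: start at frame 0
    clips = []
    for start in starts:
        # Wrap around with modulo so indices never exceed the video length
        clips.append([(start + k * frame_interval) % num_frames for k in range(clip_len)])
    return clips
```

For a 300-frame video this gives a single clip at frames 22, 54, ..., 246, so the 8 sampled frames cover almost the whole video.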
Training the model
Start training by running this command from the project root:
python tools/train.py configs/recognition/timesformer/timesformer_tiny_kinetics_400.py
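During training, .pth checkpoint files are saved under work_dirs/timesformer_tiny_kinetics_400/ in the project root. With the checkpoint interval set to 5, one is written every 5 epochs.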
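Here is the learning-rate schedule this config produces. The schedule is SGD with lr=0.005 and MultiStepLR with milestones [5, 10] and gamma=0.1 over 15 epochs, so the learning rate drops tenfold after epoch 5 and again after epoch 10. auto_scale_lr is disabled, so the 0.005 tuned for a total batch of 64 (8 GPUs × 8) is used unchanged with this single-GPU batch of 8. If scaling were enabled, the linear rule would give 0.005 × 8 / 64 = 0.000625. The sketch below only reproduces that arithmetic; it is not mmengine's scheduler code.

```python
def multistep_lr(epoch: int, base_lr: float = 0.005,
                 milestones=(5, 10), gamma: float = 0.1) -> float:
    """LR used during a 0-based epoch under a MultiStepLR-style schedule."""
    passed = sum(1 for m in milestones if epoch >= m)  # milestones already reached
    return base_lr * gamma ** passed


def linear_scaled_lr(base_lr: float, real_batch: int, base_batch: int = 64) -> float:
    """Linear scaling rule: the LR changes in proportion to the total batch size."""
    return base_lr * real_batch / base_batch


for epoch in (0, 4, 5, 9, 10, 14):
    print(epoch, multistep_lr(epoch))
print("scaled lr for batch 8:", linear_scaled_lr(0.005, 8))
```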
Testing the model
Start testing by running this command from the project root.
Note: work_dirs/timesformer_tiny_kinetics_400/epoch_15.pth is the checkpoint from the final (15th) epoch.
python tools/test.py configs/recognition/timesformer/timesformer_tiny_kinetics_400.py \
    work_dirs/timesformer_tiny_kinetics_400/epoch_15.pth --dump result.pkl
The test results are written to the log under work_dirs. Here that is work_dirs/timesformer_tiny_kinetics_400/20241104_105423/20241104_105423.log. This was a first attempt, so the accuracy is only mediocre:
2024/11/04 10:54:43 - mmengine - INFO - Epoch(test) [ 20/400] eta: 0:01:38 time: 0.2584 data_time: 0.0413 memory: 689
2024/11/04 10:54:47 - mmengine - INFO - Epoch(test) [ 40/400] eta: 0:01:25 time: 0.2138 data_time: 0.0026 memory: 689
2024/11/04 10:54:51 - mmengine - INFO - Epoch(test) [ 60/400] eta: 0:01:17 time: 0.2144 data_time: 0.0027 memory: 689
2024/11/04 10:54:55 - mmengine - INFO - Epoch(test) [ 80/400] eta: 0:01:12 time: 0.2138 data_time: 0.0027 memory: 689
2024/11/04 10:55:00 - mmengine - INFO - Epoch(test) [100/400] eta: 0:01:06 time: 0.2139 data_time: 0.0028 memory: 689
2024/11/04 10:55:04 - mmengine - INFO - Epoch(test) [120/400] eta: 0:01:01 time: 0.2136 data_time: 0.0025 memory: 689
2024/11/04 10:55:08 - mmengine - INFO - Epoch(test) [140/400] eta: 0:00:57 time: 0.2139 data_time: 0.0025 memory: 689
2024/11/04 10:55:12 - mmengine - INFO - Epoch(test) [160/400] eta: 0:00:52 time: 0.2140 data_time: 0.0026 memory: 689
2024/11/04 10:55:17 - mmengine - INFO - Epoch(test) [180/400] eta: 0:00:48 time: 0.2142 data_time: 0.0027 memory: 689
2024/11/04 10:55:21 - mmengine - INFO - Epoch(test) [200/400] eta: 0:00:43 time: 0.2150 data_time: 0.0028 memory: 689
2024/11/04 10:55:25 - mmengine - INFO - Epoch(test) [220/400] eta: 0:00:39 time: 0.2146 data_time: 0.0028 memory: 689
2024/11/04 10:55:30 - mmengine - INFO - Epoch(test) [240/400] eta: 0:00:34 time: 0.2136 data_time: 0.0025 memory: 689
2024/11/04 10:55:34 - mmengine - INFO - Epoch(test) [260/400] eta: 0:00:30 time: 0.2138 data_time: 0.0026 memory: 689
2024/11/04 10:55:38 - mmengine - INFO - Epoch(test) [280/400] eta: 0:00:26 time: 0.2141 data_time: 0.0027 memory: 689
2024/11/04 10:55:43 - mmengine - INFO - Epoch(test) [300/400] eta: 0:00:21 time: 0.2146 data_time: 0.0029 memory: 689
2024/11/04 10:55:47 - mmengine - INFO - Epoch(test) [320/400] eta: 0:00:17 time: 0.2138 data_time: 0.0026 memory: 689
2024/11/04 10:55:51 - mmengine - INFO - Epoch(test) [340/400] eta: 0:00:13 time: 0.2142 data_time: 0.0027 memory: 689
2024/11/04 10:55:55 - mmengine - INFO - Epoch(test) [360/400] eta: 0:00:08 time: 0.2137 data_time: 0.0024 memory: 689
2024/11/04 10:56:00 - mmengine - INFO - Epoch(test) [380/400] eta: 0:00:04 time: 0.2139 data_time: 0.0028 memory: 689
2024/11/04 10:56:04 - mmengine - INFO - Epoch(test) [400/400] eta: 0:00:00 time: 0.2136 data_time: 0.0025 memory: 689
2024/11/04 10:56:04 - mmengine - INFO - Results has been saved to result.pkl.
2024/11/04 10:56:04 - mmengine - INFO - Epoch(test) [400/400] acc/top1: 0.2000 acc/top5: 0.4200 acc/mean1: 0.2000 data_time: 0.0046 time: 0.2162
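In this log, acc/top1 is the fraction of test videos whose true label gets the highest score. acc/top5 is the fraction whose true label is among the five highest. acc/mean1 is, as I understand it, the mean class accuracy: top-1 accuracy computed per class and then averaged. The pure-Python sketch below implements those common definitions on a score matrix, where `scores[i][c]` is the score of class c for video i. It is not necessarily identical to mmaction2's own metric code.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)


def mean_class_accuracy(scores, labels):
    """Top-1 accuracy computed per class, then averaged over the classes present."""
    correct, total = {}, {}
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        total[label] = total.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (pred == label)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For the run above, 400 test videos were evaluated (400 iterations at batch size 1). A top-1 of 0.2000 therefore means 80 videos were classified correctly, and a top-5 of 0.4200 means 168 had the true label among the five highest scores.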