Model:
timm/convnext_base.fb_in22k_ft_in1k
A ConvNeXt image classification model. Pretrained on ImageNet-22k and fine-tuned on ImageNet-1k by the paper authors.
Image classification:

```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnext_base.fb_in22k_ft_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
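The top-5 outputs above are raw class indices. As a minimal follow-up sketch, they can be decoded into human-readable names with the widely used ImageNet-1k label list from the PyTorch hub repository; that label file is an external assumption, not part of this model card. The sketch continues from the classification snippet above:

```python
from urllib.request import urlopen

# ImageNet-1k class names, one per line, in index order (assumed external resource)
LABELS_URL = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
imagenet_classes = [line.strip() for line in urlopen(LABELS_URL).read().decode('utf-8').splitlines()]

# pair each top-5 index with its label and probability (variables from the snippet above)
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f'{imagenet_classes[int(idx)]}: {prob.item():.2f}%')
```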
Feature map extraction:

```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_base.fb_in22k_ft_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 128, 56, 56])
    #  torch.Size([1, 256, 28, 28])
    #  torch.Size([1, 512, 14, 14])
    #  torch.Size([1, 1024, 7, 7])
    print(o.shape)
```
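When only some stages are needed, `features_only` models also accept an `out_indices` argument, and every such model exposes a `feature_info` descriptor with the channel count and downsampling factor of each returned map. A minimal sketch; the printed values follow from the feature map shapes shown above:

```python
import timm

# weights are not needed just to inspect feature metadata
model = timm.create_model(
    'convnext_base.fb_in22k_ft_in1k',
    pretrained=False,
    features_only=True,
    out_indices=(2, 3),  # keep only the last two feature stages
)

print(model.feature_info.channels())   # [512, 1024]
print(model.feature_info.reduction())  # [16, 32]
```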
Image embeddings:

```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_base.fb_in22k_ft_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))
# output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1024, 7, 7) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
```
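A common use for these pooled embeddings is image-to-image similarity. A minimal sketch using cosine similarity; the random tensors stand in for two (1, num_features) embeddings produced as in the snippet above and are placeholders, not real model output:

```python
import torch
import torch.nn.functional as F

# placeholders for two embeddings from the num_classes=0 model above
emb_a = torch.randn(1, 1024)
emb_b = torch.randn(1, 1024)

# cosine similarity: 1.0 = same direction, 0.0 = orthogonal
similarity = F.cosine_similarity(emb_a, emb_b, dim=1)
print(similarity.item())
```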
Model comparison:
Explore this model's dataset and runtime metrics in timm model results.
All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP.
model | top1 | top5 | img_size | param_count | gmacs | macts | samples_per_sec | batch_size |
---|---|---|---|---|---|---|---|---|
1238321 | 88.848 | 98.742 | 512 | 660.29 | 600.81 | 413.07 | 28.58 | 48 |
1239321 | 88.668 | 98.738 | 384 | 660.29 | 337.96 | 232.35 | 50.56 | 64 |
12310321 | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 | 122.45 | 256 |
12311321 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 | 196.84 | 256 |
12312321 | 88.196 | 98.532 | 384 | 197.96 | 101.1 | 126.74 | 128.94 | 128 |
12313321 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 | 283.42 | 256 |
12314321 | 87.75 | 98.556 | 384 | 350.2 | 179.2 | 168.99 | 124.85 | 192 |
12315321 | 87.646 | 98.422 | 384 | 88.72 | 45.21 | 84.49 | 209.51 | 256 |
12316321 | 87.476 | 98.382 | 384 | 197.77 | 101.1 | 126.74 | 194.66 | 256 |
12317321 | 87.344 | 98.218 | 256 | 200.13 | 44.94 | 56.33 | 438.08 | 256 |
12318321 | 87.26 | 98.248 | 224 | 197.96 | 34.4 | 43.13 | 376.84 | 256 |
12319321 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 | 365.47 | 256 |
12320321 | 87.002 | 98.208 | 224 | 350.2 | 60.98 | 57.5 | 368.01 | 256 |
12321321 | 86.796 | 98.264 | 384 | 88.59 | 45.21 | 84.49 | 366.54 | 256 |
12322321 | 86.74 | 98.022 | 224 | 88.72 | 15.38 | 28.75 | 624.23 | 256 |
12323321 | 86.636 | 98.028 | 224 | 197.77 | 34.4 | 43.13 | 581.43 | 256 |
12324321 | 86.504 | 97.97 | 384 | 88.59 | 45.21 | 84.49 | 368.14 | 256 |
12325321 | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 | 816.14 | 256 |
12326321 | 86.256 | 97.75 | 224 | 660.29 | 115.0 | 79.07 | 154.72 | 256 |
12327321 | 86.182 | 97.92 | 384 | 50.22 | 25.58 | 63.37 | 516.19 | 256 |
12328321 | 86.154 | 97.68 | 256 | 88.59 | 20.09 | 37.55 | 819.86 | 256 |
12329321 | 85.822 | 97.866 | 224 | 88.59 | 15.38 | 28.75 | 1037.66 | 256 |
12330321 | 85.778 | 97.886 | 384 | 50.22 | 25.58 | 63.37 | 518.95 | 256 |
12331321 | 85.742 | 97.584 | 224 | 197.96 | 34.4 | 43.13 | 375.23 | 256 |
12332321 | 85.174 | 97.506 | 224 | 50.22 | 8.71 | 21.56 | 1474.31 | 256 |
12333321 | 85.118 | 97.608 | 384 | 28.59 | 13.14 | 39.48 | 856.76 | 256 |
12334321 | 85.112 | 97.63 | 384 | 28.64 | 13.14 | 39.48 | 491.32 | 256 |
12335321 | 84.874 | 97.09 | 224 | 88.72 | 15.38 | 28.75 | 625.33 | 256 |
12336321 | 84.562 | 97.394 | 224 | 50.22 | 8.71 | 21.56 | 1478.29 | 256 |
12337321 | 84.282 | 96.892 | 224 | 197.77 | 34.4 | 43.13 | 584.28 | 256 |
12338321 | 84.186 | 97.124 | 224 | 28.59 | 4.47 | 13.44 | 2433.7 | 256 |
12339321 | 84.084 | 97.14 | 384 | 28.59 | 13.14 | 39.48 | 862.95 | 256 |
12340321 | 83.894 | 96.964 | 224 | 28.64 | 4.47 | 13.44 | 1452.72 | 256 |
12341321 | 83.82 | 96.746 | 224 | 88.59 | 15.38 | 28.75 | 1054.0 | 256 |
12342321 | 83.37 | 96.742 | 384 | 15.62 | 7.22 | 24.61 | 801.72 | 256 |
12343321 | 83.142 | 96.434 | 224 | 50.22 | 8.71 | 21.56 | 1464.0 | 256 |
12344321 | 82.92 | 96.284 | 224 | 28.64 | 4.47 | 13.44 | 1425.62 | 256 |
12345321 | 82.898 | 96.616 | 224 | 28.59 | 4.47 | 13.44 | 2480.88 | 256 |
12346321 | 82.282 | 96.344 | 224 | 15.59 | 2.46 | 8.37 | 3926.52 | 256 |
12347321 | 82.216 | 95.852 | 224 | 28.59 | 4.47 | 13.44 | 2529.75 | 256 |
12348321 | 82.066 | 95.854 | 224 | 28.59 | 4.47 | 13.44 | 2346.26 | 256 |
12349321 | 82.03 | 96.166 | 224 | 15.62 | 2.46 | 8.37 | 2300.18 | 256 |
12350321 | 81.83 | 95.738 | 224 | 15.62 | 2.46 | 8.37 | 2321.48 | 256 |
12351321 | 80.866 | 95.246 | 224 | 15.65 | 2.65 | 9.38 | 3523.85 | 256 |
12352321 | 80.768 | 95.334 | 224 | 15.59 | 2.46 | 8.37 | 3915.58 | 256 |
12353321 | 80.304 | 95.072 | 224 | 9.07 | 1.37 | 6.1 | 3274.57 | 256 |
12354321 | 79.526 | 94.558 | 224 | 9.05 | 1.37 | 6.1 | 5686.88 | 256 |
12355321 | 79.522 | 94.692 | 224 | 9.06 | 1.43 | 6.5 | 5422.46 | 256 |
12356321 | 78.488 | 93.98 | 224 | 5.23 | 0.79 | 4.57 | 4264.2 | 256 |
12357321 | 77.86 | 93.83 | 224 | 5.23 | 0.82 | 4.87 | 6910.6 | 256 |
12358321 | 77.454 | 93.68 | 224 | 5.22 | 0.79 | 4.57 | 7189.92 | 256 |
12359321 | 76.664 | 93.044 | 224 | 3.71 | 0.55 | 3.81 | 4728.91 | 256 |
12360321 | 75.88 | 92.846 | 224 | 3.7 | 0.58 | 4.11 | 7963.16 | 256 |
12361321 | 75.664 | 92.9 | 224 | 3.7 | 0.55 | 3.81 | 8439.22 | 256 |
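The table above compares checkpoints across the ConvNeXt family. To enumerate the variants available in a local timm install before picking one, timm's `list_models` helper can be used; a minimal sketch (the parameter-count arithmetic mirrors the param_count column, which is in millions):

```python
import timm

# all ConvNeXt family models that ship with pretrained weights
names = timm.list_models('convnext*', pretrained=True)
print(len(names), names[:3])

# parameter count for one candidate, computed from the module itself
model = timm.create_model('convnext_base.fb_in22k_ft_in1k', pretrained=False)
print(f'{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M params')
```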
Citation:

```bibtex
@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
}
```

```bibtex
@misc{rw2019timm,
  author       = {Ross Wightman},
  title        = {PyTorch Image Models},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  doi          = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
```