Mask2Former

The Mask2Former model was trained on COCO panoptic segmentation (base-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

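Model description

Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: it predicts a set of masks and their corresponding labels. All three tasks are therefore treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA model, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation, and (iii) improving training efficiency by computing the loss on subsampled points instead of whole masks.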
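
To make point (ii) concrete, the following is a minimal NumPy sketch of masked cross-attention, not the model's actual implementation. It assumes single-head scaled dot-product attention, a 0.5 threshold on the previous layer's mask probabilities, and a fallback that lets a query with an empty mask attend to every pixel. The function name `masked_attention` and its parameters are hypothetical, and the residual connection and learned projections are omitted.

```python
import numpy as np


def masked_attention(queries, keys, values, mask_probs, threshold=0.5):
    """Cross-attention restricted to each query's predicted foreground.

    queries:    (num_queries, dim) object queries
    keys:       (num_pixels, dim) flattened image features
    values:     (num_pixels, dim) flattened image features
    mask_probs: (num_queries, num_pixels) mask probabilities from the previous layer
    """
    # Standard scaled dot-product scores (the scaling is an assumption of this sketch).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])

    # Each query may only attend to pixels inside its own predicted mask.
    keep = mask_probs >= threshold
    # Assumed safeguard: a query with an empty mask attends everywhere,
    # which avoids a softmax over a row that is entirely -inf.
    keep[~keep.any(axis=-1)] = True

    scores = np.where(keep, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


# Toy example with 2 queries and 6 pixels; the second query has an empty mask.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 4))
mask_probs = np.array([
    [0.9, 0.8, 0.1, 0.2, 0.0, 0.3],
    [0.0, 0.1, 0.2, 0.1, 0.3, 0.4],
])
out, weights = masked_attention(q, k, v, mask_probs)
```

In this toy example the first query places all of its attention weight on the two pixels inside its mask, while the second query falls back to ordinary attention over all six pixels.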

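Intended uses & limitations

You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.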

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on COCO panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-base-coco-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-base-coco-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra class is the "no object" label)
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]
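
The post-processed result also carries a `segments_info` list alongside the `segmentation` map. As a rough sketch of how the two can be combined, the helper below reports the label, score, and image-area fraction of each segment. The helper name `summarize_segments`, the toy panoptic map, and the two-entry `id2label` mapping are hypothetical stand-ins so that the block runs without downloading the model; it assumes each entry of `segments_info` has `id`, `label_id`, and `score` keys.

```python
import numpy as np


def summarize_segments(segmentation, segments_info, id2label):
    """Return one summary dict per predicted segment.

    segmentation:  (height, width) array-like of segment ids
    segments_info: list of dicts with "id", "label_id" and (optionally) "score"
    id2label:      mapping from class id to class name
    """
    seg = np.asarray(segmentation)
    total_pixels = seg.size
    summary = []
    for info in segments_info:
        # Count the pixels assigned to this segment id.
        area = int((seg == info["id"]).sum())
        summary.append({
            "segment_id": info["id"],
            "label": id2label[info["label_id"]],
            "score": info.get("score"),
            "area_fraction": area / total_pixels,
        })
    return summary


# Hypothetical toy data standing in for the processor output.
toy_segmentation = np.array([[1, 1, 2],
                             [1, 2, 2]])
toy_segments_info = [
    {"id": 1, "label_id": 0, "score": 0.98},
    {"id": 2, "label_id": 1, "score": 0.91},
]
toy_id2label = {0: "cat", 1: "remote"}

rows = summarize_segments(toy_segmentation, toy_segments_info, toy_id2label)
for row in rows:
    print(row)
```

With the real outputs from the example above, the same helper should be callable as `summarize_segments(result["segmentation"], result["segments_info"], model.config.id2label)`, assuming the segmentation tensor is on the CPU.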

For more code examples, refer to the documentation.

Author:

Meta AI

Dataset size:

411.94 MB