Model:
lllyasviel/control_v11p_sd15_lineart
ControlNet v1.1 was released by Lvmin Zhang in lllyasviel/ControlNet-v1-1.
This checkpoint is a conversion of the original checkpoint into the diffusers format. It can be used in combination with Stable Diffusion, such as runwayml/stable-diffusion-v1-5.
For more details, please also have a look at the 🧨 Diffusers docs.
ControlNet is a neural network structure to control diffusion models by adding extra conditions.
This checkpoint corresponds to the ControlNet conditioned on lineart images.
Developed by: Lvmin Zhang, Maneesh Agrawala
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which our license is based.
Resources for more information: GitHub Repository, Paper.
Cite as:
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
ControlNet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.
The abstract reads as follows:
We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.
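As an illustration of this design, here is a minimal sketch (not the authors' code) of the zero convolution the paper uses to connect the trainable copy to the frozen model: a 1×1 convolution whose weights and bias are initialized to zero, so that at the start of training the ControlNet branch leaves the pretrained model's behavior unchanged:

import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with zero-initialized weights and bias: the ControlNet
    # branch initially contributes nothing, so training starts from the
    # frozen model's original behavior (illustrative sketch only).
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv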
It is recommended to use this checkpoint with Stable Diffusion v1-5, as the checkpoint was trained on that base model. Experimentally, the checkpoint can also be used with other diffusion models, such as dreamboothed Stable Diffusion, as sketched below.
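A minimal sketch of that combination, where "some-user/dreambooth-sd15" is a hypothetical placeholder for any checkpoint derived from Stable Diffusion v1-5:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
# "some-user/dreambooth-sd15" is a hypothetical placeholder id: any checkpoint
# derived from Stable Diffusion v1-5 should be compatible with this ControlNet.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "some-user/dreambooth-sd15", controlnet=controlnet, torch_dtype=torch.float16
)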
Note: If you want to process an image to create the auxiliary conditioning, the following external dependencies are required:
$ pip install controlnet_aux==0.3.0
$ pip install diffusers transformers accelerate
import torch
import os

from diffusers.utils import load_image
from controlnet_aux import LineartDetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

checkpoint = "lllyasviel/control_v11p_sd15_lineart"

# Load the example input image and resize it to the 512x512 working resolution
image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/input.png"
)
image = image.resize((512, 512))

prompt = "michael jackson concert"

# Extract the lineart conditioning image from the input photo
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
control_image = processor(image)

os.makedirs("images", exist_ok=True)
control_image.save("./images/control.png")

# Load the ControlNet and plug it into a Stable Diffusion v1-5 pipeline
controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Fix the seed for reproducibility
generator = torch.manual_seed(0)
image = pipe(
    prompt, num_inference_steps=30, generator=generator, image=control_image
).images[0]

image.save("images/image_out.png")
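If the extracted lines are too fine, controlnet_aux's LineartDetector also accepts a coarse flag that switches to a coarser line-extraction model; whether this keyword is available depends on the installed controlnet_aux version, so treat the following as a sketch:

from diffusers.utils import load_image
from controlnet_aux import LineartDetector

processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/input.png"
).resize((512, 512))

# coarse=True is assumed to select the coarser lineart model; verify that
# your controlnet_aux version exposes this keyword.
control_image_coarse = processor(image, coarse=True)
control_image_coarse.save("./images/control_coarse.png")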
The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:
Model Name | Control Image Overview | Condition Image |
---|---|---|
lllyasviel/control_v11p_sd15_canny | Trained with canny edge detection | A monochrome image with white edges on a black background. |
lllyasviel/control_v11e_sd15_ip2p | Trained with pixel to pixel instruction | No condition. |
lllyasviel/control_v11p_sd15_inpaint | Trained with image inpainting | No condition. |
lllyasviel/control_v11p_sd15_mlsd | Trained with multi-level line segment detection | An image with annotated line segments. |
lllyasviel/control_v11f1p_sd15_depth | Trained with depth estimation | An image with depth information, usually represented as a grayscale image. |
lllyasviel/control_v11p_sd15_normalbae | Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image. |
lllyasviel/control_v11p_sd15_seg | Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image. |
lllyasviel/control_v11p_sd15_lineart | Trained with line art generation | An image with line art, usually black lines on a white background. |
lllyasviel/control_v11p_sd15s2_lineart_anime | Trained with anime line art generation | An image with anime-style line art. |
lllyasviel/control_v11p_sd15_openpose | Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons. |
lllyasviel/control_v11p_sd15_scribble | Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes. |
lllyasviel/control_v11p_sd15_softedge | Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect. |
lllyasviel/control_v11e_sd15_shuffle | Trained with image shuffling | An image with shuffled patches or regions. |
lllyasviel/control_v11f1e_sd15_tile | Trained with image tiling | A blurry image or part of an image. |

Control image and generated image examples for each checkpoint are shown on the corresponding model pages.
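Every checkpoint in the table follows the same diffusers loading pattern shown above; only the checkpoint id and the conditioning-image preparation change. A minimal sketch with the canny checkpoint, for example:

import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

# Same pipeline construction as the lineart example; the conditioning image
# for this checkpoint would be a canny edge map instead of lineart.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()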
For more information, please have a look at the Diffusers ControlNet Blog Post and check out the official docs.