Model:

cerspense/zeroscope_v2_XL


Example outputs (courtesy of dotsimulate)

zeroscope_v2 XL

A watermark-free, Modelscope-based video model capable of generating high-quality video at 1024x576. It was trained from the original weights with offset noise on 9,923 clips and 29,769 tagged frames at 24 frames and 1024x576 resolution. zeroscope_v2_XL is specifically designed for upscaling content made with zeroscope_v2_576w using vid2vid in the 1111 text2video extension by kabachuha. Using this model as an upscaler gives better overall compositions at higher resolutions, because you can explore quickly at 576x320 (or 448x256) before moving to a high-resolution render.

zeroscope_v2_XL uses 15.3 GB of VRAM when rendering 30 frames at 1024x576.

Using it with the 1111 text2video extension

  • Download files in the zs2_XL folder.
  • Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
Upscaling recommendations

    For upscaling, it's recommended to use the 1111 extension. It works best at 1024x576 with a denoise strength between 0.66 and 0.85. Remember to use the same prompt that was used to generate the original clip.
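The two installation steps listed earlier (download the zs2_XL files, then replace the matching files in the t2v directory) can also be scripted. The following is a minimal sketch, not part of the extension: `install_zs2_xl` is a hypothetical helper name, and it assumes the downloaded files sit in a local folder and that the web UI uses the `models/ModelScope/t2v` layout given above.

```python
import shutil
from pathlib import Path


def install_zs2_xl(src_dir, webui_dir):
    """Copy every file from the downloaded zs2_XL folder into the t2v model directory.

    Files that already exist under the same name are overwritten, which mirrors
    the "replace the respective files" step. Returns the sorted list of copied names.
    """
    src = Path(src_dir)
    # Directory layout taken from the installation step; adjust if yours differs.
    dest = Path(webui_dir) / "models" / "ModelScope" / "t2v"
    if not src.is_dir():
        raise FileNotFoundError(f"zs2_XL folder not found: {src}")
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dest / item.name)  # overwrites an existing file
            copied.append(item.name)
    return copied
```

A call such as `install_zs2_xl("zs2_XL", "stable-diffusion-webui")` would copy the files in place. Because the copy overwrites the existing model files, it may be worth backing up the t2v directory first.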

    Usage in 🧨 Diffusers

    Let's first install the libraries required:

    $ pip install git+https://github.com/huggingface/diffusers.git
    $ pip install transformers accelerate torch
    

    Now, let's generate a low-resolution video using cerspense/zeroscope_v2_576w.

    import torch
    from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
    from diffusers.utils import export_to_video
    
    pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_slicing()
    pipe.unet.enable_forward_chunking(chunk_size=1, dim=1) # disable if enough memory as this slows down significantly
    
    prompt = "Darth Vader is surfing on waves"
    video_frames = pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=36).frames
    video_path = export_to_video(video_frames)
    

    Next, we can upscale it using cerspense/zeroscope_v2_XL.

    from PIL import Image  # needed for Image.fromarray below
    
    pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_slicing()
    
    video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
    
    video_frames = pipe(prompt, video=video, strength=0.6).frames
    video_path = export_to_video(video_frames, output_video_path="/home/patrick/videos/video_1024_darth_vader_36.mp4")
    

    Here are some results:

    Darth Vader is surfing on waves.

    Known issues

    Rendering at lower resolutions or fewer than 24 frames could lead to suboptimal outputs.
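As a rough illustration, the limits mentioned in this card can be checked before starting a long render. This is a sketch only: `check_upscale_settings` and the constant names are hypothetical, and the thresholds are simply the numbers quoted above (1024x576, 24 frames, denoise strength 0.66 to 0.85 for the 1111 extension). The strength range is advisory; the Diffusers example above uses 0.6, which this check would flag.

```python
RECOMMENDED_SIZE = (1024, 576)   # resolution the model was trained at
MIN_FRAMES = 24                  # fewer frames may give suboptimal output
STRENGTH_RANGE = (0.66, 0.85)    # denoise range recommended for the 1111 extension


def check_upscale_settings(width, height, num_frames, strength):
    """Return a list of warnings; an empty list means the settings match the card."""
    warnings = []
    if width < RECOMMENDED_SIZE[0] or height < RECOMMENDED_SIZE[1]:
        warnings.append(f"resolution {width}x{height} is below 1024x576")
    if num_frames < MIN_FRAMES:
        warnings.append(f"{num_frames} frames is fewer than {MIN_FRAMES}")
    low, high = STRENGTH_RANGE
    if not low <= strength <= high:
        warnings.append(f"strength {strength} is outside {low} to {high}")
    return warnings


if __name__ == "__main__":
    for message in check_upscale_settings(1024, 576, 30, 0.7):
        print("warning:", message)
```

The function only reports mismatches and never blocks a render, since the card describes these values as recommendations rather than hard limits.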

    Thanks to camenduru, kabachuha, ExponentialML, dotsimulate, VANYA, polyware, tin2tin