Image Generation

HCP-Diffusion generates images based on a workflow system. The workflow is defined in a configuration file, which is a standard Python (.py) file. This allows you to describe the image generation process programmatically. Using workflows, you can incorporate various operations into the generation process, such as super-resolution, localized editing, and more. You can even assign different prompts, CFG scales, or models to each step in the workflow.

# Run the workflow
hcp_run --cfg cfgs/workflow/text2img.py

Adjust Word Attention

Note

You can emphasize specific words or phrases in the prompt during image generation:

Format: {text_to_emphasize:multiplier}. The multiplier is optional and defaults to 1.1.

Example: a {cat} running {in the {city}:1.2}

In this case (the multipliers of nested groups are multiplied together):

  • “cat” is emphasized by 1.1x

  • “in the” is emphasized by 1.2x

  • “city” is emphasized by 1.2 * 1.1 = 1.32x
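The weighting rule above can be sketched in plain Python. This is only an illustration of the syntax, not HCP-Diffusion's actual parser: the function name `parse_attention` and the choice to return (text, weight) pairs are assumptions made for this example.

```python
def parse_attention(prompt, default=1.1):
    """Split a prompt into (text, weight) segments using {text:multiplier} markup."""
    pos = 0

    def parse(depth):
        nonlocal pos
        segments, buf = [], ""
        while pos < len(prompt):
            ch = prompt[pos]
            pos += 1
            if ch == "{":
                # Flush plain text collected so far, then parse the nested group.
                if buf:
                    segments.append((buf, 1.0))
                    buf = ""
                segments.extend(parse(depth + 1))
            elif ch == "}" and depth > 0:
                # The multiplier, if present, follows the last ':' before '}'.
                text, mult = buf, default
                head, sep, tail = buf.rpartition(":")
                if sep:
                    try:
                        text, mult = head, float(tail)
                    except ValueError:
                        pass  # not a number: keep the whole buffer as text
                if text:
                    segments.append((text, 1.0))
                # Scale everything inside this group, including nested groups.
                return [(t, round(w * mult, 4)) for t, w in segments]
            else:
                buf += ch
        if buf:
            segments.append((buf, 1.0))
        return segments

    return parse(0)


for text, weight in parse_attention("a {cat} running {in the {city}:1.2}"):
    print(repr(text), weight)
```

Each nested group is scaled by its own multiplier and by every enclosing one, which is how "city" ends up at 1.32 in the example prompt.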

Basic Configuration Structure

The entry point for the workflow is the make_cfg function. The returned dictionary must contain a workflow key that defines the workflow.
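As a rough illustration of that contract, the sketch below uses plain Python only. The helper `load_workflow` and the placeholder step names are hypothetical and are not part of HCP-Diffusion; real configs are decorated with @neko_cfg and return action objects, as in the examples later in this section.

```python
def make_cfg():
    # A real config would return hcpdiff/rainbowneko action objects here;
    # plain strings are placeholders for illustration only.
    return {"workflow": ["load model", "encode text", "denoise", "decode"]}


def load_workflow(cfg_fn):
    """Hypothetical helper: call the entry point and check its contract."""
    cfg = cfg_fn()
    if not isinstance(cfg, dict) or "workflow" not in cfg:
        raise KeyError("make_cfg() must return a dict with a 'workflow' key")
    return cfg["workflow"]


steps = load_workflow(make_cfg)
print(steps)
```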

Image Generation with Stable Diffusion

The file cfgs/workflow/easy/text2img.py provides a simplified configuration for image generation. A handful of parameters is enough to generate images, at the cost of flexibility and features.

Common configuration:

from hcpdiff.easy.cfg import SD15_t2i
from rainbowneko.parser import neko_cfg

@neko_cfg
def make_cfg():
    return SD15_t2i(
        pretrained_model='Lykon/DreamShaper',  # Path to the pretrained model
        prompt='masterpiece, best quality, 1girl, cat ears, outside',  # Positive prompt
        # negative_prompt='',  # Optional: Negative prompt
        bs=4,  # Batch size
        width=512,  # Image width
        height=512,  # Image height
        guidance_scale=7.0  # CFG guidance scale
    )

To change the sampler, you can set the noise_sampler parameter (default is dpmpp_2m_karras):

from hcpdiff.easy import Diffusers_SD
from hcpdiff.easy.cfg import SD15_t2i
from rainbowneko.parser import neko_cfg

@neko_cfg
def make_cfg():
    return SD15_t2i(
        pretrained_model='Lykon/DreamShaper',
        prompt='masterpiece, best quality, 1girl, cat ears, outside',
        bs=4,
        width=512,
        height=512,
        guidance_scale=7.0,
        noise_sampler=Diffusers_SD.euler_a,  # Replace the default sampler
    )

Available samplers:

Sampler            Description
dpmpp_2m           Fewer steps
dpmpp_2m_karras    Fewer steps, high quality, commonly used
ddim               More steps
euler
euler_a            Common in anime-style, smoother output

Other configurable options:

from hcpdiff.easy.cfg import SD15_t2i
from rainbowneko.parser import neko_cfg

@neko_cfg
def make_cfg():
    return SD15_t2i(
        pretrained_model='Lykon/DreamShaper',
        prompt='masterpiece, best quality, 1girl, cat ears, outside',
        # negative_prompt='',
        bs=4,
        width=512,
        height=512,
        guidance_scale=7.0,

        seed=42,  # Set random seed
        N_steps=30,  # Number of sampling steps
        save_root='output_pipe/',  # Output directory
    )

The file cfgs/workflow/text2img.py provides a more flexible and feature-rich configuration for image generation. The text-to-image workflow typically consists of several modules, each covered in turn here. For more examples, refer to the other files in the cfgs/workflow/ directory.

Model Loading

import torch
from rainbowneko.parser import neko_cfg
from hcpdiff.easy import Diffusers_SD, SD15_auto_loader
from rainbowneko.infer import Actions, PrepareAction
from hcpdiff.workflow import BuildModelsAction

@neko_cfg
def build_model(pretrained_model='ckpts/any5', noise_sampler=Diffusers_SD.dpmpp_2m_karras) -> Actions:
    return Actions([
        PrepareAction(device='cuda', dtype=torch.float16),  # Set device and precision
        BuildModelsAction(  # Build and load pretrained model
            model_loader=SD15_auto_loader(_partial_=True,
                ckpt_path=pretrained_model,
                noise_sampler=noise_sampler  # Set sampler (preset)
            )
        ),
    ])

Note

The noise_sampler here uses a preset configuration. Available presets:

Sampler            Description
dpmpp_2m           Fewer steps
dpmpp_2m_karras    Fewer steps, high quality, commonly used
ddim               More steps
euler
euler_a            Common in anime-style, smoother output

For a custom sampler configuration, construct the sampler explicitly:

from diffusers import DPMSolverMultistepScheduler
from hcpdiff.diffusion.sampler import DiffusersSampler

# Use a Diffusers sampler
noise_sampler=DiffusersSampler(
    DPMSolverMultistepScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule='scaled_linear',
        algorithm_type='sde-dpmsolver++',
        use_karras_sigmas=True,
    )
)

Model Optimization

from hcpdiff.workflow import PrepareDiffusionAction, XformersEnableAction, VaeOptimizeAction

@neko_cfg
def optimize_model() -> Actions:
    return Actions([
        PrepareDiffusionAction(),  # Set model-specific parameters
        XformersEnableAction(),  # Enable xformers to optimize memory
        VaeOptimizeAction(slicing=True),  # VAE optimization to reduce memory usage
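    ])

Text Encoding

from hcpdiff.workflow import TextHookAction, AttnMultTextEncodeAction

# Module-level defaults referenced by text() below (example values)
prompt = 'masterpiece, best quality, 1girl, cat ears, outside'
negative_prompt = ''

@neko_cfg
def text(prompt=prompt, negative_prompt=negative_prompt, bs=4, N_repeats=1, layer_skip=1) -> Actions: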
    return Actions([
        # Advanced text encoder support
        # N_repeats: extends the maximum prompt length
        # layer_skip: number of final text-encoder layers to skip (clip skip); 0 here corresponds to 1 in webui
        TextHookAction(N_repeats=N_repeats, layer_skip=layer_skip),
        # Text encoding with token weighting support
        AttnMultTextEncodeAction(
            prompt=prompt,  # Positive prompt
            negative_prompt=negative_prompt,  # Negative prompt
            bs=bs  # Batch size
        ),
    ])

Diffusion Configuration

from hcpdiff.workflow import SeedAction, MakeTimestepsAction, MakeLatentAction

@neko_cfg
def config_diffusion(seed=42, N_steps=20, width=512, height=512) -> Actions:
    return Actions([
        SeedAction(seed),  # Set random seed
        MakeTimestepsAction(N_steps=N_steps),  # Set number of sampling steps
        # MakeTimestepsAction(N_steps=N_steps, strength=0.6),  # Denoising strength (for img2img)
        MakeLatentAction(width=width, height=height)  # Set image dimensions
    ])

Image Generation

from rainbowneko.infer import LoopAction
from hcpdiff.workflow import DiffusionStepAction, time_iter

@neko_cfg
def diffusion(guidance_scale=7.0) -> Actions:
    return Actions([
        LoopAction(  # Loop through actions
            iterator=time_iter,  # Time step iterator [{'t':t} for t in timesteps]
            actions=[
                DiffusionStepAction(guidance_scale=guidance_scale)  # Perform one denoising step
            ]
        )
    ])

Image Decoding

from hcpdiff.workflow import DecodeAction, SaveImageAction

@neko_cfg
def decode() -> Actions:
    return Actions([
        DecodeAction(),  # Decode latent to image using VAE
        SaveImageAction(save_root='output_pipe/', image_type='png'),  # Save image
    ])
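
To make the flow of the modules above easier to picture, here is a toy model in plain Python of how a list of actions can pass state along, including a loop over timesteps and the classifier-free guidance (CFG) combination. Everything in it is a stand-in written for this example: `run`, `loop`, `make_timesteps`, `make_latent`, and `diffusion_step` are hypothetical helpers, the "model" is a pair of simple arithmetic expressions, and none of it reflects how hcpdiff or rainbowneko are actually implemented.

```python
def run(actions, state=None):
    """Apply each action in order; an action returns the state keys it updates."""
    state = dict(state or {})
    for act in actions:
        state.update(act(state))
    return state


def make_timesteps(n_steps):
    # Toy counterpart of "set number of sampling steps".
    return lambda state: {"timesteps": list(range(n_steps, 0, -1))}


def make_latent(value):
    # Toy counterpart of "create the initial latent"; a single float here.
    return lambda state: {"latent": value}


def time_iter(state):
    # Mirrors the comment in the config: [{'t': t} for t in timesteps]
    return [{"t": t} for t in state["timesteps"]]


def diffusion_step(guidance_scale):
    def action(state):
        latent = state["latent"]
        # Placeholder "noise predictions"; a real model would be a network.
        eps_uncond = 0.1 * latent
        eps_cond = 0.2 * latent
        # CFG: move from the unconditional prediction toward the conditional one.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        return {"latent": latent - eps / len(state["timesteps"])}
    return action


def loop(iterator, actions):
    def action(state):
        for item in iterator(state):
            # Merge the loop variable into the state, then run the inner actions.
            state = run(actions, {**state, **item})
        return state
    return action


workflow = [
    make_timesteps(4),
    make_latent(1.0),
    loop(time_iter, [diffusion_step(guidance_scale=7.0)]),
]
final = run(workflow)
print(final["t"], final["latent"])
```

The point of the design is that each stage only reads and writes named entries in a shared state, so stages can be reordered, repeated, or given different parameters without touching the others.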

Supported Actions

TODO