Image Generation

HCP-Diffusion generates images through a workflow system. A workflow is defined in a configuration file, which is a standard Python (.py) file, so the image generation process can be described programmatically. Workflows let you add operations such as super-resolution and localized editing to the generation process. You can also assign a different prompt, CFG scale, or model to each step in the workflow.

# Run the workflow
hcp_run --cfg cfgs/workflow/text2img.py

Adjust Word Attention

Note

You can emphasize specific words or phrases in the prompt during image generation:

Format: {text_to_emphasize:multiplier}, with a default multiplier of 1.1.

Example: a {cat} running {in the {city}:1.2}

In this case:

“cat” is emphasized by 1.1x
“in the” is emphasized by 1.2x
“city” is emphasized by 1.2 * 1.1 = 1.32x
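
The weighting rule above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not HCP-Diffusion's actual parser: the function name parse_emphasis and the handling of unbalanced braces are assumptions made for this example.

```python
import re

DEFAULT_MULTIPLIER = 1.1
# Matches an explicit ":multiplier" suffix at the end of a brace group.
_SUFFIX = re.compile(r":\s*([0-9]*\.?[0-9]+)\s*$")


def parse_emphasis(prompt):
    """Split a prompt into (text, weight) pairs following the {text:multiplier} rule.

    Hypothetical helper: nested groups multiply their weights together.
    """
    stack = [[]]  # one list of (text, weight) segments per open brace group
    buf = []      # characters of the plain-text run currently being read

    def flush():
        if buf:
            stack[-1].append(("".join(buf), 1.0))
            buf.clear()

    for ch in prompt:
        if ch == "{":
            flush()
            stack.append([])
        elif ch == "}" and len(stack) > 1:
            tail = "".join(buf)
            buf.clear()
            match = _SUFFIX.search(tail)
            if match:
                multiplier = float(match.group(1))
                tail = tail[:match.start()]
            else:
                multiplier = DEFAULT_MULTIPLIER
            group = stack.pop()
            if tail:
                group.append((tail, 1.0))
            # Scale every segment of the closed group, including nested ones.
            stack[-1].extend((text, weight * multiplier) for text, weight in group)
        else:
            buf.append(ch)  # ordinary character (or an unmatched "}")

    flush()
    while len(stack) > 1:  # unclosed groups are kept without extra weight
        group = stack.pop()
        stack[-1].extend(group)
    return stack[0]


for text, weight in parse_emphasis("a {cat} running {in the {city}:1.2}"):
    print(repr(text), round(weight, 2))
# 'a ' 1.0
# 'cat' 1.1
# ' running ' 1.0
# 'in the ' 1.2
# 'city' 1.32
```

The inner group is closed first and receives the default 1.1, and the outer group then scales everything it contains by 1.2, which is where 1.32 comes from.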

Basic Configuration Structure

The entry point for the workflow is the make_cfg function. The returned dictionary must contain a workflow key that defines the workflow.
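
As a rough illustration of that structure, the skeleton below returns a dictionary with a workflow key. It reuses only names that appear in the examples on this page (neko_cfg, Actions, PrepareAction). Treating a single PrepareAction as a valid workflow is an assumption made for brevity; a usable workflow would add model-building and generation actions after it.

```python
import torch
from rainbowneko.parser import neko_cfg
from rainbowneko.infer import Actions, PrepareAction


@neko_cfg
def make_cfg():
    # The returned dictionary carries the workflow under the 'workflow' key.
    return dict(
        workflow=Actions([
            PrepareAction(device='cuda', dtype=torch.float16),  # Set device and precision
            # Assumption: further actions (model loading, sampling, saving) follow here.
        ]),
    )
```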

Image Generation with Stable Diffusion

Simplified Configuration (Beginner-Friendly)

The file cfgs/workflow/easy/text2img.py provides a simplified configuration for image generation. You can generate images by setting only a few parameters, at the cost of less flexibility and fewer features.

Common configuration:

from hcpdiff.easy.cfg import SD15_t2i
from rainbowneko.parser import neko_cfg

@neko_cfg
def make_cfg():
    return SD15_t2i(
        pretrained_model='Lykon/DreamShaper',  # Path to the pretrained model
        prompt='masterpiece, best quality, 1girl, cat ears, outside',  # Positive prompt
        # negative_prompt='',  # Optional: Negative prompt
        bs=4,  # Batch size
        width=512,  # Image width
        height=512,  # Image height
        guidance_scale=7.0  # CFG guidance scale
    )

To change the sampler, you can set the noise_sampler parameter (default is dpmpp_2m_karras):

from hcpdiff.easy import Diffusers_SD
from hcpdiff.easy.cfg import SD15_t2i

SD15_t2i(
    pretrained_model='Lykon/DreamShaper',
    prompt='masterpiece, best quality, 1girl, cat ears, outside',
    bs=4,
    width=512,
    height=512,
    guidance_scale=7.0,
    noise_sampler=Diffusers_SD.euler_a,  # Replace the default sampler
)

Available samplers:

Sampler            Description
dpmpp_2m           Fewer steps
dpmpp_2m_karras    Fewer steps, high quality, commonly used
ddim               More steps
euler
euler_a            Common in anime-style, smoother output

Other configurable options:

from hcpdiff.easy.cfg import SD15_t2i
from rainbowneko.parser import neko_cfg

@neko_cfg
def make_cfg():
    return SD15_t2i(
        pretrained_model='Lykon/DreamShaper',
        prompt='masterpiece, best quality, 1girl, cat ears, outside',
        # negative_prompt='',
        bs=4,
        width=512,
        height=512,
        guidance_scale=7.0,

        seed=42,  # Set random seed
        N_steps=30,  # Number of sampling steps
        save_root='output_pipe/',  # Output directory
    )
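
The guidance_scale parameter in the configurations above sets the strength of classifier-free guidance (CFG). The sketch below shows the standard CFG combination of an unconditional and a prompt-conditioned noise prediction. It is a generic illustration using plain Python lists, not HCP-Diffusion's internal code, and apply_cfg is a hypothetical name.

```python
def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions.

    Standard classifier-free guidance: move from the unconditional
    prediction toward the conditional one, scaled by guidance_scale.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


uncond = [0.0, 0.5]  # toy prediction for the negative (or empty) prompt
cond = [1.0, 0.5]    # toy prediction for the positive prompt
guided = apply_cfg(uncond, cond, guidance_scale=7.0)
print(guided)  # [7.0, 0.5]
```

With a scale of 1.0 the result equals the conditional prediction. Larger values push the result further in the direction of the prompt, which generally increases prompt adherence.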

Full Configuration (For Advanced Users)

The file cfgs/workflow/text2img.py provides a more flexible and feature-rich configuration for image generation. A text-to-image workflow typically consists of several modules, described below. For more examples, refer to the other files in the cfgs/workflow/ directory.

Model Loading

import torch
from rainbowneko.parser import neko_cfg
from hcpdiff.easy import Diffusers_SD, SD15_auto_loader
from rainbowneko.infer import Actions, PrepareAction
from hcpdiff.workflow import BuildModelsAction

@neko_cfg
def build_model(pretrained_model='ckpts/any5', noise_sampler=Diffusers_SD.dpmpp_2m_karras) -> Actions:
    return Actions([
        PrepareAction(device='cuda', dtype=torch.float16),  # Set device and precision
        BuildModelsAction(  # Build and load pretrained model
            model_loader=SD15_auto_loader(_partial_=True,
                ckpt_path=pretrained_model,
                noise_sampler=noise_sampler  # Set sampler (preset)
            )
        ),
    ])

Note

The noise_sampler here uses a preset configuration. Available presets:

Sampler            Description
dpmpp_2m           Fewer steps
dpmpp_2m_karras    Fewer steps, high quality, commonly used
ddim               More steps
euler
euler_a            Common in anime-style, smoother output

For a custom sampler configuration, construct the sampler explicitly, for example:

from diffusers import DPMSolverMultistepScheduler
from hcpdiff.diffusion.sampler import DiffusersSampler

# Use a Diffusers sampler
noise_sampler=DiffusersSampler(
    DPMSolverMultistepScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule='scaled_linear',
        algorithm_type='sde-dpmsolver++',
        use_karras_sigmas=True,
    )
)
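
To make the beta_start, beta_end, and beta_schedule='scaled_linear' settings above more concrete, the sketch below computes such a schedule in plain Python. It mirrors how Diffusers defines scaled_linear (linear spacing in the square root of beta, then squaring). The helper name scaled_linear_betas is hypothetical, and the 1000 training timesteps are assumed to match the Diffusers scheduler default.

```python
def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Return betas spaced linearly in sqrt(beta), then squared."""
    start, end = beta_start ** 0.5, beta_end ** 0.5
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]


betas = scaled_linear_betas()
print(len(betas))           # 1000
print(round(betas[0], 5))   # 0.00085
print(round(betas[-1], 5))  # 0.012
```

The schedule starts at beta_start, ends at beta_end, and increases monotonically in between, so later timesteps add more noise than earlier ones.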

Supported Actions

TODO