Training¶
HCP-Diffusion supports configuring various components used in different training stages through Python configuration files.
These include model architectures, training parameters and strategies, dataset configurations, and more.
Basic Training Configuration¶
Basic training configuration files and examples can be found in the cfgs/train directory.
All training configuration files should inherit both train_base.py and tuning_base.py.
train_base.py: Defines the hyperparameters and dataset configurations required during training.tuning_base.py: Defines the model architecture and training parameters used during training, specifying which model parameters and plugins to train, and how to apply LoRA to various layers.
hcp_train_1gpu --cfg cfgs/train/py/config_file.py
You can override values in the configuration file via CLI:
hcp_train_1gpu --cfg cfgs/train/py/config_file.py model.wrapper.models.ckpt_path=pretrained_model_path data_train.dataset1.batch_size=8
For multi-GPU training, specify the GPU IDs and number of GPUs in cfgs/launcher/multi.yaml, then run:
hcp_train --cfg cfgs/train/py/config_file.py
You can override values in the configuration file via CLI:
hcp_train --cfg cfgs/train/py/config_file.py model.wrapper.models.ckpt_path=pretrained_model_path data_train.dataset1.batch_size=8
Model Configuration¶
Base Model Configuration¶
The base model is configured in the model section. The structure is defined under model.wrapper, where you can initialize a model, such as loading a pretrained SD1.5 model:
from hcpdiff.models import SD15Wrapper
from hcpdiff.easy import SD15_auto_loader
wrapper=SD15Wrapper.from_pretrained( # Model wrapper
_partial_=True,
models=SD15_auto_loader(ckpt_path='Lykon/DreamShaper', _partial_=True), # Simplified pretrained model loader
),
Note
You can also load the model using lower-level APIs. See Model Format Documentation for details.
Replacing VAE¶
You can specify a different VAE module in the wrapper:
from hcpdiff.models import SD15Wrapper
from hcpdiff.easy import SD15_auto_loader
from diffusers import AutoencoderKL
wrapper=SD15Wrapper.from_pretrained( # Model wrapper
_partial_=True,
models=SD15_auto_loader(
ckpt_path='Lykon/DreamShaper',
vae=AutoencoderKL.from_pretrained('vae/'),
_partial_=True
),
),
You can also use a single VAE checkpoint file:
from hcpdiff.models import SD15Wrapper
from hcpdiff.easy import SD15_auto_loader
from diffusers import AutoencoderKL
wrapper=SD15Wrapper.from_pretrained( # Model wrapper
_partial_=True,
models=SD15_auto_loader(
ckpt_path='Lykon/DreamShaper',
vae=AutoencoderKL.from_single_file('vae.ckpt'),
_partial_=True
),
),
Dataset Configuration¶
HCP-Diffusion supports multiple parallel datasets. For each training step, one batch is sampled from each dataset independently for forward and backward propagation. Their gradients are then summed.
Because each dataset is processed independently, image sizes and formats may differ. You can adjust the proportion and weight of each dataset using batch_size and loss_weight.
Example:
from hcpdiff.data import TextImagePairDataset
data_train=dict(
dataset1=TextImagePairDataset(_partial_=True, batch_size=4, loss_weight=1.0,
...
),
dataset2=TextImagePairDataset(_partial_=True, batch_size=1, loss_weight=1.0,
...
),
)
Tip
If the importance of dataset1 and dataset2 is in ratio a:b, then they should satisfy:
\(\frac{batch\_size_1 \times loss\_weight_1}{batch\_size_2 \times loss\_weight_2} = \frac{a}{b}\)
For a detailed explanation, see RainbowNeko Engine Dataset Configuration
Data Sources¶
Each dataset can define multiple data sources. All sources within a dataset will be bucketed, shuffled, and processed together—effectively merging them.
Example:
from hcpdiff.data import TextImagePairDataset, Text2ImageSource
dataset1=TextImagePairDataset(_partial_=True, batch_size=4, loss_weight=1.0,
source=dict(
data_source1=Text2ImageSource(
img_root='imgs1/',
label_file='${.img_root}', # Label file path (same as image root)
prompt_template='prompt_template/caption.txt', # Prompt template
repeat=1, # Repeat the dataset N times to adjust its weight
),
data_source2=Text2ImageSource(
img_root='imgs2/',
label_file='${.img_root}',
prompt_template='prompt_template/caption.txt',
),
),
...
)
See more at RainbowNeko Engine Data Source Configuration
Buckets¶
Buckets group images with similar characteristics into the same batch.
Supported bucket types include:
FixedBucket: Resize and crop all images to a fixed resolution.
from rainbowneko.data import FixedBucket
FixedBucket(
target_size=[512, 512] # Use a fixed resolution of 512x512
)
RatioBucket (ARB): Group images by aspect ratio. Each batch can have a different ratio.
From ratios:
from rainbowneko.data import RatioBucket RatioBucket.from_ratios( target_area=512*512, num_bucket=6, # Optional: step_size=8, # Step size for bucket ratio_max=4, # Maximum aspect ratio pre_build_bucket='path.pkl', # Save buckets for reuse )
From files:
from rainbowneko.data import RatioBucket RatioBucket.from_files( target_area=512*512, num_bucket=6, step_size=8, pre_build_bucket='path.pkl', )
SizeBucket: Group images by resolution. Each batch may have different sizes.
From files:
from rainbowneko.data import SizeBucket SizeBucket.from_files( num_bucket=6, step_size=8, pre_build_bucket='path.pkl', )
LongEdgeBucket: Resize images by the long edge, then group by resolution.
From files:
from rainbowneko.data import LongEdgeBucket LongEdgeBucket.from_files( target_edge=800, num_bucket=6, step_size=8, pre_build_bucket='path.pkl', )
Advanced Dataset Configurations¶
Adding a Regularization Dataset
Use a regularization dataset for DreamBooth, or to help the model retain its original generative ability when learning from self-generated images.
First, prepare a prompt dataset and generate images using the workflow:
hcp_run --cfg cfgs/workflow/text2img_dataset.py
Specify the model and prompt dataset in the configuration file.
Note
The prompt dataset format is the same as a regular dataset, just without images.
Using Prompt Templates
Prompt templates allow placeholders to be replaced by specified text during training.
For example:
a photo of a {pt1} on the {pt2}, {caption}
Here, {pt1} and {pt2} will be replaced by TemplateFillHandler with the specified words, which can be either pretrained tokens or custom embeddings.
Example:
from hcpdiff.data.handler import TemplateFillHandler
TemplateFillHandler(
word_names={
'pt1': 'my-cat',
'pt2': 'sofa',
},
)
Important
Recommended: Use the simplified Stable Diffusion handler:
from hcpdiff.data import StableDiffusionHandler
handler=StableDiffusionHandler(
bucket=RatioBucket,
word_names={
'pt1': 'my-cat',
'pt2': 'sofa',
},
erase=0,
),
During training, {pt1} will be replaced with the embedding for my-cat, {pt2} with sofa, and {caption} with the image caption (if available).
Fine-tuning Configuration¶
Fine-tuning can be applied to various components, commonly UNet and text encoder. You can assign different learning rates to different parts.
Example:
from rainbowneko.parser import CfgWDModelParser
model_part=CfgWDModelParser([
dict(
lr=1e-5,
layers=['denoiser'], # Train U-Net
# layers=['TE'], # Or train TextEncoder
)
],
weight_decay=1e-2 # Default weight decay
),
Note
The layer names should match the module paths in model.named_modules() (PyTorch format). Regex can be used, e.g., re:denoiser\..*\.attn1$ for all self-attention layers in UNet.
Model configuration follows the RainbowNeko Engine system. See RainbowNeko Engine Configuration
Training Multiple Parts with Different Learning Rates
Same learning rate for UNet and text encoder:
from rainbowneko.parser import CfgWDModelParser
model_part=CfgWDModelParser([
dict(
lr=1e-5,
layers=[
'denoiser',
'TE',
],
),
],
weight_decay=1e-2
),
Different learning rates:
from rainbowneko.parser import CfgWDModelParser
model_part=CfgWDModelParser([
dict(
lr=1e-5,
layers=['denoiser'],
),
dict(
lr=2e-6,
layers=['TE'],
)
],
weight_decay=1e-2
),
Prompt-tuning Configuration¶
Prompt-tuning trains word embeddings, with each embedding possibly covering multiple token positions.
First, create the custom word:
python -m hcpdiff.tools.create_embedding pretrained_model_path word_name token_length [--init_text initial_word]
# Random init: --init_text *[std, token_len]
# Partial init: --init_text cat, *[std, token_len], tail
Specify the word to train with emb_pt:
from hcpdiff.parser import CfgEmbPTParser
emb_pt=CfgEmbPTParser(
emb_dir='embs/',
cfg_pt={
'pt-paimeng': dict(lr=0.003, weight_decay=1e-2)
}
),
LoRA Training Configuration¶
LoRA can be applied to any Linear or Conv2d layer.
The configuration is similar to fine-tuning, but LoRA is added as a plugin:
from rainbowneko.parser import CfgWDPluginParser
from hcpdiff.models.lora_layers_patch import LoraLayer
model_plugin=CfgWDPluginParser(cfg_plugin=dict(
lora1=LoraLayer.wrap_model(
_partial_=True, # Required
lr=1e-4, # Learning rate for this plugin
rank=4, # LoRA dimension
alpha=2, # LoRA weight
layers=[
're:denoiser.*\.attn.?$', # Attention layers
're:denoiser.*\.ff$', # FeedForward layers
]
)
), weight_decay=0.1), # Default weight decay for all plugins
Advanced Configuration¶
Loss Weight Mask (Assigning different importance to image regions)

When training data is limited, the model may struggle to learn important features.
You can use a loss mask to guide the model to focus more (or less) on specific areas during training.
Loss masks and original images should be placed in separate folders with the same filenames.
The grayscale brightness of the mask determines the attention multiplier:
| Brightness | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| Multiplier | 0% | 50% | 100% | 300% | 500% |
CLIP Skip¶
Some models skip a few CLIP blocks during training.
Set the clip_skip value in the TE_hook_cfg parameter under model.wrapper.
Default is 0 (equivalent to clip skip=1 in webui), meaning no blocks are skipped.
Tip
To skip one block:
TE_hook_cfg=TEHookCFG(clip_skip=1)