Playing with Stable Diffusion on Amazon SageMaker: Dreambooth-Based Model Fine-Tuning

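This article uses the Stable Diffusion Quick Kit as an example to explain in detail how to fine-tune a Stable Diffusion model with Dreambooth. It covers the basics of Stable Diffusion fine-tuning, an introduction to Dreambooth fine-tuning, and a demo that uses the Quick Kit to show the results of fine-tuning.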

Amazon Web Services China website: https://www.amazonaws.cn

Amazon Web Services global website (Chinese): https://aws.amazon.com/cn/

Stable Diffusion Model Fine-Tuning

There are currently four main ways to fine-tune a Stable Diffusion model: Dreambooth, LoRA (Low-Rank Adaptation of Large Language Models), Textual Inversion, and Hypernetworks.

They differ roughly as follows:

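Textual Inversion (also called an embedding) does not actually modify the original diffusion model. Instead, training finds feature parameters (an embedding) that match the subject you want, and saves them as a small file. This means that if the original model was never trained on that kind of content, an embedding can hardly make it "learn" it: Textual Inversion cannot teach the diffusion model to render image content it has never seen.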

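Dreambooth adjusts the weights of all layers of the network, training the input images into the Stable Diffusion model itself. In essence, it copies the source model, fine-tunes it, and produces a new standalone model that can be used for anything the original could. The drawback is that training needs a lot of GPU memory (VRAM); with current optimizations it can be done within 16 GB.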

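LoRA also uses only a few images, but instead of updating the original weights it trains small additional weights for specific layers of the network (typically the attention layers), injected alongside the original layers. Because the original model parameters are left untouched, there is no need to copy the whole model, and the number of added parameters is kept small, which makes LoRA a very lightweight tuning method. LoRA models are small and fast to train. Inference needs both the LoRA model and the base model, because the LoRA weights modify specific layers of the base model, so the results depend on which base model is used.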

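Hypernetworks work on a principle similar to LoRA and currently have no official documentation. Unlike LoRA, a Hypernetwork is a separate neural network whose output is intermediate layers that can be inserted into the original diffusion model. Training therefore yields a new neural network that inserts suitable intermediate layers and parameters into the original diffusion model, linking the output images to the input prompts.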
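To make LoRA's "lightweight" claim concrete, here is a small sketch of the parameter arithmetic, assuming the standard formulation from the LoRA paper (Hu et al., 2021): a frozen weight matrix W of shape d×k receives a trainable update B·A, where B is d×r, A is r×k, and the rank r is much smaller than d and k. The 768×768 layer size and rank 4 below are illustrative values, not settings taken from this article, and the function names are hypothetical.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when a d x k weight matrix is fine-tuned directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of a LoRA update B @ A, with B: d x r and A: r x k."""
    return d * r + r * k


def lora_forward(x, W, A, B, scale=1.0):
    """y = W x + scale * B (A x), using plain lists; W itself is never modified."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)             # output of the frozen original layer
    update = matvec(B, matvec(A, x))  # low-rank correction learned by LoRA
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    d = k = 768  # illustrative layer size
    r = 4        # illustrative LoRA rank
    full = full_finetune_params(d, k)
    lora = lora_params(d, k, r)
    print(full, lora, f"{lora / full:.2%}")
```

For this layer the LoRA update trains 6,144 parameters instead of 589,824 (about 1%), which is why a LoRA file is small and why the base model must still be loaded at inference time.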


Note: figure sourced from online materials.

What is Dreambooth


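DreamBooth was introduced by fine-tuning Google's Imagen model so that real objects could be faithfully reproduced in generated images. Applied to Stable Diffusion in the same way, fine-tuning on a small number of images of a real object lets the original SD model memorize that subject faithfully and recognize its key features, and even its style, whenever the subject is referenced in the prompt. It is a new approach to "personalized" text-to-image generation, adapting the model to a user's specific image generation needs.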

Challenges of Dreambooth Fine-Tuning

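Dreambooth fine-tuning works as follows. With a small number of input images (the instance images), an instance_prompt defines the subject (e.g., a toy cat, or Rommel). A class_prompt defines the broader class, scene, or theme (e.g., a cartoon or oil-painting style); images generated from it serve as class images for prior preservation, and the two prompts are bound together during training. Afterwards, whenever a prompt contains the instance prompt's keyword token, the generated image keeps the subject from the instance images while preserving the theme and style defined by the class.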
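The instance/class pairing described above is what the DreamBooth paper calls prior preservation. As a minimal sketch, with plain Python lists standing in for tensors (in real training these would be the UNet's noise predictions and the sampled noise), the objective adds a class-image term weighted by prior_loss_weight, the same hyperparameter that appears in the notebook later in this article. The function names are illustrative, not taken from the extension's code.

```python
def mse(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_loss_weight=1.0):
    """Instance reconstruction loss plus weighted prior-preservation loss.

    instance_*: prediction/target for the subject images (e.g. "photo of zwx toy")
    class_*:    prediction/target for generated class images (e.g. "photo of a cat toy")
    """
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss
```

A smaller prior_loss_weight lets the model fit the subject more aggressively; a larger one keeps generic class outputs closer to what the base model produced before fine-tuning.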

The industry currently fine-tunes DreamBooth in two main ways:

First, selecting the model, uploading training images, and training locally in the Stable Diffusion WebUI visual interface;

Second, training through interactive script development on a third-party IDE platform such as a Colab notebook.

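The first approach can only train on the single server or host where the Stable Diffusion WebUI is deployed, and cannot be integrated with enterprise or customer back-end platforms and business systems. The second approach focuses on individual algorithm engineers exploring models during development and testing, and cannot support production-grade, engineered deployment. In addition, training Dreambooth either way raises further practical difficulties: the cost of high-performance compute (especially in scenarios with high quality requirements, which need 50 or more class images and easily run out of GPU memory), storing and managing base and fine-tuned models, managing training hyperparameters, unified monitoring, training acceleration, and compiling and packaging dependency libraries.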

Dreambooth Fine-Tuning with SageMaker Training Jobs

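Amazon SageMaker is an all-in-one integrated development platform for machine learning that provides a wide range of features to help users easily build, train, and deploy ML models. For training jobs, SageMaker can provision GPU-optimized compute of various types such as V100, A100, and T4. Through BYOC (Bring Your Own Container) and BYOS (Bring Your Own Script), users can run their own training scripts or custom container images, control the training process flexibly, and use their own data preprocessing and model evaluation methods. Advanced features such as automatic hyperparameter tuning and distributed training are also available. Together, these let users fine-tune Dreambooth models on SageMaker with the frameworks and libraries of their choice, free of the limitations of the WebUI and local notebook environments, integrated with production business systems, and engineered for production.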

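The following sections describe in detail how to run Dreambooth fine-tuning on Amazon SageMaker with a BYOC-mode training job, along with the optimizations we applied to GPU memory usage, model management, and hyperparameters during Dreambooth training. This lets users put Dreambooth into production on their own ML platforms or business systems while reducing the overall engineering effort of training.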

Technical Solution for Dreambooth Fine-Tuning on SageMaker

We explain the technical implementation of Dreambooth fine-tuning on SageMaker from four angles: model pulling, training image input, model output, and training mode.

Model pulling

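Amazon has a strategic partnership with Hugging Face, so in a SageMaker training job we can load the model through a diffusers pipeline API, passing a pretrained_model_name_or_path hyperparameter that holds either a model ID in standard Hugging Face format (for example, one of runwayml's Stable Diffusion models) or a local path (for example, /opt/ml/model/stable-diffusion-v1.5/). Public models are then pulled from Hugging Face automatically, with no account registration or token authentication required. A code example: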

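```python
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

# Local directory holding a model in diffusers format
model_dir = '/opt/ml/input/fineturned_model/'

model = StableDiffusionPipeline.from_pretrained(
    model_dir,
    # Load the scheduler configuration from the model's "scheduler" subfolder
    scheduler=DPMSolverMultistepScheduler.from_pretrained(model_dir, subfolder="scheduler"),
    # (the remaining arguments are garbled in the source and are not recoverable)
)
```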

Training image input

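For the input images used in fine-tuning, SageMaker training jobs provide a convenient way to supply training data: the inputs parameter takes a dictionary whose key is the channel name (e.g., images) and whose value is the S3 path of the input images. When the training job runs, SageMaker downloads the images from that S3 path into the container directory /opt/ml/input/data/<channel name>/ (here, /opt/ml/input/data/images/). A code example: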

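```python
from sagemaker.estimator import Estimator

# bucket, role, instance_type, image_uri and environment are defined earlier in the notebook
images_s3uri = 's3://{0}/dreambooth/images/'.format(bucket)
inputs = {
    'images': images_s3uri  # channel name -> S3 prefix holding the training images
}
estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type=instance_type,
    image_uri=image_uri,
    environment=environment,
)
estimator.fit(inputs)
```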
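Inside the container, the training script then has to find those images. A minimal sketch, assuming the script runs under the SageMaker training toolkit (sagemaker-training), which exposes each input channel's local directory through an SM_CHANNEL_<NAME> environment variable; the helper name is hypothetical, and the fallback is SageMaker's conventional /opt/ml/input/data/<channel> location.

```python
import os
from pathlib import Path


def channel_dir(channel: str, base: str = "/opt/ml/input/data") -> Path:
    """Return the local directory where SageMaker placed an input channel.

    The training toolkit sets SM_CHANNEL_<NAME> (channel name upper-cased);
    if that variable is absent, fall back to /opt/ml/input/data/<channel>.
    """
    env_value = os.environ.get(f"SM_CHANNEL_{channel.upper()}")
    return Path(env_value) if env_value else Path(base) / channel
```

With the inputs dictionary above, channel_dir("images") resolves to the directory holding the instance images, which can then be passed to the training code as instance_data_dir.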

Model output

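After training, SageMaker by default packages the model files as model.tar.gz and uploads it to S3 under a subdirectory named after the training job. The customer's production system can get this location directly through the API, which simplifies model management and subsequent inference deployment.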
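As a sketch of how a downstream system might fetch that location: the boto3 SageMaker client's describe_training_job call returns it under ModelArtifacts.S3ModelArtifacts (the SageMaker Python SDK also exposes it as estimator.model_data after fit()). The helper names below are hypothetical, and the client is passed in so the functions can be exercised without AWS credentials.

```python
def model_artifact_uri(sagemaker_client, job_name: str) -> str:
    """Return the S3 URI of model.tar.gz for a completed training job."""
    desc = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    status = desc["TrainingJobStatus"]
    if status != "Completed":
        raise RuntimeError(f"training job {job_name} is {status}, not Completed")
    return desc["ModelArtifacts"]["S3ModelArtifacts"]


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```

In production the first argument would be boto3.client("sagemaker"); the bucket and key from split_s3_uri can then be handed to an S3 download or to a model registration step.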

Training mode

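Amazon SageMaker supports two modes for model training, BYOS and BYOC. Dreambooth training involves installing many dependencies such as diffusers, the Hugging Face libraries, accelerate, and xformers, and open-source libraries like xformers and accelerate behave differently across GPU instance types, operators, and CUDA/cuDNN versions. We therefore use BYOC: starting from an official prebuilt base image with PyTorch, CUDA, torchvision, and so on, we compile and install xformers and the other required libraries from source, extending it into the customer's own production Dreambooth training container image.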

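Note: compiling and installing xformers for Amazon G4dn and G5 instances requires CUDA 11.7 and torch 1.13 or later. The TORCH_CUDA_ARCH_LIST compute-capability setting must also include the architectures of the target GPUs (7.5 for the T4 in G4dn, 8.6 for the A10G in G5; the Dockerfile below uses "7.5 8.0 8.6"), otherwise the build fails with an error that this GPU's compute capability is not supported.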
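Because TORCH_CUDA_ARCH_LIST decides which GPU architectures the xformers kernels are compiled for, it should cover every GPU type the image will run on. A small illustrative helper that builds the value from SageMaker instance types; the mapping covers only a few common GPU families (p3 = V100, g4dn = T4, p4d = A100, g5 = A10G) and is an assumption to extend as needed.

```python
# NVIDIA compute capability of the GPU in some SageMaker instance families (illustrative subset).
COMPUTE_CAPABILITY = {
    "p3": "7.0",    # V100
    "g4dn": "7.5",  # T4
    "p4d": "8.0",   # A100
    "g5": "8.6",    # A10G
}


def instance_family(instance_type: str) -> str:
    """'ml.g4dn.xlarge' -> 'g4dn' (the 'ml.' prefix is optional)."""
    parts = instance_type.split(".")
    return parts[1] if parts[0] == "ml" else parts[0]


def torch_cuda_arch_list(instance_types) -> str:
    """Space-separated, de-duplicated, ascending TORCH_CUDA_ARCH_LIST value."""
    caps = {COMPUTE_CAPABILITY[instance_family(t)] for t in instance_types}
    return " ".join(sorted(caps, key=float))
```

For g4dn, g5, and p4d instances this yields "7.5 8.0 8.6", the same list the Dockerfile exports; unknown families raise a KeyError rather than silently producing an incomplete list.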

A reference Dockerfile for compiling and packaging:

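```dockerfile
FROM pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime
# Note: the text above calls for CUDA 11.7, and compiling xformers from source needs nvcc,
# which -runtime images do not include; a -devel base image may be required.

ENV PATH="/opt/ml/code:${PATH}"
ENV DEBIAN_FRONTEND noninteractive

# System packages
RUN apt-get update
RUN apt-get install vim -y
RUN apt install wget git -y
RUN apt install libgl1-mesa-glx -y
RUN pip install opencv-python-headless

# SageMaker training toolkit, which runs SAGEMAKER_PROGRAM from /opt/ml/code
RUN mkdir -p /opt/ml/code
RUN pip3 install sagemaker-training
# Copy the training code (the build script below prepares ./sd_code)
COPY ./sd_code/ /opt/ml/code/
RUN pip install -r /opt/ml/code/extensions/sd_dreambooth_extension/requirements.txt
ENV SAGEMAKER_PROGRAM train.py

# Build xformers from source for T4 (7.5), A100 (8.0) and A10G (8.6) GPUs
# (part of this step is garbled in the source; any extra build flags are not recoverable)
RUN export TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6" && \
    pip install triton==2.0.0.dev20221120 && \
    git clone https://github.com/xieyongliang/xformers.git /opt/ml/code/repositories/xformers && \
    cd /opt/ml/code/repositories/xformers && \
    pip install -e .

ENTRYPOINT []
```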

A reference script for pushing the built image to an Amazon ECR repository:

```bash
algorithm_name=dreambooth-finetuning-v3

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Log into Docker
pwd=$(aws ecr get-login-password --region ${region})
docker login --username AWS -p ${pwd} ${account}.dkr.ecr.${region}.amazonaws.com

# Fetch the cleaned-up Dreambooth extension into the build context
# (reconstructed from a garbled line in the source)
mkdir -p ./sd_code/extensions/
cd ./sd_code/extensions/ && git clone https://github.com/qingyuan18/sd_dreambooth_extension.git
cd ../../

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build -t ${algorithm_name} ./ -f ./dockerfile_v3
docker tag ${algorithm_name} ${fullname}
docker push ${fullname}

rm -rf ./sd_code
```
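The push script builds the repository URI with the amazonaws.com suffix. If you push from one of the AWS China Regions (cn-north-1 or cn-northwest-1), the ECR registry domain ends in amazonaws.com.cn instead. A small hypothetical Python helper expressing the same URI logic, for example when computing image_uri in the notebook:

```python
def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI; AWS China Regions use the amazonaws.com.cn domain."""
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account}.dkr.ecr.{region}.{domain}/{repository}:{tag}"
```

The result can be passed as image_uri to the Estimator so the notebook and the push script agree on the image name.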

Optimizing Dreambooth Fine-Tuning on SageMaker

Decoupling from the WebUI extension

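DreamBooth originated in a Google research paper, and an implementation was open-sourced as a Hugging Face Colab notebook example (see the related resources on GitHub). Many forks have since extended and updated that version. The most complete one today is the open-source script packaged as a Stable Diffusion WebUI extension: it exposes more hyperparameters and optimizations for controlling training, can integrate LoRA weights, and supports the checkpoint format the WebUI needs. See the sd_dreambooth_extension code on GitHub.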

The sd_dreambooth_extension code on GitHub:

https://github.com/d8ahazard/sd_dreambooth_extension

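As mentioned above, the SD WebUI cannot be integrated with back-end business systems, so we need to decouple the code from the WebUI extension and package it as a standalone training program, with standard inputs (base model, input images, instance prompt, class prompt, and so on) and the fine-tuned model as its output.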

Decoupling from the WebUI extension mainly requires handling the following:

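The extension code couples operations bound to various WebUI front-end components with its data interactions. For example, shared in the original source holds the training parameters entered on the web page: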

```python
if shared.force_cpu:
    import modules.shared
    no_safe = modules.shared.cmd_opts.disable_safe_unpickle
    modules.shared.cmd_opts.disable_safe_unpickle = True
```

The mytqdm class, which shows progress-bar status information on the web page:

```python
from helpers.mytqdm import mytqdm
```

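Code like this is unnecessary in an engineered back-end training job. We consolidated the parameters passed from the front-end page into hyperparameters, which the main entry point parses with Python's argparse library (parse_args), and removed the code related to page display.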

For the cleaned-up extension code, see https://github.com/qingyuan18/sd_dreambooth_extension.git. Only the core training module is kept; webui.py, helpers, shared, and other code coupled to the front end have been removed.

Passing training job parameters

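SageMaker Training Jobs support passing and parsing model hyperparameters. Through the API, the parameters from the original extension code mentioned above, such as model_path, model_name, instance_prompt, and class_prompt, are packaged as key-value pairs and passed to the Training Job through the estimator API. On the SageMaker training instance, they are then parsed and processed with argparse's parse_args. See the following code example: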

```python
from sagemaker.estimator import Estimator

hyperparameters = {
    'models_path': '/opt/ml/model/',
    'manul_upload_model_path': s3_model_output_location,  # S3 path for the manual model upload
    'instance_prompt': instance_prompt,
    # … (remaining hyperparameters elided in the source)
}
estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type=instance_type,
    image_uri=image_uri,
    hyperparameters=hyperparameters,
)
```
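On the container side, the SageMaker training toolkit passes each hyperparameter to the entry point (train.py, per SAGEMAKER_PROGRAM in the Dockerfile) as a --name value command-line argument. A minimal argparse sketch, assuming a small subset of the hyperparameter names used in this article; str2bool is a hypothetical helper, needed because booleans arrive as strings such as "True".

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret 'True'/'true'/'1'/'yes' as True and 'False'/'false'/'0'/'no' as False."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Dreambooth training entry point (sketch)")
    parser.add_argument("--models_path", type=str, default="/opt/ml/model/")
    parser.add_argument("--manul_upload_model_path", type=str, default=None)
    parser.add_argument("--instance_prompt", type=str, required=True)
    parser.add_argument("--class_prompt", type=str, default=None)
    parser.add_argument("--train_text_encoder", type=str2bool, default=False)
    parser.add_argument("--learning_rate", type=float, default=2e-6)
    # parse_known_args ignores any extra hyperparameters this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Using parse_known_args rather than parse_args keeps the script tolerant of hyperparameters added later in the notebook; a stricter script would declare every name and fail fast on typos.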

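WebUI input models

Models used in the WebUI are in ckpt format (the latest WebUI uses the safetensors format), whereas the model pipeline that diffusers loads with from_pretrained during training takes a Hugging Face model ID or a local path. By default this is a directory containing subdirectories for submodels such as vae, unet, and tokenizer, each holding that submodel's own files.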

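If the customer's production environment uses a single ckpt model file (for example, a model downloaded from Civitai), we can convert it from ckpt to the diffusers directory format with the conversion script provided by diffusers, so that the same code can load it in production. Script usage example: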

```bash
python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path ./models_ckpt/768-v-ema.ckpt --dump_path ./models_diffuser
```

The --dump_path output above is the diffusers-format directory; inside it you will find submodel subdirectories such as vae, unet, and text_encoder.
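Before uploading a converted model, a quick check that the output directory really has the diffusers layout can save a failed training job. A minimal sketch; the helper name is hypothetical, and the expected entries assume a standard Stable Diffusion pipeline (some conversions also produce folders such as safety_checker or feature_extractor, which this check simply ignores).

```python
from pathlib import Path

# Entries from_pretrained expects in a standard Stable Diffusion pipeline directory.
EXPECTED_ENTRIES = ("model_index.json", "unet", "vae", "text_encoder", "tokenizer", "scheduler")


def missing_diffusers_entries(model_dir) -> list:
    """Return the expected files/subfolders that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

For example, missing_diffusers_entries("./models_diffuser") returning an empty list means the directory can be passed directly as pretrained_model_name_or_path.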

Output model management

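On a SageMaker training instance, everything lives under /opt/ml/: input channels are downloaded to /opt/ml/input/data/<channel>/, hyperparameters are written to /opt/ml/input/config/hyperparameters.json, and the model is written to /opt/ml/model/.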

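By default, the trained model is written to /opt/ml/model/. When the SageMaker Training Job finishes, SageMaker packages the model files in this directory into a tar.gz file and uploads it to the training job's S3 path. A composite model like Stable Diffusion has multiple subdirectories, each with its own model files in bin format, and each model file can be 4 to 5 GB or larger, so SageMaker's automatic packaging and upload to S3 takes too long.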

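We therefore added a manul_upload_model_path parameter that specifies the S3 path for manually uploading the trained model files. After training, the whole model directory is uploaded recursively to that S3 path with the S3 SDK, so SageMaker no longer has to package model.tar.gz.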

Reference code example:

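```python
import os

import boto3

s3_client = boto3.client('s3')


def get_bucket_and_key(s3_path):
    """Split 's3://bucket/prefix' into ('bucket', 'prefix') (minimal version; the source's helper is not shown)."""
    bucket, _, key = s3_path.replace("s3://", "", 1).partition("/")
    return bucket, key


def upload_directory_to_s3(local_directory, dest_s3_path):
    bucket, s3_prefix = get_bucket_and_key(dest_s3_path)
    # os.walk already descends into every subdirectory, so no extra recursion is needed
    # (also recursing over `dirs` would upload nested files more than once)
    for root, dirs, files in os.walk(local_directory):
        for name in files:
            local_path = os.path.join(root, name)
            relative_path = os.path.relpath(local_path, local_directory)
            s3_path = os.path.join(s3_prefix, relative_path).replace("\\", "/")
            s3_client.upload_file(local_path, bucket, s3_path)
            print(f'Uploaded {local_path} to s3://{bucket}/{s3_path}')


# Save the fine-tuned pipeline (argument garbled in the source; presumably the local model directory)
s_pipeline.save_pretrained(args.models_path)

##### upload db model dirs to s3 path #####
##### to eliminate sagemaker tar process #####
upload_directory_to_s3(args.models_path, args.manul_upload_model_path)
```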

With this optimization, 800 steps of Dreambooth training on SageMaker went from 1 hour to about 30 minutes.

GPU memory optimization

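Cost is an important consideration when fine-tuning large models such as Dreambooth. Amazon offers a wide range of GPU instance types, among which G4dn offers the best price-performance and is available in almost all Amazon Regions.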

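However, a g4dn instance has only a single NVIDIA T4 GPU with 16 GB of memory. Dreambooth retrains the UNet (and optionally the text encoder) with a prior-preservation loss, and higher-fidelity Dreambooth fine-tuning can involve dozens of input images, so a 1000-step training run easily runs out of GPU memory.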

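To make sure customers can train Dreambooth models on cost-effective 16 GB instances, we made the following optimizations. As a result, Dreambooth fine-tuning on SageMaker needs only a g4dn.xlarge instance to complete training runs from a few hundred up to 3000 steps, greatly reducing customers' Dreambooth training costs.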
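A rough back-of-the-envelope estimate helps explain why 16 GB is tight. The sketch below counts only per-parameter training state (fp32 weights and gradients plus AdamW's two moment buffers) and ignores activations, the VAE, the text encoder, and CUDA overhead. The figure of about 860 million UNet parameters is the commonly cited size for Stable Diffusion v1.x, and the 2 bytes per parameter for 8-bit Adam assumes two 1-byte optimizer states; both are assumptions of this sketch.

```python
GIB = 1024 ** 3

# Bytes of optimizer state per parameter (assumption: two moment buffers per parameter).
OPTIMIZER_STATE_BYTES = {
    "adamw": 8,       # two fp32 moments
    "adamw_8bit": 2,  # two 8-bit moments
}


def training_state_bytes(num_params: int, optimizer: str = "adamw") -> int:
    """fp32 weights (4 B) + fp32 gradients (4 B) + optimizer state, per parameter."""
    return num_params * (4 + 4 + OPTIMIZER_STATE_BYTES[optimizer])


if __name__ == "__main__":
    unet_params = 860_000_000  # approximate SD v1.x UNet size
    for opt in ("adamw", "adamw_8bit"):
        print(opt, round(training_state_bytes(unet_params, opt) / GIB, 1), "GiB")
```

This prints roughly 12.8 GiB for plain AdamW and 8.0 GiB for 8-bit Adam before any activations are stored, which is why switching the optimizer, skipping text-encoder training, and reducing activation memory all matter on a 16 GB T4.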

Adjusting which components are fine-tuned

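In the Stable Diffusion model, text_encoder is the text encoder of the CLIP submodel. When the instance prompt and class prompt are not long texts, Dreambooth does not need to retrain the text encoder, so we adjusted the rules accordingly: if GPU memory is less than 16 GB, retraining of text_encoder is turned off. If memory is lower still, the 8-bit Adam optimizer and fp16 half-precision training are enabled automatically. If memory is even smaller, training is offloaded entirely to the CPU.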

Code example:

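```python
# gb: total GPU memory in GB, measured earlier in the script
print(f"Total VRAM: {gb}")
if 24 > gb >= 16:
    attention = "xformers"
    not_cache_latents = False
    train_text_encoder = True
    use_ema = True
if 16 > gb >= 10:
    train_text_encoder = False
    # (the rest of this branch is garbled in the source)
if gb < 10:
    use_cpu = True
    use_8bit_adam = False
    mixed_precision = 'no'
```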
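The thresholds can be folded into a single function that returns all memory-related settings at once. This is a hypothetical consolidation, not the extension's actual code: the 16 and 10 GB cut-offs and the below-10 GB CPU fallback come from the snippet, GPUs with 16 GB or more get the snippet's 16 to 24 GB tier (no 24 GB+ tier is shown), and the 8-bit Adam and fp16 choices for the 10 to 16 GB tier follow the prose description because that part of the snippet is garbled. The setting names mirror the hyperparameters used in this article.

```python
def choose_memory_settings(vram_gb: float) -> dict:
    """Pick Dreambooth memory-related settings from total GPU memory (GB)."""
    settings = {
        "attention": "xformers",
        "not_cache_latents": False,
        "train_text_encoder": True,
        "use_ema": True,
        "use_8bit_adam": False,
        "mixed_precision": "fp16",
        "use_cpu": False,
    }
    if vram_gb < 16:
        # Skip text-encoder retraining and shrink optimizer state.
        settings.update(train_text_encoder=False, use_8bit_adam=True)
    if vram_gb < 10:
        # CPU fallback: 8-bit Adam and fp16 target CUDA GPUs, so turn them off.
        settings.update(use_cpu=True, use_8bit_adam=False, mixed_precision="no")
    return settings
```

Note that a 16 GB T4 typically reports slightly less than 16 GB of total memory, so a g4dn instance usually lands in the second tier, which matches 'train_text_encoder': False in the demo notebook below.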

Using xformers

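xformers is an open-source library of optimized Transformer building blocks. Its memory-efficient attention kernels compute attention without materializing the full attention matrix, which greatly reduces GPU memory usage, typically without slowing training down.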

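During Dreambooth training, we switched the attention setting from the default flash to xformers. Comparing GPU memory before and after enabling xformers shows that this clearly reduces memory usage.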

Before enabling xformers:

```text
***** Running training *****
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 1000
...: True, TextTr: False EM: True, LR: 2e-06 LORA:False
Allocated: 10.5GB Reserved: 11.7GB
```

After enabling xformers:

```text
***** Running training *****
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 1000
...: True, TextTr: False EM: True, LR: 2e-06 LORA:False
Allocated: 5.5GB Reserved: 5.6GB
```
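To quantify the difference, the Allocated and Reserved figures can be pulled out of the two logs and compared. A small illustrative parser; the regular expression assumes exactly the "Allocated: 10.5GB Reserved: 11.7GB" format shown in the logs, and the function names are hypothetical.

```python
import re

_MEM_RE = re.compile(r"Allocated:\s*([\d.]+)GB\s+Reserved:\s*([\d.]+)GB")


def parse_vram(log_text: str) -> tuple:
    """Return (allocated_gb, reserved_gb) from a training log."""
    match = _MEM_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Allocated: ...GB Reserved: ...GB' pair found")
    return float(match.group(1)), float(match.group(2))


def reduction(before: float, after: float) -> float:
    """Fractional reduction from before to after."""
    return (before - after) / before


before = parse_vram("LORA:False Allocated: 10.5GB Reserved: 11.7GB")
after = parse_vram("LORA:False Allocated: 5.5GB Reserved: 5.6GB")
print(f"allocated -{reduction(before[0], after[0]):.0%}, reserved -{reduction(before[1], after[1]):.0%}")
```

For these two runs it prints a reduction of about 48% in allocated memory and 52% in reserved memory, roughly halving the footprint.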

Other optimization parameters

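'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:32'. If CUDA OOM errors are caused by GPU memory fragmentation, set max_split_size_mb in PYTORCH_CUDA_ALLOC_CONF to a smaller value.

'train_batch_size': 1. The number of images processed per batch. When there are few instance or class images (fewer than 10), this can be set to 1; processing fewer images per batch reduces GPU memory usage to some extent.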

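'sample_batch_size': 1. The counterpart of train_batch_size for sampling: the batch size used when generating images (adding and removing noise), such as the class images. Lowering it also reduces GPU memory usage.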
'not_cache_latents'. Stable Diffusion training is based on Latent Diffusion Models, and by default the latents are cached. Since we mainly train the regularization under the instance and class prompts, when GPU memory is tight we can choose not to cache latents, minimizing GPU memory usage.

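'gradient_accumulation_steps'. The number of batches whose gradients are accumulated before a single optimizer update. For long runs (e.g., 1000 steps) it can be increased to emulate a larger batch: the effective batch size is train_batch_size × gradient_accumulation_steps, but no extra images are held in GPU memory at once, so this setting mainly changes how many forward and backward passes each update takes rather than peak memory. Note that if you retrain the text encoder (text_encoder) on a multi-GPU machine with accelerate's multi-GPU distributed training enabled, gradient accumulation is not supported and gradient_accumulation_steps must be 1; otherwise text encoder retraining will be disabled.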
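The batch-related settings combine into an effective batch size, and the text-encoder restriction can be checked before a job is launched instead of being discovered in the logs. A hypothetical pre-flight helper; the names are illustrative and the rule mirrors the note on gradient_accumulation_steps.

```python
def effective_batch_size(train_batch_size: int, gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Number of images contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_gpus


def check_accumulation(train_text_encoder: bool, gradient_accumulation_steps: int,
                       num_gpus: int) -> None:
    """Reject gradient accumulation when retraining the text encoder across multiple GPUs."""
    if train_text_encoder and num_gpus > 1 and gradient_accumulation_steps > 1:
        raise ValueError(
            "gradient_accumulation_steps must be 1 when train_text_encoder=True "
            "with multi-GPU (accelerate) training"
        )
```

Calling check_accumulation in the notebook before estimator.fit turns a silently disabled text encoder into an immediate, explicit error.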

Stable Diffusion Quick Kit Dreambooth Fine-Tuning Demo

In the demo we used four images of a toy cat, cropped to a uniform 512×512 with an image tool.
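The article does not say which tool was used for cropping. As one possible approach, here is a center-crop sketch assuming Pillow is installed; the function names are hypothetical, and a center crop can cut off a subject that is not centered, so check the results by eye.

```python
def center_square_box(width: int, height: int) -> tuple:
    """(left, upper, right, lower) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_to_square(in_path: str, out_path: str, size: int = 512) -> None:
    """Center-crop an image to a square and resize it to size x size pixels."""
    from PIL import Image  # Pillow, imported here so the box math above has no dependency

    with Image.open(in_path) as img:
        rgb = img.convert("RGB")
        square = rgb.crop(center_square_box(*rgb.size))
        square.resize((size, size), Image.LANCZOS).save(out_path)
```

Cropping to a square before resizing avoids stretching the subject, which would otherwise be learned as part of its appearance.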

Next, open a SageMaker notebook instance created in advance, clone the Quick Kit repository (git clone https://github.com/aws-samples/sagemaker-stablediffusion-quick-kit), open fine-tuning/dreambooth/stablediffusion_dreambooth_finetuning.zh.ipynb, and follow it step by step.

```python
from sagemaker.estimator import Estimator

# "zwx" is the trigger word: after training, include it in prompts to generate the subject.
# Spaces are backslash-escaped as in the original notebook.
instance_prompt = r"photo\ of\ zwx\ toy"
class_prompt = r"photo\ of\ a\ cat\ toy"

# Training settings, passed as environment variables and hyperparameters
environment = {
    # Reconstructed from the allocator setting described above; this part of the source is garbled
    'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:32',
    'LD_LIBRARY_PATH': "${LD_LIBRARY_PATH}:/opt/conda/lib/",
}
hyperparameters = {
    'model_name': 'aws-trained-dreambooth-model',
    'mixed_precision': 'fp16',
    'pretrained_model_name_or_path': model_name,
    'instance_data_dir': instance_dir,
    'class_data_dir': class_dir,
    'with_prior_preservation': True,
    'models_path': '/opt/ml/model/',
    'instance_prompt': instance_prompt,
    'class_prompt': class_prompt,
    'sample_batch_size': 1,
    'gradient_accumulation_steps': 1,
    'learning_rate': 2e-06,
    'lr_scheduler': 'constant',
    'lr_warmup_steps': 0,
    'num_class_images': 50,
    # 'max_train_steps': ...  (value garbled in the source)
    'attention': 'xformers',
    'prior_loss_weight': 0.5,
    'use_ema': True,
    'train_text_encoder': False,
    'not_cache_latents': True,
    'gradient_checkpointing': True,
    'save_latents': True,
    # (further entries are garbled in the source)
}
# Helper defined elsewhere in the notebook (not shown in the source)
json_encode_hyperparameters(hyperparameters)

# Launch the SageMaker training job
inputs = {
    'images': f"s3://{bucket}/dreambooth/images/"
}
estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type=instance_type,  # e.g. 'ml.g4dn.xlarge', per the text above
    image_uri=image_uri,
    hyperparameters=hyperparameters,
    environment=environment,
)
estimator.fit(inputs)
```
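The notebook calls json_encode_hyperparameters without showing it. A common pattern in SageMaker example notebooks is to JSON-encode every value so that booleans, numbers, and strings keep their types when passed to the container as strings. The version below is an assumed reconstruction of that pattern, not the Quick Kit's code, together with a matching decoder for the container side.

```python
import json


def json_encode_hyperparameters(hyperparameters: dict) -> dict:
    """JSON-encode each value so SageMaker passes unambiguous strings to the container."""
    return {str(key): json.dumps(value) for key, value in hyperparameters.items()}


def json_decode_hyperparameter(raw: str):
    """Container-side inverse: decode a JSON value, or keep the raw string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw
```

Because this version returns a new dictionary instead of modifying its argument, its result would need to be assigned (hyperparameters = json_encode_hyperparameters(hyperparameters)) before it is passed to the Estimator.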

[Figure: training job launch log]

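Training takes about 40 minutes. You can also view the CloudWatch logs from the SageMaker Training Job page in the console. When training finishes, the model is uploaded to S3 automatically.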

After training, you can use the Quick Kit inference notebook to load the trained model into SageMaker for inference and test it.

Conclusion

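In summary, this article introduced the business needs and technical principles of Dreambooth. Using a BYOC-mode SageMaker Training Job, together with optimizations for GPU memory, model management, and hyperparameters, it achieved production-grade Dreambooth fine-tuning. The scripts and the notebook training example in this article can serve as a foundation for engineering an AIGC ML platform based on Stable Diffusion.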

Appendix

Stable Diffusion Quick Kit on GitHub:

https://github.com/aws-samples/sagemaker-stablediffusion-quick-kit

Stable Diffusion Quick Kit Dreambooth fine-tuning documentation:

https://catalog.us-east-1.prod.workshops.aws/workshops/1ac668b1-dbd3-4b45-bf0a-5bc36138fcf1/zh-CN/4-configuration-stablediffusion/4-4-find-tuning-notebook

Dreambooth paper:

https://dreambooth.github.io/

Original open-source Dreambooth notebook (Hugging Face Colab): https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb#scrollTo=rscg285SBh4M

Hugging Face diffusers format conversion scripts:

https://github.com/huggingface/diffusers/tree/main/scripts

Stable Diffusion WebUI Dreambooth extension:

https://github.com/d8ahazard/sd_dreambooth_extension.git