How to Fine-tune Stable Diffusion using Dreambooth (2024)

Previously, I covered fine-tuning Stable Diffusion using textual inversion. This tutorial focuses on another fine-tuning method called Dreambooth. Unlike textual inversion, which trains only the embedding and leaves the base model unmodified, Dreambooth fine-tunes the whole text-to-image model so that it learns to bind a unique identifier to a specific concept (object or style). As a result, the generated images are more personalized to the object or style than those produced with textual inversion.

This tutorial is based on a fork of HuggingFace's Dreambooth implementation. The original implementation requires about 16GB to 24GB of VRAM to fine-tune the model. The fork's maintainer, ShivamShrirao, optimized the code to reduce VRAM usage to under 16GB. Depending on your needs and settings, you can fine-tune the model on a GPU with 10GB to 16GB of VRAM. I have personally confirmed that training is feasible on a Tesla T4 GPU.
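Before going further, it can help to check how much VRAM your GPU actually reports. The following is a minimal sketch, assuming an NVIDIA GPU with the nvidia-smi tool on your PATH. The helper names and the 10GB default floor (taken from the range quoted above) are my own choices for illustration and are not part of the Dreambooth code.

```python
import shutil
import subprocess


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per line of nvidia-smi CSV output, skipping other lines."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip().isdigit()]


def query_vram_mib() -> list[int]:
    """Return total memory in MiB for each visible NVIDIA GPU, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_vram_mib(result.stdout)


def meets_vram_floor(total_mib: int, floor_gb: float = 10.0) -> bool:
    """Check a GPU against a VRAM floor; 10GB is an assumption based on the text above."""
    return total_mib >= floor_gb * 1024
```

Calling query_vram_mib() returns one number per GPU, and each can be passed to meets_vram_floor() to see whether it clears the lower bound.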

Please note that none of the existing implementations are by the original authors of Dreambooth. As a result, there might be slight differences in terms of reproducibility.

Let’s proceed to the next section to set up all the necessary modules.

It is recommended to create a new virtual environment before you continue with the installation.
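If you prefer to create the environment from Python rather than the command line, the standard library's venv module can do it. This is only a sketch: the function name is my own, and it is equivalent to running python -m venv on the target directory.

```python
import venv
from pathlib import Path


def create_virtualenv(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its path."""
    # venv.create is the standard-library counterpart of `python -m venv <dir>`.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

For example, create_virtualenv(Path("dreambooth-env")) creates the environment in a directory called dreambooth-env (a hypothetical name). You still need to activate it from your shell afterwards.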

Python packages

In your working directory, create a new file called requirements.txt with the following content:

accelerate==0.12.0
torchvision
transformers>=4.21.0
ftfy
tensorboard
modelcards

Activate your virtual environment and run the following commands one by one to install all the necessary modules:

pip install git+https://github.com/ShivamShrirao/diffusers.git
pip install -r requirements.txt

NOTE: You need to install diffusers from the URL above instead of installing it directly from PyPI.

bitsandbytes package

There is an optional package called bitsandbytes, which can reduce the VRAM usage further…

FAQs

How to fine tune Stable Diffusion with DreamBooth? ›

First, download the pre-trained Stable Diffusion model as a starting point. Then train this model with a few images of a subject. To do so, choose a non-word as an identifier, such as unqtkn. When fine-tuning the model on this subject, you teach it that the prompt for the subject is "A photo of a unqtkn <class>".
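The prompt pattern described above can be sketched as a small helper. The function name is my own, and the class prompt format (the same sentence without the identifier) is an assumption rather than something this answer states.

```python
def build_prompts(identifier: str, class_noun: str) -> tuple[str, str]:
    """Return (instance_prompt, class_prompt) for a subject.

    identifier is the made-up token bound to the subject, e.g. "unqtkn".
    class_noun is the general category the subject belongs to, e.g. "dog".
    """
    if not identifier or any(ch.isspace() for ch in identifier):
        raise ValueError("identifier should be a single non-word token")
    instance_prompt = f"a photo of a {identifier} {class_noun}"
    # Assumption: the class prompt simply drops the identifier.
    class_prompt = f"a photo of a {class_noun}"
    return instance_prompt, class_prompt
```

For example, build_prompts("unqtkn", "dog") gives the prompt used for your own images and the generic prompt for the class.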

How much VRAM to fine tune Stable Diffusion? ›

With OneTrainer, a full fine-tune / DreamBooth of Stable Diffusion XL (SDXL) can run with only 10.3 GB of VRAM while training both the U-Net and Text Encoder 1. The 10.3 GB configuration is slower than the 14 GB configuration.

How many images to fine tune Stable Diffusion? ›

You can use as few as 5 images, but 10-20 images are better. The more images you use, the better the fine-tune will be.

How much VRAM does DreamBooth need? ›

Windows systems require a minimum of 16GB of video memory; Linux systems require a minimum of 8GB.

Can you do NSFW with Stable Diffusion? ›

You can play around with different Stable Diffusion models to get the best NSFW content if you have an excellent artistic sense. You have to choose the best-suited model for the type of NSFW images you want to create to achieve the desired output.

How many images does DreamBooth need? ›

Class images: Denote the images generated using the "class prompt" for using prior preservation in DreamBooth training. We leverage the pre-trained model before fine-tuning it to generate these class images. Typically, 200 - 300 class images are enough.
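To make the role of class images more concrete, here is a plain-Python sketch of how prior preservation combines two loss terms: one on your subject images and one on the class images. The function names and the prior_loss_weight parameter are illustrative, and the lists of floats stand in for the tensors a real training script would use.

```python
def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(
    instance_pred: list[float],
    instance_target: list[float],
    class_pred: list[float],
    class_target: list[float],
    prior_loss_weight: float = 1.0,
) -> float:
    """Combine the subject loss with a weighted loss on the class images."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


def class_images_to_generate(existing: int, target: int = 200) -> int:
    """How many more class images are needed; 200 is the low end of the range above."""
    return max(0, target - existing)
```

The second term penalizes the model for drifting away from what it already knew about the class, which is why the class images are generated with the pre-trained model before fine-tuning.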

Can 2gb VRAM run Stable Diffusion? ›

Stable Diffusion needs a dedicated graphics card to run efficiently. A card with 2 GB of VRAM is the bare minimum rather than a comfortable target, and the hardware list in the next answer recommends 8 GB.

Is 8GB RAM enough for Stable Diffusion? ›

The commonly recommended PC hardware for Stable Diffusion is listed below. Note that it calls for 16 gigabytes of system RAM, so 8GB falls short of the recommendation.
  • CPU: Any modern AMD or Intel CPU.
  • RAM: A minimum of 16 gigabytes of DDR4 or DDR5 RAM.
  • Storage: Any SATA or NVMe solid-state drive from a reputable company that is 256 gigabytes or larger.
  • GPU: Any GeForce RTX GPU with a minimum of 8 gigabytes of GDDR6 memory.

Is GPU or CPU better for Stable Diffusion? ›

Because Stable Diffusion is computationally intensive, most developers consider a GPU necessary to run it.

What is the best image size for Stable Diffusion? ›

Stable Diffusion can create images from 64×64 to 1024×1024 pixels, but results are best at its default 512×512 size. This size ensures consistency, diversity, speed, and manageable memory usage.
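When preparing training images for the default 512×512 size, a common approach is to center-crop each photo to a square and then resize it. The sketch below only computes the crop box and scale factor. The function names are my own, and the box follows the (left, upper, right, lower) convention that image libraries such as Pillow use for cropping.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


def scale_to_target(side: int, target: int = 512) -> float:
    """Scale factor that maps a square of the given side to target pixels."""
    return target / side
```

For a 1024×768 photo, the crop box keeps the middle 768×768 region, which is then scaled down to 512×512.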

How does DreamBooth work? ›

DreamBooth implementations fine-tune the full UNet component of the diffusion model using a few images (usually 3-5) depicting a specific subject. The images are paired with text prompts that contain the name of the class the subject belongs to, plus a unique identifier.

How many images are needed for fine-tuning? ›

Updating your Personal AI Profile requires half the number of final edits as the last AI Profile training but with a minimum of 3,000 edits. For example, if you created your Personal AI Profile with 3,000 photos, you will need 3,000 new photos to run an effective Fine-tune update.

What is the best batch size for DreamBooth? ›

For training on objects, particular people, or a small dataset of images, try a batch size of 2-4. Gradients are computed across the whole batch, so each update draws on more than one image.

Is 24 VRAM overkill? ›

For gaming, 24GB of VRAM is a marketing ploy. Consoles this generation only have 16GB in total, so more than 16GB of VRAM is unlikely to be necessary for any games in the next few years. None of today's cards are likely to be all that great 6 or 7 years from now anyway. 12-16GB is about right for today.

What is the difference between Stable Diffusion and DreamBooth? ›

Stable Diffusion is ideal for generating general images but lacks personalization, requiring extensive training data. In contrast, DreamBooth is tailored for customization, demands a smaller dataset, and excels in generating images with specific subjects in various scenarios.

How do you make Stable Diffusion higher quality? ›

To upscale images using stable diffusion, begin by selecting the desired input image. Next, adjust the generation parameters, including the scale, batch size, and batch count, to suit your preferences. After setting the parameters, initiate the upscaling process by clicking on the generate button.

How to run DreamBooth with Stable Diffusion locally? ›

Under Extensions>Available, click on the “Load from:” button to show all the available extensions. Find the DreamBooth extension and click on "Install." Next, go to the “Installed” tab and click on the “Apply and restart UI” button. Your Web UI will restart, and you should be able to see the Dreambooth tab.

How can we improve Stable Diffusion model? ›

Here are the best practices for training a high-quality Stable Diffusion model: Curate High-Quality Training Data: The quality of your training data will have a significant impact on the quality of your model's output. You should make sure to use a large and diverse dataset of images that are relevant to your use case.

Author: Ouida Strosin DO