Training Stable Diffusion with Dreambooth using Diffusers


Published November 7, 2022

By Suraj Patil (valhalla), Pedro Cuenca (pcuenq), and guest author Valentine Kozin (NineOfNein)

Dreambooth is a technique to teach new concepts to Stable Diffusion using a specialized form of fine-tuning. Some people have been using it with a few of their photos to place themselves in fantastic situations, while others are using it to incorporate new styles. 🧨 Diffusers provides a Dreambooth training script. It doesn't take long to train, but it's hard to select the right set of hyperparameters and it's easy to overfit.

We conducted a lot of experiments to analyze the effect of different settings in Dreambooth. This post presents our findings and some tips to improve your results when fine-tuning Stable Diffusion with Dreambooth.

Before we start, please be aware that this method should never be used for malicious purposes, to generate harm in any way, or to impersonate people without their knowledge. Models trained with it are still bound by the CreativeML Open RAIL-M license that governs distribution of Stable Diffusion models.

Note: a previous version of this post was published as a W&B report.

TL;DR: Recommended Settings

  • Dreambooth tends to overfit quickly. To get good-quality images, we must find a 'sweet spot' between the number of training steps and the learning rate. We recommend using a low learning rate and progressively increasing the number of steps until the results are satisfactory.
  • Dreambooth needs more training steps for faces. In our experiments, 800-1200 steps worked well when using a batch size of 2 and LR of 1e-6.
  • Prior preservation is important to avoid overfitting when training on faces. For other subjects, it doesn't seem to make a huge difference.
  • If you see that the generated images are noisy or the quality is degraded, it likely means overfitting. First, try the steps above to avoid it. If the generated images are still noisy, use the DDIM scheduler or run more inference steps (~100 worked well in our experiments).
  • Training the text encoder in addition to the UNet has a big impact on quality. Our best results were obtained using a combination of text encoder fine-tuning, low LR, and a suitable number of steps. However, fine-tuning the text encoder requires more memory, so a GPU with at least 24 GB of RAM is ideal. Using techniques like 8-bit Adam, fp16 training or gradient accumulation, it is possible to train on 16 GB GPUs like the ones provided by Google Colab or Kaggle.
  • Fine-tuning with or without EMA produced similar results.
  • There's no need to use the sks word to train Dreambooth. One of the first implementations used it because it was a rare token in the vocabulary, but it's actually a kind of rifle. Our experiments, and those by community members such as @nitrosocke, show that it's fine to select terms that you'd naturally use to describe your target.
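Putting these recommendations together, the snippet below sketches a typical launch of Diffusers' train_dreambooth.py script for a face subject. All paths, prompts, the subject name, and the base checkpoint are placeholders of ours; adapt them to your data.

```python
# Sketch of a train_dreambooth.py launch following the settings above.
# Paths, prompts, and the base checkpoint name are placeholders.
args = [
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4",
    "--instance_data_dir=./instance_images",
    "--instance_prompt=a photo of john smith",  # a natural description; no need for 'sks'
    "--with_prior_preservation",
    "--prior_loss_weight=1.0",
    "--class_data_dir=./class_images",
    "--class_prompt=a photo of a person",
    "--train_batch_size=2",
    "--learning_rate=1e-6",    # low LR; raise the step count before raising the LR
    "--lr_scheduler=constant",
    "--max_train_steps=1200",  # 800-1200 worked well for faces at this LR
    "--train_text_encoder",    # best quality, but needs roughly 24 GB of GPU RAM
    "--output_dir=./dreambooth-output",
]
print(" ".join(args))
```

On smaller GPUs, the same script accepts `--use_8bit_adam`, `--mixed_precision=fp16`, and `--gradient_accumulation_steps` to reduce memory usage.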

Learning Rate Impact

Dreambooth overfits very quickly. To get good results, tune the learning rate and the number of training steps in a way that makes sense for your dataset. In our experiments (detailed below), we fine-tuned on four different datasets with high and low learning rates. In all cases, we got better results with a low learning rate.

Experiment Settings

All our experiments were conducted using the train_dreambooth.py script with the AdamW optimizer on 2x 40GB A100s. We used the same seed and kept all hyperparameters equal across runs, except LR, number of training steps and the use of prior preservation.

For the first 3 examples (various objects), we fine-tuned the model with a batch size of 4 (2 per GPU) for 400 steps. We used a high learning rate of 5e-6 and a low learning rate of 2e-6. No prior preservation was used.

The last experiment attempts to add a human subject to the model. In this case we used prior preservation with a batch size of 2 (1 per GPU), training for 800 and 1200 steps. We used a high learning rate of 5e-6 and a low learning rate of 2e-6.

Note that you can use 8-bit Adam, fp16 training or gradient accumulation to reduce memory requirements and run similar experiments on GPUs with 16 GB of memory.
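As a quick sanity check on the arithmetic, gradient accumulation trades per-step memory for extra forward/backward passes while keeping the effective batch size the same. A minimal sketch (the function name is ours):

```python
def effective_batch_size(per_device_batch: int, num_devices: int, accum_steps: int) -> int:
    """Effective batch size seen by the optimizer: gradients are averaged
    over every device and every accumulation step before each update."""
    return per_device_batch * num_devices * accum_steps

# The object experiments used 2 images per GPU on 2 GPUs, i.e. batch size 4:
assert effective_batch_size(2, 2, 1) == 4
# A single 16 GB GPU can match that with batch size 1 and 4 accumulation steps:
assert effective_batch_size(1, 1, 4) == 4
```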

Cat Toy

High Learning Rate (5e-6)

Low Learning Rate (2e-6)

Pighead

High Learning Rate (5e-6). Note that the color artifacts are noise remnants – running more inference steps could help resolve some of those details.

Low Learning Rate (2e-6)

Mr. Potato Head

High Learning Rate (5e-6). Note that the color artifacts are noise remnants – running more inference steps could help resolve some of those details.

Low Learning Rate (2e-6)

Human Face

We tried to incorporate the Kramer character from Seinfeld into Stable Diffusion. As previously mentioned, we trained for more steps with a smaller batch size. Even so, the results were not stellar. For the sake of brevity, we have omitted these sample images and refer the reader to the next sections, where face training became the focus of our efforts.

Summary of Initial Results

To get good results training Stable Diffusion with Dreambooth, it's important to tune the learning rate and training steps for your dataset.

  • High learning rates and too many training steps will lead to overfitting. The model will mostly generate images from your training data, no matter what prompt is used.
  • Low learning rates and too few steps will lead to underfitting: the model will not be able to generate the concept we were trying to incorporate.

Faces are harder to train. In our experiments, a learning rate of 2e-6 with 400 training steps worked well for objects, but faces required 1e-6 (or 2e-6) with ~1200 steps.

Image quality degrades a lot if the model overfits, and this happens if:

  • The learning rate is too high.
  • We run too many training steps.
  • No prior preservation is used when training on faces, as shown in the next section.

Using Prior Preservation when training Faces

Prior preservation is a technique that uses additional images of the same class we are trying to train as part of the fine-tuning process. For example, if we try to incorporate a new person into the model, the class we'd want to preserve could be person. Prior preservation tries to reduce overfitting by using photos of the new person combined with photos of other people. The nice thing is that we can generate those additional class images using the Stable Diffusion model itself! The training script takes care of that automatically if you want, but you can also provide a folder with your own prior preservation images.
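Conceptually, the training objective with prior preservation is a weighted sum of two reconstruction terms: one on your instance images and one on the generated class images. The numeric sketch below is ours and is simplified (the actual diffusion loss is an MSE on predicted noise, and `prior_loss_weight` matches the script's flag of the same name):

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target,
                    prior_loss_weight=1.0):
    # Instance term: learn the new subject.
    instance_loss = mse(instance_pred, instance_target)
    # Prior-preservation term: don't forget what the class looks like.
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss

# With a weight of 1.0, both terms contribute equally:
loss = dreambooth_loss([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [1.0, 0.0])
assert loss == 1.0
```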

Prior preservation, 1200 steps, lr=2e-6.

No prior preservation, 1200 steps, lr=2e-6.

As you can see, results are better when prior preservation is used, but there are still noisy blotches. It's time for some additional tricks!

Effect of Schedulers

In the previous examples, we used the PNDM scheduler to sample images during the inference process. We observed that when the model overfits, DDIM usually works much better than PNDM and LMSDiscrete. In addition, quality can be improved by running inference for more steps: 100 seems to be a good choice. The additional steps help resolve some of the noise patches into image details.
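A minimal sketch of swapping in DDIM at inference time with Diffusers (the model path and prompt are placeholders, and a CUDA GPU is assumed):

```python
def generate_with_ddim(model_path: str, prompt: str, steps: int = 100):
    """Sample from a fine-tuned Dreambooth model using the DDIM scheduler."""
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_path, torch_dtype=torch.float16
    ).to("cuda")
    # Replace the pipeline's default scheduler with DDIM, reusing its config.
    pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
    # ~100 inference steps help resolve residual noise patches into detail.
    return pipe(prompt, num_inference_steps=steps).images[0]

# Usage (hypothetical): generate_with_ddim("./dreambooth-output", "a photo of john smith")
```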

PNDM, Kramer face

LMSDiscrete, Kramer face. Results are terrible!

DDIM, Kramer face. Much better!

A similar behaviour can be observed for other subjects, although to a lesser extent.

PNDM, Potato Head

LMSDiscrete, Potato Head

DDIM, Potato Head

Fine-tuning the Text Encoder

The original Dreambooth paper describes a method to fine-tune the UNet component of the model but keeps the text encoder frozen. However, we observed that fine-tuning the encoder produces better results. We experimented with this approach after seeing it used in other Dreambooth implementations, and the results are striking!

Frozen text encoder

Fine-tuned text encoder

Fine-tuning the text encoder produces the best results, especially with faces. It generates more realistic images, it's less prone to overfitting, and it also follows prompts more faithfully, being able to handle more complex ones.

Epilogue: Textual Inversion + Dreambooth

We also ran a final experiment where we combined Textual Inversion with Dreambooth. Both techniques have a similar goal, but their approaches are different.

In this experiment we first ran textual inversion for 2000 steps. From that model, we then ran Dreambooth for an additional 500 steps using a learning rate of 1e-6. These are the results:

We think the results are much better than doing plain Dreambooth but not as good as when we fine-tune the whole text encoder. It seems to copy the style of the training images a bit more, so it could be overfitting to them. We didn't explore this combination further, but it could be an interesting alternative to improve Dreambooth and still fit the process in a 16GB GPU. Feel free to explore and tell us about your results!



