# Generative Models by Stability AI

## News

**June 22, 2023**
- We are releasing two new diffusion models for research purposes:
  - `SD-XL 0.9-base`: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding, whereas the refiner model only uses the OpenCLIP model.
  - `SD-XL 0.9-refiner`: The refiner has been trained to denoise small noise levels of high-quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.

If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for either of the two links, and if you are granted access, you can access both. Please log in to your HuggingFace account with your organization email to request access.

We plan to do a full release soon (July).
## The codebase

### General Philosophy

Modularity is king. This repo implements a config-driven approach: we build and combine submodules by calling `instantiate_from_config()` on objects defined in YAML configs. See `configs/` for many examples.
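For orientation, the config-driven idiom can be sketched in a few lines of pure Python. This is a minimal sketch of the `instantiate_from_config()` pattern, not the repo's exact implementation, and the stdlib class in the example config is a stand-in for the modules named in `configs/`:

```python
import importlib

def get_obj_from_str(string):
    # resolve a dotted path like "sgm.modules.SomeClass" to the class object
    module_name, cls_name = string.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), cls_name)

def instantiate_from_config(config):
    # "target" names the class to build; "params" are its constructor kwargs,
    # mirroring the structure of the YAML configs
    return get_obj_from_str(config["target"])(**config.get("params", {}))

# hypothetical config using a stdlib class as a placeholder target
cfg = {"target": "datetime.timedelta", "params": {"hours": 1, "minutes": 30}}
td = instantiate_from_config(cfg)
print(td.total_seconds())  # → 5400.0
```

Because submodules are named by dotted path in the config rather than imported directly, swapping a component (a conditioner, a sampler, a denoiser) is a one-line YAML change.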
### Changelog from the old `ldm` codebase

For training, we use pytorch-lightning, but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly `LatentDiffusion`, now `DiffusionEngine`) has been cleaned up:
- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: `GeneralConditioner`, see `sgm/modules/encoders/modules.py`.
- We separate guiders (such as classifier-free guidance, see `sgm/modules/diffusionmodules/guiders.py`) from the samplers (`sgm/modules/diffusionmodules/sampling.py`), and the samplers are independent of the model.
- We adopt the "denoiser framework" for both training and inference (the most notable change is probably the option to train continuous-time models):
  - Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see `sgm/modules/diffusionmodules/denoiser.py`.
  - The following features are now independent: weighting of the diffusion loss function (`sgm/modules/diffusionmodules/denoiser_weighting.py`), preconditioning of the network (`sgm/modules/diffusionmodules/denoiser_scaling.py`), and sampling of noise levels during training (`sgm/modules/diffusionmodules/sigma_sampling.py`).
- Autoencoding models have also been cleaned up.
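To illustrate how preconditioning can be factored out of the network, here is a minimal, framework-free sketch of an EDM-style denoiser wrapper. The coefficient formulas follow the Karras et al. formulation; the function names and the scalar interface are illustrative, not the repo's actual API:

```python
import math

def edm_scaling(sigma, sigma_data=0.5):
    # EDM-style preconditioning coefficients as functions of the noise level:
    # as sigma -> 0, c_skip -> 1 and c_out -> 0, so the denoiser passes clean
    # inputs through almost unchanged.
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / math.sqrt(sigma**2 + sigma_data**2)
    c_in = 1.0 / math.sqrt(sigma**2 + sigma_data**2)
    c_noise = 0.25 * math.log(sigma)
    return c_skip, c_out, c_in, c_noise

def denoise(network, noisy_x, sigma):
    # the denoiser wraps the raw network with input/output scalings, which is
    # what lets loss weighting, preconditioning and sigma sampling vary
    # independently of the network itself
    c_skip, c_out, c_in, c_noise = edm_scaling(sigma)
    return c_skip * noisy_x + c_out * network(c_in * noisy_x, c_noise)
```

A discrete-time model then amounts to restricting `sigma` to a fixed grid of noise levels, which is why it falls out as a special case of the continuous-time formulation.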
## Installation:

### 1. Clone the repo

```shell
git clone git@github.com:Stability-AI/generative-models.git
cd generative-models
```

### 2. Setting up the virtualenv

This assumes you have navigated to the `generative-models` root after cloning it.

**NOTE:** This is tested under `python3.8` and `python3.10`. For other python versions, you might encounter version conflicts.

**PyTorch 1.13**

```shell
# install required packages from pypi
python3 -m venv .pt1
source .pt1/bin/activate
pip3 install wheel
pip3 install -r requirements_pt13.txt
```

**PyTorch 2.0**

```shell
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install wheel
pip3 install -r requirements_pt2.txt
```
## Inference:

We provide a streamlit demo for text-to-image and image-to-image sampling in `scripts/demo/sampling.py`. The following models are currently supported:

- SD-XL 0.9-base
- SD-XL 0.9-refiner
**Weights for SDXL:**

If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for either of the two links, and if you are granted access, you can access both. Please log in to your HuggingFace account with your organization email to request access.

After obtaining the weights, place them into `checkpoints/`.
Next, start the demo using

```shell
streamlit run scripts/demo/sampling.py --server.port <your_port>
```
## Invisible Watermark Detection

Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
To run the script you need to either have a working installation as above or try an experimental import using only a minimal amount of packages:

```shell
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark
```
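For context, a decoded watermark is just a bit sequence that must be unpacked into a payload. As a hedged illustration only (this is not the repo's detection script, and the example payload is made up), packing recovered bits back into bytes looks like:

```python
def bits_to_bytes(bits):
    # pack a list of 0/1 bits (MSB first) into a bytes object
    assert len(bits) % 8 == 0, "payload length must be a multiple of 8 bits"
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

# example: a made-up two-byte ASCII payload recovered bit by bit
bits = [0,1,0,1,0,0,1,1, 0,1,0,0,0,1,0,0]
print(bits_to_bytes(bits))  # → b'SD'
```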
With either installation in place, the script is then usable in the following ways (don't forget to activate your virtual environment beforehand):