Stable Diffusion

API usage

https://qishiya.com/?p=1795

SDXL model (Stable Diffusion XL)

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Model Spec

arxiv



Left: Comparing user preferences between SDXL and Stable Diffusion 1.5 & 2.1. While SDXL already clearly outperforms Stable Diffusion 1.5 & 2.1, adding the additional refinement stage boosts performance. Right: Visualization of the two-stage pipeline: We generate initial latents of size 128 × 128 using SDXL. Afterwards, we utilize a specialized high-resolution refinement model and apply SDEdit [28] on the latents generated in the first step, using the same prompt. SDXL and the refinement model use the same autoencoder


SDXL consists of two models: the base model and the refiner model. The text encoder (the module that interprets your prompts) combines the largest OpenCLIP model (ViT-bigG/14) with OpenAI's CLIP ViT-L.
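
The two-stage flow (base model first, then the refiner on the base model's latents with the same prompt) can be sketched roughly as follows. This assumes the Hugging Face diffusers library, a CUDA GPU, and the public checkpoints `stabilityai/stable-diffusion-xl-base-1.0` and `stabilityai/stable-diffusion-xl-refiner-1.0`; none of these are specified in the text above, and `generate_with_refiner` is a hypothetical helper name.

```python
BASE_ID = "stabilityai/stable-diffusion-xl-base-1.0"        # assumed checkpoint
REFINER_ID = "stabilityai/stable-diffusion-xl-refiner-1.0"  # assumed checkpoint


def generate_with_refiner(prompt, device="cuda"):
    """Run the SDXL base model, then refine its latents with the refiner.

    Hypothetical helper: needs torch, diffusers, and a GPU, and downloads
    several GB of weights on first use.
    """
    # Imported lazily so this file loads without the heavy dependencies.
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        BASE_ID, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    ).to(device)
    # Reuse the base model's second text encoder and autoencoder to save memory.
    refiner = DiffusionPipeline.from_pretrained(
        REFINER_ID,
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to(device)

    # Stage 1: the base model returns latents instead of a decoded image.
    latents = base(prompt=prompt, output_type="latent").images
    # Stage 2: the refiner denoises those latents using the same prompt.
    return refiner(prompt=prompt, image=latents).images[0]
```

Under these assumptions, `generate_with_refiner("cute robot stands on the moon")` would return a PIL image that can be saved or displayed in a notebook.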

Image size: 1024 × 1024
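
The 1024 × 1024 image size and the 128 × 128 latents mentioned in the figure caption are related through the autoencoder's downsampling factor. A small sketch of that arithmetic, assuming the factor of 8 and the 4 latent channels used by the Stable Diffusion family's autoencoder (neither number is stated in the text above); `latent_shape` is a hypothetical helper:

```python
VAE_SCALE_FACTOR = 8  # assumed spatial downsampling of the autoencoder
LATENT_CHANNELS = 4   # assumed number of latent channels


def latent_shape(width, height, scale=VAE_SCALE_FACTOR, channels=LATENT_CHANNELS):
    """Return (channels, latent_height, latent_width) for an image size in pixels."""
    if width % scale or height % scale:
        raise ValueError(f"width and height must be multiples of {scale}")
    return (channels, height // scale, width // scale)


print(latent_shape(1024, 1024))  # (4, 128, 128)
print(latent_shape(512, 512))    # (4, 64, 64)
```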

Usage on Colab

Colab practice: StableDiffusionXL_Text2Image
Hugging Face diffusers (code): a library that implements various diffusion models, including text-to-image models.
Hugging Face diffusers (docs)
Stability-AI: generative-models

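from datetime import datetime
import torch
from diffusers import DiffusionPipeline

# Assumed setup (not shown in the original): load the public SDXL base checkpoint on a GPU.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")

prompt = "cute robot stands on the moon, shallow depth of field, cinematic composition --ar 16:9"

start = datetime.now()
image = pipe(prompt=prompt).images[0]
print((datetime.now() - start).total_seconds())  # elapsed generation time in seconds

display(image)  # display() is built into Colab/Jupyter notebooks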

prompt	demo1	demo2
cute robot stands on the moon, shallow depth of field, cinematic composition --ar 16:9

The girl who is drawing. Makoto Shinkai style,Portrait,highly detailed, sharp focus,sci-fi, stunningly beautiful, dystopian, --ar 16:9

snoopy

ref links
- medium: how to run sdxl 1.0

Preventing timeout disconnects

// Run in the browser console on the Colab page: clicks the connect button every 60 seconds.
function ConnectButton() {
    console.log("Connect pushed");
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
}
const id = setInterval(ConnectButton, 60000);

// id: the timer handle returned by setInterval
// clearInterval(id);  // stop the execution

Working around the 77-token limit
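
The 77-token limit comes from the CLIP text encoder's context window: 75 prompt tokens plus one start and one end token. One common workaround, used by tools such as the AUTOMATIC1111 web UI, is to split a long prompt into 75-token chunks, encode each chunk separately, and concatenate the resulting embeddings. The sketch below covers only the chunking step on already-tokenized IDs. `chunk_token_ids` is a hypothetical helper, and the default start/end IDs are those of the CLIP ViT-L tokenizer, which is an assumption about the tokenizer in use.

```python
def chunk_token_ids(token_ids, window=77, bos=49406, eos=49407):
    """Split prompt token IDs into encoder-sized windows.

    Each window is [bos] + up to (window - 2) prompt tokens + [eos],
    padded with eos to exactly `window` entries.
    """
    body = window - 2  # room left for prompt tokens after bos and eos
    chunks = []
    # max(..., 1) makes an empty prompt still produce one padded window.
    for i in range(0, max(len(token_ids), 1), body):
        piece = list(token_ids[i:i + body])
        padded = [bos] + piece + [eos] * (window - 1 - len(piece))
        chunks.append(padded)
    return chunks


# 160 placeholder token IDs stand in for a long tokenized prompt.
chunks = chunk_token_ids(list(range(1000, 1160)))
print(len(chunks), [len(c) for c in chunks])  # 3 [77, 77, 77]
```

Each window would then be passed through the text encoder on its own, with the per-window embeddings joined along the sequence axis before being handed to the diffusion model.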

Stable Diffusion WebUI on Colab

stable-diffusion-webui-colab

Stable Diffusion API

api reference

LoRA fine-tuning

Low-Rank Adaptation of Large Language Models

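LoRA fine-tunes a model by injecting small trainable layers. Because the large model's original weights are not modified directly, the training compute is greatly reduced.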
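
To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-style layer, following the formulation in the LoRA paper: the frozen weight W is left untouched, and a low-rank update B·A (rank r) is added on top, scaled by alpha / r. The helper names and the example sizes are illustrative assumptions, not taken from any particular model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. a rank-`rank` LoRA update."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    rank = len(A)
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # injected low-rank path
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]


# For a 1024 x 1024 layer, rank 8 trains 64 times fewer parameters.
print(lora_param_counts(1024, 1024, 8))  # (1048576, 16384)
```

With B initialised to zeros, as in the paper, `lora_forward` returns exactly the frozen layer's output, so training starts from the pretrained model's behaviour.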

lora model	example
KoreanDollLikeness_v10 (same series: ChinaDollLikeness, JapaneseDollLikeness)
cutegirlmix4
Makoto Shinkai style LoRA

Prompts

extremely detailed

style

Neo-expressionism of a cartoonish pluto, in the style of pulled, scraped, and scratched, meditative, unconventional poses, spiky mounds, 1970s, twisted characters, soggy Banksy style, vintage, by tim burton style --ar 2:3 --v 5.2

Prompt components: Neo-expressionism; cartoonish pluto (the planet Pluto); in the style of pulled, scraped, and scratched; meditative; unconventional poses; spiky mounds; 1970s; twisted characters; soggy Banksy style; vintage; by Tim Burton style
