Image-to-Video
Diffusers
ONNX
Safetensors
WanPipeline
video-generation
video diffusion transformer
audio-driven avatar animation
Instructions to use FrancisRing/StableAvatar with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use FrancisRing/StableAvatar with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("FrancisRing/StableAvatar", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
- Notebooks
- Google Colab
- Kaggle
Upload config.json
config.json +19 -0
config.json
ADDED
@@ -0,0 +1,19 @@
+{
+  "i2v": true,
+  "use_audio": true,
+  "random_prefix_frames": true,
+  "sp_size": 1,
+  "text_encoder_path": "/tmp/pretrained_models/Wan2.1-T2V-14B/models_t5_umt5-xxl-enc-bf16.pth",
+  "image_encoder_path": "None",
+  "dit_path": "/tmp/pretrained_models/Wan2.1-T2V-14B/diffusion_pytorch_model-00001-of-00006.safetensors,/tmp/pretrained_models/Wan2.1-T2V-14B/diffusion_pytorch_model-00002-of-00006.safetensors,/tmp/pretrained_models/Wan2.1-T2V-14B/diffusion_pytorch_model-00003-of-00006.safetensors,/tmp/pretrained_models/Wan2.1-T2V-14B/diffusion_pytorch_model-00004-of-00006.safetensors,/tmp/pretrained_models/Wan2.1-T2V-14B/diffusion_pytorch_model-00005-of-00006.safetensors,/tmp/pretrained_models/Wan2.1-T2V-14B/diffusion_pytorch_model-00006-of-00006.safetensors",
+  "model_config": {
+    "in_dim": 33,
+    "audio_hidden_size": 32
+  },
+  "train_architecture": "lora",
+  "lora_target_modules": "q,k,v,o,ffn.0,ffn.2",
+  "init_lora_weights": "kaiming",
+  "lora_rank": 128,
+  "lora_alpha": 64.0
+}
+
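Several fields in this config pack multiple values into one string: `dit_path` and `lora_target_modules` are comma-separated, and `image_encoder_path` holds the string "None" rather than a JSON null. The sketch below shows one way such a file might be normalized before use. It is not taken from the StableAvatar code. The helper name `summarize_config` is hypothetical, the shard paths are shortened placeholders, and the `lora_alpha / lora_rank` scale follows the common LoRA convention, which this repository may or may not use.

```python
import json

# Abbreviated copy of the committed config. The two dit_path entries are
# placeholders standing in for the six real shard paths.
CONFIG_TEXT = """
{
  "i2v": true,
  "use_audio": true,
  "sp_size": 1,
  "image_encoder_path": "None",
  "dit_path": "shard-00001.safetensors,shard-00002.safetensors",
  "model_config": {"in_dim": 33, "audio_hidden_size": 32},
  "train_architecture": "lora",
  "lora_target_modules": "q,k,v,o,ffn.0,ffn.2",
  "lora_rank": 128,
  "lora_alpha": 64.0
}
"""


def summarize_config(cfg: dict) -> dict:
    """Normalize string-packed fields (hypothetical helper, not a repo API)."""
    # Treat the literal string "None" as a missing path (an assumption).
    image_encoder = cfg.get("image_encoder_path")
    if image_encoder == "None":
        image_encoder = None

    return {
        # Comma-separated strings become lists.
        "dit_shards": [p.strip() for p in cfg["dit_path"].split(",") if p.strip()],
        "target_modules": [m.strip() for m in cfg["lora_target_modules"].split(",")],
        "image_encoder_path": image_encoder,
        # Conventional LoRA scaling factor alpha / rank (assumed convention).
        "lora_scale": cfg["lora_alpha"] / cfg["lora_rank"],
    }


summary = summarize_config(json.loads(CONFIG_TEXT))
print(summary["target_modules"])
print(summary["lora_scale"])
```

With the committed values (`lora_rank` 128, `lora_alpha` 64.0), the conventional scale works out to 0.5, and the target-module string expands to six module names.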