alibaba-pai/FLUX.2-dev-Fun-Controlnet-Union

bubbliiiing committed on Dec 11, 2025

Commit 5f2e9bf (verified) · 1 Parent(s): ff5b6b1

Update README.md

README.md +115 -114

@@ -1,115 +1,116 @@
----
-license: other
-license_name: flux-dev-non-commercial-license
-license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
----
-# Flux.2-dev-Fun-Controlnet-Union
-[![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)
-# Model features
-- This ControlNet is added on 4 double blocks.
-- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
-- It supports multiple control conditions, including Canny, HED, depth maps, pose estimation, and MLSD, and can be used like a standard ControlNet.
-- Inpainting mode is also supported.
-- You can adjust controlnet_conditioning_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for controlnet_conditioning_scale is from 0.65 to 0.80.
-- Although Flux.2‑dev supports certain image‑editing capabilities, its generation speed slows down when handling multiple images, and it sometimes produces similarity issues or fails to follow the control images. Compared with edit‑based methods, using ControlNet adheres more reliably to control instructions and makes it easier to apply multiple types of control.
-# TODO
-- [ ] Train on more data and for more steps.
-# Results
-<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
-  <tr>
-    <td>Reference and mask (inpainting)</td>
-    <td>Output</td>
-  </tr>
-  <tr>
-    <td><img src="asset/ref.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
-    <td><img src="results/inpaint.png" width="100%" /></td>
-  </tr>
-</table>
-<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
-  <tr>
-    <td>Pose and reference</td>
-    <td>Output</td>
-  </tr>
-  <tr>
-    <td><img src="asset/pose.jpg" width="100%" /><img src="asset/ref.jpg" width="100%" /></td>
-    <td><img src="results/pose_ref.png" width="100%" /></td>
-  </tr>
-</table>
-<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
-  <tr>
-    <td>Pose</td>
-    <td>Output</td>
-  </tr>
-  <tr>
-    <td><img src="asset/pose.jpg" width="100%" /></td>
-    <td><img src="results/pose.png" width="100%" /></td>
-  </tr>
-</table>
-<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
-  <tr>
-    <td>Pose</td>
-    <td>Output</td>
-  </tr>
-  <tr>
-    <td><img src="asset/pose2.jpg" width="100%" /></td>
-    <td><img src="results/pose2.png" width="100%" /></td>
-  </tr>
-</table>
-<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
-  <tr>
-    <td>Canny</td>
-    <td>Output</td>
-  </tr>
-  <tr>
-    <td><img src="asset/canny.jpg" width="100%" /></td>
-    <td><img src="results/canny.png" width="100%" /></td>
-  </tr>
-</table>
-<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
-  <tr>
-    <td>Depth</td>
-    <td>Output</td>
-  </tr>
-  <tr>
-    <td><img src="asset/depth.jpg" width="100%" /></td>
-    <td><img src="results/depth.png" width="100%" /></td>
-  </tr>
-</table>
-# Inference
-See the VideoX-Fun repository for more details.
-First, clone VideoX-Fun and create the model directories.
-```sh
-# clone code
-git clone https://github.com/aigc-apps/VideoX-Fun.git
-# enter VideoX-Fun's dir
-cd VideoX-Fun
-# create the directories that will hold the weights
-mkdir -p models/Diffusion_Transformer
-mkdir -p models/Personalized_Model
-```
-Then download weights to models/Diffusion_Transformer and models/Personalized_Model.
-```
-📦 models/
-├── 📂 Diffusion_Transformer/
-│   └── 📂 FLUX.2-dev/
-└── 📂 Personalized_Model/
-    └── FLUX.2-dev-Fun-Controlnet-Union.safetensors
-```
 Then run the file `examples/flux2_fun/predict_t2i_control.py`.

+---
+library_name: videox_fun
+license: other
+license_name: flux-dev-non-commercial-license
+license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
+---
+# Flux.2-dev-Fun-Controlnet-Union
+[![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)
+# Model features
+- This ControlNet is added on 4 double blocks.
+- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
+- It supports multiple control conditions, including Canny, HED, depth maps, pose estimation, and MLSD, and can be used like a standard ControlNet.
+- Inpainting mode is also supported.
+- You can adjust controlnet_conditioning_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for controlnet_conditioning_scale is from 0.65 to 0.80.
+- Although Flux.2‑dev supports certain image‑editing capabilities, its generation speed slows down when handling multiple images, and it sometimes produces similarity issues or fails to follow the control images. Compared with edit‑based methods, using ControlNet adheres more reliably to control instructions and makes it easier to apply multiple types of control.
+# TODO
+- [ ] Train on more data and for more steps.
+# Results
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Reference and mask (inpainting)</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/ref.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
+    <td><img src="results/inpaint.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Pose and reference</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/pose.jpg" width="100%" /><img src="asset/ref.jpg" width="100%" /></td>
+    <td><img src="results/pose_ref.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Pose</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/pose.jpg" width="100%" /></td>
+    <td><img src="results/pose.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Pose</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/pose2.jpg" width="100%" /></td>
+    <td><img src="results/pose2.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Canny</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/canny.jpg" width="100%" /></td>
+    <td><img src="results/canny.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Depth</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/depth.jpg" width="100%" /></td>
+    <td><img src="results/depth.png" width="100%" /></td>
+  </tr>
+</table>
+# Inference
+See the VideoX-Fun repository for more details.
+First, clone VideoX-Fun and create the model directories.
+```sh
+# clone code
+git clone https://github.com/aigc-apps/VideoX-Fun.git
+# enter VideoX-Fun's dir
+cd VideoX-Fun
+# create the directories that will hold the weights
+mkdir -p models/Diffusion_Transformer
+mkdir -p models/Personalized_Model
+```
+Then download weights to models/Diffusion_Transformer and models/Personalized_Model.
+```
+📦 models/
+├── 📂 Diffusion_Transformer/
+│   └── 📂 FLUX.2-dev/
+└── 📂 Personalized_Model/
+    └── FLUX.2-dev-Fun-Controlnet-Union.safetensors
+```
 Then run the file `examples/flux2_fun/predict_t2i_control.py`.
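
Before running the script, it can help to confirm that the weights are where the README's directory tree expects them. The following is a minimal sketch, not part of VideoX-Fun: `check_layout` is a hypothetical helper name, and the two paths are taken from the tree in the README, assuming the ControlNet checkpoint keeps the file name shown there.

```sh
# Hypothetical helper (not shipped with VideoX-Fun): checks that the
# weight layout from the README exists under the given repository root.
# Usage: check_layout /path/to/VideoX-Fun   (defaults to the current directory)
check_layout() {
    root="${1:-.}"
    missing=0

    # Base model weights are expected as a directory.
    if [ ! -d "$root/models/Diffusion_Transformer/FLUX.2-dev" ]; then
        echo "missing directory: models/Diffusion_Transformer/FLUX.2-dev" >&2
        missing=1
    fi

    # The ControlNet checkpoint is expected as a single file
    # (file name assumed from the README's directory tree).
    if [ ! -f "$root/models/Personalized_Model/FLUX.2-dev-Fun-Controlnet-Union.safetensors" ]; then
        echo "missing file: models/Personalized_Model/FLUX.2-dev-Fun-Controlnet-Union.safetensors" >&2
        missing=1
    fi

    # Returns 0 when both paths exist, 1 otherwise.
    return "$missing"
}
```

Calling `check_layout .` from inside the cloned `VideoX-Fun` directory prints any missing path to standard error and returns a non-zero status, so it can gate the inference run, for example `check_layout . && python examples/flux2_fun/predict_t2i_control.py`.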