VideoX Fun
bubbliiiing committed on
Commit 5f2e9bf · verified · 1 Parent(s): ff5b6b1

Update README.md

Files changed (1)
  1. README.md +115 -114
README.md CHANGED
@@ -1,115 +1,116 @@
- ---
- license: other
- license_name: flux-dev-non-commercial-license
- license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
- ---
- # Flux.2-dev-Fun-Controlnet-Union
-
- [![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)
-
- # Model features
- - This ControlNet is added on 4 double blocks.
- - The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
- - It supports multiple control conditions, including Canny, HED, depth maps, pose estimation, and MLSD, and can be used like a standard ControlNet.
- - Inpainting mode is also supported.
- - You can adjust controlnet_conditioning_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for controlnet_conditioning_scale is from 0.65 to 0.80.
- - Although Flux.2-dev supports certain image-editing capabilities, its generation speed slows down when handling multiple images, and it sometimes produces similarity issues or fails to follow the control images. Compared with edit-based methods, ControlNet adheres more reliably to control instructions and makes it easier to apply multiple types of control.
-
- # TODO
- - [ ] Train on more data for more steps.
-
- # Results
-
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
- <tr>
- <td>Inpaint</td>
- <td>Output</td>
- </tr>
- <tr>
- <td><img src="asset/ref.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
- <td><img src="results/inpaint.png" width="100%" /></td>
- </tr>
- </table>
-
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
- <tr>
- <td>Pose</td>
- <td>Output</td>
- </tr>
- <tr>
- <td><img src="asset/pose.jpg" width="100%" /><img src="asset/ref.jpg" width="100%" /></td>
- <td><img src="results/pose_ref.png" width="100%" /></td>
- </tr>
- </table>
-
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
- <tr>
- <td>Pose</td>
- <td>Output</td>
- </tr>
- <tr>
- <td><img src="asset/pose.jpg" width="100%" /></td>
- <td><img src="results/pose.png" width="100%" /></td>
- </tr>
- </table>
-
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
- <tr>
- <td>Pose</td>
- <td>Output</td>
- </tr>
- <tr>
- <td><img src="asset/pose2.jpg" width="100%" /></td>
- <td><img src="results/pose2.png" width="100%" /></td>
- </tr>
- </table>
-
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
- <tr>
- <td>Canny</td>
- <td>Output</td>
- </tr>
- <tr>
- <td><img src="asset/canny.jpg" width="100%" /></td>
- <td><img src="results/canny.png" width="100%" /></td>
- </tr>
- </table>
-
- <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
- <tr>
- <td>Depth</td>
- <td>Output</td>
- </tr>
- <tr>
- <td><img src="asset/depth.jpg" width="100%" /></td>
- <td><img src="results/depth.png" width="100%" /></td>
- </tr>
- </table>
-
- # Inference
- See the VideoX-Fun repository for more details.
-
- Clone VideoX-Fun and create the model directories.
- ```sh
- # clone code
- git clone https://github.com/aigc-apps/VideoX-Fun.git
-
- # enter VideoX-Fun's dir
- cd VideoX-Fun
-
- # create the weight directories (-p also creates models/ if it is missing)
- mkdir -p models/Diffusion_Transformer
- mkdir -p models/Personalized_Model
- ```
-
- Then download weights to models/Diffusion_Transformer and models/Personalized_Model.
-
- ```
- 📦 models/
- ├── 📂 Diffusion_Transformer/
- │   └── 📂 FLUX.2-dev/
- ├── 📂 Personalized_Model/
- │   └── "models/Personalized_Model/FLUX.2-dev-Fun-Controlnet-Union.safetensors"
- ```
-
+ ---
+ library_name: videox_fun
+ license: other
+ license_name: flux-dev-non-commercial-license
+ license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
+ ---
+ # Flux.2-dev-Fun-Controlnet-Union
+
+ [![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)
+
+ # Model features
+ - This ControlNet is added on 4 double blocks.
+ - The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
+ - It supports multiple control conditions, including Canny, HED, depth maps, pose estimation, and MLSD, and can be used like a standard ControlNet.
+ - Inpainting mode is also supported.
+ - You can adjust controlnet_conditioning_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for controlnet_conditioning_scale is from 0.65 to 0.80.
+ - Although Flux.2-dev supports certain image-editing capabilities, its generation speed slows down when handling multiple images, and it sometimes produces similarity issues or fails to follow the control images. Compared with edit-based methods, ControlNet adheres more reliably to control instructions and makes it easier to apply multiple types of control.
+
+ # TODO
+ - [ ] Train on more data for more steps.
+
+ # Results
+
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Inpaint</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/ref.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
+ <td><img src="results/inpaint.png" width="100%" /></td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Pose</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/pose.jpg" width="100%" /><img src="asset/ref.jpg" width="100%" /></td>
+ <td><img src="results/pose_ref.png" width="100%" /></td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Pose</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/pose.jpg" width="100%" /></td>
+ <td><img src="results/pose.png" width="100%" /></td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Pose</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/pose2.jpg" width="100%" /></td>
+ <td><img src="results/pose2.png" width="100%" /></td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Canny</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/canny.jpg" width="100%" /></td>
+ <td><img src="results/canny.png" width="100%" /></td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Depth</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/depth.jpg" width="100%" /></td>
+ <td><img src="results/depth.png" width="100%" /></td>
+ </tr>
+ </table>
+
+ # Inference
+ See the VideoX-Fun repository for more details.
+
+ Clone VideoX-Fun and create the model directories.
+ ```sh
+ # clone code
+ git clone https://github.com/aigc-apps/VideoX-Fun.git
+
+ # enter VideoX-Fun's dir
+ cd VideoX-Fun
+
+ # create the weight directories (-p also creates models/ if it is missing)
+ mkdir -p models/Diffusion_Transformer
+ mkdir -p models/Personalized_Model
+ ```
+
+ Then download weights to models/Diffusion_Transformer and models/Personalized_Model.
+
+ ```
+ 📦 models/
+ ├── 📂 Diffusion_Transformer/
+ │   └── 📂 FLUX.2-dev/
+ ├── 📂 Personalized_Model/
+ │   └── "models/Personalized_Model/FLUX.2-dev-Fun-Controlnet-Union.safetensors"
+ ```
+
  Then run the file `examples/flux2_fun/predict_t2i_control.py`.
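
The README in this diff expects a specific `models/` layout before the prediction script is run. As a minimal sketch, not part of the commit or of VideoX-Fun, the layout could be checked with a small shell helper. The function name `check_layout` is hypothetical, and the sketch assumes the ControlNet weights sit directly in `models/Personalized_Model/` under the file name shown in the directory tree.

```sh
# check_layout ROOT
# Hypothetical helper: verifies that ROOT (default: models) contains the
# base model directory and the ControlNet weights file named in the README.
# Prints one line per missing item and returns non-zero if anything is missing.
check_layout() {
    root="${1:-models}"
    missing=0

    # Base model: expected as a directory of downloaded FLUX.2-dev files.
    if [ ! -d "$root/Diffusion_Transformer/FLUX.2-dev" ]; then
        echo "missing directory: $root/Diffusion_Transformer/FLUX.2-dev"
        missing=1
    fi

    # ControlNet weights: expected as a single .safetensors file.
    if [ ! -f "$root/Personalized_Model/FLUX.2-dev-Fun-Controlnet-Union.safetensors" ]; then
        echo "missing file: $root/Personalized_Model/FLUX.2-dev-Fun-Controlnet-Union.safetensors"
        missing=1
    fi

    return "$missing"
}
```

From the VideoX-Fun directory, one possible use is `check_layout models && python examples/flux2_fun/predict_t2i_control.py`, so the script only starts when both items are in place.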