Nvidia driver update - reactor node

I updated my Nvidia drivers today, prompted by ComfyUI not starting and showing a message to update to the latest drivers.

Since then, ComfyUI is breaking more often than normal: the ‘browser refresh’ message appears during a video generation about 3 out of 10 times. Also, although it was never great, ReActor has simply stopped working completely, even after a re-install through the Manager. When it did work, almost every vid was hitting the NSFW block even though the content was nothing more than an adult kiss, so I gave up with it anyway.

I tried to install the NSFW version of ReActor but that failed miserably, something to do with CUDA, light something, DLLs and wheels.. far too complex to understand, and ChatGPT had no clue… it almost ruined my entire PC build with its hopeless ability to be used as a diag tool.

Anyway, rant over. Is anyone else experiencing these issues, and/or does anyone know a path to a fix? Many thanks in advance

Hmm, it looks like ReActor can be quite fragile around ComfyUI updates.


I do not think this is simply “the NVIDIA driver update broke ReActor.”

My guess is that your environment is now in a mixed state:

ComfyUI was updated or moved to a newer Python / PyTorch / CUDA stack
↓
The new ComfyUI stack required a newer NVIDIA driver
↓
The driver update made the base ComfyUI stack partially usable again
↓
But ReActor still depends on ONNX Runtime GPU / CUDA DLLs / cuDNN / Python wheels
↓
Those ReActor-side dependencies no longer match cleanly
↓
Trying an NSFW fork or another ReActor variant probably made the Python environment more mixed

So the problem is probably not one single thing. It is more likely a combination of:

  • ComfyUI base stack change
  • NVIDIA driver requirement
  • PyTorch CUDA version
  • ONNX Runtime GPU provider
  • CUDA / cuDNN DLL loading
  • ReActor install script / Python environment
  • possibly a fork or NSFW variant modifying dependencies
  • video workflow instability on top of that

That is why reinstalling only the NVIDIA driver, or only reinstalling ReActor from Manager, may not fix it.

The important distinction

There are two different GPU stacks involved here:

ComfyUI / PyTorch stack:
  used for most image generation

ReActor / ONNX Runtime stack:
  used for face analysis / face swap / face restore paths

These can break independently.

So this can happen:

torch.cuda.is_available() == True
ComfyUI can generate images
but ReActor still fails
because ONNX Runtime CUDAExecutionProvider is broken

That is a common source of confusion.
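To make the "two stacks" point concrete, here is a minimal sketch that reports both stacks side by side. It only uses calls already shown in this thread (torch.cuda.is_available(), onnxruntime.get_available_providers()); the function name check_gpu_stacks is just an illustrative name I made up. Run it with the embedded python.exe of the install you are testing.

```python
import importlib


def check_gpu_stacks():
    """Report the PyTorch and ONNX Runtime GPU status independently."""
    report = {}

    # Stack 1: PyTorch, used by ComfyUI for most image generation.
    try:
        torch = importlib.import_module("torch")
        report["torch"] = {
            "version": torch.__version__,
            "cuda_build": torch.version.cuda,
            "cuda_available": torch.cuda.is_available(),
        }
    except Exception as exc:  # missing package or DLL load failure
        report["torch"] = {"error": repr(exc)}

    # Stack 2: ONNX Runtime, used by ReActor's face analysis / swap paths.
    try:
        ort = importlib.import_module("onnxruntime")
        providers = ort.get_available_providers()
        report["onnxruntime"] = {
            "version": ort.__version__,
            "providers": providers,
            # Listed does not prove the CUDA DLLs will load at session time.
            "cuda_provider_listed": "CUDAExecutionProvider" in providers,
        }
    except Exception as exc:
        report["onnxruntime"] = {"error": repr(exc)}

    return report


if __name__ == "__main__":
    for name, info in check_gpu_stacks().items():
        print(name, info)
```

If the torch entry looks healthy and the onnxruntime entry shows an error or only the CPU provider, that is the "ComfyUI works, ReActor fails" situation described above.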

Why the driver update may have been necessary

The current ComfyUI Windows portable documentation says the portable build includes Python 3.13 and PyTorch CUDA 13.0, and it tells users to update NVIDIA drivers if ComfyUI does not start.

For CUDA 13.x, NVIDIA’s compatibility table requires a 580+ NVIDIA driver.

So updating the driver was probably reasonable.

But the driver is only one layer. ReActor can still fail if ONNX Runtime GPU, cuDNN, CUDA DLLs, or Python wheels are mismatched.

Why ReActor specifically can break after updates

ReActor is not just a simple ComfyUI node. It pulls in a separate face-processing stack.

ONNX Runtime’s CUDA provider documentation is especially important because it explains that CUDA and cuDNN major versions must match the ONNX Runtime build. In particular, cuDNN 8.x and cuDNN 9.x builds are not interchangeable.

That means ReActor can fail even if the rest of ComfyUI looks okay.

There are also similar ReActor failure reports. They are not all exactly the same as your case, but they show the same general pattern: ReActor can break when ComfyUI, Python, PyTorch, ONNX Runtime, CUDA, cuDNN, and custom-node updates move together.

I would not keep repairing the current folder in-place

Because you already tried several things, the current environment may be hard to reason about.

It may now contain some combination of:

official ReActor
old ReActor files
Manager-installed files
fork / NSFW variant files
onnxruntime
onnxruntime-gpu
changed numpy / opencv / protobuf / insightface packages
possibly packages installed into system Python
possibly packages installed into ComfyUI embedded Python

At that point, it is very hard to know which layer is broken.

So I would not keep pressing random fixes in the current folder.

I would do this instead:

1. Preserve the broken folder as a backup
2. Create a clean updated ComfyUI portable install next to it
3. Confirm the clean ComfyUI base works
4. Confirm PyTorch CUDA works
5. Move models only
6. Install official ReActor only
7. Verify ONNX Runtime providers
8. Test ReActor on one still image
9. Only then move old workflows and video nodes back

This is usually faster than trying to surgically repair a mixed Python environment.


Recommended recovery path

Step 0: Stop changing the broken install

Rename the current folder, but do not delete it:

ComfyUI_windows_portable
->
ComfyUI_windows_portable_broken_backup

Keep it because it may contain:

models/
input/
output/
user/
workflows/
ReActor face models
logs
custom-node list

But do not copy these wholesale into the new install:

python_embeded/
custom_nodes/

Especially do not copy python_embeded. If that environment is already mixed, copying it transfers the problem.

ComfyUI Portable uses its own embedded Python environment, so the Python environment matters a lot.

Step 1: Save a diagnostic snapshot of the broken install

From the broken folder, run:

cd C:\path\to\ComfyUI_windows_portable_broken_backup

nvidia-smi > recovery-baseline.txt

.\python_embeded\python.exe -c "import sys; print(sys.version)" >> recovery-baseline.txt

.\python_embeded\python.exe -c "import torch; print('torch=', torch.__version__); print('torch cuda=', torch.version.cuda); print('cuda available=', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no gpu')" >> recovery-baseline.txt

.\python_embeded\python.exe -c "import onnxruntime as ort; print('ort=', ort.__version__); print('providers=', ort.get_available_providers())" >> recovery-baseline.txt

.\python_embeded\python.exe -m pip freeze > pip-freeze-broken.txt

If some of these fail, that is okay. The failure itself is useful information.

Step 2: Verify the NVIDIA driver once

Run:

nvidia-smi

Check the driver version.

If your ComfyUI portable build is the new CUDA 13.0 one, the driver should be 580+ according to NVIDIA’s CUDA compatibility table.

Do not keep reinstalling drivers repeatedly. Verify it once, then move on.

Also note: the CUDA Version shown by nvidia-smi is the maximum CUDA API version supported by the driver. It is not necessarily the same as the CUDA version used by PyTorch inside ComfyUI. For that, use torch.version.cuda.
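If you prefer a scripted check over reading the nvidia-smi table, something like the sketch below could work. It assumes nvidia-smi is on PATH and uses its --query-gpu=driver_version query. The 580 threshold is simply the number quoted above for CUDA 13.x, and the helper names are made up for illustration.

```python
import shutil
import subprocess


def parse_driver_major(version_text):
    """Return the major part of a driver version string such as '581.29'."""
    first_line = version_text.strip().splitlines()[0]
    return int(first_line.split(".")[0])


def driver_meets_minimum(version_text, minimum_major=580):
    # 580 is the threshold quoted in this post for CUDA 13.x builds.
    return parse_driver_major(version_text) >= minimum_major


def read_driver_version():
    """Ask nvidia-smi for the driver version, or return None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = read_driver_version()
    if version is None:
        print("nvidia-smi not found or returned nothing")
    elif driver_meets_minimum(version):
        print(version, "meets the 580+ requirement")
    else:
        print(version, "is older than 580")
```

This only checks the driver layer. The CUDA version PyTorch was built against still comes from torch.version.cuda.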

Step 3: Create a fresh ComfyUI portable install

Create a clean folder, for example:

C:\AI\ComfyUI_clean\

At this stage, do not install:

ReActor
ReActor NSFW fork
video custom nodes
old custom_nodes
old python_embeded

Start clean ComfyUI with custom nodes disabled:

cd C:\AI\ComfyUI_clean

.\python_embeded\python.exe -s ComfyUI\main.py --disable-all-custom-nodes --windows-standalone-build

ComfyUI's troubleshooting documentation describes this method.

Then run a very basic built-in image workflow.

At this point you are only testing:

driver
ComfyUI base
embedded Python
PyTorch
basic image generation

You are not testing ReActor yet.

If this fails

If clean ComfyUI cannot generate one basic image, stop. ReActor is not the problem yet.

If this works

Then the base environment is probably good, and you can move to ReActor later.

Step 4: Verify PyTorch CUDA in the clean install

In the clean folder:

cd C:\AI\ComfyUI_clean

.\python_embeded\python.exe -c "import torch; print('torch=', torch.__version__); print('torch cuda=', torch.version.cuda); print('cuda available=', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no gpu')"

You want:

cuda available = True
GPU name printed

If torch.cuda.is_available() is false, do not install ReActor yet. Fix the base GPU/PyTorch/driver problem first.

Step 5: Move models, not the entire old environment

Move or share only the model files.

Typical folders:

models/checkpoints/
models/vae/
models/clip/
models/clip_vision/
models/loras/
models/controlnet/
models/upscale_models/
models/diffusion_models/
models/text_encoders/
models/insightface/
models/reactor/

If the models are large, use extra_model_paths.yaml instead of copying everything.

The goal is:

new clean program environment
+
old model files

not:

new clean program environment
+
old broken Python packages
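Before moving anything, it can help to see which of those model folders actually exist in the old install and how big they are. A minimal sketch, assuming the folder names listed above; the path in the example is a placeholder, and model_folders_to_move is just an illustrative name:

```python
from pathlib import Path

# Folder names copied from the list in this post; your install may differ.
MODEL_SUBFOLDERS = [
    "checkpoints", "vae", "clip", "clip_vision", "loras", "controlnet",
    "upscale_models", "diffusion_models", "text_encoders",
    "insightface", "reactor",
]


def folder_size_bytes(folder):
    """Total size of all files under a folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def model_folders_to_move(old_models_dir):
    """Map each known model subfolder that exists to its size in bytes."""
    base = Path(old_models_dir)
    found = {}
    for name in MODEL_SUBFOLDERS:
        folder = base / name
        if folder.is_dir():
            found[name] = folder_size_bytes(folder)
    return found


if __name__ == "__main__":
    # Placeholder path: point this at the models folder of the old install.
    old_models = r"C:\path\to\ComfyUI_windows_portable_broken_backup\ComfyUI\models"
    for name, size in model_folders_to_move(old_models).items():
        print(f"{name}: {size / 1e9:.2f} GB")
```

Anything not in that list (python_embeded, custom_nodes) is deliberately ignored, which is the whole point of this step.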

Step 6: Install only the official ReActor

Use the official repository and install it into the clean ComfyUI:

cd C:\AI\ComfyUI_clean\ComfyUI\custom_nodes

git clone https://github.com/Gourieff/ComfyUI-ReActor

cd ComfyUI-ReActor

.\install.bat

Important: the installer must use the ComfyUI portable embedded Python.

If you see a prompt like this:

I couldn't find an embedded version of Python,
but I did find Python <version> in your Windows PATH.
Would you like to proceed with the install using that version? (Y/N)

I would answer N and stop.

That is very close to a failure mode seen in other reports.

For portable ComfyUI, using system Python is usually the wrong direction. You want packages installed into the ComfyUI instance you actually run.

Step 7: Check ONNX Runtime after ReActor install

Before opening your old workflow, run:

cd C:\AI\ComfyUI_clean

.\python_embeded\python.exe -c "import onnxruntime as ort; print('ort=', ort.__version__); print('providers=', ort.get_available_providers())"

Ideal result:

['CUDAExecutionProvider', 'CPUExecutionProvider']

If you only see:

['CPUExecutionProvider']

then ReActor may still import, but ONNX Runtime GPU is not working.

Also check whether both CPU and GPU ONNX Runtime packages are present:

.\python_embeded\python.exe -m pip list | findstr /i "onnxruntime"

Be suspicious of a mixed state like:

onnxruntime
onnxruntime-gpu

The key idea is:

PyTorch CUDA working does not automatically prove ONNX Runtime CUDA is working.

They are related, but not identical.

Step 8: Test ReActor with one still image

Do not test video first.

Do not test the old workflow first.

Do not turn on face restore / GPEN / Face Boost first.

Start with the smallest possible workflow:

Load Image
Load Image
ReActorFaceSwap
Save Image

Start with:

Face Restore: off
Face Boost: off
Video: off
Batch size: tiny

Then test in this order:

1. still image face swap
2. still image face swap + face restore
3. small image batch
4. short low-resolution video
5. normal video
6. face restore / boost / upscale in video

This order matters because some reports involve the restore/boost paths and CUDA DLL failures.

Step 9: Replace the ReActor node in old workflows

Do not simply open the old workflow and press Queue.

ReActor’s README says old workflow nodes may need to be deleted and re-added after updates, because node inputs or definitions can change.

So the safer process is:

open old workflow
delete old ReActor node
add new ReActor node from the current install
reconnect inputs
save as a new workflow
test on one image

This avoids confusing an old serialized node definition with a CUDA or ONNX problem.

Step 10: Bring video custom nodes back last

The “browser refresh” / reconnect symptom during video generation may be a separate issue from ReActor.

ComfyUI uses WebSocket communication for execution progress, node status, error/debug information, and queue updates.

If the backend process crashes, hangs, runs out of VRAM/RAM, or a custom video node throws an exception, the browser may appear to refresh or reconnect.

So test video in this order:

1. video workflow without ReActor
2. short low-resolution video without ReActor
3. short low-resolution video with ReActor
4. normal video with ReActor
5. face restore / boost / upscale only after that

If video fails without ReActor, ReActor is not the remaining main problem.


What I would avoid

Until the clean environment works, I would avoid:

ComfyUI Manager -> Update All repeatedly
installing several ReActor variants
mixing official ReActor with NSFW forks
copying old custom_nodes wholesale
copying old python_embeded
using system Python pip
running random pip install -U commands
testing directly with a large video workflow
enabling Face Boost / GPEN / restore in the first test

Especially avoid commands like this unless you know exactly why:

pip install -U onnxruntime-gpu numpy opencv-python torch

If you must use pip in ComfyUI portable, use the embedded Python for the specific ComfyUI install you are running:

C:\AI\ComfyUI_clean\python_embeded\python.exe -m pip install <package>

not just:

pip install <package>
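If you are ever unsure which Python a command or script is really using, sys.executable tells you. The sketch below checks whether the running interpreter lives inside a given ComfyUI folder; is_inside is a hypothetical helper name, and the folder is the example path from this post.

```python
import sys
from pathlib import Path


def is_inside(executable, install_root):
    """True if the interpreter path lives under the given install folder."""
    exe = Path(executable).resolve()
    root = Path(install_root).resolve()
    return root in exe.parents


if __name__ == "__main__":
    # Example path from this post; change it to your own install folder.
    comfy_root = r"C:\AI\ComfyUI_clean"
    print("running:", sys.executable)
    print("inside ComfyUI install:", is_inside(sys.executable, comfy_root))
```

If this prints False when you expected True, whatever you install with that interpreter will not land in the ComfyUI you actually run.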

Success checklist

I would consider the updated environment repaired only after these are true:

nvidia-smi works
driver version is suitable for the ComfyUI CUDA stack
fresh ComfyUI portable starts
basic image generation works
torch.cuda.is_available() is True
ReActor imports without error
onnxruntime.get_available_providers() includes CUDAExecutionProvider
minimal still-image ReActor workflow works
old workflow works after replacing the ReActor node
short video workflow works without ReActor
short video workflow works with ReActor
full video workflow works

Short version

I would not try to keep repairing the current folder in-place.

I would:

1. rename the broken install and keep it as backup
2. install a fresh current ComfyUI portable
3. confirm basic ComfyUI + PyTorch CUDA first
4. move models only
5. install only official ReActor
6. verify ONNX Runtime providers
7. test one still image
8. replace old ReActor nodes in workflows
9. bring video workflows back last

That gives you the best chance of ending up with a properly updated environment instead of a half-rolled-back, half-updated one.

wow… very comprehensive… I take the point about a mixed up environment… makes sense

I did exactly as suggested: new install, and all the tests and cmd commands appeared to work correctly with no errors. The basic ReActor faceswap on a simple image also worked! However, I noticed this in the command window when using ReActor:
“2026-05-26 10:52:37.6787763 [E:onnxruntime:Default, provider_bridge_ort.cc:2369 onnxruntime::TryGetProviderInfo_CUDA] E:_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1962 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading “D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll” which depends on “cublasLt64_12.dll” which is missing. (Error 126: “The specified module could not be found.”)”

absolutely no idea how to fix this and chatgpt is hopeless at this stuff

Yeah, ONNX Runtime breaks quite easily after environment changes… switching it to CPU temporarily may be a good move. Maybe something like this?


That error is very informative. It means the clean reinstall probably helped, but ONNX Runtime’s CUDA provider is still not loading correctly.

This line is the key:

Error loading:
D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll

which depends on:
cublasLt64_12.dll

which is missing.

So I would read the current state like this:

Clean ComfyUI install: probably OK
Basic PyTorch CUDA test: probably OK, if your earlier torch command passed
Basic ReActor still-image swap: works
ONNX Runtime CUDAExecutionProvider: broken or drifting
ONNX Runtime CPUExecutionProvider: probably being used as fallback

So this is not the same problem as before. You made progress.

Before, the whole environment was probably mixed. Now the problem looks narrower:

ReActor can run, but ONNX Runtime GPU acceleration cannot load the CUDA 12 DLLs it expects.

That missing cublasLt64_12.dll is a CUDA 12 cuBLAS runtime DLL. ONNX Runtime’s CUDA provider is trying to load it, and Windows cannot find it.
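To see whether that DLL exists anywhere the embedded Python could plausibly reach, a small search like the one below may help. It is a rough sketch: it scans site-packages recursively and the PATH folders non-recursively, which is not exactly how Windows resolves DLLs, so treat a hit as a hint rather than proof. find_dll is a made-up helper name.

```python
import os
import sys
from pathlib import Path


def find_dll(dll_name, roots, recursive=False):
    """Return paths under the given folders whose file name matches dll_name."""
    wanted = dll_name.lower()
    hits = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue
        candidates = root.rglob("*") if recursive else root.iterdir()
        try:
            for path in candidates:
                if path.name.lower() == wanted:
                    hits.append(path)
        except OSError:
            # Unreadable folder: skip it rather than abort the whole search.
            continue
    return hits


if __name__ == "__main__":
    name = "cublasLt64_12.dll"
    site_dirs = [p for p in sys.path if p.rstrip("\\/").endswith("site-packages")]
    path_dirs = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    hits = find_dll(name, site_dirs, recursive=True) + find_dll(name, path_dirs)
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print(name, "not found in site-packages or PATH folders")
```

Run it with the embedded python.exe. No hits at all would be consistent with the error message. A hit inside some package folder would suggest the DLL exists but is not on the search path ONNX Runtime uses.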

Important caveat: this may be real GPU/runtime drift

I would not assume this is only a harmless warning.

It might be harmless if ReActor falls back to CPU and the output looks correct. But it might also mean the GPU execution layer genuinely drifted.

In other words, this could be real drift in the execution substrate:

GPU model / architecture
NVIDIA driver
PyTorch CUDA build
onnxruntime-gpu version
CUDA runtime DLLs
cuDNN runtime DLLs
Windows DLL search path
ComfyUI embedded Python environment

That matters because ONNX Runtime GPU support is sensitive to the exact combination of CUDA, cuDNN, Python, PyTorch, and GPU architecture.

So CPU mode is not “the final answer.” It is a stabilization and isolation step:

First: make ReActor work reliably on CPU.
Then: treat ONNX Runtime GPU as a separate optimization / migration problem.

This keeps you from breaking the clean install again while trying to fix GPU acceleration.

Why the face swap still worked

Because ONNX Runtime can often fall back to CPU.

So this can happen:

ONNX Runtime tries CUDAExecutionProvider
CUDA provider fails because a CUDA DLL is missing
ONNX Runtime continues with CPUExecutionProvider
ReActor still produces an image
console prints scary CUDA errors

That would explain:

simple ReActor image test works
but console shows cublasLt64_12.dll errors

The main downside may be speed.

But if later video or face-restore paths become unstable, then the broken CUDA provider may matter more.
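If you want to confirm the fallback rather than guess, ONNX Runtime can tell you which providers a session actually ended up with via InferenceSession.get_providers(). A minimal sketch, assuming you have some .onnx model file to load (the path is up to you; I am not naming a specific ReActor model here). The helper names are illustrative only.

```python
CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def fell_back_to_cpu(requested, active):
    """True when CUDA was requested but the session is not using it."""
    return CUDA in requested and CUDA not in active


def session_providers(model_path, requested):
    """Create a session and return the providers it really ended up with."""
    # Imported lazily so fell_back_to_cpu() works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=requested)
    return session.get_providers()


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        requested = [CUDA, CPU]
        active = session_providers(sys.argv[1], requested)
        print("requested:", requested)
        print("active:   ", active)
        print("fell back to CPU:", fell_back_to_cpu(requested, active))
    else:
        print("usage: python check_fallback.py <path-to-model.onnx>")
```

If "active" contains only the CPU provider while CUDA was requested, the image was produced on CPU and the console errors are the failed CUDA attempt.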

First: capture the exact GPU/runtime state

Before changing packages again, I would record this:

cd D:\ComfyUI_windows_portable

nvidia-smi

.\python_embeded\python.exe -c "import sys; print('python=', sys.version)"

.\python_embeded\python.exe -c "import torch; print('torch=', torch.__version__); print('torch cuda=', torch.version.cuda); print('cuda available=', torch.cuda.is_available()); print('gpu=', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no gpu'); print('capability=', torch.cuda.get_device_capability(0) if torch.cuda.is_available() else 'no gpu')"

.\python_embeded\python.exe -c "import onnxruntime as ort; print('onnxruntime=', ort.__version__); print('providers=', ort.get_available_providers())"

.\python_embeded\python.exe -m pip list | findstr /i "onnxruntime torch nvidia cuda cudnn"

The GPU name and compute capability matter. For example, if someone is on a very new GPU architecture, old CUDA wheels or runtime packages can genuinely be wrong even if the code did not change.

So I would save this output somewhere before fixing anything.

Interpret the provider result

Run:

cd D:\ComfyUI_windows_portable

.\python_embeded\python.exe -c "import onnxruntime as ort; print('onnxruntime=', ort.__version__); print('providers=', ort.get_available_providers())"

Case A

providers= ['CUDAExecutionProvider', 'CPUExecutionProvider']

This means ONNX Runtime package-level CUDA support is present, but loading the actual CUDA DLLs may still fail at runtime.

Case B

providers= ['CPUExecutionProvider']

or:

providers= ['AzureExecutionProvider', 'CPUExecutionProvider']

This means ONNX Runtime is effectively CPU-only.

Case C

Error loading onnxruntime_providers_cuda.dll
missing cublasLt64_12.dll

This means onnxruntime-gpu is installed, but the CUDA 12 runtime DLL it expects is not findable.

Check for mixed ONNX Runtime packages

This is worth checking:

.\python_embeded\python.exe -m pip list | findstr /i "onnxruntime"

A confusing state would be:

onnxruntime
onnxruntime-gpu

Ideally, for a clean CPU-only temporary state, you want only:

onnxruntime

For a clean GPU state, you want only:

onnxruntime-gpu

Mixed CPU/GPU ONNX packages can make debugging harder.

Safest short-term workaround: make ONNX Runtime CPU-only

Because the simple ReActor swap already works, I would probably stabilize first.

The idea is:

Remove broken ONNX Runtime GPU provider
Use CPU ONNX Runtime
Verify ReActor still works
Verify the cublasLt64_12.dll warning disappears
Then decide whether GPU acceleration is worth fixing

Commands:

cd D:\ComfyUI_windows_portable

.\python_embeded\python.exe -m pip uninstall -y onnxruntime-gpu

.\python_embeded\python.exe -m pip install --upgrade --force-reinstall onnxruntime

Then verify:

.\python_embeded\python.exe -c "import onnxruntime as ort; print('onnxruntime=', ort.__version__); print('providers=', ort.get_available_providers())"

A good temporary result is:

providers= ['CPUExecutionProvider']

or sometimes:

providers= ['AzureExecutionProvider', 'CPUExecutionProvider']

Then restart ComfyUI and test the same simple ReActor face swap again.

If the warning disappears and the image still works, then you have a stable baseline:

ComfyUI works
ReActor works
ONNX Runtime uses CPU
no missing cublasLt64_12.dll warning

That is a good recovery state.

Not necessarily the final fastest state, but a good stable state.

Why CPU-only may be the right next move

Because you have already proven something important:

fresh install worked
basic commands worked
basic ReActor image swap worked

So the remaining issue is narrower:

ONNX Runtime GPU acceleration is not loading

That should be treated separately.

If you try to fix ONNX GPU immediately, you may end up changing:

onnxruntime-gpu
CUDA runtime packages
cuDNN packages
PyTorch
Windows PATH
NVIDIA driver
ReActor dependencies

all at once again.

That is exactly how the original mixed environment probably happened.

So I would first get a clean CPU baseline, then handle GPU acceleration as a second phase.

If you later want to fix ONNX Runtime GPU

Only do this after CPU ReActor is stable.

The goal would be:

Make onnxruntime-gpu able to find the CUDA 12 / cuDNN DLLs it expects.

There are several possible paths.

Option A: Try ONNX Runtime GPU with CUDA/cuDNN extras

ONNX Runtime documents installation forms that can bring CUDA/cuDNN runtime dependencies into the Python environment.

In ComfyUI portable, use embedded Python:

cd D:\ComfyUI_windows_portable

.\python_embeded\python.exe -m pip uninstall -y onnxruntime onnxruntime-gpu

.\python_embeded\python.exe -m pip install --upgrade "onnxruntime-gpu[cuda,cudnn]"

Then test:

.\python_embeded\python.exe -c "import onnxruntime as ort; ort.preload_dlls(); print('onnxruntime=', ort.__version__); print('providers=', ort.get_available_providers()); ort.print_debug_info()"

But I would treat this as a second-phase experiment, not the immediate recovery step.

Option B: Preload PyTorch / NVIDIA DLLs before ONNX Runtime

ONNX Runtime documents that preloading DLLs can help it use CUDA/cuDNN DLLs from PyTorch or NVIDIA Python packages.

The basic idea is:

import torch
import onnxruntime

or:

import onnxruntime
onnxruntime.preload_dlls()

But whether this helps inside ComfyUI depends on when ReActor creates its ONNX Runtime sessions. If the custom node creates sessions before the preload happens, the preload test may work in a standalone command but not fix the node.

So this is useful for diagnosis, but not always a clean user-level fix.

Option C: Install CUDA runtime globally

You could install CUDA 12 runtime/toolkit so Windows can find cublasLt64_12.dll.

But I would not start there.

Global CUDA and PATH edits can make things harder to reason about. For ComfyUI portable, I would prefer keeping dependencies inside the portable Python environment where possible.

Do not confuse these two questions

There are now two separate questions:

Question 1:
Can ReActor work reliably at all?

Question 2:
Can ReActor use ONNX Runtime GPU acceleration?

You have probably answered Question 1: yes, at least for a simple still image.

Question 2 is still unresolved.

That means the practical recovery route is:

First stabilize ReActor CPU.
Then decide whether ONNX GPU is worth fixing.

What I would do next

My exact next steps would be:

1. Save the current command outputs
2. Confirm the simple ReActor still-image test works
3. Check ONNX Runtime providers
4. Remove onnxruntime-gpu
5. Install CPU onnxruntime
6. Restart ComfyUI
7. Confirm the cublasLt64_12.dll warning is gone
8. Test simple ReActor still image again
9. Test your real still-image workflow
10. Test short video
11. Only later try ONNX Runtime GPU again

Commands:

cd D:\ComfyUI_windows_portable

.\python_embeded\python.exe -m pip list | findstr /i "onnxruntime"

.\python_embeded\python.exe -m pip uninstall -y onnxruntime-gpu

.\python_embeded\python.exe -m pip install --upgrade --force-reinstall onnxruntime

.\python_embeded\python.exe -c "import onnxruntime as ort; print('onnxruntime=', ort.__version__); print('providers=', ort.get_available_providers())"

Then restart ComfyUI.

How to interpret the result

Good temporary result

providers= ['CPUExecutionProvider']
ReActor still works
no cublasLt64_12.dll warning

Meaning:

You now have a stable CPU ReActor setup.
It may be slower, but it is clean.

Still noisy

cublasLt64_12.dll warning still appears

Then onnxruntime-gpu is probably still installed somewhere in the ComfyUI embedded Python environment, or another package is still importing the GPU provider.

Check again:

.\python_embeded\python.exe -m pip list | findstr /i "onnxruntime"

ReActor stops working after CPU-only install

Then reinstall official ReActor using its installer:

cd D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-ReActor

.\install.bat

But make sure it uses:

D:\ComfyUI_windows_portable\python_embeded\python.exe

not system Python.

Short version

Your new install probably worked.

The remaining error is probably:

ONNX Runtime GPU provider is broken or drifting
but ReActor can still run through CPU provider

So I would not reinstall everything again.

I would first stabilize ONNX Runtime as CPU-only:

cd D:\ComfyUI_windows_portable

.\python_embeded\python.exe -m pip uninstall -y onnxruntime-gpu

.\python_embeded\python.exe -m pip install --upgrade --force-reinstall onnxruntime

Then test ReActor again.

If that works and the warning disappears, you have a usable recovery state.

After that, GPU acceleration can be treated as a separate ONNX Runtime / CUDA / cuDNN / GPU-architecture migration problem, not as the main ReActor recovery problem.

Again… very comprehensive…

Here’s where we are now..

Basic reactor (single image to image) works

More advanced (image to video, 81 frames): runs, but always seems to hit the NSFW detection and produces a black square, even though there is no NSFW content, literally just someone walking, nothing more than a bare leg!
onnxruntime 1.26.0

There are no error or warning messages about this in the command window (the cublasLt64_12.dll warning still appears, though).

So I guess this is progress

I notice there is a NSFW reactor node - should I try it?

Yeah. I think trying that direction is probably faster. For ONNX, if the current speed feels too slow, you can sort out the GPU side later.


At this point I would probably stop treating the cublasLt64_12.dll warning as the main blocker.

It is still a real issue, but it now looks like a performance / ONNX CUDA provider issue, not the thing preventing ReActor from working.

The current situation looks more like this:

Clean ComfyUI install: working
Basic ReActor still-image face swap: working
Video workflow: runs
ONNX Runtime CUDA provider: still not healthy
ONNX Runtime CPU fallback: probably working
Current visible blocker: black output / SFW checker / NSFW detector

So if the current speed is acceptable, I would leave ONNX alone for now.

Do not keep chasing CUDA DLLs yet. That path can easily re-break the clean install.

The cublasLt64_12.dll warning basically means:

onnxruntime-gpu tried to load CUDAExecutionProvider
↓
onnxruntime_providers_cuda.dll needed cublasLt64_12.dll
↓
Windows could not find that CUDA 12 DLL
↓
CUDAExecutionProvider failed
↓
ReActor likely continued through CPUExecutionProvider

That is not ideal, but if ReActor runs and speed is tolerable, I would postpone it.

The more important problem now is probably the SFW checker

The current GitHub ReActor build is intentionally SFW-oriented. The README says it has a nudity detector to avoid use with 18+ content.

There are also reports where ReActor outputs black images because the SFW checker is too sensitive.

So if your workflow is now running but the result becomes black squares, that sounds less like “ReActor is broken” and more like:

ReActor works
but the SFW / NSFW checker is blocking the output

If the checker is the real blocker, then trying to fix ONNX GPU first probably will not solve the black output.

I would not modify the now-working install directly

The current install is valuable because it works.

I would back it up before doing anything else.

For example:

ComfyUI_windows_portable
->
ComfyUI_windows_portable_sfw_working_backup

Then create a second copy for experiments:

ComfyUI_windows_portable_reactor_experiment

Use one as the stable backup, and one as the test install:

ComfyUI_windows_portable_sfw_working_backup
  = known working clean/SFW install

ComfyUI_windows_portable_reactor_experiment
  = experimental ReActor install

Do not test alternative ReActor builds inside the only working install.

That is how the environment can get mixed again.

Why I would separate the installs

ReActor-related installs can modify or depend on packages such as:

onnxruntime
onnxruntime-gpu
opencv
numpy
protobuf
insightface
transformers
timm
face restore packages

If multiple ReActor variants are installed together, it becomes hard to know which one is loading and which dependency changed.

So I would avoid this:

official ComfyUI-ReActor
+ old reactor node
+ forked reactor node
+ Manager-installed reactor
+ manual clone reactor

Instead, in the experiment copy, keep one ReActor variant at a time.

That is the main rule.

Share models instead of copying everything

If disk space is an issue, you do not have to duplicate all models.

ComfyUI supports extra_model_paths.yaml for sharing model folders between ComfyUI installs.

So the structure can be:

D:\AI\Models\
  checkpoints\
  vae\
  loras\
  controlnet\
  insightface\
  reactor\

D:\ComfyUI_windows_portable_sfw_working_backup\
D:\ComfyUI_windows_portable_reactor_experiment\

Then point both ComfyUI installs to the shared model directory.

This keeps the program environments separate while avoiding huge duplicate model folders.

Suggested practical path

I would do this:

1. Keep the current install as a backup.
2. Do not fix ONNX GPU right now.
3. Do not run Manager Update All.
4. Make a second ComfyUI copy for ReActor experiments.
5. In the experiment copy, remove/disable other ReActor variants.
6. Install/test only one ReActor variant at a time.
7. Test one still image first.
8. Test one extracted video frame.
9. Test a tiny 5–10 frame video.
10. Only then test the full 81-frame workflow.

The test order matters.

Do not start with the full video workflow.

Use this order:

still image
↓
one problematic extracted frame
↓
several extracted frames
↓
very short video
↓
full video
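For the "several extracted frames" step, you do not need all 81 frames. A tiny helper like this picks a handful of evenly spaced frame numbers to test, so you cover the start, middle and end of the clip. It only computes indices; how you extract those frames depends on your video nodes. pick_frame_indices is a name I made up.

```python
def pick_frame_indices(total_frames, sample_count):
    """Return evenly spaced frame indices, including the first and last frame."""
    if total_frames <= 0 or sample_count <= 0:
        return []
    if sample_count >= total_frames:
        return list(range(total_frames))
    if sample_count == 1:
        return [0]
    step = (total_frames - 1) / (sample_count - 1)
    return [round(i * step) for i in range(sample_count)]


if __name__ == "__main__":
    # 81 frames is the clip length mentioned in this thread.
    print(pick_frame_indices(81, 5))
```

If only some of the sampled frames go black, that points at the checker reacting to specific frames rather than at the video pipeline as a whole.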

How to interpret the result

If the experimental ReActor build works on still images

Good. Then the base face swap path is healthy.

If it works on extracted frames

Then the SFW checker was probably the main blocker.

If extracted frames work but video fails

Then the issue is probably video pipeline integration, batch handling, frame order, or memory pressure.

If it fails even on a still image

Then the experimental install itself is not healthy yet.

Keep ONNX GPU as a later task

I would treat the ONNX warning like this:

Known issue:
  ONNX Runtime CUDAExecutionProvider cannot load cublasLt64_12.dll.

Current workaround:
  Let ReActor run through CPU fallback if speed is acceptable.

Later optimization:
  Fix onnxruntime-gpu / CUDA / cuDNN only if CPU speed is too slow.

Do not combine these two projects:

Project A:
  get a ReActor build that does not black out the output

Project B:
  restore ONNX Runtime GPU acceleration

Do Project A first.

Only do Project B later if needed.

One more caution

I would avoid manually patching random files inside the working install.

It is cleaner to preserve the working state and test a separate build/branch in a separate ComfyUI copy.

Also, obviously, this should only be used for content where you have the rights and consent to use the faces/images involved. Face-swap tooling can easily cross legal and ethical lines, so I would keep the test environment private and controlled.

Short version

I would do this:

Back up the current working install.
Leave ONNX GPU alone for now.
Use CPU fallback if the speed is acceptable.
Create a separate ComfyUI experiment copy.
Test a different ReActor build/branch there, not in the stable install.
Keep only one ReActor variant in that copy.
Test still image → extracted frame → short video → full workflow.

If that solves the black output, then the SFW checker was probably the real blocker.

If it does not, then the next target is the video pipeline, not ONNX CUDA.

So it's been a tough learning curve and I still have a very long way to go, but thanks to the advice in this post (and no thanks to Copilot) I finally have a complete workflow and a method to create a video scene - woohooo. It’s not perfect but it’s good enough, and I’ve thoroughly enjoyed the challenges: fabricated rings on hands, spots appearing from nowhere, facial inconsistency, clothing suddenly changing style... (well, you all know the challenges)

Now for my next challenge and it’s one I wish I had combined with this project… audio

My main workflow is a native Wan2.2 one - not even sure, it's a bit of a hybrid with SDXL I think… not figured this bit out yet… anyway, there is no audio. The audio icon on the save video node is not active…
I’ve tried to add a node but I’m not sure that’s the answer, as this looks like it simply adds a prerecorded audio track. I just need to prompt speech through the video…

Any help is always appreciated, although I have a horrible suspicion that my homemade workflow may now have reached its limits, so I'm happy to create a new one (I still only have an 8GB VRAM setup)

many thanks in advance

Hmm… if you are trying to do audio too on 8GB VRAM, maybe something like this:


I would separate the “audio problem” into three different problems:

| Layer | What it means | 8GB VRAM practicality |
| --- | --- | --- |
| 1. Attach audio to a video | Add an existing .wav / .mp3 track to the generated video | Very realistic |
| 2. Generate speech | Create the dialogue audio from text or a recording | Realistic, especially if done outside the video workflow |
| 3. Lip-sync / audio-driven motion | Make the mouth, face, head, or body follow the audio | Possible, but should be treated as a separate later workflow |

So I would not try to solve all of this in one giant ComfyUI graph at first.

For 8GB VRAM, the practical order is probably:

1. Generate or record the voice separately
2. Generate the silent video with your current working workflow
3. Mux the audio into the video
4. If the mismatch is too obvious, try a simple lip-sync post-process
5. Only then look at heavier audio-driven video systems

The most important point is this:

An audio input on a video node usually means “I can attach an existing audio stream,” not “I can generate speech from the prompt.”

So if the save/combine video node has an audio socket or audio icon, that does not necessarily mean Wan/ReActor is generating audio. It usually means you can pass in an existing audio object and have it combined into the final file.

Recommended path for 8GB VRAM

I would start with the boring, common, well-documented path:

TTS or recorded voice
↓
silent generated video
↓
mux audio into the video
↓
optional Wav2Lip / LatentSync pass if lip-sync is needed

This is less magical than a full audio-driven video model, but it is much more realistic on 8GB VRAM.

Why I would not start with Wan2.2-S2V locally

Wan2.2-S2V is closer to the ideal solution: image/video + audio → speech-driven video.

But I would not start there on 8GB VRAM.

Wan2.2-S2V-14B exists and is the more “native” speech-to-video direction.

However, the official model card / README examples are much heavier than an 8GB local setup. The S2V route is more like:

high-VRAM GPU / cloud / hosted workflow

not:

easy local 8GB ComfyUI workflow

So I would treat Wan2.2-S2V as the “ideal future path,” not the first recovery path.

Step 1: audio attachment / muxing

For simply putting audio into a generated video, the common ComfyUI path is usually something like:

LoadAudio
+
Video Combine

VideoHelperSuite’s Video Combine node is useful because it combines image frames into a video, and if an optional audio input is provided, it can combine that audio into the output video.

So the first test should be very simple:

short generated silent video
+
short audio file
↓
Video Combine
↓
video with audio

Do not start with a full long clip. Start with 5–10 seconds.

Good first test settings

Duration: 5–10 seconds
One speaker
One face
Front-facing if possible
No scene cuts
No camera chaos
Audio length roughly equals video length

This first step answers only one question:

Can I attach audio to the video at all?

It does not solve lip-sync yet.
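
If the Video Combine route misbehaves, the same attach-only step can be done outside ComfyUI with ffmpeg. Below is a minimal sketch that builds the command from Python. The file names are hypothetical, and it assumes ffmpeg is installed and on your PATH. It copies the video stream untouched, encodes the audio as AAC, and stops at the shorter of the two inputs.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that attaches an audio file to a silent video."""
    return [
        "ffmpeg", "-y",
        "-i", str(video_path),   # input 0: the silent generated video
        "-i", str(audio_path),   # input 1: the speech track
        "-map", "0:v:0",         # take video from the first input
        "-map", "1:a:0",         # take audio from the second input
        "-c:v", "copy",          # do not re-encode the video
        "-c:a", "aac",           # encode audio to AAC for an .mp4 container
        "-shortest",             # stop at the shorter of the two streams
        str(output_path),
    ]


def mux_audio(video_path, audio_path, output_path):
    """Run the mux; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)


# Hypothetical usage:
# mux_audio("silent_clip.mp4", "voice.wav", "clip_with_audio.mp4")
```

Because the video is copied rather than re-encoded, this is fast and does not touch the GPU at all.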

Step 2: generate the speech separately

For speech, I would initially use a separate TTS or recorded voice path.

Possible options:

| Option | Why use it | Caveat |
| --- | --- | --- |
| Recorded voice | Simplest and most predictable | Requires recording |
| External TTS | Often easiest | May require API/account |
| ComfyUI TTS node | Keeps more inside ComfyUI | Adds more dependencies |
| Voice cloning TTS | Better character voice control | More setup and ethical/legal care |

ComfyUI TTS/audio nodes exist, but I would keep them separate from the main video workflow at first.

For the first working version, I would not care too much where the voice comes from. The important thing is to get a clean .wav or .mp3 that you can attach to the video.

Step 3: if lip-sync is needed, start with common tools

You probably will eventually want lip-sync. But I would not start with the newest full audio-driven video system.

For beginner-friendly debugging, I would try the older/common lip-sync route first:

generated video
+
speech audio
↓
Wav2Lip or LatentSync
↓
lip-synced video

This is a post-processing step. It is different from asking Wan to generate the whole video from the audio.

Beginner-friendly first lip-sync option: Wav2Lip

Wav2Lip is older, but that is actually a benefit for debugging. There are many examples, tutorials, and failure reports around it.

Why Wav2Lip first?

older
common
more tutorials
more known failure cases
simpler mental model

The mental model is straightforward:

input video + input audio → output video with adjusted mouth movement

It may not be the best quality, but it is often a good first proof of concept.

Expected Wav2Lip problems

Wav2Lip can struggle with:

small faces
side views
covered mouths
fast head movement
multiple faces
low-resolution faces
strong stylization
large camera motion
long clips

So the first test should be intentionally easy:

one person
face visible
mouth visible
short clip
audio length close to video length

Better-quality next option: LatentSync

If Wav2Lip works but the quality is not good enough, I would try LatentSync next.

LatentSync is newer and likely to give better results in some cases, but it also has more moving parts.

The main practical issues to expect are:

video length vs audio length mismatch
fps mismatch
audio sample-rate expectations
face detection failure
small/side faces
long clip instability
VRAM pressure
dependency issues

A very common beginner mistake is trying:

5-second video + 30-second audio

and expecting a full 30-second lip-synced output. Tools often behave according to the video length, the audio length, or internal chunking assumptions. So keep the first test very short and matched:

5-second video
+
5-second audio
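
A quick way to catch a length mismatch before running any lip-sync tool is to compare the two durations. This is a rough sketch using only the Python standard library. It assumes the audio is an uncompressed PCM .wav file (the wave module does not read .mp3), and the 16 fps and 80-frame numbers in the example are assumptions, so use whatever your workflow actually outputs.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def video_duration_seconds(frame_count, fps):
    """Duration of a clip from its frame count and frame rate."""
    return frame_count / fps


def durations_match(video_seconds, audio_seconds, tolerance_seconds=0.25):
    """True if the two durations differ by no more than the tolerance."""
    return abs(video_seconds - audio_seconds) <= tolerance_seconds


# Example: an 80-frame clip at an assumed 16 fps is 5 seconds long.
clip_seconds = video_duration_seconds(80, 16)
print(clip_seconds)                         # 5.0
print(durations_match(clip_seconds, 5.1))   # True
print(durations_match(clip_seconds, 30.0))  # False
```

If the check fails, trim or regenerate the audio first rather than hoping the lip-sync tool handles the difference.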

What I would not do first

I would not start with these on 8GB VRAM:

| Tool / direction | Why not first |
| --- | --- |
| Wan2.2-S2V-14B | Much closer to ideal, but too heavy for 8GB local first attempt |
| InfiniteTalk | More powerful audio-driven video/dubbing direction, but more complex |
| FantasyTalking / WanVideo adapter workflows | Potentially strong, but heavier and more fragile |
| HunyuanVideo-Avatar | High-end audio-driven human animation; not a simple 8GB beginner route |
| Long multi-scene lip-sync | Too many failure points at once |

These are worth knowing about, but I would keep them as later options.

InfiniteTalk is interesting because it does not only try to modify the lips. It aims to align lip sync, head movement, body posture, and facial expression from an input video and audio track. That is more ambitious than Wav2Lip-style mouth replacement. But that also means it is not where I would start on a small local setup.

Recommended practical workflow

I would use this staged approach.

Phase 1 — prove audio muxing

Goal:

Can I attach audio to my generated video?

Test:

1. Generate a 5-second silent video
2. Create or record a 5-second audio file
3. Load the audio
4. Combine video + audio
5. Export

Use VideoHelperSuite's Video Combine node for this.

Do not care about lip-sync yet.

Phase 2 — create better speech

Goal:

Can I make the voice track I actually want?

Options:

recorded voice
external TTS
ComfyUI TTS node
voice-cloning TTS

Output:

clean wav/mp3
same approximate duration as the video

Phase 3 — basic lip-sync attempt

Goal:

Can I make the mouth roughly match the audio?

First try:

Wav2Lip

Then, if needed:

LatentSync

Test conditions:

5–10 seconds
single person
front-facing
no cuts
mouth visible
audio length ~= video length

Phase 4 — scale up carefully

Only after the short test works:

longer clip
higher resolution
more motion
more camera movement
more stylized faces
multiple scenes

If it breaks, go back to a shorter clip.

Phase 5 — advanced audio-driven video

Only later consider:

InfiniteTalk
Wan2.2-S2V
HunyuanVideo-Avatar
FantasyTalking
MMAudio for sound effects/background audio

This is where you explore more modern full audio-driven motion systems, but it is not the first 8GB route.

Suggested decision table

| Goal | First thing to try | If not enough | Avoid at first |
| --- | --- | --- | --- |
| Just add sound | VideoHelperSuite Video Combine | ffmpeg/editor mux | S2V |
| Generate dialogue | external TTS or simple ComfyUI TTS | better TTS / voice cloning | full audio-driven video |
| Basic lip-sync | Wav2Lip | LatentSync | InfiniteTalk first |
| Better lip-sync quality | LatentSync | short-clip advanced tools | long full-scene test |
| Body/head/audio-driven performance | InfiniteTalk-like workflows | cloud/high-VRAM workflows | 8GB local full setup |
| Sound effects/background audio | MMAudio-like V2A tools | manual SFX editing | treating it as dialogue TTS |

Practical advice for 8GB VRAM

For 8GB VRAM, I would think in terms of short, separate stages:

video generation
audio generation
audio muxing
lip-sync pass
final edit

not one huge all-in-one workflow.

A good first target:

5-second talking clip
one face
one voice
audio attached
rough lip-sync

A bad first target:

81-frame or longer full workflow
multiple cuts
stylized face
moving camera
ReActor
Wan video
TTS
lip-sync
audio mux
all in one graph

The second version has too many things that can fail at once.

The simple recommendation

If I had to pick a practical beginner route, I would do:

1. Keep your current working video workflow.
2. Generate or record the voice separately.
3. Use VideoHelperSuite / Video Combine to attach the audio.
4. If lip-sync is necessary, try Wav2Lip on a 5–10 second clip.
5. If Wav2Lip is too low-quality, try LatentSync.
6. Treat Wan2.2-S2V / InfiniteTalk / HunyuanVideo-Avatar as later high-end options.

That path is not the fanciest, but it is probably the most debuggable.

And with 8GB VRAM, “debuggable” matters more than “most advanced.”

Still figuring out how to prevent my Wan2.1 workflow from inventing things in a clip that mess up the integrity of the scene.

For example, when hands disappear out of shot for whatever reason and then come back into the scene in a later shot, they are covered with rings and hideous nail colours… The problem is, earlier clips show no rings or nail colour, so this looks off.
I tried all sorts of prompts like ((nude nails : 1.4)) and negative prompts for things like nail colour, rings on fingers, bracelets etc., but nothing seems to work… I even tried taking the last generated frame with the accessories and removing them in I2I, then reran the I2V using first frame / last frame… very time consuming, and then new rings and nail colours appear in the next clip.

This is a bit of a show stopper really… can't have this inconsistency - it's tough enough maintaining facial consistency… I just get close enough, then a ring appears in the scene… really frustrating… it sometimes takes me an hour just to produce a 6 second clip with consistency.

Any advice is always welcome

Hmm, this is not really a Wan-only thing, but there is a pretty hard ceiling on what prompt changes alone can fix in the model output:


I think this is probably not just a prompt problem.

What you are describing sounds like a mix of:

image-to-video continuity drift
+
occlusion / out-of-frame re-entry
+
small-detail resampling
+
clip-to-clip artifact accumulation

In simpler terms:

hand is visible with no rings / nude nails / no bracelet
↓
hand leaves the frame, becomes hidden, or crosses a clip boundary
↓
model no longer has a strong visual anchor for the exact hand details
↓
hand re-enters the frame
↓
model re-invents plausible "pretty hand" details
↓
rings, nail polish, bracelets, manicured nails, etc. appear

That is frustrating, but it is also a very normal kind of failure in AI video workflows.

I would not interpret it as:

The negative prompt is wrong.

I would interpret it more like:

The model is not being visually constrained strongly enough after the hand disappears.

So I would treat the hands, nails, rings, and bracelets as continuity props, not just prompt words.

If the hands matter to the scene, manage them the same way you would manage a face, outfit, tattoo, logo, or important object.

Direct answer

For this case, I would not try to solve it only with:

no rings
no nail polish
((nude nails:1.4))

That can help, but it is not a hard constraint.

A better practical stack is:

| Layer | What it does | 8GB VRAM suitability | How much I would rely on it |
| --- | --- | --- | --- |
| Positive continuity prompt | Tells Wan what must remain stable | Safe | Medium |
| Negative prompt | Reduces unwanted concepts | Safe | Low/medium |
| Shot design | Avoids giving the model a chance to re-invent the hand | Safe | High |
| Short clips | Reduces drift and error accumulation | Safe | High |
| Clean first frames | Gives each new clip a corrected visual anchor | Safe | Very high |
| Still-image I2I / inpaint | Fixes bad keyframes before continuing | Usually realistic | Very high |
| Video inpaint / VACE | Repairs a region across frames | Later/experimental on 8GB | Medium, but not first |
| Fun Control / pose/depth/canny | Controls motion/structure more than accessories | Later/experimental | Medium |
| Reference / LoRA / identity workflows | Helps character/object consistency | Later | Case-dependent |

The short version:

Prompt can bias the model.
Clean frames and shot planning constrain the model.

For your particular artifact, I would start with:

structured positive prompt
+
shorter shots
+
avoid hand leaving frame
+
clean keyframe before each new clip
+
still-image inpaint if the hand becomes wrong

Only after that would I try heavier video-editing workflows.

Why this is not only a Wan issue

This is a broader image-to-video problem.

Image-to-video generation starts from an image and tries to extend it into a temporally coherent video. But maintaining visual consistency across the subject, background, style, and details is still a known hard problem.

Useful background:

The exact research papers are not necessary for your workflow, but they are useful context: even research systems treat consistency as a central problem. So if Wan invents a ring after the hand disappears, that is not surprising. It is a practical version of the same general issue.

Why hands, nails, and jewelry are especially fragile

Some details are much harder to preserve than others.

| Detail type | Usually easier/harder | Why |
| --- | --- | --- |
| Character gender/body outline | Easier | Large global feature |
| Hair color | Medium | Visible but can drift |
| Clothing color | Medium | Large but texture can change |
| Face identity | Hard | Needs identity preservation |
| Hands/fingers | Hard | Complex anatomy, small shapes |
| Nails | Very hard | Tiny detail, easily stylized |
| Rings/bracelets | Very hard | Small accessories, often “beautified” by model |
| Logos/text | Very hard | Fine detail and exact symbols |

The model can preserve:

same woman
same room
same general outfit
same cinematic look

while still changing:

nails
rings
bracelets
finger details
small clothing trim

That is why prompt changes may improve the probability but not guarantee continuity.

Wan-specific prompt advice

For Wan, I would use a more structured prompt rather than only short tokens.

The practical idea is to describe:

subject
scene
camera
motion
style
continuity constraints

not only a list of forbidden objects.

For example, instead of only:

no rings, nude nails

I would include a positive continuity block:

Her hands remain plain and bare throughout the entire shot.
Her fingernails are short, natural, nude, and unpainted.
She wears no rings, no bracelets, no bangles, no gemstones, and no hand jewelry.
When her hand moves or re-enters the frame, it remains the same plain bare hand as in the first frame.
The hand details remain visually consistent from the first frame to the last frame.

Then use a supporting negative prompt:

rings, ring, bracelet, bracelets, bangle, bangles, hand jewelry, gemstones, nail polish, painted nails, colorful nails, red nails, pink nails, black nails, long nails, fake nails, acrylic nails, decorated nails, manicure, inconsistent hands, changing jewelry

But I would still think of this as only Layer 1.

Prompt template to try

Positive prompt:

A cinematic realistic shot of the same woman in the same outfit and the same environment. Her hands remain plain and bare throughout the entire shot. Her fingernails are short, natural, nude, and unpainted. She wears no rings, no bracelets, no bangles, no gemstones, and no hand jewelry. When her hand moves, partially leaves view, or re-enters the frame, it remains the same plain bare hand as in the first frame. The hands, fingernails, sleeves, and skin texture remain visually consistent from the first frame to the last frame. Natural realistic fingers, no added accessories, no cosmetic nail styling.

Negative prompt:

rings, ring, bracelet, bracelets, bangle, bangles, hand jewelry, gemstones, jewel, jewels, nail polish, painted nails, colorful nails, red nails, pink nails, black nails, glossy nails, long nails, fake nails, acrylic nails, decorated nails, manicure, jeweled hands, inconsistent hands, changing jewelry, extra fingers, deformed fingers, distorted hands

This is worth trying.

But again:

This can reduce the issue.
It probably cannot fully guarantee the issue disappears.

Why negative prompts are weak here

Negative prompts are not the same as a hard mask or a hard rule.

They can reduce probability, but they do not force exact pixel-level continuity.

This is especially true when the artifact is a small plausible detail. In cinematic/fashion/beauty-like imagery, the model may have a learned tendency to add things like:

manicured nails
glossy nails
rings
bracelets
jewelry

because those are visually common in similar training contexts.

There are also practical discussions around Wan negative-prompt behavior.

I would still use negatives, but I would not make them the main fix.

The main failure trigger: hand leaves frame

The most important practical observation in your case is probably this:

The hand leaves the frame, then returns changed.

That suggests the problem is not simply:

Wan cannot draw plain hands.

It is more likely:

Wan can draw the current hand, but does not preserve the exact small details after occlusion/out-of-frame/re-entry.

So the easiest practical fix is to avoid the failure trigger.

If hands/nails/jewelry continuity matters, avoid shots where:

the hand exits the frame
the hand goes behind the body
the hand is hidden by hair/clothing/object
the hand becomes tiny
the hand crosses the edge of the frame
the hand returns after several seconds
the clip boundary happens while the hand is missing or distorted

Better shot planning:

keep the hands visible
keep the hands relaxed
cut before the hand disappears
start a new clip from a clean hand frame
avoid long hand re-entry movements
do not make the hand the focus unless necessary

This is less exciting than a technical fix, but often more reliable.

A useful practical rule:

If the model keeps re-inventing something after it disappears, avoid making it disappear.

Shot design table

| If you want… | Avoid… | Prefer… |
| --- | --- | --- |
| No rings / no bracelets | Hand leaving frame and returning | Hand stays visible |
| Nude natural nails | Close-up beauty hand gestures | Relaxed hands, less emphasis |
| Stable sleeve/hand relation | Hand crossing body/face/hair | Simpler arm motion |
| Stable hands across clips | Last frame with bad hand | Clean corrected first frame |
| Long scene continuity | One long uncontrolled generation | Several short controlled clips |
| Fewer hand artifacts | Tiny/blurred hands | Larger, clearer, simpler hands |

First/last frame is useful, but not magic

First/last frame workflows are useful, but they are not a total continuity solution.

The issue is that first/last frames give boundary guidance, but they do not guarantee that every tiny detail will remain stable inside the generated video.

Also, if your last frame already contains a bad ring/nail/bracelet artifact, and you feed it into the next clip, you may accidentally tell the next generation:

This ring is now part of the scene.

So I would not do this blindly:

clip A last frame
↓
clip B first frame

I would do this:

clip A last frame candidate
↓
inspect it
↓
fix hands/nails/jewelry as a still image if needed
↓
use the corrected clean image as clip B first frame

That gives the model a cleaner anchor.

Clean keyframe strategy

This is probably the most important workflow change.

A practical clip-chaining workflow:

1. Generate clip A.
2. Pick the best final frame or near-final frame.
3. Inspect hands, nails, jewelry, sleeves, face, clothing.
4. If anything is wrong, fix the frame as a still image.
5. Use the corrected frame as clip B's first frame.
6. Generate clip B.
7. Repeat.

This is slower, but it avoids artifact accumulation.

Bad pipeline:

generate video
↓
use last frame automatically
↓
artifact enters next clip
↓
artifact becomes stronger
↓
continuity gets worse

Better pipeline:

generate short video
↓
select clean boundary frame
↓
repair boundary frame
↓
continue from repaired frame
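
The better pipeline above can be sketched as a simple loop. This is only an illustration of the control flow: generate_clip, pick_boundary, is_clean and repair are hypothetical callables that stand in for your I2V run, your frame selection, your visual inspection and your still-image fix. None of them are real ComfyUI functions.

```python
def chain_clips(first_frame, prompts, generate_clip, pick_boundary, is_clean, repair):
    """Chain clips so that every new clip starts from an inspected frame.

    generate_clip(anchor, prompt) -> clip
    pick_boundary(clip)           -> candidate frame for the next clip
    is_clean(frame)               -> True if hands/nails/jewelry look right
    repair(frame)                 -> corrected still image
    """
    clips = []
    anchor = first_frame
    for prompt in prompts:
        clip = generate_clip(anchor, prompt)
        clips.append(clip)

        # Never pass the raw boundary frame forward without inspection.
        candidate = pick_boundary(clip)
        if not is_clean(candidate):
            candidate = repair(candidate)
        anchor = candidate
    return clips
```

The important design choice is that the anchor is only updated after the inspect/repair step, so an artifact in one clip cannot silently become the starting point of the next one.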

Still-image inpainting before video inpainting

Since you are on 8GB VRAM, I would start with still-image repair before video repair.

ComfyUI has basic inpainting workflows.

The simple idea:

bad hand frame
↓
mask rings / nails / bracelet area
↓
inpaint as plain bare hand
↓
use repaired image as next first frame

Still-image inpainting is not perfect, but it is usually easier to debug than video inpainting.

For your case, I would try:

I2I / inpaint keyframes first
VACE / video inpaint later

What to do when the bad detail appears inside the clip

If a bad ring/bracelet/nail appears inside an otherwise good clip, you have several options.

| Option | Best when | 8GB friendliness | Notes |
| --- | --- | --- | --- |
| Regenerate the short clip | Clip is short | High | Often easiest |
| Change seed | Artifact is seed-dependent | High | Quick test |
| Change shot | Hand re-entry is causing it | High | Best structural fix |
| Cut around the bad frames | Artifact is brief | High | Normal editing approach |
| Repair a keyframe and regenerate | You need continuity | Medium/high | Good compromise |
| Still-image inpaint frames | Few frames matter | Medium | Manual but controllable |
| Video inpaint / VACE | Good clip except local hand issue | Lower | More advanced |

For a first pass, I would not attempt to perfectly repair a long clip. I would make the clip shorter and regenerate.

8GB VRAM reality check

This matters because some solutions are technically relevant but may not be the best first move for your setup.

Wan2.1 has a 1.3B model that is described as consumer-grade, needing around 8.19GB of VRAM.

But that does not automatically mean every Wan editing workflow, VACE workflow, 14B workflow, or video-inpainting setup will be comfortable on an 8GB card.

So I would keep the first-line plan lightweight:

prompt
shot design
short clips
clean first frames
still-image inpaint

and treat heavier tools as experiments.

VACE / video inpainting

VACE is relevant, but I would not put it first for an 8GB setup.

The concept is exactly relevant:

mask the bad hand/nail/jewelry area
↓
inpaint the local region
↓
keep the rest of the video

For your case:

mask the hand area
remove ring / nail polish / bracelet
restore plain bare hand

However, video inpainting can be harder than it sounds. There are practical reports of VACE/video inpainting damaging areas outside the mask or creating texture issues.

So I would phrase VACE like this:

Very relevant idea.
Worth testing later on a short clip.
Not my first recommendation on 8GB VRAM.

If you do try it:

test 2–3 seconds
low resolution first
small mask
one problem area
compare before/after carefully
do not start with a full scene

Fun Control / control workflows

Wan Fun Control workflows may be useful if the problem is hand movement, pose, or structure.

These workflows can use control signals like:

Canny
Depth
OpenPose
MLSD
trajectory

This can help with:

hand position
body movement
camera structure
silhouette
motion path

But it probably does not directly solve:

no rings
no nail polish
no bracelet

So I would put Fun Control in this category:

Useful for motion/shape control.
Not a direct jewelry/nail continuity fix.
Try later if hand motion itself is unstable.

Reference / subject consistency workflows

Reference-based workflows may help with general character and clothing consistency.

These are interesting for:

same character
same outfit
same subject
same visual identity

But I would be cautious about expecting them to guarantee:

exact nail color
no tiny ring
same bracelet absence

They may help the whole image stay more consistent, but tiny hand accessories are still a very hard target.

LoRA / training route

A LoRA or character-specific workflow may help if you need repeated scenes with the same character and outfit.

But for your immediate problem, I would not go there first.

Training or reference workflows are more useful when the problem is:

the character keeps changing
the outfit keeps changing
the face keeps changing

Your current issue is narrower:

hands/nails/accessories are being invented after motion/occlusion

So I would first solve it with:

shot design
keyframes
still repair
short clips

not training.

Suggested practical order

If this were my workflow, I would test in this order:

| Order | Test | Why |
| --- | --- | --- |
| 1 | Add positive hand-continuity wording | Easy and free |
| 2 | Keep negative prompt, but simplify expectations | It may help but not guarantee |
| 3 | Generate a shot where the hands never leave frame | Tests whether re-entry is the trigger |
| 4 | Shorten the clip | Reduces drift |
| 5 | Change seed | Checks if artifact is seed-specific |
| 6 | Use a manually cleaned first frame | Stronger continuity anchor |
| 7 | Fix bad boundary frames with still inpaint | Prevents artifact accumulation |
| 8 | Regenerate only the short bad clip | Cheaper than fixing a long clip |
| 9 | Try VACE/video inpaint on 2–3 seconds | Later experiment |
| 10 | Try Fun Control/reference workflows | Later experiment |

Tiny test matrix

Before investing in the full scene, I would run a tiny experiment.

| Test | Setup | What it tells you |
| --- | --- | --- |
| A | Original prompt | Baseline |
| B | Add positive hand continuity prompt | Prompt sensitivity |
| C | Same prompt, hand never leaves frame | Whether re-entry causes the artifact |
| D | Same shot, shorter clip | Whether duration causes drift |
| E | Clean corrected first frame | Whether visual anchor helps |
| F | Different seed | Whether it is random seed dependent |
| G | Still-inpainted first frame + shorter clip | Best lightweight solution |
| H | VACE/inpaint short test | Whether local video repair is worth the complexity |

If test C is much better than A, then your main enemy is:

hand disappears → model re-invents details

If test E/G is much better, then your best tool is:

clean keyframes

If none of them helps, then the prompt/model combination may simply be biased toward styled hands/jewelry and you may need a different shot or workflow.

A concrete “safe” 8GB workflow

1. Keep the current working Wan workflow.
2. Do not add several new heavy systems at once.
3. Generate only a short clip.
4. Avoid hand leaving frame.
5. Use positive hand continuity wording.
6. Inspect final frames.
7. If hands/nails/jewelry are wrong, fix the still frame.
8. Use the fixed still frame as the next clip's first frame.
9. Only after this works, test VACE/Fun Control/reference workflows separately.

That keeps your current setup from getting unstable again.

What I would avoid for now

I would avoid:

long clips
multiple chained clips without inspection
automatically feeding bad last frames into the next clip
trying VACE on a full scene first
installing several new video custom nodes at once
treating "no rings" as a hard rule
forcing everything through one giant workflow

Especially after your earlier ReActor/ONNX/environment trouble, I would keep each new experiment isolated.

My practical expectation

I would expect roughly this:

| Method | Expected effect |
| --- | --- |
| Negative prompt only | Small improvement, not reliable |
| Better positive continuity prompt | Useful, but not enough |
| Keeping hands visible | Very useful |
| Shorter clips | Very useful |
| Clean first frames | Very useful |
| Still-image inpaint on boundary frames | Very useful |
| VACE/video inpaint | Potentially useful, but more complex |
| Fun Control | Useful for motion/pose, not directly jewelry |
| Reference workflows | Useful for character consistency, uncertain for nails/rings |

So my best guess is:

Do not fight this only at the prompt level.
Make the continuity visible and explicit.
Then repair/anchor the frames where continuity matters.

Final mental model

I would think of it like film continuity:

hands = continuity prop
nails = continuity prop
rings/bracelets = forbidden props
clean first frame = continuity anchor
short clip = safer shot
inpainted keyframe = reset point

Once a bad ring appears, do not let it become part of the next generation.

Go back to the last clean frame, repair it if needed, and continue from there.

Thanks again for the tips… I had to laugh though… I used the posi and neg prompts you advised… first gen I got 3 rings on one hand, a chunky bracelet and sleeves from nowhere

so clearly prompts are not the solution but thanks anyway

I am determined to crack this…
I also tried first frame > last frame using the exact same frame for both, to see if this would tighten up the facial consistency. I then pick a new end frame for the start of the next clip (say frame 50 of 80)… this ensures I have a different start frame for each new clip and don’t just keep going back to the same end frame, which would look unnatural.

Even this is a battle… Across frames 1>80 I can see a very gradual light increase (very, very slight and hardly noticeable). Except, after 4 or 5 clips this light increase becomes very noticeable… it looks like the characters have been on a sun bed by the end of clip 4, and it’s clearly not the same scene anymore.

I tried tweaking the diffusions (high and low) but it makes no difference. I have a 2-pass KSampler setup with cfg = 1, KS1 steps 0-2 and KS2 steps 2-4… any higher and the time to gen a 7 sec clip becomes too long on an 8GB GPU.

As always, look forward to any advice

Hmm, probably:


I think you have found a second, separate problem now.

The earlier rings / nails / bracelets issue was mostly:

object / detail continuity drift

This new issue sounds more like:

color / exposure / skin-tone drift

They are related because both are continuity problems, but they need slightly different fixes.

Your description is very specific:

frames 1 → 80:
  very slight gradual light increase

after 4–5 chained clips:
  the small increase has accumulated
  the character looks much brighter / more tanned
  the scene no longer feels like the same scene

That sounds less like a bad prompt and more like cumulative clip-chain drift.

Direct answer

I would not try to solve this only by increasing steps or tweaking the prompt.

On an 8GB GPU, I would first test a pipeline fix:

shorter clip
+
do not pass the raw last frame forward
+
create a corrected "clean continuity frame"
+
color/exposure match that frame to a stable reference
+
use the corrected frame as the next clip's first frame

In other words, I would stop thinking of the last frame as automatically “safe.”

I would treat the last frame as:

a candidate frame that needs inspection and correction before it becomes the next first frame

Why this seems like a real Wan/I2V behavior

There are a few reports that look close to what you are seeing.

Wan2GP has a very similar issue report:

The description there is basically:

every time the video is extended
brightness/light increases
the video becomes progressively more overexposed

That is extremely close to your “after 4 or 5 clips” observation.

There is also this one:

That report describes 5–6 second Wan2.1 I2V generations where brightness increases progressively toward the end of the video.

Wan2.1 itself also has a long-frame color-shift/flicker issue report:

That issue specifically mentions long generations such as 121 frames and says the problem appears when going beyond 81 frames.

So your 80-frame / 7-second setup is right near the area where I would be suspicious of frame-length instability.

There was also an older Wan2.1 I2V frame-count discussion:

I would not over-interpret that as “81 is always the magic number,” because workflows and implementations differ. But it does suggest that Wan2.1 I2V has historically been very sensitive around that frame-count region.

The likely mechanism

I would describe the mechanism like this:

clip 1 starts from a good frame
↓
within clip 1, exposure/skin tone drifts slightly brighter
↓
clip 1 last frame is now a little brighter than the original
↓
clip 2 uses that brighter frame as its new starting point
↓
clip 2 drifts a little brighter again
↓
repeat 4–5 times
↓
the accumulated change becomes obvious

So the problem is not necessarily that any single clip is terrible.

The problem is that a very small change becomes the new baseline each time.

This is why it can be “hardly noticeable” in one 80-frame clip, but obvious after several clips.
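
A rough sketch of that arithmetic, assuming a hypothetical 3% brightening inside each clip (the 3% figure and the function name `chain_brightness` are illustrative assumptions, not measurements from Wan):

```python
def chain_brightness(start, gain_per_clip, clips, reset_to=None):
    """Return the brightness at the start of each clip in a chain.

    start:         brightness of the original reference frame (arbitrary units)
    gain_per_clip: assumed fractional brightening inside one clip
    reset_to:      if given, the boundary frame is corrected back to this
                   value before it becomes the next first frame
    """
    levels = [start]
    current = start
    for _ in range(clips):
        current = current * (1.0 + gain_per_clip)  # drift inside the clip
        if reset_to is not None:
            current = reset_to                     # color reset at the boundary
        levels.append(current)
    return levels


raw = chain_brightness(100.0, 0.03, 5)
corrected = chain_brightness(100.0, 0.03, 5, reset_to=100.0)
print([round(v, 1) for v in raw])        # [100.0, 103.0, 106.1, 109.3, 112.6, 115.9]
print([round(v, 1) for v in corrected])  # [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
```

A 3% change is easy to miss inside one clip, but five chained clips compound it to roughly 16% in this toy model, while a reset at each boundary keeps every clip starting from the same level.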

Two different continuity problems

I would separate the current issues like this:

| Problem | What changes | Likely cause | Best first response |
|---|---|---|---|
| Object/detail drift | rings, nails, bracelets, sleeves | model re-invents small details after motion/occlusion | shot design, clean keyframes, still inpaint |
| Color/exposure drift | skin tone, brightness, contrast, warmth | small exposure/color shift accumulates through chained clips | shorter clips, color reset, reference matching |
| Identity drift | face changes | weak identity constraint over time | better reference frames, face restore/swap, shorter clips |
| Motion drift | pose/action becomes different | long generation / weak control | shorter clips, control workflows |
| Scene drift | room/background changes | insufficient visual anchor | stronger first frames, fewer chained assumptions |

Right now, #12 is mostly the second one:

color/exposure drift

This is not mainly a prompt problem

You can try adding phrases like:

consistent lighting
constant exposure
same skin tone
no exposure change
no brightness shift
same color grading

but I would not expect that to fully solve it.

The issue is happening inside the generated image sequence and then getting inherited by the next clip.

A prompt can bias the model, but it cannot reliably enforce:

frame 80 must have exactly the same exposure distribution as frame 1

So I would put prompt tweaks low on the priority list.

Why increasing steps may not be the best first fix

You said:

2-pass KSampler
cfg = 1
KS1 steps 0–2
KS2 steps 2–4
higher steps make a 7 sec clip too slow on 8GB GPU

That is important.

On an 8GB card, trying to fix this by simply raising steps may not be practical. It may help a little, or it may not, but it makes every experiment expensive.

I would first test cheaper variables:

frame count
clip length
raw last frame vs corrected first frame
color match on continuity frame
seed sensitivity
reference-frame color matching

Those tests may teach you more than just increasing steps.

First test: shorten the clip

Because there are Wan reports around color shift / flicker at longer frame counts, I would first test shorter clips.

For example:

| Test | Frames | Goal |
|---|---|---|
| A | 80 frames | current baseline |
| B | 48 frames | test whether drift reduces |
| C | 40 frames | stronger short-clip test |
| D | 32 frames | extreme stability test |

Keep everything else as similar as possible:

same source image
same prompt style
same sampler style
same resolution
same approximate scene

Then compare:

frame 1 vs final frame

If the brightness drift is much smaller at 40–48 frames, then the problem is probably strongly related to clip length / frame count.

That would be useful information.
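
If you want a number rather than an eyeball comparison, something like this could work. It is only a sketch: it assumes you have exported the two frames and loaded them as RGB arrays (0..255), and the function names are made up for illustration.

```python
import numpy as np

# Rec. 709 luma weights, applied directly to the stored RGB values.
LUMA_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def mean_luma(frame):
    """Mean luma of an RGB frame with shape (H, W, 3)."""
    rgb = np.asarray(frame, dtype=np.float64)
    return float((rgb @ LUMA_WEIGHTS).mean())


def luma_change_percent(first, last):
    """Brightness change from first to last frame, in percent of the first."""
    base = mean_luma(first)
    if base == 0:
        raise ValueError("first frame is completely black")
    return 100.0 * (mean_luma(last) - base) / base


# Synthetic stand-ins for frame 1 and the final frame.
first = np.full((4, 4, 3), 100, dtype=np.uint8)
last = np.full((4, 4, 3), 110, dtype=np.uint8)
print(round(luma_change_percent(first, last), 1))  # 10.0
```

Running the same measurement on the 80, 48, 40 and 32 frame tests gives one comparable number per test.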

Second test: do not chain raw last frames

I would test this difference:

Test A:
  clip A raw last frame → clip B first frame

Test B:
  clip A last frame
  → color/exposure corrected against original reference
  → clip B first frame

If Test B reduces the “sun bed” effect, then the main fix is not prompt or steps.

The main fix is:

color reset before chaining

Build a reference frame

Pick one frame as the color reference.

Usually one of these:

original input image
clip 1 frame 1
best-looking frame from clip 1
a manually corrected still frame

This becomes your “visual truth” for:

skin tone
exposure
contrast
warmth
color balance
scene brightness

Then every time you create a new continuity frame, compare it against that reference.

The mental model:

reference frame = color anchor
last frame = motion anchor
corrected continuity frame = next first frame

Suggested chaining pipeline

Instead of:

generate clip 1
↓
use raw last frame as clip 2 first frame
↓
generate clip 2
↓
use raw last frame as clip 3 first frame

try:

generate clip 1
↓
pick a good near-final frame
↓
compare it to the reference frame
↓
correct exposure/color/skin tone
↓
save this as clean continuity frame 1
↓
use clean continuity frame 1 as clip 2 first frame
↓
generate clip 2
↓
repeat

This turns clip chaining into a controlled pipeline instead of an automatic drift amplifier.

Practical “color reset” methods

There are several levels of correction.

| Method | Where | Good for | Notes |
|---|---|---|---|
| Manual correction | editor / image tool | one boundary frame | very controllable |
| ColorMatch | ComfyUI | matching one frame/image to reference | quick test |
| Histogram match | ComfyUI | approximate color distribution match | can help with exposure/color drift |
| LUT / grade | DaVinci / ffmpeg / editor | final clip consistency | good for final polish |
| Full-frame batch correction | ComfyUI / editor | all frames in a clip | more work, but useful |
| Video inpainting | ComfyUI | local objects/details | not the first fix for exposure drift |

ComfyUI-related nodes/resources worth looking at:

External tools:

I would not assume a ComfyUI color-match node will be perfect. It may be enough for boundary frames, though.

Important caution about ColorMatch

Color matching is useful, but it can also fail.

There is a KJNodes issue where Color Match caused problems when later frames were very white:

Also, if the camera angle or content changes too much, a pure color-match operation can produce weird results.

So I would test gently:

use 50% strength first if available
test on one frame
do not apply blindly to a whole long clip
compare before/after

For boundary-frame chaining, I would start with:

correct only the frame that becomes the next first frame

not:

color-match every frame aggressively
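
To show what a gentle, single-frame correction can look like outside ComfyUI, here is a minimal sketch of a per-channel mean/standard-deviation match with a strength control. This is a simple stand-in technique, not the algorithm used by any particular ColorMatch node, and `match_mean_std` is a hypothetical helper name.

```python
import numpy as np


def match_mean_std(frame, reference, strength=0.5):
    """Pull each RGB channel's mean/std toward the reference frame.

    strength=0.0 returns the frame unchanged, 1.0 applies the full match.
    Both inputs are (H, W, 3) arrays with values in 0..255.
    """
    src = np.asarray(frame, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    scale = ref_std / np.maximum(src_std, 1e-6)      # guard against flat channels
    matched = (src - src_mean) * scale + ref_mean
    blended = (1.0 - strength) * src + strength * matched
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Synthetic example: a reference frame and a brighter copy of it.
rng = np.random.default_rng(0)
reference = rng.integers(50, 150, size=(8, 8, 3)).astype(np.uint8)
boundary = np.clip(reference * 1.1 + 10, 0, 255).astype(np.uint8)
corrected = match_mean_std(boundary, reference, strength=0.5)
print(reference.mean(), boundary.mean(), corrected.mean())
```

Because it only matches global statistics, this kind of correction can go wrong when the framing or content differs a lot from the reference, which is one more reason to apply it to the boundary frame only and compare before/after.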

Boundary frame correction vs full clip correction

There are two different uses of color correction.

1. Boundary-frame correction

Goal:

prevent drift from being passed into the next generation

This is probably the most important for you.

Pipeline:

clip A final frame
↓
color match to reference
↓
use corrected frame as clip B first frame

2. Final clip grading

Goal:

make the finished clips look consistent after generation

Pipeline:

all clips generated
↓
bring into editor
↓
match exposure/contrast/skin tone across clips
↓
export final movie

Both are useful, but they solve different parts of the problem.

Boundary correction prevents the next generation from inheriting the drift.

Final grading makes the already-generated clips look consistent.

What I would test first

I would run a small matrix.

| Test | Change | What it tells you |
|---|---|---|
| 1 | Current 80-frame generation | baseline |
| 2 | 48-frame generation | whether shorter clips reduce drift |
| 3 | 40-frame generation | stronger short-clip check |
| 4 | 80-frame generation, different seed | whether seed matters |
| 5 | 80-frame generation, same source but lower motion | whether motion drives drift |
| 6 | raw last-frame chaining | baseline chaining drift |
| 7 | color-corrected continuity-frame chaining | whether color reset helps |
| 8 | final editor grade only | whether post grade is enough |

The highest-value test is probably:

80 frames vs 40–48 frames

and then:

raw last frame vs corrected continuity frame

Why 7 seconds may be too ambitious for stable chaining

A 7-second clip may be fine if you only need one clip.

But if you plan to chain:

clip 1
clip 2
clip 3
clip 4
clip 5

then each clip must not only look good by itself. It must also end in a good state for the next clip.

That is much harder.

A 7-second clip with a tiny internal exposure drift may look acceptable alone, but it becomes a bad building block for chaining.

So for chained scenes, shorter clips can be more stable:

3–4 seconds
or
40–48 frames

Then use editing to assemble the scene.

This may feel slower creatively, but it gives you more control.

The “do not pass drift forward” rule

This is probably the key rule.

Do not pass this forward:

slightly brighter last frame

Pass this forward:

motion-continuity frame corrected back to reference exposure/color

Once a frame becomes brighter and you use it as the next input, the model may treat the brighter skin/scene as the new normal.

That is the same idea as the earlier rings/nails issue:

if a bad ring becomes part of the next first frame,
the next clip may preserve or amplify the ring

For color:

if a brighter skin tone becomes part of the next first frame,
the next clip may preserve or amplify the brighter skin tone

Relationship to long-video research

This is not just a “you configured something wrong” issue.

Long/chained video generation has a general class of problems that goes by names like:

drift
error accumulation
exposure bias
temporal degradation
memory bottleneck

FramePack discusses this problem directly:

Another related long-video paper:

I am not saying you should switch to those immediately. With 8GB VRAM, that may not be the practical next step.

But these links are useful because they show that the problem category is real:

long video generation tends to accumulate errors over time

Your case is a practical Wan I2V version of that.

What I would not do first

I would not start with:

raising steps a lot
trying to fix it with "no overexposure" in the negative prompt
building a giant new workflow
installing several new video systems at once
trying full video inpainting first

Those may be useful later, but they are not the cheapest diagnostic tests.

Given your 8GB GPU, I would first test:

shorter clips
color reset between clips
boundary-frame correction
final grading

A possible safe workflow for your setup

Something like:

1. Pick one reference frame.
   Usually original image or clip 1 frame 1.

2. Generate clip 1, but test shorter length first.
   Try 40–48 frames.

3. Inspect frame 1 vs final frame.
   Look at exposure, skin tone, warmth, contrast.

4. Pick a near-final frame for continuity.
   Do not automatically use frame 80 if frame 80 is already drifting.

5. Color-correct that frame against the reference.
   Use ColorMatch / histogram match / manual editor correction.

6. Use the corrected frame as clip 2 first frame.

7. Repeat.

8. After all clips are generated, do final color grading across the finished clips.

If you must keep 80 frames, then I would be stricter about step 5.

Tiny diagnostic checklist

For each generated clip, check:

frame 1
frame 20
frame 40
frame 60
frame 80

Ask:

Is the face brighter?
Is the skin warmer?
Is contrast lower?
Are highlights clipping?
Is the background changing?
Is the whole frame brighter or only the character?

If the whole frame is drifting:

global exposure/color drift

If only the face/skin is drifting:

subject/skin-tone drift

If only highlights are growing:

overexposure/highlight clipping

Different drift types may need different correction.
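
The three cases in that checklist can be roughly separated with a small measurement. This is a sketch under assumptions: you supply a hand-made boolean mask marking the subject (it must cover part of the frame, not all or none of it), and the 250 clipping threshold is an arbitrary choice.

```python
import numpy as np


def drift_report(first, later, subject_mask, clip_level=250):
    """Compare a later frame with frame 1: whole frame, subject, background, highlights."""
    a = np.asarray(first, dtype=np.float64).mean(axis=2)   # simple per-pixel brightness
    b = np.asarray(later, dtype=np.float64).mean(axis=2)
    mask = np.asarray(subject_mask, dtype=bool)
    return {
        "whole_frame": float(b.mean() - a.mean()),
        "subject": float(b[mask].mean() - a[mask].mean()),
        "background": float(b[~mask].mean() - a[~mask].mean()),
        # change in the share of near-white pixels
        "clipped_fraction": float((b >= clip_level).mean() - (a >= clip_level).mean()),
    }


# Synthetic example: only the masked subject region gets brighter.
first = np.full((8, 8, 3), 100, dtype=np.uint8)
later = first.copy()
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
later[mask] = 120
print(drift_report(first, later, mask))
```

A large "subject" change with a flat "background" points at skin-tone drift, similar changes in both point at global exposure drift, and a growing "clipped_fraction" points at highlight clipping.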

Suggested language for your notes

I would name the two current problems like this:

Problem 1:
  object/detail continuity drift
  rings, nails, bracelets, sleeves

Problem 2:
  cumulative color/exposure drift
  brightness, skin tone, contrast over chained clips

Then your workflow goal becomes:

Do not let either kind of drift become the next clip's starting condition.

My practical expectation

I would expect the best improvement from:

shorter clips
+
not using raw last frames
+
color-correcting continuity frames

I would expect smaller or inconsistent improvement from:

negative prompt
slightly more steps
minor diffusion tweaks

And I would treat model/workflow changes as later experiments.

Short version

You are probably no longer just fighting prompt adherence.

You are now fighting clip-chain drift.

Each clip slightly changes the visual state. If the next clip starts from that changed state, the change accumulates.

For your 8GB setup, the most practical first fix is probably:

try 40–48 frames
use a stable reference frame
correct the boundary frame before chaining
then do final color grading after all clips are assembled

That may not be a perfect in-model fix, but it is a realistic production-style workaround.

I probably didn’t explain it well
frame 1 and frame 80 are exactly the same frame.
I am using a 1st frame > last frame w/f. I find this helps tighten the facial consistency - as it drifts mid clip it starts to return because it needs to land on the last frame which is exactly the same frame as the first frame.

If I look at all 80 frames side by side I can see that around frame 20 ish there is a def increase in brightness through to about frame 50 ish. Then, as it targets the end frame it starts to go dark again

Interestingly, I tried using one of the lighter frames as 1st and last and it did the opposite… went from light to dark then back to light again

Seems every time I find a workaround for 1 challenge then another one comes along - lol

Ah, so, then I think the fix changes too:


I think I misunderstood one important detail earlier.

If frame 1 and frame 80 are literally the same image, then this is probably not mainly cumulative last-frame chaining drift.

It sounds more like a same-frame first/last-frame problem:

frame 1  = reference image
frames 20-50 = model invents a transition path and exposure/color moves
frame 80 = the same reference image again

So the model is not simply drifting forward forever. It is doing something more like:

dark/reference
→ unwanted brighter middle
→ dark/reference again

And when you use one of the lighter middle frames as both the first and last frame, it does the opposite:

light/reference
→ unwanted darker middle
→ light/reference again

That is a very useful observation, because it changes the diagnosis.

Direct answer

I would stop treating identical first/last-frame FLF as the main tool for this specific goal.

A simple rule:

FLF is good for “A becomes B.”
FLF is not always good for “A stays A.”

Your goal seems closer to:

A
→ small motion / facial micro-movement / slight acting
→ A

But FLF may interpret identical first/last frames as:

A
→ generate some kind of transition path anyway
→ A

That generated middle path can show up as:

brightness hump
contrast shift
skin tone change
gray flicker
unwanted exposure movement
unwanted visual excursion

So yes, if frame 1 and frame 80 are exactly the same image, the fix changes.

Why identical first/last frames can be unstable

Wan FLF is not simply:

keep this image stable for 80 frames

It is more like:

given a starting frame and an ending frame,
generate the missing motion between them

The ComfyUI Wan FLF documentation describes FLF as using a start frame and an end frame, with the model filling in the intermediate dynamic change:

ComfyUI Wan2.1 FLF2V Native Example

That distinction matters.

FLF is naturally a transition tool:

first image
→ intermediate motion / transformation
→ last image

So if you give it:

first image = A
last image  = A

the model may still try to create a meaningful middle. It may not understand that you want:

A, but almost static, with constant exposure and only tiny motion

It may instead produce:

A
→ some invented motion/lighting/color path
→ A

Your light/dark/light experiment fits that explanation very well.

Similar reports / clues

There are a few reports that point in the same direction.

Same image as first/last causing loop/color problems

This issue is very relevant:

ComfyUI-WanVideoWrapper issue #1541 — Wan 2.2 I2V Looping same image Start-End step - Color contrast increase over time

That report involves using the same image as first/last for a loop and getting problems such as:

color / contrast increase
gray flicker
gray ending

It is not exactly the same workflow as yours, but it is close enough to be useful: identical first/last conditioning can create unwanted color/exposure behavior.

First/last frame ping-pong behavior

Another related issue:

ComfyUI-WanVideoWrapper issue #1342 — First and Last Frame Pingponging Wan 2.2

This is useful because it shows that FLF can behave less like “hold this image steady” and more like a temporal path that may move toward one endpoint and then back.

Practical comment: identical frames may be bad

There is also a practical Reddit discussion where users mention that identical first/last frames can give worse results, and that slightly changing the last frame may help:

Reddit: PSA — WAN 2.2 does First Frame Last Frame out of the box

The practical takeaway is:

Do not always use identical first/last frames.
If you use FLF, make the last frame slightly different.

That turns the task into:

A → A'

instead of:

A → A

The model may handle that more naturally.

Updated tool map

I would map the tools like this.

| Goal | Better first tool | Why |
|---|---|---|
| A clearly changes into B | FLF | This is what first/last-frame generation is good at |
| A stays mostly A with slight motion | short normal I2V | Less likely to create an artificial transition arc |
| A loops back to A | normal I2V + editing loop | Often more stable than identical first/last FLF |
| A loops back to A using FLF anyway | FLF with a slightly modified last frame | Avoids the exact duplicate endpoint problem |
| The middle of FLF changes too much | shorter clip / middle anchor / FMLF | First+last alone may not constrain the middle |
| Color/brightness hump appears mid-clip | avoid same-frame FLF first; then post-correct if needed | The hump may be part of the generated FLF path |
| 8GB VRAM stability | short clips and editing | More iterations, fewer heavy experiments |

The main thing:

Use FLF when you want a transition.
Use normal I2V when you want a short living-still / micro-motion shot.
Use editing when you want a clean loop.

Broader map for your current project

Based on the whole thread, I would separate the tasks like this:

| Task | Practical path on 8GB VRAM | Avoid as first step |
|---|---|---|
| ReActor face swap | Keep it simple, test stills/short clips first | full complex video workflow immediately |
| Basic Wan video | short normal I2V | long chained clips before stability testing |
| Add audio | TTS or recorded audio + mux | full speech-to-video model locally |
| Lip-sync | Wav2Lip / LatentSync as post-process | Wan2.2-S2V 14B on 8GB |
| Keep hands/nails/jewelry stable | shot planning + clean keyframes + still inpaint | negative prompt only |
| A→B visual change | FLF | normal I2V only |
| A→tiny motion→A | short normal I2V | identical first/last FLF |
| Loop-like result | generate a good short clip, then loop in editing | forcing exact duplicate FLF loop |
| More control over the middle | FMLF / middle anchor later | hoping first/last alone controls everything |
| Heavy local video editing | test only after the simple path works | installing many new systems at once |

This is why I would not put FLF in the “bad” category. It is not bad. It may just be the wrong tool for this exact task.

What I would test next

I would run a small test matrix.

Test 1 — normal I2V, no FLF

Use the same source image.
Do not set the same image as both first and last.
Generate a short normal I2V clip.
Try 40–48 frames first.

Goal:

Does the brightness hump still appear?

If the brightness hump mostly disappears, then same-frame FLF was probably the trigger.
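
To make "does the hump still appear?" less subjective, one option is to log one mean-brightness value per frame and measure how far the middle strays from a straight line between the first and last frame. A minimal sketch; `brightness_hump` is a hypothetical helper and the sample values are invented.

```python
def brightness_hump(luma):
    """Return (largest deviation, frame index) relative to the straight
    line joining the first and last brightness values.

    A positive deviation means a brighter middle, negative a darker one.
    """
    n = len(luma)
    if n < 3:
        return 0.0, 0
    first, last = luma[0], luma[-1]
    best_dev, best_idx = 0.0, 0
    for i, value in enumerate(luma):
        expected = first + (last - first) * i / (n - 1)
        dev = value - expected
        if abs(dev) > abs(best_dev):
            best_dev, best_idx = dev, i
    return best_dev, best_idx


# Invented per-frame brightness values for two same-frame FLF clips.
print(brightness_hump([100, 104, 110, 104, 100]))   # (10.0, 2)  dark -> bright -> dark
print(brightness_hump([120, 112, 105, 112, 120]))   # (-15.0, 2) light -> dark -> light
```

Comparing this number between the same-frame FLF clip and the plain I2V clip gives a direct before/after check.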

Test 2 — FLF, but last frame slightly different

Instead of:

first = A
last  = A

try:

first = A
last  = A'

Where A' is almost the same image, but with a tiny real change:

slightly different expression
slightly different head angle
slightly different hand position
slightly different eye direction
slightly different camera crop

Goal:

Give the model a real A→A' path,
instead of forcing it to invent an A→A loop path.

Test 3 — shorter FLF

Try the same setup at:

32 frames
40 frames
48 frames

instead of:

80 frames

Goal:

Does the middle brightness hump shrink when the model has less time to invent an excursion?

Test 4 — loop in editing, not generation

Generate a normal short I2V clip that looks good.

Then loop it afterward using editing techniques:

cut off bad ending frames
crossfade end to start
reverse a copy and ping-pong it
use interpolation between end/start

For interpolation, RIFE is one common frame interpolation family:

RIFE paper

This is not necessarily “better AI generation,” but it may be a more stable production trick.
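
Two of those editing tricks are simple enough to sketch directly on a stack of frames. This assumes the clip is already loaded as an array of shape (N, H, W, 3); the function names are illustrative, and a real editor or ffmpeg would do the same job.

```python
import numpy as np


def ping_pong(frames):
    """Play forward, then backward, without repeating the two turning frames."""
    frames = np.asarray(frames)
    return np.concatenate([frames, frames[-2:0:-1]], axis=0)


def crossfade_loop(frames, overlap):
    """Blend the last `overlap` frames into the first `overlap` frames.

    The result is `overlap` frames shorter and is meant to wrap from its
    final frame back to its first frame. Returns float values; round and
    cast to uint8 before saving.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n = len(frames)
    if not 0 < overlap <= n // 2:
        raise ValueError("overlap must be between 1 and half the clip length")
    head, tail = frames[:overlap], frames[n - overlap:]
    weights = np.linspace(0.0, 1.0, overlap + 2)[1:-1]            # interior weights only
    weights = weights.reshape((-1,) + (1,) * (frames.ndim - 1))
    blended = (1.0 - weights) * tail + weights * head              # tail fades into head
    return np.concatenate([frames[overlap:n - overlap], blended], axis=0)


# Tiny synthetic clip: 6 one-pixel frames with brightness 0..5.
clip = np.arange(6, dtype=np.float64).reshape(6, 1, 1, 1)
print(ping_pong(clip).ravel())
print(crossfade_loop(clip, 2).ravel())
```

Ping-pong only suits motion that looks natural in reverse, and a crossfade only hides a small mismatch, so both work best on a clip that is already stable.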

About FMLF / middle anchors

If first/last alone gives too much freedom to the middle, then a middle anchor is the logical next idea.

There are Wan first-middle-last / multi-keyframe workflows and nodes, for example:

RunComfy: WanFirstMiddleLastFrameToVideo

The idea is:

first frame = A
middle frame = controlled middle state
last frame = A or A'

This can reduce the model’s freedom in the middle.

But I would treat FMLF as a later experiment, not the first fix, because:

more keyframes = more complexity
more nodes = more installation/debugging
8GB VRAM = test carefully
middle-frame workflows can have their own artifacts

There are also practical reports of middle-frame workflows having artifacts such as flash-like problems, so it is not guaranteed to be magic:

Reddit: Wan 2.2 First Frame Middle Frame Last Frame FMLF middle-frame flash issue

So the order I would use is:

normal I2V short clip first
then slightly different FLF endpoint
then FMLF/middle anchor only if needed

If the goal is a loop

If the goal is a clean loop, I would not force a perfect AI-generated loop first.

I would first generate a good short clip.

Then make it loop in editing.

Possible approaches:

| Loop method | When useful | Notes |
|---|---|---|
| Cut before the bad ending | The clip looks good until the final frames | simplest |
| Crossfade end→start | Slight mismatch | easy in any editor |
| Ping-pong / reverse | Motion can naturally reverse | can look artificial, but stable |
| Frame interpolation | Need smoother transition | RIFE/other interpolation tools |
| Slightly different FLF endpoint | You still want FLF | avoid identical A→A |
| FMLF / middle anchor | Middle needs control | later experiment |

For 8GB VRAM, editing tricks are not a compromise. They are often the practical route.

My suggested order for your setup

Given your constraints, I would try:

1. Stop using identical first/last frame for this test.
2. Try short normal I2V at 40–48 frames.
3. If you need FLF, make the last frame slightly different.
4. If you need a loop, create a good non-loop clip first, then loop in editing.
5. If the middle still needs control, try FMLF / middle anchor later.
6. If color still bumps in the middle, correct it after generation.

I would avoid doing this first:

80-frame identical first/last FLF

because your own test suggests that it creates:

reference → brightness/color excursion → reference

which is exactly the pattern you are trying to avoid.

Short version

I would now think of it like this:

Your previous setup:
  identical first frame + identical last frame
  = asks FLF to generate an A→A path

The problem:
  the model fills the middle with an unwanted exposure/color path

The fix:
  do not ask FLF for A→A unless you really need that.
  use short normal I2V for subtle motion,
  or use A→A' with a slightly changed last frame,
  or make the loop later in editing.

So the practical rule is:

FLF for transitions.
Normal I2V for living-still motion.
Editing/interpolation for loops.
FMLF only if the middle needs extra control.

Ok… great diag… thanks
The F2F is def the issue.. ran without a last frame and the lighting bump (tango’d as I call it) disappeared… so that is clearly what is going wrong and would explain why it felt random because my entire video is a mix of F2F and I2V clips. I thought I was being clever with F2F to tighten the facial consistency, effectively stopping wan from drifting too far from the reference image.

This has resulted in very short 1 liner prompts like ‘char1 looks at char2’ given that the end frame will always contain all the detail ie: no need for a prompt to keep describing the background, camera or lighting given that the only possible outcome is the end frame if that makes sense. So the ‘filler’ now makes sense.. I give a 1 line action prompt, wan makes up the rest then around halfway through the clip it starts to ‘undo’ any lighting changes (tango) to get back to the 1st image (now the last image too)

Sorry, needed to type this out so I could validate this in my own head

So now I’m back to facial consistency.. and the best route to solve this..

Editing I2I frames works best for actions and removal of unwanted jewelry, painted nails etc… but not for facial consistency. Reactor only works where there is no interaction between the chars as it appears to be NSFW sensitive even though there is nothing ‘adult’ happening.

but I’m making progress so thanks for the diag.. given up on the speech for now

I’m beginning to wonder if I should stray outside my comfort zone of my current w/f because I understand it… maybe I should consider looking at something that already has the sound nodes…

Hmm… then maybe something like this?


I think this is a good point to separate two very different goals:

  1. finish a minimum end-to-end pipeline
  2. make the pipeline good

Those are not the same task.

From your latest test, if the lighting bump disappears when you remove the F2F / last-frame part, I would treat that as a useful diagnosis. F2F was probably acting more like an endpoint-transition constraint, not like a face/identity lock.

So I would not try to solve face consistency, lighting, audio, lip-sync, and final export all in one workflow right now.

I would first make one very simple complete pipeline.

| Route | Do now? | Why |
|---|---|---|
| Short normal I2V/T2V clip | yes | you already have this roughly working |
| Simple audio mux | yes | fastest way to complete the pipeline |
| Simple TTS if you have no audio | yes | gives you a test WAV/MP3 |
| Face / hand / jewelry / lighting repair | later | separate video-quality problem |
| Lip-sync | later | separate audio-to-mouth problem |
| V2A / Foley sound effects | later | separate sound-design problem |
| VACE / advanced video repair | later | useful, but workflow-heavy |
| S2V / all-in-one speech-to-video | not now | interesting, but not the next 8GB step |

My short answer would be:

Yes, moving to a minimum audio/video pipeline now is reasonable.
Just keep it minimal: short video, simple audio, mux, export.
Then come back to face consistency, lip-sync, SFX, and visual repair as separate branches.

1. Minimum pipeline I would test first

For a first complete test, I would keep it almost boring:

short generated video
→ simple audio file
→ mux audio into the video
→ export MP4

If you already have audio:

video frames / video clip
+ WAV/MP3 audio
→ Video Combine / ffmpeg / editor
→ MP4 with sound

If you do not have audio yet:

text
→ simple TTS WAV/MP3
→ LoadAudio
→ Video Combine
→ MP4

Useful references:

The important thing is that this is mainly audio muxing.

That means:

existing video + existing audio → video file with audio

It does not mean:

prompt → perfect speech
→ perfect mouth movement
→ perfect acting
→ perfect edited video

Those are later problems.

For the first test, I would only check:

  • does the MP4 play?
  • is audio present?
  • is the duration correct?
  • is the audio roughly in sync?

If yes, the minimum pipeline works.
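
If you do the mux step with ffmpeg rather than inside ComfyUI, the command is short. The sketch below only builds and prints the command. The file names are placeholders, it assumes ffmpeg is installed and on your PATH, and `build_mux_command` is just an illustrative helper.

```python
import shlex


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that adds an audio track to an existing video."""
    return [
        "ffmpeg",
        "-i", video_path,      # input 0: the silent video
        "-i", audio_path,      # input 1: the WAV/MP3
        "-map", "0:v:0",       # take video from input 0
        "-map", "1:a:0",       # take audio from input 1
        "-c:v", "copy",        # do not re-encode the video
        "-c:a", "aac",         # encode audio to AAC for MP4
        "-shortest",           # stop at the shorter of the two streams
        output_path,
    ]


cmd = build_mux_command("clip.mp4", "voice.wav", "clip_with_audio.mp4")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Copying the video stream keeps the generated frames untouched, so this step cannot introduce new visual problems.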

2. If you do not have audio yet: simple TTS

If you do not already have a voice/audio file, I would add a very simple TTS step first.

I would not assume English only. Even if someone posts in English, the final voice might need another language, accent, or voice style.

Possible TTS routes:

| TTS route | Use case | Notes |
|---|---|---|
| External TTS site/app | fastest first test | least ComfyUI dependency risk |
| EdgeTTS-style node | simple TTS inside/near ComfyUI | good enough for testing |
| Qwen3-TTS-style model | multilingual / more flexible voice direction | more advanced |
| F5-TTS / voice cloning | better voice control | more workflow risk |

Examples:

For the first complete pipeline, I would not start with voice cloning or complex acting control.

I would just make one short WAV/MP3 line and treat it as normal audio.

3. What is easy vs hard

A lot of these tasks sound related, but they are different problems.

| Task | What it does | Difficulty |
|---|---|---|
| Audio mux | puts existing audio into video | low |
| Simple TTS | creates a voice file from text | low-medium |
| Lip-sync | changes mouth motion to match speech | high |
| V2A / Foley | generates SFX/ambience from video | high |
| I2V/T2V | generates the visual clip | medium |
| V2V / inpaint / repair | fixes or modifies video | high |
| S2V | uses speech/audio to generate video | very high |
| One perfect AV workflow | tries to do everything at once | very high |

So I would separate it like this:

minimum pipeline
= video + audio + export

quality pipeline
= face + hands + lighting + lip-sync + SFX + editing

Those are not the same thing.

This is especially important on 8GB VRAM.
Low VRAM makes the “one perfect workflow” route much harder, so small repairable stages are easier to debug.

4. Advanced audio branches for later

After the minimum audio path works, there are several later branches.

Lip-sync

Lip-sync is for:

video + speech audio
→ video with mouth motion adjusted to speech

Examples:

This is not the same as simply adding audio to a video.

V2A / Foley

V2A / Foley is for:

silent video
→ matching sound effects / ambience

Examples:

This can help later with footsteps, movement sounds, background ambience, object sounds, etc.

But I would not start here if the basic MP4-with-audio pipeline is not working yet.

5. Advanced video branches for later

For the video side, there are also advanced repair/control routes.

Examples:

One important detail:

a small model does not always mean a simple workflow.

For example, VACE has a 1.3B route and a 14B route, so the smaller route may be more feasible on low VRAM. But VACE is still an editing/control/inpaint-style workflow. The 1.3B model may be much lighter than the 14B one, but the workflow is not necessarily beginner-simple.

So I would classify VACE 1.3B as:

possible later experiment
not the first minimum pipeline step

For the current stage, I would keep the first complete pipeline much simpler.

6. 8GB VRAM reality

On 8GB VRAM, I would think roughly like this:

| Task | 8GB reality |
|---|---|
| Short T2V/I2V | realistic |
| Short I2V + simple audio mux | realistic |
| Wan 1.3B-class workflows | realistic-ish |
| Wan2.2 5B with offloading/optimized workflow | possible, slower, more fragile |
| VACE 1.3B | possible experiment, but workflow-heavy |
| 14B video workflows | much harder |
| S2V-14B / all-in-one speech-to-video | not the next local 8GB step |

Wan2.1 1.3B is documented as a consumer-grade model around the 8GB VRAM class.

Wan2.2-S2V-14B is a very different class of thing.

So I would not treat S2V as the next local 8GB step.

It is more like:

future / high-resource / cloud-GPU direction

not:

beginner next step after I2V

7. Even with more VRAM, editing still matters

Even with more VRAM, I would still not assume that the normal route is one perfect long generation.

A more realistic pattern is:

generate short shots
→ repair weak shots
→ add or generate audio
→ edit clips together
→ final export

This is not only an AI limitation. Normal video editing has its own traps too:

  • FPS
  • frame count
  • duration
  • MP4 export settings
  • audio sync
  • variable frame rate vs constant frame rate
  • concat/mux issues
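
Several of these traps come down to simple arithmetic: clip duration is frame count divided by FPS, and the audio should be about that long. A minimal sketch of that check, where the function names and the example numbers are only illustrative:

```python
def clip_duration_seconds(frame_count: int, fps: float) -> float:
    """Duration of a constant-frame-rate clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def sync_gap_seconds(frame_count: int, fps: float, audio_seconds: float) -> float:
    """Positive result: audio is longer than the video. Negative: shorter."""
    return audio_seconds - clip_duration_seconds(frame_count, fps)


# Hypothetical example: a 49-frame clip at 16 fps with a 3.5 s voice line.
gap = sync_gap_seconds(49, 16, 3.5)
print(f"video: {clip_duration_seconds(49, 16):.4f} s, audio gap: {gap:+.4f} s")
```

If the gap is more than a fraction of a second, I would trim the audio or regenerate the clip at a matching length before muxing, rather than hoping the export step hides it.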

So the minimum pipeline is not:

make one perfect long video

It is more like:

make one short clip with sound successfully

Then build from there.

If ComfyUI export becomes annoying, it is also reasonable to use a normal video editor or ffmpeg for the final mux/export step.
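
For the ffmpeg route, a rough sketch of the mux step called from Python could look like this. It assumes ffmpeg is installed and on PATH; the file names are placeholders, and `build_mux_command` / `mux` are just helper names I made up for illustration:

```python
import subprocess


def build_mux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that puts an existing audio file onto a video."""
    return [
        "ffmpeg", "-y",          # -y: overwrite the output file if it exists
        "-i", str(video),        # input 0: the silent (or unwanted-audio) clip
        "-i", str(audio),        # input 1: the WAV/MP3 line
        "-map", "0:v:0",         # take the first video stream from input 0
        "-map", "1:a:0",         # take the first audio stream from input 1
        "-c:v", "copy",          # do not re-encode the video
        "-c:a", "aac",           # encode audio to AAC for MP4
        "-shortest",             # stop at the end of the shorter stream
        str(output),
    ]


def mux(video: str, audio: str, output: str) -> None:
    """Run the mux; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video, audio, output), check=True)


# Show the command without running it (placeholder file names).
print(" ".join(build_mux_command("clip.mp4", "line.wav", "clip_with_audio.mp4")))
```

Because the video stream is copied rather than re-encoded, this is fast and does not change the picture, which keeps the audio test separate from any video-quality question.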

My suggested next step

I would do the simplest possible complete test:

  1. make a very short video clip
  2. create or choose one short audio file
  3. load the audio
  4. combine video + audio
  5. export MP4
  6. check whether it plays correctly
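
For step 6, "plays correctly" can be partly checked without watching the file. A small sketch, assuming ffprobe (shipped with ffmpeg) is on PATH; `probe_streams` and `has_video_and_audio` are hypothetical helper names, and the check only confirms that both stream types exist, not that they are in sync:

```python
import json
import subprocess


def probe_streams(path: str) -> dict:
    """Ask ffprobe for the stream types and durations of a media file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def has_video_and_audio(probe: dict) -> bool:
    """True if the probe result contains at least one video and one audio stream."""
    kinds = {stream.get("codec_type") for stream in probe.get("streams", [])}
    return {"video", "audio"} <= kinds


# Example with a hand-written probe result (no ffprobe call needed):
sample = {"streams": [{"codec_type": "video"}, {"codec_type": "audio"}]}
print(has_video_and_audio(sample))
```

I would still play the file once in a normal player, since a stream can exist and still be silent or offset.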

Only after that, I would return to:

  • face consistency
  • lighting stability
  • hand/accessory fixes
  • lip-sync
  • V2A/Foley
  • VACE or other advanced repair workflows

So my practical answer is:

Finish the rough pipeline first.
Keep it minimal: short video, simple audio, mux, export.
Then treat visual quality, lip-sync, sound effects, and final editing as separate later branches.

In my previous reply, I mainly talked about the general recommended flow, but if I narrow it down specifically to improving the video side, I think it would look something like this:


Short version

For your current stage and an 8GB VRAM setup, I would not jump sideways into a larger “perfect identity / perfect audio / perfect control” workflow yet.

I would first stabilize the smallest video route:

no F2F for now
→ very short I2V / TI2V
→ static camera
→ one small action
→ stronger preservation prompt
→ several seeds
→ pick the best face-stable result
→ repair only the bad parts later

The main change is this:

I would stop treating F2F as a face-consistency tool.
I would treat it as an endpoint-transition tool.

Your own test is probably the most useful evidence here: when you removed the F2F / last-frame part, the lighting bump disappeared. So I would not put F2F back into the main route until the simpler video path is stable.


Why I would avoid F2F as the main fix for now

The way I understand your result is:

| Setup | What you wanted | What likely happened |
|---|---|---|
| First frame = image A | preserve the character | |
| Last frame = same image A | force the clip to return to the same identity | |
| Short action prompt | “char1 looks at char2” | |
| Actual result | | Wan generated an intermediate path, then returned to A |
| Visible symptom | | lighting / color / exposure changed in the middle |

So the model may not be interpreting this as:

keep the same face and lighting for the whole clip

It may be closer to:

start here, invent a middle, and return to the endpoint

That is useful for transitions, but not always useful for “same person, same lighting, tiny motion.”

This also matches how the ComfyUI FLF docs describe the workflow: first/last frames define the video boundaries, and the model fills the intermediate transition/dynamic change:
ComfyUI Wan FLF2V example

So I would classify the tools like this:

| Goal | Better first tool |
|---|---|
| Make one image slightly alive | normal I2V / TI2V |
| Make a clear A-to-B transition | F2F / FLF |
| Make a loop | generate a good short clip first, then loop/edit later |
| Repair one bad area/frame | I2I / inpaint / frame repair |
| Stronger video repair/control | VACE or control workflow later |
| Strong identity-specific workflow | Stand-In / LoRA / identity workflow later |

So, for now, I would keep F2F out of the main path.


8GB VRAM: what seems realistic

On 8GB, I would think about the options like this:

| Route | 8GB practicality | My opinion for your current stage |
|---|---|---|
| Very short I2V / TI2V | Good | Best next step |
| Wan2.2 5B with ComfyUI native offloading | Good candidate | Main route to test |
| Wan2.1 1.3B / 480P route | Useful fallback | Good diagnostic route |
| F2F / FLF | Possible, but risky here | Do not use as face lock |
| ReActor | Useful in clean cases | Targeted repair only |
| VACE 1.3B | Possible later | Useful, but workflow-heavy |
| Fun Control / pose-control routes | Later | More about structure/motion than face identity |
| Stand-In / identity-preserving workflows | Interesting later | Probably too early as the next fix |
| 14B / S2V / all-in-one AV workflows | Heavy | Not the next local 8GB step |

The ComfyUI Wan2.2 docs say the Wan2.2 5B version should fit well on 8GB VRAM with ComfyUI native offloading, so that is the most natural first candidate if you want to stay local.

If that is still too slow or unstable, a smaller/480P test route is not a failure. The Wan2.1 repo describes the 1.3B model as requiring about 8.19GB VRAM and being aimed at 480P. That makes it useful as a diagnostic route, even if the final quality is lower.
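
To see why the size class matters, here is a back-of-the-envelope estimate of the diffusion model weights alone. This is only a sketch under stated assumptions: 2 bytes per parameter for fp16/bf16, 1 byte for 8-bit weights, decimal gigabytes, and nothing counted for the text encoder, VAE, or activations, so real usage is higher than these numbers:

```python
# Assumed storage cost per parameter; actual checkpoints and quantized formats vary.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


for name, size in [("1.3B", 1.3), ("5B", 5.0), ("14B", 14.0)]:
    print(f"{name}: ~{weight_gb(size):.1f} GB of weights at fp16")
```

Under these assumptions a 5B model already has more weight data than 8GB of VRAM can hold at fp16, which is consistent with the docs pairing it with offloading, and a 14B model is far beyond that without quantization.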

The key rule is:

On 8GB, the best workflow is often not the most powerful workflow.
It is the smallest workflow that fails in a predictable way.


Recommended video-only route

For now, I would try to make one clean baseline:

source image
→ short I2V / TI2V
→ no F2F
→ static camera
→ one small action
→ preservation-heavy positive prompt
→ several seeds
→ choose the best face-stable result
→ repair isolated defects only if needed

For the first stable test, I would keep it boring:

| Setting area | Recommendation |
|---|---|
| Clip length | 2–3 seconds |
| Motion | one small action only |
| Camera | static |
| Prompt | explicitly preserve face / lighting / background |
| Resolution | low enough to iterate |
| F2F | off |
| Seed test | several |
| Judging priority | face first, prompt obedience second |

The first goal is not to make the whole final scene.

The first goal is only:

Can I get one short clip where the face, lighting, and background stay mostly stable?

If yes, then build from there.

Detailed test plan

Test 0 — keep the broken/control case

Keep one copy of the old F2F result as a reference.

That tells you what failure you are trying to avoid:

F2F on
→ lighting bump appears

Then create the new branch:

F2F off
→ same/similar source
→ same rough action
→ short clip

Test 1 — normal short I2V / TI2V

Goal:

Does the lighting bump stay gone?

If yes, that is already useful. It means the new route is healthier than the old F2F route.
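
If judging the bump by eye gets tiring, a crude numeric check can help compare the F2F-on and F2F-off clips. This sketch assumes you already have one average brightness value per frame (how you extract those is up to you and not shown here); the 5% tolerance is an arbitrary starting point, not a standard:

```python
def lighting_bump(brightness: list[float], tolerance: float = 0.05) -> tuple[float, bool]:
    """Compare the middle frames against the average of the first and last frame.

    Returns (worst relative deviation, whether it exceeds the tolerance).
    """
    if len(brightness) < 3:
        raise ValueError("need at least three frames")
    baseline = (brightness[0] + brightness[-1]) / 2
    if baseline <= 0:
        raise ValueError("baseline brightness must be positive")
    worst = max(abs(value - baseline) / baseline for value in brightness[1:-1])
    return worst, worst > tolerance


# Made-up per-frame brightness values for illustration only.
stable_clip = [100, 101, 100, 99, 100]
bumped_clip = [100, 110, 125, 110, 100]
print(lighting_bump(stable_clip))
print(lighting_bump(bumped_clip))
```

It only looks at overall brightness, so it will miss color shifts that keep the same exposure. I would treat it as a quick filter, with the eye check still being the final judge.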

Test 2 — clip length

Try very short clips first:

| Clip length | Why |
|---|---|
| 2 seconds | fastest sanity check |
| 3 seconds | good working range |
| 5+ seconds | only after the short clip is stable |

Longer clips give the model more time to drift. So for face consistency, short clips are not just faster; they are safer.
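
In practice the clip length is set as a frame count, not seconds. A small conversion sketch follows. The defaults are assumptions on my side: 16 fps and frame counts of the form 4n+1 are common in Wan workflows, but check what your model and workflow actually expect and change `fps` and `step` accordingly:

```python
def frames_for_seconds(seconds: float, fps: int = 16, step: int = 4) -> int:
    """Nearest frame count of the form step*n + 1 for a target duration."""
    raw = round(seconds * fps)
    n = max(1, round((raw - 1) / step))
    return n * step + 1


for seconds in (2, 3, 5):
    print(f"{seconds} s -> {frames_for_seconds(seconds)} frames")
```

With these assumed defaults, 2, 3, and 5 seconds come out as 33, 49, and 81 frames.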

Test 3 — prompt preservation

Run the same seed twice:

| Run | Prompt |
|---|---|
| A | short action prompt only |
| B | action + face/lighting/background preservation prompt |

Judge in this order:

  1. same face
  2. same lighting
  3. same background
  4. no morphing
  5. natural motion
  6. prompt obedience

Prompt obedience should not be first. For this case, a clip that obeys the action perfectly but changes the face is still a failed clip.

Test 4 — seed search

Once the settings look acceptable, run several seeds:

same image
same prompt
same settings
8–16 seeds if time allows
short preview first
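
One way to keep the sweep honest is to write the runs down before starting, so that only the seed changes. A minimal sketch, where the settings keys are hypothetical placeholders and not real ComfyUI node fields:

```python
import random


def seed_sweep(base_settings: dict, count: int = 8, rng_seed: int = 0) -> list[dict]:
    """Return `count` copies of the settings that differ only in their seed.

    A fixed rng_seed makes the list of seeds itself reproducible.
    """
    rng = random.Random(rng_seed)
    seeds = rng.sample(range(1_000_000_000), count)  # unique seeds
    return [{**base_settings, "seed": seed} for seed in seeds]


# Placeholder settings for illustration only.
jobs = seed_sweep({"image": "source.png", "prompt": "...", "frames": 33}, count=8)
for job in jobs:
    print(job["seed"])
```

Enter the seeds by hand or feed them to whatever batch method you use. The point is only that image, prompt, and settings stay frozen during the sweep.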

Choose by face stability first.

For this project:

A seed that preserves the face and obeys 70% of the prompt is better than a seed that obeys 100% but changes the face.
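
The judging order can be written down as a sort key, so the face always outranks prompt obedience. This is a sketch with hand-assigned scores on a coarse 0 to 3 scale (my own assumption; nothing here measures the video automatically):

```python
from dataclasses import dataclass


@dataclass
class ClipScore:
    seed: int
    face: int          # 0 = different person, 3 = same face throughout
    lighting: int
    background: int
    no_morphing: int
    motion: int
    prompt: int        # how well the action was obeyed


def rank(clips: list[ClipScore]) -> list[ClipScore]:
    """Best clip first; earlier criteria always outrank later ones."""
    return sorted(
        clips,
        key=lambda c: (c.face, c.lighting, c.background, c.no_morphing, c.motion, c.prompt),
        reverse=True,
    )


obedient_but_drifting = ClipScore(seed=11, face=1, lighting=3, background=3,
                                  no_morphing=2, motion=3, prompt=3)
stable_but_partial = ClipScore(seed=42, face=3, lighting=2, background=3,
                               no_morphing=3, motion=2, prompt=2)
print(rank([obedient_but_drifting, stable_but_partial])[0].seed)
```

Coarse scores are deliberate: with fine-grained numbers, a tiny face difference would decide everything and the later criteria would never matter.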


Prompt strategy: do not rely on the end frame anymore

Since F2F is no longer doing the “return to the reference” job, I would not use only an ultra-short prompt like this:

char1 looks at char2

That prompt gives the action, but leaves too much open:

  • lighting
  • skin tone
  • background
  • camera movement
  • color balance
  • expression intensity
  • face permanence
  • amount of motion

Instead, I would put the preservation rules into the positive prompt.

Example:

same person, same face, same identity, same hairstyle, same clothing,
same room, same background, same camera angle, static camera,
same soft lighting, same skin tone, same color balance,
only subtle natural motion, small eye movement, slight breathing,
char1 slowly looks toward char2,
no zoom, no pan, no scene change

A short negative prompt is still fine, but I would not depend on the negative prompt as the main control. Especially in low-step / distilled / low-CFG workflows, negative prompts often do less than people expect.

Practical rule:

Put the important “do not change this” information into the positive prompt, not only into the negative prompt.
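
If you test many actions, it helps to keep the preservation part fixed and swap only the action. A small sketch that assembles the example prompt from this section; the clause lists are copied from that example, and the helper name is made up:

```python
PRESERVE = [
    "same person", "same face", "same identity", "same hairstyle", "same clothing",
    "same room", "same background", "same camera angle", "static camera",
    "same soft lighting", "same skin tone", "same color balance",
    "only subtle natural motion", "small eye movement", "slight breathing",
]
CAMERA_LOCK = ["no zoom", "no pan", "no scene change"]


def build_positive_prompt(action: str,
                          preserve: list[str] = PRESERVE,
                          camera_lock: list[str] = CAMERA_LOCK) -> str:
    """Join preservation clauses, one action, and camera-lock clauses."""
    parts = [*preserve, action.strip(), *camera_lock]
    return ", ".join(part for part in parts if part)


print(build_positive_prompt("char1 slowly looks toward char2"))
```

That way runs A and B in Test 3 differ only by whether the fixed clauses are included, and later action changes do not accidentally drop a preservation clause.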


Where ReActor fits

I would still keep ReActor in the toolbox, but I would narrow its job.

ReActor is useful when the face is:

  • visible
  • clean
  • not heavily occluded
  • not in complex physical interaction
  • not changing too much across frames

The ReActor repo describes it as a face-swap extension and also labels it SFW-friendly. So if interaction shots are triggering sensitivity, black output, or blocked results, I would not fight that as the main path right now.

I would use ReActor more like this:

| Use case | ReActor fit |
|---|---|
| clean portrait repair | good |
| simple visible face | possible |
| simple talking-head style shot | possible |
| two characters interacting closely | fragile |
| hands/arms crossing faces | fragile |
| solving all identity drift | not ideal |

So I would not make ReActor responsible for the whole video. I would use it only where it is naturally strong.


Where I2I / frame repair fits

I2I or frame repair is good for isolated defects:

| Problem | Better approach |
|---|---|
| one frame has bad jewelry | I2I / inpaint |
| one hand is wrong | I2I / inpaint / regenerate shot |
| one face frame is slightly off | targeted repair |
| face slowly morphs across the whole clip | shorter clip / better source / better prompt / seed search |
| lighting changes mid-clip | remove F2F, shorten clip, preserve lighting in prompt |
| interaction shot breaks | simplify blocking, split into shorter shots |

For interaction scenes, sometimes the best fix is not a stronger node. It is simpler shot design.

Example:

Hard version:
one long shot where two people interact, turn, touch, speak, and move

Easier version:
shot A: char1 reacts
shot B: char2 reacts
shot C: simple wider interaction
shot D: close-up repair if needed

Less glamorous, but much easier to control on 8GB.
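
Joining the short shots afterwards can be done in a video editor, or with ffmpeg's concat demuxer. A sketch of the second option, assuming ffmpeg is on PATH and that all shots share the same codec, resolution, and frame rate (stream copy will not fix mismatches); file names and helper names are placeholders:

```python
def concat_list_text(clips: list[str]) -> str:
    """Text of a list file for ffmpeg's concat demuxer, one `file` line per clip."""
    lines = []
    for clip in clips:
        escaped = str(clip).replace("'", "'\\''")  # escape single quotes in paths
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_concat_command(list_file: str, output: str) -> list[str]:
    """ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


print(concat_list_text(["shot_a.mp4", "shot_b.mp4", "shot_c.mp4"]), end="")
print(" ".join(build_concat_command("shots.txt", "scene.mp4")))
```

Save the list text to the list file, then run the command. If the shots were exported with different settings, re-encoding in an editor is the safer route.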


Later branches: VACE, Fun Control, Stand-In, LoRA, 14B

VACE

VACE is relevant because it supports video editing/control-style workflows: text, image, video, masks, control signals, local replacement, motion transfer, etc.

So yes, it is conceptually close to video repair.

But I would not make it the next step before the simple video route is stable.

Why?

Because VACE adds more things to debug:

  • model choice
  • masks
  • video inputs
  • control inputs
  • node bypasses
  • edited regions
  • temporal consistency
  • more VRAM pressure
  • more workflow complexity

So I would classify it as:

possible later experiment
not first recovery path

Fun Control

Wan Fun Control / control-style workflows may help with motion, pose, depth, or structure.

But I would not treat them as a face-consistency tool first.

They are more like:

control body / pose / motion / structure

not:

guarantee same face in every frame

So again: later.

Stand-In

Stand-In is interesting because it is specifically about identity-preserving video generation.

So the direction is relevant.

But I would still not make it the next step for your current situation.

Reasons:

  • it is more research/workflow-like than a simple recovery step
  • it brings another stack of dependencies
  • its README notes that some ComfyUI integrations differ from the official version
  • some routes depend on heavier Wan bases
  • it is a better “future identity branch” than an immediate fix

So I would put it in the “keep an eye on this” category, not the “switch today” category.

Character LoRA / identity training

This may become useful if you want to make the same character again and again.

But as a next step, it is too much:

  • dataset preparation
  • captions
  • training settings
  • overfitting risk
  • model/version compatibility
  • more storage
  • more VRAM/time
  • harder debugging

So I would not use training as the immediate fix for this specific problem.

14B / GGUF / heavy routes

Quantized 14B routes can be interesting, but I would not make them the first recovery route on 8GB.

They may run, but running is not the same as being practical:

  • slower iteration
  • more configuration sensitivity
  • more memory edge cases
  • harder troubleshooting
  • longer feedback loop

For learning what is causing face drift, short 5B or 1.3B tests are usually more useful.


Should you move to another workflow just because it has sound nodes?

For the video side, I would say no — not yet.

A workflow having sound nodes does not automatically solve:

  • face consistency
  • F2F lighting bump
  • interaction scenes
  • ReActor sensitivity
  • hands/jewelry/nails
  • color drift
  • video repair

It may only move you into a larger workflow with more things to debug.

I would separate the branches:

video quality branch
= face, lighting, hands, motion, shot design

audio branch
= TTS, audio mux, lip-sync, Foley, sound effects

Since you already diagnosed a real video-side issue, I would stabilize that branch first.

Then add audio as a separate branch later.


Practical roadmap

| Step | Action | Success condition |
|---|---|---|
| 1 | Duplicate the current working workflow | old version remains safe |
| 2 | Disable F2F / last-frame path | lighting bump does not return |
| 3 | Make a 2–3 sec I2V/TI2V test | face mostly holds |
| 4 | Use static camera + one tiny action | less drift |
| 5 | Add preservation-heavy positive prompt | lighting/background/face hold better |
| 6 | Try several seeds | at least one acceptable seed |
| 7 | Repair only isolated defects | avoid rebuilding whole workflow |
| 8 | Only then test VACE/ReActor/control tools | targeted use, not main path |
| 9 | Only after video baseline works, return to audio | fewer variables |

Main rule:

Keep the main branch boring.
Put risky experiments in copies.
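
"Put risky experiments in copies" can be as simple as copying the saved workflow file before each experiment. A minimal sketch, assuming the workflow is saved as a JSON file on disk; the file names and the `branch_workflow` helper are hypothetical:

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def branch_workflow(workflow_path, label: str) -> Path:
    """Copy a workflow file next to the original with a label and timestamp."""
    src = Path(workflow_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}__{label}__{stamp}{src.suffix}")
    shutil.copy2(src, dst)  # the original file is left untouched
    return dst


if __name__ == "__main__":
    # Demo in a throwaway folder so nothing real is touched.
    with tempfile.TemporaryDirectory() as folder:
        main = Path(folder) / "main_i2v.json"
        main.write_text("{}")
        print(branch_workflow(main, "no-f2f").name)
```

Experiment in the copy and only move a change back into the main file once it has clearly helped.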


Very compact version

If I compress all of this into one practical recommendation:

For the video side, I would not switch to a larger workflow yet. On 8GB VRAM, I would first stabilize a small no-F2F I2V/TI2V route: Wan2.2 5B with native offloading if possible, or a smaller/480P diagnostic route if needed. Keep clips very short, use static camera, ask for only one small action, put identity/lighting/background preservation into the positive prompt, and choose the best seed by face permanence first. Use ReActor, I2I repair, VACE, Fun Control, or Stand-In later as targeted repair/control branches — not as the first main solution.

That seems like the safest next direction from where you are now.