BokehDepth — Released Checkpoints

This repository hosts the released checkpoints for BokehDepth: Boosting Monocular Metric Depth Estimation via Bokeh Rendering (ICML 2026).

📄 Paper: https://arxiv.org/abs/2512.12425
🌐 Project page: https://fogradio.github.io/BokehDepth_Project/
💻 Code: https://github.com/fogradio/BokehDepth

BokehDepth is a two-stage framework. Stage-1 turns a single sharp image into a calibrated multi-strength bokeh stack (no depth map needed); Stage-2 fuses the resulting defocus cues to produce sharper, more reliable metric depth. The three files in this repository map onto that pipeline: two alternative Stage-1 LoRA adapters and one Stage-2 depth checkpoint.

| File | Stage | Role | Size |
|---|---|---|---|
| `bokeh_lora.bin` | Stage-1 | Bokeh generation LoRA adapter on top of FLUX.1-Kontext | ≈ 556 MB |
| `bokeh_lora_ft.bin` | Stage-1 | Robustness-finetuned variant of `bokeh_lora.bin` | ≈ 556 MB |
| `UDv2_dsfa_release.pth` | Stage-2 | UniDepthV2 + DSFA depth estimator | ≈ 5.19 GB |

Stage-1 — Bokeh Generation LoRA

Stage-1 uses FLUX.1-Kontext (rectified-flow MMDiT) plus a lightweight bokeh cross-attention adapter. Heterogeneous optical settings (focal length, aperture, focus distance) collapse into a single calibrated scalar K from the thin-lens circle-of-confusion model, which captures the near-linear relation r ≈ K · Δdisp between blur radius and disparity offset. Conditioned on K, Stage-1 turns one sharp image into a multi-strength bokeh stack with no depth map at any point.
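To make the role of K concrete, here is a minimal sketch of the textbook thin-lens circle-of-confusion model. It assumes disparity means inverse depth (1/z) and works in metric sensor units; the paper's calibrated K may be expressed in pixels or on a normalized disparity, so treat the function names and units below as illustrative assumptions rather than the released calibration code.

```python
import math


def blur_scale_k(focal_length_m: float, f_number: float, focus_distance_m: float) -> float:
    """Return K (in m^2) so that blur radius r = K * |1/z - 1/z_focus|.

    Assumption: ideal thin lens, disparity defined as inverse depth.
    """
    if focus_distance_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture_diameter = focal_length_m / f_number
    # f * s / (s - f) is the sensor distance of a lens focused at distance s.
    sensor_distance = focal_length_m * focus_distance_m / (focus_distance_m - focal_length_m)
    return 0.5 * aperture_diameter * sensor_distance


def coc_radius_direct(focal_length_m: float, f_number: float,
                      focus_distance_m: float, depth_m: float) -> float:
    """Thin-lens circle-of-confusion radius on the sensor, in meters."""
    aperture_diameter = focal_length_m / f_number
    diameter = (aperture_diameter * focal_length_m * abs(depth_m - focus_distance_m)
                / (depth_m * (focus_distance_m - focal_length_m)))
    return 0.5 * diameter


# Example: 50 mm lens at f/2 focused at 2 m. Three optical settings become one scalar K.
f, n, s = 0.050, 2.0, 2.0
k = blur_scale_k(f, n, s)
for z in (1.0, 2.0, 4.0, 8.0):
    delta_disp = abs(1.0 / z - 1.0 / s)
    print(f"z={z:.1f} m  K*delta_disp={k * delta_disp:.6e} m  "
          f"direct={coc_radius_direct(f, n, s, z):.6e} m")
```

In this idealized setting the two columns agree, which is why a single scalar can stand in for focal length, aperture, and focus distance when conditioning the generator.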

`bokeh_lora.bin` — base LoRA

The base Stage-1 checkpoint, trained on the unified Stage-1 data pipeline that aligns real defocused photos, synthetic renderings, and paired datasets onto the shared K axis.

`bokeh_lora_ft.bin` — robustness fine-tune

A continued fine-tune of `bokeh_lora.bin` that additionally mixes in synthetic bokeh renderings produced by BokehMe from subsets of the standard monocular-depth datasets KITTI / Hypersim / NYU-v2 / vKITTI 2. These datasets cover many scenes where the foreground is ambiguous, low-contrast, or simply absent. As a result, this checkpoint is more robust at generating clean bokeh on such "no-clear-subject" inputs (driving scenes, dense indoor clutter, distant cityscapes, etc.) while preserving the calibrated K-control of the base LoRA.

Both LoRAs are wrapped at inference time by `BokehFluxControlAdapter` (see `bokeh-generation/model/bokeh_adapter_flux.py` in the code repository) and are loaded with `lora_rank=128`, `lora_alpha=128` over FLUX transformer blocks 0–56.
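As a reminder of what these two hyperparameters mean, the sketch below applies a generic LoRA update to one linear layer. It assumes the common convention in which the low-rank update is scaled by `lora_alpha / lora_rank`, so rank 128 with alpha 128 would give a scale of 1.0; whether `BokehFluxControlAdapter` follows exactly this convention is not stated here. The function name `lora_linear` and the toy shapes are hypothetical.

```python
import numpy as np


def lora_linear(x, weight, lora_down, lora_up, lora_alpha, lora_rank):
    """Frozen linear layer plus a low-rank LoRA update (generic sketch).

    x:         (batch, d_in)  input activations
    weight:    (d_out, d_in)  frozen base weight
    lora_down: (rank, d_in)   trainable down-projection ("A")
    lora_up:   (d_out, rank)  trainable up-projection ("B")
    """
    scale = lora_alpha / lora_rank           # assumed convention: alpha / rank
    base = x @ weight.T                      # output of the frozen layer
    update = (x @ lora_down.T) @ lora_up.T   # low-rank correction
    return base + scale * update


# Toy example with tiny feature sizes; only rank and alpha mirror the text.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 6))
w = rng.normal(size=(4, 6))
down = rng.normal(size=(128, 6))
up = rng.normal(size=(4, 128))
y = lora_linear(x, w, down, up, lora_alpha=128, lora_rank=128)
print(y.shape)
```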

Stage-2 — UniDepthV2-DSFA

`UDv2_dsfa_release.pth`

The Stage-2 metric depth model: UniDepthV2 (ViT-L/14 DINOv2 backbone) with our Divided Space Focus Attention (DSFA) module inserted into the depth encoder. DSFA first runs spatial attention inside each frame conditioned on that frame's blur strength K_f, then runs focus attention across frames at matching spatial locations, modulated by FiLM. Each location can therefore read how its blur grows with K, which is the physical depth-from-defocus cue. Only reference-frame tokens are passed downstream, so the original DPT decoder and metric head stay untouched.
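The data flow described above can be sketched in a few lines of NumPy. This is not the released DSFA implementation: the attention uses identity projections and a single head, the way K conditions the spatial step (adding `k * k_embed`) is an assumption, and `k_embed`, `w_gamma`, and `w_beta` are hypothetical stand-ins for learned parameters. The point is only the "divided" pattern: attend over space within each frame, then over frames at each location, then keep the reference frame.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Single-head self-attention over the second-to-last axis (identity projections)."""
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def film(tokens, k, w_gamma, w_beta):
    """FiLM: per-channel scale and shift predicted (here linearly) from blur strength k."""
    gamma = k[..., None] * w_gamma
    beta = k[..., None] * w_beta
    return (1.0 + gamma) * tokens + beta


def dsfa_block(tokens, k, k_embed, w_gamma, w_beta, ref_index=0):
    """tokens: (F, N, C) for F frames and N locations; k: (F,) blur strengths.

    Returns the reference-frame tokens, shape (N, C).
    """
    n_frames, n_locs, _ = tokens.shape
    # 1) Spatial attention inside each frame, conditioned on that frame's K_f.
    conditioned = tokens + k[:, None, None] * k_embed
    spatial = tokens + self_attention(conditioned)       # frames act as the batch axis
    # 2) Focus attention across frames at matching locations, modulated by FiLM.
    per_location = np.swapaxes(spatial, 0, 1)            # (N, F, C)
    k_per_location = np.broadcast_to(k, (n_locs, n_frames))
    modulated = film(per_location, k_per_location, w_gamma, w_beta)
    focus = per_location + self_attention(modulated)     # locations act as the batch axis
    # 3) Only reference-frame tokens are passed downstream.
    return focus[:, ref_index, :]


rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 5, 8))                      # 3 frames, 5 locations, 8 channels
k = np.array([0.0, 0.5, 1.0])                            # frame 0 is the sharp reference
k_embed, w_gamma, w_beta = rng.normal(size=(3, 8)) * 0.1
print(dsfa_block(tokens, k, k_embed, w_gamma, w_beta).shape)
```

Because the output has the same shape as a single frame's token grid, a decoder that expects one image's tokens can consume it unchanged, which matches the claim that the DPT decoder and metric head stay untouched.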

This checkpoint is the plug-and-play DSFA build: DSFA inserted into UniDepthV2 and trained jointly, with the Stage-1 bokeh stack as input. Use it together with the config `UniDepth/configs/config_v2_vitl14_DSFA_inference.json` in the code repository.

How to use

# from the project root
bash run_inference.sh

`run_inference.sh` expects all three files under `weights/` in the project root:

weights/
├── bokeh_lora.bin          # or bokeh_lora_ft.bin (see ADAPTER_CKPT env var)
├── bokeh_lora_ft.bin
└── UDv2_dsfa_release.pth

Override which Stage-1 LoRA is used with:

ADAPTER_CKPT=weights/bokeh_lora_ft.bin bash run_inference.sh   # robust default
ADAPTER_CKPT=weights/bokeh_lora.bin    bash run_inference.sh   # base LoRA

The Stage-2 weights path is set by `WEIGHTS_PATH`, which defaults to `weights/UDv2_dsfa_release.pth`.
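Before launching, it can help to confirm that the layout above is in place. The following is a small preflight sketch using only the Python standard library; the helper names `missing_weights`, `inference_env`, and `run` are hypothetical and not part of the code repository, while the file names and the `ADAPTER_CKPT` / `WEIGHTS_PATH` variables are taken from the instructions above.

```python
import os
import subprocess
from pathlib import Path

EXPECTED_WEIGHTS = ("bokeh_lora.bin", "bokeh_lora_ft.bin", "UDv2_dsfa_release.pth")


def missing_weights(weights_dir="weights"):
    """Return the expected checkpoint files that are not present in weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


def inference_env(adapter, weights_dir="weights", base_env=None):
    """Build the environment for run_inference.sh with the chosen Stage-1 LoRA."""
    env = dict(os.environ if base_env is None else base_env)
    env["ADAPTER_CKPT"] = (Path(weights_dir) / adapter).as_posix()
    env["WEIGHTS_PATH"] = (Path(weights_dir) / "UDv2_dsfa_release.pth").as_posix()
    return env


def run(adapter="bokeh_lora_ft.bin"):
    """Check the weights, then call the repository's script from the project root."""
    missing = missing_weights()
    if missing:
        raise FileNotFoundError(f"missing under weights/: {', '.join(missing)}")
    subprocess.run(["bash", "run_inference.sh"], env=inference_env(adapter), check=True)
```

Calling `run("bokeh_lora.bin")` from the project root would be equivalent to the second override command above, with an early, readable error if a checkpoint has not been downloaded yet.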

Citation

If you use these checkpoints, please cite:

@inproceedings{zhang2026bokehdepth,
  title     = {Boosting Monocular Metric Depth Estimation via Bokeh Rendering},
  author    = {Zhang, Hangwei and Fortes, Armando and Wei, Tianyi and Pan, Xingang},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year      = {2026}
}

License & acknowledgements

Released under CC BY-NC 4.0 for research use only. Stage-1 builds on FLUX.1-Kontext (Black Forest Labs) and Stage-2 builds on UniDepthV2; both upstream licenses apply to their respective base weights. The robustness fine-tune additionally relies on synthetic bokeh produced by BokehMe on standard monocular-depth datasets (KITTI / Hypersim / NYU-v2 / vKITTI 2) — please respect each dataset's individual license when redistributing derived data.

