CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
Paper • 2512.19535 • Published • 10
user,permission,token
nroggendorff,write,hf_...
pepper13,finegrained,hf_...
...,...,...
...