EvaNet: Towards More Efficient Image Fusion Assessment

The race to fuse infrared and visible images has been heating up like a GPU cluster in July. Now researchers at Jiangnan University and the University of Surrey have dropped a paper that doesn't build a better fusion method. Instead, it builds a better judge.

The Scoreboard Nobody Trusted

Here's the dirty secret of infrared-visible image fusion (IVIF): everyone's been grading papers with a borrowed pencil. The metrics used to evaluate whether a fused image is any good - things like mutual information, gradient-based quality scores, and structural similarity - were designed for other vision tasks and duct-taped onto fusion assessment like an afterthought. It's the equivalent of judging a cooking competition using a rulebook written for figure skating. Sure, you'll get a number, but does it mean anything?
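To make one of those borrowed metrics concrete, here is a minimal sketch of histogram-based mutual information in plain Python. The function names, the 8-bin quantization, and the choice to treat an image as a flat list of 0-255 intensities are all assumptions made for this illustration; none of it is taken from the EvaNet paper or its code.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8, max_value=255):
    """Histogram-based mutual information (in bits) between two pixel lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")

    def quantize(v):
        # Map an intensity in [0, max_value] to one of `bins` buckets.
        return min(int(v * bins / (max_value + 1)), bins - 1)

    qa = [quantize(v) for v in a]
    qb = [quantize(v) for v in b]
    n = len(qa)
    joint = Counter(zip(qa, qb))   # joint histogram
    pa, pb = Counter(qa), Counter(qb)  # marginal histograms

    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (pa[x] * pb[y])
        mi += p_xy * math.log2(count * n / (pa[x] * pb[y]))
    return mi

def mi_fusion_score(fused, infrared, visible, bins=8):
    """Illustrative fusion score: MI with each source image, summed."""
    return (mutual_information(fused, infrared, bins)
            + mutual_information(fused, visible, bins))
```

A score of 0 means the fused pixels tell you nothing about a source, and higher values mean more shared information. Note that this is pure statistics over intensity histograms: nothing in it knows what a pedestrian or a heat signature is.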

Chunyang Cheng and collaborators asked this question and apparently didn't love the answer. Their new framework, EvaNet, published in IEEE TPAMI (DOI: 10.1109/TPAMI.2026.3681958), doesn't just propose another metric. It reimagines how we evaluate fusion results entirely - and it does it 1,000 times faster than traditional approaches. That's not a typo. Three orders of magnitude. Your laptop could now do what used to require patience and a prayer.

Divide, Conquer, and Actually Make Sense

The core move is deceptively elegant. Instead of squinting at a fused image and asking "does this look like both source images?" (which is what conventional metrics do), EvaNet first decomposes the fused result back into its infrared and visible components. Then it checks how much information from each original source survived the fusion process.

Think of it like this: if image fusion is making a smoothie out of strawberries and bananas, traditional metrics taste the smoothie and go "yep, that's fruit." EvaNet separates the strawberry flavor from the banana flavor and rates each one independently. Much more useful if you want to know whether your fusion method actually preserved the thermal signatures from infrared and the texture details from visible light.
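The decompose-then-score idea can be sketched in a few lines of Python. Everything here is a stand-in: `score_by_decomposition`, `toy_decompose`, and the use of Pearson correlation as the "how much survived" measure are hypothetical choices for illustration. In EvaNet the decomposition is learned by a network, and this post does not describe its architecture or its scoring function.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no comparable structure
    return cov / math.sqrt(vx * vy)

def toy_decompose(fused, radius=1):
    """Placeholder decomposer (NOT the learned one): a moving average plays
    the 'infrared-like' part and the residual plays the 'visible-like' part."""
    smooth = []
    for i in range(len(fused)):
        window = fused[max(0, i - radius): i + radius + 1]
        smooth.append(sum(window) / len(window))
    detail = [f - s for f, s in zip(fused, smooth)]
    return smooth, detail

def score_by_decomposition(fused, infrared, visible, decompose=toy_decompose):
    """Split the fused signal, then score each part against its own source."""
    ir_part, vis_part = decompose(fused)
    return {
        "infrared": pearson(ir_part, infrared),
        "visible": pearson(vis_part, visible),
    }
```

The point of the design is the return value: two separate scores, one per source, instead of a single blended number. A fusion method that keeps the texture but washes out the thermal content would show up as a lopsided pair.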

The LLM Whisperer

Here's where it gets spicy. During training, the team uses contrastive learning - the self-supervised technique where a model learns by comparing similar and dissimilar pairs (Madhusudana et al., IEEE TIP, 2022). But they added a twist that feels very 2026: they brought in a large language model to provide perceptual scene assessments that guide the training process.

Yes, an LLM is coaching a lightweight vision network on what "good fusion" looks like. It's giving Ratatouille energy - the rat (LLM) guiding the chef (EvaNet) to culinary (visual) excellence, except nobody needs to hide under a hat.
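For readers who haven't met contrastive learning before, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. This is the textbook form of the idea, not EvaNet's actual training objective, and it does not model the LLM guidance at all, since the post doesn't say how that signal enters the loss. The function names and the temperature value are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding.

    Low when the anchor is closer to its positive than to any negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    # Negative log-probability of picking the positive.
    return log_sum - logits[0]
```

Minimizing this pulls embeddings of "similar" pairs together and pushes "dissimilar" ones apart. What counts as similar is the interesting part, and that is presumably where the perceptual guidance comes in.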

The Consistency Problem (Or: Why Your Metrics Disagree With Your Eyes)

Perhaps the most underrated contribution is EvaNet's consistency evaluation framework - the first of its kind for image fusion. It measures whether a metric actually agrees with human visual perception, using no-reference quality scores and downstream task performance as the reference that each metric is checked against.

This matters more than it sounds. A 2025 survey on IVIF (Liu et al., IEEE TPAMI, 2025) catalogued dozens of fusion methods, but comparing them remains a mess because different metrics tell contradictory stories. One metric says Method A wins; another crowns Method B. EvaNet's consistency framework finally gives us a way to ask: "Okay, but which metric is actually right?"
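One simple way to picture "does this metric agree with the reference?" is rank correlation: score a set of fused images with the metric, score the same images with a trusted reference signal, and check whether the two orderings match. The sketch below uses Spearman correlation in plain Python. That choice is an assumption for illustration; the post does not say which statistic the paper's consistency framework uses, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def spearman(metric_scores, reference_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(metric_scores),
                   average_ranks(reference_scores))
```

A value near 1 means the metric orders the images the same way the reference does, a value near 0 means the metric is telling its own story, and a value near -1 means it is confidently wrong. When Method A and Method B each win under a different metric, this is the kind of check that tells you which metric to believe.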

Why Should You Care (Even If You Never Fuse an Image)

Infrared-visible fusion powers real things you encounter: autonomous vehicles seeing pedestrians through fog, surveillance systems working after dark, medical imaging combining thermal and structural data for better diagnoses. A comprehensive review in Artificial Intelligence Review (2025) highlights that fidelity-robustness-efficiency trade-offs remain the central challenge - and you can't optimize trade-offs if your measuring stick is broken.

The same team behind EvaNet has been on an absolute tear, publishing GIFNet at CVPR 2025 (DOI: 10.1109/CVPR52734.2025.02617) for task-agnostic fusion and FusionBooster in IJCV (DOI: 10.1007/s11263-024-02266-6) for boosting fusion performance. EvaNet completes the trilogy: build better fusion, boost it, then evaluate it properly. The Avengers assembled, but for pixel-level assessment.

If you've ever worked with image enhancement - tools like combb2.io use similar principles to upscale and denoise images in-browser - you know that measuring quality is half the battle. A model that's 1,000x faster at scoring fusion quality doesn't just save compute. It makes real-time evaluation feasible, which means fusion methods can finally get feedback during inference, not just after.

Code is available at github.com/AWCXV/EvaNet. The scoreboard just got an upgrade.

References

Cheng, C., Xu, T., Wu, X.-J., Zhou, T., Li, H., Tang, Z., & Kittler, J. (2026). EvaNet: Towards More Efficient and Consistent Infrared and Visible Image Fusion Assessment. IEEE TPAMI. DOI: 10.1109/TPAMI.2026.3681958
Liu, J., Wu, G., Liu, Z., et al. (2025). Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption. IEEE TPAMI, 47, 2349-2369. DOI: 10.1109/TPAMI.2024.3521416
Cheng, C., Xu, T., Feng, Z., Wu, X.-J., et al. (2025). One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion. CVPR 2025, 28102-28112. DOI: 10.1109/CVPR52734.2025.02617
Cheng, C., Xu, T., Wu, X.-J., Li, H., Li, X., & Kittler, J. (2025). FusionBooster: A Unified Image Fusion Boosting Paradigm. IJCV, 133, 3041-3058. DOI: 10.1007/s11263-024-02266-6
Madhusudana, P. C., Birkbeck, N., Wang, Y., Adsumilli, B., & Bovik, A. C. (2022). Image Quality Assessment Using Contrastive Learning. IEEE TIP. DOI: 10.1109/TIP.2022.3181496
Advances and Challenges in Infrared-Visible Image Fusion: A Comprehensive Review. (2025). Artificial Intelligence Review. DOI: 10.1007/s10462-025-11426-0

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.

AIb2.io - AI Research Decoded