Adaptive Window Pruning for Efficient Local Motion Deblurring

Local motion blur commonly occurs in real-world photography when moving objects blend with stationary backgrounds during exposure. Existing image deblurring methods predominantly focus on global deblurring, which inadvertently affects the sharpness of backgrounds in locally blurred images and wastes computation on already-sharp pixels, especially for high-resolution images. This paper aims to adaptively and efficiently restore high-resolution locally blurred images. We propose a local motion deblurring vision Transformer (LMD-ViT) built on adaptive window pruning Transformer blocks (AdaWPT). To focus deblurring on local regions and reduce computation, AdaWPT prunes unnecessary windows, allowing only the active windows to participate in the deblurring process. The pruning operation relies on the blurriness confidence predicted by a confidence predictor that is trained end-to-end using a reconstruction loss with Gumbel-Softmax re-parameterization and a pruning loss guided by annotated blur masks. Our method removes local motion blur effectively without distorting sharp regions, demonstrated by its exceptional perceptual and quantitative improvements (+0.24dB) compared to state-of-the-art methods. In addition, our approach reduces FLOPs by 66% and achieves more than a twofold increase in inference speed compared to Transformer-based deblurring methods.

Key Approach

We introduce LMD-ViT, a Transformer-based local motion deblurring method with an adaptive window pruning mechanism. We prune unnecessary windows based on the predicted blurriness confidence, supervised by our blur region annotations. In this process, the feature maps are pruned at varying levels of granularity within blocks of different resolutions. Unlike global deblurring methods that process the entire image, LMD-ViT performs dense computation only on the active windows covering blurry regions. Consequently, local blur is removed efficiently without distorting sharp regions.
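The core idea can be sketched as follows. This is a hypothetical PyTorch simplification, not the authors' implementation: the class name `AdaptiveWindowPruning`, the tiny MLP confidence predictor, and the `transform` layer (a stand-in for the full attention/FFN stack) are all illustrative assumptions. It shows how per-window confidence yields hard keep/prune decisions via straight-through Gumbel-Softmax during training, while at inference only the predicted-blurry windows are processed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveWindowPruning(nn.Module):
    """Minimal sketch of confidence-based window pruning (illustrative only)."""

    def __init__(self, dim, tau=1.0):
        super().__init__()
        # Tiny MLP predicting a two-class (sharp vs. blurry) logit per window.
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.GELU(), nn.Linear(dim // 2, 2))
        # Placeholder for the dense computation (attention/FFN in the paper).
        self.transform = nn.Linear(dim, dim)
        self.tau = tau

    def forward(self, windows):
        # windows: (num_windows, tokens_per_window, dim)
        logits = self.predictor(windows.mean(dim=1))  # (num_windows, 2)
        if self.training:
            # Differentiable hard decisions via straight-through Gumbel-Softmax.
            decision = F.gumbel_softmax(logits, tau=self.tau, hard=True)[:, 1]
        else:
            # At inference, keep only windows predicted as blurry.
            decision = (logits[:, 1] > logits[:, 0]).float()
        keep = decision.bool()
        out = windows.clone()
        if keep.any():
            # Dense computation runs only on the active (kept) windows.
            out[keep] = windows[keep] + self.transform(windows[keep])
        # Blending by the (hard) decision keeps gradients flowing to the
        # predictor during training; pruned windows pass through unchanged.
        d = decision[:, None, None]
        return d * out + (1 - d) * windows, keep
```

Pruned windows are returned untouched, which is what prevents the sharp background from being distorted.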

Architecture of LMD-ViT. LMD-ViT is built on a U-shaped encoder-decoder structure with AdaWPT blocks. Each AdaWPT block consists of an AdaWPT-F block and several AdaWPT-P blocks. The masked images below each block visualize the window pruning effect: the masks indicate the unpruned windows on locally blurred patches and exhibit different levels of pruning granularity.

Structure of AdaWPT in the training and inference phases. There are two kinds of AdaWPT: AdaWPT-F and AdaWPT-P. AdaWPT-F predicts the blurriness confidence (``C'') and the pruning decisions (``D''), while AdaWPT-P follows those decisions. Both prune windows in Transformer layers. ``X'' denotes the feature map.
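The mask supervision of the confidence predictor could look like the following sketch. This is a hypothetical illustration, not the paper's exact pruning-loss formulation: the function name `pruning_loss`, the use of cross-entropy, and the 0.5 majority threshold are all assumptions.

```python
import torch
import torch.nn.functional as F


def pruning_loss(confidence_logits, blur_fraction):
    """Supervise per-window blurriness confidence with annotated blur masks.

    confidence_logits: (num_windows, 2) raw two-class scores per window.
    blur_fraction: (num_windows,) fraction of annotated blurry pixels inside
        each window, derived from the ground-truth blur mask.
    """
    # Label a window "blurry" when the annotated mask covers most of it
    # (the 0.5 threshold is an illustrative choice).
    labels = (blur_fraction > 0.5).long()
    return F.cross_entropy(confidence_logits, labels)
```
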
Deblurring Results

Hover over the images to compare: MouseOver shows LMD-ViT deblurred images; MouseOut shows the locally blurred inputs.

We referred to the project page of A-ViT when creating this project page.