CVPR 2023 - Stare at What You See: Masked Image Modeling without Reconstruction

In this episode we discuss "Stare at What You See: Masked Image Modeling without Reconstruction" by Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, and Jiebo Luo. The paper proposes MaskAlign, a new approach to Masked Image Modeling (MIM). The authors argue that the features extracted by powerful teacher models already encode rich semantic correlations across regions of an intact image, so the student does not need to reconstruct the masked regions. Instead, MaskAlign trains the student so that the features it extracts from visible patches are consistent with the features the teacher extracts from the intact image, and it uses a Dynamic Alignment (DA) module to handle the input inconsistency between the two models. The authors report state-of-the-art performance with higher efficiency, and the code is available on GitHub.
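
For listeners who want a concrete picture, here is a minimal sketch of the core idea: compare student features on visible patches against teacher features at the same positions of the intact image. It is not the authors' implementation. The function names, the 75% mask ratio, and the cosine-based loss are assumptions made for illustration, and the Dynamic Alignment module is not modeled at all.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def sample_visible(num_patches, mask_ratio, rng):
    """Pick the patch indices the student is allowed to see (hypothetical helper)."""
    num_visible = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_visible))


def alignment_loss(student_feats, teacher_feats, visible):
    """Mean (1 - cosine similarity) over visible patches.

    student_feats: one vector per visible patch (the student never sees the rest).
    teacher_feats: one vector per patch of the intact image.
    No pixels are reconstructed; only features are compared.
    """
    sims = [cosine(s, teacher_feats[i]) for s, i in zip(student_feats, visible)]
    return 1.0 - sum(sims) / len(sims)


# Toy usage with random stand-in features (assumed sizes: 16 patches, 8 dims).
rng = random.Random(0)
teacher = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
visible = sample_visible(num_patches=16, mask_ratio=0.75, rng=rng)
student = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in visible]
print("visible patches:", visible)
print("alignment loss:", alignment_loss(student, teacher, visible))
```

In a real training loop the student and teacher would be neural networks and the loss would be minimized by gradient descent; the random vectors here only stand in for their outputs.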
