Multi-tailed vision transformer for efficient inference
Published in Neural Networks, 2024
This work introduces a novel multi-tailed vision transformer architecture that reduces inference cost while maintaining accuracy.
Key Contributions
- Designed multiple tails to generate visual sequences of different lengths for the Transformer encoder
- Employed a lightweight tail predictor to select, for each input image, the tail expected to yield the most accurate prediction
- Achieved significant reduction in FLOPs with no accuracy degradation
- Demonstrated generalizability across downstream tasks including object detection
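The contributions above can be illustrated with a minimal NumPy sketch: each "tail" is a patch-embedding at a different patch size, so a coarser tail yields a shorter token sequence (and, since encoder attention cost grows quadratically with sequence length, fewer FLOPs). All class and function names here are hypothetical, and the tail predictor is replaced by a trivial variance heuristic, not the paper's learned predictor.

```python
import numpy as np

def patchify(img, patch):
    """Split a square (H, W, C) image into flattened non-overlapping patches."""
    H, W, C = img.shape
    n = H // patch  # patches per side
    x = img[:n * patch, :n * patch].reshape(n, patch, n, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(n * n, -1)  # (num_patches, patch*patch*C)

class MultiTailedViT:
    """Schematic multi-tailed front end (hypothetical names, not the paper's code)."""

    def __init__(self, patch_sizes=(16, 32), dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.patch_sizes = patch_sizes
        # One linear projection ("tail") per patch size; larger patches -> shorter sequence.
        self.tails = {p: rng.standard_normal((p * p * 3, dim)) * 0.02 for p in patch_sizes}

    def predict_tail(self, img):
        # Stand-in for the learned tail predictor: route visually simple images
        # (low pixel variance) to the coarser, cheaper tail. Pure assumption.
        return 0 if img.var() > 0.1 else 1

    def embed(self, img):
        p = self.patch_sizes[self.predict_tail(img)]
        return patchify(img, p) @ self.tails[p]  # token sequence for the shared encoder

# A 224x224 image gives 196 tokens with 16x16 patches but only 49 with 32x32,
# so routing easy images to the coarse tail shrinks the encoder's workload.
```

The sequences from every tail feed one shared Transformer encoder; only the routing decides how long that sequence is per image.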
Status: Published in Neural Networks (2024), Vol. 174: 106235
Recommended citation: Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu. "Multi-tailed vision transformer for efficient inference." Neural Networks, 2024, 174: 106235.