Multi-tailed vision transformer for efficient inference

Published in Neural Networks, 2024

This work introduces a novel multi-tailed vision transformer architecture that substantially reduces inference cost (FLOPs) while maintaining accuracy.

Key Contributions

  • Designed multiple tails that generate visual sequences of different lengths for the Transformer encoder
  • Employed a tail predictor to decide which tail will produce the most accurate prediction for each image
  • Achieved a significant reduction in FLOPs with no accuracy degradation
  • Demonstrated generalizability to downstream tasks such as object detection
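The routing idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: each "tail" patchifies an image at a different patch size (larger patches give shorter token sequences and hence fewer encoder FLOPs), and a stand-in heuristic plays the role of the learned tail predictor, which in the paper is trained to pick the tail likely to yield a correct prediction.

```python
import numpy as np

def make_tail(patch_size):
    """A 'tail' turns an image into a token sequence; larger patches
    yield shorter sequences and thus fewer encoder FLOPs (illustrative)."""
    def tail(image):  # image: (H, W, C)
        h, w, c = image.shape
        ph, pw = h // patch_size, w // patch_size
        patches = image[:ph * patch_size, :pw * patch_size, :]
        patches = patches.reshape(ph, patch_size, pw, patch_size, c)
        # Flatten each patch into one token vector.
        tokens = patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)
        return tokens  # (num_tokens, patch_size * patch_size * C)
    return tail

def tail_predictor(image, num_tails):
    """Stand-in for the learned predictor: route low-variance ('easy')
    images to the shortest sequence, high-variance ones to longer ones."""
    score = float(image.var())
    return min(int(score * num_tails), num_tails - 1)

# Tails ordered from shortest to longest output sequence.
tails = [make_tail(p) for p in (32, 16, 8)]

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
idx = tail_predictor(image, len(tails))
tokens = tails[idx](image)
print(idx, tokens.shape)
```

At inference, only the selected tail's sequence is fed to the shared encoder, so easy images pay for far fewer tokens than hard ones.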

Status: Published in Neural Networks (2024), Vol. 174: 106235

Recommended citation: Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu. "Multi-tailed vision transformer for efficient inference." Neural Networks, 2024, 174: 106235.