Transformers on Edge Devices? Monash U’s Energy-Saving Attention With Linear Complexity Reduces Compute Cost by 73%

While transformer architectures have achieved remarkable success in recent years thanks to their impressive representational power, the quadratic complexity of their self-attention with respect to sequence length entails prohibitively high energy consumption, hindering their deployment in many real-life applications, especially on resource-constrained edge devices.

A Monash University research team addresses this issue in the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, proposing an attention mechanism with linear complexity — EcoFormer — that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.

The basic idea informing this work is to reduce attention’s high compute and energy cost by applying binary quantization to the kernel embeddings, replacing energy-expensive multiplications with energy-efficient bit-wise operations. The researchers note, however, that conventional binary quantization methods focus only on minimizing the quantization error between the full-precision and binary values, which fails to preserve the pairwise semantic similarity among attention’s tokens and thus hurts performance.
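
To make the idea concrete, here is a minimal NumPy sketch (not the authors’ implementation) of why binary codes save energy: once a query and a key are quantized to {-1, +1} codes, their dot-product similarity reduces to counting matching and mismatching entries, which needs only accumulations rather than floating-point multiply-accumulate operations. The dimension and the plain sign() binarizer are illustrative assumptions.

    import numpy as np

    # Illustrative sketch only: binary {-1, +1} codes let query/key similarity
    # be computed by counting, i.e. with accumulations instead of
    # floating-point multiply-accumulate operations.
    rng = np.random.default_rng(0)
    d = 64                        # embedding dimension (assumed for illustration)
    q = rng.standard_normal(d)    # full-precision query embedding
    k = rng.standard_normal(d)    # full-precision key embedding

    # Conventional binarization: sign() minimizes per-element quantization error,
    # but by itself does not preserve pairwise similarity between tokens,
    # which is the shortcoming EcoFormer's learned hashing addresses.
    q_bin = np.sign(q)            # codes in {-1, +1}
    k_bin = np.sign(k)

    # With {-1, +1} codes, the dot product equals
    # (#matching entries) - (#mismatching entries), i.e. pure counting.
    matches = int(np.sum(q_bin == k_bin))
    binary_sim = matches - (d - matches)
    assert binary_sim == int(np.dot(q_bin, k_bin))
    print(binary_sim)

In hardware terms, these comparisons and additions are far cheaper than the floating-point multiply-accumulates used by full-precision attention, which is where the reported energy savings come from.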

To mitigate this issue, the team introduces a novel binarization method that uses kernelized hashing with a Gaussian radial basis function (RBF) kernel to map the original high-dimensional query/key pairs to low-dimensional, similarity-preserving binary codes. EcoFormer leverages these binary codes to maintain semantic similarity in attention while approximating self-attention in linear time at a lower energy cost.
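
As a rough illustration of how the pieces fit together, the sketch below uses random-hyperplane hashing as a stand-in for EcoFormer’s learned kernelized hash functions (the paper learns them under a Gaussian RBF kernel so the codes preserve token similarity) and shows how binary codes let self-attention be computed in linear rather than quadratic time by regrouping the matrix products. The shapes, the {0, 1} codes, and the simple row-sum normalization are assumptions for illustration, not the paper’s exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, d_v, b = 128, 64, 64, 16      # tokens, embed dim, value dim, hash bits (illustrative)

    Q = rng.standard_normal((n, d))     # queries
    K = rng.standard_normal((n, d))     # keys
    V = rng.standard_normal((n, d_v))   # values

    # Stand-in hash: sign of random projections. EcoFormer instead *learns*
    # kernelized hash functions under a Gaussian RBF kernel so the resulting
    # codes preserve pairwise semantic similarity between tokens.
    W = rng.standard_normal((d, b))
    Q_code = (Q @ W >= 0).astype(np.float64)   # n x b binary codes in {0, 1}
    K_code = (K @ W >= 0).astype(np.float64)

    # Because the similarity is now a plain dot product of codes, the attention
    # output (Q_code K_code^T) V can be regrouped as Q_code (K_code^T V),
    # which avoids forming the n x n attention matrix: O(n*b*d_v), not O(n^2).
    KV = K_code.T @ V                                               # b x d_v, computed once
    row_sums = Q_code @ K_code.sum(axis=0, keepdims=True).T + 1e-6  # n x 1 normalizer
    out = (Q_code @ KV) / row_sums                                  # n x d_v attention output
    print(out.shape)                                                # (128, 64)

The regrouping is what turns the quadratic cost of standard attention into a cost linear in the number of tokens; EcoFormer’s contribution is learning the hash functions so that this cheap binary similarity still reflects the original semantic similarity between tokens.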

In their empirical study, the team compared the proposed EcoFormer with standard multi-head self-attention (MSA) on ImageNet-1K. The results show that EcoFormer reduces energy consumption by 73 percent while incurring only a 0.33 percent drop in performance.

Overall, the proposed EcoFormer energy-saving attention mechanism with linear complexity represents a promising approach for alleviating the cost bottleneck that has limited the deployment of transformer models. In future work, the team plans to explore binarizing transformers’ value vectors in attention, multi-layer perceptrons, and non-linearities to further reduce energy cost, and to extend EcoFormer to NLP tasks such as machine translation and speech analysis. The code will be available on the project’s GitHub. The paper EcoFormer: Energy-Saving Attention with Linear Complexity is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
