TY - GEN
T1 - Low-Complexity Attention Modelling via Graph Tensor Networks
AU - Xu, Yao Lei
AU - Konstantinidis, Kriton
AU - Li, Shengxi
AU - Stanković, Ljubiša
AU - Mandic, Danilo P.
N1 - Publisher Copyright: © 2022 IEEE
PY - 2022
Y1 - 2022
N2 - The attention mechanism is at the core of modern Natural Language Processing (NLP) models, owing to its ability to focus on the most contextually relevant part of a sequence. However, current attention models rely on "flat-view" matrix methods to process tokens embedded in vector spaces; this results in exceedingly high parameter complexity, which is prohibitive for practical applications. To this end, we introduce a novel Tensorized Graph Attention (TGA) mechanism, which leverages the recent Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention-based graph filters. Such tensorized token embeddings are shown to effectively bypass the Curse of Dimensionality, reducing the parameter complexity of the attention mechanism from an exponential to a linear one in the embedding dimensions. The expressive power of the TGA framework is further enhanced by virtue of domain-aware graph convolution filters. Simulations across benchmark NLP paradigms verify the advantages of the proposed framework over existing attention models, at drastically lower parameter complexity.
KW - Attention
KW - Compression
KW - Graph Neural Networks
KW - Tensor Decomposition
KW - Tensor-Train Decomposition
UR - https://www.scopus.com/pages/publications/85134001998
U2 - 10.1109/ICASSP43922.2022.9747875
DO - 10.1109/ICASSP43922.2022.9747875
M3 - Conference contribution
AN - SCOPUS:85134001998
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3928
EP - 3932
BT - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Y2 - 22 May 2022 through 27 May 2022
ER -