On Scalar Embedding of Relative Positions in Attention Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

Abstract

Attention with positional encoding has been demonstrated to be a powerful component in modern neural network models, such as transformers. However, why positional encoding works well in attention models remains largely unanswered. In this paper, we study the scalar relative positional encoding (SRPE) proposed in the T5 transformer. Such an encoding method has two features. First, it uses a scalar to embed relative positions. Second, the relative positions are bucketized using a fixed heuristic algorithm, and positions in the same bucket share the same embedding. In this work, we show that SRPE in attention has an elegant probabilistic interpretation. More specifically, the positional encoding serves to produce a prior distribution over the attended positions. The resulting attentive distribution can then be viewed as a posterior distribution of the attended position given the observed input sequence. Furthermore, we propose a new SRPE (AT5) that adopts a learnable bucketization protocol and automatically adapts to the dependency range specific to the learning task. Empirical studies show that AT5 outperforms T5's SRPE.
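To make the mechanism in the abstract concrete, below is a minimal NumPy sketch, not the authors' code, of how T5-style SRPE is commonly implemented: the relative offset j - i is mapped to a small number of buckets (exact buckets for small offsets, logarithmically shared buckets for large ones), each bucket owns one learnable scalar, and that scalar is added to the attention logits. The function name, bucket count, and max_distance below follow the public T5 implementation and are assumptions rather than details from this paper; the final lines illustrate the prior/posterior reading, since softmax(score + bias) is proportional to softmax(bias) * exp(score) row by row.

```python
import numpy as np

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """T5-style bidirectional bucketization of relative offsets (a sketch)."""
    num_buckets //= 2                        # half the buckets per sign of the offset
    buckets = (relative_position > 0).astype(np.int64) * num_buckets
    n = np.abs(relative_position)
    max_exact = num_buckets // 2             # small offsets each get their own bucket
    is_small = n < max_exact
    # larger offsets share buckets on a log scale, capped at max_distance
    val_if_large = max_exact + (
        np.log(n / max_exact + 1e-9) / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(np.int64)
    val_if_large = np.minimum(val_if_large, num_buckets - 1)
    return buckets + np.where(is_small, n, val_if_large)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

L = 8
rng = np.random.default_rng(0)
rel = np.arange(L)[None, :] - np.arange(L)[:, None]   # offset j - i
b = rng.normal(size=32)                               # one learnable scalar per bucket
bias = b[relative_position_bucket(rel)]               # (L, L) additive logit bias
scores = rng.normal(size=(L, L))                      # stand-in for q_i . k_j / sqrt(d)

posterior = softmax(scores + bias)                    # the attention weights
prior = softmax(bias)                                 # induced by the bias alone
reweighted = prior * np.exp(scores)                   # prior times content likelihood
assert np.allclose(posterior, reweighted / reweighted.sum(-1, keepdims=True))
```

Under this reading, the fixed heuristic is the `relative_position_bucket` function above; AT5's proposal is to make the bucket boundaries themselves learnable so the prior adapts to the task's dependency range.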

Original language: English
Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 14050-14057
Number of pages: 8
ISBN (Electronic): 9781713835974
DOI
Publication status: Published - 2021
Event: 35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration: 2 Feb 2021 → 9 Feb 2021

Publication series

Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume: 16

Conference

Conference: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Virtual, Online
Period: 2/02/21 → 9/02/21
