arXiv:1803.02155 (cs)

[Submitted on 6 Mar 2018 (v1), last revised 12 Apr 2018 (this version, v2)]

Title: Self-Attention with Relative Position Representations

Authors: Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

Abstract: Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
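The abstract describes the mechanism only in words. What follows is a minimal, unoptimized single-head illustration, assuming the clipped-distance formulation given in the body of the paper: when query i attends to element j, the key and the value of j are each offset by a learned vector indexed by clip(j - i, k), so only 2k + 1 relative embeddings are needed regardless of sequence length. The function names, the parameters `rel_k`, `rel_v` and `k`, and the toy sizes are hypothetical choices for this sketch. It uses plain Python lists rather than any deep-learning framework and does not reproduce the paper's efficient batched implementation.

```python
import math
import random


def clip(x, k):
    """Clip a relative distance j - i to the window [-k, k]."""
    return max(-k, min(k, x))


def matvec(x, w):
    """Multiply row vector x (length d_in) by matrix w (d_in rows, d_out cols)."""
    return [sum(x[r] * w[r][c] for r in range(len(x))) for c in range(len(w[0]))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_self_attention(xs, wq, wk, wv, rel_k, rel_v, k):
    """Single-head self-attention with clipped relative position embeddings.

    xs:          list of n input vectors of length d_x
    wq, wk, wv:  projection matrices of shape d_x x d_z
    rel_k/rel_v: hypothetical tables of 2k + 1 vectors (length d_z),
                 indexed by clip(j - i, k) + k
    """
    q = [matvec(x, wq) for x in xs]
    keys = [matvec(x, wk) for x in xs]
    vals = [matvec(x, wv) for x in xs]
    d_z = len(q[0])
    outputs = []
    for i in range(len(xs)):
        # Compatibility scores: key of j shifted by the embedding for offset j - i.
        scores = []
        for j in range(len(xs)):
            a_k = rel_k[clip(j - i, k) + k]
            shifted_key = [kc + ac for kc, ac in zip(keys[j], a_k)]
            scores.append(dot(q[i], shifted_key) / math.sqrt(d_z))
        alpha = softmax(scores)
        # Weighted sum of values, each shifted by the embedding for offset j - i.
        z = [0.0] * d_z
        for j in range(len(xs)):
            a_v = rel_v[clip(j - i, k) + k]
            for c in range(d_z):
                z[c] += alpha[j] * (vals[j][c] + a_v[c])
        outputs.append(z)
    return outputs


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    d_x, d_z, k, n = 4, 3, 2, 5
    xs = random_matrix(n, d_x)
    wq, wk, wv = (random_matrix(d_x, d_z) for _ in range(3))
    rel_k, rel_v = random_matrix(2 * k + 1, d_z), random_matrix(2 * k + 1, d_z)
    out = relative_self_attention(xs, wq, wk, wv, rel_k, rel_v, k)
    print(len(out), len(out[0]))  # 5 3
```

Because the added embeddings depend only on the offset j - i, identical tokens at different positions can still produce different outputs, which is how position information enters without adding absolute position encodings to the inputs. Offsets larger than k in magnitude share the boundary embeddings.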

Comments:	NAACL 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1803.02155 [cs.CL]
	(or arXiv:1803.02155v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1803.02155

Submission history

From: Peter Shaw
[v1] Tue, 6 Mar 2018 13:13:11 UTC (50 KB)
[v2] Thu, 12 Apr 2018 18:51:33 UTC (51 KB)