Computer Science > Machine Learning

arXiv:1506.03099 (cs)

[Submitted on 9 Jun 2015 (v1), last revised 23 Sep 2015 (this version, v3)]

Title: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Authors: Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Abstract: Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the likelihood of each token in the sequence given the current (recurrent) state and the previous token. At inference, the unknown previous token is then replaced by a token generated by the model itself. This discrepancy between training and inference can yield errors that can accumulate quickly along the generated sequence. We propose a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead. Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the MSCOCO image captioning challenge, 2015.
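The core idea in the abstract, choosing at each step between the true previous token and the model's own generated token, with the choice drifting towards the generated token as training proceeds, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `model_predict` callback, the `start_token` value, and the inverse-sigmoid decay with constant `k` are all choices made here for illustration, since the abstract does not fix a particular schedule.

```python
import math
import random


def teacher_forcing_prob(step, k=1000.0):
    """Probability of feeding the true previous token at a given training step.

    Assumption: an inverse-sigmoid decay, close to 1 early in training and
    falling towards 0 later. Other decay shapes would serve the same purpose.
    """
    exponent = min(step / k, 700.0)  # cap the exponent to avoid float overflow
    return k / (k + math.exp(exponent))


def choose_previous_token(true_prev, model_prev, epsilon, rng):
    """With probability epsilon use the true token, otherwise the model's own."""
    return true_prev if rng.random() < epsilon else model_prev


def build_decoder_inputs(targets, model_predict, epsilon, rng, start_token=0):
    """Return the token fed to the decoder at each position of one sequence.

    `model_predict` is a hypothetical stand-in for one decoder step: it maps
    the token fed in to the token the model would generate next.
    """
    inputs = []
    prev = start_token
    for target in targets:
        inputs.append(prev)
        generated = model_predict(prev)
        # Coin flip per position: true token (guided) or generated token.
        prev = choose_previous_token(target, generated, epsilon, rng)
    return inputs


if __name__ == "__main__":
    rng = random.Random(0)
    toy_model = lambda token: token + 100  # placeholder, not a real network
    for step in (0, 5000, 20000):
        eps = teacher_forcing_prob(step)
        print(step, round(eps, 4), build_decoder_inputs([5, 6, 7], toy_model, eps, rng))
```

With `epsilon` equal to 1 the inputs reduce to the fully guided scheme (every position sees the true previous token); with `epsilon` equal to 0 every position sees a generated token, which matches the inference setting the abstract describes.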

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1506.03099 [cs.LG]
	(or arXiv:1506.03099v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1506.03099

Submission history

From: Samy Bengio
[v1] Tue, 9 Jun 2015 20:33:47 UTC (117 KB)
[v2] Mon, 15 Jun 2015 15:29:22 UTC (117 KB)
[v3] Wed, 23 Sep 2015 16:35:42 UTC (117 KB)
