Neural machine translation systems with rare word processing

US10133739B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10133739-B2
Application number  US-201514921925-A
Country             US
Kind code           B2
Filing date         Oct 23, 2015
Priority date       Oct 24, 2014
Publication date    Nov 20, 2018
Grant date          Nov 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural translation systems with rare word processing. One of the methods is a method training a neural network translation system to track the source in source sentences of unknown words in target sentences, in a source language and a target language, respectively and includes deriving alignment data from a parallel corpus, the alignment data identifying, in each pair of source and target language sentences in the parallel corpus, aligned source and target words; annotating the sentences in the parallel corpus according to the alignment data and a rare word model to generate a training dataset of paired source and target language sentences; and training a neural network translation model on the training dataset.
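The annotation step described in the abstract can be illustrated with a minimal sketch. It assumes word alignments are already available as a mapping from target positions to source positions, and that the rare word model is simply "any word outside a fixed target vocabulary". The token spellings `<unk_ptr_N>` and `<unk_null>`, the function names, and the toy sentence pair are hypothetical and are not taken from the patent text.

```python
from typing import Dict, List, Set

# Hypothetical token spellings; the abstract does not specify a format.
NULL_TOKEN = "<unk_null>"


def pointer_token(source_index: int) -> str:
    """Build an unknown token that points at a source position."""
    return f"<unk_ptr_{source_index}>"


def annotate_target(
    source: List[str],
    target: List[str],
    alignment: Dict[int, int],
    target_vocab: Set[str],
) -> List[str]:
    """Replace each out-of-vocabulary target word with an unknown token.

    alignment maps a target position to its aligned source position.
    An OOV word with a valid alignment becomes a pointer token;
    an OOV word without one becomes a null token.
    """
    annotated = []
    for j, word in enumerate(target):
        if word in target_vocab:
            annotated.append(word)
            continue
        i = alignment.get(j)
        if i is not None and 0 <= i < len(source):
            annotated.append(pointer_token(i))
        else:
            annotated.append(NULL_TOKEN)
    return annotated


# Toy data, invented for illustration only.
source = ["the", "portico", "in", "Pont-de-Buis"]
target = ["le", "portique", "écotaxe", "de", "Pont-de-Buis"]
alignment = {0: 0, 1: 1, 3: 2, 4: 3}
target_vocab = {"le", "de"}

print(annotate_target(source, target, alignment, target_vocab))
# ['le', '<unk_ptr_1>', '<unk_null>', 'de', '<unk_ptr_3>']
```

Sentence pairs annotated this way would form the training dataset, so that the translation model learns to emit the unknown tokens itself.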

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented translation system for translating natural language text from a source sentence in a source language to a target sentence in a target language, the translation system comprising one or more computers and one or more storage devices storing translation instructions and translation data, wherein: the translation data includes: a word dictionary; and a neural network translation model trained to process a sequence of inputs corresponding to words in the source sentence thereby generating a sequence of outputs corresponding to words in the target sentence, including emitting a respective unknown token in the sequence of outputs for each out-of-vocabulary (OOV) word that occurs in the target sentence, the model being operable to emit pointer tokens as a first type of unknown token and null tokens as a second type of unknown token, wherein pointer tokens are unknown tokens that identify a respective source word in the source sentence that corresponds to the unknown token, and null tokens are tokens that do not identify any source word in the source sentence that corresponds to the unknown token; and the translation instructions are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: for every pointer token in the target sentence emitted by the neural network translation model from the source sentence, replacing the pointer token according to the corresponding source word in the source sentence.

2. The translation system of claim 1, wherein replacing every pointer token comprises, for each pointer token: using the word dictionary to perform a word translation from the respective source word in the source sentence that is identified by the pointer token and replacing the pointer token with the result of the word translation, or, if there is no such translation in the word dictionary for the respective source word in the source sentence that is identified by the pointer token, replacing the pointer token in the target sentence with the respective source word from the source sentence.

3. The translation system of claim 1, wherein: the neural network translation model contains a deep encoder Long Short-Term Memory model and a deep decoder Long Short-Term Memory model, wherein: the encoder is trained to be operable to read the source sentence, one word at a time, to produce a large hidden state that summarizes the entire source sentence; and the decoder is initialized from a final hidden state of the encoder and is trained to be operable to generate a target translation, one word at a time, until the decoder emits an end-of-sentence symbol.

4. The translation system of claim 1, wherein: the neural network translation model is a deep neural network.

5. The translation system of claim 1, wherein: the neural network translation model is a large deep Long Short-Term Memory model.

6. The translation system of claim 1, wherein: the neural network translation model is a six-layer deep Long Short-Term Memory model.

7. A method performed by one or more computers of a translation system, comprising: providing, to a neural network translation model, a sequence of inputs corresponding to words of a source sentence in a source language; obtaining, from the neural network translation model, as a result of the neural network translation model processing the sequence of inputs, a sequence of outputs corresponding to words of a target sentence in a target language, wherein the sequence of outputs includes a pointer token that represents an out-of-vocabulary (OOV) word that occurs in the target sentence, wherein the pointer token identifies a word from the source sentence that corresponds to the OOV word in the target sentence; determining whether a translation in the target language is available in a word dictionary for the word from the source sentence that corresponds to the OOV word in the target sentence; and in response to determining that a translation in the target language is available in the word dictionary for the word from the source sentence that corresponds to the OOV word in the target sentence, replacing the pointer token with the translation for the word from the source sentence that corresponds to the OOV word in the target sentence.

8. The method of claim 7, wherein: the sequence of outputs further includes a null token that represents a second OOV word that occurs in the target sentence, and the null token does not identify any word from the source sentence that corresponds to the second OOV word in the target sentence.

9. The method of claim 7, wherein the sequence of outputs further includes a second pointer token that represents a second OOV word that occurs in the target sentence, and the second pointer token identifies a second word from the source sentence that corresponds to the second OOV word in the target sentence; wherein the method further comprises: determining whether a translation in the target language is available in the word dictionary for the second word from the source sentence that corresponds to the second OOV word in the target sentence; and in response to determining that a translation in the target language is not available in the word dictionary for the second word from the source sentence that corresponds to the second OOV word in the target sentence, replacing the second pointer token in the target sentence with the second word from the source sentence in the source language.

10. The method of claim 7, wherein: the neural network translation model is a deep neural network.

11. The method of claim 7, wherein: the neural network translation model is a large deep Long Short-Term Memory model.

12. A non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more computers, cause the one or more computers to perform operations comprising: providing, to a neural network translation model, a sequence of inputs corresponding to words of a source sentence in a source language; obtaining, from the neural network translation model, as a result of the neural network translation model processing the sequence of inputs, a sequence of outputs corresponding to words of a target sentence in a target language, wherein the sequence of outputs includes a pointer token that represents an out-of-vocabulary (OOV) word that occurs in the target sentence, wherein the pointer token identifies a word from the source sentence that corresponds to the OOV word in the target sentence; determining whether a translation in the target language is available in a word dictionary for the word from the source sentence that corresponds to the OOV word in the target sentence; and in response to determining that a translation in the target language is available in the word dictionary for the word from the source sentence that corresponds to the OOV word in the target sentence, replacing the pointer token with the translation for the word from the source sentence that corresponds to the OOV word in the target sentence.

13. The computer-readable medium of claim 12, wherein: the sequence of outputs further includes a null token that represents a second OOV word that occurs in the target sentence, and the null token does not identify any word from the source sentence that corresponds to the second OOV word in the target sentence.

14. The computer-readable medium of claim 12, wherein the sequence of outputs further includes a second pointer token that represents a second OOV word that occurs in the target sentence, and the second pointer token ide…
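The pointer-token replacement recited in claims 2, 7, and 9 can be sketched as a small post-processing pass: translate the pointed-to source word with the word dictionary when an entry exists, otherwise copy the source word unchanged. This is only an illustration under stated assumptions. The token spellings `<unk_ptr_N>` and `<unk_null>`, the function name, and the toy data are hypothetical, and leaving null tokens untouched is a choice made here because the claims do not say how they are handled.

```python
import re
from typing import Dict, List

# Hypothetical pointer-token format: <unk_ptr_N>, N = source word index.
POINTER_RE = re.compile(r"^<unk_ptr_(\d+)>$")


def replace_pointer_tokens(
    source: List[str],
    output: List[str],
    dictionary: Dict[str, str],
) -> List[str]:
    """Post-process model output by resolving pointer tokens.

    For each pointer token, look up the pointed-to source word in the
    word dictionary; fall back to copying the source word if the
    dictionary has no entry. All other tokens pass through unchanged.
    """
    result = []
    for token in output:
        match = POINTER_RE.match(token)
        if match is None:
            # Known word or null token: left as-is (assumption).
            result.append(token)
            continue
        index = int(match.group(1))
        if index >= len(source):
            # Pointer outside the source sentence: keep the raw token.
            result.append(token)
            continue
        source_word = source[index]
        result.append(dictionary.get(source_word, source_word))
    return result


# Toy data, invented for illustration only.
source = ["the", "portico", "in", "Pont-de-Buis"]
output = ["le", "<unk_ptr_1>", "<unk_null>", "de", "<unk_ptr_3>"]
dictionary = {"portico": "portique"}

print(replace_pointer_tokens(source, output, dictionary))
# ['le', 'portique', '<unk_null>', 'de', 'Pont-de-Buis']
```

In the toy run, "portico" has a dictionary entry and is translated, while "Pont-de-Buis" has none and is copied from the source sentence, which matches the two branches described in claim 2.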

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • Selecting, i.e. obtaining data of one kind from those record carriers which are identifiable by data of a second kind from a mass of ordered or randomly-distributed record carriers · CPC title

  • Example-based machine translation; Alignment · CPC title

  • using artificial neural networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10133739B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural translation systems with rare word processing. One of the methods is a method training a neural network translation system to track the source in source sentences of unknown words in target sentences, in a source language and a target language, respectively and includes deriving alignment …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/242. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 20, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).