Language model translation and training method and apparatus

US10509864B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10509864-B2
Application number: US-201815947915-A
Country: US
Kind code: B2
Filing date: Apr 9, 2018
Priority date: Nov 30, 2017
Publication date: Dec 17, 2019
Grant date: Dec 17, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A language model training method and an apparatus using the language model training method are disclosed. The language model training method includes assigning a context vector to a target translation vector, obtaining feature vectors based on the target translation vector and the context vector, generating a representative vector representing the target translation vector using an attention mechanism for the feature vectors, and training a language model based on the target translation vector, the context vector, and the representative vector.
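The steps named in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the helper names (`char_embedding`, `attend`), the random embedding table, the vector size, and the use of scaled dot-product attention are all hypothetical choices. Using the context vector as the attention query follows the wording of claim 25; the training step itself is omitted.

```python
import math
import random

def char_embedding(text, dim, table, rng):
    """Look up (or lazily create) a hypothetical embedding for each character."""
    vectors = []
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(table[ch])
    return vectors

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Scaled dot-product attention with a single query vector (an assumed form)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    # The representative vector is the attention-weighted sum of the feature vectors.
    rep = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return rep, weights

rng = random.Random(0)
DIM = 8                      # assumed vector size
table = {}                   # hypothetical character-embedding table
word = "model"               # stands in for one word of the target translation sentence

features = char_embedding(word, DIM, table, rng)          # feature vectors
context = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]    # assumed context vector for the word
representative, weights = attend(context, features)       # representative vector
```

In this sketch the word, its context vector, and `representative` would then be the three inputs passed to whatever training routine updates the language model.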

First claim

Opening claim text (preview).

What is claimed is:

1. A processor implemented language model training method, comprising: assigning a context vector to a target translation vector; obtaining feature vectors based on the target translation vector and the context vector; generating a representative vector representing the target translation vector using an attention mechanism for the feature vectors; and training a language model based on the target translation vector, the context vector, and the representative vector.
2. The method of claim 1, wherein the assigning of the context vector comprises: obtaining the target translation vector by preprocessing a target translation sentence to be translated.
3. The method of claim 2, wherein the obtaining of the target translation vector comprises: obtaining the target translation sentence using speech recognition.
4. The method of wherein the assigning of the context vector comprises: assigning the context vector to the target translation vector for each word.
5. The method of claim 1, wherein the obtaining of the feature vectors comprises: obtaining the feature vectors by performing character embedding on the target translation vector and the context vector.
6. The method of claim 1, wherein the generating of the representative vector comprises: obtaining a correlation among characters in the target translation vector by performing positional encoding on the feature vectors; and generating the representative vector based on the obtained correlation.
7. The method of claim 1, wherein the generating of the representative vector comprises: generating the representative vector using forward estimation or backward estimation for the feature vectors.
8. The method of claim 7, wherein the forward estimation comprises an estimation of which character follows a first character included in the feature vectors, and the backward estimation comprises an estimation of which character follows a second character included in the feature vectors.
9. The method of claim 1, wherein the language model is based on a recurrent neural network (RNN) of a hierarchical structure.
10. The method of claim 9, wherein the training of the language model comprises: updating a connection weight included in the RNN based on the target translation vector, the context vector, and the representative vector.
11. A language model training apparatus, comprising: a preprocessor configured to assign a context vector to a target translation vector; and a processor configured to: obtain feature vectors based on the target translation vector and the context vector, generate a representative vector representing the target translation vector using an attention mechanism for the feature vectors, and train a language model based on the target translation vector, the context vector, and the representative vector.
12. The language model training apparatus of claim 11, wherein the preprocessor is further configured to obtain the target translation vector by preprocessing a target translation sentence to be translated.
13. The language model training apparatus of claim 12, wherein the preprocessor is further configured to obtain the target translation sentence using speech recognition.
14. The language model training apparatus of claim 11, wherein the preprocessor is further configured to assign the context vector to the target translation vector for each word.
15. The language model training apparatus of claim 11, further comprising a memory storing instructions, which when executed by the processor, cause the processor to perform the obtaining of the feature vectors based on the target translation vector and the context vector, perform the generation of the representative vector representing the target translation vector using the attention mechanism for the feature vectors, and perform the training of the language model based on the target translation vector, the context vector, and the representative vector.
16. The language model training apparatus of claim 11, wherein the processor comprises: a language model trainer configured to: obtain the feature vectors based on the target translation vector and the context vector; generate the representative vector representing the target translation vector using the attention mechanism for the feature vectors; and train the language model based on the target translation vector, the context vector, and the representative vector.
17. The language model training apparatus of claim 16, wherein the language model trainer is further configured to obtain the feature vectors by performing character embedding on the target translation vector and the context vector.
18. The language model training apparatus of claim 16, wherein the language model trainer is further configured to obtain a correlation among characters in the target translation vector by performing positional encoding on the feature vectors, and generate the representative vector based on the obtained correlation.
19. The language model training apparatus of claim 16, wherein the language model trainer is further configured to generate the representative vector using forward estimation or backward estimation for the feature vectors.
20. The language model training apparatus of claim 19, wherein the forward estimation comprises an estimation of which character follows a first character included in the feature vectors, and the backward estimation comprises an estimation of which character follows a second character included in the feature vectors.
21. The language model training apparatus of claim 16, wherein the language model is based on a recurrent neural network (RNN) of a hierarchical structure.
22. The language model training apparatus of claim 21, wherein the language model trainer is further configured to update a connection weight included in the RNN based on the target translation vector, the context vector, and the representative vector.
23. The method of claim 1, wherein respective word-unit target translation vectors are generated for each word included in a target sentence.
24. The method of claim 1, wherein each of the feature vectors is a vector corresponding to abstracted speech information.
25. The method of claim 24, wherein the context vector is a query vector, and the attention mechanism comprises an attention function that maps the query vector to an output vector.
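The positional encoding step recited in the claims can be illustrated with a short sketch. The claim text shown here does not say which encoding is used, so the sinusoidal scheme below is an assumption borrowed from common practice, and the function names `positional_encoding` and `add_positions` are hypothetical.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding (assumed scheme): sine at even indices, cosine at odd ones."""
    enc = []
    for i in range(dim):
        # Each pair of indices (2k, 2k+1) shares one wavelength.
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def add_positions(features):
    """Add a position-dependent offset to each character's feature vector."""
    dim = len(features[0])
    return [
        [x + p for x, p in zip(feat, positional_encoding(pos, dim))]
        for pos, feat in enumerate(features)
    ]

# Three all-zero feature vectors of size 4, so the output is the encoding itself.
features = [[0.0] * 4 for _ in range(3)]
encoded = add_positions(features)
```

Because each position receives a distinct offset, a later attention step can tell identical characters at different positions apart, which is one way the character correlation mentioned in the claims could be captured.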

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • G06F40/216 · Primary

    using statistical methods · CPC title

  • Data-driven translation · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Statistical methods, e.g. probability models · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10509864B2 cover?
A language model training method and an apparatus using the language model training method are disclosed. The language model training method includes assigning a context vector to a target translation vector, obtaining feature vectors based on the target translation vector and the context vector, generating a representative vector representing the target translation vector using an attention me…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 17, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).