US10509864B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10509864-B2 |
| Application number | US-201815947915-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 9, 2018 |
| Priority date | Nov 30, 2017 |
| Publication date | Dec 17, 2019 |
| Grant date | Dec 17, 2019 |
Official abstract text for this publication.
A language model training method and an apparatus using the language model training method are disclosed. The language model training method includes assigning a context vector to a target translation vector, obtaining feature vectors based on the target translation vector and the context vector, generating a representative vector representing the target translation vector using an attention mechanism for the feature vectors, and training a language model based on the target translation vector, the context vector, and the representative vector.
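The attention step in the abstract can be illustrated with a small sketch. The abstract says only "an attention mechanism", so the scaled dot-product scoring and softmax weighting below are assumptions, as are the function names `softmax` and `attend` and the toy data. Following the claims, the context vector plays the role of the query, and the representative vector is taken here to be the attention-weighted sum of the feature vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, feature_vectors):
    """Return (weights, representative) for one query over the feature vectors.

    Assumption: scaled dot-product scoring. The source does not name
    a specific scoring function.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, fv)) / scale
        for fv in feature_vectors
    ]
    weights = softmax(scores)
    dim = len(feature_vectors[0])
    # Representative vector: attention-weighted sum of the feature vectors.
    representative = [
        sum(w * fv[i] for w, fv in zip(weights, feature_vectors))
        for i in range(dim)
    ]
    return weights, representative

# Hypothetical toy data: three 2-d feature vectors and a 2-d context (query) vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [1.0, 0.0]
weights, representative = attend(context, features)
```

In this toy case the first and third feature vectors score equally against the query and therefore receive equal weight, while the second, orthogonal to the query, receives less. A trained system would learn the projections that produce queries and features rather than use raw vectors as done here.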
Opening claim text (preview).
What is claimed is:

1. A processor implemented language model training method, comprising: assigning a context vector to a target translation vector; obtaining feature vectors based on the target translation vector and the context vector; generating a representative vector representing the target translation vector using an attention mechanism for the feature vectors; and training a language model based on the target translation vector, the context vector, and the representative vector.

2. The method of claim 1, wherein the assigning of the context vector comprises: obtaining the target translation vector by preprocessing a target translation sentence to be translated.

3. The method of claim 2, wherein the obtaining of the target translation vector comprises: obtaining the target translation sentence using speech recognition.

4. The method of claim 1, wherein the assigning of the context vector comprises: assigning the context vector to the target translation vector for each word.

5. The method of claim 1, wherein the obtaining of the feature vectors comprises: obtaining the feature vectors by performing character embedding on the target translation vector and the context vector.

6. The method of claim 1, wherein the generating of the representative vector comprises: obtaining a correlation among characters in the target translation vector by performing positional encoding on the feature vectors; and generating the representative vector based on the obtained correlation.

7. The method of claim 1, wherein the generating of the representative vector comprises: generating the representative vector using forward estimation or backward estimation for the feature vectors.

8. The method of claim 7, wherein the forward estimation comprises an estimation of which character follows a first character included in the feature vectors, and the backward estimation comprises an estimation of which character follows a second character included in the feature vectors.

9. The method of claim 1, wherein the language model is based on a recurrent neural network (RNN) of a hierarchical structure.

10. The method of claim 9, wherein the training of the language model comprises: updating a connection weight included in the RNN based on the target translation vector, the context vector, and the representative vector.

11. A language model training apparatus, comprising: a preprocessor configured to assign a context vector to a target translation vector; and a processor configured to: obtain feature vectors based on the target translation vector and the context vector, generate a representative vector representing the target translation vector using an attention mechanism for the feature vectors, and train a language model based on the target translation vector, the context vector, and the representative vector.

12. The language model training apparatus of claim 11, wherein the preprocessor is further configured to obtain the target translation vector by preprocessing a target translation sentence to be translated.

13. The language model training apparatus of claim 12, wherein the preprocessor is further configured to obtain the target translation sentence using speech recognition.

14. The language model training apparatus of claim 11, wherein the preprocessor is further configured to assign the context vector to the target translation vector for each word.

15. The language model training apparatus of claim 11, further comprising a memory storing instructions, which when executed by the processor, cause the processor to perform the obtaining of the feature vectors based on the target translation vector and the context vector, perform the generation of the representative vector representing the target translation vector using the attention mechanism for the feature vectors, and perform the training of the language model based on the target translation vector, the context vector, and the representative vector.

16. The language model training apparatus of claim 11, wherein the processor comprises: a language model trainer configured to: obtain the feature vectors based on the target translation vector and the context vector; generate the representative vector representing the target translation vector using the attention mechanism for the feature vectors; and train the language model based on the target translation vector, the context vector, and the representative vector.

17. The language model training apparatus of claim 16, wherein the language model trainer is further configured to obtain the feature vectors by performing character embedding on the target translation vector and the context vector.

18. The language model training apparatus of claim 16, wherein the language model trainer is further configured to obtain a correlation among characters in the target translation vector by performing positional encoding on the feature vectors, and generate the representative vector based on the obtained correlation.

19. The language model training apparatus of claim 16, wherein the language model trainer is further configured to generate the representative vector using forward estimation or backward estimation for the feature vectors.

20. The language model training apparatus of claim 19, wherein the forward estimation comprises an estimation of which character follows a first character included in the feature vectors, and the backward estimation comprises an estimation of which character follows a second character included in the feature vectors.

21. The language model training apparatus of claim 16, wherein the language model is based on a recurrent neural network (RNN) of a hierarchical structure.

22. The language model training apparatus of claim 21, wherein the language model trainer is further configured to update a connection weight included in the RNN based on the target translation vector, the context vector, and the representative vector.

23. The method of claim 1, wherein respective word-unit target translation vectors are generated for each word included in a target sentence.

24. The method of claim 1, wherein each of the feature vectors is a vector corresponding to abstracted speech information.

25. The method of claim 24, wherein the context vector is a query vector, and the attention mechanism comprises an attention function that maps the query vector to an output vector.
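The claims mention character embedding and positional encoding of the feature vectors without giving formulas. As a rough illustration only, the sketch below assumes a randomly initialised embedding table and the sinusoidal positional encoding familiar from Transformer models; neither choice is stated in the claims, and the names `char_embedding_table`, `positional_encoding`, `encode_word`, and `DIM` are hypothetical.

```python
import math
import random

def char_embedding_table(alphabet, dim, seed=0):
    """Map each character to a small random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for ch in alphabet}

def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices.

    Assumption: the claims do not specify the form of the positional encoding.
    """
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def encode_word(word, table, dim):
    """Return one feature vector per character: embedding plus positional encoding."""
    return [
        [e + p for e, p in zip(table[ch], positional_encoding(pos, dim))]
        for pos, ch in enumerate(word)
    ]

DIM = 4
table = char_embedding_table("abcdefghijklmnopqrstuvwxyz", DIM)
features = encode_word("noon", table, DIM)
```

In the word "noon", the two occurrences of "n" share one embedding but end up with different feature vectors because their positions differ. That position-dependent difference is what lets a later attention step relate characters by where they occur in the word.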
using statistical methods · CPC title
Data-driven translation · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Statistical methods, e.g. probability models · CPC title