System and method for reinforcement learning based controlled natural language generation

US11586830B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11586830-B2
Application number: US-202016891311-A
Country: US
Kind code: B2
Filing date: Jun 3, 2020
Priority date: Jun 3, 2020
Publication date: Feb 21, 2023
Grant date: Feb 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for reinforcement learning based controlled natural language generation is disclosed. The system includes a token generator subsystem to generate an initial output phrase including a sequence of output tokens. The system includes trained models associated with corresponding predefined tasks. Each trained model includes an attention layer to compute attention-based weights for each output token. The trained models include a scoring layer to generate a phrase sequence level score for the output phrase. The trained models include a reward generation layer to generate dense rewards for each output token based on the attention-based weights and the phrase sequence level score. The trained models include a feedback score generation layer to generate a feedback score based on the dense rewards and reward weights assigned to the dense rewards of the corresponding trained models. The feedback score generation layer provides the feedback score iteratively to the token generator subsystem.
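
The reward path described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so two points here are assumptions: each model's dense rewards are taken to be its phrase-level score split across tokens in proportion to the attention weights, and the feedback score is computed per token as a weighted average across models. The names `dense_rewards` and `feedback_scores`, the model names, and all numbers are hypothetical.

```python
from typing import Dict, List


def dense_rewards(attention_weights: List[float], phrase_score: float) -> List[float]:
    """Split one phrase-level score over tokens in proportion to attention.

    Assumption: proportional splitting; the abstract only says the rewards are
    "based on" the attention weights and the phrase sequence level score.
    """
    total = sum(attention_weights)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    return [phrase_score * w / total for w in attention_weights]


def feedback_scores(per_model_rewards: Dict[str, List[float]],
                    reward_weights: Dict[str, float]) -> List[float]:
    """Combine per-token rewards from several models by weighted average."""
    weight_sum = sum(reward_weights[name] for name in per_model_rewards)
    n_tokens = len(next(iter(per_model_rewards.values())))
    return [
        sum(reward_weights[name] * rewards[t]
            for name, rewards in per_model_rewards.items()) / weight_sum
        for t in range(n_tokens)
    ]


# Hypothetical example: two task models scoring a four-token phrase.
tokens = ["the", "food", "was", "great"]
sentiment = dense_rewards([0.1, 0.2, 0.1, 0.6], phrase_score=0.8)
fluency = dense_rewards([0.25, 0.25, 0.25, 0.25], phrase_score=0.4)
feedback = feedback_scores(
    {"sentiment": sentiment, "fluency": fluency},
    {"sentiment": 0.75, "fluency": 0.25},
)
for token, score in zip(tokens, feedback):
    print(f"{token}: {score:.3f}")
```

In this example the last token receives the largest feedback because the sentiment model both attends to it most and carries the larger reward weight.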

First claim

Opening claim text (preview).

We claim:

1. A system for reinforcement learning based controlled natural language generation comprising: a token generator subsystem configured to generate an initial output phrase comprising a sequence of a plurality of output tokens based on a sequence of a plurality of input tokens corresponding to a natural language text; and an output validation subsystem comprises a plurality of trained models associated with a corresponding plurality of predefined tasks wherein each of the plurality of trained model comprises: an attention layer configured to compute a plurality of attention-based weights for each of the corresponding plurality of output tokens generated by the token generator subsystem and wherein the attention layer is configured to assign a plurality of encoding codes based on a plurality of embeddings corresponding to one or more attributes-based output tokens and one or more content-based output tokens of the plurality of output tokens; a scoring layer configured to generate a phrase sequence level score for the initial output phrase comprising the plurality of output tokens generated by the token generator subsystem; a reward generation layer configured to generate a plurality of dense rewards for each of the plurality of output tokens based on the plurality of attention-based weights computed by the attention layer and the phrase sequence level score generated by the scoring layer; and a feedback score generation layer configured to: generate a feedback score based on the plurality of dense rewards generated by the reward generation layer and a plurality of reward weights assigned to the plurality of dense rewards of the corresponding plurality of trained models; and provide the feedback score iteratively to the token generator subsystem to generate a resultant output phrase based on comparison with the initial output phrase.

2. The system as claimed in claim 1, wherein the plurality of output tokens comprises a plurality of words.

3. The system as claimed in claim 1, wherein the natural language text comprises a text of a first style.

4. The system as claimed in claim 3, wherein the initial output phrase comprises a text of a second style different from the first style.

5. The system as claimed in claim 1, wherein the token generator subsystem is configured to generate the output phrase by sampling from a vocabulary at each timestamp.

6. The system as claimed in claim 1, wherein the plurality of trained models associated with the corresponding plurality of predefined tasks comprises a sentiment analysis, a content analysis, and a fluency analysis.

7. The system as claimed in claim 1, wherein the attention layer is configured to compute the plurality of attention-based weights by applying higher attention on the plurality of encoding codes corresponding to the one or more attributes based output tokens than the plurality of encoding codes corresponding to the one or more content based output tokens.

8. The system as claimed as claim 1, wherein the feedback score generation layer is configured to generate a feedback score by computing a weighted average of the plurality of dense rewards of the corresponding plurality of trained models.

9. The system of claim 1, wherein the reinforcement learning based natural language generation comprises text style transfer, machine translation and summarization.

10. A method comprising: generating, by a token generator subsystem, an initial output phrase comprising a sequence of a plurality of output tokens based on a sequence of a plurality of input tokens corresponding to a natural language text; computing, by an attention layer of a plurality of trained models, a plurality of attention-based weights for each of the corresponding plurality of output tokens wherein computing the plurality of attention-based weights comprises assigning a plurality of encoding codes based on a plurality of embeddings corresponding to one or more attributes-based output tokens and one or more content-based output tokens of the plurality of output tokens; generating, by a scoring layer of the plurality of trained models, a phrase sequence level score for the initial output phrase comprising the plurality of output tokens; generating, by a reward generation layer of the plurality of trained models, a plurality of dense rewards for each of the plurality of output tokens based on the plurality of attention-based weights and the phrase sequence level score; generating, by a feedback score generation layer of the plurality of trained models, a feedback score based on the plurality of dense rewards and a plurality of reward weights assigned to the plurality of dense rewards of the corresponding plurality of trained models; and providing, by the feedback score generation layer of the plurality of trained models, the feedback score iteratively to the token generator subsystem to generate a resultant output phrase based on comparison with the initial output phrase.

11. The method as claimed in claim 10, wherein computing the plurality of attention-based weights comprises applying higher attention on the plurality of encoding codes corresponding to the one or more attributes-based output tokens than the plurality of encoding codes corresponding to the one or more content-based output token.

12. The method as claimed in claim 10, wherein generating the initial output phrase comprises generating the output phrase by sampling from a vocabulary at each timestamp.

13. The method as claimed in claim 10, wherein generating the feedback score comprises computing a weighted average of the plurality of dense rewards of the corresponding plurality of trained models.
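
Claims 5 and 12 describe generating the phrase by sampling from a vocabulary at each timestamp, and claims 1 and 10 feed the score back to the generator iteratively. The sketch below shows one way this could look in Python. It is illustrative only: the claims do not name an update rule, so the REINFORCE-style loss is a swapped-in assumption, and the per-step logits are supplied as fixed inputs rather than produced by a real generator conditioned on earlier tokens. The names `softmax`, `sample_phrase`, and `policy_gradient_loss`, the vocabulary, and the numbers are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_phrase(step_logits: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  rng: random.Random) -> Tuple[List[str], List[float]]:
    """Sample one token per time step and record its log-probability."""
    tokens, log_probs = [], []
    for logits in step_logits:
        probs = softmax(logits)
        idx = rng.choices(range(len(vocab)), weights=probs, k=1)[0]
        tokens.append(vocab[idx])
        log_probs.append(math.log(probs[idx]))
    return tokens, log_probs


def policy_gradient_loss(log_probs: Sequence[float],
                         token_rewards: Sequence[float]) -> float:
    """REINFORCE-style loss (assumption): minus the reward-weighted log-probabilities."""
    return -sum(lp * r for lp, r in zip(log_probs, token_rewards))


# Hypothetical usage: a three-word vocabulary and two time steps.
vocab = ["good", "bad", "okay"]
rng = random.Random(0)
tokens, log_probs = sample_phrase([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]], vocab, rng)
loss = policy_gradient_loss(log_probs, token_rewards=[0.2, 0.8])
print(tokens, loss)
```

With per-token rewards, minimizing a loss of this form raises the probability of highly rewarded tokens more than that of weakly rewarded ones, which is the practical reason for using dense rather than single phrase-level rewards.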

Assignees

Inventors

Classifications

  • Machine learning · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • G06F40/40 · Primary

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Natural language generation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11586830B2 cover?
A system for reinforcement learning based controlled natural language generation is disclosed. The system includes a token generator subsystem to generate an initial output phrase including a sequence of output tokens. The system includes trained models associated with corresponding predefined tasks. Each trained model includes an attention layer to compute attention-based weights for each outp…
Who is the assignee on this patent?
Coinbase Inc, Pm Labs Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 21, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).