Systems and methods for end-to-end deep reinforcement learning based coreference resolution

US11630953B2 · US · B2

Patent metadata
Publication number: US-11630953-B2
Application number: US-201916960014-A
Country: US
Kind code: B2
Filing date: Jul 25, 2019
Priority date: Jul 25, 2019
Publication date: Apr 18, 2023
Grant date: Apr 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are embodiments for end-to-end reinforcement learning based coreference resolution models to directly optimize coreference evaluation metrics. Embodiments of a reinforced policy gradient model are disclosed to incorporate reward associated with a sequence of coreference linking actions. Furthermore, maximum entropy regularization may be used for adequate exploration to prevent a model embodiment from prematurely converging to a bad local optimum. Experiments on datasets compared with state-of-the-art methods verified the effectiveness of embodiments.
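As a rough illustration of the abstract's two ingredients (a policy-gradient objective driven by a trajectory-level reward, plus an entropy bonus that encourages exploration), the following is a minimal sketch in plain Python. It is a generic REINFORCE-style surrogate, not necessarily the exact formulation used in the patent. The function names, the `beta` weight, and the per-step score lists are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw action scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_objective(step_scores, actions, reward, beta):
    """Surrogate objective for one sampled trajectory (hypothetical form).

    step_scores: one list of raw action scores per linking step
    actions:     index of the action chosen at each step
    reward:      scalar reward for the whole trajectory, e.g. a
                 coreference metric computed on the final graph
    beta:        assumed entropy-regularization weight; 0 disables the bonus
    """
    log_prob = 0.0
    total_entropy = 0.0
    for scores, action in zip(step_scores, actions):
        probs = softmax(scores)
        log_prob += math.log(probs[action])
        total_entropy += entropy(probs)
    # Maximizing this pushes up the log-probability of high-reward
    # trajectories while keeping each step's distribution from collapsing.
    return reward * log_prob + beta * total_entropy
```

In a real training loop the scores would come from the policy network and an automatic-differentiation library would take the gradient of this quantity with respect to the network parameters; the sketch only shows how the reward term and the entropy term combine.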

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training a coreference resolution model comprising: [a] inputting a document comprising a set of text into a policy network to identify mentions in the document; [b] given a current identified mention in the document, using the policy network to obtain a probability distribution of a set of actions in which the set of actions comprise linking the current identified mention with a prior identified mention or not linking the current identified mention to any prior identified mention; [c] selecting an action from the set of actions using the probability distribution of actions; [d] based upon the selected action, updating a coreference graph for the document, in which the coreference graph comprises mentions as nodes and links representing coreference connections between mentions; [e] responsive to the document having another mention, selecting it as the current identified mention and returning to step [b]; [f] responsive to the document not having another mention, outputting the coreference graph for the document; [g] using the outputted coreference graph and ground truth coreference information for the document, computing a reward based upon one or more metrics; [h] using a trajectory of selected actions and the reward to compute a gradient; and [i] updating the policy network using the gradient.

2. The computer-implemented method of claim 1 wherein the policy network is pre-trained using training steps comprising: inputting a set of documents into the policy network that identifies mentions in the documents and generates a coreference graph for each document; using corresponding ground-truth coreference graphs for the document to compute a loss relative to the generated coreference graphs obtained from the policy network; using the loss to update the policy network; and iterating the above training steps until a stopping condition is reached, the step condition comprises one or more criteria from number of epochs, error level, or number of iterations.

3. The computer-implemented method of claim 1 further comprising: repeating the steps of [a]-[f] for the document to obtain a set of coreference graphs and a corresponding set of trajectories of actions for each document in an iterative operation; obtaining a sample set of coreference graphs from the set of coreference graphs; computing a reward for each coreference graph from the sample set of coreference graphs; and using the rewards and trajectories of actions in the sample set to compute a gradient.

4. The computer-implemented method of claim 1 wherein inputting a document comprising a set of text into a policy network to identify mentions in the document comprises: generating, using a character and word embeddings encoder, a plurality of embeddings with each embedding as a concatenation of fixed pretrained word embeddings and convolutional neural network (CNN) character embeddings; computing and concatenating, using a bidirectional Long short-term memory (LSTM) layer, contextualized representation of each word in the input document from two directions; performing iterative operations comprising: generating, with head-finding attention, span representation from the concatenated contextualized representations of each word; obtaining a mention score using a mention feed-forward neural network with a self-attention mechanism based on the generated span representation; obtaining an antecedent score using an antecedent feed-forward neural network with the self-attention mechanism based on the generated span representation; and obtaining a coreference score based on at least the obtained mention score and the generated antecedent score; and computing, using a masked softmax layer, a probability distribution for each mention based at least on the coreference score.

5. The computer-implemented method of claim 4 wherein the probability distribution is only over candidate antecedents for each mention, with probability distribution for mentions after the current mention in the document masked by the masked softmax layer.

6. The computer-implemented method of claim 4 wherein the self-attention mechanism averages over a previous iteration's representations weighted by a normalized coreference scores.

7. The computer-implemented method of claim 4 wherein the generated span representations with probability scores less than a predetermined threshold are pruned from coreference decisions.

8. A system for training a coreference resolution model, comprising at least one processor, and a memory storing instructions, wherein the instructions when executed by the at least one processor, cause the at least one processor to perform the computer-implemented method of claim 1.

9. A computer-implemented method for coreference resolution using a coreference resolution model comprising: receiving a document comprising a set of words; generating, using a character and word embeddings encoder, a plurality of embeddings with each embedding as a concatenation of fixed word embeddings and convolutional neural network (CNN) character embeddings; computing and concatenating, using a bidirectional Long short-term memory (LSTM) layer, contextualized representation of each word in the document from two directions; performing iterative operations comprising: generating, with head-finding attention, span representation from the concatenated contextualized representations for a current mention; obtaining a mention score using a mention feed-forward neural network with a self-attention mechanism based on the generated span representation; obtaining an antecedent score using an antecedent feed-forward neural network with the self-attention mechanism based on the generated span representation; and obtaining a coreference score for the current mention based on at least the obtained mention score and the obtained antecedent score; computing, using a masked softmax layer, probability distribution over a set of actions in which the set of actions comprise linking the current identified mention with a prior identified mention or not linking the current identified mention to any prior identified mention for the current mention based at least on the coreference score; selecting an action from the set of actions using the probability distribution of actions; and based upon the selected action, updating a coreference graph for the document, in which the coreference graph comprises mentions as nodes and links representing coreference connections between mentions.

10. The computer-implemented method of claim 9 wherein the coreference resolution model is pretrained using steps comprising: inputting a training document into the coreference resolution model to generate a set of coreference graphs and a corresponding set of trajectories of actions for each document in an iterative operation; obtaining a sample set of coreference graphs from the set of coreference graphs; computing a reward for each coreference graph from the sample set of coreference graphs; using the rewards and trajectories of actions in the sample set to compute a gradient; and using the gradient to update parameters of the coreference resolution model.

11. The computer-implemented method of claim 10 wherein the gradient further comprising an entropy regularization parameter to control exploration of the set of trajectories of actions.

12. The computer-implemented method of claim 11 wherein the set of trajectories of actions are sampled based on a current policy when the entropy regularization parameter is set as 0.

13. The computer-implemented method of claim 11 wherein
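The action loop in claim 1, steps [b] through [f], together with the masked softmax of claims 4 and 5, can be sketched in plain Python as below. This is an illustrative toy under stated assumptions, not the patented implementation: the score matrix `link_scores` stands in for the output of the policy network, the fixed score of 0.0 for the "no link" action is an assumption, and the names `masked_softmax` and `rollout` are hypothetical.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax restricted to positions where mask is True.

    Masked-out positions receive probability 0.0.
    """
    top = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(link_scores, rng):
    """Sample one trajectory of linking actions for a document.

    link_scores[i][j] is an assumed score for linking mention i to mention j.
    Action 0 means "no link"; action j + 1 means "link to mention j".
    Returns (edges, actions), where edges is a list of
    (antecedent_index, mention_index) pairs forming the coreference graph.
    """
    n = len(link_scores)
    edges, actions = [], []
    for i in range(n):
        scores = [0.0] + list(link_scores[i])
        # Only the "no link" action and earlier mentions are valid choices;
        # mention i itself and later mentions are masked out.
        mask = [True] + [j < i for j in range(n)]
        probs = masked_softmax(scores, mask)
        action = rng.choices(range(n + 1), weights=probs, k=1)[0]
        actions.append(action)
        if action > 0:
            edges.append((action - 1, i))
    return edges, actions
```

A usage sketch: `edges, actions = rollout([[0, 0, 0], [2.0, 0, 0], [0.5, 1.5, 0]], random.Random(0))` samples a graph over three mentions. In training, steps [g] through [i] would then score the resulting graph against ground-truth clusters and feed the reward and the sampled `actions` into a gradient update.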

Assignees

Baidu USA LLC, Baidu Com Times Tech Beijing Co Ltd

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Reinforcement learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • Probabilistic or stochastic networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11630953B2 cover?
Described herein are embodiments for end-to-end reinforcement learning based coreference resolution models to directly optimize coreference evaluation metrics. Embodiments of a reinforced policy gradient model are disclosed to incorporate reward associated with a sequence of coreference linking actions. Furthermore, maximum entropy regularization may be used for adequate exploration to prevent …
Who is the assignee on this patent?
Baidu USA LLC, Baidu Com Times Tech Beijing Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 18, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).