US12380343B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12380343-B2 |
| Application number | US-202016989866-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 10, 2020 |
| Priority date | Aug 10, 2020 |
| Publication date | Aug 5, 2025 |
| Grant date | Aug 5, 2025 |
Official abstract text for this publication.
Methods and apparatus for complementary evidence identification in natural language inference. A given question is obtained and a set of N passages is obtained from a database. A probability is determined, for each passage of the set of N passages, of a corresponding passage being a supportive passage for the given question and the set of N passages is ranked based on the determined probabilities. M passages that are ranked 1 to M of the set of N passages are selected. A set of L passages is selected based on a plurality of scores, each score assigned to a set of candidate passages of the set of N passages, each score being based on the determined probabilities, the selected M passages, and a weighted regulation parameter. The set of L passages is provided to a computerized machine learning system to answer the question based on the set of L passages.
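The selection procedure in the abstract can be illustrated with a minimal sketch. It assumes a score of the form relevance + α·coverage + β·diversity, as suggested by claims 1, 3 and 4, and a simple beam search that grows candidate sets one passage at a time. The names `set_score` and `select_passages`, the default values of `alpha` and `beta`, and the toy vectors are hypothetical; this is not the patented implementation.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def set_score(indices, probs, vecs, q, alpha, beta):
    """Score a candidate passage set (assumed form, see lead-in)."""
    idx = list(indices)
    # Relevance: sum of per-passage support probabilities.
    relevance = float(probs[idx].sum())
    # Coverage: cosine between the summed passage vectors and the question.
    coverage = cosine(vecs[idx].sum(axis=0), q)
    # Diversity: sum of L1 distances over unordered passage pairs.
    diversity = 0.0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            diversity += float(np.abs(vecs[idx[a]] - vecs[idx[b]]).sum())
    return relevance + alpha * coverage + beta * diversity


def select_passages(probs, vecs, q, M, L, alpha=1.0, beta=0.1):
    """Beam search with beam size M that returns a set of L passage indices."""
    n = len(probs)
    ranked = np.argsort(-probs)                 # rank passages by probability
    beams = [(int(i),) for i in ranked[:M]]     # top-M passages seed the beams
    for _ in range(L - 1):
        candidates = {}
        for beam in beams:
            for j in range(n):
                if j in beam:
                    continue
                key = tuple(sorted(beam + (j,)))
                if key not in candidates:
                    candidates[key] = set_score(key, probs, vecs, q, alpha, beta)
        # Keep the M highest-scoring candidate sets for the next step.
        best = sorted(candidates.items(), key=lambda kv: -kv[1])[:M]
        beams = [key for key, _ in best]
    return list(beams[0])


# Toy data: passages 0 and 1 are near-duplicates, passage 2 complements them.
probs = np.array([0.9, 0.85, 0.6, 0.1])
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
q = np.array([1.0, 1.0])
selected = select_passages(probs, vecs, q, M=2, L=2)
```

In this toy setup the search selects passages 0 and 2 rather than the two highest-probability passages, because the coverage and diversity terms penalize the near-duplicate pair.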
Opening claim text (preview).
What is claimed is:

1. A method for natural language inference, the method comprising:

obtaining a given question for input to a hardware processor;

obtaining, using the hardware processor, a set of N passages from an electronic database;

determining, using the hardware processor, for each passage of the set of N passages, a probability of a corresponding passage being a supportive passage for the given question, including applying a BERT model to estimate the probability of the passage $p_i$ being supporting evidence to the given question $q$, where a concatenation of $q$ and $p_i$ is input into the BERT model, and one or more hidden states of a last layer are used to represent $q$ and $p_i$ in vector space, denoted as $\mathbf{q}$ and $\mathbf{p}_i$, respectively; wherein a fully connected layer $f(\cdot)$ followed by sigmoid activation is added to an end of the BERT model; and a scalar $\mathrm{Prob}(p_i \mid q)$ is output to estimate a relevancy of passage $p_i$ to the given question $q$, wherein each passage of the set of N passages is designated as $p_j$ and the given question is designated as $q$;

beam search ranking, using the hardware processor, the set of N passages based on the determined probabilities and a score function;

selecting, using the hardware processor, M passages that are ranked 1 to M of the set of N passages, where M is a hyperparameter that corresponds to a beam size;

selecting, using the hardware processor, a set of L passages based on a plurality of scores, each score assigned to a set of candidate passages of the set of N passages, each score being based on the determined probabilities, the selected M passages, and a weighted regulation parameter;

providing, using the hardware processor, a set of M highest ranked passages of the set of L passages to a computerized machine learning system to answer the question based on the set of L passages; and

answering the question with the computerized machine learning system, wherein the score function for finding a best passage is defined by a summation of: a summation of probabilities of a given passage given a specified question; a first hyperparameter multiplied by a cosine of a summation of multiplications of an encoded question vector and an encoded passage vector; and a second hyperparameter multiplied by a summation of losses between a first encoded passage vector and a second encoded passage vector.

2. The method of claim 1, wherein the set of N passages comprises a mixture set of passages $P = P^+ \cup P^-$ with one or more passages $p \in P^+$ being relevant to the given question and one or more passages $p \in P^-$ being irrelevant to the given question, wherein each passage of the set of N passages is designated as $p_i$.

3. The method of claim 1, wherein: the selected set of L passages is designated as $P_{sel}$ and is similar to the given question $q$ such that $P_{sel}$ has a probability of $\sum_{p_i \in P_{sel}} \Pr(p_i \mid q)$ that is higher than an average probability for an unselected set of passages, wherein each passage of the set of N passages is designated as $p_i$ and the given question is designated as $q$; $P_{sel}$ covers all facts asked by the given question $q$ such that a joint set of passages in $P_{sel}$ has a high similarity to the given question $q$ and maximizes $\cos\left(\sum_{i \in \{i \mid p_i \in P_{sel}\}} p_i,\ q\right)$; and $P_{sel}$ covers passages $p_i$ having diversity based on an average distance between any pair of passages $p_i$ in $P_{sel}$.

4. The method of claim 3, wherein the diversity is attained by maximizing $\sum_{i,j \in \{i,j \mid p_i, p_j \in P_{sel},\ i \neq j\}} l_1(p_i, p_j)$ where $l_1(\cdot,\cdot)$ denotes an $L_1$ distance; and wherein coverage is attained by maximizing $\cos\left(\sum_{i \in \{i \mid p_i \in P_{sel}\}} p_i,\ q\right)$.

5. The method of claim 1, further comprising training the BERT model using a supervised training objective function based on a set of labeled training examples where, for each training instance $(q, P)$, $\{p_i\}^+ = \{p_i\},\ \forall i \in \{i \mid p_i \in P^+\}$; $\{p_i\}^- = \{p_i\},\ \forall i \in \{i \mid p_i \in P^-\}$; and $\{p_i\} = \{p_i\}^+ \cup \{p_i\}^-$ are defined.

6. The method of claim 5, wherein the supervised training objective function comprises a sum of a cross-entropy loss corresponding to a relevance condition that is at least one of a measure of a similarity and a measure of a dissimilarity between a question vector and each passage vector; a weighted cosine-embedding loss for a coverage condition that is a measure of a similarity between the question vector and a subspace spanned by one or more selected passage vectors; and a weighted regularization of a diversity condition that is a measure of an overall distance among supporting passage vectors.

7. The method of claim 6, wherein the supervised training objective function is defined as:

$$\mathcal{L}(\{p_i\}; q; y) = \mathcal{L}_{sup}(\{p_i\}; q; y) + \alpha\,\mathcal{L}_{c}(\{p_i\}; q; y) + \beta\,\mathcal{L}_{d}(\{p_i\}^+)$$
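The three-term training objective of claims 6 and 7 can be sketched numerically as follows. The claims name the terms but this preview does not give their exact formulas, so the definitions here are assumptions: binary cross-entropy for the relevance term, one minus the cosine between the summed supporting vectors and the question for the coverage term, and the negative mean pairwise L1 distance among supporting vectors for the diversity term. All function names and default weights are hypothetical.

```python
import numpy as np


def relevance_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1.0 - p)))


def coverage_loss(vecs, labels, q):
    """Assumed cosine-embedding form: 1 - cos(sum of supporting vectors, q)."""
    pos_sum = vecs[labels == 1].sum(axis=0)
    denom = np.linalg.norm(pos_sum) * np.linalg.norm(q)
    cos = float(pos_sum @ q / denom) if denom > 0 else 0.0
    return 1.0 - cos


def diversity_loss(vecs, labels):
    """Assumed regularizer: negative mean pairwise L1 distance of supporting vectors."""
    pos = vecs[labels == 1]
    n = len(pos)
    if n < 2:
        return 0.0
    # Pairwise L1 distances over ordered pairs; the diagonal contributes zero.
    dists = np.abs(pos[:, None, :] - pos[None, :, :]).sum(axis=-1)
    return -float(dists.sum() / (n * (n - 1)))


def total_loss(probs, vecs, labels, q, alpha=1.0, beta=0.1):
    """L = L_sup + alpha * L_c + beta * L_d, mirroring the structure in claim 7."""
    return (relevance_loss(probs, labels)
            + alpha * coverage_loss(vecs, labels, q)
            + beta * diversity_loss(vecs, labels))
```

Minimizing the diversity term as written pushes supporting passage vectors apart, which matches the maximization stated in claim 4; other sign conventions or normalizations would serve equally well in a sketch.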
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Machine learning · CPC title
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title