US11941357B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11941357-B2 |
| Application number | US-202117355731-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 23, 2021 |
| Priority date | Jun 23, 2021 |
| Publication date | Mar 26, 2024 |
| Grant date | Mar 26, 2024 |
Official abstract text for this publication.
Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing text similarity determination. Certain embodiments of the present invention utilize systems, methods, and computer program products that perform text similarity determination by using at least one of Word Mover's Similarity measures, Relaxed Word Mover's Similarity measures, and Related Relaxed Word Mover's Similarity measures.
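The abstract names a relaxed variant of the similarity measure without defining it. As a minimal sketch, assume that "relaxed" means only one flow constraint is enforced (each reference word's flows sum to its normalized term frequency) and that flows are non-negative. Under those assumptions, the weighted sum of flow times similarity is maximized by sending each reference word's whole weight to its most similar target word. The function names, the `embed` lookup table, and the use of cosine similarity are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter

import numpy as np


def word_weights(tokens):
    """Return the distinct words and their normalized term frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    words = sorted(counts)
    return words, np.array([counts[w] / total for w in words])


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def relaxed_similarity(ref_tokens, tgt_tokens, embed):
    """Relaxed maximal word similarity (hypothetical helper).

    embed is an assumed dict mapping each word to a non-zero vector.
    """
    ref_words, ref_w = word_weights(ref_tokens)
    tgt_words, _ = word_weights(tgt_tokens)
    ref_vecs = np.array([embed[w] for w in ref_words], dtype=float)
    tgt_vecs = np.array([embed[w] for w in tgt_words], dtype=float)
    sim = cosine_matrix(ref_vecs, tgt_vecs)
    # With only the reference-side constraint, the optimum routes each
    # reference word's full weight to its best-matching target word.
    return float(ref_w @ sim.max(axis=1))
```

Enforcing both the reference-side and target-side constraints at once would instead require solving a transport-style linear program, which this sketch deliberately avoids.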
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method comprising: generating, using one or more processors, a maximal word similarity score for a reference text data object and a target text data object, wherein: (i) the maximal word similarity score describes a maximal value of a transition cost value indicative of a measure of cost to transform a first embedded representation associated with one or more target words of the target text data object into a second embedded representation associated with one or more reference words of the reference text data object, and (ii) the transition cost value is determined based at least in part on, for each word pair comprising a reference word and a target word, a word-wise flow value and, a word-wise similarity value; generating, using the one or more processors, a predicted similarity score for the reference text data object and the target text data object based at least in part on the maximal word similarity score; and initiating, using the one or more processors, the performance of one or more prediction-based actions based at least in part on the predicted similarity score.

2. The computer-implemented method of claim 1, wherein maximizing the transition cost value is performed in accordance with a maximization constraint requiring that a sum of each word-wise flow value for a particular reference word of the one or more reference words is equal to a document-wide word weight value for the particular reference word in the reference text data object.

3. The computer-implemented method of claim 2, wherein the document-wide word weight value is determined based at least in part on: (i) a term frequency value of the particular reference word in the reference text data object, and (ii) a sum of each term frequency value for the one or more reference words in the reference text data object.

4. The computer-implemented method of claim 1, wherein maximizing the transition cost value is performed in accordance with a maximization constraint requiring that a sum of each word-wise flow value for a particular target word of the one or more target words is equal to a document-wide word weight value for the particular target word in the target text data object.

5. The computer-implemented method of claim 4, wherein the document-wide word weight value is determined based at least in part on: (i) a term frequency value of the particular target word in the target text data object, and (ii) a sum of each term frequency value for the one or more target words in the target text data object.

6. The computer-implemented method of claim 1, wherein: the target text data object is selected from a plurality of candidate target text data objects, and the computer-implemented method comprises: generating, using the one or more processors and for each candidate target text data object of the plurality of candidate target text data objects other than the target text data object, a candidate maximal word similarity score; and generating, using the one or more processors, a ranked similarity list based at least in part on the maximal word similarity score and each candidate maximal word similarity score.

7. The computer-implemented method of claim 6, wherein maximizing the transition cost value is performed in accordance with a maximization constraint requiring that a sum of each word-wise flow value for a particular target word of the one or more target words is equal to a document-wide word weight value for the particular target word in the target text data object.

8. The computer-implemented method of claim 7, wherein the document-wide word weight value is determined based at least in part on: (i) a term frequency value of the particular target word in the target text data object, and (ii) a sum of each term frequency value for the one or more target words in the target text data object.

9. The computer-implemented method of claim 1, wherein: the target text data object is selected from a plurality of candidate target text data objects, the plurality of candidate target text data objects are associated with a graph hierarchical structure, and generating the predicted similarity score comprises: generating a raw predicted similarity score for the target text data object based at least in part on the maximal word similarity score, traversing the graph hierarchical structure in accordance with a set of breadth first search iterations to identify to determine one or more sibling relationships for the target text data object, wherein each sibling relationship is associated with a second target text data object of the plurality of candidate target text data objects, and assigning a zero-valued predicted similarity score to the target text data object if at least one of the one or more sibling relationships is associated with a second target text data object that has a second raw predicted similarity score that exceeds the raw predicted similarity score of the target text data object.

10. The computer-implemented method of claim 1, wherein determining each word-wise similarity value that is associated with a particular reference word and a particular target word comprises: determining whether the particular target word is in a threshold-satisfying target word list for the particular target word; and in response to determining that the particular target word is not in the threshold-satisfying target word list, determining the word-wise similarity value based at least in part on a predefined minimal word-wise similarity value.

11. A computing system comprising one or more processors and at least one memory including program code, the at least one memory and the program code configured to, with the one or more processors, cause the computing system to at least: generate a maximal word similarity score for a reference text data object and a target text data object, wherein: (i) the maximal word similarity score describes a maximal value of a transition cost value indicative of a measure of cost to transform a first embedded representation associated with one or more target words of the target text data object into a second embedded representation associated with one or more reference words of the reference text data object, and (ii) the transition cost value is determined based at least in part on, for each word pair comprising a reference word and a target word, a word-wise flow value and a word-wise similarity value; generate a predicted similarity score for the reference text data object and the target text data object based at least in part on the maximal word similarity score; and initiate the performance of one or more prediction-based actions based at least in part on the predicted similarity score.

12. The computing system of claim 11, wherein maximizing the transition cost value is performed in accordance with a maximization constraint requiring that a sum of each word-wise flow value for a particular reference word of the one or more reference words is equal to a document-wide word weight value for the particular reference word in the reference text data object.

13. The computing system of claim 12, wherein the document-wide word weight value is determined based at least in part on: (i) a term frequency value of the particular reference word in the reference text data object, and (ii) a sum of each term frequency value for the one or more reference words in the reference text data object.

14. The computing system of claim 11, wherein maximizing the transition cost value is performed in accordance with a maximization constraint requiring that a sum of each word-wise flow value for a particular t
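Claims 6 and 9 describe ranking candidate targets and zeroing out any candidate that has a higher-scoring sibling in a hierarchy. The sketch below illustrates one possible reading of that post-processing step. It assumes the hierarchy is given as a hypothetical `children` adjacency dict, that raw scores are already computed, and that "siblings" means nodes sharing a parent; none of these representation choices come from the patent text.

```python
from collections import deque


def suppress_dominated_siblings(children, root, raw_scores):
    """Zero the score of any candidate that has a higher-scoring sibling.

    children: assumed dict mapping a node to its list of child nodes.
    raw_scores: assumed dict mapping candidate nodes to raw scores.
    """
    final = dict(raw_scores)
    queue = deque([root])
    visited = {root}
    while queue:  # breadth-first traversal of the hierarchy
        node = queue.popleft()
        kids = children.get(node, [])
        scored = [k for k in kids if k in raw_scores]
        for kid in scored:
            # Compare each candidate against the other children of its parent.
            if any(raw_scores[s] > raw_scores[kid] for s in scored if s != kid):
                final[kid] = 0.0
        for kid in kids:
            if kid not in visited:
                visited.add(kid)
                queue.append(kid)
    return final


def ranked_similarity_list(scores):
    """Candidates ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with `children = {"root": ["A", "B"], "A": ["A1", "A2"]}` and raw scores `{"A": 0.6, "B": 0.8, "A1": 0.9, "A2": 0.4}`, candidate `A` is zeroed because its sibling `B` scores higher, and `A2` is zeroed because `A1` scores higher.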
Recognition of textual entities · CPC title
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title
Matching criteria, e.g. proximity measures · CPC title
Hierarchical processing, e.g. outlines · CPC title
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title