Method and apparatus for determining item name, computer device, and storage medium
US-2022254143-A1 · Aug 11, 2022 · US
US12086551B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12086551-B2 |
| Application number | US-202117356037-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 23, 2021 |
| Priority date | Jun 23, 2021 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
Abstract
A computer implemented method determines differences between documents. The method includes parsing a first document and a second document into respective distinct instances of content. The distinct instances of content are classified into different categories. Category specific matching algorithms are applied to each of the respective instances of content to determine a similarity score for each of the respective instances of content. Semantic differences between the first document and the second document are analyzed as a function of the similarity scores. A characterization of the semantic differences is generated.
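The pipeline in the abstract can be illustrated with a minimal Python sketch. It assumes parsing and classification have already produced a list of hypothetical `Instance` records for each document. It also swaps in `difflib.SequenceMatcher` ratios and exact hash comparison for the trained, category-specific machine learning models that the claims recite. The `threshold` of 0.5, the greedy matching order, and the rank-based "re-ordered" rule are illustrative assumptions, not the patented method. Unmatched instances are labeled "deleted" or "added", imperfect matches are labeled "modified", and the labels are counted per type.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Instance:
    """One parsed, already-classified unit of content (hypothetical structure)."""
    category: str  # e.g. "text", "image", "table"
    content: str   # raw text, an image hash, or a serialized table


def text_similarity(a: str, b: str) -> float:
    # Stand-in for a trained text model: character-level ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def exact_similarity(a: str, b: str) -> float:
    # Stand-in for an image model: 1.0 for identical hashes, else 0.0.
    return 1.0 if a == b else 0.0


# Hypothetical category-specific matchers; the claims recite trained ML models here.
MATCHERS = {"text": text_similarity, "table": text_similarity, "image": exact_similarity}


def diff_documents(doc1, doc2, threshold=0.5):
    """Match instances within each category, then label and count differences."""
    labels, used, matched = [], set(), []
    for i, a in enumerate(doc1):
        score_fn = MATCHERS[a.category]
        best_j, best_score = None, -1.0
        for j, b in enumerate(doc2):
            if j in used or b.category != a.category:
                continue  # only compare unmatched instances of the same category
            score = score_fn(a.content, b.content)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None or best_score < threshold:
            labels.append(("deleted", a))
            continue
        used.add(best_j)
        matched.append((i, best_j))
        if best_score < 1.0:
            labels.append(("modified", a))
    labels += [("added", b) for j, b in enumerate(doc2) if j not in used]
    # A matched pair is "re-ordered" if its rank differs between the two documents.
    order2 = sorted(j for _, j in matched)
    for rank, (i, j) in enumerate(matched):
        if order2.index(j) != rank:
            labels.append(("re-ordered", doc1[i]))
    return labels, Counter(kind for kind, _ in labels)


doc1 = [Instance("text", "Introduction"), Instance("text", "The quick brown fox"),
        Instance("image", "hash-1")]
doc2 = [Instance("text", "The quick brown fox jumps"), Instance("text", "Introduction"),
        Instance("table", "a,b;1,2")]
labels, counts = diff_documents(doc1, doc2)
print(dict(counts))
```

With these inputs the two text instances swap positions, so both are labeled re-ordered. The edited sentence is also labeled modified, the image with no counterpart is labeled deleted, and the new table is labeled added.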
Claims (preview)
The invention claimed is:

1. A computer implemented method of determining differences between documents, the method comprising: parsing a first document and a second document into respective distinct instances of content; classifying the distinct instances of content into different semantic categories including text, images, and tables; applying category specific matching algorithms to content within each of the respective instances of content to determine a similarity score for each of the respective instances of content to match the respective instances, wherein the category specific category matching algorithms comprise machine learning models trained on labeled respective category training data; analyzing semantic differences between the content within matching respective instances of the first document and the second document as a function of the similarity scores; and generating a characterization of the semantic differences.

2. The method of claim 1 wherein generating a characterization of the semantic differences comprises generating a difference label for pairs of respective instances of matched content.

3. The method of claim 1 wherein generating a characterization of the semantic differences of the degree of differences comprises generating added and removed labels for respective instances of content for unmatched content.

4. The method of claim 1 wherein the semantic differences comprise added, re-ordered, deleted, and modified, and where generating a characterization of the semantic differences comprises generating a count of the semantic differences for each type of semantic difference.

5. The method of claim 1 wherein the similarity score for respective instances of content is determined as a function of similarity of the respective instances of content and similarity of context of the respective instances of content.

6. The method of claim 1 wherein classifying the distinct instances of content into different categories comprises classifying the instances of content into one of a text, an image, or a table category.

7. The method of claim 6 wherein text is further classified into section headings, sections, headers, footers, titles, authors, references, and captions.

8. The method of claim 1 wherein the similarity score of the respective instances of content is a function of each respective instance of content's position with respect to other local identified instances of content.

9. The method of claim 8 wherein image embeddings are compared to determine contexts for respective instances of content comprising images.

10. The method of claim 1 wherein applying category specific matching algorithms to each of the respective instances of content to determine a similarity score for respective instances of content comprises for each category specific matching algorithm: comparing each instance of content of the specific category in the first document to each instance of content of the specific category in the second document; generating a similarity score for each pair of respective instances of content; and selecting the pair with the highest similarity score as a match.

11. The method of claim 10 wherein the category specific matching algorithm comprises a text matching algorithm, and wherein applying the text matching algorithm to text instances of content comprises recursively: matching sequences of text from the respective instances of text; unmatching sequences of text and evaluating longer sequences of text for matches; and matching the longer sequences of text.

12. The method of claim 1 wherein characterizing the semantic differences is performed for each matched instance of content and for each unmatched instance of content.

13. The method of claim 1 wherein respective instances are matched based on having similar content and on being in similar locations within the respective first and second documents.

14. A machine-readable storage device having instructions for execution by a processor of a machine to cause the processor to perform operations to perform a method, the operations comprising: parsing a first document and a second document into respective distinct instances of content; classifying the distinct instances of content into different semantic categories including text, images, and tables; applying category specific matching algorithms to content within each of the respective instances of content to determine a similarity score for each of the respective instances of content to match the respective instances, wherein the category specific category matching algorithms comprise machine learning models trained on labeled respective category training data; analyzing semantic differences between the content within matching respective instances of the first document and the second document as a function of the similarity scores; and generating a characterization of the semantic differences.

15. The device of claim 14 wherein generating a characterization of the semantic differences comprises generating a difference label for pairs of respective instances of matched content and wherein generating a characterization of the semantic differences of the degree of differences comprises generating added and removed labels for respective instances of content for unmatched content.

16. The device of claim 14 wherein the semantic differences comprise added, re-ordered, deleted, and modified, and where generating a characterization of the semantic differences comprises generating a count of the semantic differences for each type of semantic difference.

17. The device of claim 14 wherein applying category specific matching algorithms to each of the respective instances of content to determine a similarity score for respective instances of content comprises for each category specific matching algorithm: comparing each instance of content of the specific category in the first document to each instance of content of the specific category in the second document; generating a similarity score for each pair of respective instances of content; and selecting the pair with the highest similarity score as a match.

18. The device of claim 17 wherein the category specific matching algorithm comprises a text matching algorithm, and wherein applying the text matching algorithm to text instances of content comprises recursively: matching sequences of text from the respective instances of text; unmatching sequences of text and evaluating longer sequences of text for matches; and matching the longer sequences of text.

19. A device comprising: a processor, and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising: parsing a first document and a second document into respective distinct instances of content; classifying the distinct instances of content into different semantic categories including text, images, and tables, applying category specific matching algorithms to content within each of the respective instances of content to determine a similarity score for each of the respective instances of content to match the respective instances, wherein the category specific category matching algorithms comprise machine learning models trained on labeled respective category training data; analyzing semantic differences between the content within matching respective instances of the first document and the second document as a function of the similarity scores; and generating a characterization of the semantic differences.
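Claims 5, 8 and 10 describe three things: scoring a pair by both its own content and its surroundings, comparing every instance in one document with every same-category instance in the other, and selecting the highest-scoring pair as a match. The sketch below is one possible reading under stated assumptions. Instances are plain strings of a single category, and `difflib.SequenceMatcher` stands in for the trained model. "Context" is taken to mean the immediately preceding and following instances, and the weight `alpha=0.8` and `threshold=0.5` are arbitrary illustrative values. Pairs are accepted highest-first, and each instance is used at most once. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def content_sim(a: str, b: str) -> float:
    # Hypothetical stand-in for a category-specific model score in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def context_sim(doc1, i, doc2, j) -> float:
    """Mean similarity of the neighbours just before and just after each instance."""
    scores = []
    for offset in (-1, 1):
        p, q = i + offset, j + offset
        if 0 <= p < len(doc1) and 0 <= q < len(doc2):
            scores.append(content_sim(doc1[p], doc2[q]))
    return sum(scores) / len(scores) if scores else 0.0


def score_matrix(doc1, doc2, alpha=0.8):
    """Score every pair; alpha (an assumed weight) balances content vs. context."""
    return [
        [alpha * content_sim(a, b) + (1 - alpha) * context_sim(doc1, i, doc2, j)
         for j, b in enumerate(doc2)]
        for i, a in enumerate(doc1)
    ]


def match_instances(doc1, doc2, threshold=0.5):
    """Repeatedly accept the highest-scoring pair whose members are both unmatched."""
    scores = score_matrix(doc1, doc2)
    candidates = sorted(
        ((scores[i][j], i, j) for i in range(len(doc1)) for j in range(len(doc2))),
        reverse=True,
    )
    used1, used2, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # all remaining candidates score even lower
        if i in used1 or j in used2:
            continue
        used1.add(i)
        used2.add(j)
        matches.append((i, j))
    return sorted(matches)


doc1 = ["Title", "alpha beta", "gamma delta"]
doc2 = ["Title", "gamma delta", "alpha beta gamma"]
print(match_instances(doc1, doc2))  # prints [(0, 0), (1, 2), (2, 1)]
```

The two reordered paragraphs are still paired correctly because their content scores dominate. The context term adds credit to pairs whose neighbours also match, which reflects the positional signal described in claims 8 and 13.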
CPC classifications:

- Multiple classes
- Document matching, e.g. of document images
- Machine learning
- Parsing
- Generative networks