Embedding Natural Language Context in Structured Documents Using Document Anatomy
US-2020175114-A1 · Jun 4, 2020 · US
US11630869B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11630869-B2 |
| Application number | US-202016806438-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 2, 2020 |
| Priority date | Mar 2, 2020 |
| Publication date | Apr 18, 2023 |
| Grant date | Apr 18, 2023 |
Official abstract text for this publication.
One embodiment provides a method, including: obtaining at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; identifying, within each of the at least two documents, portions corresponding to groups of text containing a conceptual unit; assigning at least a subset of the identified portions to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for the identified portions in the subset and (ii) tagging the identified portions in the subset with the semantic tag; and determining changes between the at least two documents, wherein the determining comprises (iii) aligning given portions across the at least two documents based upon a relationship between the given portions across the at least two documents, (iv) identifying semantic differences between the aligned portions, and (v) identifying any remaining unaligned portions.
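The steps in the abstract (identify portions, tag them by topic, align them across revisions, then report differences and unaligned portions) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `Portion` class, the `TOPIC_KEYWORDS` table standing in for semantic-tag generation, paragraph splitting on blank lines as the way portions are identified, and whitespace-normalized text comparison as a crude substitute for semantic differencing. The sketch also assumes at most one portion per tag in each document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    tag: str   # semantic tag naming the topic of the portion
    text: str  # whitespace-normalized text of the portion


# Hypothetical keyword table standing in for a real semantic-tag generator.
TOPIC_KEYWORDS = {
    "payment": ("pay", "fee", "invoice"),
    "termination": ("terminate", "termination"),
    "liability": ("liable", "liability"),
}


def tag_portions(document: str) -> list[Portion]:
    """Treat each blank-line-separated paragraph as a portion and tag it."""
    portions = []
    for block in document.split("\n\n"):
        text = " ".join(block.split())
        if not text:
            continue
        lowered = text.lower()
        # First topic whose keywords appear in the text; "other" if none match.
        tag = next(
            (topic for topic, words in TOPIC_KEYWORDS.items()
             if any(word in lowered for word in words)),
            "other",
        )
        portions.append(Portion(tag, text))
    return portions


def determine_changes(old_doc: str, new_doc: str) -> dict[str, str]:
    """Align portions that share a tag, then classify each tag's status."""
    old = {p.tag: p.text for p in tag_portions(old_doc)}
    new = {p.tag: p.text for p in tag_portions(new_doc)}
    changes = {}
    for tag in sorted(old.keys() | new.keys()):
        if tag not in new:
            changes[tag] = "removed"    # unaligned: only in the old revision
        elif tag not in old:
            changes[tag] = "added"      # unaligned: only in the new revision
        elif old[tag] != new[tag]:
            changes[tag] = "changed"    # aligned, text differs
        else:
            changes[tag] = "unchanged"  # aligned, text identical
    return changes


old_revision = (
    "The customer shall pay the fee within 30 days.\n\n"
    "Either party may terminate with notice."
)
new_revision = (
    "The customer shall pay the fee within 45 days.\n\n"
    "Neither party is liable for indirect damages."
)
print(determine_changes(old_revision, new_revision))
```

With these two sample revisions the payment portion is aligned and reported as changed, the termination portion is unaligned and reported as removed, and the liability portion is unaligned and reported as added.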
Opening claim text (preview).
What is claimed is:
1. A method, comprising: obtaining at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; identifying, within each of the at least two documents, processing units corresponding to contextually-related and positionally-connected groups of textual conceptual units, wherein the identifying comprises assigning each of the processing units to a category type and labelling each of the processing units with the category type; assigning at least a subset of the identified processing units to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for each of the identified processing units in the subset and (ii) tagging each of the identified processing units in the subset with the semantic tag, wherein the semantic tag corresponds to a label of the category type for and identifies a topic of a given of the identified processing units; enriching the at least a subset of the identified processing units with custom attributes, wherein the custom attributes define areas of focus of change that correspond to processing units having changes that are to be identified as differences, wherein the custom attributes are defined in a dictionary form; and determining changes between the at least two documents, wherein the determining comprises (iii) aligning, utilizing rules learned using a classifier, given processing units across the at least two documents based upon a relationship between the given processing units across the at least two documents, (iv) for given processing units across the at least two documents having a custom attribute, identifying a change as a change and for given processing units across the at least two documents not having an associated custom attribute, identifying semantic differences between the aligned processing units, and (v) identifying any remaining unaligned processing units, wherein the aligning comprises identifying given processing units across the at least two documents having a same semantic tag, wherein changes between the at least two documents corresponding to changes not related to a target category are indicated as no change.
2. The method of claim 1, comprising receiving, from a user, a query requesting identification of a change between the at least two documents related to a particular category type of interest.
3. The method of claim 2, wherein the identifying is performed responsive to receiving the user query.
4. The method of claim 2, wherein the generating a semantic tag is based upon terms included in the received query.
5. The method of claim 2, comprising providing, responsive to the determining a change, a natural language identification of a change corresponding to the user query.
6. The method of claim 1, comprising learning alignment rules by generating a decision tree classifier that is trained utilizing supervised data comprising a training set of (i) portions and (ii) a change status of the processing units; and wherein the defined rules are used in aligning the processing units across the at least two documents.
7. The method of claim 1, comprising providing an explanation of the determined changes, the explanation identifying a rule used to determine a change.
8. The method of claim 1, wherein the unaligned processing units are identified as at least one of: added processing units and removed processing units; and wherein the aligned processing units having semantic differences are identified as differences.
9. An apparatus, comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising: computer readable program code configured to obtain at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; computer readable program code configured to identify, within each of the at least two documents, processing units corresponding to contextually-related and positionally-connected groups of textual conceptual units, wherein the identifying comprises assigning each of the processing units to a category type and labelling each of the processing units with the category type; computer readable program code configured to assign at least a subset of the identified processing units to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for each of the identified processing units in the subset and (ii) tagging each of the identified processing units in the subset with the semantic tag, wherein the semantic tag corresponds to a label of the category type for and identifies a topic of a given of the identified processing units; computer readable program code configured to enrich the at least a subset of the identified processing units with custom attributes, wherein the custom attributes define areas of focus of change that correspond to processing units having changes that are to be identified as differences, wherein the custom attributes are defined in a dictionary form; and computer readable program code configured to determine changes between the at least two documents, wherein the determining comprises (iii) aligning, utilizing rules learned using a classifier, given processing units across the at least two documents based upon a relationship between the given processing units across the at least two documents, (iv) for given processing units across the at least two documents having a custom attribute, identifying a change as a change and for given processing units across the at least two documents not having an associated custom attribute, identifying semantic differences between the aligned processing units, and (v) identifying any remaining unaligned processing units, wherein the aligning comprises identifying given processing units across the at least two documents having a same semantic tag, wherein changes between the at least two documents corresponding to changes not related to a target category are indicated as no change.
10. A computer program product, comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by a processor and comprising: computer readable program code configured to obtain at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; computer readable program code configured to identify, within each of the at least two documents, processing units corresponding to contextually-related and positionally-connected groups of textual conceptual units, wherein the identifying comprises assigning each of the processing units to a category type and labelling each of the processing units with the category type; computer readable program code configured to assign at least a subset of the identified processing units to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for each of the identified processing units in the subset and (ii) tagging each of the identified processing units in the subset with the semantic tag, wherein the semantic tag corresponds to a label of the category type for and identifies a topic of a given of the identified processing units; computer readable program code configured to enrich the at least a subset of the identified processing units with custom attributes, wherein the custom attributes define areas of focu
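The claim preview above adds two refinements to the comparison step: custom attributes "defined in a dictionary form" that mark areas of focus, and a target category outside of which changes are "indicated as no change". The following Python sketch shows one possible reading of that logic for a single pair of aligned processing units. The names `CUSTOM_ATTRIBUTES`, `normalize`, and `classify_change` are hypothetical, the dictionary contents are invented, and the token-based comparison is only a crude stand-in for whatever semantic differencing an actual implementation would use.

```python
import re

# Hypothetical custom attributes in dictionary form: semantic tag -> focus.
CUSTOM_ATTRIBUTES = {
    "payment": {"focus": "amounts and deadlines"},
}


def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation so cosmetic edits compare equal."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def classify_change(tag, old_text, new_text, target_category=None):
    """Classify one aligned (or unaligned) pair of processing units.

    old_text or new_text is None when the unit has no counterpart in the
    other revision.
    """
    # Units outside the target category are reported as "no change".
    if target_category is not None and tag != target_category:
        return "no change"
    if old_text is None:
        return "added"
    if new_text is None:
        return "removed"
    if tag in CUSTOM_ATTRIBUTES:
        # Area of focus: any textual change at all counts as a change.
        return "change" if old_text != new_text else "no change"
    # No custom attribute: report only differences that survive normalization.
    if normalize(old_text) != normalize(new_text):
        return "difference"
    return "no change"


print(classify_change("payment", "Fee: 10 USD.", "fee: 10 USD"))
print(classify_change("notice", "Notice is required.", "notice is required"))
print(classify_change("notice", "Notice is required.", "Notice is optional."))
print(classify_change("notice", "Notice is required.", "Notice is optional.",
                      target_category="payment"))
```

The design choice illustrated is that a custom attribute makes the comparison stricter for units the user cares about, while the target category suppresses reporting for everything else.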