Identification of changes between document versions

US11630869B2 · US · B2

Patent metadata
Publication number: US-11630869-B2
Application number: US-202016806438-A
Country: US
Kind code: B2
Filing date: Mar 2, 2020
Priority date: Mar 2, 2020
Publication date: Apr 18, 2023
Grant date: Apr 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method, including: obtaining at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; identifying, within each of the at least two documents, portions corresponding to groups of text containing a conceptual unit; assigning at least a subset of the identified portions to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for the identified portions in the subset and (ii) tagging the identified portions in the subset with the semantic tag; and determining changes between the at least two documents, wherein the determining comprises (iii) aligning given portions across the at least two documents based upon a relationship between the given portions across the at least two documents, (iv) identifying semantic differences between the aligned portions, and (v) identifying any remaining unaligned portions.
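
The steps in the abstract (split each revision into units, tag each unit with a topic, align units across revisions, then report differences and unaligned units) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the keyword tagger, the `Unit` class, the `normalize` comparison, and the one-unit-per-tag simplification are all hypothetical stand-ins chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    tag: str   # semantic tag naming the unit's topic
    text: str  # raw text of the unit


def tag_units(paragraphs, keywords):
    """Toy tagger (assumption): use the first tag whose keyword appears in the text."""
    units = []
    for paragraph in paragraphs:
        lowered = paragraph.lower()
        tag = next(
            (t for t, words in keywords.items() if any(w in lowered for w in words)),
            "other",
        )
        units.append(Unit(tag, paragraph))
    return units


def normalize(text):
    """Stand-in for a semantic comparison: ignores case and whitespace only."""
    return " ".join(text.lower().split())


def determine_changes(old_units, new_units):
    """Align units that share a tag, then label each tag's status."""
    old_by_tag = {u.tag: u for u in old_units}  # assumes one unit per tag
    new_by_tag = {u.tag: u for u in new_units}
    report = {}
    for tag, old in old_by_tag.items():
        new = new_by_tag.get(tag)
        if new is None:
            report[tag] = "removed"    # unaligned: only in the old revision
        elif normalize(old.text) != normalize(new.text):
            report[tag] = "changed"    # aligned, but the content differs
        else:
            report[tag] = "unchanged"
    for tag in new_by_tag:
        if tag not in old_by_tag:
            report[tag] = "added"      # unaligned: only in the new revision
    return report


# Hypothetical tag vocabulary and two made-up contract revisions.
KEYWORDS = {
    "payment": ["payment"],
    "termination": ["terminate"],
    "governing_law": ["governed"],
    "liability": ["liability"],
}
old = tag_units(
    [
        "Payment is due within 30 days.",
        "Either party may terminate with notice.",
        "This agreement is governed by New York law.",
    ],
    KEYWORDS,
)
new = tag_units(
    [
        "Payment is due within 45 days.",
        "This agreement is governed by New York law.",
        "Liability is capped at fees paid.",
    ],
    KEYWORDS,
)
print(determine_changes(old, new))
```

Keying the alignment on the tag rather than on position means a clause that merely moves within the document is not reported as a change, which is the practical difference from a line-based diff.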

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; identifying, within each of the at least two documents, processing units corresponding to contextually-related and positionally-connected groups of textual conceptual units, wherein the identifying comprises assigning each of the processing units to a category type and labelling each of the processing units with the category type; assigning at least a subset of the identified processing units to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for each of the identified processing units in the subset and (ii) tagging each of the identified processing units in the subset with the semantic tag, wherein the semantic tag corresponds to a label of the category type for and identifies a topic of a given of the identified processing units; enriching the at least a subset of the identified processing units with custom attributes, wherein the custom attributes define areas of focus of change that correspond to processing units having changes that are to be identified as differences, wherein the custom attributes are defined in a dictionary form; and determining changes between the at least two documents, wherein the determining comprises (iii) aligning, utilizing rules learned using a classifier, given processing units across the at least two documents based upon a relationship between the given processing units across the at least two documents, (iv) for given processing units across the at least two documents having a custom attribute, identifying a change as a change and for given processing units across the at least two documents not having an associated custom attribute, identifying semantic differences between the aligned processing units, and (v) identifying any remaining unaligned processing units, wherein the aligning comprises identifying given processing units across the at least two documents having a same semantic tag, wherein changes between the at least two documents corresponding to changes not related to a target category are indicated as no change.

2. The method of claim 1, comprising receiving, from a user, a query requesting identification of a change between the at least two documents related to a particular category type of interest.

3. The method of claim 2, wherein the identifying is performed responsive to receiving the user query.

4. The method of claim 2, wherein the generating a semantic tag is based upon terms included in the received query.

5. The method of claim 2, comprising providing, responsive to the determining a change, a natural language identification of a change corresponding to the user query.

6. The method of claim 1, comprising learning alignment rules by generating a decision tree classifier that is trained utilizing supervised data comprising a training set of (i) portions and (ii) a change status of the processing units; and wherein the defined rules are used in aligning the processing units across the at least two documents.

7. The method of claim 1, comprising providing an explanation of the determined changes, the explanation identifying a rule used to determine a change.

8. The method of claim 1, wherein the unaligned processing units are identified as at least one of: added processing units and removed processing units; and wherein the aligned processing units having semantic differences are identified as differences.

9. An apparatus, comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising: computer readable program code configured to obtain at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; computer readable program code configured to identify, within each of the at least two documents, processing units corresponding to contextually-related and positionally-connected groups of textual conceptual units, wherein the identifying comprises assigning each of the processing units to a category type and labelling each of the processing units with the category type; computer readable program code configured to assign at least a subset of the identified processing units to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for each of the identified processing units in the subset and (ii) tagging each of the identified processing units in the subset with the semantic tag, wherein the semantic tag corresponds to a label of the category type for and identifies a topic of a given of the identified processing units; computer readable program code configured to enrich the at least a subset of the identified processing units with custom attributes, wherein the custom attributes define areas of focus of change that correspond to processing units having changes that are to be identified as differences, wherein the custom attributes are defined in a dictionary form; and computer readable program code configured to determine changes between the at least two documents, wherein the determining comprises (iii) aligning, utilizing rules learned using a classifier, given processing units across the at least two documents based upon a relationship between the given processing units across the at least two documents, (iv) for given processing units across the at least two documents having a custom attribute, identifying a change as a change and for given processing units across the at least two documents not having an associated custom attribute, identifying semantic differences between the aligned processing units, and (v) identifying any remaining unaligned processing units, wherein the aligning comprises identifying given processing units across the at least two documents having a same semantic tag, wherein changes between the at least two documents corresponding to changes not related to a target category are indicated as no change.

10. A computer program product, comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by a processor and comprising: computer readable program code configured to obtain at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; computer readable program code configured to identify, within each of the at least two documents, processing units corresponding to contextually-related and positionally-connected groups of textual conceptual units, wherein the identifying comprises assigning each of the processing units to a category type and labelling each of the processing units with the category type; computer readable program code configured to assign at least a subset of the identified processing units to a category type corresponding to a topic of a given portion, wherein the assigning comprises (i) generating a semantic tag for each of the identified processing units in the subset and (ii) tagging each of the identified processing units in the subset with the semantic tag, wherein the semantic tag corresponds to a label of the category type for and identifies a topic of a given of the identified processing units; computer readable program code configured to enrich the at least a subset of the identified processing units with custom attributes, wherein the custom attributes define areas of focu
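
Step (iv) of the independent claims, together with the target-category limitation, amounts to a small decision rule per aligned pair of units. The sketch below is one hedged reading of that rule, not the patented implementation. The claims say only that custom attributes are "defined in a dictionary form", so the shape of `CUSTOM_ATTRIBUTES`, the `content_words` comparison used in place of a real semantic comparison, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical custom attributes: tags whose units are areas of focus,
# so that any edit to them is reported as a change.
CUSTOM_ATTRIBUTES = {
    "payment": {"focus": "amounts and deadlines"},
}


def content_words(text):
    """Stand-in for semantic comparison: the set of lowercase word tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def classify_change(tag, old_text, new_text, custom_attributes, target_category=None):
    """Label one aligned pair of units as 'change' or 'no change'."""
    # Edits outside the target category are indicated as no change.
    if target_category is not None and tag != target_category:
        return "no change"
    if old_text == new_text:
        return "no change"
    # Units carrying a custom attribute: any textual edit counts as a change.
    if tag in custom_attributes:
        return "change"
    # Other units: report only differences that survive the semantic stand-in.
    if content_words(old_text) != content_words(new_text):
        return "change"
    return "no change"


print(classify_change("payment", "Due in 30 days", "due in 30 days.", CUSTOM_ATTRIBUTES))
print(classify_change("notices", "Send notices by mail.", "send notices by mail", CUSTOM_ATTRIBUTES))
```

Under this reading, a capitalization-only edit to a "payment" unit is still flagged because the unit is an area of focus, while the same kind of edit to a "notices" unit is suppressed.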

Assignees

IBM

Inventors

Classifications

  • Recognition of textual entities · CPC title

  • G06F16/93 · Primary

    Document management systems · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Sequence data queries, e.g. querying versioned data · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11630869B2 cover?
One embodiment provides a method, including: obtaining at least two documents, wherein one of the at least two documents comprises a revision different than another of the at least two documents; identifying, within each of the at least two documents, portions corresponding to groups of text containing a conceptual unit; assigning at least a subset of the identified portions to a category type …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/93. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 18, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).