Method and system for tree-based text representation and comparison
US-2024320995-A1 · Sep 26, 2024 · US
US12505693B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12505693-B2 |
| Application number | US-202318124392-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 21, 2023 |
| Priority date | Mar 21, 2023 |
| Publication date | Dec 23, 2025 |
| Grant date | Dec 23, 2025 |
Official abstract text for this publication.
A method for facilitating electronic textual representation and comparison is disclosed. The method includes receiving, via a graphical user interface, a comparison request that includes a first electronic document and a second electronic document; parsing the first electronic document and the second electronic document to classify textual data; generating, by using the classified textual data, a first tree structure for the first electronic document and a second tree structure for the second electronic document; constructing a first hierarchy dictionary for the first tree structure and a second hierarchy dictionary for the second tree structure; determining differences between the first electronic document and the second electronic document by using the first tree structure, the first hierarchy dictionary, the second tree structure, and the second hierarchy dictionary; and generating graphical representations that depict the differences and textual representations that summarize the differences.
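The parsing and tree-generation steps summarized in the abstract (and elaborated in claim 1) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `(text, attribute)` component tuples, the attribute names, the `LEVELS` mapping, and the helper functions do not come from the patent. The sketch also splits the claimed integer into two roles, a positional dictionary key and an attribute-derived level, and it assumes that a smaller level means a higher position in the hierarchy; the publication does not state either choice.

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Optional

# Hypothetical attribute-to-level mapping (assumption: smaller = higher in the tree).
LEVELS = {"title": 0, "heading": 1, "body": 2}


@dataclass
class Node:
    key: Optional[int]  # key into the text dictionary; None for the synthetic root
    level: int
    text: str
    children: list = field(default_factory=list)


def merge_components(components):
    """Merge consecutive (text, attribute) components that share an attribute."""
    blocks = []
    for attribute, group in groupby(components, key=lambda c: c[1]):
        blocks.append((" ".join(text for text, _ in group), attribute))
    return blocks


def build_text_dictionary(blocks):
    """Store each block under an integer key, together with its level."""
    return {key: (text, LEVELS[attr]) for key, (text, attr) in enumerate(blocks)}


def build_tree(text_dict):
    """Attach each block to the nearest preceding block with a smaller level."""
    root = Node(key=None, level=-1, text="<root>")
    stack = [root]
    for key in sorted(text_dict):
        text, level = text_dict[key]
        while stack[-1].level >= level:  # climb until a valid parent is on top
            stack.pop()
        node = Node(key=key, level=level, text=text)
        stack[-1].children.append(node)
        stack.append(node)
    return root


# Illustrative input: components as a parser might emit them (made-up data).
components = [
    ("Master Agreement", "title"),
    ("1. Scope", "heading"),
    ("This agreement covers", "body"),
    ("all listed services.", "body"),
    ("2. Term", "heading"),
    ("Twelve months.", "body"),
]
tree = build_tree(build_text_dictionary(merge_components(components)))
```

Because both documents would pass through the same `LEVELS` mapping, the two resulting trees share one organizational hierarchy, which is the property the claims rely on for comparison.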
Opening claim text (preview).
What is claimed is:

1. A method for facilitating electronic textual representation and comparison, the method being implemented by at least one processor, the method comprising:

receiving, by the at least one processor via a graphical user interface, at least one comparison request, each of the at least one comparison request including a first electronic document and a second electronic document, wherein the first electronic document and the second electronic document are digital computer files, and wherein the receiving is initiated by a command obtained by an interaction with at least one from among a graphical icon and an audio indicator of the graphical user interface;

parsing, by the at least one processor, the first electronic document and the second electronic document to classify textual data, wherein the classifying of textual data includes: extracting, by the at least one processor, the textual data and at least one corresponding textual characteristic from the first electronic document and from the second electronic document; determining, by the at least one processor using a predefined rule, a respective attribute for each of a plurality of textual components of the textual data based on the corresponding textual characteristic; consecutively merging, by the at least one processor, the plurality of textual components into at least one text block based on a similarity of corresponding respective attributes; assigning, by the at least one processor, a respective integer to each of the at least one text block based on the corresponding attribute; and storing, by the at least one processor, each block of the at least one text block and the corresponding respective integer in a text dictionary, wherein the corresponding respective integer comprises a key for identifying each of the at least one text block;

automatically generating, by the at least one processor using the classified textual data, a first tree structure that corresponds to the first electronic document and a second tree structure that corresponds to the second electronic document, wherein the first tree structure and the second tree structure are generated using a same organizational hierarchy to facilitate a comparing of the first electronic document with the second electronic document, and wherein the organizational hierarchy is based on the classifying of textual data operations;

constructing, by the at least one processor, a first hierarchy dictionary that corresponds to the first tree structure and a second hierarchy dictionary that corresponds to the second tree structure;

determining, by the at least one processor via a breadth first search algorithm, at least one difference between the first electronic document and the second electronic document by using the first tree structure and the second tree structure, wherein the breadth first search algorithm determines the at least one difference by: appending a respective root node from each of the first tree structure and the second tree structure to a corresponding respective queue; transmitting a last node in each respective queue to a current node of a respective corresponding tree structure from among the first tree structure and the second tree structure; extracting at least one first child node from the current node of the first tree structure and extracting at least one second child node from the current node of the second tree structure; storing the at least one first child node and the at least one second child node in a list; calculating a respective similarity score between each respective at least one first child node and each respective at least one second child node, wherein the respective similarity score relates to how closely associated text of each respective at least one first child node matches associated text of each respective at least one second child node; identifying pairs of child nodes, whose similarity score exceeds a predetermined threshold, as matching; removing the matching pairs of child nodes from the list; identifying remaining child nodes from the list that are associated with the first tree structure as being removed; and identifying remaining child nodes from the list that are associated with the second tree structure as being added; and

generating, by the at least one processor, at least one graphical representation that depicts the at least one difference and at least one textual representation that summarizes the at least one difference.

2. The method of claim 1, wherein the at least one comparison request includes instructions to compare the first electronic document with the second electronic document, the first electronic document and the second electronic document including data in a natural language format.

3. The method of claim 1, wherein the first tree structure relates to a first structural hierarchy that represents the first electronic document, the first structural hierarchy including a plurality of first linguistic components; and wherein the second tree structure relates to a second structural hierarchy that represents the second electronic document, the second structural hierarchy including a plurality of second linguistic components.

4. The method of claim 1, wherein the at least one difference includes a type of change that corresponds to at least one from among a document structure and a document text, the type of change including at least one from among a modification change, an addition change, a removal change, an addition with all descendants change, a removal with all descendants change, an order swap change, a parent change, and a modify and reorder change.

5. The method of claim 1, wherein generating the first tree structure and the second tree structure further comprises: initializing, by the at least one processor, at least one tree node based on a quantity of the at least one text block, wherein the at least one tree node includes the at least one first child node and the at least one second child node; linking, by the at least one processor, each of the at least one tree node to each of the at least one text block; and categorizing, by the at least one processor, each of the at least one tree node based on a predetermined guideline and the corresponding respective integer, the predetermined guideline relating to a parent and child classification.

6. The method of claim 5, wherein constructing the first hierarchy dictionary and the second hierarchy dictionary further comprises: assigning, by the at least one processor, a header type to a corresponding attribute from the respective attributes that is linked to at least one uncategorized node; building, by the at least one processor, the text dictionary, wherein the text dictionary includes merging rules for a plurality of different header types, the text dictionary relating to the first hierarchy dictionary and the second hierarchy dictionary; and iteratively cataloging, by the at least one processor using the header type, each of the at least one uncategorized node based on the text dictionary.

7. The method of claim 6, further comprising: identifying, by the at least one processor using the breadth first search algorithm, at least one partially matching pair of nodes from the first tree structure and the second tree structure based on the similarity score and the predetermined threshold; tagging, by the at least one processor, the at least one partially matching pair; and appending, by the at least one processor, the at least one partially matching pair together with the tag in the corresponding respective queue.

8. A computing device configured to implement an execution of a method for facilitating electronic textual representation and comparison, the
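The breadth-first comparison recited in claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `difflib.SequenceMatcher` stands in for the unspecified similarity score, the `0.6` threshold is arbitrary, pairs are matched greedily from the highest score down, and a single FIFO queue of paired nodes replaces the two "respective" queues in the claim language. Tagging matched pairs whose text differs as `"modified"` loosely follows claims 4 and 7. The `Node` class and `diff_trees` function are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from difflib import SequenceMatcher

THRESHOLD = 0.6  # assumed value; the claims only require "a predetermined threshold"


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def similarity(a, b):
    """Score in [0, 1]; difflib is a stand-in for the unspecified scoring method."""
    return SequenceMatcher(None, a.text, b.text).ratio()


def diff_trees(root_a, root_b, threshold=THRESHOLD):
    changes = []
    queue = deque([(root_a, root_b)])  # paired nodes, starting with the two roots
    while queue:
        cur_a, cur_b = queue.popleft()
        left = list(cur_a.children)    # child nodes from the first tree
        right = list(cur_b.children)   # child nodes from the second tree
        # Score every cross pair, then walk the scores from best to worst.
        scored = [
            (similarity(a, b), i, j)
            for i, a in enumerate(left)
            for j, b in enumerate(right)
        ]
        scored.sort(reverse=True)
        used_a, used_b = set(), set()
        for score, i, j in scored:
            if score <= threshold:     # only scores exceeding the threshold match
                break
            if i in used_a or j in used_b:
                continue
            used_a.add(i)
            used_b.add(j)
            if score < 1.0:            # matched, but the text is not identical
                changes.append(("modified", left[i].text, right[j].text))
            queue.append((left[i], right[j]))  # compare their children later
        # Whatever is left unmatched was removed (first tree) or added (second tree).
        changes += [("removed", n.text) for i, n in enumerate(left) if i not in used_a]
        changes += [("added", n.text) for j, n in enumerate(right) if j not in used_b]
    return changes


# Illustrative trees (made-up data).
doc_a = Node("<root>", [
    Node("1. Scope", [Node("Covers all listed services."), Node("Excludes hardware.")]),
    Node("2. Term", [Node("Twelve months.")]),
])
doc_b = Node("<root>", [
    Node("1. Scope", [Node("Covers all listed services and support.")]),
    Node("2. Term", [Node("Twelve months.")]),
    Node("3. Fees", [Node("Invoiced monthly.")]),
])
changes = diff_trees(doc_a, doc_b)
```

Only matched pairs are queued, so the children of an added or removed node are never visited individually. That is one plausible way to arrive at the "addition with all descendants" and "removal with all descendants" change types listed in claim 4, though the claims do not say this is how they are produced.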
Hierarchical processing, e.g. outlines · CPC title
Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title
Parsing · CPC title
Calculation of difference between files · CPC title
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title