US12198459B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12198459-B2 |
| Application number | US-202117534744-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 24, 2021 |
| Priority date | Nov 24, 2021 |
| Publication date | Jan 14, 2025 |
| Grant date | Jan 14, 2025 |
Official abstract text for this publication.
In implementations of systems for generating indications of relationships between electronic documents, a processing device implements a relationship system to segment text of electronic documents included in a document corpus into segments. The relationship system determines a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number. The similar segments are identified using locality sensitive hashing. The electronic document pairs are classified as related documents or unrelated documents using a machine learning model that receives a pair of electronic documents as an input and generates an indication of a classification for the pair of electronic documents as an output. Indications of relationships between particular electronic documents included in the subset are generated based at least partially on the electronic document pairs that are classified as related documents.
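The candidate-selection step in the abstract (segment the documents, find similar segments with locality sensitive hashing, keep document pairs with more than a threshold number of similar segments) can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not name a specific LSH scheme, so MinHash with banding is assumed here, and the helper names (`segment`, `shingles`, `minhash`, `candidate_document_pairs`), the paragraph-based segmentation, and the parameters `NUM_HASHES`, `BANDS`, and `k` are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32              # assumed signature length
BANDS = 8                    # assumed number of LSH bands
ROWS = NUM_HASHES // BANDS   # rows per band


def segment(text):
    """Split text into segments; paragraphs are an assumed segment unit."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def shingles(seg, k=3):
    """Return the set of k-word shingles of a segment."""
    words = seg.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash built from SHA-1."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(seg):
    """MinHash signature: the minimum hash per seed over the shingle set."""
    sh = shingles(seg)
    return tuple(min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES))


def candidate_document_pairs(corpus, threshold):
    """Return {(doc_a, doc_b): n} for pairs with n > threshold similar segments.

    corpus maps a document id to its list of segments.
    """
    # Segments whose signatures agree on any whole band share a bucket.
    buckets = defaultdict(set)
    for doc_id, segments in corpus.items():
        for idx, seg in enumerate(segments):
            sig = minhash(seg)
            for b in range(BANDS):
                buckets[(b, sig[b * ROWS:(b + 1) * ROWS])].add((doc_id, idx))

    # Count distinct colliding segment pairs for each document pair.
    similar = defaultdict(set)
    for members in buckets.values():
        for (da, ia), (db, ib) in combinations(sorted(members), 2):
            if da != db:
                similar[(da, db)].add((ia, ib))
    return {pair: len(m) for pair, m in similar.items() if len(m) > threshold}
```

Banding trades precision for recall: more bands with fewer rows each surface more near-duplicate segments as candidates, at the cost of more chance collisions for the downstream classifier to reject.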
Opening claim text (preview).
What is claimed is:
1. In a digital medium environment, a method implemented by a processing device, the method comprising: segmenting, by the processing device, text of electronic documents included in a document corpus into segments; determining, by the processing device, a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number, the similar segments identified using locality sensitive hashing; classifying, by the processing device and using a machine learning model, the electronic document pairs as semantically similar documents or not semantically similar documents, the machine learning model being used to receive a pair of electronic documents as an input and generate an indication of a classification for the pair of electronic documents as an output; computing, by the processing device, containment scores for the electronic document pairs based on the number of the similar segments and a length of a shorter electronic document included in each of the electronic document pairs; and generating, by the processing device, indications of relationships between particular electronic documents included in the subset based at least partially on the electronic document pairs that are classified as semantically similar documents and the containment scores.
2. The method as described in claim 1, wherein the relationships between the particular electronic documents include a version relationship, an aggregation relationship, a repurposed relationship, or a similarity relationship.
3. The method as described in claim 1, further comprising determining, by the processing device, a maximum spanning tree from a graph that includes a node for each electronic document included in the electronic document pairs that are classified as semantically similar documents, and the indications of the relationships between the particular electronic documents are generated at least partially based on the maximum spanning tree.
4. The method as described in claim 3, wherein the nodes included in the graph are connected by edges, the edges having weights, and the weights of the edges being based on insertions and deletions in the electronic document pairs that are classified as semantically similar documents.
5. The method as described in claim 1, wherein the machine learning model is trained to classify the electronic document pairs as semantically similar documents or not semantically similar documents using training data that describes two-dimensional heatmaps generated from pairs of electronic document training samples.
6. The method as described in claim 5, wherein the two-dimensional heatmaps include first two-dimensional heatmaps for lexical similarity between the segments that are included in the pairs of the electronic document training samples and second two-dimensional heatmaps for Jaccard similarity between entities included in the segments.
7. The method as described in claim 1, further comprising generating, by the processing device, indications of semantic similarity for electronic documents included in the subset using a hierarchical attention network trained on training data to receive first and second electronic documents as an input and generate an indication of sematic similarity for the first and second electronic documents as an output.
8. The method as described in claim 7, further comprising clustering, by the processing device, the electronic documents included in the subset into similarity groups based on the indications of semantic similarity, and the indications of the relationships between the particular electronic documents are generated at least partially based on the similarity groups.
9. The method as described in claim 1, wherein the indications of the relationships between the particular electronic documents include at least one of a change summary, an explanation of similarity, or a relative ordering between the particular electronic documents.
10. One or more computer-readable storage media comprising instructions stored thereon that, responsive to execution by a processing device, causes the processing device to perform operations including: segmenting text of electronic documents included in a document corpus into segments; determining a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number, the similar segments identified using locality sensitive hashing; classifying, using a machine learning model, the electronic document pairs as semantically similar documents or not semantically similar documents, the machine learning model being used to receive a pair of electronic documents as an input and generate an indication of a classification for the pair of electronic documents as an output; computing containment scores for the electronic document pairs based on the number of the similar segments and a length of a shortest electronic document included in each of the electronic document pairs; forming a graph having a node for each electronic document included in the electronic document pairs that are classified as semantically similar documents; determining a maximum spanning tree from the graph; and generating indications of relationships between particular electronic documents included in the subset based at least partially on the maximum spanning tree and the containment scores.
11. The one or more computer-readable storage media as described in claim 10, wherein the relationships between the particular electronic documents include at least one of a version relationship, an aggregation relationship, a repurposed relationship, or a similarity relationship.
12. The one or more computer-readable storage media as described in claim 10, wherein the operations further include generating indications of semantic similarity for electronic documents included in the subset using a hierarchical attention network trained on training data to receive first and second electronic documents as an input and generate an indication of sematic similarity for the first and second electronic documents as an output.
13. The one or more computer-readable storage media as described in claim 12, wherein the operations further include clustering the electronic documents included in the subset into similarity groups based on the indications of semantic similarity, and the indications of the relationships between the particular electronic documents are generated at least partially based on the similarity groups.
14. A system comprising: a processing device; and computer-readable storage media storing instructions that are executable by the processing system to perform operations including: segmenting text of electronic documents included in a document corpus into segments; determining a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number, the similar segments identified using locality sensitive hashing; classifying, using a machine learning model, the electronic document pairs as semantically similar documents or not semantically similar documents, the machine learning model being used to receive a pair of electronic documents as an input and generate an indication of a classification for the pair of electronic documents as an output; computing containment scores for the electronic document pairs based on the number of the similar segments and a length of a shorter electronic document included in each of the electronic document pairs; and generating indicat
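Two steps recited in the claims, the containment score and the maximum spanning tree, can be illustrated with a short sketch. The claims say only that the containment score is based on the number of similar segments and the length of the shorter document, so the ratio used below is an assumption, as is measuring length in segments. The claims also state that edge weights are based on insertions and deletions without giving a formula, so the weights here are supplied by the caller as plain numbers. The function names and the Kruskal-style construction are hypothetical choices for illustration.

```python
def containment_score(num_similar_segments, len_a, len_b):
    """Assumed form: similar segments divided by the shorter document's length.

    Lengths are counted in segments here (an assumption).
    """
    shorter = min(len_a, len_b)
    return num_similar_segments / shorter if shorter else 0.0


def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm, taking the heaviest edges first.

    weighted_edges is an iterable of (weight, node_a, node_b) tuples.
    Returns the chosen edges as (node_a, node_b, weight). If the graph
    is disconnected, the result is a maximum spanning forest.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, a, b in sorted(weighted_edges, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # keep the edge only if it joins two components
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical documents and weights, for illustration only.
docs = ["v1", "v2", "v3"]
edges = [(0.9, "v1", "v2"), (0.8, "v2", "v3"), (0.3, "v1", "v3")]
tree = maximum_spanning_tree(docs, edges)
```

In this example the tree keeps the `v1`-`v2` and `v2`-`v3` edges and drops the weaker `v1`-`v3` edge, which is one way a chain of likely versions could be read off a graph of semantically similar documents.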
Classification techniques · CPC title
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title
Document matching, e.g. of document images · CPC title
Syntactic or semantic context, e.g. balancing · CPC title
Classification of content, e.g. text, photographs or tables · CPC title