Creation of component templates based on semantically similar content

US11610066B2 · US · B2

Patent metadata
Publication number: US-11610066-B2
Application number: US-202217570157-A
Country: US
Kind code: B2
Filing date: Jan 6, 2022
Priority date: Feb 14, 2020
Publication date: Mar 21, 2023
Grant date: Mar 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods and products for accessing a set of electronic document templates, identifying instances of common document content such as content items which are semantically similar, and generating component templates containing the common content. Semantically similar content may be identified by analyzing content for factors such as expressed sentiment, included keyphrases, recognizable entities, expressed topics, assigning values to content based on these factors, and determining similarity based on comparisons of the assigned values. Component templates may also be generated based on types of content that include identical text or images, content that has a predefined level of similarity rather than being identical, content that has common rules, scripting logic or variables, metadata, etc. The component templates may be generated automatically, or in response to user instructions.
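The abstract notes that component templates can also be built from content that is identical, or from content that reaches a predefined level of similarity without being identical. Below is a minimal sketch of that simpler text-level check. It assumes each template is a list of plain-text content instances. Python's difflib.SequenceMatcher ratio stands in for the similarity measure, which the patent does not name. The helper names, the sample templates, and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in 0..1 from difflib; identical strings score 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def find_shared_content(templates: dict[str, list[str]], threshold: float = 0.9) -> list[dict]:
    """Hypothetical helper: propose a component for each pair of content
    instances from two different templates whose text similarity meets
    the predefined threshold."""
    components = []
    for (name_a, items_a), (name_b, items_b) in combinations(templates.items(), 2):
        for text_a in items_a:
            for text_b in items_b:
                score = text_similarity(text_a, text_b)
                if score >= threshold:
                    components.append({
                        "content": text_a,
                        "used_by": [name_a, name_b],
                        "identical": text_a == text_b,  # exact match vs. near match
                        "score": score,
                    })
    return components


# Hypothetical sample templates, each a list of content instances.
templates = {
    "invoice": ["Thank you for your business.", "Payment is due within 30 days."],
    "statement": ["Thank you for your business!", "Payment is due within 30 days."],
    "notice": ["Your account is past due."],
}
for component in find_shared_content(templates):
    print(component)
```

With these inputs, the payment sentence is flagged as identical. The two closing lines differ only in punctuation, and they are flagged as a near match with a score just under 1.0. A character-level ratio like this catches only surface similarity. The semantic factors described in the claims are what allow differently worded content to match.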

First claim

Opening claim text (preview).

What is claimed is:

1. A system for creation of component templates, comprising: one or more computer processors and one or more computer memories, the one or more computer processors adapted to: access a set of electronic document templates stored in the one or more computer memories identify a plurality of content instances contained in the set of electronic document templates, determine, for each of the plurality of content instances, corresponding values representing one or more semantic factors, compare, for each pair of the plurality of content instances, the corresponding values representing the one or more semantic factors to determine a degree of similarity of the pair of content instances with respect to each of the one or more semantic factors, determine, for each pair of the plurality of content instances, an overall degree of similarity based on the degrees of similarity with respect to each of the semantic factors; and create a component template corresponding to two or more electronic document templates of the set of electronic document templates, wherein the component template contains one or more content instances in the two or more electronic document templates that are determined to be semantically similar, and store the component template in the one or more computer memories.

2. The system of claim 1, wherein the identification engine is configured to, for each of the content instances, store the values for each of the one or more semantic factors in a corresponding vector.

3. The system of claim 2, wherein the overall degree of similarity between the content instances is determined by comparing the respective semantic factor values of the vectors corresponding to the content instances, determining the degree of similarity of the content instances with respect to each semantic factor, and summing the degrees of similarity of the content instances for all of the semantic factors to generate an overall similarity score.

4. The system of claim 3, wherein, for each of the vectors corresponding to the content instances, the value for each semantic factor is multiplied by a weighting factor corresponding to the semantic factor.

5. The system of claim 3, wherein the overall similarity score between the content instances is compared to a threshold similarity value and the content instances are determined to be similar if the overall similarity score meets or exceeds the threshold similarity value.

6. The system of claim 1, wherein the one or more computer processors are adapted to, for each electronic document template of the set of electronic document templates: analyze content therein and thereby determine a sentiment associated with the electronic document template, recognize entities identified in the electronic document template, identify keyphrases contained in the electronic document template, and identify topics contained in the electronic document template; and identify semantically similar content instances by comparing the identified sentiments, the recognized entities, the identified keyphrases, and the identified topics in each electronic document template of the set of electronic document templates.

7. The system of claim 6, wherein identifying the semantically similar content instances comprises computing a semantic distance between two electronic document templates of the set of electronic document templates, wherein the semantic distance is determined based on one or more of: a first similarity value representative of a similarity between sentiments associated with the two electronic document templates of the set of electronic document templates, a second similarity value representative of recognized entities identified in the two electronic document templates of the set of electronic document templates, a third similarity value representative of a similarity between identified keyphrases contained in the two electronic document templates of the set of electronic document templates, and a fourth similarity value representative of a similarity between identified topics contained in the two electronic document templates of the set of electronic document templates.

8. A computer program product comprising a non-transitory computer-readable medium storing instructions executable by one or more processors to perform: accessing a set of electronic document templates; identifying a plurality of content instances contained in the set of electronic document templates; determining, for each of the plurality of content instances, corresponding values representing one or more semantic factors; comparing, for each pair of the plurality of content instances, the corresponding values representing the one or more semantic factors to determine a degree of similarity of the pair of content instances with respect to each of the one or more semantic factors; determining, for each pair of the plurality of content instances, an overall degree of similarity based on the degrees of similarity with respect to each of the semantic factors; creating a component template corresponding to two or more electronic document templates of the set of electronic document templates, wherein the component template contains one or more content instances in the two or more electronic document templates that are determined to be semantically similar, and storing the component template.

9. The computer program product of claim 8, wherein the instructions are further executable by the one or more processors to analyze, for each electronic document template of the set of electronic document templates, content therein and determine a sentiment associated with the electronic document template, the instructions being further executable by the one or more processors to identify semantically similar content instances by comparing at least the sentiment associated with each electronic document template of the set of electronic document templates.

10. The computer program product of claim 8, wherein the instructions are further executable by the one or more processors to analyze, for each electronic document template of the set of electronic document templates, content therein and recognize entities identified therein, the instructions being further executable by the one or more processors to identify semantically similar content instances by comparing at least the recognized entities identified in each electronic document template of the set of electronic document templates.

11. The computer program product of claim 8, wherein the instructions are further executable by the one or more processors to analyze, for each electronic document template of the set of electronic document templates, content therein and identify keyphrases contained therein, the instructions being further executable by the one or more processors to identify semantically similar content instances by comparing at least the identified keyphrases in each electronic document template of the set of electronic document templates.

12. The computer program product of claim 8, wherein the instructions are further executable by the one or more processors to analyze, for each electronic document template of the set of electronic document templates, content therein and identify topics contained therein, the instructions being further executable by the one or more processors to identify semantically similar content instances by comparing at least the identified topics in each electronic document template of the set of electronic document templates.

13. The computer program product of claim 8, wherein the instructions are further executable by the one or more processors to, for each elec
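Claims 1 through 5 describe a pipeline with four steps. Each content instance is scored on semantic factors. Every pair is then compared factor by factor, and the per-factor similarities are summed into an overall score. Pairs at or above a threshold are treated as similar. The sketch below illustrates that pipeline under stated assumptions and is not the patented implementation. The four factors (sentiment, entities, keyphrases, topics) come from claims 6 and 7. Everything else is hypothetical: the -1 to 1 sentiment scale, the use of Jaccard overlap for the entity, keyphrase, and topic sets, the equal default weights, the 3.0 threshold, and every class and function name. The sketch also applies each weight to a factor's similarity rather than to the raw factor value as claim 4 recites, which is a simplification.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical equal weights; claim 4 allows a per-factor weighting factor.
DEFAULT_WEIGHTS = {"sentiment": 1.0, "entities": 1.0, "keyphrases": 1.0, "topics": 1.0}


@dataclass
class ContentInstance:
    template: str      # name of the document template the content came from
    text: str
    sentiment: float   # assumed scale: -1.0 (negative) to 1.0 (positive)
    entities: set[str] = field(default_factory=set)
    keyphrases: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in 0..1; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def factor_similarities(x: ContentInstance, y: ContentInstance) -> dict[str, float]:
    """Degree of similarity for each semantic factor, each scaled to 0..1."""
    return {
        "sentiment": 1.0 - abs(x.sentiment - y.sentiment) / 2.0,
        "entities": jaccard(x.entities, y.entities),
        "keyphrases": jaccard(x.keyphrases, y.keyphrases),
        "topics": jaccard(x.topics, y.topics),
    }


def overall_similarity(x: ContentInstance, y: ContentInstance, weights=None) -> float:
    """Sum of per-factor similarities (claim 3), each multiplied by its weight.
    Weighting the similarity rather than the raw value simplifies claim 4."""
    weights = weights or DEFAULT_WEIGHTS
    return sum(weights[f] * s for f, s in factor_similarities(x, y).items())


def create_component_templates(instances, threshold, weights=None):
    """Compare every pair (claim 1); pairs from different templates whose overall
    score meets or exceeds the threshold (claim 5) become a component."""
    components = []
    for x, y in combinations(instances, 2):
        if x.template != y.template and overall_similarity(x, y, weights) >= threshold:
            components.append({"content": x.text, "used_by": sorted({x.template, y.template})})
    return components


a = ContentInstance("invoice", "Thank you for your business.", 0.8,
                    {"Acme"}, {"thank you", "business"}, {"gratitude"})
b = ContentInstance("statement", "We appreciate your business.", 0.7,
                    {"Acme"}, {"appreciate", "business"}, {"gratitude"})
c = ContentInstance("notice", "Your account is past due.", -0.6,
                    set(), {"past due"}, {"collections"})

print(create_component_templates([a, b, c], threshold=3.0))
# prints [{'content': 'Thank you for your business.', 'used_by': ['invoice', 'statement']}]
```

The first two instances are worded differently, yet they clear the threshold (about 3.28 of a possible 4.0). Their entities, topics, and sentiment agree, and their keyphrases partly overlap. The past-due notice scores well below the threshold against both. Raising one factor's weight makes agreement on that factor count for more in the overall score.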

Assignees

Open Text Holdings Inc

Inventors

Classifications

  • G06F40/30 (primary)

    Semantic analysis · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Templates · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11610066B2 cover?
Systems, methods and products for accessing a set of electronic document templates, identifying instances of common document content such as content items which are semantically similar, and generating component templates containing the common content. Semantically similar content may be identified by analyzing content for factors such as expressed sentiment, included keyphrases, recognizable e…
Who is the assignee on this patent?
Open Text Holdings Inc
What technology area does this patent fall under?
The primary CPC classification is G06F40/30 (Semantic analysis). Mapped technology areas include Physics.
When was this patent published?
It was published on March 21, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).